Optimal Control of Storage Regeneration with Repair Codes

Francesco De Pellegrini
2017-01-01

Abstract

High availability of containerized applications requires robust storage of the applications' state. Since basic replication is extremely costly at scale, storage space requirements can be reduced by means of erasure codes and/or repair codes. In this paper we address storage regeneration using repair codes, a robust distributed storage technique that does not need to restore the whole state after a failure: only the content of the lost servers is replaced. To do so, new clean-slate storage units are made operational, incurring a cost for activating the new storage servers and a cost for transferring the repair data. Our goal is to guarantee maximal availability of the containers' state files: when a fault occurs at a subset of the storage servers, we aim to ensure that they are repaired by a given deadline. We introduce a controlled fluid model and derive the optimal activation policy for replacing servers under such correlated faults. The solution concept is the optimal control of regeneration via the Pontryagin minimum principle. We characterize feasibility conditions and prove that the optimal policy is of threshold type. Numerical results describe how to apply the model for system dimensioning and show the tradeoff between server activation and communication cost.
2017
978-1-5386-0692-6

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/314993