In the framework of queuing theory with multiclass jobs, trunk-reservation is an admission control technique to handle job class priority in an online fashion, and serve as many high priority jobs as possible. This is achieved by rejecting jobs with sufficiently low priority when the buffer space becomes a scarce resource, i.e., when the queue may soon overflow. Mathematically, the objective is to maximize the long-term reward of the admitted jobs, where each job is assigned a reward which is monotonic with respect to the priority class it belongs to. In this paper we study online learning of optimal trunk-reservation policies when the system parameters, i.e., class arrival rates and service time, are unknown. Starting from a Markov Decision Process (MDP) formulation, we leverage the stairway structure of the optimal policy to define Integer Gradient Ascent (IGA), a reinforcement learning (RL) algorithm based on policy-gradient methods, specifically tailored to the mathematical properties of the problem at hand. We provide theoretical results on the convergence properties of IGA. Extensive numerical experiments characterize its behavior and confirm that it outperforms standard RL techniques in terms of convergence rate.
Optimal Trunk-Reservation by Policy Learning
Antonio Massaro;Francesco de Pellegrini;
2019-01-01
Abstract
In the framework of queuing theory with multiclass jobs, trunk-reservation is an admission control technique to handle job class priority in an online fashion, and serve as many high priority jobs as possible. This is achieved by rejecting jobs with sufficiently low priority when the buffer space becomes a scarce resource, i.e., when the queue may soon overflow. Mathematically, the objective is to maximize the long-term reward of the admitted jobs, where each job is assigned a reward which is monotonic with respect to the priority class it belongs to. In this paper we study online learning of optimal trunk-reservation policies when the system parameters, i.e., class arrival rates and service time, are unknown. Starting from a Markov Decision Process (MDP) formulation, we leverage the stairway structure of the optimal policy to define Integer Gradient Ascent (IGA), a reinforcement learning (RL) algorithm based on policy-gradient methods, specifically tailored to the mathematical properties of the problem at hand. We provide theoretical results on the convergence properties of IGA. Extensive numerical experiments characterize its behavior and confirm that it outperforms standard RL techniques in terms of convergence rate.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.