

Virtual presentation / poster accept

Achieving Sub-linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation

Arnob Ghosh · Xingyu Zhou · Ness Shroff

Keywords: [ Reinforcement learning theory ] [ Linear MDP ] [ Theory of Constrained Reinforcement Learning ] [ Infinite horizon Average Reward ] [ Soft-max ] [ Model-free RL ] [ Theory ]


Abstract: We study the infinite horizon average reward constrained Markov Decision Process (CMDP). In contrast to existing works, which are model-based and restricted to finite state spaces, we consider the model-free linear CMDP setup. We first propose a computationally inefficient algorithm and show that $\tilde{\mathcal{O}}(\sqrt{d^3T})$ regret and constraint violation can be achieved, where $T$ is the number of interactions and $d$ is the dimension of the feature mapping. We also propose an efficient variant based on a primal-dual adaptation of the LSVI-UCB algorithm and show that $\tilde{\mathcal{O}}((dT)^{3/4})$ regret and constraint violation can be achieved. This improves the known regret bound of $\tilde{\mathcal{O}}(T^{5/6})$ for finite state-space model-free constrained RL, which was obtained under a stronger assumption than ours. We also develop an efficient policy-based algorithm via a novel adaptation of the MDP-EXP2 algorithm to our primal-dual setup, with $\tilde{\mathcal{O}}(\sqrt{T})$ regret and even a zero constraint violation bound under a stronger set of assumptions.
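For context, the following is a minimal sketch of the standard infinite-horizon average-reward CMDP objective and the Lagrangian that primal-dual algorithms of this kind typically work with; the notation ($r$, $g$, $b$, $\lambda$, $J_r^\pi$, $J_g^\pi$, $\eta$, $\Lambda$) is assumed for illustration and is not taken from the abstract above.

% Infinite-horizon average-reward CMDP: maximize the long-run average reward
% subject to a long-run average utility constraint (assumed standard notation).
\[
\max_{\pi} \; J_r^{\pi} = \liminf_{T \to \infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
J_g^{\pi} = \liminf_{T \to \infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=1}^{T} g(s_t, a_t)\right] \;\ge\; b .
\]
% A primal-dual scheme alternates a primal policy update (e.g., an optimistic
% least-squares value iteration or a soft-max policy step) with a projected
% gradient step on the dual multiplier \lambda \ge 0 of the Lagrangian:
\[
\mathcal{L}(\pi, \lambda) = J_r^{\pi} + \lambda\,\bigl(J_g^{\pi} - b\bigr),
\qquad
\lambda_{t+1} = \Bigl[\lambda_t - \eta\,\bigl(\hat{J}_g^{\pi_t} - b\bigr)\Bigr]_{[0,\,\Lambda]} ,
\]
% so the multiplier grows while the (estimated) constraint is violated and shrinks
% toward zero once it is satisfied, trading off regret against constraint violation.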
