One of the most crucial challenges faced by the Li-ion battery community is achieving minimum-time charging without damaging the cells. This goal can be achieved by solving a large-scale constrained optimal control problem, which relies on accurate electrochemical models. However, these models are limited by their high computational cost as well as by identifiability and observability issues. As an alternative, simple output-feedback algorithms can be employed, but their performance depends strictly on trial-and-error tuning. Moreover, dedicated techniques must be adopted to handle safety constraints. To overcome these limitations, we propose an optimal charging procedure based on deep reinforcement learning. In particular, we focus on a policy gradient method to cope with continuous state and action spaces. First, we assume full state measurements from the Doyle-Fuller-Newman (DFN) model, whose state is projected onto a lower-dimensional feature space via Principal Component Analysis (PCA). Subsequently, this assumption is removed and only output measurements are used as the agent's observations. Finally, we show that the proposed policy adapts to changes in the environment’s parameters. The results are compared with other methodologies presented in the literature, such as the reference governor and the proportional-integral-derivative (PID) approach.
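The PCA step mentioned above can be illustrated with a minimal sketch. It assumes that DFN state snapshots have been collected into a matrix with one row per sample. The data below is a synthetic, low-rank stand-in rather than real DFN output. The helper names `fit_pca`, `project` and `reconstruct` are hypothetical and do not come from the original work. PCA is computed here with a thin SVD of the centered snapshot matrix.

```python
import numpy as np


def fit_pca(snapshots: np.ndarray, n_components: int):
    """Fit PCA on an (n_samples, n_states) matrix of state snapshots.

    Returns the mean state, the (n_components, n_states) orthonormal basis,
    and the fraction of total variance captured by the retained components.
    """
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # Thin SVD: rows of vt are principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2
    explained = variance[:n_components].sum() / variance.sum()
    return mean, vt[:n_components], explained


def project(state: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map a full state vector (or a batch of them) to low-dimensional features."""
    return (state - mean) @ basis.T


def reconstruct(features: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Approximately recover the full state from its PCA features."""
    return features @ basis + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for DFN snapshots: 500 samples of a 200-dimensional
    # state that lies close to a 3-dimensional subspace, plus small noise.
    latent = rng.normal(size=(500, 3))
    mixing = rng.normal(size=(3, 200))
    snapshots = latent @ mixing + 0.01 * rng.normal(size=(500, 200))

    mean, basis, explained = fit_pca(snapshots, n_components=3)
    features = project(snapshots, mean, basis)  # what an agent would observe
    max_err = np.abs(reconstruct(features, mean, basis) - snapshots).max()
    print(features.shape, explained, max_err)
```

Under this assumption, the agent would receive the low-dimensional `features` vector instead of the full DFN state. The number of retained components trades observation size against reconstruction accuracy.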