A central aspect for operating future quantum computers is quantum circuit optimization, i.e., the search for efficient realizations of quantum algorithms given the device capabilities. In recent years, powerful approaches have been developed which focus on optimizing the high-level circuit structure. However, these approaches do not consider and thus cannot optimize for the hardware details of the quantum architecture, which is especially important for near-term devices. To address this point, we present an approach to quantum circuit optimization based on reinforcement learning. We demonstrate how an agent, realized by a deep convolutional neural network, can autonomously learn generic strategies to optimize arbitrary circuits on a specific architecture, where the optimization target can be chosen freely by the user. We demonstrate the feasibility of this approach by training agents on 12-qubit random circuits, where we find on average a depth reduction by 27% and a gate count reduction by 15%. We examine the extrapolation to larger circuits than used for training, and envision how this approach can be utilized for near-term quantum devices.