Because merchant ships play an indispensable role in cargo transport, voyage optimization has a significant impact on the CO2 balance. Future ships will be equipped with modern technologies, which increases the complexity of the overall system and favors the use of intelligent control systems. Conventional models are too limited to allow sophisticated optimization of the overall strategy. This project therefore investigates the application of Reinforcement Learning (RL) to voyage optimization. RL supports continuous action spaces, which conventional methods do not offer. Furthermore, several cooperating RL agents can help to overcome the complexity of the modern marine vehicle.