Development of a Reinforcement Learning Based Aircraft Autopilot Using Deterministic Policy Gradients


One of the most difficult challenges in reinforcement learning is controlling systems whose state and action spaces are both continuous. The goal of this paper is to design and implement a reinforcement learning based airplane autopilot that controls an aircraft in such a continuous state and action space. Deterministic Policy Gradients provide a framework for this purpose in the form of an actor-critic architecture that operates on a continuous action space and outputs a continuous action vector. The framework is complemented by a reward function that provides the autopilot with behavioral feedback. Finally, the feasibility and robustness of the implemented autopilot are tested in a commercial flight simulator. For this purpose, multiple scenarios are defined and the resulting data are evaluated using statistical methods.
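To make the actor-critic idea concrete, the following is a minimal sketch of the forward pass only: a deterministic actor that maps a state to a single continuous action vector, a critic that scores a state-action pair, and a reward term. It is not the implementation evaluated in this paper. The linear models, the state and action dimensions, the example state features, and the `reward` function with its two error terms are all assumptions chosen for illustration; a practical system would use neural networks and gradient-based updates.

```python
import math
import random


def actor(state, weights):
    """Deterministic policy: map a state vector to one action vector in [-1, 1]."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def critic(state, action, weights):
    """Linear action-value estimate Q(s, a) over the concatenated state and action."""
    features = list(state) + list(action)
    return sum(w * f for w, f in zip(weights, features))


def reward(altitude_error, heading_error):
    """Hypothetical feedback signal: closer to zero as both errors shrink."""
    return -(abs(altitude_error) + abs(heading_error))


rng = random.Random(0)
state_dim, action_dim = 4, 2  # assumed sizes, not taken from the paper

# Randomly initialised parameters stand in for trained ones.
actor_w = [[rng.uniform(-1, 1) for _ in range(state_dim)] for _ in range(action_dim)]
critic_w = [rng.uniform(-1, 1) for _ in range(state_dim + action_dim)]

state = [0.1, -0.2, 0.5, 0.0]  # hypothetical normalised flight state
action = actor(state, actor_w)  # continuous action vector, one value per control
q_value = critic(state, action, critic_w)  # critic's estimate for this pair
```

Because the actor returns one action per state instead of a distribution over actions, the critic's estimate can be differentiated with respect to the action, which is what a deterministic policy gradient update relies on.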