Robotics: Science and Systems XVII
HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing
Keiko Nagami, Mac Schwager
Abstract:
In this work, we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. In a race course, this policy is applied successively to each oncoming gate to guide the drone through the course. The resulting policy completes a high-fidelity AirSim drone race with 12 gates in 34.89 s on average, outracing a model-based HJB policy by 33.20 s, a supervised learning policy by 1.24 s, and a trajectory planning policy by 12.99 s, while a model-free RL policy was never able to complete the race.
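As a rough illustration of the first stage only (solving a discretized HJB equation for a reduced-order model), the sketch below runs value iteration on a minimum-time problem for a one-dimensional double integrator that must reach or cross a "gate" at position 0. Everything in it is an assumption made for illustration and is not taken from the paper: the integer-grid dynamics, the three-valued acceleration set, the grid sizes, and the function name `solve_min_time`. The supervised imitation and policy gradient stages are not shown.

```python
import numpy as np

def solve_min_time(n_pos=20, v_max=4, dt=1.0, n_iters=200):
    """Value iteration for a toy minimum-time gate-crossing problem.

    Hypothetical reduced-order model (not the paper's): position and velocity
    live on integer grids, with v' = clip(v + a) and p' = p + v'.
    """
    pos = np.arange(-n_pos, n_pos + 1)
    vel = np.arange(-v_max, v_max + 1)
    controls = np.array([-1, 0, 1])            # assumed acceleration set
    P, V = np.meshgrid(pos, vel, indexing="ij")
    at_gate = (P == 0)

    def q_values(T):
        """Cost of each control followed by the cost-to-go T."""
        costs = []
        for a in controls:
            V_next = np.clip(V + a, -v_max, v_max)
            P_next = P + V_next
            crossed = (P * P_next) <= 0         # reached or passed the gate plane
            inside = np.abs(P_next) <= n_pos    # leaving the grid costs infinity
            ip = np.clip(P_next, -n_pos, n_pos) + n_pos
            iv = V_next + v_max
            step = np.where(inside, dt + T[ip, iv], np.inf)
            costs.append(np.where(crossed, dt, step))
        return np.stack(costs)

    # Time-to-gate: zero at the gate, unknown (infinite) elsewhere.
    T = np.where(at_gate, 0.0, np.inf)
    for _ in range(n_iters):
        T_new = np.where(at_gate, 0.0, q_values(T).min(axis=0))
        if np.array_equal(T_new, T):            # fixed point of the Bellman update
            break
        T = T_new

    # Closed-loop policy: the control that minimizes time-to-gate in each state.
    policy = controls[q_values(T).argmin(axis=0)]
    return pos, vel, T, policy

pos, vel, T, policy = solve_min_time()
i, j = 17, 4                                    # grid indices of p = -3, v = 0
print(T[i, j], policy[i, j])
```

The resulting `policy` array is a lookup table over the state grid. In a pipeline like the one described above, such a table could supply the state-action pairs used as supervised training targets for a network, although the paper's actual model, discretization, and solver may differ substantially from this toy setup.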
Bibtex:
@INPROCEEDINGS{Nagami-RSS-21,
    AUTHOR    = {Keiko Nagami AND Mac Schwager},
    TITLE     = {{HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing}},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2021},
    ADDRESS   = {Virtual},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2021.XVII.062}
}