Robotics: Science and Systems XVII

HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Keiko Nagami, Mac Schwager

Abstract:

In this work we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep network policy in a supervised fashion to mimic the HJB policy. Finally, we further train this network using policy gradient reinforcement learning on the full drone dynamics model with a low-level feedback controller in the loop. This gives a deep network policy for controlling the drone to pass through a single gate. On a race course, this policy is applied successively to each new oncoming gate to guide the drone through the course. The resulting policy completes a high-fidelity AirSim drone race with 12 gates in 34.89s on average, outracing a model-based HJB policy by 33.20s, a supervised learning policy by 1.24s, and a trajectory planning policy by 12.99s, while a model-free RL policy was never able to complete the race.
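
The following is a minimal sketch (not the authors' code) of the two-stage training pipeline the abstract describes: supervised pretraining of a policy network on state-action pairs drawn from an HJB-derived policy, followed by REINFORCE-style policy-gradient fine-tuning on a full-dynamics simulator. The environment stubs, state/action dimensions, network sizes, and hyperparameters below are illustrative assumptions, not values from the paper.

    # Sketch of HJB-RL: imitate an HJB policy, then fine-tune with policy gradients.
    # All placeholders (hjb_policy, rollout, dimensions) are assumptions for illustration.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 10, 4   # assumed gate-relative state and command sizes

    policy = nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, ACTION_DIM),
    )
    log_std = nn.Parameter(torch.zeros(ACTION_DIM))   # learned exploration noise
    optimizer = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=1e-3)

    def hjb_policy(states):
        """Placeholder lookup into the tabulated HJB policy for the reduced-order model."""
        return torch.zeros(states.shape[0], ACTION_DIM)

    # Stage 1: supervised imitation of the HJB policy.
    for _ in range(1000):
        states = torch.randn(256, STATE_DIM)          # sample states around the gate
        loss = ((policy(states) - hjb_policy(states)) ** 2).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Stage 2: policy-gradient fine-tuning on the full dynamics model.
    def rollout(policy, log_std, horizon=200):
        """Placeholder rollout through a full-dynamics simulator with a low-level controller."""
        states = torch.randn(horizon, STATE_DIM)      # stand-in for simulated states
        dist = torch.distributions.Normal(policy(states), log_std.exp())
        actions = dist.sample()
        reward = -float(horizon)                      # e.g. negative time-to-gate
        return dist.log_prob(actions).sum(), reward

    for _ in range(500):
        log_prob, reward = rollout(policy, log_std)
        loss = -log_prob * reward                     # REINFORCE objective (no baseline)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

In the paper's setting, Stage 1 gives the network a strong warm start from the reduced-order optimal control solution, so Stage 2 only has to adapt it to the full quadrotor dynamics rather than learn racing behavior from scratch.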

Download:

Bibtex:

  
@INPROCEEDINGS{Nagami-RSS-21, 
    AUTHOR    = {Keiko Nagami AND Mac Schwager}, 
    TITLE     = {{HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2021}, 
    ADDRESS   = {Virtual}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2021.XVII.062} 
}