Safe Reinforcement Learning via Statistical Model Predictive Shielding

Osbert Bastani; Shuo Li

Robotics: Science and Systems XVII

Abstract:

Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety, e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy: it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
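The composition of a learned policy with a backup policy can be illustrated with a minimal sketch. This is not the authors' implementation: the toy system (a point mass on a line), the names `step`, `is_recoverable`, and `shielded_policy`, and all constants are assumptions made for illustration. The recoverability check here is a plain deterministic rollout of the backup policy; it stands in for, and does not implement, the sampling-based verification and linear systems analysis that SMPS uses.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (position, velocity); hypothetical toy system

DT = 0.1      # time step (assumed)
X_MAX = 1.0   # safe region is |position| <= X_MAX (assumed)
A_MAX = 1.0   # acceleration limit (assumed)


def step(state: State, accel: float) -> State:
    """Advance the toy point-mass dynamics by one time step."""
    x, v = state
    accel = max(-A_MAX, min(A_MAX, accel))
    v_next = v + accel * DT
    return (x + v_next * DT, v_next)


def is_safe(state: State) -> bool:
    return abs(state[0]) <= X_MAX


def backup_policy(state: State) -> float:
    """Brake: drive the velocity to zero as fast as the limit allows."""
    return max(-A_MAX, min(A_MAX, -state[1] / DT))


def learned_policy(state: State) -> float:
    """Stand-in for a learned policy: always accelerate to the right."""
    return A_MAX


def is_recoverable(state: State, horizon: int = 50) -> bool:
    """True if rolling out the backup policy stays safe until the mass stops."""
    for _ in range(horizon):
        if not is_safe(state):
            return False
        if abs(state[1]) < 1e-9:  # stopped inside the safe region
            return True
        state = step(state, backup_policy(state))
    return False


def shielded_policy(state: State) -> float:
    """Use the learned action only if the backup can still recover afterwards."""
    proposed = learned_policy(state)
    if is_recoverable(step(state, proposed)):
        return proposed
    return backup_policy(state)


def rollout(policy: Callable[[State], float], state: State, n: int) -> List[State]:
    """Return the n + 1 states visited when following the given policy."""
    states = [state]
    for _ in range(n):
        state = step(state, policy(state))
        states.append(state)
    return states
```

In this toy setting, following `learned_policy` alone from rest at the origin leaves the safe region, while `shielded_policy` brakes before the boundary. The one-step lookahead is what keeps the system inside the set of states from which the backup policy is known to recover.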

Bibtex:

  
@INPROCEEDINGS{Bastani-RSS-21, 
    AUTHOR    = {Osbert Bastani AND Shuo Li}, 
    TITLE     = {{Safe Reinforcement Learning via Statistical Model Predictive Shielding}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2021}, 
    ADDRESS   = {Virtual}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2021.XVII.026} 
}