Robotics: Science and Systems XII
Combined Optimization and Reinforcement Learning for Manipulation Skills
Peter Englert, Marc Toussaint
Abstract:
This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation, where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) a known analytic control cost function; 2) a black-box return function; and 3) a black-box binary success constraint. While the overall policy optimization problem is high-dimensional, in typical robot manipulation problems we can assume that the black-box return and constraint depend only on a lower-dimensional projection of the solution. With our formulation we can exploit this structure for a sample-efficient learning framework that iteratively improves the policy with respect to the objective functions under the success constraint. We employ efficient second-order optimization methods to optimize the high-dimensional policy w.r.t. the analytic cost function while keeping the lower-dimensional projection fixed. This is alternated with safe Bayesian optimization over the lower-dimensional projection to address the black-box return and success constraint. During both improvement steps the success constraint is used to keep the optimization in a secure region and to clearly distinguish between motions that lead to success and those that lead to failure. The learning algorithm is evaluated on a simulated benchmark problem and a door opening task with a PR2.
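The alternation described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the analytic cost is a quadratic `0.5*x@H@x + g@x`, the projection is simply the first `k` policy parameters, `rollout` is a made-up black box, and a cautious local random search stands in for the safe Bayesian optimization used by the authors. The sketch also ranks candidates by the black-box return alone.

```python
import numpy as np

def optimize_free_part(H, g, y):
    """Minimize 0.5*x@H@x + g@x over x with the first len(y) entries fixed to y.
    The cost is quadratic, so a single Newton step on the free entries is exact."""
    k = len(y)
    x_free = np.linalg.solve(H[k:, k:], -(g[k:] + H[k:, :k] @ y))
    return np.concatenate([y, x_free])

def rollout(x, k):
    """Hypothetical black box standing in for a robot trial.
    Returns (return, success) and depends only on the projection x[:k]."""
    y = x[:k]
    success = bool(np.linalg.norm(y - 1.0) < 1.5)
    ret = -float(np.sum((y - 1.5) ** 2))
    return ret, success

def improve_skill(H, g, y0, iters=50, step=0.2, seed=0):
    """Alternate analytic optimization (projection fixed) with a search over
    the projection. Small steps from the best successful projection are a crude
    stand-in for safe Bayesian optimization; failures can still occur."""
    rng = np.random.default_rng(seed)
    best_y = np.asarray(y0, dtype=float)
    k = len(best_y)
    best_x = optimize_free_part(H, g, best_y)
    best_ret, ok = rollout(best_x, k)
    if not ok:
        raise ValueError("the initial projection must already be successful")
    for _ in range(iters):
        cand_y = best_y + step * rng.standard_normal(k)
        cand_x = optimize_free_part(H, g, cand_y)  # analytic step
        ret, ok = rollout(cand_x, k)               # black-box step
        if ok and ret > best_ret:                  # keep only successful improvements
            best_x, best_y, best_ret = cand_x, cand_y, ret
    return best_x, best_y, best_ret

# Toy usage: a 6-dimensional policy with a 2-dimensional projection.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
H = M @ M.T + np.eye(6)  # symmetric positive definite
g = rng.standard_normal(6)
x, y, ret = improve_skill(H, g, y0=[1.0, 1.0])
```

The split mirrors the structure the abstract exploits: the expensive black-box trials only explore the 2-dimensional projection, while the remaining parameters are set by cheap analytic optimization.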
BibTeX:
@INPROCEEDINGS{Englert-RSS-16,
    AUTHOR    = {Peter Englert AND Marc Toussaint},
    TITLE     = {Combined Optimization and Reinforcement Learning for Manipulation Skills},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2016},
    ADDRESS   = {Ann Arbor, Michigan},
    MONTH     = {June},
    DOI       = {10.15607/RSS.2016.XII.033}
}