Robotics: Science and Systems VI

Closing the Learning-Planning Loop with Predictive State Representations

B. Boots, S. Siddiqi and G. Gordon


A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is nearoptimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.



    AUTHOR    = {B. Boots AND S. Siddiqi AND G. Gordon},
    TITLE     = {Closing the Learning-Planning Loop with Predictive State Representations},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2010},
    ADDRESS   = {Zaragoza, Spain},
    MONTH     = {June},
    DOI       = {10.15607/RSS.2010.VI.036}