Robotics: Science and Systems XVII
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang*, Yunfan Lu*, Cunjun Yu, David Hsu, Xuguang Lan, Nanning Zheng
* These authors contributed equally
Abstract:
This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE embodies several challenges: (i) infer the target object among other occluding objects, from input language expressions and RGB images; (ii) infer object blocking relationships (OBRs) from the images; and (iii) synthesize a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human languages are inevitable and negatively impact the robot’s performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available online: https://youtu.be/zYakh80SGcU.
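As a rough illustration of the belief-tracking idea in the abstract, the sketch below maintains a discrete belief over candidate target objects and updates it with Bayes' rule after a yes/no answer to a disambiguation question. It is not the authors' implementation: the function names, the `p_correct` answer-noise parameter, and the `grasp_threshold` are all hypothetical, and the greedy ask-or-grasp rule is a simple stand-in for the approximate POMDP planning the paper describes.

```python
def update_belief(belief, asked, answer_yes, p_correct=0.9):
    """Bayes update of a belief over candidate targets after a yes/no answer.

    belief:     dict mapping candidate id -> prior probability
    asked:      candidate the robot asked about ("Do you mean this one?")
    answer_yes: True if the human answered "yes"
    p_correct:  assumed probability that the answer is correct (hypothetical)
    """
    posterior = {}
    for obj, prior in belief.items():
        # If obj were the true target, the correct answer is "yes" only
        # when the robot asked about obj itself.
        correct_answer_is_yes = (obj == asked)
        if answer_yes == correct_answer_is_yes:
            likelihood = p_correct
        else:
            likelihood = 1.0 - p_correct
        posterior[obj] = prior * likelihood

    total = sum(posterior.values())
    if total == 0:
        raise ValueError("answer is impossible under the current belief")
    return {obj: p / total for obj, p in posterior.items()}


def choose_action(belief, grasp_threshold=0.8):
    """Greedy stand-in for planning: grasp when confident, otherwise ask."""
    best = max(belief, key=belief.get)
    if belief[best] >= grasp_threshold:
        return ("grasp", best)
    return ("ask", best)


# Two equally likely candidates: the robot asks, hears "no", then grasps the other.
belief = {"cup_a": 0.5, "cup_b": 0.5}
first_action = choose_action(belief)
belief_after = update_belief(belief, asked="cup_a", answer_yes=False)
second_action = choose_action(belief_after)
```

A full POMDP planner would instead weigh the cost of asking against the expected cost of a wrong grasp over several future steps, and would also account for blocking objects that must be removed first.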
Bibtex:
@INPROCEEDINGS{ZhangLu-RSS-21,
    AUTHOR    = {Hanbo Zhang AND Yunfan Lu AND Cunjun Yu AND David Hsu AND Xuguang Lan AND Nanning Zheng},
    TITLE     = {{INVIGORATE: Interactive Visual Grounding and Grasping in Clutter}},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2021},
    ADDRESS   = {Virtual},
    MONTH     = {July},
    DOI       = {10.15607/RSS.2021.XVII.020}
}