Spotlight
Grounding Language Plans in Demonstrations Through Counterfactual Perturbations
Yanwei Wang · Johnson (Tsun-Hsuan) Wang · Jiayuan Mao · Michael Hagenow · Julie Shah
Grounding the abstract knowledge captured by Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem. Whereas prior works have largely focused on leveraging LLMs for generating abstract plans in symbolic spaces, this work uses LLMs to guide the learning of structures and constraints in robot manipulation tasks. Specifically, we borrow from the manipulation planning literature the concept of mode families, which define specific types of motion constraints among sets of objects, to serve as an intermediate layer that connects high-level language representations with low-level physical trajectories. By locally perturbing a small set of successful human demonstrations, we augment the dataset with additional successful executions as well as counterfactuals that fail the task. Our explanation-based learning framework trains neural network-based classifiers to differentiate successful task executions from failures and, as a by-product, learns classifiers that ground low-level states into mode families without dense labeling. This further enables us to learn structured policies for the target task. Experimental validation in both 2D continuous-space and robotic manipulation environments demonstrates the robustness of our mode-based imitation methods under external perturbations.
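The core data-augmentation idea described above can be illustrated with a minimal sketch: locally perturb each demonstration, label the perturbed rollouts by task success, and train a small classifier to separate successes from failed counterfactuals. This is not the authors' implementation; the functions (`perturb_trajectory`, `rollout_succeeds`), the toy 2D task, and all hyperparameters are hypothetical stand-ins, and the sketch omits the mode-family grounding and structured-policy stages.

```python
# Minimal sketch (assumed, not the paper's code): counterfactual data
# augmentation via local perturbation of demonstrations, followed by
# training a success/failure classifier. All names are hypothetical.
import numpy as np
import torch
import torch.nn as nn

def perturb_trajectory(traj: np.ndarray, scale: float) -> np.ndarray:
    """Apply small Gaussian noise to every state of a demonstrated trajectory."""
    return traj + np.random.normal(0.0, scale, size=traj.shape)

def rollout_succeeds(traj: np.ndarray) -> bool:
    """Stand-in for executing the perturbed trajectory and checking task success.
    Here, 'success' simply means the final state lands inside a toy goal region."""
    return bool(np.linalg.norm(traj[-1]) < 1.0)

# Toy 2D demonstrations: straight-line trajectories ending near the origin.
demos = [np.linspace([2.0, 2.0], [0.1, 0.1], num=20) for _ in range(5)]

# Counterfactual augmentation: perturb each demo many times and label the result.
trajectories, labels = [], []
for demo in demos:
    for _ in range(200):
        perturbed = perturb_trajectory(demo, scale=np.random.uniform(0.01, 1.0))
        trajectories.append(perturbed)
        labels.append(float(rollout_succeeds(perturbed)))

X = torch.tensor(np.stack(trajectories).reshape(len(trajectories), -1),
                 dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)

# Small MLP separating successful executions from failed counterfactuals.
clf = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(clf(X), y)
    loss.backward()
    opt.step()
```

In the paper's framework, the learned decision boundary of such a classifier is what (as a by-product) grounds low-level states into mode families; the toy goal test above merely stands in for a real success check on perturbed executions.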