Reward engineering has long been a challenge in reinforcement learning (RL) research,
as it often requires extensive human effort.
In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks,
using only a text description of the task goal and the agent’s visual observations,
by leveraging feedback from vision language foundation models (VLMs).
The key to our approach is to query these models to give preferences over pairs of the agent’s image observations based on the text
description of the task goal, and then learn a reward function from the preference labels.
We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains — including classic control, as well as manipulation of rigid, articulated,
and deformable objects — without the need for human supervision, outperforming prior methods
that use large pretrained models for reward generation under the same assumptions.
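At a high level, the reward-learning step follows the standard preference-based RL recipe: the VLM's preference labels over image pairs supervise a reward model through a Bradley-Terry-style cross-entropy loss, and the learned reward is then used by an off-the-shelf RL algorithm in place of a hand-designed one. The snippet below is a minimal sketch of this step; the network architecture, observation shapes, and hyperparameters are illustrative placeholders rather than our exact implementation.

```python
# Minimal sketch of learning a reward function from VLM preference labels.
# The architecture, dimensions, and training details are illustrative
# placeholders, not the exact implementation used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Predicts a scalar reward from a (flattened) image observation."""
    def __init__(self, obs_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def update_reward_model(reward_model, optimizer, obs_a, obs_b, labels):
    """One gradient step of Bradley-Terry-style preference learning.

    obs_a, obs_b: paired image observations, shape (B, obs_dim).
    labels: 0 if the first observation is preferred, 1 if the second is;
            pairs where the VLM gave no preference are assumed to be
            filtered out upstream.
    """
    logits = torch.cat([reward_model(obs_a), reward_model(obs_b)], dim=-1)  # (B, 2)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The learned reward model then labels transitions collected during RL training, so the policy can be optimized without any hand-designed reward or human annotation.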
Below, we show policy rollouts from our method and baselines on seven tasks, including rigid, articulated, and deformable object manipulation.
For each task, we show a short text description of the task goal, which, when combined with a template prompt, forms the full prompt that we use to query the VLM for preferences; an illustrative sketch of this prompt composition appears after the task list below.
task description: "to fold the cloth diagonally from top left corner to bottom right corner"
task description: "to straighten the blue rope"
task description: "to move the container, which holds water, to be as close to the red circle as possible without causing too many water droplets to spill"
task description: "to move the soccer ball into the goal"
task description: "to open the drawer"
task description: "to minimize the distance between the green cube and the hole"
task description: "to balance the brown pole on the black cart to be upright"
@InProceedings{wang2024,
  title     = {RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback},
  author    = {Wang, Yufei and Sun, Zhanyi and Zhang, Jesse and Xian, Zhou and Biyik, Erdem and Held, David and Erickson, Zackory},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  year      = {2024}
}