Controlling robots with a language model: Google and TU Berlin present PaLM-E

Google Robotics and the Technical University of Berlin have created PaLM-E, a visual language model that can translate natural-language commands into instructions for a robot’s various components. PaLM-E stands for “Pathways Language Model” plus “Embodied”: PaLM is Google’s current flagship language model, and “embodied” signals that this version was built specifically for robots. With 562 billion parameters, PaLM-E is the largest visual language model reported to date.

PaLM-E was trained on text and image data and can, for example, steer a robot through a room to reach a target object, with every navigation instruction coming from the language model itself. Unlike its predecessor PaLM, PaLM-E takes its input as “multimodal sentences” that interleave text with image data from the robot’s sensors. PaLM-E currently holds the highest reported score on the OK-VQA benchmark, which measures a model’s accuracy on open-ended questions about images.
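As a rough illustration of the idea, the sketch below interleaves stand-in image embeddings with text-token embeddings into a single input sequence, the way a “multimodal sentence” would be fed to a decoder-only language model. All function names, dimensions, and token ids here are hypothetical; the actual model pairs a ViT vision encoder with PaLM’s own tokenizer and embeddings.

```python
import numpy as np

EMBED_DIM = 512  # hypothetical width; the real model's is far larger


def embed_text(token_ids):
    """Stand-in for the LM's learned token-embedding lookup."""
    table = np.random.default_rng(0).standard_normal((32_000, EMBED_DIM))
    return table[token_ids]


def encode_image(image):
    """Stand-in for a vision encoder (e.g., a ViT) that maps one camera
    frame to a short sequence of vectors in the LM's embedding space."""
    del image  # a real encoder would use the pixels
    return np.random.default_rng(1).standard_normal((4, EMBED_DIM))


def build_multimodal_sentence(parts):
    """Interleave text and image segments into a single embedding
    sequence that a decoder-only LM consumes like ordinary text."""
    encoders = {"text": embed_text, "image": encode_image}
    return np.concatenate([encoders[kind](payload) for kind, payload in parts])


# "What is in <img>? Answer:" as alternating text/image segments.
sentence = build_multimodal_sentence([
    ("text", [101, 2054, 2003]),         # made-up token ids
    ("image", np.zeros((224, 224, 3))),  # camera frame
    ("text", [102]),
])
print(sentence.shape)  # (8, 512): 3 text tokens + 4 image tokens + 1 text token
```

The key design point is that images are not described in words first: their embeddings sit directly alongside word embeddings in one sequence, so the language model attends over both with the same machinery.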

PaLM-E works across multiple robot types and multiple modalities, such as image data from a camera or the pose of a mounted gripper arm. Two of its notable properties are positive transfer and emergent capabilities. Positive transfer means that knowledge learned on one task carries over to unfamiliar tasks, so training across many tasks improves performance rather than diluting it. Emergent capabilities are abilities the model was never explicitly trained for, such as reasoning over several images at once even though its training examples contained only single images.
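The multimodal-sentence trick extends beyond images: any continuous sensor reading can be projected into the model’s embedding space and spliced into the input sequence. The sketch below is a minimal illustration assuming a simple linear projection for a gripper pose; the StateEncoder class and all dimensions are invented for the example and are not PaLM-E’s actual architecture.

```python
import numpy as np

EMBED_DIM = 512  # must match the LM's embedding width (hypothetical value)


class StateEncoder:
    """Hypothetical encoder lifting a low-dimensional robot state, such
    as a gripper pose, into vectors in the LM's embedding space. In a
    real system this projection would be learned; here it is random."""

    def __init__(self, state_dim, n_tokens=1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tokens = n_tokens
        self.weight = rng.standard_normal((state_dim, n_tokens * EMBED_DIM))

    def __call__(self, state):
        # Linear map from sensor space into "token" space.
        return (state @ self.weight).reshape(self.n_tokens, EMBED_DIM)


# A 7-D gripper state: xyz position plus a quaternion orientation.
encoder = StateEncoder(state_dim=7)
pose_tokens = encoder(np.array([0.3, -0.1, 0.8, 0.0, 0.0, 0.0, 1.0]))
print(pose_tokens.shape)  # (1, 512) -- spliced into the prompt like a word
```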

PaLM-E is an important development in robotics because it moves toward systems that do not require task-specific training. That could yield robots able to navigate unstructured, changing environments and handle everyday tasks such as cleaning. Its positive transfer and emergent capabilities make it a valuable foundation for future robot development.
