Robot teams that talk to humans


SRI scientists created a new AI-based approach to robotics that enables mobile robots to effectively communicate with each other — and with their human operators.


In the past several years, robotics researchers have recognized that the large language models (LLMs) responsible for generative AI applications will also have a profound impact on robots.

“Before large language models, robots could move and perform tasks, but they couldn’t explain what they saw and how they did things,” says Han-Pang Chiu, technical director in SRI’s Vision and Robotics Laboratory. “We’re moving from command-based robotics to conversational collaboration. That’s a fundamental shift — and it’s going to change how we work with machines.”

“With LLMs, robots can describe what they see, what they’re doing, and, most importantly, why they are doing it.” — Han-Pang Chiu

Chiu’s recent SUWAC (Shared Understanding for Wide-Area human-robot Collaboration) project aims to take full advantage of the emergent capabilities of LLMs to advance human-robot collaboration. The premise: equip a multi-talented team of robots with an LLM-based framework so that researchers can communicate with the robots in natural spoken or typed language, and the team can locate any object more efficiently and effectively.

Why robot communication matters

Chiu says that the implications of projects like SUWAC are far-reaching and could lead to teams of search-and-rescue robots that explore scenes of natural disasters, household robots whose owners can speak commands like “wash the dishes,” or even factories of robots that chat as they work.

“With LLMs, robots can describe what they see, what they’re doing, and, most importantly, why they are doing it — and, in turn, researchers can understand and communicate with the robots,” says Chiu. “It builds trust and improves collaboration.”

In one of Chiu’s demos, a canine-like quadruped robot and a wheeled robot find themselves in a room that looks like a small high school theater. Four steps descend from a dais to the main floor where Robodog and Roborover await.

The researcher then gives a prompt in simple, colloquial English: “I don’t remember where I left my backpack and laptop. Can you two find them?” The two robots begin exchanging information about the room, divvying up the work. The quadruped knows that the wheeled robot will find the stairs challenging and offers to search the dais. The wheeled robot offers to search the rest of the room. Soon, both the backpack and the laptop are found.
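
The division of labor in that demo can be pictured as a simple capability check. The sketch below is illustrative only, with hypothetical capability sets and region requirements rather than SUWAC’s actual coordination protocol: each region of the room states what it requires, each robot states what it can do, and the easier region goes to the less capable robot so the specialist stays free for the stairs.

```python
# Illustrative sketch (not SUWAC's actual protocol) of how two robots might
# split a search once they have exchanged capability and map information.
def assign_regions(robots: dict[str, set[str]], regions: dict[str, set[str]]) -> dict[str, str]:
    """Map each region to a robot whose capabilities cover the region's requirements."""
    assignment = {}
    for region, needs in regions.items():
        capable = [name for name, skills in robots.items() if needs <= skills]
        if capable:
            # Prefer the least-capable robot that can still do the job,
            # leaving specialists (e.g. stair-climbers) free for harder regions.
            assignment[region] = min(capable, key=lambda name: len(robots[name]))
    return assignment

robots = {"quadruped": {"climb_stairs", "drive_floor"}, "wheeled": {"drive_floor"}}
regions = {"dais": {"climb_stairs"}, "main_floor": {"drive_floor"}}
print(assign_regions(robots, regions))
# -> {'dais': 'quadruped', 'main_floor': 'wheeled'}
```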

The problem of unfamiliar environments

To help robot teams better navigate unfamiliar environments, SUWAC takes a new approach to robot-to-robot communication. The primary challenge is always one of data: What data should the robots exchange with each other, and how can they minimize the amount of data exchange required to do their job effectively? In many areas where robot teams can be most useful — think search-and-rescue or mine disposal — network constraints are inevitable.

The key breakthrough, Chiu explains, is a “3D scene graph,” a more efficient way of capturing and categorizing information than the data-rich “point clouds” often used in vision-based robotics. Chiu calls the scene graph the “missing link”: a way for robots to categorize and label visual data in a way that’s easy to exchange with other robots and explain to human operators. These scene graphs are readily interpretable by LLMs, enabling the robots to understand what is nearby and what actions might be appropriate. Chiu says that this combination of LLMs and scene graphs also helps create a “shared understanding” between humans and machines.
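
As a rough illustration of the idea, a scene graph can be thought of as a small hierarchy of labeled nodes that serializes to a few lines of text. The sketch below is a minimal, hypothetical data structure, assuming a simple region-and-object layout rather than SUWAC’s actual schema; the point is only to show why such a graph is cheap to transmit and easy for an LLM to read, compared with raw point clouds.

```python
# Minimal sketch of a 3D scene graph: regions (rooms) contain object nodes,
# and each region can be summarized as one line of text for an LLM.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str                 # semantic label, e.g. "kitchen" or "backpack"
    position: tuple            # (x, y, z) centroid in the shared map frame
    children: list = field(default_factory=list)

def add_object(region: Node, obj: Node) -> None:
    """Attach an object node to the region that contains it."""
    region.children.append(obj)

def to_text(region: Node) -> str:
    """Serialize a region and its objects as compact text an LLM can read."""
    objects = ", ".join(child.label for child in region.children) or "nothing yet"
    return f"{region.label} at {region.position}: contains {objects}"

# A few kilobytes of labeled nodes stand in for gigabytes of raw points.
kitchen = Node("r1", "kitchen", (2.0, 0.5, 0.0))
add_object(kitchen, Node("o1", "spoon", (2.1, 0.7, 0.9)))
print(to_text(kitchen))   # -> "kitchen at (2.0, 0.5, 0.0): contains spoon"
```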

“Instead of relying on humans to describe every detail, we let the robot perceive, interpret, and explain its world in language,” Chiu says. “That saves time and allows for much more natural collaboration. It’s unlike anything that has gone before.”

Thanks to the efficiency and legibility of scene graphs, Chiu’s SUWAC-equipped robots can distinguish a kitchen from a bedroom by the objects they see and can use sophisticated reasoning to search the most likely locations of the objects being sought. The robots can, for example, decide that a spoon is probably going to be found in the kitchen and not the bedroom.
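
That kind of object-to-room reasoning can be sketched as a prompt built from scene-graph summaries. The example below is a hypothetical illustration of the approach, not SUWAC’s actual prompting scheme; query_llm is a stand-in for whatever language-model call the system uses.

```python
# Hedged sketch: turn scene-graph summaries into a prompt that asks an LLM
# where a target object is most likely to be found.
def build_search_prompt(target: str, room_summaries: list[str]) -> str:
    rooms = "\n".join(f"- {s}" for s in room_summaries)
    return (
        "You are guiding a mobile robot. The robot has mapped these rooms:\n"
        f"{rooms}\n"
        f"Which room is the most likely place to find a {target}? "
        "Answer with the room name and a one-sentence reason."
    )

summaries = [
    "kitchen at (2.0, 0.5, 0.0): contains sink, stove, drawer",
    "bedroom at (8.0, 1.0, 0.0): contains bed, nightstand",
]
prompt = build_search_prompt("spoon", summaries)
# next_room = query_llm(prompt)   # hypothetical call; expected answer: "kitchen"
```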

In another demo, Chiu shows a robot navigating a large room filled with couches and chairs, searching for a human hiding nearby. Intuitively, the robot understands that even a small human could not hide behind a small chair and instead immediately begins searching behind furniture big enough to conceal a person.

The robot is situationally aware and able to deduce from its surroundings. It is, in short, using the kind of common-sense reasoning that helps humans quickly interpret new environments. It’s a type of reasoning that was beyond the reach of machines until very recently.

“Common sense is what separates humans from most AI,” Chiu explains. “We aim to bring that ability into robotic planning.”

A fundamental advance in robotics

SUWAC, explains Chiu, is the first use of an LLM for wide-area robotic search. It also draws on state-of-the-art perception systems (based on technologies like LIDAR, stereoscopic vision, and object recognition) developed by SRI’s Center for Vision Technologies.

Currently, according to a paper related to the SUWAC project, SRI has achieved a 95% success rate in object search within unfamiliar environments, and SUWAC is also much more data- and energy-efficient than previous wide-area search models.

Chiu expects the use of LLMs in wide-area search to accelerate quickly as he and other researchers continue down this path. He imagines teams of variously capable robots — on foot, on wheels or tracks, in the air, and even under the waves — being directed by humans from great distances using everyday language. Recently, SRI licensed the SUWAC technology to robotics startup Avsr AI with the aim of commercializing SUWAC’s novel capabilities.

“Talking to a static machine is different from talking to a mobile robot,” Chiu emphasizes. “We’re demonstrating that generative AI, already adept at answering chat and voice prompts on computer systems, has a massive role to play in robotics and 3D navigation.”

To learn more about SUWAC or SRI’s Center for Vision Technologies, contact us.

