
To chart the future of human-machine teaming, SRI’s COLLEAGUE project is building an AI-based system designed to act as a true collaborative partner.
In 2023, artificial intelligence (AI) applications were scaling across industries and job roles like never before. At the same time, SRI researcher Melinda Gervasio and her colleagues began to observe a fundamental limitation: With the arrival of generative AI (GenAI) and large language models (LLMs), humans and machines were collaborating more than ever, but mostly in a simple back-and-forth manner. The human specified a task and the AI completed it, but it didn’t feel like real teamwork.
For Gervasio, a technical director in SRI’s Artificial Intelligence Center, the ultimate vision of an AI collaborator isn’t an arduous back-and-forth interaction with a text-based chatbot. Instead, a true AI collaborator would be able to advance workflows much like a human co-worker. It wouldn’t be so rigidly “turn-based.” It would understand context clues. It would be able to perform tasks jointly with humans rather than just chat. Like any good teammate, it might even ask unexpected questions and offer surprising options. Ultimately, it would improve the planning process — and the performance of its human collaborators — rather than simply execute a predefined plan.
To bring this kind of AI-based teammate closer to reality, Gervasio and her counterparts in SRI’s Robotics Lab and Speech, Technology, and Research (STAR) Lab launched a project called COLLEAGUE (COLlaborative Language-Enabled Agents Grounded in Understanding and Explanation). The aim is to create a new framework for human-machine collaboration. Accelerated by SRI’s internal R&D funding, the project has established important proofs-of-concept and aims to fully quantify its advancements later this year. “We hope to turn AI-based autonomous agents into true teammates that can jointly decide on goals and plans, coordinate actions, respond to requests, and communicate proactively in natural language, much like human teammates do,” summarizes Gervasio.
How COLLEAGUE improves human-machine collaboration
COLLEAGUE focuses on situations where humans and AI agents work together on a joint goal. Success, says Gervasio, will require thinking about AI agents in fundamentally new ways. Most agentic architectures today, she observes, involve components (LLM-based agents) that operate within simple workflows or rely on emergent coordination. COLLEAGUE, by contrast, takes a deliberate approach to managing the interaction between its various AI agents and software tools. For example, in COLLEAGUE, a Collaboration Manager decides when to invoke the different communication and action agents to provide status updates, ask for clarifications, develop plans, monitor execution, and so on.
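To make the contrast with emergent coordination concrete, the following is a minimal illustrative sketch of explicit routing, in which a central manager decides which agent handles each event. It is not SRI's implementation: the event kinds, the agent functions, and the `CollaborationManager` class shown here are all hypothetical names chosen for illustration, and a real system would route on far richer context than a single string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str    # hypothetical event type, e.g. "step_done"
    detail: str  # free-text description of what happened


# Hypothetical stand-ins for communication and action agents.
def status_agent(event: Event) -> str:
    return f"Status update: {event.detail}"


def clarification_agent(event: Event) -> str:
    return f"Clarifying question: {event.detail}?"


def planning_agent(event: Event) -> str:
    return f"Drafting plan for: {event.detail}"


def monitoring_agent(event: Event) -> str:
    return f"Monitoring execution of: {event.detail}"


# Assumed mapping from event kind to the agent that should respond.
DEFAULT_ROUTES: Dict[str, Callable[[Event], str]] = {
    "step_done": status_agent,
    "ambiguous_request": clarification_agent,
    "new_goal": planning_agent,
    "plan_started": monitoring_agent,
}


@dataclass
class CollaborationManager:
    routes: Dict[str, Callable[[Event], str]]
    log: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> str:
        # Explicit decision point: pick the agent deliberately, and
        # fall back to asking for clarification on unknown events.
        agent = self.routes.get(event.kind, clarification_agent)
        message = agent(event)
        self.log.append(message)
        return message


if __name__ == "__main__":
    manager = CollaborationManager(routes=DEFAULT_ROUTES)
    print(manager.handle(Event("new_goal", "prepare buffer solution")))
    print(manager.handle(Event("ambiguous_request", "which pipette")))
```

The point of the sketch is only that the choice of agent is made in one visible place rather than emerging from agents prompting one another.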
COLLEAGUE also emphasizes “prosody-aware natural language understanding,” meaning that it gives careful consideration to acoustic and prosodic features of human speech such as pitch, volume, and pauses. The way something is said — rather than what is said — often contains important clues about the urgency of a task. Combining LLMs with prosody-aware conversational intelligence allows the team of AI agents to better infer the human intent behind an utterance.
Other aspects of the COLLEAGUE framework include a flexible approach to memory management and retrieval and a focus on Theory of Mind, which can give machines stronger insight into the unobservable mental states and beliefs of their human counterparts. Each of these levers aims to align machine intelligence with how human speech and mental processes actually function when people solve problems, iterate in the face of challenges, and collaboratively execute solutions in dynamic environments.
What will AI colleagues do?
“Human-machine teaming will be a key theme of future defense-related research,” says Gervasio. “SRI is well positioned to continue improving how soldiers interact with AI-infused cyberphysical systems to keep us safe. Projects like COLLEAGUE are also highly relevant to commercial applications. For example, Pfizer is currently testing SRI’s XRGo robotic teleoperation system in their lab. But what if we got to a point where bioscience companies had humans and numerous autonomous laboratory robots working together in a completely seamless fashion? Those possibilities are what make this area of inquiry so exciting.”
Gervasio and her team have proposed several methods of evaluating the performance of the COLLEAGUE system, which is no easy task given the inevitable subjectivity of human reactions to any collaborative experience, whether with a robot or another human. The first step will be to examine the performance of individual system components: “For example, with the planner, we could compare our planner and our hybrid planning approach to a pure LLM-based approach or a pure automated planner approach.”
The next step, she says, would be to evaluate the impact of each piece of the system by replacing the pieces one at a time with baseline systems and seeing how each substitution affects the performance of the system as a whole.
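This one-at-a-time substitution is commonly called an ablation study. A minimal sketch of the procedure follows, under loud assumptions: the component names, the integer "quality" values, and the `evaluate` function are toy placeholders invented for illustration, not COLLEAGUE components or measured results.

```python
from typing import Callable, Dict


def ablate(
    full_system: Dict[str, int],
    baselines: Dict[str, int],
    evaluate: Callable[[Dict[str, int]], int],
) -> Dict[str, int]:
    """Swap in one baseline component at a time and report the change
    in the overall score relative to the full system."""
    reference = evaluate(full_system)
    deltas = {}
    for name, baseline in baselines.items():
        variant = dict(full_system)  # copy so the full system is untouched
        variant[name] = baseline     # replace exactly one component
        deltas[name] = evaluate(variant) - reference
    return deltas


def toy_evaluate(system: Dict[str, int]) -> int:
    # Placeholder metric: the sum of made-up component quality values.
    return sum(system.values())


if __name__ == "__main__":
    # All numbers below are arbitrary placeholders, not results.
    full = {"planner": 3, "dialogue": 2, "memory": 1}
    base = {"planner": 1, "dialogue": 1, "memory": 1}
    print(ablate(full, base, toy_evaluate))
```

In a real evaluation, `evaluate` would run the whole system on a task suite, and a larger drop in score for a given substitution would suggest that the replaced component contributes more.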
The more important test, she adds, will be watching humans try to solve real-world problems alongside an AI system. “How often do they succeed? How quickly do they succeed? But beyond these measures of task performance, we also want to measure collaboration quality. How natural is their communication? Can they form joint intents? Are their actions coordinated? Measuring task success is relatively simple and standard surveys exist for evaluating usability and usefulness. But coming up with meaningful, practical metrics for human-machine collaboration remains a challenge.” To examine the quality of the COLLEAGUE approach to human-machine collaboration, the team is currently developing COLLEAGUE as a hardware robot partner that works with a human lab technician to develop and execute experimental protocols in a lab setting.
And finally, there’s the ultimate test: “How does this compare with a human-human interaction?” Gervasio asks. “That is the holy grail of human-AI teaming: Enabling humans to take full advantage of AI capabilities by enabling a partnership between humans and AI agents that feels as seamless as the best human-human teamwork.”