Designing Conversational AI to Provide Medical Assistance on the Battlefield

In battle, soldiers with no specialized medical knowledge often have to care for injured comrades for prolonged periods. Naturally, they need all the help they can get.

Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, are working on a proof of concept for a conversational artificial intelligence (AI) agent that will be able to provide medical guidance to untrained soldiers in plain English, by applying knowledge gleaned from established care procedures.

The project, known as Clinical Practice Guideline-driven AI (CPG-AI), is based on a type of AI known as a large language model (LLM) — the best-known example of which is the now-famous ChatGPT. (CPG-AI is not affiliated with ChatGPT in any way, nor is APL.)

The Power of Large Language Models

Methods of providing clinical support using AI tend to be highly structured, requiring precisely calibrated rules and meticulously labeled training data. That approach is well suited to providing alerts and reminders to experts in a relatively calm environment. But coaching untrained novices, or even trained medics, as they provide medical care in a chaotic environment is a different story.

“There might be 20 or 30 individual components running behind the scenes to enable a conversational agent to help soldiers assist their buddies on the battlefield: everything from search components, to deciding which information from the search is relevant, to managing the structure of the dialogue,” said Sam Barham, a computer scientist in APL’s Research and Exploratory Development Department. Barham leads the CPG-AI project, whose team also includes Arun Reddy, Michael Kelbaugh and Caitlyn Bishop. “In the past, to enable a system like this, you’d have had to train a bespoke neural network on each very specific task,” he said.

An LLM, on the other hand, is trained on vast amounts of unlabeled data — text, in this case — and not specialized for any particular task. That means it can theoretically adapt to any situation that can be described in words, using text prompts that provide the situational context and relevant information.
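To make the idea of a text prompt that carries "situational context and relevant information" concrete, here is a minimal sketch in Python. It is an illustration only, not CPG-AI's or RALF's actual prompt format, which the article does not describe. The function name `build_prompt`, its parameters and the prompt wording are all assumptions.

```python
def build_prompt(situation: str, reference_snippets: list[str], user_message: str) -> str:
    """Assemble one text prompt from context, reference text and the user's message.

    Hypothetical layout for illustration; the real system's prompts are not public.
    """
    # Number the snippets so a model could refer back to a specific one.
    numbered = "\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(reference_snippets, start=1)
    )
    return (
        "You are assisting an untrained responder. Use plain language.\n\n"
        f"Situation:\n{situation}\n\n"
        f"Reference material:\n{numbered}\n\n"
        f"Responder says:\n{user_message}\n"
    )


if __name__ == "__main__":
    print(build_prompt(
        "PLACEHOLDER situation description",
        ["PLACEHOLDER guideline excerpt A", "PLACEHOLDER guideline excerpt B"],
        "PLACEHOLDER question from the responder",
    ))
```

The point of the sketch is that the model itself stays the same from task to task; only the text it is handed changes.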

“LLMs have this incredible ability to adapt to whatever task you set for them, virtually anything that’s in the realm of natural language,” said Barham. “So instead of training a neural network on all these different capabilities, you can train a single neural network to respond fluidly to the situation.”

Building Better Apps

Until recently, LLMs were far too slow and computationally demanding to be practical in this operational context. However, recent advances in computing power and in LLMs themselves have made the prospect realistic. CPG-AI draws on a wider APL-developed software ecosystem for developing apps that take advantage of LLMs, known internally as RALF, or Reconfigurable APL Language model Framework.

RALF was developed by APL’s Intelligent Systems Center (ISC) as part of a strategic initiative centered on LLMs.

“LLMs are having a transformative impact on the AI community, and that impact extends to the missions of APL’s sponsors,” said ISC Chief Bart Paulhamus. “The ISC needs to explore all aspects of LLMs — to become experts at creating, training and using them. RALF is an exciting new technology that accelerates adoption of LLMs for our scientists and engineers.”

RALF comprises two sets of tools: The first allows users to build apps using LLMs, and the second allows users to build conversational agents that can take advantage of those apps. CPG-AI integrates both.

From Care Algorithm to AI Tool

While using LLMs makes formal training unnecessary — in the sense of manually labeling data and tweaking and calibrating all kinds of interrelated variables and parameters — a lot of work goes into transforming a basic LLM into a capability like CPG-AI. When all you have to work with is text, choosing your words becomes very important. As anyone who’s used AI text-generation tools knows, they can produce some comically wrong results that, to say the least, would not be funny on the battlefield.

“An LLM is like a precocious 2-year-old that’s very good at some things and extremely bad at others, and you don’t know in advance which is which,” Barham said. “So there are two big pieces that go into creating a tool like this: first, we have to carefully, precisely engineer text prompts, and second, we’ve injected some ground truth in the form of care algorithms.”

Specifically, Barham and his team applied a care algorithm (essentially, a protocol for how to respond to a medical event) taken from Tactical Combat Casualty Care (TCCC). TCCC is a set of guidelines and care algorithms developed by the U.S. Department of Defense Joint Trauma System to help relative novices provide trauma care on the battlefield. Conveniently, the TCCC care algorithms exist in the form of flowcharts that lend themselves to being translated into a machine-readable form.
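One way a flowchart might be translated into machine-readable form is as a table of nodes, each with a prompt and a set of labeled branches. The Python sketch below assumes that representation; the article does not say which encoding the team chose. The names `Node`, `FLOWCHART` and `walk` are hypothetical, and the node text is deliberately placeholder wording rather than actual TCCC content.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One box in a flowchart: what to say, and where each answer leads."""
    prompt: str
    branches: dict[str, str] = field(default_factory=dict)  # answer -> next node id


# Placeholder flowchart; NOT real medical guidance.
FLOWCHART = {
    "start": Node("PLACEHOLDER: is condition A present?", {"yes": "act", "no": "done"}),
    "act": Node("PLACEHOLDER: perform step B. Did it work?", {"yes": "done", "no": "escalate"}),
    "escalate": Node("PLACEHOLDER: perform step C."),
    "done": Node("PLACEHOLDER: continue monitoring."),
}


def walk(flowchart: dict[str, Node], start: str, answers: list[str]) -> list[str]:
    """Follow the flowchart from `start`, consuming answers; return visited node ids."""
    path = [start]
    current = start
    for answer in answers:
        branches = flowchart[current].branches
        if answer not in branches:
            break  # unrecognized answer or terminal node: stay where we are
        current = branches[answer]
        path.append(current)
    return path


if __name__ == "__main__":
    for node_id in walk(FLOWCHART, "start", ["yes", "no"]):
        print(node_id, "->", FLOWCHART[node_id].prompt)
```

Keeping the protocol in a fixed structure like this means the order of steps comes from the care algorithm, while the language model is left to handle the wording and the conversation around it.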

In addition, the researchers found and converted more than 30 clinical practice guidelines from the Department of Defense Joint Trauma System to be ingested as text by their model, including guidelines for treating burns, blunt trauma and other common conditions encountered by warfighters on the battlefield.
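Once guidelines have been ingested as text, a system needs some way to pull out the passages relevant to a question. The article does not say how CPG-AI does this, so the Python sketch below substitutes a deliberately simple stand-in: ranking passages by how many words they share with the query. The functions `tokenize` and `rank_passages` and the sample passages are hypothetical; a production system would likely use a more capable retrieval method.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_passages(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return up to `top_k` passages sharing at least one word with the query.

    Passages are ordered by word overlap (highest first); ties keep input order.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p)), i) for i, p in enumerate(passages)]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for _, i in relevant[:top_k]]


if __name__ == "__main__":
    # Placeholder passages; NOT excerpts from real guidelines.
    passages = [
        "placeholder text about burns and cooling",
        "placeholder text about bleeding control",
        "placeholder text about airway",
    ]
    print(rank_passages("how do I treat burns", passages, top_k=1))
```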

In the project’s first phase, Barham and his team produced a prototype model that can infer a patient’s condition based on conversational input, answer questions accurately and without jargon, and guide the user through the care algorithms for tactical field care — a category of care that encompasses the most common injuries encountered on the battlefield, including breathing issues, burns and bleeding.

Thanks to the capabilities of RALF, CPG-AI can also switch smoothly between stepping through a care algorithm and answering any questions the user may have along the way.
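The switching behavior can be pictured as a small dialogue controller that keeps its place in the care algorithm while handling side questions. The Python sketch below is an assumption-laden illustration, not RALF's design: the `Dialogue` class is hypothetical, a trailing question mark stands in for real intent detection, and `answer_question` is a stub where a language model call would go.

```python
from typing import Callable


class Dialogue:
    """Tracks the current step and answers side questions without losing it."""

    def __init__(self, steps: list[str], answer_question: Callable[[str], str]):
        self.steps = steps
        self.index = 0
        self.answer_question = answer_question  # stub for a model-backed answerer

    def current(self) -> str:
        return self.steps[self.index]

    def handle(self, message: str) -> str:
        # Crude heuristic (an assumption): a trailing "?" marks a side question.
        if message.strip().endswith("?"):
            answer = self.answer_question(message)
            return f"{answer} Back to the current step: {self.current()}"
        # Anything else is treated as confirmation that the step is complete.
        if self.index < len(self.steps) - 1:
            self.index += 1
        return self.current()


if __name__ == "__main__":
    dialogue = Dialogue(
        ["PLACEHOLDER step one", "PLACEHOLDER step two"],
        answer_question=lambda question: "PLACEHOLDER answer.",
    )
    print(dialogue.handle("what does that mean?"))
    print(dialogue.handle("done"))
```

Because the step index changes only on a confirmation, a question asked mid-procedure never causes the agent to skip or repeat a step by accident.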

In the next phase, the team plans to expand the range of conditions CPG-AI is capable of addressing. They also intend to improve CPG-AI by crafting more effective prompts and by improving the model’s ability to correctly categorize and retrieve information drawn from the practice guidelines.

“It’s not battle-ready by any means, but it’s a step in the right direction,” Barham said.

Leveraging Conversational AI to Save Lives

Amanda Galante, who oversees the Assured Care research portfolio at APL, said this work is timelier and more important than ever, given that it connects an exciting emerging technology with an urgent military need.

“[Barham] and his team are applying models like ChatGPT to solve sponsor problems, which presents considerable challenges,” said Galante. “How can we harness these powerful tools, while also ensuring accuracy, as well as transparency — both in terms of the reasoning underlying the AI’s responses, and the uncertainty of those responses? If we want to enable relative novices to provide complex medical care at scale, we’ll need a capability like this that can provide the relevant knowledge in a usable manner.”