Johns Hopkins APL Trains AI to Adapt Through Video Games

A researcher can use Meta Arcade to control the sizes, speeds and colors of everything in a game, or even create entirely new games. The ease of modifying games allows researchers to focus on a specific capability of an algorithm and measure how well an AI agent handles those changes.

Credit: Johns Hopkins APL/Kai Stone, Tom Wach

As artificial intelligence agents are deployed in more operational scenarios, they will need to navigate unpredictable environments quickly and reliably. Researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, have created Meta Arcade, a suite of arcade games that can be configured and used as training tasks for artificial intelligence systems. Initially developed under the Defense Advanced Research Projects Agency’s Lifelong Learning Machines program, Meta Arcade trains AI agents to quickly adapt to new and changing scenarios.

Arcade Games for Critical AI Research

The games in Meta Arcade are modeled on classics like Pong and Breakout, common benchmarks in the deep reinforcement learning (DRL) community — experts focused on ways to improve how AI systems train and learn. Unlike a typical game, where settings and features are fixed, a researcher can use Meta Arcade to control the sizes, speeds and colors of game entities, or even create new games. The ease of modifying games through Meta Arcade allows researchers to focus on an algorithm’s specific capability and measure how well an AI agent can handle changes.
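The idea of a configurable game can be sketched as a base definition plus per-experiment overrides. The snippet below is purely illustrative — the parameter names are hypothetical and do not reflect Meta Arcade's actual configuration schema — but it shows how isolating one parameter at a time lets a researcher target a single capability:

```python
# Hypothetical base definition for a Breakout-style game. These keys are
# illustrative assumptions, not Meta Arcade's real configuration format.
BASE_BREAKOUT = {
    "ball_speed": 1.0,
    "ball_size": 4,
    "paddle_width": 16,
    "background_color": [0, 0, 0],
}

def make_variant(base, **overrides):
    """Return a new game config with selected parameters changed,
    leaving the base definition untouched."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**base, **overrides}

# One capability, isolated: can the agent cope with a faster ball?
fast_ball = make_variant(BASE_BREAKOUT, ball_speed=2.0)

# A purely visual change: same dynamics, different background color.
recolored = make_variant(BASE_BREAKOUT, background_color=[0, 0, 255])
```

Because each variant differs from the base game in exactly one respect, any change in an agent's performance can be attributed to that one parameter — the experimental control the passage above describes.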

The core team behind Meta Arcade includes DRL researcher Ted Staley, AI engineer Chace Ashcraft and researcher Ben Stoler, all from APL’s Research and Exploratory Development Department (REDD). The tool is available to the public through the development platform GitHub, and the team hopes it sparks conversation about other tools the DRL community currently lacks. Meta Arcade was also presented at NeurIPS 2021, the Conference and Workshop on Neural Information Processing Systems.

“We needed to develop a tool like Meta Arcade to study and advance our AI research,” said Bart Paulhamus, chief of APL’s Intelligent Systems Center (ISC), which supported the development of Meta Arcade. “By releasing it to the public, APL is accelerating the development of trusted AI for our nation’s most critical challenges. Now, AI researchers can focus their time on AI research, not tool development.”

Pushing the State of the Art

In DRL training, an agent plays a game repeatedly, making and learning from its own decisions. Each time the agent makes a decision, it receives a signal describing how successful that decision was. Those signals allow the agent to learn through trial and error: Strategies that produce positive signals are reinforced, and behaviors that lead to bad outcomes are used less and less, explained Staley.
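That trial-and-error loop can be made concrete with tabular Q-learning, one of the simplest DRL-family algorithms, on a toy "game" — a sketch of the general idea, not Meta Arcade's training code:

```python
import random

# A toy stand-in for an arcade game: a 1-D corridor of six cells.
# The agent starts in cell 0 and earns a reward of +1 only upon
# reaching the goal in cell 5. Actions: 0 = step left, 1 = step right.
N_STATES, GOAL = 6, 5
ACTIONS = (0, 1)

def step(state, action):
    """Advance the game one move; return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: a minimal instance of the trial-and-error loop."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # value estimate per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise exploit the best-known action
            # (breaking ties randomly so early episodes are unbiased).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # The reward signal nudges the value estimate up or down:
            # actions that lead toward the goal are reinforced.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q

q = train()
# After training, the learned policy prefers "right" in every non-goal cell.
policy = [q[s].index(max(q[s])) for s in range(GOAL)]
```

The agent is never told that "right" is correct; the reward signal alone, propagated backward through the value estimates, is what reinforces the successful strategy.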

But suppose an agent has learned to solve one kind of maze. How well can it solve circular mazes? Or mazes displayed in a different color? And what tools and methods can be used to train an agent that solves mazes in general? Should it be shown many colors first and then different maze types, or the other way around?

“Those are difficult questions to answer because the expertise of the agent is entirely measured against the training problem itself,” Staley said. “What my colleagues and I realized when studying these topics is that we rarely have the tools to properly ask these research questions. That’s what prompted Meta Arcade.”

The tool’s name reflects its objective: Meta Arcade not only allows researchers to train AI agents through gaming but also prompts researchers to evaluate the games themselves. By creating new gaming environments through Meta Arcade, researchers can create problems and therefore benchmarks to evaluate algorithm performance, Ashcraft explained. This enables DRL researchers to create rich problem sets and compare one algorithm’s problem-solving capabilities to those of another.

“The value in creating new environments and setting new benchmarks,” Ashcraft said, “is that it helps us push the state of the art.”

The “Fruit Fly” for Lifelong Learning Research

Genetic research on fruit flies set the path for research on more complex organisms, and AI techniques developed for chess playing were foundational to solving problems like data mining and molecular dynamics, explained Mike Wolmetz, who manages APL’s Human and Machine Intelligence program.

So, similar to how computer chess was once called the “fruit fly of AI,” Wolmetz said Meta Arcade is the fruit fly for lifelong machine-learning research — a critical mechanism through which more complex problems can be solved.

“Meta Arcade is helping the Lab solve problems related to agent adaptability, including maritime overhead imagery recognition and missile defense in unpredictable contexts,” he said.

Meta Arcade was developed with support from an APL team that includes Wolmetz as well as DARPA Lifelong Learning Machines project manager and technical lead Gautam Vallabha, robotics software engineer Kapil Katyal, electrical engineer Chris Ratto and AI researcher Cash Costello.

Meta Arcade Applied

In work funded by the Office of Naval Research (ONR), APL researchers are using Meta Arcade to study strategies for producing agents that remain robust when perception and task conditions change.

Jared Markowitz, an AI researcher in REDD who leads the ONR-funded project, said that insights gained from the arcade’s testing environments are being used to produce more versatile maritime platform defense agents capable of handling different fleet geometries, threat types and countermeasures. “Meta Arcade is also helping to refine algorithms that can classify overhead ocean imagery collected under variable viewing conditions,” he noted.

Tamim Sookoor, a computer scientist in APL’s Asymmetric Operations Sector, and former staff member Christina Selby applied Meta Arcade while leading a project for the Johns Hopkins University Institute for Assured Autonomy. The project, RADICS (Runtime Assurance of Distributed Intelligent Control Systems), sought to understand and predict how DRL models will fail in a given scenario. Meta Arcade enabled the team to observe and quantify a DRL model’s uncertainty with respect to specific changes in the environment, like game background color and ball speed.
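One common way to quantify that kind of uncertainty is to sweep a single environment parameter and measure how the agent's decision confidence responds. The sketch below is schematic — it is not the RADICS method, and the stubbed agent and entropy proxy are illustrative assumptions — but it shows the shape of such an experiment:

```python
import math

def policy_entropy(action_probs):
    """Shannon entropy of the agent's action distribution — a simple
    proxy for its decision uncertainty in a given state."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

def fake_agent(ball_speed):
    """Stub standing in for a trained model: by construction, it grows
    less confident as ball speed leaves its training value of 1.0."""
    confidence = max(0.5, 1.0 - 0.2 * abs(ball_speed - 1.0))
    return [confidence, 1.0 - confidence]

# Sweep one environment parameter and record the agent's uncertainty
# at each setting, just as one would across configurable game variants.
sweep = {}
for speed in [0.5, 1.0, 1.5, 2.0]:
    sweep[speed] = policy_entropy(fake_agent(speed))
```

With a real model in place of the stub, the resulting curve shows where in parameter space the agent's behavior can be trusted and where it starts to break down.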

As APL’s sponsors look to deploy AI agents in unpredictable real-world environments, the Lab’s DRL community will continue to develop intelligent agents with the ability to quickly and reliably adapt their strategies to changing conditions in the field, according to Ratto, who leads the ISC’s Artificial Intelligence Group.

“Meta Arcade will challenge the larger AI research community to develop better tools that improve AI robustness and strengthen trust in an AI agent’s decision-making,” he said.