With Basestack, Johns Hopkins APL Enables Crucial Genomics Work Around the World
Peter Thielen, a molecular biologist in Johns Hopkins APL’s Research and Exploratory Development Department, works to sequence SARS-CoV-2 at Johns Hopkins Hospital in early March 2020.
Credit: Johns Hopkins APL/Ed Whitman
Thu, 09/30/2021 - 14:22
Genomics work has historically been the province of small, highly specialized laboratories with access to top-of-the-line hardware, infrastructure support and technical expertise. Now, a software platform developed largely by the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, is making advanced genomics tools accessible and user-friendly for scientists and public health workers around the world.
The platform, called Basestack, is a modular, open-source software suite for complex informatics work that also serves as a pipeline for sharing the innovative tools developed at APL with the global community. Applications that were previously difficult to set up, unwieldy to use and dependent on powerful hardware and high-speed internet can now be run locally, on off-the-shelf laptops, through a clean and intuitive interface.
Now used in more than 20 countries for genomic infectious disease surveillance, Basestack grew from a series of international workshops, developed and hosted by APL researchers in collaboration with the National Institutes of Health (NIH) Fogarty International Center, that focused on teaching public health workers a particular method for sequencing DNA.
These workshops highlighted a barrier: genomics data analysis has typically been done at the command line, by typing long strings of obscure instructions into an intimidating black window for the computer to execute. This approach is common among software developers but poorly suited to public health employees who need to analyze data quickly.
“Within the first few sessions with front-line public health groups, it was clear that a lot of participants were not familiar with the command line,” said Brian Merritt, a bioinformatician and software developer in APL’s Research and Exploratory Development Department (REDD) and the primary developer of Basestack. “We spent a lot of time just debugging and setting up software environments, and then once it was set up, people were uncomfortable running commands within the terminal window. We needed a better solution.”
From those humble beginnings, Basestack has developed into a platform that enables large-scale collaborations between the NIH, APL and nations around the world, including groups in Africa, Asia and South America.
These collaborations would not be possible without the platform. By integrating software tools that can perform complex genomic operations in a way that’s simple to understand — more like navigating a modern operating system than typing in the command line — Basestack shrinks the time it takes to use software pipelines, and to teach them to newcomers, from weeks or months to days.
Peter Thielen, who leads a number of APL-NIH collaborations focused on genomic epidemiology (including the international engagements from which Basestack emerged), cited APL and NIH’s work with the nation of Chile in sequencing SARS-CoV-2 (the virus causing COVID-19) as an example of this radical shift in efficiency.
“Basestack is enabling Chile and others to mount a unified pandemic response over a broad geographic area by reducing the technical ‘lift’ required to establish local sequencing capacity,” Thielen said. “The major advantage is that we no longer have to train people directly — we can have quick interactions remotely to get them fully up to speed in their own labs, enabling reproducible sample analysis in an independent manner. Compared to current standard practice, this decentralized approach enables broader global knowledge of SARS-CoV-2 sequence diversity.”
Thielen noted that working in international settings can present a major challenge in terms of data collection, sharing and analysis. “Sequencing runs can generate hundreds of gigabytes of data, and concerns over information sovereignty and patient privacy make it very difficult to share sensitive DNA sequencing data, even if high-speed internet access were always available to connect to remote servers,” he said. “Basestack solves those problems by bypassing data transfer from the primary computer, so the groups that generate the data can store it locally and share the polished data products with trusted partners.”
Basestack has also had a direct impact at APL. Jared Evans, a project manager in REDD, said the platform has helped streamline the work his teams are doing in viral genomics.
“We’re incorporating Basestack into two sequencing projects, including the NIH-funded Centers of Excellence for Influenza Research and Response, recently awarded to Johns Hopkins University and APL, because it works really well and is relatively easy to use,” Evans said. “It takes out a lot of the manual curation work we’ve had to do with data in the past.”
Evans added that, as an end user, he sees potential to scale up the capabilities of Basestack to take advantage of advanced hardware when available, without compromising its utility for users without access to those resources.
That’s compatible with Merritt’s vision for Basestack: a modular, open-source system that can be adapted to any workflow and customized by the user, whether the work is in biology, materials science or any other field with a highly specialized pipeline of complex work. The team is already seeing that vision become reality as efforts are underway to expand Basestack beyond its original suite of biological applications.
For example, Thielen leads a project to develop a Basestack module for environmental DNA characterization to monitor the ecological status of marine mammals, with the ultimate objective of enabling fully autonomous sequencing aboard autonomous vehicles in the ocean.
And that is only the beginning of what the developers believe the tool is capable of.
“The ability to characterize genetic data in real time opens a new set of operational genomics capabilities in multiple domains, ranging from ecological monitoring in the ocean to monitoring the prevalence of COVID-19 in communities by analyzing wastewater,” Thielen said.
“The development process is stabilized enough that it’s very easy to deliver a new module or feature, and we’re ready to start churning out more and more custom pipelines,” Merritt added. “We’re at a good point to rapidly expand Basestack’s capabilities in biology and beyond.”
The Applied Physics Laboratory, a not-for-profit division of The Johns Hopkins University, meets critical national challenges through the innovative application of science and technology. For more information, visit www.jhuapl.edu.