July 22, 2021
Five years ago, researchers at the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, created the Brain Observatory Storage Service & Database (BossDB) — a scalable, cloud-native data ecosystem for high-resolution volumetric neuroscience datasets — with an eye toward making data open, accessible and easy to use.
Now, an expanded collaboration between BossDB and the Amazon Web Services (AWS) Open Data Sponsorship Program will further enable the storage and accessibility of ever-expanding neuroimaging datasets generated by the neuroscience research community.
“This relationship will catalyze the democratization of data access and accelerate scientific exploration by researchers and members of the public,” said Will Gray Roncal, a co-investigator in the Neuroscience Group in APL’s Research and Exploratory Development Department.
Sandy Hider, the lead APL developer for the BossDB team, agreed. “With this collaboration, APL hopes to provide an avenue for more individuals and organizations to participate in creative research in neuroscience, with potential downstream benefits to us all.”
BossDB was initially developed to facilitate data sharing as part of the Machine Intelligence from Cortical Networks (MICrONS) Program, funded by the Intelligence Advanced Research Projects Activity (IARPA) to reverse engineer brain algorithms. The team is currently supported by the National Institutes of Health (NIH) as part of its BRAIN Initiative Informatics Program, created to build communities and infrastructure around shared data.
The BossDB ecosystem was designed with scale in mind to support increasingly large contiguous electron microscopy (EM) datasets. It lives in the AWS ecosystem and uses numerous AWS resources and serverless components, such as S3, DynamoDB, Lambda and SQS, that enable high ingest speeds, as well as a variety of on-demand data access tools to support visualization, image processing, annotation and analysis.
Integration with community tools and resources has been a key enabler for data sharing and follow-on discoveries. To facilitate collaboration, BossDB provides a scalable application programming interface (API) and a Python-based software development kit (SDK) called intern. It also offers data visualization through tools like Neuroglancer and syGlass, and leverages the Scalable Analytics for Brain Exploration Research analytics platform for image processing and annotation.
“With the cloud-based ecosystem provided by BossDB, we can easily work with teams over distributed regions to share insights and collaboratively process data to accelerate scientific discovery,” said Eva Dyer, an assistant professor in the Department of Biomedical Engineering at Georgia Tech, and director of the Neural Data Science (NerDS) Lab.
The technology underlying the BossDB ecosystem originated as part of the NeuroData project in a collaboration with Joshua Vogelstein and Randal Burns, researchers at the Johns Hopkins Whiting School of Engineering. Since that time, the data stored within BossDB has tripled in size, and continues to grow at a rapid pace.
“With the support of computer scientists, engineers and neuroscientists at APL, BossDB currently hosts over 10 petavoxels of data consisting of dozens of public and private datasets, including large amounts of complex multidimensional data from over 30 collaborators,” said Brock Wester, the principal investigator for the APL BossDB team.
“This enables anyone with internet access to visualize image data from different technologies to generate hypotheses or plan new experiments,” he continued. “If investigators wanted to download data or code, they are able to access and analyze disparate data with the same functionality and syntax, which allows for faster comparisons and scientific discoveries.”
“The power of the BossDB ecosystem is the diversity of our community datasets and passion of our scientists — all leveraged within a common ecosystem,” said APL’s Jordan Matelsky, a ‘big data’ computational neuroscientist working on the team.
BossDB currently supports dozens of geographically distributed academic partners in the neuroscience community, and hosts data from a wide variety of imaging modalities, including X‑ray, MRI, light microscopy and electron microscopy. The BossDB technology enables the community to take part in new research on shared data.
The AWS Open Data Sponsorship Program covers the cost of storage for publicly available, high‑value cloud-optimized datasets, and the program’s vision aligns closely with the APL BossDB team’s commitment to make neuroscience data and tools available to the world.
“Many of the world’s most important datasets are open source and hosted on platforms like the AWS Cloud,” said Wester. “As neuroscience continues to advance with anticipated exponential growth of shared datasets over the next few years, cloud-native data ecosystems like BossDB will be critical for neuroscientists to scale their work, driving new scientific discoveries.”
Media contact: Paulette Campbell, 240-228-6792, Paulette.Campbell@jhuapl.edu
The Applied Physics Laboratory, a not-for-profit division of The Johns Hopkins University, meets critical national challenges through the innovative application of science and technology. For more information, visit www.jhuapl.edu.