UCLA computer scientists and their collaborators have devised a plan for the use of cloud computing and big data analysis to allow scientists in developing countries to jumpstart bioinformatics research programs.
Bioinformatics is the computational analysis of biological data. Research in this emerging area has broad applications in diagnosing and treating diseases, preventing their spread, and developing public health strategies and new drugs. The team’s proposal was published in Nature Biotechnology. The scientists have also created an online educational resource guide.
“A computer and a high-speed internet connection are all the infrastructure that’s required for good bioinformatics studies, and these resources are often already at universities in lower-income countries,” said study co-author Serghei Mangul, a UCLA postdoctoral scholar in computer science who specializes in biosciences.
That investment is much cheaper, Mangul said, than building a state-of-the-art life sciences lab, often called a “wet” lab because of the chemicals and fluids that are used and analyzed there. Setting up and maintaining such a facility can cost hundreds of thousands of dollars at a minimum. There are also safety regulations and policy issues to consider, he added.
A startup bioinformatics program doesn’t necessarily have to gather its own data.
“There is already a lot of publicly available data in genomics and in related fields that could yield impactful insights that would be locally relevant,” Mangul said.
For example, existing bioinformatics research on tropical countries could lead to new ways to prevent the spread of malaria, dengue fever, Chagas disease and other diseases that are prevalent in those regions, he said. Additionally, previously published bioinformatics data may offer more insight on a “second pass,” something that trained bioinformatics researchers could do.
“I really believe in the secondary analysis of the data; it can be just as important as the first pass,” said Mangul, who is also a fellow in The Collaboratory at UCLA’s Institute for Quantitative and Computational Biosciences.
Mangul was born and raised in Moldova, a lower-middle-income country in Eastern Europe. He came to the U.S. for his doctoral studies. He said that helping developing countries, such as his own, is a particular passion for him.
The same is true for Lana Martin, the study’s other co-author and the programs manager at the UCLA institute. She said working on this idea was partly motivated by growing up in a lower-income region of Texas, and partly by experiences conducting field research in Panama while completing her doctorate in anthropology at UCLA.
“There are already good scientists in those regions that only need some training to quickly get them up to speed on state-of-the-art data analysis techniques,” Martin said.
Their online educational resource guide, which is available on the software platform GitHub, includes examples of bioinformatics code and datasets, as well as ways to access datasets and cloud computing-based resources. Mangul and Martin are working with one of their co-authors in Panama to build a strong bioinformatics research program there.
The researchers plan to develop university-level curricula and build a networking platform to connect bioinformatics scientists around the world.
“There will be an even greater demand for analysis of bioinformatics data in coming years,” Martin said. “With that in mind, we believe that establishing a global bioinformatics training and support consortium, with unified platforms and materials, will encourage scientists in lower-income countries and institutions to participate in cutting-edge, and more importantly, locally beneficial STEM research.”
“Ultimately, this would increase domestic scientific research and publication productivity,” she added.
Eleazar Eskin, a UCLA professor of computer science and human genetics, was also an author of the paper. Other authors are from Johns Hopkins University, Technological University of Panama, the George Washington University School of Medicine and Health Sciences, and UC San Diego.
The study was primarily supported by the National Institutes of Health, with additional support from the National Science Foundation and other educational and scientific organizations.