UCLA assistant professors of computer science Sriram Sankararaman and Ameet Talwalkar have received a $718,000 grant from the National Science Foundation to apply machine-learning tools to the analysis of massive sets of human genomic data. Ultimately, this research could lead to better treatments for genetic diseases.
The pair will focus on genome-wide association studies, which examine the entire genetic code of anywhere from thousands to millions of people to find genetic variations associated with a certain disease. These genetic variations can be used to develop better strategies to detect, treat and prevent diseases. The researchers will concentrate on providing the tools and software needed to analyze large-scale genomic data sets.
“Currently, there’s a good set of tools to deal with moderately sized genomic data sets,” Sankararaman said. “Naturally, the next generation of tools should target modern, massive data sets. Ultimately, we want to enable the next set of discoveries.”
In the previous decade, landmark studies such as the Human Genome Project, which sequenced the entire human genome, and the International HapMap Project, which aimed to describe common patterns of human genetic variation involved in human health and disease, helped make genomic sequencing data readily available.
However, with these vast quantities of data available, the problem now lies in the statistical and computational challenges of processing and analyzing them. Sankararaman and Talwalkar hope to use their expertise in computational genomics and machine learning to develop methods for performing statistical analyses on these large-scale genomic data sets.
“We’re very excited to be working on this,” Talwalkar said.
Sankararaman’s research focuses on developing novel statistical models and algorithms to analyze large-scale genomic data, with the aim of understanding evolutionary processes and the genetic basis of complex phenotypes, such as how genetic changes affect the risk for a disease.
Talwalkar’s main research interests include problems related to scalability and ease-of-use in the field of statistical machine learning, with applications in computational genomics.
While this research project is one of their first collaborations, Sankararaman and Talwalkar have known each other for a long time. In 2010, both were at UC Berkeley, Sankararaman as a fourth-year Ph.D. student and Talwalkar as a postdoctoral scholar. Both were advised by Michael I. Jordan, a leading researcher in machine learning, statistics and artificial intelligence.
Given the current bottleneck in computing on and analyzing these massive data sets, Sankararaman and Talwalkar’s research has enormous implications for current and future research efforts within the human genetics community.