Researchers studying the complex mechanisms of gene regulation in healthy and diseased biological processes now have a new tool in their toolkit. A team at the University of California, San Diego (UCSD) and elsewhere has developed deep learning software that they say can be adapted to a variety of genomics projects. Details of the software, called Elucidating the Utility of Genomic Elements with Neural Nets, or EUGENe, are provided in Nature Computational Science in a paper entitled “Predictive analyses of regulatory sequences with EUGENe.”
According to the paper, EUGENe includes modules and sub-packages for extracting and transforming sequence data, instantiating and training computational models, and evaluating and interpreting model behavior after training. “The main goal of EUGENe is to streamline the end-to-end execution of these three steps to promote the efficient design, implementation, validation and interpretation of deep learning solutions in regulatory genomics,” the scientists wrote.
Deep learning is certainly not new to the genomics community. For example, the technology has been used successfully to detect the binding motifs of DNA- and RNA-binding proteins and to make predictions about chromatin states and transcriptional activity. But designing and deploying deep learning-based workflows for genomic studies has always been a challenge, even for experienced researchers. This is at least in part because “the nuances specific to genomic data create a particularly high learning curve for performing analyses in this space. Furthermore, the heterogeneity of implementations of most of the code associated with publications greatly hinders extensibility and reproducibility,” the authors wrote.
Adam Klie, a doctoral student at the UCSD School of Medicine and first author of the study, designed the software to address challenges he had encountered in his own work. “Many existing platforms require many hours of coding and data processing to use,” he said. EUGENe, by contrast, is much simpler to use. “You give an algorithm a DNA sequence and ask it to make predictions about whatever you expect the DNA to do, such as whether a particular DNA sequence is functional or whether it regulates a gene in a certain biological context.” Scientists can use the software to explore the different properties of the sequence in question and what happens when it is changed.
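To make the idea of giving an algorithm a sequence and then seeing "what happens when things are changed" concrete, here is a minimal sketch in plain Python. It does not use EUGENe or reproduce its API. Every name in it (`one_hot`, `toy_model`, `in_silico_mutagenesis`, `TOY_MOTIF`) is hypothetical, and the "model" is a simple motif matcher standing in for a trained neural network.

```python
# Illustrative sketch only: these names are assumptions, not part of EUGENe.
ALPHABET = "ACGT"
TOY_MOTIF = "TATA"  # hypothetical motif the stand-in "model" looks for


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values in A, C, G, T order."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def toy_model(encoded):
    """Stand-in for a trained network: best fractional match to TOY_MOTIF."""
    motif = one_hot(TOY_MOTIF)
    width = len(motif)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        # Count positions in this window that agree with the motif.
        score = sum(encoded[start + i][j] * motif[i][j]
                    for i in range(width) for j in range(len(ALPHABET)))
        best = max(best, score)
    return best / width


def in_silico_mutagenesis(seq, model):
    """Score every single-base substitution relative to the unmutated sequence."""
    seq = seq.upper()
    baseline = model(one_hot(seq))
    effects = {}
    for pos, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, ref, alt)] = model(one_hot(mutant)) - baseline
    return baseline, effects


if __name__ == "__main__":
    baseline, effects = in_silico_mutagenesis("GGTATAGG", toy_model)
    print("baseline prediction:", baseline)
    # List the substitutions that lower the prediction the most.
    for key, delta in sorted(effects.items(), key=lambda kv: kv[1])[:3]:
        print(key, delta)
```

In a real analysis the stand-in scoring function would be replaced by a trained deep learning model, but the loop has the same shape: encode the sequence, predict, mutate one base at a time, and compare each prediction with the baseline.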
The researchers put EUGENe to the test by attempting to replicate the results of three regulatory genomics studies that used different types of sequencing data. The datasets came from a plant promoter assay, RNA-binding protein specificity data, and ChIP-seq data from the ENCODE project. Analyzing such different types of data typically requires mixing and matching multiple technology platforms. The scientists, however, managed to adapt EUGENe to each type of data and reproduce the results of each study.
The ability to perform this type of reproducible analysis is essential in scientific research, but can prove difficult for studies using deep learning, noted Hannah Carter, PhD, associate professor at the UCSD School of Medicine and one of the authors of the paper. “EUGENe is already showing great promise in terms of adaptability to different types of DNA sequencing data and support for many different deep learning models. We hope that it will evolve into a platform capable of supporting the development of collaborative tools by the research community and accelerating genomics research.”
For now, the solution works with DNA and RNA data but “does not have dedicated functions for handling protein sequences or multimodal inputs,” the researchers wrote. They plan to expand it to include new types of data such as single-cell sequencing.
They will also make the software more widely accessible to the scientific community. “Deep learning can provide valuable insights into the biological machinery behind this variety, but it can be difficult to implement for researchers without deep computational expertise,” Carter said. “We wanted to create a platform that could help genomics researchers streamline their deep learning analyses and make predictions from raw data.”