Machine Learning for Drug Development
A variety of machine learning methods are demonstrating their utility at all stages of drug development. These methods use large datasets created from high-throughput screening data and allow increasingly accurate prediction of bioactivities for targets and molecular properties, identification of new molecules, and repurposing of old drugs.
We have only just begun to realize the potential of these techniques. If methods were available for all aspects of drug development, they could be used seamlessly to predict whether a chemical compound is likely to ultimately become a drug used in patients. Although much research needs to be done before this vision can be realized, modern machine learning may have a fundamental impact on the way drug development is done.
The general process of drug development involves five steps. In short, molecular compounds are filtered through a progressive series of tests, which determine their properties, toxicity, and effectiveness for later stages. Machine learning is increasingly being used to accelerate each of the steps, creating opportunities for reducing resources and time needed to develop new drugs. In this tutorial, we cover key problems in drug development that are amenable to machine learning. In doing so, we present a toolbox of AI algorithms for end-to-end drug development.
This tutorial was presented at the International Joint Conference on Artificial Intelligence (IJCAI).
Deep Learning for Network Biology
Biological networks are powerful resources for the discovery of interactions and emergent properties in biological systems, ranging from single-cell to population level. Network approaches have been used many times to combine and amplify signals from individual genes, and have led to remarkable discoveries in biology, including drug discovery, protein function prediction, disease diagnosis, and precision medicine. Furthermore, these approaches have shown broad utility in uncovering new biology and have contributed to new discoveries in wet laboratory experiments.
The mathematical machinery central to these approaches is machine learning on networks. The main challenge in machine learning on networks is to find a way to extract information about interactions between nodes and to incorporate that information into a machine learning model. To extract this information from networks, classic machine learning approaches often rely on summary statistics (e.g., degrees or clustering coefficients) or carefully engineered features to measure local neighborhood structures (e.g., network motifs). These classic approaches can be limited because hand-engineered features are inflexible, often do not generalize to networks derived from other organisms, tissues, and experimental technologies, and can fail on datasets with low experimental coverage.
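To make the hand-engineered summary statistics concrete, here is a minimal sketch that computes node degree and the local clustering coefficient for a small undirected network. The network `toy_network`, its node names, and the helper functions are hypothetical and chosen only for illustration; they do not come from the tutorial materials.

```python
from itertools import combinations


def degree(graph, node):
    """Number of neighbors of a node in an undirected graph."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        # Fewer than two neighbors: no neighbor pairs exist, so return 0 by convention.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy interaction network, stored as node -> set of neighbors.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

# One hand-engineered feature vector (degree, clustering coefficient) per node.
features = {
    node: (degree(toy_network, node), clustering_coefficient(toy_network, node))
    for node in toy_network
}

for node, (deg, cc) in sorted(features.items()):
    print(f"{node}: degree={deg}, clustering={cc:.3f}")
```

Features like these are fixed in advance by the analyst, which is the inflexibility the paragraph above describes: the same two numbers are computed regardless of the prediction task.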
Recent years have seen a surge in graph neural network (GNN) approaches that automatically learn to encode network structure into low-dimensional representations, using transformation techniques based on deep learning and nonlinear dimensionality reduction. The idea behind these representation learning approaches is to learn a data transformation function that maps nodes to points in a low-dimensional vector space; these points are termed embeddings. Representation learning methods have revolutionized the state of the art in network science, and the goal of this tutorial is to open the door for these methods to computational biology and bioinformatics.
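As a rough illustration of how a node can be mapped to a low-dimensional embedding, the sketch below applies a single round of mean neighborhood aggregation followed by a linear map and a ReLU. This is a simplified stand-in for one GNN layer, not the specific method covered in the tutorial. The graph, input features, and weight matrix are hypothetical, and the weights are fixed by hand here, whereas a real model would learn them from data.

```python
def relu(x):
    """Rectified linear unit applied to a single number."""
    return max(0.0, x)


def gnn_layer(graph, features, weights):
    """One round of mean-aggregation message passing.

    graph:    dict mapping node -> set of neighboring nodes
    features: dict mapping node -> list of d_in input features
    weights:  d_in x d_out matrix given as a list of rows
    Returns a dict mapping node -> list of d_out embedding values.
    """
    d_in, d_out = len(weights), len(weights[0])
    embeddings = {}
    for node, neighbors in graph.items():
        group = [node, *neighbors]  # aggregate over the node itself and its neighbors
        mean = [sum(features[n][i] for n in group) / len(group) for i in range(d_in)]
        embeddings[node] = [
            relu(sum(mean[i] * weights[i][j] for i in range(d_in)))
            for j in range(d_out)
        ]
    return embeddings


# Hypothetical path graph A - B - C with one-hot input features.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
toy_features = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}

# Hand-picked 3 x 2 weight matrix (an assumption; normally learned by training).
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

embeddings = gnn_layer(toy_graph, toy_features, toy_weights)
for node in sorted(embeddings):
    print(node, embeddings[node])
```

Each node ends up as a point in a two-dimensional space whose coordinates depend on its own features and those of its neighbors. Stacking several such layers lets information from more distant nodes reach the embedding.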
This tutorial was presented at the International Conference on Intelligent Systems for Molecular Biology (ISMB).
Biomedical Data Fusion
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches.
The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation.
This tutorial was presented at the International Engineering in Medicine and Biology Conference (EMBC) and at the Basel Computational Biology Conference ([BC]^2).
International Workshops and Conferences
AI for Science
Gap 1: Unrealistic methodological assumptions. While ML researchers strive for methodology advances, they often make unrealistic assumptions that limit real-world adoption. For example, most state-of-the-art molecule generation ML models generate molecules that have low synthesizability.
Gap 2: Overlooked scientific questions. Scientific communities contend with crucial and unsolved problems, but they are not yet formulated as solvable ML tasks and are thus overlooked by the ML community.
Gap 3: Limited exploration at the intersection of multiple disciplines. Solutions to grand challenges often stretch across multiple disciplines. For example, protein structure prediction requires collaboration across physics, chemistry and biology.
Gap 4: Science of science. Core principles of the scientific method have not changed since the 17th century. Can AI reason about the organizing principles of our world in a way that is complementary to the hypothesis-experiment cycle to understand a phenomenon?
Gap 5: Responsible use and development of AI for science. Interest in ML across scientific disciplines has surged, but few ML models have transitioned into practical scientific applications. We plan to present a roadmap and ultimately guidelines for accelerating the translation of ML in science. Translation requires a team of engaged stakeholders and a systematic process from the beginning (problem formulation) to the end (widespread deployment) of the ML-based research lifecycle.
This workshop was presented at the International Conference on Neural Information Processing Systems (NeurIPS).
National Symposium on Drug Repurposing for Future Pandemics
The symposium brings together leading experts in computer science, biology, statistics, medicine, automation, and regulation. While these areas of expertise are necessary for rapid therapeutic innovation, there is seldom an opportunity for these experts to interact with each other.
Bearing in mind new opportunities and pressing challenges, the symposium provides a roadmap and puts forward recommendations on transforming today’s tools into ready-to-use solutions to fight future pathogens.
Representation Learning on Graphs and Manifolds
The need for new optimization methods and neural network architectures that can accommodate these relational and non-Euclidean structures is becoming increasingly clear. In parallel, there is a growing interest in how we can leverage insights from these domains to incorporate new kinds of relational and non-Euclidean inductive biases into deep learning.
This workshop was presented at the International Conference on Learning Representations (ICLR).
Graph Representation Learning and Beyond
This workshop was presented at the International Conference on Machine Learning (ICML).
Trustworthy AI for Healthcare
This workshop was presented at the AAAI Conference on Artificial Intelligence (AAAI).
AI in Health: Transferring and Integrating Knowledge for Better Health
This workshop was presented at the Web Conference (WWW).
Research and Scholarship Meetings
This meeting took place at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD).