Tutorials, Workshops, and Symposia

Research Tutorials

Machine Learning for Drug Development

Machine learning methods leverage big datasets to support decision-making in all stages of drug development, predict how drugs affect the human body and how they interact with each other, and seek ways to boost clinical trials and detect unwanted side effects. This tutorial covers generative modeling, reinforcement learning, and representation learning with a focus on theoretical foundations of methods and their use for key drug-related problems.

A variety of machine learning methods are demonstrating their utility at all stages of drug development. These methods use big datasets created from high-throughput screening data and allow prediction of bioactivities for targets and molecular properties, identification of new molecules and repurposing of old drugs with increased levels of accuracy.

We have only just begun to realize the potential of these techniques. If methods were available for all aspects of drug development, they could be used seamlessly to predict whether a chemical compound is likely to ultimately become a drug used in patients. Although much research needs to be done before this vision can be realized, modern machine learning may have a fundamental impact on the way drug development is done.

The general process of drug development involves five steps. In short, molecular compounds are filtered through a progressive series of tests, which determine their properties, toxicity, and effectiveness for later stages. Machine learning is increasingly being used to accelerate each of the steps, creating opportunities for reducing resources and time needed to develop new drugs. In this tutorial, we cover key problems in drug development that are amenable to machine learning. In doing so, we present a toolbox of AI algorithms for end-to-end drug development.

This tutorial was presented at the International Joint Conference on Artificial Intelligence (IJCAI).

Deep Learning for Network Biology

Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from the molecular level to the biome. This tutorial investigates key advancements in representation learning for networks over the last few years, with an emphasis on fundamentally new opportunities in network biology enabled by these advancements.

Biological networks are powerful resources for the discovery of interactions and emergent properties in biological systems, ranging from single-cell to population level. Network approaches have been used many times to combine and amplify signals from individual genes, and have led to remarkable discoveries in biology, including drug discovery, protein function prediction, disease diagnosis, and precision medicine. Furthermore, these approaches have shown broad utility in uncovering new biology and have contributed to new discoveries in wet laboratory experiments.

The mathematical machinery central to these approaches is machine learning on networks. The main challenge in machine learning on networks is to find a way to extract information about interactions between nodes and to incorporate that information into a machine learning model. To extract this information from networks, classic machine learning approaches often rely on summary statistics (e.g., degrees or clustering coefficients) or carefully engineered features to measure local neighborhood structures (e.g., network motifs). These classic approaches can be limited because hand-engineered features are inflexible, often do not generalize to networks derived from other organisms, tissues, and experimental technologies, and can fail on datasets with low experimental coverage.
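As a minimal sketch of the hand-engineered summary statistics mentioned above, the following Python snippet computes node degree and the local clustering coefficient for a small undirected network. The toy network and the function names are assumptions made for illustration and do not come from the tutorial materials.

```python
from itertools import combinations


def degree(adj, node):
    """Number of neighbors of `node` in an undirected network."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # With fewer than two neighbors there are no pairs to check.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network stored as an adjacency dictionary (undirected).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

# One hand-engineered feature vector (degree, clustering coefficient) per node.
features = {
    node: (degree(toy_network, node), clustering_coefficient(toy_network, node))
    for node in toy_network
}
```

Feature vectors of this kind are fixed in advance and do not adapt to the prediction task, which is one reason they can be inflexible.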

Recent years have seen a surge in graph neural network (GNN) approaches that automatically learn to encode network structure into low-dimensional representations, using transformation techniques based on deep learning and nonlinear dimensionality reduction. The idea behind these representation learning approaches is to learn a data transformation function that maps nodes to points in a low-dimensional vector space, also termed embeddings. Representation learning methods have revolutionized the state of the art in network science. The goal of this tutorial is to open the door for these methods to computational biology and bioinformatics.
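To make the idea of mapping nodes to embeddings concrete, here is a rough sketch of a single neighborhood-aggregation step of the kind used in graph neural networks. It is an illustration only: the toy network, the input features, and the weight matrix are all hypothetical, and the weights are fixed by hand rather than learned, whereas a real GNN would train them on a prediction task.

```python
def mean_aggregate(adj, features, node):
    """Average the feature vectors of `node` and its neighbors."""
    group = [node] + sorted(adj[node])
    dim = len(features[node])
    return [sum(features[n][i] for n in group) / len(group) for i in range(dim)]


def embed(adj, features, weights):
    """One aggregation step: mean-pool the neighborhood, project, apply ReLU."""
    embeddings = {}
    for node in adj:
        pooled = mean_aggregate(adj, features, node)
        embeddings[node] = [
            max(0.0, sum(w * x for w, x in zip(row, pooled))) for row in weights
        ]
    return embeddings


# Hypothetical toy network and 3-dimensional input features.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
node_features = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.0, 1.0],
}

# Hand-picked 2x3 projection matrix (an assumption; a trained model would learn it).
projection = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 1.0],
]

embeddings = embed(toy_network, node_features, projection)
```

Each node ends up as a point in a 2-dimensional space, and nodes with the same features and neighborhood structure (here B and C) receive identical embeddings.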

This tutorial was presented at the International Conference on Intelligent Systems for Molecular Biology (ISMB).

Biomedical Data Fusion

Because of the complex and interconnected nature of biomedical systems, any single model trained on any single dataset can touch only a small part of the entire biomedical knowledge. It is thus critical to integrate diverse sources of information to gain a comprehensive understanding of the system.

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches.

The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation.

This tutorial was presented at the International Engineering in Medicine and Biology Conference (EMBC) and at the Basel Computational Biology Conference ([BC]^2).


International Workshops

Representation Learning on Graphs and Manifolds

Many scientific fields study data with an underlying graph or manifold structure—such as social networks, sensor networks, biomedical knowledge graphs, and meshed surfaces in computer graphics. Recent years have seen a surge in research on these problems—often under the umbrella terms of graph representation learning and geometric deep learning.

The need for new optimization methods and neural network architectures that can accommodate these relational and non-Euclidean structures is becoming increasingly clear. In parallel, there is a growing interest in how we can leverage insights from these domains to incorporate new kinds of relational and non-Euclidean inductive biases into deep learning.

This workshop was presented at the International Conference on Learning Representations (ICLR).

Graph Representation Learning and Beyond

Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of CNNs to graph-structured data, and neural message-passing approaches. These advances in graph neural networks and related techniques have led to new state-of-the-art results in numerous domains: chemical synthesis, 3D vision, recommender systems, question answering, continuous control, self-driving, and social network analysis.

This workshop was presented at the International Conference on Machine Learning (ICML).

Trustworthy AI for Healthcare

Artificial intelligence for healthcare has emerged as an active research area that has made considerable progress, including achieving human-level performance for skin cancer classification, diabetic eye disease detection, chest radiograph diagnosis, and sepsis treatment. While the trends are encouraging, many open challenges prevent us from directly deploying AI solutions in hospitals and clinical environments. A major open problem is the lack of trust of biomedical practitioners in AI methods. Many AI methods make predictions in a black-box way, making decisions challenging to understand and interpret. Further, today's methods are sensitive to small perturbations and adversarial attacks, raising numerous security and privacy concerns. Finally, AI methods learn to make decisions based on training data, which can include biased human decisions or reflect historical or social inequities. These challenges raise numerous trustworthiness issues that we need to address to realize the potential of AI in healthcare.

This workshop was presented at the AAAI Conference on Artificial Intelligence (AAAI).


Research and Scholarship Meetings

PhD Forum

The PhD Forum provides an environment for junior PhD students to exchange ideas and experiences with peers in an interactive atmosphere and to get constructive feedback from senior researchers in data science, machine learning, and related areas.

This meeting took place at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD).

Latest News

Sep 2020:   Four Papers Accepted at NeurIPS

Thrilled that our lab has 4 papers accepted at NeurIPS 2020! Congratulations to fantastic students and collaborators, Michelle, Xiang, Kexin, Sam, and Emily.

Sep 2020:   MITxHarvard Women in AI Interview

The MITxHarvard Women in AI initiative talked with Marinka about AI, machine learning, and the role of new technologies in biomedical research.

Aug 2020:   Trustworthy AI for Healthcare

We are excited to be co-organizing a workshop at AAAI 2021 on Trustworthy AI for Healthcare! We have a stellar lineup of speakers. Details to follow soon!

Aug 2020:   Network Drugs for COVID-19

What are network drugs? Drugs for COVID-19 predicted by network medicine, our graph neural networks (GNNs), and our rank aggregation algorithms, followed by experimental confirmations. The full paper is finally out!

Jul 2020:   Podcast on ML for Drug Development

Tune in to the podcast with Marinka about machine learning for drug development. The discussion focuses on open research questions in the field, including how to limit the search space of high-throughput screens, design drugs entirely from scratch, and identify likely side effects of combining drugs in novel ways.

Jul 2020:   Postdoctoral Research Fellowship

We have a new opening for a postdoctoral research fellow in novel machine learning methods to combat COVID-19! Submit your application by September 1, 2020.

Jul 2020:   DeepPurpose Library

DeepPurpose is a deep learning library for drug-target interaction prediction and applications to drug repurposing and virtual screening.

Jun 2020:   Subgraph Neural Networks

Subgraph neural networks learn powerful subgraph representations that create fundamentally new opportunities for predictions beyond nodes, edges, and entire graphs.

Jun 2020:   Defense Against Adversarial Attacks

GNNGuard can defend graph neural networks against a variety of training-time attacks. Remarkably, GNNGuard can restore state-of-the-art performance of any GNN in the face of adversarial attacks.

Jun 2020:   Graph Meta Learning via Subgraphs

G-Meta is a meta-learning approach for graphs that quickly adapts to new prediction tasks using only a handful of data points. G-Meta works in the most challenging few-shot learning settings and scales to massive interactomics data, as we show on our new Networks of Life dataset comprising 1,840 networks.

May 2020:   The Open Graph Benchmark

A new paper introducing the Open Graph Benchmark, a diverse set of challenging and realistic benchmark datasets for graph machine learning.

May 2020:   Special Issue on AI for COVID-19

Marinka is co-editing a special issue of IEEE Big Data on AI for COVID-19. In light of the urgent need for data-driven solutions to mitigate the COVID-19 pandemic, the special issue will aim for a fast-track peer review.

May 2020:   Multiscale Interactome

May 2020:   Molecular Interaction Networks

A new preprint describing a graph neural network approach for the prediction of molecular interactions, including drug-drug, drug-target, protein-protein, and gene-disease interactions.

Apr 2020:   Submit to PhD Forum at ECML

The call for ECML-PKDD 2020 PhD Forum Track is now online. If you are a PhD student, submit your work on machine learning and knowledge discovery.

Apr 2020:   Drug Repurposing for COVID-19

We are excited to share our latest results on how networks and graph machine learning help us search for a cure for COVID-19.

Mar 2020:   AI Cures

We are joining the AI Cures initiative at MIT! We will develop machine learning methods for finding promising antiviral molecules for COVID-19 and other emerging pathogens.

Mar 2020:   COVID-19 Task Force

We are excited to be working with László Barabási and his amazing team of scientists as we search for a cure for COVID-19.

Mar 2020:   Graph ML Workshop at ICML 2020

We will co-organize a workshop on Graph Representation Learning and Beyond at ICML 2020. Submit your finest work!

Mar 2020:   Accepted Tutorial at IJCAI!

We will present a tutorial on Machine Learning for Drug Development at IJCAI 2020! Stay tuned for details.

Zitnik Lab  ·  Harvard  ·  Department of Biomedical Informatics