Feb 2025: MedTok: Unlocking Medical Codes for GenAI
Meet MedTok, a multimodal medical code tokenizer that transforms how AI understands structured medical data. By integrating textual descriptions and relational contexts, MedTok enhances tokenization for transformer-based models—powering everything from EHR foundation models to medical QA. [Project website]
Feb 2025: What If You Could Rewrite Biology? Meet CLEF
What if we could anticipate molecular and medical changes before they happen? Introducing CLEF, an approach for counterfactual generation in biological and medical sequence models. [Project website]
Feb 2025: Digital Twins as Global Health and Disease Models
New paper on the role of digital twins as global health and disease learning models for preventive and personalized medicine.
Jan 2025: LLM and KG+LLM agent papers at ICLR
New papers on test-time interventions in language models and knowledge graph based LLM agents accepted to ICLR. [KGARevion]
Jan 2025: Artificial Intelligence in Medicine 2
Excited to share our new graduate course on Artificial Intelligence in Medicine 2.
Jan 2025: ProCyon AI Highlighted by Kempner
Thanks to the Kempner Institute for highlighting our latest research: ProCyon, a protein-text foundation model for modeling protein functions.
Jan 2025: AI Design of Proteins for Therapeutics
New Voices piece in Cell Systems: How will computational protein design change biotechnology and therapeutic development?
Dec 2024: Foundation Model for Protein Phenotypes
Dec 2024: Unified Clinical Vocabulary Embeddings
New paper: A unified resource provides a new representation of clinical knowledge by unifying medical vocabularies. The paper includes (1) phenotype risk score analysis across 4.57 million patients and (2) inter-institutional clinician panels that evaluate alignment with clinical knowledge across 90 diseases and 3,000 clinical codes.
Dec 2024: SPECTRA in Nature Machine Intelligence
Are biomedical AI models truly as smart as they seem? SPECTRA is a framework that evaluates models across the full spectrum of cross-split overlap, that is, the similarity between training and test data. SPECTRA reveals gaps in benchmarks for molecular sequence data across 19 models, including LLMs, GNNs, diffusion models, and conv nets.
Nov 2024: Ayush Noori Selected as a Rhodes Scholar
Congratulations to Ayush Noori on being named a Rhodes Scholar! Such an incredible achievement!
Nov 2024: PocketGen in Nature Machine Intelligence
Nov 2024: Biomedical AI Agents in Cell
Oct 2024: Activity Cliffs in Molecular Properties
Oct 2024: Knowledge Graph Agent for Medical Reasoning
New paper introducing a knowledge graph agent for complex, knowledge-intensive medical reasoning.
Sep 2024: Three Papers Accepted to NeurIPS
Exciting projects include a unified multi-task time series model, a flow-matching approach for generating protein pockets using geometric priors, and a tokenization method that produces invariant molecular representations for integration into large language models.
Sep 2024: TxGNN Published in Nature Medicine
Graph foundation model for drug repurposing published in Nature Medicine. [Harvard Gazette] [Harvard Medicine News] [Forbes] [NVIDIA] [Kempner Institute] [Harvard Crimson]
Aug 2024: Graph AI in Medicine
Excited to share a new perspective on Graph Artificial Intelligence in Medicine in Annual Reviews.
Aug 2024: How Proteins Behave in Context
Harvard Medicine News on our new AI tool that captures how proteins behave in context. Kempner Institute on how context matters for foundation models in biology.
Jul 2024: PINNACLE in Nature Methods
The PINNACLE contextual AI model is published in Nature Methods. Paper. Research Briefing. Project website.
Jul 2024: Digital Twins as Global Health and Disease Models of Individuals
Paper on digital twins outlining strategies to leverage molecular and computational techniques to construct dynamic digital twins at scales ranging from populations to individuals.
Jul 2024: Graph Diffusion Convolutions at ICML
Graph diffusion convolution is a geometric deep learning architecture that aggregates information from higher-order network neighbors through a generalized graph diffusion to enhance model robustness to noisy and incomplete datasets. Paper at ICML.
Jul 2024: Three Papers: TrialBench, 3D Structure Design, LLM Editing
Jun 2024: TDC-2: Multimodal Foundation for Therapeutics
The Commons 2.0 (TDC-2) is an overhaul of Therapeutic Data Commons to catalyze research in multimodal models for drug discovery by unifying single-cell biology of diseases, biochemistry of molecules, and effects of drugs through multimodal datasets, AI-powered API endpoints, and new tasks and benchmarks. Our paper.
May 2024: Broad MIA: Protein Language Models
Check out our Broad MIA seminars on multimodal protein language models for deciphering protein function.
May 2024: On Knowing a Gene in Cell Systems
Apr 2024: Biomedical AI Agents
We envision ‘AI scientists’ as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate machine learning tools with experimental platforms.
Mar 2024: Efficient ML Seminar Series
We started a Harvard University Efficient ML Seminar Series. Congrats to Jonathan for spearheading this initiative. Harvard Magazine covered the first meeting focusing on LLMs.
Mar 2024: UniTS - Unified Time Series Model
UniTS is a unified time series model that can process classification, forecasting, anomaly detection and imputation tasks within a single model with no task-specific modules. UniTS has zero-shot, few-shot, and prompt learning capabilities. Project website.
Mar 2024: Weintraub Graduate Student Award
Michelle receives the 2024 Harold M. Weintraub Graduate Student Award. The award recognizes exceptional achievement in graduate studies in biological sciences. News Story. Congratulations!
Mar 2024: PocketGen - Generating Full-Atom Ligand-Binding Protein Pockets
PocketGen is a deep generative model that generates residue sequence and full-atom structure of protein pockets, maximizing binding to ligands. Project website.
Feb 2024: SPECTRA - Generalizability of Molecular AI
SPECTRA is an approach for holistic evaluation of how AI models generalize to new molecular datasets. Project website.
Feb 2024: Kaneb Fellowship Award
The lab receives the John and Virginia Kaneb Fellowship Award at Harvard Medical School to enhance research progress in the lab.
Feb 2024: NSF CAREER Award
The lab receives the NSF CAREER Award for our research in geometric deep learning to facilitate algorithmic and scientific advances in therapeutics.
Feb 2024: Dean’s Innovation Award in AI
Jan 2024: AI's Prospects in Nature Machine Intelligence
We discussed AI’s 2024 prospects with Nature Machine Intelligence, covering LLM progress, multimodal AI, multi-task agents, and how to bridge the digital divide across communities and world regions.
Jan 2024: ChatGPT and the Future of Science in Nature
Prof. Zitnik discussed the transformative role of ChatGPT and emerging AI models in advancing science and biomedical research with Nature.
Jan 2024: Combinatorial Therapeutic Perturbations
New paper introducing PDGrapher for combinatorial prediction of chemical and genetic perturbations using causally-inspired neural networks.
Nov 2023: Next Generation of Therapeutics Commons
We are building the next generation of Therapeutics Commons! We are seeking outstanding fellows who will lead AI research to advance molecular drug design and clinical drug development.
Oct 2023: Structure-Based Drug Design
Geometric deep learning has emerged as a valuable tool for structure-based drug design, to generate and refine biomolecules by leveraging detailed three-dimensional geometric and molecular interaction information.
Oct 2023: Graph AI in Medicine
Graph AI models in medicine integrate diverse data modalities through pre-training, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way to clinically meaningful predictions.
Sep 2023: New papers accepted at NeurIPS
Congratulations to Owen and Zaixi on having their papers accepted as spotlights at NeurIPS! These papers introduce techniques for explaining time series models through self-supervised learning and for co-designing protein pocket sequences and 3D structures.
Sep 2023: Future Directions in Network Biology
Excited to share our perspectives on current and future directions in network biology.
Aug 2023: Scientific Discovery in the Age of AI
New paper on the role of artificial intelligence in scientific discovery is published in Nature.
Jul 2023: PINNACLE - Contextual AI protein model
PINNACLE is a contextual AI model for protein understanding that dynamically adjusts its outputs based on biological contexts in which it operates. Project website.
Jun 2023: Our Group is Joining the Kempner Institute
Excited to join Kempner’s inaugural cohort of associate faculty to advance Kempner’s mission of studying the intersection of natural and artificial intelligence.
Jun 2023: Welcoming a New Postdoctoral Fellow
An enthusiastic welcome to Shanghua Gao who is joining our group as a postdoctoral research fellow.
Jun 2023: On Pretraining in Nature Machine Intelligence
Excited to share our new study on language model pretraining and general-purpose methods for biological sequences. Project website.
May 2023: Congratulations to Ada and Michelle
Congrats to PhD student Michelle on being selected as the 2023 Albert J. Ryan Fellow and as a participant in the Heidelberg Laureate Forum. Congratulations to PhD student Ada on being selected as the Kempner Institute Graduate Fellow!
Apr 2023: Universal Domain Adaptation at ICML 2023
New paper introducing Raincoat, the first model for closed-set and universal domain adaptation on time series, accepted at ICML 2023. Raincoat addresses feature and label shifts and can detect private labels. Project website.