Generalized Protein Pocket Generation with Prior-Informed Flow Matching - Zitnik Lab

Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. A critical step in this process is designing the protein pocket, the protein interface that binds the ligand. Current pocket generation approaches often rely on time-intensive physical computations or template-based methods, and their generation quality suffers because they overlook domain knowledge. To tackle these challenges, we propose PocketFlow, a flow-matching generative model that incorporates protein-ligand interaction priors. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. During sampling, PocketFlow uses multi-granularity guidance (overall binding affinity and interaction geometry constraints) to generate high-affinity, valid pockets. Experiments show that PocketFlow outperforms baselines on multiple benchmarks, for example achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model that works across multiple ligand modalities, including small molecules, peptides, and RNA.
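The abstract names two mechanisms: flow-matching training and guided sampling. The sketch below illustrates only generic versions of both, under clearly stated assumptions, and is not PocketFlow's implementation. Coordinates live in plain Euclidean space (PocketFlow's actual parameterization of residues, side chains, and amino-acid types is not shown). The probability path is the common linear interpolation x_t = (1 - t) x_0 + t x_1. A hand-written oracle velocity for a single target stands in for a trained network. Guidance is a simple gradient term on a hypothetical energy rather than the paper's affinity and interaction-geometry guidance. All function names (`interpolate`, `cfm_loss`, `euler_sample`, `make_oracle_velocity`, `energy`) are illustrative.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear probability path: x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x0, x1, t):
    """Conditional flow-matching loss for one (noise, data, time) triple.

    On the linear path the regression target is the constant velocity x1 - x0.
    """
    xt = interpolate(x0, x1, t)
    target = x1 - x0
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x0, n_steps=50, energy_grad=None, scale=0.0):
    """Integrate dx/dt = v(x, t) - scale * grad E(x) from t = 0 to t = 1 with explicit Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)
        if energy_grad is not None and scale > 0.0:
            # Generic guidance (assumption): push the drift downhill on a user-supplied energy.
            v = v - scale * energy_grad(x)
        x = x + dt * v
    return x


def make_oracle_velocity(x1):
    """Stand-in for a trained network: exact conditional velocity toward a single target x1."""
    def velocity(x, t):
        return (x1 - x) / (1.0 - t)
    return velocity


def energy(x):
    """Hypothetical constraint energy: half the squared distance of all coordinates to the origin."""
    return 0.5 * float(np.sum(x ** 2))


def energy_grad(x):
    """Gradient of the hypothetical energy above."""
    return x


# Toy usage: two "atoms" in 3D stand in for pocket coordinates (made-up data).
rng = np.random.default_rng(0)
target = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]])
noise = rng.standard_normal(target.shape)
oracle = make_oracle_velocity(target)

loss = cfm_loss(oracle, noise, target, t=0.3)
sample = euler_sample(oracle, noise, n_steps=50)
guided = euler_sample(oracle, noise, n_steps=50, energy_grad=energy_grad, scale=5.0)

print(loss)                              # tiny: only floating-point round-off remains
print(np.allclose(sample, target))       # True: unguided Euler lands on the target
print(energy(guided) < energy(sample))   # True: guidance lowers the hypothetical energy
```

Because the oracle's velocity along the straight path equals the constant x1 - x0, the regression loss vanishes and unguided Euler integration traces the straight line to the target. Turning on guidance trades some fidelity to the target for a lower energy, the basic tension any guided sampler has to balance.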

Publication

Generalized Protein Pocket Generation with Prior-Informed Flow Matching
Zaixi Zhang, Marinka Zitnik*, Qi Liu*
NeurIPS 2024 (Spotlight)

@article{zhang2024pocketflow,
  title={Generalized Protein Pocket Generation with Prior-Informed Flow Matching},
  author={Zhang, Zaixi and Zitnik, Marinka and Liu, Qi},
  journal={NeurIPS},
  url={https://arxiv.org/abs/2409.19520},
  year={2024}
}

Code Availability

A PyTorch implementation of PocketFlow is available in the GitHub repository.

Latest News

Feb 2026: DNA-Conditioned Models of Single-Cell Perturbations

Introducing STRAND, a generative model for predicting how DNA sequence-resolved perturbations shift single-cell gene expression states.

Feb 2026: Context Switching AI in Nature Medicine

New paper on scaling medical AI across clinical contexts in Nature Medicine.

Jan 2026: Zoom-Out and Zoom-In Retrieval for LLMs

Much of the world’s knowledge lies outside public web text accessible to LLMs, including internal ontologies, curated catalogs, drug safety tables, patient health data, and lab knowledge bases. ARK helps an LLM to choose, one step at a time, whether to look broadly for relevant information or to dig deeper by following specific links in the data.

Jan 2026: AI Scientist for Therapeutic Discovery

New preprint on Medea, an omics AI agent for therapeutic discovery.

Jan 2026: AI Scientists - LLMs Using Scientific Tools

Excited about this academic collaboration with Anthropic on adding connectors to ToolUniverse to make Claude even more powerful for scientific discovery.

Dec 2025: AI + Validation in Molecular, Organoid, and Clinical Systems

Pairing AI with experiments: our latest model, PROTON, generates neurological hypotheses that we validated across molecular, organoid, and clinical systems.

Dec 2025: Greater than the Sum of Its Parts

New preprint on Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models.

Dec 2025: Digital Twinning

A piece in the Harvard Gazette on digital twins, cellular chatbots, and building digital twins at cellular scale.

Dec 2025: Virtual Cells and Instruments

We are excited to meet hundreds of researchers attending our AI Virtual Cells and Instruments: A New Era in Drug Discovery and Development workshop at NeurIPS 2025.

Dec 2025: CUREBench

Excited to see 1,622 researchers from around the world enter our CUREBench Challenge: 398 teams made 3,383 submissions to the competition and contributed more than 8,457,500 AI reasoning traces for therapeutics. Join us at the Award Ceremony at NeurIPS.

Dec 2025: AI For Science at NeurIPS

Join us and hundreds of other scientists at the 6th AI for Science workshop at NeurIPS.

Nov 2025: Protein Structure Tokenization

New preprint introducing GeoBPE: protein structure tokenization via geometric byte pair encoding.

Nov 2025: Generative AI Model for Spatial Biology

New preprint introducing CONCERT, a niche-aware generative model that predicts perturbation effects across spatial tissue contexts.

Nov 2025: AI Cell Models

A piece in Science explores how AI cell models could transform biomedicine (if they work as promised) and highlights ToolUniverse. ToolUniverse lets AI co-scientists test, analyze, and build on AI cell models.

Oct 2025: Is AI sycophancy holding science back?

A piece in Nature explores how AI sycophancy, in which models agree with users too readily instead of reasoning on their own, could affect the use of AI in medical research.

Oct 2025: Our research featured by Kempner and Crimson

A news story about PDGrapher in The Harvard Crimson. ToolUniverse featured on the Kempner Institute blog.

Oct 2025: A Scientist's Guide to AI Agents in Nature

A piece on AI agents in Nature highlights ongoing projects in our group, including methods for evaluating scientific hypotheses, challenges in benchmarking AI agents, and the open ToolUniverse ecosystem.

Sep 2025: ToolUniverse: AI Agents for Science and Medicine

New paper: ToolUniverse introduces an open ecosystem for building AI scientists with 600+ scientific and biomedical tools. Build your AI co-scientists at https://aiscientist.tools.

Sep 2025: Democratizing "AI Scientists" with ToolUniverse

Our new initiative: use ToolUniverse to build your own AI scientist from any language or reasoning model, open or closed. https://aiscientist.tools

Sep 2025: InfEHR in Nature Communications

Collaboration with Ben and Girish on clinical phenotype resolution through deep geometric learning on electronic health records published in Nature Communications.

Zitnik Lab · Artificial Intelligence in Medicine and Science · Harvard · Department of Biomedical Informatics