Similarity search in knowledge graphs using meta paths

Relational data in biological systems – such as the cellular interactome, single cell similarity graphs, gene co-expression networks, and patient interaction networks – can be represented by graph structures. Biological networks are often composed of diverse data modalities; thus, they are poorly modeled by homogeneously typed networks. Instead, interconnected objects from various modalities can be represented as a single multigraph with heterogeneous, knowledge-informed node and edge types. We develop metapaths, an R software package that implements meta paths and performs meta path-based similarity search in biological knowledge graphs.

Meta paths are a general graph-theoretic approach for flexible similarity search in large networks. Although they are widely used in biomedical network analysis, no R package currently offers broad support for meta paths.

Meta paths are sequences of node types that define a walk from an origin node to a destination node. Informative meta paths in knowledge graphs (KGs) are often engineered by hand based on domain knowledge or expertise (e.g., the meta path DRS is clinically meaningful, since it describes associations between a disease and the side effects of its treatments, whereas the meta path PSF would not be). Alternatively, optimal meta paths can be discovered in an unsupervised fashion by feature selection metrics (e.g., maximal spanning tree, Laplacian score, or ranking based on meta path frequency or uniqueness), among other approaches. Once informative meta paths for a given KG have been defined, they determine the semantics of the relationships between nodes in the KG, enabling downstream machine learning analyses such as link prediction, node classification, and subgraph prediction.
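To make the definition concrete, the following minimal Python sketch lists every walk in a toy graph whose node types spell out a given meta path. It is an illustration only and does not use the API of the metapaths R package. The node names, the edges, the function names, and the reading of the letter codes (D = disease, R = drug, S = side effect) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: node -> node type.
# Assumed letter codes: D = disease, R = drug, S = side effect.
NODE_TYPES = {
    "disease_1": "D",
    "disease_2": "D",
    "drug_a": "R",
    "drug_b": "R",
    "effect_x": "S",
    "effect_y": "S",
}

# Undirected edges of the toy graph (invented for illustration).
EDGES = [
    ("disease_1", "drug_a"),
    ("disease_1", "drug_b"),
    ("disease_2", "drug_b"),
    ("drug_a", "effect_x"),
    ("drug_b", "effect_x"),
    ("drug_b", "effect_y"),
]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def find_meta_path_instances(origin, meta_path, adjacency, node_types):
    """List every walk from `origin` whose node types follow `meta_path`."""
    if node_types[origin] != meta_path[0]:
        return []
    walks = [[origin]]
    # Extend each partial walk one step at a time, keeping only
    # neighbors whose type matches the next symbol of the meta path.
    for wanted_type in meta_path[1:]:
        walks = [
            walk + [neighbor]
            for walk in walks
            for neighbor in sorted(adjacency[walk[-1]])
            if node_types[neighbor] == wanted_type
        ]
    return walks


adjacency = build_adjacency(EDGES)
drs_walks = find_meta_path_instances("disease_1", "DRS", adjacency, NODE_TYPES)
for walk in drs_walks:
    print(" -> ".join(walk))
```

In this toy graph, the DRS walks starting at disease_1 pass through drug_a or drug_b before reaching a side effect. A walk that starts at a node of the wrong type yields no instances.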

Although various algorithms exist to model meta path-based node similarities in a KG, a unifying framework to compute and compare these similarity scores is lacking. We introduce metapaths, which brings meta paths to the R ecosystem. The metapaths package enables meta path-based similarity search in heterogeneous KGs.
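As an illustration of what a meta path-based similarity score can look like, the sketch below computes a path count and PathSim, a widely used measure for symmetric meta paths: twice the number of path instances between x and y, divided by the number of instances from x to itself plus the number from y to itself. This is a minimal Python sketch under stated assumptions, not the metapaths package's implementation or API. The toy graph, the node names, and the function names are hypothetical.

```python
# Hypothetical toy graph (assumed letter codes: D = disease, R = drug).
NODE_TYPES = {
    "disease_1": "D",
    "disease_2": "D",
    "disease_3": "D",
    "drug_a": "R",
    "drug_b": "R",
    "drug_c": "R",
}

# Undirected adjacency lists, written out in both directions.
ADJACENCY = {
    "disease_1": ["drug_a", "drug_b"],
    "disease_2": ["drug_b"],
    "disease_3": ["drug_c"],
    "drug_a": ["disease_1"],
    "drug_b": ["disease_1", "disease_2"],
    "drug_c": ["disease_3"],
}


def count_paths(origin, destination, meta_path, adjacency, node_types):
    """Count walks from origin to destination whose types follow meta_path."""
    if node_types[origin] != meta_path[0]:
        return 0
    # counts[node] = number of matching partial walks that end at node.
    counts = {origin: 1}
    for wanted_type in meta_path[1:]:
        next_counts = {}
        for node, n_walks in counts.items():
            for neighbor in adjacency.get(node, ()):
                if node_types[neighbor] == wanted_type:
                    next_counts[neighbor] = next_counts.get(neighbor, 0) + n_walks
        counts = next_counts
    return counts.get(destination, 0)


def pathsim(x, y, meta_path, adjacency, node_types):
    """PathSim score of x and y under a symmetric meta path."""
    if meta_path != meta_path[::-1]:
        raise ValueError("PathSim is defined for symmetric meta paths")
    between = count_paths(x, y, meta_path, adjacency, node_types)
    self_x = count_paths(x, x, meta_path, adjacency, node_types)
    self_y = count_paths(y, y, meta_path, adjacency, node_types)
    denominator = self_x + self_y
    # Nodes with no path instances at all get similarity 0.
    return 2 * between / denominator if denominator else 0.0


score_12 = pathsim("disease_1", "disease_2", "DRD", ADJACENCY, NODE_TYPES)
score_13 = pathsim("disease_1", "disease_3", "DRD", ADJACENCY, NODE_TYPES)
print(f"PathSim(disease_1, disease_2) = {score_12:.3f}")
print(f"PathSim(disease_1, disease_3) = {score_13:.3f}")
```

Here disease_1 and disease_2 share one drug, so their score under the meta path DRD is 2 · 1 / (2 + 1) = 2/3, while disease_1 and disease_3 share none and score 0. Normalizing by the self-path counts keeps highly connected nodes from dominating a ranking based on raw path counts.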

Publication

metapaths: similarity search in heterogeneous knowledge graphs via meta paths
Ayush Noori, Michelle M. Li, Amelia L.M. Tan, and Marinka Zitnik
Bioinformatics 2023

@article{noori2023metapaths,
  title={metapaths: similarity search in heterogeneous knowledge graphs via meta paths},
  author={Noori, Ayush and Li, Michelle M and Tan, Amelia LM and Zitnik, Marinka},
  journal={Bioinformatics},
  pages={btad297},
  year={2023},
  publisher={Oxford University Press}
}

Code

An R implementation, together with documentation and usage examples, is available in the GitHub repository.

Zitnik Lab  ·  Artificial Intelligence in Medicine and Science  ·  Harvard  ·  Department of Biomedical Informatics