Similarity search in knowledge graphs using meta paths

Relational data in biological systems, such as the cellular interactome, single-cell similarity graphs, gene co-expression networks, and patient interaction networks, can be represented by graph structures. Biological networks are often composed of diverse data modalities; thus, they are poorly modeled by homogeneously typed networks. Instead, interconnected objects from various modalities can be represented as a single multigraph with heterogeneous, knowledge-informed node and edge types. We develop metapaths, an R software package that implements meta paths and performs meta path-based similarity search in biological knowledge graphs.

Meta paths are a general graph-theoretic approach for flexible similarity search in large networks. Although they are widely used in biomedical network analysis, no R package currently offers broad support for meta paths.

Meta paths are sequences of node types that define a walk from an origin node to a destination node. Informative meta paths in knowledge graphs (KGs) are often engineered by hand using domain expertise (e.g., the meta path DRS is clinically meaningful because it describes associations between a disease and the side effects of its treatments, whereas the meta path PSF is not). Alternatively, optimal meta paths can be discovered in an unsupervised fashion using feature selection metrics (e.g., maximal spanning tree, Laplacian score, or ranking by meta path frequency or uniqueness), among other approaches. Once informative meta paths have been defined for a given KG, they define the semantics of the relationships between its nodes, enabling downstream machine learning analyses such as link prediction, node classification, and subgraph prediction.
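To make the definition concrete, the following is a minimal sketch in Python of how a meta path constrains walks in a typed graph. It is not the API of the metapaths R package. The toy graph, the node names, the type labels ("disease", "drug", "side_effect"), and the helper functions build_adjacency and walks_along are all assumptions made for illustration; they loosely mirror the disease-treatment-side-effect example above.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: every node name and type below is made up.
NODE_TYPE = {
    "disease_1": "disease",
    "disease_2": "disease",
    "drug_a": "drug",
    "drug_b": "drug",
    "effect_x": "side_effect",
    "effect_y": "side_effect",
}

# Undirected edges between typed nodes (assumed, for illustration only).
EDGES = [
    ("disease_1", "drug_a"),
    ("disease_1", "drug_b"),
    ("disease_2", "drug_a"),
    ("drug_a", "effect_x"),
    ("drug_b", "effect_x"),
    ("drug_b", "effect_y"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (edges treated as undirected)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def walks_along(adjacency, node_type, meta_path, origin):
    """Return every walk from `origin` whose node types follow `meta_path`."""
    if node_type[origin] != meta_path[0]:
        return []
    walks = [[origin]]
    for wanted_type in meta_path[1:]:
        # Extend each partial walk only through neighbors of the required type.
        walks = [
            walk + [neighbor]
            for walk in walks
            for neighbor in sorted(adjacency[walk[-1]])
            if node_type[neighbor] == wanted_type
        ]
    return walks


ADJACENCY = build_adjacency(EDGES)
META_PATH = ("disease", "drug", "side_effect")
WALKS = walks_along(ADJACENCY, NODE_TYPE, META_PATH, "disease_1")

for walk in WALKS:
    print(" -> ".join(walk))
```

Each returned walk is one instance of the meta path. Counting such instances per origin-destination pair is the raw quantity on which many meta path-based similarity scores are built.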

Although various algorithms exist to model meta path-based node similarities in a KG, a unifying framework to compute and compare these similarity scores is lacking. We introduce metapaths, which brings meta paths to the R ecosystem. The metapaths package enables meta path-based similarity search in heterogeneous KGs.
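As an illustration of why comparing similarity scores matters, the sketch below computes two scores along the symmetric meta path disease-drug-disease: the raw path count, and PathSim, a normalized measure from the meta path literature defined as 2 * count(x, y) / (count(x, x) + count(y, y)). This is a Python sketch under stated assumptions, not the interface of the metapaths package. The TREATED_BY data and the function names are hypothetical, and the graph is assumed to have at most one edge per disease-drug pair, so the path count reduces to the number of shared drugs.

```python
from itertools import combinations

# Hypothetical disease -> drugs relation; all names are made up.
TREATED_BY = {
    "disease_1": {"drug_a", "drug_b"},
    "disease_2": {"drug_a"},
    "disease_3": {"drug_b", "drug_c"},
    "disease_4": {"drug_d"},
}


def path_count(x, y):
    """Number of disease-drug-disease walks from x to y (shared drugs)."""
    return len(TREATED_BY[x] & TREATED_BY[y])


def pathsim(x, y):
    """PathSim: 2 * count(x, y) / (count(x, x) + count(y, y))."""
    denominator = path_count(x, x) + path_count(y, y)
    return 2 * path_count(x, y) / denominator if denominator else 0.0


# Score every unordered pair of diseases with both measures.
SCORES = {
    (x, y): (path_count(x, y), pathsim(x, y))
    for x, y in combinations(sorted(TREATED_BY), 2)
}

for (x, y), (count, score) in SCORES.items():
    print(f"{x} ~ {y}: path count = {count}, PathSim = {score:.2f}")
```

In this toy data, disease_1 shares exactly one drug with both disease_2 and disease_3, so the raw path counts tie, while PathSim ranks disease_2 higher because it has fewer drugs overall. Differences of this kind are what a common framework for computing scores side by side makes visible.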


metapaths: similarity search in heterogeneous knowledge graphs via meta paths
Ayush Noori, Amelia L.M. Tan, Michelle M. Li, and Marinka Zitnik
Bioinformatics 2023

  title={metapaths: similarity search in heterogeneous knowledge graphs via meta paths},
  author={Noori, Ayush and Li, Michelle M and Tan, Amelia LM and Zitnik, Marinka},
  publisher={Oxford University Press}


Implementation in R together with documentation and examples of usage is available in the GitHub repository.


Zitnik Lab  ·  Artificial Intelligence in Medicine and Science  ·  Harvard  ·  Department of Biomedical Informatics