PINNACLE Protein-Protein Interaction Tool

Overview

The PINNACLE tool provides access to cell-type-specific protein-protein interaction embeddings. These embeddings capture functional relationships between proteins in different cellular contexts, supporting analyses in drug discovery, disease research, and systems biology.

PINNACLE generates dense vector representations of proteins that encode both direct physical interactions and functional associations within specific cell types. This contextualization allows for more accurate modeling of biological processes in tissue-specific environments.

Data Acquisition

1. Download PINNACLE Embeddings

The PINNACLE embeddings are hosted on Hugging Face at: https://huggingface.co/datasets/mims-harvard/ToolSpace

Use the following shell commands to download only the PINNACLE files from the pinnacle_cge directory:

# Optional: confirm the hf CLI runs (uvx runs it in a temporary environment, so no separate install step is needed)
uvx --from huggingface_hub hf --help

# Download only the pinnacle_cge folder
uvx --from huggingface_hub hf download mims-harvard/ToolSpace \
  --repo-type dataset \
  --include "pinnacle_cge/*" \
  --local-dir /path/to/ToolSpace
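If you prefer to stay in Python, the same selective download can be sketched with huggingface_hub.snapshot_download, whose allow_patterns argument plays the role of --include. This is a minimal sketch; the helper names pinnacle_download_kwargs and download_pinnacle are hypothetical and not part of the PINNACLE tool.

```python
from pathlib import Path


def pinnacle_download_kwargs(local_dir: str) -> dict:
    """Keyword arguments mirroring the `hf download` command above (hypothetical helper)."""
    return {
        "repo_id": "mims-harvard/ToolSpace",
        "repo_type": "dataset",
        "allow_patterns": ["pinnacle_cge/*"],  # same filter as --include
        "local_dir": str(Path(local_dir).expanduser()),
    }


def download_pinnacle(local_dir: str) -> str:
    """Download only pinnacle_cge/ into local_dir and return the local path."""
    # Imported here so pinnacle_download_kwargs() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**pinnacle_download_kwargs(local_dir))
```

Calling download_pinnacle("/path/to/ToolSpace") should leave the files under /path/to/ToolSpace/pinnacle_cge/, matching the CLI command.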

2. Set Environment Variable

After downloading, set PINNACLE_DATA_PATH to the directory you passed to --local-dir (the directory that contains pinnacle_cge/):

export PINNACLE_DATA_PATH="/path/to/ToolSpace"
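Code that consumes the data can fail early, with a readable message, when the variable is missing or wrong. The sketch below relies only on the variable name documented here; resolve_pinnacle_data_path is a hypothetical helper, not part of pinnacle_tool.py.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_pinnacle_data_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return PINNACLE_DATA_PATH as a Path, failing early with a clear message."""
    env = os.environ if env is None else env
    raw = env.get("PINNACLE_DATA_PATH")
    if not raw:
        raise RuntimeError("PINNACLE_DATA_PATH is not set; export it before starting the tool.")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"PINNACLE_DATA_PATH={raw!r} is not a directory")
    return path
```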

Tool Input and Output

Input Parameters

Parameter     Type     Required   Description
cell_type     string   Yes        Target cell type for embedding retrieval
embed_path    string   No         Custom path to the embedding file

The tool fuzzy-matches the cell_type value, so differences in naming convention, spacing, hyphenation, and capitalization are tolerated.
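The exact matching rules used by the tool are not documented here. As an illustration only, the sketch below normalizes names (lowercase, separators collapsed to underscores) and falls back to Python's difflib for near misses; normalize_cell_type, match_cell_type, and the 0.6 cutoff are hypothetical choices, not the tool's implementation.

```python
from __future__ import annotations

import difflib
import re
from typing import Optional


def normalize_cell_type(name: str) -> str:
    """Lowercase and collapse spaces, hyphens, and other separators into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def match_cell_type(query: str, available: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a user-supplied cell type onto one of the available context names."""
    by_key = {normalize_cell_type(name): name for name in available}
    key = normalize_cell_type(query)
    if key in by_key:  # exact match after normalization
        return by_key[key]
    # Otherwise fall back to the closest normalized name above the cutoff.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None
```

With this scheme, "B cell", "B-Cell", and "b_cell" all normalize to the same key, and a small typo such as "b cels" still resolves to b_cell.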

Output Format

The tool returns a JSON object with the following structure:

Successful Response

{
  "embeddings": {
    "TP53": [0.1234, -0.5678, 0.9012, ...],
    "EGFR": [-0.2345, 0.6789, -0.1234, ...],
    "BRCA1": [0.3456, -0.7890, 0.2345, ...],
    "...": "..."
  },
  "context_info": [
    "Successfully retrieved embeddings for 15234 proteins/genes.",
    "Embedding dimensionality: 256 features per protein.",
    "Cell type context: b_cell (matched and processed)."
  ]
}
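A client receiving this response usually wants the vectors as numbers plus a sanity check on their length. The sketch assumes the response arrives as a JSON string shaped like the example above; parse_embeddings is a hypothetical helper, and it skips non-list entries such as the "..." placeholder.

```python
from __future__ import annotations

import json


def parse_embeddings(payload: str) -> dict[str, list[float]]:
    """Extract gene -> vector pairs from a PINNACLE tool response string."""
    data = json.loads(payload)
    vectors = {
        gene: [float(x) for x in vec]
        for gene, vec in data.get("embeddings", {}).items()
        if isinstance(vec, list)  # skip placeholders like "...": "..."
    }
    # Every kept vector should have the same dimensionality.
    dims = {len(vec) for vec in vectors.values()}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding lengths: {sorted(dims)}")
    return vectors
```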

Embedding Properties

  • Dimensionality: 328-dimensional vectors (a 200-dimensional structure-based protein representation plus a 128-dimensional context-aware/context-free protein representation)

  • Coverage: 394,760 protein representations from 156 cell type contexts across 24 tissues

  • Format: Dense numerical vectors (list of floats)
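Because each protein is a fixed-length vector, a common way to compare two proteins within one cell-type context is cosine similarity. This page does not prescribe a metric, so cosine similarity here is an assumption; cosine and nearest are hypothetical helpers operating on the gene-to-vector dictionary returned by the tool.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query: str, embeddings: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Rank the other proteins in one response by cosine similarity to `query`."""
    q = embeddings[query]
    scores = [(gene, cosine(q, vec)) for gene, vec in embeddings.items() if gene != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]
```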

MCP Server Setup

Prerequisites

# Create a uv virtual environment for the PINNACLE server
uv venv pinnacle --python 3.10
source pinnacle/bin/activate
uv pip install -r requirements.txt

Configuration

  1. Set up the environment:

# Ensure PINNACLE_DATA_PATH points to your ToolSpace directory
export PINNACLE_DATA_PATH="/path/to/ToolSpace"

  2. Verify that the embedding files exist:

ls -la "$PINNACLE_DATA_PATH/pinnacle_embeds/ppi_embed_dict.pth"
ls -la "$PINNACLE_DATA_PATH/pinnacle_cge/"
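The same check can be run from Python, for example at server start-up. The two expected paths are copied from the ls commands in this step; missing_pinnacle_paths is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def missing_pinnacle_paths(root: str) -> list[str]:
    """Return the expected PINNACLE paths that do not exist under root."""
    expected = [
        Path(root) / "pinnacle_embeds" / "ppi_embed_dict.pth",
        Path(root) / "pinnacle_cge",
    ]
    return [str(p) for p in expected if not p.exists()]
```

missing_pinnacle_paths(os.environ["PINNACLE_DATA_PATH"]) returns an empty list when both paths are present.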

Running the MCP Server

# Run the MCP server
python pinnacle_tool.py
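Once the server is running, any MCP client can call it over streamable HTTP. The sketch below uses the official mcp Python SDK and assumes the server is reachable at http://localhost:7001/mcp (/mcp is the SDK's default streamable-HTTP path, not confirmed for pinnacle_tool.py) and that the tool returns its JSON as text content. Because the tool's registered name is not given on this page, the client picks the first tool whose input schema has a cell_type property; build_arguments and fetch_embeddings are hypothetical helpers.

```python
from __future__ import annotations

import json
from typing import Any, Optional

# Assumption: default FastMCP streamable-HTTP path on the documented port.
SERVER_URL = "http://localhost:7001/mcp"


def build_arguments(cell_type: str, embed_path: Optional[str] = None) -> dict[str, str]:
    """Build the tool arguments listed in the Input Parameters table."""
    args = {"cell_type": cell_type}
    if embed_path is not None:
        args["embed_path"] = embed_path
    return args


async def fetch_embeddings(cell_type: str, url: str = SERVER_URL) -> dict[str, Any]:
    """Call the PINNACLE tool over streamable HTTP and decode its JSON reply."""
    # Imported lazily; requires the official `mcp` Python SDK.
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_tools()
            # The tool name is not documented, so select it by its input schema.
            tool = next(
                (t for t in listing.tools
                 if "cell_type" in t.inputSchema.get("properties", {})),
                None,
            )
            if tool is None:
                raise LookupError("no tool with a cell_type parameter was found")
            result = await session.call_tool(tool.name, build_arguments(cell_type))
            # Assumption: the JSON object is returned as the first text content item.
            return json.loads(result.content[0].text)
```

Run it with asyncio.run(fetch_embeddings("B cell")).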

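Server Configuration

  • Host: 0.0.0.0 (listens on all network interfaces, so the server is reachable from other machines)

  • Port: 7001 (configured to avoid port conflicts)

  • Transport: streamable-http

  • Mode: Stateless HTTP (requests are handled without server-side session state, which simplifies scaling)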
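For reference, these four settings map naturally onto the FastMCP class in the official mcp Python SDK. This assumes pinnacle_tool.py is built on FastMCP, which this page does not state; build_server and the server name "pinnacle" are hypothetical, and the actual tool registration is omitted.

```python
from typing import Any, Dict

# Values from the Server Configuration list above.
SERVER_SETTINGS: Dict[str, Any] = {
    "host": "0.0.0.0",       # listen on all network interfaces
    "port": 7001,
    "stateless_http": True,  # no per-client session state between requests
}
TRANSPORT = "streamable-http"


def build_server(name: str = "pinnacle"):
    """Create a FastMCP server with the settings above; tools are not registered here."""
    # Imported lazily; requires the official `mcp` Python SDK.
    from mcp.server.fastmcp import FastMCP

    return FastMCP(name, **SERVER_SETTINGS)
```

build_server().run(transport=TRANSPORT) would then start a stateless streamable-HTTP server on port 7001.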