PINNACLE Protein-Protein Interaction Tool
Overview
The PINNACLE tool provides access to cell-type-specific protein-protein interaction embeddings. These embeddings capture functional relationships between proteins in different cellular contexts, enabling advanced analysis for drug discovery, disease research, and systems biology.
PINNACLE generates dense vector representations of proteins that encode both direct physical interactions and functional associations within specific cell types. This contextualization allows for more accurate modeling of biological processes in tissue-specific environments.
Data Acquisition
1. Download PINNACLE Embeddings
The PINNACLE embeddings are hosted on Hugging Face at: https://huggingface.co/datasets/mims-harvard/ToolSpace
Use the following shell commands to download only the PINNACLE files from the pinnacle_cge directory:
# Run the Hugging Face CLI through uvx (fetches it if it is not already available)
uvx --from huggingface_hub hf
# Download only the pinnacle_cge folder
uvx --from huggingface_hub hf download mims-harvard/ToolSpace \
--repo-type dataset \
--include "pinnacle_cge/*" \
--local-dir ./path/to/your/pinnacle/
2. Set Environment Variable
After downloading, set the PINNACLE_DATA_PATH environment variable to the directory that contains the downloaded pinnacle_cge folder (the --local-dir used above):
export PINNACLE_DATA_PATH="/path/to/ToolSpace"
Tool Input and Output
Input Parameters
Parameter | Type | Required | Description |
---|---|---|---|
| string | Yes | Target cell type for embedding retrieval |
| string | No | Custom path to embedding file |
The tool fuzzy-matches the requested cell type name, so differences in naming convention, spacing, hyphenation, and capitalization are tolerated.
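The matching behaviour described above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the helper names `normalize_cell_type` and `match_cell_type`, the 0.8 cutoff, and every cell type in the list other than `b_cell` are assumptions made for illustration.

```python
import difflib
import re


def normalize_cell_type(name):
    """Lowercase the name and collapse spaces, hyphens, etc. into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def match_cell_type(query, known_cell_types, cutoff=0.8):
    """Return the best-matching known cell type, or None if nothing is close.

    Hypothetical helper: the real tool may use a different strategy.
    """
    by_key = {normalize_cell_type(k): k for k in known_cell_types}
    key = normalize_cell_type(query)
    if key in by_key:  # exact match after normalization
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None


# Illustrative names only; "b_cell" is the one context shown in this page.
KNOWN = ["b_cell", "t_cell", "hepatocyte"]

print(match_cell_type("B Cell", KNOWN))
print(match_cell_type("b-cell", KNOWN))
print(match_cell_type("hepatocytes", KNOWN))
```

Normalizing first and falling back to a similarity search keeps exact matches cheap while still tolerating small spelling differences.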
Output Format
The tool returns a JSON object with the following structure:
Successful Response
{
"embeddings": {
"TP53": [0.1234, -0.5678, 0.9012, ...],
"EGFR": [-0.2345, 0.6789, -0.1234, ...],
"BRCA1": [0.3456, -0.7890, 0.2345, ...],
"...": "..."
},
"context_info": [
"Successfully retrieved embeddings for 15234 proteins/genes.",
"Embedding dimensionality: 256 features per protein.",
"Cell type context: b_cell (matched and processed)."
]
}
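One common way to use a response of this shape is to compare two proteins by the cosine similarity of their embeddings. The sketch below assumes the response has already been received as a JSON string; the three-element vectors are the truncated values from the example above, not real embeddings, and `cosine_similarity` is a helper written for this illustration.

```python
import json
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_u * norm_v)


# Toy response: vectors truncated to three values for readability.
response_text = """
{
  "embeddings": {
    "TP53": [0.1234, -0.5678, 0.9012],
    "EGFR": [-0.2345, 0.6789, -0.1234]
  },
  "context_info": ["Cell type context: b_cell (matched and processed)."]
}
"""

response = json.loads(response_text)
embeddings = response["embeddings"]
similarity = cosine_similarity(embeddings["TP53"], embeddings["EGFR"])
print(f"TP53 vs EGFR cosine similarity: {similarity:.4f}")
```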
Embedding Properties
Dimensionality: 328-dimensional vector (a 200-dimensional structure-based protein representation plus a 128-dimensional context-aware or context-free protein representation)
Coverage: 394,760 protein representations from 156 cell type contexts across 24 tissues
Format: Dense numerical vectors (list of floats)
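The 200 + 128 breakdown suggests that a vector can be separated into its two parts. The sketch below assumes the structure-based block comes first and the context block second; this ordering is not confirmed here, so check it against the data before relying on it. `split_embedding` is a hypothetical helper.

```python
STRUCTURE_DIM = 200  # structure-based representation, per the breakdown above
CONTEXT_DIM = 128    # context-aware or context-free representation


def split_embedding(vector, structure_dim=STRUCTURE_DIM, context_dim=CONTEXT_DIM):
    """Split one embedding into (structure_part, context_part).

    ASSUMPTION: the structure-based values precede the context values.
    """
    expected = structure_dim + context_dim
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    return vector[:structure_dim], vector[structure_dim:]


example = [0.0] * 328  # placeholder vector, not a real embedding
structure_part, context_part = split_embedding(example)
print(len(structure_part), len(context_part))  # prints: 200 128
```

The length check guards against silently slicing vectors of a different size.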
MCP Server Setup
Prerequisites
# Create a uv virtual environment for the PINNACLE setup
uv venv pinnacle --python 3.10
source pinnacle/bin/activate
uv pip install -r requirements.txt
Configuration
Set up the environment:
# Ensure PINNACLE_DATA_PATH points to your ToolSpace directory
export PINNACLE_DATA_PATH="/path/to/ToolSpace"
Verify embedding files exist:
ls -la $PINNACLE_DATA_PATH/pinnacle_embeds/ppi_embed_dict.pth
ls -la $PINNACLE_DATA_PATH/pinnacle_cge/
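The same check can be scripted so that a missing file is reported before the server starts. This is a minimal sketch: the two relative paths are taken from the ls commands above, while `missing_pinnacle_paths` and `report` are hypothetical helpers, not part of the tool.

```python
import os
from pathlib import Path

# Relative paths taken from the verification commands above.
EXPECTED_PATHS = ("pinnacle_embeds/ppi_embed_dict.pth", "pinnacle_cge")


def missing_pinnacle_paths(base_dir):
    """Return the expected relative paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


def report(env=None):
    """Build a one-line status message from PINNACLE_DATA_PATH."""
    env = os.environ if env is None else env
    data_path = env.get("PINNACLE_DATA_PATH")
    if not data_path:
        return "PINNACLE_DATA_PATH is not set"
    missing = missing_pinnacle_paths(data_path)
    if missing:
        return f"Missing under {data_path}: " + ", ".join(missing)
    return "All expected PINNACLE paths found"


print(report())
```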
Running the MCP Server
# Run the MCP server
python pinnacle_tool.py
Server Configuration
Host: 0.0.0.0 (accepts connections from any IP)
Port: 7001 (configured to avoid conflicts)
Transport: streamable-http
Mode: Stateless HTTP for scalability
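Once the server is running, a quick way to confirm that it is listening on the configured port is a plain TCP connection attempt. The sketch below uses only the standard library and does not speak the MCP protocol; `port_is_open` is a hypothetical helper, and 127.0.0.1 assumes the check runs on the same machine as the server.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# The server binds to 0.0.0.0, so it is reachable locally via 127.0.0.1.
if port_is_open("127.0.0.1", 7001):
    print("Something is listening on port 7001")
else:
    print("Nothing is listening on port 7001")
```

A successful connection only shows that some process holds the port; it does not confirm that the process is the PINNACLE server.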