ESM Cambrian (ESMC) Protein Embedding Tool

Overview

The ESM Cambrian (ESMC) tool from EvolutionaryScale provides contextualized protein embeddings using protein language models. ESM-C generates 960-dimensional embeddings (mean-pooled across tokens) that capture information about protein structure and function.

Setup

Step 1: Install and Start the ESM Server

Prerequisites:

  • Python 3.10+

  • Sufficient disk space for model weights

Installation:

# Clone the ToolUniverse repository
git clone https://github.com/mims-harvard/ToolUniverse.git
cd ToolUniverse

# Create a virtual environment
uv venv esm --python 3.10
source esm/bin/activate

# Install ToolUniverse package and ESM dependencies
uv pip install -e .
uv pip install -r src/tooluniverse/remote/esm/requirements.txt

Start the server:

python src/tooluniverse/remote/esm/esm_tool.py

The server will start on port 8008. Leave it running in this terminal. If running locally, you are done here. If using a remote server, ask the server administrator to run these steps and provide you with the server’s IP address.

In a new terminal, navigate to the ToolUniverse directory and activate your virtual environment:

cd ToolUniverse  # Go back to the same ToolUniverse directory
source esm/bin/activate

Then follow one of the Usage Options below.

Usage Options

Option 1: Use ESM via LLM with MCP Support

Connect any LLM client that supports MCP by pointing it to ToolUniverse with your server location:

export ESM_MCP_SERVER_HOST=localhost  # or your server's IP if remote

Then configure your LLM client to use ToolUniverse as an MCP server with the ESM_MCP_SERVER_HOST environment variable set.

Example: Using with Claude Code

Here’s how to use ESM through Claude Code (an example of Option 1):

1. Add the MCP Server to Claude:

claude mcp add tooluniverse --env ESM_MCP_SERVER_HOST=$ESM_MCP_SERVER_HOST -- uvx tooluniverse

2. Start Claude and use the tool:

claude

Ask Claude:

Give me the embedding for the protein sequence: MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV

Claude will automatically use the esm_embed_sequence tool to generate the embedding.

Option 2: Use ESM via ToolUniverse Script (Direct Python)

Use ESM directly in Python scripts by setting the server location:

export ESM_MCP_SERVER_HOST=localhost  # or your server's IP if remote

Python script example:

from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

embedding = tu.run_one_function({
    "name": "esm_embed_sequence",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
    }
})

print(embedding)

The tool will return:

{
  "model": "esmc_300m",
  "embedding_dim": 960,
  "embedding": [0.123, -0.456, 0.789, ...]
}

Advanced Configuration

Change Model Size

Edit get_client() in esm_tool.py:

def get_client():
    global _ESM_CLIENT
    if _ESM_CLIENT is None:
        _ESM_CLIENT = ESMC.from_pretrained("esmc_600m")  # or esmc_6b
        _ESM_CLIENT.eval()
    return _ESM_CLIENT

Change Server Port

Edit the @register_mcp_tool decorator in esm_tool.py:

mcp_config={"host": "0.0.0.0", "port": 8009}  # Change port number

Then update src/tooluniverse/data/mcp_auto_loader_esm.json:

{
  "server_url": "http://localhost:8009/mcp"
}

References

Citation

For information on how to cite ESM-C, please refer to the official EvolutionaryScale announcement and ESM repository.