ESM Tools

Configuration File: esm_tools.json

Tool Type: Local

Tools Count: 10

This page contains all tools defined in the esm_tools.json configuration file.

Available Tools

ESM_describe_sae_feature (Type: ESMTool)

ESM_describe_sae_feature tool specification

Tool Information:

  • Name: ESM_describe_sae_feature

  • Type: ESMTool

  • Description: Label a single SAE feature_id with its dominant biological category by aggregating UniProt feature-type overlap at SAE-activating residues across a curated 10-protein panel. Returns one of: catalytic, ligand-binding, ptm, domain, motif, structural-stability, secondary-structure, transmembrane, signal-peptide, propeptide, or ‘uncategorized’ (no informative UniProt overlap found). Use when ESM_get_sae_features / ESM_score_variant_sae_disruption returned a feature_id that needs biological interpretation. Cost: 10 Forge credits FIRST time (1 per panel protein). Subsequent calls for the same feature_id are FREE (filesystem cache at ~/.cache/tooluniverse/sae_labels/{sae_model}/feature_{id}.json). Latency: ~30-60s first call (panel of 10 SAE inferences), <1s on cache hit. Prerequisites: same as ESM_get_sae_features. License: Cambrian Inference License — non-commercial.

Parameters:

  • feature_id (integer) (required) SAE feature index in [0, 16383]. Use the values returned by ESM_get_sae_features.active_features[].feature_id.

  • sae_model (string) (optional) SAE checkpoint to label. Cache is keyed on this — different SAEs produce different labels for the same id.

  • model (string) (optional) ESMC backbone.

  • n_proteins (integer) (optional) Number of panel proteins to run SAE on. Default 10 (the full curated panel). Lower for cheaper exploration at the cost of confidence.

  • top_residues_per_protein (integer) (optional) For each protein, take the top-K residues where the target feature activates most strongly, then check their UniProt annotations. Default 3.

  • use_cache (boolean) (optional) If true (default) and a cached label exists at ~/.cache/tooluniverse/sae_labels/…/feature_{id}.json, return it instead of rerunning the panel. Set false to force a recompute.

Example Usage:

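from tooluniverse import ToolUniverse

# Load the tool registry once; the other examples on this page reuse `tu`.
tu = ToolUniverse()
tu.load_tools()

query = {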
    "name": "ESM_describe_sae_feature",
    "arguments": {
        "feature_id": 10
    }
}
result = tu.run(query)
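
The caching and cost rules in the description can be illustrated without calling Forge. This is a minimal sketch that assumes the cache layout quoted above; the helper names `label_cache_path` and `estimated_credits`, and the checkpoint name `example-sae`, are hypothetical and not part of ToolUniverse.

```python
from pathlib import Path


def label_cache_path(sae_model: str, feature_id: int) -> Path:
    """Build the cache path quoted in the description (hypothetical helper)."""
    if not 0 <= feature_id <= 16383:
        raise ValueError("feature_id must be in [0, 16383]")
    return (Path.home() / ".cache" / "tooluniverse" / "sae_labels"
            / sae_model / f"feature_{feature_id}.json")


def estimated_credits(n_proteins: int = 10, cache_hit: bool = False) -> int:
    """One Forge credit per panel protein on a miss, none on a cache hit."""
    return 0 if cache_hit else n_proteins


# "example-sae" is a placeholder, not a real checkpoint name.
path = label_cache_path("example-sae", 10)
print(path.name, estimated_credits(), estimated_credits(cache_hit=True))
```

Because the path includes `sae_model`, labels computed for one SAE checkpoint are never reused for another.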

ESM_explain_variant_mechanism (Type: ESMTool)

ESM_explain_variant_mechanism tool specification

Tool Information:

  • Name: ESM_explain_variant_mechanism

  • Type: ESMTool

  • Description: One-call composite for variant mechanism: runs ESMC-6B SAE variant disruption + describe_sae_feature on each top affected feature + composes a category-level mechanism summary (e.g. ‘Disrupted feature categories (lost): catalytic=2, ligand-binding=1’). Use when a variant-interpretation skill wants the mechanism in a single tool call rather than orchestrating disruption + N description calls. Set include_descriptions=false to skip the category labeling step (2 Forge calls only). Otherwise: 2 disruption calls + up to (2 * top_k_features) describe calls (most cached after first use). Prerequisites: pip install ‘esm @ git+https://github.com/evolutionaryscale/esm@ee891c52’; set ESM_API_KEY. Outputs governed by EvolutionaryScale Cambrian Inference License — non-commercial use only unless covered by a separate commercial agreement.

Parameters:

  • sequence (string) (required) Reference (wild-type) protein amino acid sequence in single-letter code. Up to ~2700 AA.

  • position (integer) (required) 1-indexed residue position of the variant

  • ref_aa (string) (required) Single-letter wild-type amino acid (must match sequence[position-1])

  • alt_aa (string) (required) Single-letter substituted amino acid

  • window (integer) (optional) Residue window centered on the mutation for activation summation.

  • top_k_features (integer) (optional) Number of top lost / top gained features to describe and include in the category summary.

  • include_descriptions (boolean) (optional) If true (default), call ESM_describe_sae_feature on each top feature to get category labels. If false, return raw feature IDs without categories (cheaper).

  • model (string) (optional) ESMC base model

  • sae_model (string) (optional) SAE codebook identifier

Example Usage:

query = {
    "name": "ESM_explain_variant_mechanism",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "position": 10,
        "ref_aa": "R",
        "alt_aa": "H"
    }
}
result = tu.run(query)
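
To make the category-level summary and the call budget concrete, here is a small sketch. It assumes the summary is a simple count of labels per category, formatted like the example string in the description; `summarize_categories` and `max_forge_calls` are hypothetical helpers, and the tool's real output format may differ.

```python
from collections import Counter


def summarize_categories(labels, direction="lost"):
    """Compose a category-count summary line (hypothetical helper)."""
    counts = Counter(labels)
    # most_common() orders by count; ties keep first-seen order.
    parts = ", ".join(f"{cat}={n}" for cat, n in counts.most_common())
    return f"Disrupted feature categories ({direction}): {parts}"


def max_forge_calls(top_k_features, include_descriptions=True):
    """Upper bound: 2 disruption calls plus up to 2 * top_k describe calls."""
    return 2 + (2 * top_k_features if include_descriptions else 0)


lost_labels = ["catalytic", "ligand-binding", "catalytic"]
summary_line = summarize_categories(lost_labels)
print(summary_line)
print(max_forge_calls(5), max_forge_calls(5, include_descriptions=False))
```

The bound counts describe calls, not credits: an uncached describe call can itself cost up to one credit per panel protein.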

ESM_fold_protein (Type: ESMTool)

ESM_fold_protein tool specification

Tool Information:

  • Name: ESM_fold_protein

  • Type: ESMTool

  • Description: Predict protein 3D structure from sequence using ESM3, returning pTM (predicted TM-score), per-residue pLDDT confidence scores, and backbone atom coordinates. ESM3 performs structure generation via iterative structure track decoding. pTM > 0.7 indicates confident prediction. Coordinates are (L, 37, 3) backbone atom positions in Angstroms (standard 37-atom representation). Use for: rapid structure prediction, confidence-based filtering, structure-guided design. Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai).

Parameters:

  • sequence (string) (required) Protein amino acid sequence in single-letter code to fold (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’).

  • model (string) (optional) ESM3 model to use for structure prediction.

  • num_steps (integer) (optional) Number of iterative structure decoding steps (default: 8). More steps may improve accuracy.

Example Usage:

query = {
    "name": "ESM_fold_protein",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
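
Confidence-based filtering, as suggested in the description, might look like the sketch below. It takes a pTM value and a list of per-residue pLDDT scores directly, because the field names of the tool's result are not given here; `summarize_fold` and the toy numbers are hypothetical.

```python
from statistics import mean


def summarize_fold(ptm, plddt, ptm_cutoff=0.7):
    """Apply the pTM > 0.7 rule and locate the weakest residue (hypothetical helper)."""
    return {
        "confident": ptm > ptm_cutoff,
        "mean_plddt": mean(plddt),
        # 1-indexed position of the lowest per-residue confidence
        "least_confident_position": plddt.index(min(plddt)) + 1,
    }


# Toy values for illustration only, not real model output.
fold_summary = summarize_fold(0.82, [0.91, 0.88, 0.45, 0.79])
print(fold_summary)
```

No pLDDT threshold is applied because the description does not state one or the scale on which pLDDT is reported.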

ESM_generate_protein_sequence (Type: ESMTool)

ESM_generate_protein_sequence tool specification

Tool Information:

  • Name: ESM_generate_protein_sequence

  • Type: ESMTool

  • Description: Generate or complete a protein sequence using ESM3, EvolutionaryScale’s generative protein language model. Provide a prompt_sequence with ‘_’ characters marking positions to generate (masked positions), and ESM3 will fill them in via iterative masked language modeling. Can generate entire sequences from scratch (all ‘_’) or complete partial sequences. Use for: protein engineering, de novo protein design, completing truncated sequences, exploring sequence space around a template. Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai). Example: ‘MKTAY_____QRQIS’ generates 5 residues in the masked region.

Parameters:

  • prompt_sequence (string) (required) Protein sequence template with ‘_’ at positions to generate. Use standard amino acid letters for fixed positions and ‘_’ for masked/generated positions. Example: ‘MKTAY_____QRQISFVK’ generates 5 residues in the middle.

  • model (string) (optional) ESM3 model to use for generation.

  • num_steps (integer) (optional) Number of iterative decoding steps (default: 8). More steps = slower but potentially better quality.

  • temperature (number) (optional) Sampling temperature (default: 1.0). Lower values (0.1-0.5) produce more conservative sequences; higher values (1.0-2.0) more diverse.

Example Usage:

query = {
    "name": "ESM_generate_protein_sequence",
    "arguments": {
        "prompt_sequence": "MKTAY_____QRQIS"
    }
}
result = tu.run(query)
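
A prompt with masked positions can be built from an existing sequence. The helper below is a hypothetical convenience, not part of the tool; it assumes 1-indexed inclusive positions, matching the convention used by the other tools on this page.

```python
def mask_region(sequence, start, end, mask="_"):
    """Replace 1-indexed inclusive positions start..end with mask characters."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region must satisfy 1 <= start <= end <= len(sequence)")
    return sequence[:start - 1] + mask * (end - start + 1) + sequence[end:]


# Mask residues 6-10 of a 16-residue template.
prompt = mask_region("MKTAYIAKQRQISFVK", 6, 10)
print(prompt)

# A fully masked prompt asks for generation from scratch.
from_scratch = "_" * 50
```

The resulting string can be passed as `prompt_sequence`; the unmasked letters stay fixed.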

ESM_get_protein_embedding (Type: ESMTool)

ESM_get_protein_embedding tool specification

Tool Information:

  • Name: ESM_get_protein_embedding

  • Type: ESMTool

  • Description: Get protein sequence embeddings from EvolutionaryScale ESMC (ESM Cambrian) models via the Forge API. Returns a mean-pooled embedding vector (960-dim for esmc-300m, 1152-dim for esmc-600m) representing the entire protein, and optionally per-residue embeddings. ESMC is a fast, high-quality protein language model suitable for downstream tasks: similarity search, clustering, property prediction. Prerequisites: pip install esm; set ESM_API_KEY environment variable (token from https://forge.evolutionaryscale.ai). Use for: encoding protein sequences for ML, computing protein-protein similarity, featurizing sequences.

Parameters:

  • sequence (string) (required) Protein amino acid sequence in single-letter code (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’). Standard 20 amino acids; avoid gaps or non-standard characters.

  • model (string) (optional) ESMC model to use. ‘esmc-300m-2024-12’ (faster, 300M params) or ‘esmc-600m-2024-12’ (more accurate, 600M params).

  • return_per_residue (boolean) (optional) If true, also return per-residue embedding vectors (one vector per amino acid). Default false to reduce response size.

Example Usage:

query = {
    "name": "ESM_get_protein_embedding",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
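
The relation between per-residue and mean-pooled embeddings, and a typical similarity computation, can be sketched in plain Python. This assumes simple averaging over residues; whether the tool also pools over special tokens is not stated here. `mean_pool` and `cosine_similarity` are illustrative helpers and the vectors are toy data.

```python
import math


def mean_pool(per_residue):
    """Average a list of per-residue vectors into one protein-level vector."""
    n = len(per_residue)
    return [sum(column) / n for column in zip(*per_residue)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Two residues with 3-dim toy embeddings (real ones are far wider).
pooled = mean_pool([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
print(pooled, cosine_similarity(pooled, [2.0, 1.0, 1.0]))
```

Cosine similarity is a common choice for comparing protein embeddings because it ignores vector magnitude.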

ESM_get_region_sae_features (Type: ESMTool)

ESM_get_region_sae_features tool specification

Tool Information:

  • Name: ESM_get_region_sae_features

  • Type: ESMTool

  • Description: Aggregate ESMC-6B SAE features over a contiguous residue range to characterize the region’s biological identity. Returns features ranked by total |activation| across the region, with each feature’s per-residue hit pattern. Use for: interpreting a known domain (residues X-Y), profiling an antibody epitope, characterizing a binding pocket, comparing two regions of the same protein, or seeding ESM_describe_sae_feature calls on the top-K to get category labels. Max region length 500 residues. Prerequisites: pip install ‘esm @ git+https://github.com/evolutionaryscale/esm@ee891c52’; set ESM_API_KEY. Outputs governed by EvolutionaryScale Cambrian Inference License — non-commercial use only unless covered by a separate commercial agreement.

Parameters:

  • sequence (string) (required) Protein amino acid sequence in single-letter code. Up to ~2700 AA.

  • start_position (integer) (required) 1-indexed inclusive start of the region of interest.

  • end_position (integer) (required) 1-indexed inclusive end of the region of interest. Must be >= start_position and within sequence length. Region length cap: 500.

  • top_k_features (integer) (optional) Number of top features (by total |activation| over region) to return.

  • model (string) (optional) ESMC base model

  • sae_model (string) (optional) SAE codebook identifier

Example Usage:

query = {
    "name": "ESM_get_region_sae_features",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "start_position": 5,
        "end_position": 15
    }
}
result = tu.run(query)
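
The region constraints and the ranking by total |activation| can be sketched as follows. The input shape (one `{feature_id: activation}` dict per residue) is an assumption made for illustration; `check_region` and `rank_features` are hypothetical helpers, not the tool's implementation.

```python
def check_region(seq_len, start, end, max_len=500):
    """Validate a 1-indexed inclusive region and return its length."""
    if not 1 <= start <= end <= seq_len:
        raise ValueError("need 1 <= start_position <= end_position <= sequence length")
    length = end - start + 1
    if length > max_len:
        raise ValueError(f"region length {length} exceeds cap of {max_len}")
    return length


def rank_features(per_residue, top_k=3):
    """Rank features by summed absolute activation across the region."""
    totals = {}
    for activations in per_residue:
        for feature_id, value in activations.items():
            totals[feature_id] = totals.get(feature_id, 0.0) + abs(value)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


region_len = check_region(22, 5, 15)
toy = [{12: 0.5, 7: 1.0}, {12: 2.0}, {7: 0.25, 99: 0.1}]  # toy activations
ranked = rank_features(toy, top_k=2)
print(region_len, ranked)
```

Checking the bounds locally avoids spending a Forge call on a request the tool would reject.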

ESM_get_sae_features (Type: ESMTool)

ESM_get_sae_features tool specification

Tool Information:

  • Name: ESM_get_sae_features

  • Type: ESMTool

  • Description: Run a protein sequence through an ESMC Sparse Autoencoder (SAE) and return sparse feature activations per residue. SAEs decompose the protein language model’s hidden states into a 16,384-feature sparse codebook (top-k=64 active per residue). Each feature is an interpretable latent — many activate on specific biological categories (catalytic site, ligand-binding region, PTM sequon, structural motif, etc.) that can be looked up via ESM_describe_sae_feature. Use for: variant interpretation (compare ref vs mutant activations), mechanism inference (which functional features the protein engages), protein-LM interpretability research. Prerequisites: (1) pip install ‘esm @ git+https://github.com/evolutionaryscale/esm@ee891c52’ — the PyPI esm release does NOT yet include SAE support, which lives on an upstream feature branch. (2) ESM_API_KEY env var with a Forge token (https://forge.evolutionaryscale.ai). License note: SAE outputs are governed by the EvolutionaryScale Cambrian Inference Clickthrough License — non-commercial / academic use only.

Parameters:

  • sequence (string) (required) Protein amino acid sequence in single-letter code (e.g. ‘MEEPQSDPSVEPPLSQETFSDLWKLLPENN’). Standard 20 amino acids; avoid gaps or non-standard characters. Practical length cap ~2,700 AA.

  • model (string) (optional) ESMC backbone model. Currently SAE is only released for esmc-6b-2024-12 (default).

  • sae_model (string) (optional) SAE checkpoint name. Default is the layer-60 SAE matching esmc-6b backbone.

  • position (integer or null) (optional) Optional 1-indexed residue position. If set, only activations within +/- window residues of this position are returned (use for variant analysis to keep response compact). If null, all residues are returned (large response for long sequences).

  • window (integer) (optional) Residue window radius around position (only used when position is set). Default 8 = +/- 8 residues = 17-residue window centered on the variant.

  • top_k_per_residue (integer) (optional) Cap features returned per residue, sorted by absolute activation. Default 64 = full sparsity (no cap). Set lower (e.g. 16) to reduce response size when no position filter is used.

Example Usage:

query = {
    "name": "ESM_get_sae_features",
    "arguments": {
        "sequence": "MEEPQSDPSVEPPLSQETFSDLWKLLPENN"
    }
}
result = tu.run(query)
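
The `position` and `window` parameters select a range of residues. The sketch below computes that range, assuming the window is clipped at the sequence ends (the description does not say how edges are handled); `window_bounds` is a hypothetical helper.

```python
def window_bounds(position, window, seq_len):
    """Return the 1-indexed inclusive (start, end) covered by position +/- window."""
    if not 1 <= position <= seq_len:
        raise ValueError("position must be within the sequence")
    # Clipping at the termini is an assumption, not documented behavior.
    return max(1, position - window), min(seq_len, position + window)


start, end = window_bounds(175, 8, 393)
print(start, end, end - start + 1)

# Near the N-terminus the window is truncated rather than shifted.
print(window_bounds(3, 8, 393))
```

With the default radius of 8, an interior position yields the 17-residue window mentioned in the parameter description.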

ESM_score_sequence (Type: ESMTool)

ESM_score_sequence tool specification

Tool Information:

  • Name: ESM_score_sequence

  • Type: ESMTool

  • Description: Score a protein sequence using ESMC logits to compute per-residue log-probabilities and mean pseudo-log-likelihood. Higher (less negative) mean log-probability indicates the sequence is more natural/likely according to the model’s learned protein distribution. Use for: evaluating mutant fitness, comparing sequence variants, identifying low-confidence residues. Mean log-probability near 0 is highest confidence; values below -4 suggest unusual or low-quality residues. Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai).

Parameters:

  • sequence (string) (required) Protein amino acid sequence in single-letter code to score (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’).

  • model (string) (optional) ESMC model to use for scoring.

Example Usage:

query = {
    "name": "ESM_score_sequence",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
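
The scoring rules above amount to simple arithmetic on per-residue log-probabilities. The following sketch uses toy numbers and a hypothetical helper, `summarize_scores`, to show the mean and the -4 cutoff, plus a reference-versus-mutant comparison.

```python
def summarize_scores(log_probs, threshold=-4.0):
    """Return the mean log-probability and 1-indexed positions below the threshold."""
    mean_lp = sum(log_probs) / len(log_probs)
    flagged = [i + 1 for i, lp in enumerate(log_probs) if lp < threshold]
    return mean_lp, flagged


# Toy per-residue log-probabilities, not real model output.
ref_mean, ref_flagged = summarize_scores([-0.5, -1.0, -4.5, -0.25])
mut_mean, _ = summarize_scores([-0.5, -1.0, -4.5, -2.25])

# A negative difference means the mutant looks less natural to the model.
delta = mut_mean - ref_mean
print(ref_mean, ref_flagged, delta)
```

Comparing means is only meaningful for sequences of the same length, such as a wild type and a point mutant.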

ESM_score_variant_sae_batch (Type: ESMTool)

ESM_score_variant_sae_batch tool specification

Tool Information:

  • Name: ESM_score_variant_sae_batch

  • Type: ESMTool

  • Description: Score many missense variants against one reference protein using ESMC-6B SAE. More Forge-efficient than calling ESM_score_variant_sae_disruption N times: runs the reference SAE once and reuses it across every variant, giving N+1 Forge calls instead of 2N. Use for saturation mutagenesis at a single position (all 19 alternative residues), DMS-style sweeps, clinical variant panels, or any workflow scoring multiple variants on the same protein. Max 100 variants per call. Prerequisites: pip install ‘esm @ git+https://github.com/evolutionaryscale/esm@ee891c52’ (SAE support is on an unmerged feature branch — PyPI esm 3.2.x does NOT include SAEConfig). Set ESM_API_KEY environment variable. Outputs governed by EvolutionaryScale Cambrian Inference License — non-commercial use only unless covered by a separate commercial agreement.

Parameters:

  • sequence (string) (required) Reference (wild-type) protein amino acid sequence in single-letter code. Up to ~2700 AA.

  • variants (array) (required) List of variants to score. Each variant: {position: 1-indexed int, ref_aa: single letter matching sequence[position-1], alt_aa: single letter substitution}.

  • window (integer) (optional) Residue window centered on each variant for activation summation. Default 8 (i.e. positions [pos-8, pos+8]).

  • top_k_features (integer) (optional) Number of top features to return per variant (top-K lost + top-K gained).

  • model (string) (optional) ESMC base model (must be the 6B model — SAEs are trained against it)

  • sae_model (string) (optional) SAE codebook identifier

Example Usage:

query = {
    "name": "ESM_score_variant_sae_batch",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "variants": [
            {"position": 10, "ref_aa": "R", "alt_aa": "H"},
            {"position": 16, "ref_aa": "K", "alt_aa": "E"}
        ]
    }
}
result = tu.run(query)
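
For saturation mutagenesis at one position, the `variants` array can be generated rather than typed by hand. This sketch builds the 19 substitutions in the documented `{position, ref_aa, alt_aa}` shape and compares call counts; `saturation_variants` and `forge_calls` are hypothetical helpers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def saturation_variants(sequence, position):
    """All 19 single-residue substitutions at a 1-indexed position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position must be within the sequence")
    ref = sequence[position - 1]
    return [
        {"position": position, "ref_aa": ref, "alt_aa": aa}
        for aa in AMINO_ACIDS
        if aa != ref
    ]


def forge_calls(n_variants, batched):
    """N + 1 calls when the reference run is shared, 2N otherwise."""
    return n_variants + 1 if batched else 2 * n_variants


variants = saturation_variants("MKTAYIAKQRQISFVKSHFSRQ", 10)
print(len(variants), forge_calls(len(variants), True), forge_calls(len(variants), False))
```

Larger sweeps need to be split into chunks of at most 100 variants to respect the per-call limit.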

ESM_score_variant_sae_disruption (Type: ESMTool)

ESM_score_variant_sae_disruption tool specification

Tool Information:

  • Name: ESM_score_variant_sae_disruption

  • Type: ESMTool

  • Description: Composite SAE-based variant scoring. Given a protein sequence and a missense variant (position + ref_aa + alt_aa), runs ESMC-6B SAE on both reference and mutant sequences, computes per-feature activation deltas summed over a window centered on the mutation site, and returns ranked top-K features LOST and GAINED in the mutant. This is the convenience layer over ESM_get_sae_features — use this for one-shot variant interpretation; use ESM_get_sae_features directly only when you need raw per-residue features for non-variant analyses. Validates ref_aa matches the position in the supplied sequence (returns clear error if not, so you can detect transcript / isoform mismatches). Cost: 2 Forge API credits (1 ref + 1 mut). Latency: ~3-6s for typical-length human proteins. Prerequisites: same as ESM_get_sae_features (pip install ‘esm @ git+https://github.com/evolutionaryscale/esm@ee891c52’, ESM_API_KEY env var). License: SAE outputs governed by Cambrian Inference License — non-commercial / academic only.

Parameters:

  • sequence (string) (required) Reference protein sequence (canonical isoform). Single-letter codes, no gaps. The mutant sequence is built internally by substituting alt_aa at position.

  • position (integer) (required) 1-indexed mutation position. The amino acid at sequence[position-1] must equal ref_aa; otherwise the tool returns an error (the wrong sequence was supplied or the variant notation references a different isoform).

  • ref_aa (string) (required) Reference amino acid at the mutation position, single-letter code (e.g. ‘R’). Tool validates this matches the supplied sequence.

  • alt_aa (string) (required) Mutant amino acid, single-letter code (e.g. ‘H’ for R175H).

  • window (integer) (optional) Residue window radius around the mutation. Per-feature activations are summed across this window before computing the delta. Default 8 = +/- 8 residues = 17-residue window.

  • top_k_features (integer) (optional) Number of top LOST and top GAINED features to return. Default 10.

  • model (string) (optional) ESMC backbone (default esmc-6b-2024-12).

  • sae_model (string) (optional) SAE checkpoint (default layer-60 6B SAE).

Example Usage:

query = {
    "name": "ESM_score_variant_sae_disruption",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "position": 10,
        "ref_aa": "R",
        "alt_aa": "H"
    }
}
result = tu.run(query)
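
The steps described for this tool (validate `ref_aa`, build the mutant, rank per-feature deltas) can be sketched locally. This is an illustration under stated assumptions rather than the tool's implementation: `build_mutant` and `rank_deltas` are hypothetical, and the inputs to `rank_deltas` are assumed to be activations already summed over the window.

```python
def build_mutant(sequence, position, ref_aa, alt_aa):
    """Check ref_aa against the sequence, then substitute alt_aa (1-indexed)."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position must be within the sequence")
    found = sequence[position - 1]
    if found != ref_aa:
        raise ValueError(f"ref_aa {ref_aa!r} does not match {found!r} at position {position}")
    return sequence[:position - 1] + alt_aa + sequence[position:]


def rank_deltas(ref_sums, mut_sums, top_k=10):
    """Split features into LOST (delta < 0) and GAINED (delta > 0), largest change first."""
    features = set(ref_sums) | set(mut_sums)
    deltas = {f: mut_sums.get(f, 0.0) - ref_sums.get(f, 0.0) for f in features}
    lost = sorted((kv for kv in deltas.items() if kv[1] < 0), key=lambda kv: kv[1])
    gained = sorted((kv for kv in deltas.items() if kv[1] > 0), key=lambda kv: kv[1], reverse=True)
    return lost[:top_k], gained[:top_k]


mutant = build_mutant("MKTAYIAKQRQISFVKSHFSRQ", 10, "R", "H")
# Toy window-summed activations keyed by feature_id.
lost, gained = rank_deltas({1: 2.0, 2: 0.5}, {2: 1.5, 3: 0.25})
print(mutant, lost, gained)
```

A mismatch error from the validation step usually means the variant notation refers to a different transcript or isoform than the supplied sequence.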