ESM Tools
Configuration File: esm_tools.json
Tool Type: Local
Tools Count: 10
This page contains all tools defined in the esm_tools.json configuration file.
Available Tools
ESM_describe_sae_feature (Type: ESMTool)
ESM_describe_sae_feature tool specification
Tool Information:
Name: ESM_describe_sae_feature
Type: ESMTool
Description: Label a single SAE feature_id with its dominant biological category by aggregating UniProt feature-type overlap at SAE-activating residues across a curated 10-protein panel. Returns one of: catalytic, ligand-binding, ptm, domain, motif, structural-stability, secondary-structure, transmembrane, signal-peptide, propeptide, or ‘uncategorized’ (no informative UniProt overlap found). Use when ESM_get_sae_features or ESM_score_variant_sae_disruption returned a feature_id that needs biological interpretation.
Cost: 10 Forge credits the first time (1 per panel protein). Subsequent calls for the same feature_id are free (filesystem cache at ~/.cache/tooluniverse/sae_labels/{sae_model}/feature_{id}.json).
Latency: ~30-60s on the first call (panel of 10 SAE inferences), <1s on a cache hit.
Prerequisites: same as ESM_get_sae_features.
License: Cambrian Inference License (non-commercial).
Parameters:
feature_id (integer) (required) SAE feature index in [0, 16383]. Use the values returned by ESM_get_sae_features.active_features[].feature_id.
sae_model (string) (optional) SAE checkpoint to label. The cache is keyed on this value, because different SAEs produce different labels for the same id.
model (string) (optional) ESMC backbone.
n_proteins (integer) (optional) Number of panel proteins to run the SAE on. Default 10 (the full curated panel). Lower it for cheaper exploration at the cost of confidence.
top_residues_per_protein (integer) (optional) For each protein, take the top-K residues where the target feature activates most strongly, then check their UniProt annotations. Default 3.
use_cache (boolean) (optional) If true (default) and a cached label exists at ~/.cache/tooluniverse/sae_labels/…/feature_{id}.json, return it instead of rerunning the panel. Set to false to force a recompute.
Example Usage:
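from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # load the available tool definitions

query = {
    "name": "ESM_describe_sae_feature",
    "arguments": {
        "feature_id": 10
    }
}
result = tu.run(query)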
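The cache location given in the description can be sketched as a small path helper. This is only an illustration of the documented layout: the function names, the `cache_root` override, and the `"my-sae"` checkpoint name in the test are hypothetical, and the tool's own implementation may differ.

```python
from pathlib import Path
from typing import Optional


def sae_label_cache_path(sae_model: str, feature_id: int,
                         cache_root: Optional[Path] = None) -> Path:
    """Build the documented cache path for one feature label (hypothetical helper)."""
    if not 0 <= feature_id <= 16383:
        raise ValueError("feature_id must be in [0, 16383]")
    # Default root follows the layout quoted in the tool description.
    root = cache_root or Path.home() / ".cache" / "tooluniverse" / "sae_labels"
    return root / sae_model / f"feature_{feature_id}.json"


def is_cached(sae_model: str, feature_id: int,
              cache_root: Optional[Path] = None) -> bool:
    """Return True if a label file already exists, i.e. a call should be a cache hit."""
    return sae_label_cache_path(sae_model, feature_id, cache_root).is_file()
```

Checking `is_cached(...)` before a call gives a rough idea of whether the call will spend Forge credits or return from the filesystem cache.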
ESM_explain_variant_mechanism (Type: ESMTool)
ESM_explain_variant_mechanism tool specification
Tool Information:
Name: ESM_explain_variant_mechanism
Type: ESMTool
Description: One-call composite for variant mechanism: runs ESMC-6B SAE variant disruption, calls ESM_describe_sae_feature on each top affected feature, and composes a category-level mechanism summary (e.g. ‘Disrupted feature categories (lost): catalytic=2, ligand-binding=1’). Use when a variant-interpretation skill wants the mechanism in a single tool call rather than orchestrating disruption plus N description calls. Set include_descriptions=false to skip the category labeling step (2 Forge calls only). Otherwise: 2 disruption calls plus up to (2 * top_k_features) describe calls (most are cached after first use).
Prerequisites: pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'; set ESM_API_KEY.
License: Outputs are governed by the EvolutionaryScale Cambrian Inference License (non-commercial use only unless covered by a separate commercial agreement).
Parameters:
sequence (string) (required) Reference (wild-type) protein amino acid sequence in single-letter code. Up to ~2700 AA.
position (integer) (required) 1-indexed residue position of the variant.
ref_aa (string) (required) Single-letter wild-type amino acid (must match sequence[position-1]).
alt_aa (string) (required) Single-letter substituted amino acid.
window (integer) (optional) Residue window centered on the mutation for activation summation.
top_k_features (integer) (optional) Number of top lost / top gained features to describe and include in the category summary.
include_descriptions (boolean) (optional) If true (default), call ESM_describe_sae_feature on each top feature to get category labels. If false, return raw feature IDs without categories (cheaper).
model (string) (optional) ESMC base model.
sae_model (string) (optional) SAE codebook identifier.
Example Usage:
query = {
    "name": "ESM_explain_variant_mechanism",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "position": 10,   # residue 10 of this sequence is R
        "ref_aa": "R",    # must match sequence[position-1]
        "alt_aa": "H"
    }
}
result = tu.run(query)
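The call budget stated in the description (2 disruption calls plus up to 2 * top_k_features describe calls) can be turned into a quick estimate before running the tool. The sketch below is a back-of-the-envelope helper, not part of ToolUniverse; the function names are hypothetical, and the credit figure assumes every describe call is uncached and uses the full 10-protein panel.

```python
def max_tool_calls(top_k_features: int, include_descriptions: bool = True) -> int:
    """Upper bound on calls made by one ESM_explain_variant_mechanism run."""
    if top_k_features < 0:
        raise ValueError("top_k_features must be non-negative")
    disruption_calls = 2  # one reference run + one mutant run
    # At most one describe call per top-K lost and per top-K gained feature.
    describe_calls = 2 * top_k_features if include_descriptions else 0
    return disruption_calls + describe_calls


def worst_case_credits(top_k_features: int, panel_size: int = 10) -> int:
    """Worst-case Forge credits, ASSUMING no describe call hits the cache
    and each uncached describe call costs one credit per panel protein."""
    if top_k_features < 0 or panel_size < 0:
        raise ValueError("arguments must be non-negative")
    return 2 + 2 * top_k_features * panel_size
```

In practice the cost should fall well below this bound once feature labels are cached, since cached describe calls are documented as free.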
ESM_fold_protein (Type: ESMTool)
ESM_fold_protein tool specification
Tool Information:
Name: ESM_fold_protein
Type: ESMTool
Description: Predict protein 3D structure from sequence using ESM3, returning pTM (predicted TM-score), per-residue pLDDT confidence scores, and backbone atom coordinates. ESM3 performs structure generation via iterative structure track decoding. pTM > 0.7 indicates a confident prediction. Coordinates are (L, 37, 3) backbone atom positions in Angstroms (standard 37-atom representation). Use for: rapid structure prediction, confidence-based filtering, structure-guided design.
Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai).
Parameters:
sequence (string) (required) Protein amino acid sequence in single-letter code to fold (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’).
model (string) (optional) ESM3 model to use for structure prediction.
num_steps (integer) (optional) Number of iterative structure decoding steps (default: 8). More steps may improve accuracy.
Example Usage:
query = {
    "name": "ESM_fold_protein",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
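The confidence-based filtering use case mentioned above could look roughly like the following. The sketch takes the pTM value and the per-residue pLDDT list as plain arguments, because the exact field names in the tool's result are not given here; the helper name and the returned keys are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Union


def summarize_fold_confidence(ptm: float, plddt: Sequence[float],
                              ptm_threshold: float = 0.7
                              ) -> Dict[str, Union[bool, float, int]]:
    """Summarize a fold using the documented pTM > 0.7 rule of thumb."""
    if not plddt:
        raise ValueError("plddt must contain at least one residue")
    # Index of the least confident residue (0-based), reported 1-indexed below.
    lowest_index = min(range(len(plddt)), key=lambda i: plddt[i])
    return {
        "confident": ptm > ptm_threshold,
        "mean_plddt": mean(plddt),
        "lowest_plddt_position": lowest_index + 1,
    }
```

No threshold is applied to pLDDT here, since the scale of the returned scores is not stated in the description.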
ESM_generate_protein_sequence (Type: ESMTool)
ESM_generate_protein_sequence tool specification
Tool Information:
Name: ESM_generate_protein_sequence
Type: ESMTool
Description: Generate or complete a protein sequence using ESM3, EvolutionaryScale’s generative protein language model. Provide a prompt_sequence with ‘_’ characters marking positions to generate (masked positions), and ESM3 will fill them in via iterative masked language modeling. Can generate entire sequences from scratch (all ‘_’) or complete partial sequences. Use for: protein engineering, de novo protein design, completing truncated sequences, exploring sequence space around a template. Example: ‘MKTAY_____QRQIS’ generates 5 residues in the masked region.
Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai).
Parameters:
prompt_sequence (string) (required) Protein sequence template with ‘_’ at positions to generate. Use standard amino acid letters for fixed positions and ‘_’ for masked/generated positions. Example: ‘MKTAY_____QRQISFVK’ generates 5 residues in the middle.
model (string) (optional) ESM3 model to use for generation.
num_steps (integer) (optional) Number of iterative decoding steps (default: 8). More steps are slower but may give better quality.
temperature (number) (optional) Sampling temperature (default: 1.0). Lower values (0.1-0.5) produce more conservative sequences; higher values (1.0-2.0) produce more diverse ones.
Example Usage:
query = {
    "name": "ESM_generate_protein_sequence",
    "arguments": {
        "prompt_sequence": "MKTAY_____QRQIS"
    }
}
result = tu.run(query)
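A prompt_sequence with masked positions can be built from an existing sequence with a few lines of string handling. The helper below is a hypothetical convenience function, not part of the tool; it only assumes the ‘_’ mask convention described above and uses 1-indexed inclusive positions to match the other ESM tools on this page.

```python
def mask_region(sequence: str, start: int, end: int, mask_char: str = "_") -> str:
    """Replace residues start..end (1-indexed, inclusive) with mask characters."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region must satisfy 1 <= start <= end <= len(sequence)")
    masked_length = end - start + 1
    return sequence[:start - 1] + mask_char * masked_length + sequence[end:]


# Mask residues 6-10 of a 16-residue template, keeping the flanks fixed.
prompt = mask_region("MKTAYIAKQRQISFVK", 6, 10)
```

Masking the whole sequence (start=1, end=len(sequence)) yields an all-‘_’ prompt, which corresponds to generation from scratch.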
ESM_get_protein_embedding (Type: ESMTool)
ESM_get_protein_embedding tool specification
Tool Information:
Name: ESM_get_protein_embedding
Type: ESMTool
Description: Get protein sequence embeddings from EvolutionaryScale ESMC (ESM Cambrian) models via the Forge API. Returns a mean-pooled embedding vector (320-dim for esmc-300m, 1152-dim for esmc-600m) representing the entire protein, and optionally per-residue embeddings. ESMC is a fast, high-quality protein language model suitable for downstream tasks: similarity search, clustering, property prediction. Use for: encoding protein sequences for ML, computing protein-protein similarity, featurizing sequences.
Prerequisites: pip install esm; set the ESM_API_KEY environment variable (token from https://forge.evolutionaryscale.ai).
Parameters:
sequence (string) (required) Protein amino acid sequence in single-letter code (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’). Standard 20 amino acids; avoid gaps or non-standard characters.
model (string) (optional) ESMC model to use. ‘esmc-300m-2024-12’ (faster, 300M params) or ‘esmc-600m-2024-12’ (more accurate, 600M params).
return_per_residue (boolean) (optional) If true, also return per-residue embedding vectors (one vector per amino acid). Default false to reduce response size.
Example Usage:
query = {
    "name": "ESM_get_protein_embedding",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
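For the protein-protein similarity use case, one common choice is cosine similarity between two mean-pooled embedding vectors. The sketch below is a plain-Python version that takes the vectors as lists of floats; how the vector is extracted from the tool's result is left out because the result field names are not documented here, and cosine similarity itself is a suggestion rather than something the tool prescribes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Embeddings from different models have different dimensions, so only vectors produced with the same `model` value can be compared this way.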
ESM_get_region_sae_features (Type: ESMTool)
ESM_get_region_sae_features tool specification
Tool Information:
Name: ESM_get_region_sae_features
Type: ESMTool
Description: Aggregate ESMC-6B SAE features over a contiguous residue range to characterize the region’s biological identity. Returns features ranked by total |activation| across the region, with each feature’s per-residue hit pattern. Use for: interpreting a known domain (residues X-Y), profiling an antibody epitope, characterizing a binding pocket, comparing two regions of the same protein, or seeding ESM_describe_sae_feature calls on the top-K to get category labels. Max region length 500 residues.
Prerequisites: pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'; set ESM_API_KEY.
License: Outputs are governed by the EvolutionaryScale Cambrian Inference License (non-commercial use only unless covered by a separate commercial agreement).
Parameters:
sequence (string) (required) Protein amino acid sequence in single-letter code. Up to ~2700 AA.
start_position (integer) (required) 1-indexed inclusive start of the region of interest.
end_position (integer) (required) 1-indexed inclusive end of the region of interest. Must be >= start_position and within sequence length. Region length cap: 500.
top_k_features (integer) (optional) Number of top features (by total |activation| over the region) to return.
model (string) (optional) ESMC base model.
sae_model (string) (optional) SAE codebook identifier.
Example Usage:
query = {
    "name": "ESM_get_region_sae_features",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "start_position": 10,
        "end_position": 20
    }
}
result = tu.run(query)
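The region constraints listed in the parameters (1-indexed inclusive bounds, end >= start, end within the sequence, at most 500 residues) can be checked client-side before spending a Forge call. This is a minimal sketch with a hypothetical helper name; the tool performs its own validation, and its error messages will differ.

```python
MAX_REGION_LENGTH = 500  # region length cap stated in the parameter description


def validate_region(sequence: str, start_position: int, end_position: int) -> int:
    """Return the region length, or raise ValueError if a documented constraint fails."""
    if start_position < 1:
        raise ValueError("start_position is 1-indexed and must be >= 1")
    if end_position < start_position:
        raise ValueError("end_position must be >= start_position")
    if end_position > len(sequence):
        raise ValueError("end_position exceeds the sequence length")
    length = end_position - start_position + 1  # both bounds are inclusive
    if length > MAX_REGION_LENGTH:
        raise ValueError(f"region length {length} exceeds {MAX_REGION_LENGTH}")
    return length
```

Because both bounds are inclusive, start_position == end_position is a valid one-residue region.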
ESM_get_sae_features (Type: ESMTool)
ESM_get_sae_features tool specification
Tool Information:
Name: ESM_get_sae_features
Type: ESMTool
Description: Run a protein sequence through an ESMC Sparse Autoencoder (SAE) and return sparse feature activations per residue. SAEs decompose the protein language model’s hidden states into a 16,384-feature sparse codebook (top-k=64 active per residue). Each feature is an interpretable latent; many activate on specific biological categories (catalytic site, ligand-binding region, PTM sequon, structural motif, etc.) that can be looked up via ESM_describe_sae_feature. Use for: variant interpretation (compare ref vs mutant activations), mechanism inference (which functional features the protein engages), protein-LM interpretability research.
Prerequisites: (1) pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'. The PyPI esm release does NOT yet include SAE support, which lives on an upstream feature branch. (2) ESM_API_KEY env var with a Forge token (https://forge.evolutionaryscale.ai).
License note: SAE outputs are governed by the EvolutionaryScale Cambrian Inference Clickthrough License (non-commercial / academic use only).
Parameters:
sequence (string) (required) Protein amino acid sequence in single-letter code (e.g. ‘MEEPQSDPSVEPPLSQETFSDLWKLLPENN’). Standard 20 amino acids; avoid gaps or non-standard characters. Practical length cap ~2,700 AA.
model (string) (optional) ESMC backbone model. Currently the SAE is only released for esmc-6b-2024-12 (default).
sae_model (string) (optional) SAE checkpoint name. Default is the layer-60 SAE matching the esmc-6b backbone.
position (integer or null) (optional) 1-indexed residue position. If set, only activations within +/- window residues of this position are returned (use for variant analysis to keep the response compact). If null, all residues are returned (large response for long sequences).
window (integer) (optional) Residue window radius around position (only used when position is set). Default 8 = +/- 8 residues = 17-residue window centered on the variant.
top_k_per_residue (integer) (optional) Cap on features returned per residue, sorted by absolute activation. Default 64 = full sparsity (no cap). Set lower (e.g. 16) to reduce response size when no position filter is used.
Example Usage:
query = {
    "name": "ESM_get_sae_features",
    "arguments": {
        "sequence": "MEEPQSDPSVEPPLSQETFSDLWKLLPENN"
    }
}
result = tu.run(query)
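The relationship between position, window, and the residues returned can be illustrated with a small bounds calculation. The helper name is hypothetical, and clipping the window at the ends of the sequence is an assumption made for this sketch; the description only states the +/- window behaviour for interior positions.

```python
from typing import Tuple


def window_bounds(position: int, window: int, sequence_length: int) -> Tuple[int, int]:
    """Return the 1-indexed inclusive (start, end) of the window around position.

    ASSUMPTION: the window is clipped at the sequence ends.
    """
    if not 1 <= position <= sequence_length:
        raise ValueError("position must be within the sequence (1-indexed)")
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(1, position - window)
    end = min(sequence_length, position + window)
    return start, end


# Default radius of 8 around an interior position gives a 17-residue window.
start, end = window_bounds(175, 8, 393)
```

For a position near the N-terminus, for example position 3 with the default radius, the clipped window is shorter than 17 residues.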
ESM_score_sequence (Type: ESMTool)
ESM_score_sequence tool specification
Tool Information:
Name: ESM_score_sequence
Type: ESMTool
Description: Score a protein sequence using ESMC logits to compute per-residue log-probabilities and mean pseudo-log-likelihood. Higher (less negative) mean log-probability indicates the sequence is more natural/likely according to the model’s learned protein distribution. Use for: evaluating mutant fitness, comparing sequence variants, identifying low-confidence residues. Mean log-probability near 0 is highest confidence; values below -4 suggest unusual or low-quality residues.
Prerequisites: pip install esm; set ESM_API_KEY (from https://forge.evolutionaryscale.ai).
Parameters:
sequence (string) (required) Protein amino acid sequence in single-letter code to score (e.g. ‘MKTAYIAKQRQISFVKSHFSRQ’).
model (string) (optional) ESMC model to use for scoring.
Example Usage:
query = {
    "name": "ESM_score_sequence",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
    }
}
result = tu.run(query)
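How the mean pseudo-log-likelihood and the -4 rule of thumb relate to the per-residue values can be shown with a short post-processing sketch. It operates on a plain list of per-residue log-probabilities; the helper name is hypothetical, and pulling that list out of the tool's result is omitted because the result field names are not documented here.

```python
from typing import List, Sequence, Tuple


def summarize_log_probs(log_probs: Sequence[float],
                        low_threshold: float = -4.0) -> Tuple[float, List[int]]:
    """Return (mean log-probability, 1-indexed positions below the threshold)."""
    if not log_probs:
        raise ValueError("log_probs must contain at least one residue")
    mean_log_prob = sum(log_probs) / len(log_probs)
    # Residues scoring below -4 are flagged as unusual, per the description above.
    low_positions = [i + 1 for i, lp in enumerate(log_probs) if lp < low_threshold]
    return mean_log_prob, low_positions


mean_lp, flagged = summarize_log_probs([-0.5, -1.5, -5.0, -1.0])
```

Comparing `mean_lp` between a wild-type and a mutant sequence is one simple way to use the score for variant comparison, keeping in mind that the mean is taken over the whole sequence and a single substitution shifts it only slightly.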
ESM_score_variant_sae_batch (Type: ESMTool)
ESM_score_variant_sae_batch tool specification
Tool Information:
Name: ESM_score_variant_sae_batch
Type: ESMTool
Description: Score many missense variants against one reference protein using ESMC-6B SAE. More Forge-efficient than calling ESM_score_variant_sae_disruption N times: runs the reference SAE once and reuses it across every variant, giving N+1 Forge calls instead of 2N. Use for saturation mutagenesis at a single position (all 19 alternative residues), DMS-style sweeps, clinical variant panels, or any workflow scoring multiple variants on the same protein. Max 100 variants per call.
Prerequisites: pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52' (SAE support is on an unmerged feature branch; PyPI esm 3.2.x does NOT include SAEConfig). Set the ESM_API_KEY environment variable.
License: Outputs are governed by the EvolutionaryScale Cambrian Inference License (non-commercial use only unless covered by a separate commercial agreement).
Parameters:
sequence (string) (required) Reference (wild-type) protein amino acid sequence in single-letter code. Up to ~2700 AA.
variants (array) (required) List of variants to score. Each variant: {position: 1-indexed int, ref_aa: single letter matching sequence[position-1], alt_aa: single letter substitution}.
window (integer) (optional) Residue window centered on each variant for activation summation. Default 8 (i.e. positions [pos-8, pos+8]).
top_k_features (integer) (optional) Number of top features to return per variant (top-K lost + top-K gained).
model (string) (optional) ESMC base model (must be the 6B model, since the SAEs are trained against it).
sae_model (string) (optional) SAE codebook identifier.
Example Usage:
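query = {
    "name": "ESM_score_variant_sae_batch",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        # Each variant is an object; ref_aa must match sequence[position-1].
        "variants": [
            {"position": 10, "ref_aa": "R", "alt_aa": "H"},
            {"position": 10, "ref_aa": "R", "alt_aa": "K"}
        ]
    }
}
result = tu.run(query)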
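For the saturation-mutagenesis use case, the variants array can be generated rather than written by hand. The sketch below builds the 19 alternative substitutions at one position in the documented {position, ref_aa, alt_aa} shape and compares the N+1 versus 2N call counts; the helper names are hypothetical.

```python
from typing import Dict, List, Union

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

Variant = Dict[str, Union[int, str]]


def saturation_variants(sequence: str, position: int) -> List[Variant]:
    """All 19 single-residue substitutions at a 1-indexed position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position must be within the sequence (1-indexed)")
    ref_aa = sequence[position - 1]
    if ref_aa not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue {ref_aa!r} at position {position}")
    return [
        {"position": position, "ref_aa": ref_aa, "alt_aa": aa}
        for aa in AMINO_ACIDS
        if aa != ref_aa
    ]


def forge_calls(n_variants: int, batched: bool) -> int:
    """N+1 calls with the batch tool versus 2N with one disruption call per variant."""
    return n_variants + 1 if batched else 2 * n_variants


variants = saturation_variants("MKTAYIAKQRQISFVKSHFSRQ", 10)
```

The resulting list can be passed as the `variants` argument; sweeps over several positions need to be split so that each call stays within the 100-variant limit.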
ESM_score_variant_sae_disruption (Type: ESMTool)
ESM_score_variant_sae_disruption tool specification
Tool Information:
Name: ESM_score_variant_sae_disruption
Type: ESMTool
Description: Composite SAE-based variant scoring. Given a protein sequence and a missense variant (position + ref_aa + alt_aa), runs ESMC-6B SAE on both reference and mutant sequences, computes per-feature activation deltas summed over a window centered on the mutation site, and returns ranked top-K features LOST and GAINED in the mutant. This is the convenience layer over ESM_get_sae_features: use this tool for one-shot variant interpretation, and use ESM_get_sae_features directly only when you need raw per-residue features for non-variant analyses. Validates that ref_aa matches the position in the supplied sequence (returns a clear error if not, so you can detect transcript / isoform mismatches).
Cost: 2 Forge API credits (1 ref + 1 mut).
Latency: ~3-6s for typical-length human proteins.
Prerequisites: same as ESM_get_sae_features (pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52', ESM_API_KEY env var).
License: SAE outputs are governed by the Cambrian Inference License (non-commercial / academic only).
Parameters:
sequence (string) (required) Reference protein sequence (canonical isoform). Single-letter codes, no gaps. The mutant sequence is built internally by substituting alt_aa at position.
position (integer) (required) 1-indexed mutation position. The amino acid at sequence[position-1] must equal ref_aa; otherwise the tool returns an error (the wrong sequence was supplied or the variant notation references a different isoform).
ref_aa (string) (required) Reference amino acid at the mutation position, single-letter code (e.g. ‘R’). The tool validates that this matches the supplied sequence.
alt_aa (string) (required) Mutant amino acid, single-letter code (e.g. ‘H’ for R175H).
window (integer) (optional) Residue window radius around the mutation. Per-feature activations are summed across this window before computing the delta. Default 8 = +/- 8 residues = 17-residue window.
top_k_features (integer) (optional) Number of top LOST and top GAINED features to return. Default 10.
model (string) (optional) ESMC backbone (default esmc-6b-2024-12).
sae_model (string) (optional) SAE checkpoint (default layer-60 6B SAE).
Example Usage:
query = {
    "name": "ESM_score_variant_sae_disruption",
    "arguments": {
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ",
        "position": 10,   # residue 10 of this sequence is R
        "ref_aa": "R",    # must match sequence[position-1]
        "alt_aa": "H"
    }
}
result = tu.run(query)
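The ref_aa check and the internal construction of the mutant sequence described in the parameters amount to a few lines of string handling. The sketch below mirrors that logic so that isoform mismatches can be caught before any Forge credits are spent; the function name and error messages are hypothetical and are not the tool's own.

```python
def build_mutant(sequence: str, position: int, ref_aa: str, alt_aa: str) -> str:
    """Validate ref_aa at a 1-indexed position and return the substituted sequence."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position must be within the sequence (1-indexed)")
    if len(ref_aa) != 1 or len(alt_aa) != 1:
        raise ValueError("ref_aa and alt_aa must be single-letter codes")
    observed = sequence[position - 1]
    if observed != ref_aa:
        # Typically means a different transcript or isoform was supplied.
        raise ValueError(
            f"ref_aa {ref_aa!r} does not match {observed!r} at position {position}"
        )
    return sequence[:position - 1] + alt_aa + sequence[position:]


mutant = build_mutant("MKTAYIAKQRQISFVKSHFSRQ", 10, "R", "H")
```

A mismatch raised here is a hint to re-check which isoform the variant notation refers to, which is the same situation the tool's own validation error is meant to surface.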