Cellxgene Census Tools
Configuration File: cellxgene_census_tools.json
Tool Type: Local
Tools Count: 7
This page contains all tools defined in the cellxgene_census_tools.json configuration file.
Available Tools
CELLxGENE_download_h5ad (Type: CELLxGENECensusTool)
CELLxGENE_download_h5ad tool specification
Tool Information:
Name: CELLxGENE_download_h5ad
Type: CELLxGENECensusTool
Description: Download original H5AD (HDF5-based AnnData) files from CELLxGENE datasets or get their URIs. Access unprocessed data from individual studies with complete metadata and analysis results. Requires a dataset_id from CELLxGENE Discover. Note: Files can be large (GBs). Use for: accessing raw data, reproducibility, offline analysis, custom processing pipelines.
Parameters:
operation (string, required): Operation type
dataset_id (string, required): CELLxGENE dataset identifier
output_path (string, optional): Local path to save the H5AD file (omit to get the URI only)
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
# Placeholder values shown; assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_download_h5ad",
    "arguments": {
        "operation": "example_value",
        "dataset_id": "example_value"
    }
}
result = tu.run(query)
CELLxGENE_get_cell_metadata (Type: CELLxGENECensusTool)
CELLxGENE_get_cell_metadata tool specification
Tool Information:
Name: CELLxGENE_get_cell_metadata
Type: CELLxGENECensusTool
Description: Query cell metadata from CELLxGENE Census (50M+ human/mouse single cells). CRITICAL: obs_value_filter is REQUIRED; unfiltered queries will time out. Common filters: tissue_general (brain, heart, lung, liver, blood, kidney, skin, eye, intestine), cell_type (T cell, B cell, neuron, macrophage, fibroblast), disease (normal, COVID-19, Alzheimer disease, cancer). Start with: obs_value_filter='tissue_general == "lung" and disease == "normal"'. Returns cell type, tissue, disease, donor info. Use for: finding cells by criteria, cohort selection, exploring cell populations.
Parameters:
operation (string, required): Operation type
organism (string, optional): Organism name
obs_value_filter (string, required): Filter cells using SQL-like syntax. Unfiltered queries time out (50M+ cells). Format: 'field == "value"'. Operators: ==, !=, in, <, >, <=, >=. Combine with 'and'/'or'. Common fields and values: tissue_general (brain, heart, lung, liver, blood, kidney, skin, eye, intestine, pancreas, bone marrow), cell_type (T cell, B cell, neuron, macrophage, fibroblast, epithelial cell, endothelial cell), disease (normal, COVID-19, Alzheimer disease, lung adenocarcinoma), assay (10x 3' v3, Smart-seq2), sex (male, female). Examples: 'tissue_general == "lung"', 'disease == "COVID-19" and tissue_general == "blood"', 'cell_type in ["T cell", "B cell"] and disease == "normal"'.
column_names (array, optional): Specific columns to return (default: all columns)
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
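# "operation" is a placeholder; the filter is the starting point recommended above.
# Assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_get_cell_metadata",
    "arguments": {
        "operation": "example_value",
        "obs_value_filter": 'tissue_general == "lung" and disease == "normal"'
    }
}
result = tu.run(query)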
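The filter format described above ('field == "value"', lists with 'in', clauses joined with 'and') can be assembled programmatically. The following is a minimal sketch, not part of ToolUniverse: build_value_filter is a hypothetical helper, and it assumes values contain no embedded double quotes.

```python
def build_value_filter(criteria):
    """Join field/value criteria into one filter expression.

    Hypothetical helper for illustration; not part of ToolUniverse.
    A list or tuple value becomes an `in [...]` clause, anything else
    becomes an `==` clause. Clauses are combined with `and`.
    """
    clauses = []
    for field, value in criteria.items():
        if isinstance(value, (list, tuple)):
            quoted = ", ".join('"{}"'.format(v) for v in value)
            clauses.append("{} in [{}]".format(field, quoted))
        else:
            clauses.append('{} == "{}"'.format(field, value))
    if not clauses:
        # Mirrors the requirement that unfiltered queries are not allowed.
        raise ValueError("at least one criterion is required")
    return " and ".join(clauses)


lung_normal = build_value_filter({"tissue_general": "lung", "disease": "normal"})
lymphocytes = build_value_filter({"cell_type": ["T cell", "B cell"], "disease": "normal"})
print(lung_normal)
print(lymphocytes)
```

The resulting strings can be passed as obs_value_filter in the query arguments.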
CELLxGENE_get_census_versions (Type: CELLxGENECensusTool)
CELLxGENE_get_census_versions tool specification
Tool Information:
Name: CELLxGENE_get_census_versions
Type: CELLxGENECensusTool
Description: Get list of available CELLxGENE Census versions with release dates and descriptions. The Census contains single-cell RNA-seq data from 50M+ cells (human, mouse, non-human primates). Latest LTS release: 2025-11-08. Prerequisites: Requires the 'cellxgene-census' package (install: pip install tooluniverse[singlecell]). Use for: checking available data versions, selecting stable vs latest builds, understanding data updates.
Parameters:
operation (string, optional): Operation type
Example Usage:
# No arguments are required; assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_get_census_versions",
    "arguments": {}
}
result = tu.run(query)
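The other tools on this page accept a census_version of 'stable', 'latest', or a 'YYYY-MM-DD' date. A client-side check before sending a query might look like the sketch below. normalize_census_version is a hypothetical helper, not part of ToolUniverse. It only checks the format of the string and cannot tell whether a given date corresponds to a published release.

```python
import re
from datetime import date


def normalize_census_version(version="stable"):
    """Return the version string if it has an accepted form, else raise.

    Hypothetical helper for illustration. Accepted forms follow the
    parameter description: 'stable', 'latest', or a 'YYYY-MM-DD' date.
    """
    if version in ("stable", "latest"):
        return version
    if isinstance(version, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        try:
            date.fromisoformat(version)  # rejects impossible dates such as month 13
        except ValueError:
            pass
        else:
            return version
    raise ValueError(
        "census_version must be 'stable', 'latest', or 'YYYY-MM-DD', got {!r}".format(version)
    )


print(normalize_census_version())
print(normalize_census_version("2025-11-08"))
```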
CELLxGENE_get_embeddings (Type: CELLxGENECensusTool)
CELLxGENE_get_embeddings tool specification
Tool Information:
Name: CELLxGENE_get_embeddings
Type: CELLxGENECensusTool
Description: Access pre-calculated cell embeddings (scVI, Geneformer) from CELLxGENE Census. Returns available embedding names or specific embedding data. Embeddings enable rapid visualization and analysis without recomputation. Use for: UMAP/t-SNE visualization, cell similarity analysis, transfer learning.
Parameters:
operation (string, required): Operation type
organism (string, optional): Organism name
embedding_name (string, optional): Name of the specific embedding to retrieve (omit to list available embeddings)
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
# Placeholder values shown; assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_get_embeddings",
    "arguments": {
        "operation": "example_value"
    }
}
result = tu.run(query)
CELLxGENE_get_expression_data (Type: CELLxGENECensusTool)
CELLxGENE_get_expression_data tool specification
Tool Information:
Name: CELLxGENE_get_expression_data
Type: CELLxGENECensusTool
Description: Query gene expression data from CELLxGENE Census (50M+ cells, 60K+ genes). CRITICAL: At least one filter is REQUIRED; unfiltered queries time out. Start with: obs_value_filter='tissue_general == "lung" and disease == "normal"' AND var_value_filter='feature_name in ["TP53", "EGFR"]'. Returns an AnnData summary (dimensions, metadata). For full matrix data, use the Python API with sufficient memory.
Parameters:
operation (string, required): Operation type
organism (string, optional): Organism name
obs_value_filter (string, optional): Cell filter. At least one of obs_value_filter or var_value_filter is REQUIRED. Common values: tissue_general (brain, lung, heart, blood, liver), cell_type (T cell, B cell, neuron, macrophage), disease (normal, COVID-19, Alzheimer disease). Example: 'tissue_general == "lung" and disease == "normal"'
var_value_filter (string, optional): Gene filter by symbol or Ensembl ID. At least one of var_value_filter or obs_value_filter is REQUIRED. Examples: 'feature_name in ["TP53", "BRCA1", "EGFR"]', 'feature_name == "CD4"'. Common genes: TP53, EGFR, BRCA1, CD4, CD8A, IL6, TNF, GAPDH.
obs_column_names (array, optional): Cell metadata columns to include
var_column_names (array, optional): Gene metadata columns to include
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
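# Placeholder values shown; assumes tu is an initialized ToolUniverse instance.
# A real query also needs obs_value_filter and/or var_value_filter.
query = {
    "name": "CELLxGENE_get_expression_data",
    "arguments": {
        "operation": "example_value"
    }
}
result = tu.run(query)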
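Because this tool needs at least one of obs_value_filter or var_value_filter, it can help to enforce that rule before the query is sent. The sketch below is one way to do so. build_expression_query is a hypothetical helper, not part of ToolUniverse, and "example_value" remains a placeholder for the operation.

```python
def build_expression_query(operation, obs_value_filter=None, var_value_filter=None,
                           organism=None, census_version=None):
    """Build a query dict for CELLxGENE_get_expression_data.

    Hypothetical helper for illustration. Raises ValueError when both
    filters are missing, and omits optional arguments left as None.
    """
    if not obs_value_filter and not var_value_filter:
        raise ValueError("provide obs_value_filter and/or var_value_filter")
    candidate_arguments = {
        "operation": operation,
        "organism": organism,
        "obs_value_filter": obs_value_filter,
        "var_value_filter": var_value_filter,
        "census_version": census_version,
    }
    # Keep only the arguments that were actually supplied.
    arguments = {k: v for k, v in candidate_arguments.items() if v is not None}
    return {"name": "CELLxGENE_get_expression_data", "arguments": arguments}


query = build_expression_query(
    "example_value",  # placeholder operation value
    obs_value_filter='tissue_general == "lung" and disease == "normal"',
    var_value_filter='feature_name in ["TP53", "EGFR"]',
)
print(query)
```

The returned dict has the same shape as the query in the usage example above, so it can be passed to tu.run unchanged.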
CELLxGENE_get_gene_metadata (Type: CELLxGENECensusTool)
CELLxGENE_get_gene_metadata tool specification
Tool Information:
Name: CELLxGENE_get_gene_metadata
Type: CELLxGENECensusTool
Description: Query gene (variable) metadata from CELLxGENE Census. Returns gene symbols, Ensembl IDs, feature types, and summary statistics. Filter by gene name, feature type, or expression characteristics. Use for: finding genes of interest, validating gene symbols, exploring feature presence across datasets.
Parameters:
operation (string, required): Operation type
organism (string, optional): Organism name
var_value_filter (string, optional): Filter genes using SQL-like syntax. Format: 'field == "value"'. Common fields: feature_name (gene symbol), feature_id (Ensembl ID), feature_biotype. Examples: 'feature_name == "TP53"', 'feature_name in ["CD4", "CD8A"]', 'feature_biotype == "protein_coding"'.
column_names (array, optional): Specific columns to return (default: all columns)
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
# Placeholder values shown; assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_get_gene_metadata",
    "arguments": {
        "operation": "example_value"
    }
}
result = tu.run(query)
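Since genes can be matched by symbol (feature_name) or by Ensembl ID (feature_id), a mixed list of identifiers has to be routed to the right field. The sketch below shows one possible approach. build_gene_filter is a hypothetical helper, not part of ToolUniverse. It assumes Ensembl gene IDs follow the usual pattern of "ENS", an optional species code, "G", and 11 digits, and that 'or' can join var_value_filter clauses the same way it joins cell filter clauses.

```python
import re

# Assumed shape of an Ensembl gene ID, e.g. ENSG00000141510 or ENSMUSG...
ENSEMBL_GENE_ID = re.compile(r"ENS[A-Z]*G\d{11}")


def _in_clause(field, values):
    quoted = ", ".join('"{}"'.format(v) for v in values)
    return "{} in [{}]".format(field, quoted)


def build_gene_filter(genes):
    """Build a var_value_filter string from symbols and/or Ensembl IDs.

    Hypothetical helper for illustration; not part of ToolUniverse.
    """
    ids = [g for g in genes if ENSEMBL_GENE_ID.fullmatch(g)]
    symbols = [g for g in genes if not ENSEMBL_GENE_ID.fullmatch(g)]
    clauses = []
    if symbols:
        clauses.append(_in_clause("feature_name", symbols))
    if ids:
        clauses.append(_in_clause("feature_id", ids))
    if not clauses:
        raise ValueError("at least one gene identifier is required")
    return " or ".join(clauses)


print(build_gene_filter(["CD4", "CD8A"]))
print(build_gene_filter(["EGFR", "ENSG00000141510"]))
```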
CELLxGENE_get_presence_matrix (Type: CELLxGENECensusTool)
CELLxGENE_get_presence_matrix tool specification
Tool Information:
Name: CELLxGENE_get_presence_matrix
Type: CELLxGENECensusTool
Description: Get feature presence matrix showing which genes are measured in which datasets. Returns sparse matrix dimensions and density. Useful for understanding data completeness and selecting datasets with genes of interest. Use for: checking gene coverage, dataset selection, quality assessment.
Parameters:
operation (string, required): Operation type
organism (string, optional): Organism name
census_version (string, optional): Census version to query: 'stable' (recommended, Long-Term Support release), 'latest' (newest data, may change), or a specific date 'YYYY-MM-DD' (for reproducibility). The default, 'stable', is best for production analyses.
Example Usage:
# Placeholder values shown; assumes tu is an initialized ToolUniverse instance.
query = {
    "name": "CELLxGENE_get_presence_matrix",
    "arguments": {
        "operation": "example_value"
    }
}
result = tu.run(query)
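To make the "density" figure concrete: for a datasets-by-genes presence matrix, density is the number of (dataset, gene) pairs marked present divided by the total number of cells in the matrix. The toy example below uses made-up dataset names and is only meant to illustrate the arithmetic. It does not reproduce the tool's output format.

```python
def presence_density(present_pairs, n_datasets, n_genes):
    """Fraction of (dataset, gene) cells that are marked present."""
    total = n_datasets * n_genes
    if total == 0:
        return 0.0
    return len(set(present_pairs)) / total


# Made-up toy data: 3 datasets x 4 genes (TP53, EGFR, CD4, IL6).
toy_presence = {
    ("dataset_a", "TP53"), ("dataset_a", "EGFR"), ("dataset_a", "CD4"),
    ("dataset_b", "TP53"), ("dataset_b", "CD4"),
    ("dataset_c", "TP53"),
}
density = presence_density(toy_presence, n_datasets=3, n_genes=4)
print(density)  # 6 present pairs out of 12 cells -> 0.5
```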