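Available Tools Reference¶
A complete reference to the scientific tools in ToolUniverse and what each one does.
ToolUniverse provides 1000+ tools across eight main categories, each serving a specific computational or analytical need in scientific research.
Tool Ecosystem Overview¶
ToolUniverse integrates tools from eight categories: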
ToolUniverse Ecosystem (1000+ Tools):
┌─────────────────┐
│ ML Models │ 15 tools → Prediction, Classification, Generation
│ (AI/ML) │
└─────────────────┘
┌─────────────────┐
│ AI Agents │ 33 tools → Autonomous Planning, Tool Routing
│ (Agentic) │
└─────────────────┘
┌─────────────────┐
│ Software │ 164 tools → Bioinformatics, Analysis Packages
│ Packages │
└─────────────────┘
┌─────────────────┐
│ Human Expert │ 6 tools → Consultation, Validation, Feedback
│ Feedback │
└─────────────────┘
┌─────────────────┐
│ Robotics │ 1 tool → ROS Communication, Lab Automation
│ (Automation) │
└─────────────────┘
┌─────────────────┐
│ Databases │ 84 tools → Structured Data, Knowledge Bases
│ (Storage) │
└─────────────────┘
┌─────────────────┐
│ Embedding │ 4 tools → Vector Search, Semantic Retrieval
│ Stores │
└─────────────────┘
┌─────────────────┐
│ APIs │ 281 tools → External Services, Data Access
│ (Integration) │
└─────────────────┘
Tool Category Summary¶
UniProt - Protein Information¶
Access comprehensive protein and gene information.
Key features:
* UniProt_get_function_by_accession - Get functional annotations by UniProt accession
* UniProt_search_proteins - Search proteins by keyword
* UniProt_get_protein_sequence - Get a protein sequence
Example:
query = {
    "name": "UniProt_get_function_by_accession",
    "arguments": {"accession": "P38398"}  # BRCA1 accession
}
result = tu.run(query)
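Every example on this page passes `tu.run` a dictionary with a `name` key and an `arguments` key. As a minimal sketch, a small helper can build that dictionary. `build_query` is a hypothetical convenience function written for this page, not part of ToolUniverse.

```python
def build_query(name, **arguments):
    """Build a query dict in the {"name": ..., "arguments": {...}} shape used on this page."""
    if not name:
        raise ValueError("tool name must be a non-empty string")
    return {"name": name, "arguments": dict(arguments)}


# Same query as the UniProt example, built with the helper.
query = build_query("UniProt_get_function_by_accession", accession="P38398")
print(query)
```

The resulting dictionary can be passed to `tu.run` exactly like a hand-written one.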
Gene Ontology - Functional Annotation¶
Gene Ontology annotations and functional analysis.
Key features:
* GeneOntology_get_annotations - Get GO annotations for genes
* GeneOntology_search_terms - Search GO terms
* GeneOntology_get_enrichment - Functional enrichment analysis
Example:
query = {
    "name": "GeneOntology_get_annotations",
    "arguments": {"gene_symbols": ["BRCA1", "BRCA2", "TP53"]}
}
Enrichr - Gene Set Analysis¶
Comprehensive gene set enrichment analysis.
Key features:
* Enrichr_analyze_gene_list - Enrichment analysis for a gene list
* Enrichr_get_libraries - List available gene set libraries
* Enrichr_download_results - Download enrichment results
Example:
query = {
    "name": "Enrichr_analyze_gene_list",
    "arguments": {
        "genes": ["BRCA1", "BRCA2", "TP53", "ATM", "CHEK2"],
        "library": "KEGG_2021_Human"
    }
}
Disease & Target Data
OpenTargets Platform¶
Comprehensive disease-target association data.
Key features:
* OpenTargets_get_associated_targets_by_disease_efoId - Targets associated with a disease
* OpenTargets_get_associated_diseases_by_target - Diseases associated with a target
* OpenTargets_get_disease_id_description_by_name - Disease lookup by name
* OpenTargets_get_evidence - Association evidence
* OpenTargets_get_drug_info - Drug information and mechanism of action
Example:
# Get targets for hypertension
query = {
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # hypertension
}
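EFO - Experimental Factor Ontology¶
Ontology of diseases and experimental factors.
Key features:
* EFO_search_diseases - Search diseases by name
* EFO_get_disease_hierarchy - Get the disease relationship hierarchy
* EFO_get_synonyms - Get disease synonyms
Example:
query = {
    "name": "EFO_search_diseases",
    "arguments": {"query": "diabetes"}
}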
Drug & Chemical Data
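PubChem - Chemical Information¶
Comprehensive database of chemical compounds.
Key features:
* PubChem_get_compound_info - Get compound information by name or ID
* PubChem_search_compounds - Search compounds by structure or property
* PubChem_get_compound_properties - Molecular properties
* PubChem_similarity_search - Chemical similarity search
Example:
query = {
    "name": "PubChem_get_compound_info",
    "arguments": {"compound_name": "aspirin"}
}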
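ChEMBL - Bioactivity Data¶
Bioactivity and drug discovery data for chemical compounds.
Key features:
* ChEMBL_get_compound_targets - Get the targets of a compound
* ChEMBL_get_compounds_by_target - Get compounds that act on a specific protein
* ChEMBL_get_bioactivity_data - Bioactivity measurements
* ChEMBL_search_similar_compounds - Chemical similarity search
Example:
query = {
    "name": "ChEMBL_get_compounds_by_target",
    "arguments": {"target_symbol": "EGFR"}
}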
Drug Safety & Regulatory¶
OpenFDA - FDA Data¶
FDA drug label and adverse event data.
Key features:
* FAERS_count_reactions_by_drug_event - Count adverse reactions by drug
* openfda_get_warnings_by_drug_name - Get FDA warnings
* OpenFDA_get_drug_labels - Drug label information
* OpenFDA_search_recalls - Drug recall information
Example:
# Search adverse events
query = {
    "name": "FAERS_count_reactions_by_drug_event",
    "arguments": {"medicinalproduct": "warfarin"}
}
# Get FDA warnings
query = {
    "name": "openfda_get_warnings_by_drug_name",
    "arguments": {"medicinalproduct": "warfarin"}
}
DailyMed - Drug Labels¶
Official FDA drug labeling information.
Key features:
* DailyMed_get_drug_label - Get the official drug label
* DailyMed_search_drugs - Search drugs by name
* DailyMed_get_NDC_info - Get NDC (National Drug Code) information
Example:
query = {
    "name": "DailyMed_get_drug_label",
    "arguments": {"medicinalproduct": "metformin"}
}
Clinical Research
ClinicalTrials.gov¶
Clinical trial registry and results database.
Key features:
* ClinicalTrials_search_studies - Search clinical trials
* ClinicalTrials_get_study_details - Get detailed trial information
* ClinicalTrials_get_trial_results - Get trial results
* ClinicalTrials_search_by_condition - Find trials by medical condition
Example:
query = {
    "name": "ClinicalTrials_search_studies",
    "arguments": {
        "condition": "breast cancer",
        "intervention": "immunotherapy"
    }
}
Literature & Publications
PubTator - Biomedical Literature¶
PubMed literature annotated with named entity recognition.
Key features:
* PubTator_search_publications - Search the literature by entity
* PubTator_get_annotations - Get entity annotations
* PubTator_search_by_entity - Search for a specific entity
Example:
query = {
    "name": "PubTator_search_publications",
    "arguments": {
        "query": "@GENE_BRCA1 @DISEASE_cancer"
    }
}
Europe PMC¶
European literature database with full-text access.
Key features:
* EuropePMC_search_articles - Search articles and abstracts
* EuropePMC_get_full_text - Get full text where available
* EuropePMC_get_citations - Get citation data
Example:
query = {
    "name": "EuropePMC_search_articles",
    "arguments": {"query": "CRISPR gene therapy"}
}
Semantic Scholar¶
AI-powered academic search engine.
Key features:
* SemanticScholar_search_papers - Search academic papers
* SemanticScholar_get_paper_details - Get paper details
* SemanticScholar_get_citations - Citation network analysis
Example:
query = {
    "name": "SemanticScholar_search_papers",
    "arguments": {"query": "machine learning drug discovery"}
}
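OpenAlex¶
Open database of scholarly publications.
Key features:
* OpenAlex_search_works - Search scholarly works
* OpenAlex_get_author_info - Get author information and metrics
* OpenAlex_get_institution_data - Get institutional research data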
Specialized Databases
Human Protein Atlas¶
Tissue and cell expression data.
Key features:
* HPA_get_tissue_expression - Tissue expression patterns
* HPA_get_cell_expression - Single-cell expression data
* HPA_get_protein_localization - Subcellular localization
Example:
query = {
    "name": "HPA_get_tissue_expression",
    "arguments": {"gene_symbol": "BRCA1"}
}
Reactome Pathways¶
Biological pathway database.
Key features:
* Reactome_get_pathways_by_gene - Get the pathways a gene participates in
* Reactome_search_pathways - Search the pathway database
* Reactome_get_pathway_details - Get pathway details
Example:
query = {
    "name": "Reactome_get_pathways_by_gene",
    "arguments": {"gene_symbol": "TP53"}
}
HumanBase¶
Tissue-specific gene networks.
Key features:
* HumanBase_get_gene_networks - Tissue-specific networks
* HumanBase_predict_gene_function - Gene function prediction
* HumanBase_get_tissue_expression - Tissue expression patterns
MedlinePlus¶
Consumer health information.
Key features:
* MedlinePlus_get_health_topics - Health topic information
* MedlinePlus_search_conditions - Search medical conditions
* MedlinePlus_get_drug_info - Consumer drug information
AI-Powered Tools
Machine Learning Models (15 tools)¶
Apply machine learning algorithms to prediction, classification, and generation tasks.
Core ML tools:
boltz2_docking - Protein-ligand binding prediction
{
    "name": "boltz2_docking",
    "arguments": {
        "protein_structure": "1ABC",
        "ligand_smiles": "CCO"
    }
}
# Returns: binding_affinity, binding_probability, confidence_score
ADMET_predict_CYP_interactions - Drug metabolism prediction
{
    "name": "ADMET_predict_CYP_interactions",
    "arguments": {
        "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",  # Aspirin
        "cyp_enzymes": ["CYP3A4", "CYP2D6"]
    }
}
# Returns: interaction_probabilities, metabolic_stability
run_TxAgent_biomedical_reasoning - Therapeutic reasoning
{
    "name": "run_TxAgent_biomedical_reasoning",
    "arguments": {
        "query": "What are the therapeutic targets for Alzheimer's disease?",
        "context": "precision_medicine"
    }
}
# Returns: therapeutic_insights, target_recommendations
AI Agents (33 tools)¶
Autonomous tools that perceive their environment, make decisions, and act to achieve research goals.
Literature and analysis agents:
HypothesisGenerator - Generate research hypotheses
{
    "name": "HypothesisGenerator",
    "arguments": {
        "research_area": "cancer immunotherapy",
        "constraints": ["FDA-approved targets", "known biomarkers"],
        "num_hypotheses": 5
    }
}
# Returns: ranked_hypotheses, supporting_evidence, testable_predictions
ExperimentalDesignScorer - Evaluate experimental designs
{
    "name": "ExperimentalDesignScorer",
    "arguments": {
        "experiment_description": "Phase II trial for EGFR inhibitor",
        "evaluation_criteria": ["feasibility", "statistical_power", "ethics"]
    }
}
# Returns: design_score, improvement_suggestions, risk_assessment
MedicalLiteratureReviewer - Comprehensive literature analysis
{
    "name": "MedicalLiteratureReviewer",
    "arguments": {
        "topic": "CAR-T cell therapy safety profile",
        "databases": ["PubMed", "ClinicalTrials.gov"],
        "time_range": "2020-2024"
    }
}
# Returns: comprehensive_review, key_findings, research_gaps
Tool Discovery and Composition¶
AI tools for discovering and composing other tools.
Key features:
* discover_tools_by_description - Find tools from a natural-language description
* compose_tools_for_workflow - Create tool workflows
* optimize_tool_descriptions - Optimize tool descriptions
Example:
query = {
    "name": "discover_tools_by_description",
    "arguments": {
        "description": "I need to find genes associated with heart disease"
    }
}
Search & Integration Tools
Tool Finder¶
Find the right tools for your research needs.
Key features:
* find_tools_by_keyword - Keyword-based tool search
* find_tools_by_category - Browse tools by category
* get_tool_recommendations - Get tool recommendations
Example:
query = {
    "name": "find_tools_by_keyword",
    "arguments": {"keywords": ["drug", "safety", "adverse"]}
}
Embedding Stores (4 tools)¶
Store and retrieve vectorized representations of scientific data to support semantic search.
Core embedding tools:
embedding_tool_finder - Semantic tool discovery
{
    "name": "embedding_tool_finder",
    "arguments": {
        "query": "predict protein folding dynamics",
        "top_k": 10,
        "similarity_threshold": 0.7
    }
}
# Returns: relevant_tools, similarity_scores, tool_descriptions
embedding_database_search - Vector similarity search
{
    "name": "embedding_database_search",
    "arguments": {
        "query_vector": embedding_vector,  # placeholder: a precomputed query embedding
        "database": "pubmed_abstracts",
        "top_k": 50
    }
}
# Returns: similar_documents, relevance_scores, metadata
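To make the `top_k` and `similarity_threshold` arguments concrete, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity. It illustrates the general technique only. How the embedding tools rank results internally is not described on this page, and `cosine_similarity`, `top_k_search`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_search(query_vector, store, top_k=10, similarity_threshold=0.0):
    """Return up to top_k (key, score) pairs at or above the threshold, best first."""
    scored = [(key, cosine_similarity(query_vector, vec)) for key, vec in store.items()]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
store = {
    "protein_folding_tool": [1.0, 0.0],
    "docking_tool": [0.6, 0.8],
    "literature_tool": [0.0, 1.0],
}
hits = top_k_search([1.0, 0.0], store, top_k=2, similarity_threshold=0.5)
print(hits)
```

Raising the threshold trims weak matches before `top_k` is applied, so a query can return fewer than `top_k` results.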
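Data Integration¶
Tools for merging data from multiple sources.
Key features:
* integrate_gene_data - Integrate gene data from multiple sources
* cross_reference_identifiers - Map between different ID systems
* validate_data_consistency - Check data consistency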
Tool Usage Patterns¶
Single-Tool Queries¶
Simple queries focused on one specific piece of information:
# Get protein function by accession (EGFR → P00533)
protein_query = {
    "name": "UniProt_get_function_by_accession",
    "arguments": {"accession": "P00533"}
}
# Search adverse events
safety_query = {
    "name": "FAERS_count_reactions_by_drug_event",
    "arguments": {"medicinalproduct": "metformin"}
}
Multi-Tool Workflow¶
Combine multiple tools for a comprehensive analysis:
# Step 1: Get disease info
disease_query = {
    "name": "OpenTargets_get_disease_id_description_by_name",
    "arguments": {"diseaseName": "diabetes"}
}
# Step 2: Get associated targets (disease_id comes from the Step 1 result)
targets_query = {
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": disease_id}
}
# Step 3: Analyze target pathways (target_list comes from the Step 2 result)
pathway_query = {
    "name": "Enrichr_analyze_gene_list",
    "arguments": {
        "genes": target_list,
        "library": "KEGG_2021_Human"
    }
}
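The three steps above leave `disease_id` and `target_list` undefined. One way to wire them together is sketched below, under stated assumptions: `run` is any callable that executes one query dict (in practice `tu.run`), and the two extractor callables are hypothetical, because the response shapes are not documented on this page. The canned responses in `fake_run` are invented so the sketch runs offline.

```python
def disease_to_pathways(run, disease_name, get_disease_id, get_gene_symbols):
    """Chain disease lookup, target retrieval, and pathway enrichment."""
    disease_info = run({
        "name": "OpenTargets_get_disease_id_description_by_name",
        "arguments": {"diseaseName": disease_name},
    })
    disease_id = get_disease_id(disease_info)

    targets = run({
        "name": "OpenTargets_get_associated_targets_by_disease_efoId",
        "arguments": {"efoId": disease_id},
    })
    target_list = get_gene_symbols(targets)

    return run({
        "name": "Enrichr_analyze_gene_list",
        "arguments": {"genes": target_list, "library": "KEGG_2021_Human"},
    })


def fake_run(query):
    # Stand-in for tu.run with made-up responses (not real tool output).
    canned = {
        "OpenTargets_get_disease_id_description_by_name": {"id": "EFO_PLACEHOLDER"},
        "OpenTargets_get_associated_targets_by_disease_efoId": {"symbols": ["GENE_A", "GENE_B"]},
    }
    return canned.get(query["name"], {"echo": query["arguments"]})


result = disease_to_pathways(
    fake_run,
    "diabetes",
    get_disease_id=lambda response: response["id"],
    get_gene_symbols=lambda response: response["symbols"],
)
print(result)
```

Keeping the extractors as parameters isolates the one part that depends on each tool's actual response format.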
Batch Processing¶
Process multiple related queries efficiently:
# Process multiple genes by UniProt accession
accessions = ["P38398", "P51587", "P04637", "Q13315"]  # BRCA1, BRCA2, TP53, ATM
results = {}
for accession in accessions:
    query = {
        "name": "UniProt_get_function_by_accession",
        "arguments": {"accession": accession}
    }
    results[accession] = tu.run(query)
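In a long batch, one failing query should not discard the results already collected. The sketch below records results and errors separately. `run_batch` is a hypothetical helper, and `fake_run` stands in for `tu.run` so the example runs without network access.

```python
def run_batch(run, queries):
    """Run each query, collecting results and errors keyed by position in the batch."""
    results, errors = {}, {}
    for index, query in enumerate(queries):
        try:
            results[index] = run(query)
        except Exception as exc:  # keep going so one failure does not abort the batch
            errors[index] = str(exc)
    return results, errors


def fake_run(query):
    # Stand-in for tu.run: fails for one accession to exercise the error path.
    accession = query["arguments"]["accession"]
    if accession == "BAD":
        raise ValueError("unknown accession")
    return {"accession": accession}


queries = [
    {"name": "UniProt_get_function_by_accession", "arguments": {"accession": acc}}
    for acc in ["P38398", "BAD", "P04637"]
]
results, errors = run_batch(fake_run, queries)
print(results)
print(errors)
```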
Integration Patterns¶
Multi-Tool Workflow¶
A complete pipeline that combines several tools:
from tooluniverse import ToolUniverse

# Drug discovery workflow
def drug_discovery_pipeline(disease_name):
    tooluni = ToolUniverse()
    tooluni.load_tools()

    # 1. Find disease ID
    disease_query = {
        "name": "OpenTargets_get_disease_id_description_by_name",
        "arguments": {"diseaseName": disease_name}
    }
    disease_info = tooluni.run(disease_query)

    # 2. Get associated targets
    targets_query = {
        "name": "OpenTargets_get_associated_targets_by_disease_efoId",
        "arguments": {"efoId": disease_info['id']}
    }
    targets_result = tooluni.run(targets_query)
    targets = targets_result['data']['disease']['associatedTargets']['rows']

    # 3. Find drugs for each target
    drugs = []
    for row in targets[:5]:  # Top 5 targets
        target = row['target']
        drugs_query = {
            "name": "OpenTargets_get_associated_drugs_by_target_ensemblID",
            "arguments": {
                "target_ensembl_id": target['id'],
                "size": 10,
                "cursor": ""
            }
        }
        target_drugs = tooluni.run(drugs_query)
        drugs.extend(target_drugs)

    # 4. Check safety profiles
    for drug in drugs[:10]:  # Top 10 drugs
        safety_query = {
            "name": "openfda_get_warnings_by_drug_name",
            "arguments": {"drug_name": drug['name']}
        }
        safety = tooluni.run(safety_query)
        drug['safety_warnings'] = safety

    return drugs
Tool Composition Patterns¶
Sequential workflow:
# Disease → Targets → Compounds → Prediction
workflow = [
    ("OpenTargets_get_associated_targets_by_disease_efoId", {"efoId": disease_id}),
    ("ChEMBL_search_compounds_by_target", {"target_id": target_result}),
    ("boltz2_docking", {"protein_id": target, "ligand_smiles": compound}),
    ("ADMETAI_predict_admet_properties", {"smiles": compound})
]
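In a sequential workflow each step's arguments depend on earlier results, so they cannot all be written down up front. A minimal sketch of one way to express that: each step pairs a tool name with a function that builds its arguments from a shared context. `run_sequence`, the step names, and `fake_run` are hypothetical. In practice `run` would be `tu.run`.

```python
def run_sequence(run, steps, context=None):
    """Run steps in order; each step builds its arguments from earlier results."""
    context = dict(context or {})
    for name, build_arguments in steps:
        context[name] = run({"name": name, "arguments": build_arguments(context)})
    return context


def fake_run(query):
    # Stand-in for tu.run: echoes what it received, tagged with the tool name.
    return {"tool": query["name"], "received": query["arguments"]}


steps = [
    ("step_one", lambda ctx: {"efoId": ctx["disease_id"]}),
    ("step_two", lambda ctx: {"previous_tool": ctx["step_one"]["tool"]}),
]
context = run_sequence(fake_run, steps, {"disease_id": "EFO_0000537"})
print(context["step_two"])
```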
Parallel data collection:
# Multi-database literature search
parallel_searches = [
    ("PubTator_search_publications", {"query": research_topic}),
    ("EuropePMC_search_articles", {"query": research_topic}),
    ("SemanticScholar_search_papers", {"query": research_topic})
]
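The list above only declares the searches. As a sketch of how they might actually run concurrently, the standard-library `ThreadPoolExecutor` suits I/O-bound calls like these. `run_parallel` is a hypothetical helper, and whether `tu.run` is safe to call from several threads is an assumption to verify. `fake_run` replaces it here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(run, searches, max_workers=3):
    """Run (tool_name, arguments) pairs concurrently; results keep the input order."""
    queries = [{"name": name, "arguments": arguments} for name, arguments in searches]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order and re-raises any exception.
        return list(pool.map(run, queries))


def fake_run(query):
    # Stand-in for tu.run.
    return f"{query['name']}:{query['arguments']['query']}"


searches = [
    ("PubTator_search_publications", {"query": "CRISPR"}),
    ("EuropePMC_search_articles", {"query": "CRISPR"}),
    ("SemanticScholar_search_papers", {"query": "CRISPR"}),
]
results = run_parallel(fake_run, searches)
print(results)
```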
Feedback loop:
# Iterative optimization
while not satisfactory_result:
    prediction = ml_model_prediction(current_compound)
    if prediction.score < threshold:
        analogs = chemical_database_search(current_compound)
        current_compound = select_best_analog(analogs)
    else:
        break
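The loop above is pseudocode: its helper functions are placeholders and nothing bounds the number of iterations. A runnable sketch of the same idea follows, with a round limit added. The `optimize` function and the toy scoring functions are hypothetical stand-ins for a prediction model and a chemical database search.

```python
def optimize(start, score, propose_analogs, threshold, max_rounds=10):
    """Replace the candidate with its best analog until the score reaches the threshold."""
    current = start
    for _ in range(max_rounds):
        if score(current) >= threshold:
            break
        analogs = propose_analogs(current)
        if not analogs:
            break  # nothing left to try
        current = max(analogs, key=score)
    return current, score(current)


# Toy stand-ins: a "compound" is an integer and its score is the integer itself.
best, best_score = optimize(
    start=1,
    score=lambda compound: compound,
    propose_analogs=lambda compound: [compound + 1, compound + 2],
    threshold=6,
)
print(best, best_score)
```

The `max_rounds` guard matters in practice because a model score may never reach the threshold.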
Tool Performance Tips
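Optimization Strategies¶
* Use specific queries: more specific queries return results faster
* Limit results: use the limit parameter to control result size
* Cache results: enable caching to speed up repeated queries
* Batch where possible: some tools support batch operations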
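The caching tip can also be applied at the application level. The sketch below wraps any query runner with an in-memory cache keyed on the serialized query. `make_cached_runner` is a hypothetical helper and is separate from any caching ToolUniverse itself provides. `fake_run` stands in for `tu.run`.

```python
import json


def make_cached_runner(run):
    """Wrap `run` so identical queries are executed only once."""
    cache = {}

    def cached_run(query):
        key = json.dumps(query, sort_keys=True)  # stable key for nested dicts
        if key not in cache:
            cache[key] = run(query)
        return cache[key]

    return cached_run


calls = []


def fake_run(query):
    # Stand-in for tu.run that records each real execution.
    calls.append(query["name"])
    return {"ok": True}


cached = make_cached_runner(fake_run)
query = {"name": "UniProt_get_function_by_accession", "arguments": {"accession": "P38398"}}
first = cached(query)
second = cached(query)
print(len(calls))
```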
Rate Limiting¶
ToolUniverse handles API rate limits automatically, but you can reduce pressure on external services further:
import time

# Add delays for large batch operations
for query in large_query_list:
    result = tu.run(query)
    time.sleep(0.1)  # Small delay between requests
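A fixed delay does not help once a request has already been rejected. A common complement is retrying with exponential backoff, sketched below. `run_with_backoff` is a hypothetical helper, the retry count and delays are arbitrary, and `flaky_run` simulates a rate-limited service in place of `tu.run`.

```python
import time


def run_with_backoff(run, query, retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry a failing call, doubling the wait after each failed attempt."""
    for attempt in range(retries + 1):
        try:
            return run(query)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


attempts = {"count": 0}
waits = []


def flaky_run(query):
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return {"data": "ok"}


# `sleep` is injected so the demo records the waits instead of pausing.
result = run_with_backoff(flaky_run, {"name": "example"}, sleep=waits.append)
print(result, waits)
```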
Error Handling¶
Always include error handling to keep your application robust:
try:
    result = tu.run(query)
    if result and 'data' in result:
        # Process successful result
        process_data(result['data'])
    else:
        print("No data returned")
except Exception as e:
    print(f"Query failed: {e}")
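Performance Optimization¶
Category-Specific Considerations¶
ML models:
* Remote execution reduces local resource requirements
* Batch predictions where possible
* Cache the results of expensive computations
APIs:
* Respect rate limits and implement backoff
* Use pagination for large datasets
* Cache the results of frequent queries
Databases:
* Query specific fields instead of running full-text searches
* Set result limits on data retrieval
* Index frequently accessed data
Agents:
* Configure appropriate timeout values
* Use streaming for long-running tasks
* Implement progress monitoring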
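Best Practices¶
* Tool selection: choose the tool that fits your specific use case
* Rate limiting: respect API rate limits to avoid being blocked
* Error handling: always handle potential API errors gracefully
* Caching: cache frequently accessed data
* Batch processing: use batch operations where possible to improve efficiency
* Configuration: configure tools appropriately for your environment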
Tool Discovery and Selection¶
Choosing the Right Tool¶
By category:
# List tools by type (use get_tool_types() to see available types)
print(tu.get_tool_types())  # e.g. ['opentarget', 'ChEMBL', 'uniprot', ...]
ml_tools = tu.filter_tools(include_tool_types=["ML_tools"])
database_tools = tu.filter_tools(include_tool_types=["uniprot", "ChEMBL"])
api_tools = tu.filter_tools(include_tool_types=["EuropePMC", "PubMed"])
By function:
# Semantic search across all categories
protein_tools = tu.run({
    "name": "find_tools",
    "arguments": {"query": "protein structure prediction", "limit": 10}
})
drug_tools = tu.run({
    "name": "find_tools",
    "arguments": {"query": "drug safety analysis", "limit": 10}
})
literature_tools = tu.run({
    "name": "find_tools",
    "arguments": {"query": "literature review automation", "limit": 10}
})
By domain:
# Load domain-specific tools
tu.load_tools(tool_type=[
    "opentarget",  # Disease-target data
    "ChEMBL",      # Chemical data
    "uniprot",     # Protein data
    "pubtator"     # Literature with entities
])
API Authentication¶
# API keys are managed via environment variables
# Set them before importing ToolUniverse or use a .env file
import os
os.environ['NCBI_API_KEY'] = 'your_ncbi_key'
os.environ['SEMANTIC_SCHOLAR_API_KEY'] = 'your_s2_key'
# ToolUniverse automatically reads API keys from environment variables
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
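Because keys are read from the environment, a missing key tends to surface only when a tool call fails. A small pre-flight check, sketched below, reports unset variables up front. `missing_api_keys` is a hypothetical helper, and the variable names are the two used in the example above.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


required = ["NCBI_API_KEY", "SEMANTIC_SCHOLAR_API_KEY"]
# A plain dict stands in for os.environ so the demo is deterministic.
missing = missing_api_keys(required, env={"NCBI_API_KEY": "abc"})
print(missing)
```

Called without `env`, the helper checks the real process environment.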
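Future Extensions¶
Planned categories:
* Visualization tools: interactive plotting and dashboard generation
* Workflow engines: advanced orchestration and scheduling
* Cloud services: distributed computing and storage
* Compliance tools: regulatory and ethical validation
Community contributions:
* Tool submission guidelines
* Quality assurance process
* Community voting and validation
* Maintenance and updates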
Next Steps
Now that you know which tools are available:
Try Examples: Examples - See tools in action
Build Workflows: Scientific Workflows - Combine tools for research
Extend ToolUniverse: Navigation - Create custom tools
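Tip
Discovery tip: use the AI-powered tool discovery features to find the right tool for your specific research question.
Tip
Ecosystem synergy: the eight categories are designed to work together. APIs provide data access, ML models add intelligence, agents coordinate complex workflows, and databases and embedding stores support efficient information management.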