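ToolUniverse Architecture
Overview
ToolUniverse uses a modular, registry-based architecture centered on a unified ToolUniverse engine. Through tool registration, configuration, and automatic discovery, it connects a large number of scientific databases and APIs and exposes a consistent interface to agents, applications, and MCP clients.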
┌────────────────────┐
│ Applications/Agents│ ← Your business logic, conversational systems, scripts
└──────────┬─────────┘
           │ Python API/MCP
┌──────────▼─────────┐
│ ToolUniverse Core  │ ← Tool loading, registration, routing, execution
└──────────┬─────────┘
           │ Registry/Config
┌──────────▼─────────┐
│ Tool Implementation│ ← OpenFDA, OpenTargets, UniProt, PubChem, GWAS...
│ Modules            │
└──────────┬─────────┘
           │ HTTP/GraphQL/Local
┌──────────▼─────────┐
│External Services/DB│
└────────────────────┘
Repository Structure
ToolUniverse/
├── src/tooluniverse/ # Core package directory
│ ├── __init__.py # Package exports, lazy loading control
│ │
│ ├── # Core Engine & Registry
│ ├── execute_function.py # ToolUniverse main engine class
│ ├── base_tool.py # BaseTool base class & exceptions
│ ├── tool_registry.py # Tool registration & discovery
│ ├── default_config.py # Default tool file configurations
│ ├── logging_config.py # Logging setup
│ └── utils.py # Utility functions
│ │
│ ├── # Tool Implementation Modules
│ ├── openfda_tool.py # FDA drug labels & data
│ ├── openfda_adv_tool.py # FDA adverse events
│ ├── ctg_tool.py # ClinicalTrials.gov
│ ├── graphql_tool.py # OpenTargets GraphQL APIs
│ ├── uniprot_tool.py # UniProt protein database
│ ├── pubchem_tool.py # PubChem chemical database
│ ├── reactome_tool.py # Reactome pathway database
│ ├── europe_pmc_tool.py # Europe PMC literature
│ ├── semantic_scholar_tool.py # Semantic Scholar papers
│ ├── gwas_tool.py # GWAS Catalog genetics
│ ├── hpa_tool.py # Human Protein Atlas
│ ├── rcsb_pdb_tool.py # Protein Data Bank
│ ├── medlineplus_tool.py # MedlinePlus health info
│ ├── restful_tool.py # Generic REST APIs (Monarch)
│ ├── url_tool.py # Web scraping & PDF extraction
│ ├── pubtator_tool.py # PubTator literature mining
│ ├── xml_tool.py # XML data processing
│ ├── admetai_tool.py # ADMET AI predictions
│ ├── alphafold_tool.py # AlphaFold protein structures
│ ├── chem_tool.py # ChEMBL chemical bioactivity
│ ├── compose_tool.py # Tool composition & workflows
│ ├── package_tool.py # Local package tools
│ ├── dataset_tool.py # Local dataset access
│ ├── mcp_client_tool.py # MCP client for remote tools
│ ├── remote_tool.py # Remote tool abstractions
│ ├── agentic_tool.py # Agentic behavior tools
│ ├── enrichr_tool.py # Enrichr gene set analysis
│ ├── efo_tool.py # Experimental Factor Ontology
│ ├── gene_ontology_tool.py # Gene Ontology
│ ├── humanbase_tool.py # HumanBase networks
│ ├── dailymed_tool.py # DailyMed drug labels
│ ├── uspto_tool.py # USPTO patent data
│ ├── uspto_downloader_tool.py # USPTO bulk downloads
│ ├── openalex_tool.py # OpenAlex scholarly data
│ └── boltz_tool.py # Boltz protein folding
│ │
│ ├── # Tool Discovery & Search
│ ├── tool_finder_keyword.py # Keyword-based tool search
│ ├── tool_finder_embedding.py # Embedding-based tool search
│ ├── tool_finder_llm.py # LLM-powered tool discovery
│ ├── embedding_database.py # Tool embedding database
│ └── embedding_sync.py # Embedding synchronization
│ │
│ ├── # MCP Integration & Servers
│ ├── smcp.py # FastMCP wrapper (SMCP class)
│ ├── smcp_server.py # MCP server entry points
│ ├── mcp_integration.py # ToolUniverse MCP methods injection
│ └── mcp_tool_registry.py # MCP tool registry & URLs
│ │
│ ├── # Configuration & Data
│ ├── data/ # Tool configurations
│ │ ├── *.json # Tool instance definitions
│ │ ├── packages/ # Package-related configs
│ │ └── remote_tools/ # Remote/MCP tool definitions
│ │
│ ├── # Tool Collections & Workflows
│ ├── toolsets/ # Organized tool collections
│ │ ├── bioinformatics/ # Bioinformatics toolset
│ │ ├── research/ # Research toolset
│ │ └── software_dev/ # Software development tools
│ │
│ ├── compose_scripts/ # Workflow composition scripts
│ │ ├── __init__.py
│ │ ├── biomarker_discovery.py # Biomarker discovery workflow
│ │ ├── comprehensive_drug_discovery.py # Drug discovery pipeline
│ │ ├── drug_safety_analyzer.py # Drug safety analysis
│ │ ├── literature_tool.py # Literature analysis
│ │ ├── output_summarizer.py # Result summarization
│ │ ├── tool_description_optimizer.py # Tool description optimization
│ │ ├── tool_discover.py # Tool discovery workflows
│ │ └── tool_graph_composer.py # Tool graph composition
│ │
│ ├── # External Integrations & Examples
│ ├── remote/ # External system integrations
│ │ ├── expert_feedback/ # Human expert feedback system
│ │ ├── expert_feedback_mcp/ # MCP-enabled expert feedback
│ │ ├── boltz/ # Boltz integration
│ │ ├── depmap_24q2/ # DepMap data integration
│ │ ├── immune_compass/ # Immune system tools
│ │ ├── pinnacle/ # Pinnacle integration
│ │ ├── transcriptformer/ # Transcriptformer model
│ │ └── uspto_downloader/ # USPTO downloader service
│ │
│ ├── # Visualization & UI
│ ├── scripts/ # Utility scripts
│ │ ├── generate_tool_graph.py # Tool graph generation
│ │ └── visualize_tool_graph.py # Tool graph visualization
│ ├── tool_graph_web_ui.py # Web-based tool graph UI
│ │
│ ├── # Configuration Templates
│ ├── template/ # Configuration templates
│ │ ├── file_save_hook_config.json # File save hook template
│ │ └── hook_config.json # General hook template
│ │
│ ├── # Output Processing
│ ├── output_hook.py # Output processing hooks
│ ├── extended_hooks.py # Extended hook functionality
│ │
│ └── # Testing
│ └── test/ # Unit & integration tests
│ ├── *.py # Test modules
│ ├── *.xml # Test data
│ └── *.parquet # Test datasets
│
├── # Documentation
├── docs/ # Sphinx documentation
│ ├── _build/ # Built documentation
│ ├── _static/ # Static assets
│ ├── _templates/ # Doc templates
│ ├── api/ # API documentation
│ ├── expand_tooluniverse/ # Extension guides
│ ├── guide/ # User guides
│ ├── reference/ # Reference docs
│ ├── tutorials/ # Tutorials
│ └── *.rst # Documentation source
│
├── # Root-level Files
├── pyproject.toml # Project config, dependencies, CLI
├── smcp_tooluniverse_server.py # Simplified MCP server launcher
├── README.md # Project overview
├── README_USAGE.md # Usage documentation
├── LICENSE # License file
├── uv.lock # UV lock file
│
├── # Build & Meta
├── build_docs.sh # Documentation build script
├── internal/ # Internal data & utilities
├── img/ # Images & assets
└── generated_tool_* # Generated tool files
Core Components
Engine and Registry
execute_function.py: Core ToolUniverse engine class responsible for:
Reading tool configurations (local JSON, default configs) and building all_tools/all_tool_dict
Mapping tool types to concrete classes (tool_type_mappings) and instantiation
Tool execution routing (run_tool), validation, and result processing
Handling MCP auto-loaders, temporary clients (with mcp_integration.py)
base_tool.py: BaseTool base class and exception types. Supports:
Loading default configurations from tooluniverse.data package
Parameter validation, required parameter extraction, function call validation
tool_registry.py: Tool registration and discovery:
@register_tool decorator for registering tool classes
Lazy loading registry (on-demand module imports) and full discovery
Smart matching of configuration JSON to modules and tool types
default_config.py: List of default tool configuration files
logging_config.py, utils.py: Logging setup and utility functions
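The decorator-based registration and on-demand imports described above can be pictured with a small stand-alone sketch. This is not the code in tool_registry.py: only the `@register_tool` name comes from this document, while `_REGISTRY`, `_LAZY_MODULES`, `get_tool_class`, and `EchoTool` are hypothetical names used for illustration.

```python
import importlib
from typing import Callable, Dict

# Hypothetical stand-ins for registry state; not ToolUniverse internals.
_REGISTRY: Dict[str, type] = {}
_LAZY_MODULES: Dict[str, str] = {}  # tool type -> module path imported on demand


def register_tool(tool_type: str) -> Callable[[type], type]:
    """Class decorator that records a tool class under a type name."""
    def decorator(cls: type) -> type:
        _REGISTRY[tool_type] = cls
        return cls
    return decorator


def get_tool_class(tool_type: str) -> type:
    """Return the class for a tool type, importing its module only if needed."""
    if tool_type not in _REGISTRY and tool_type in _LAZY_MODULES:
        # Importing the module runs its @register_tool decorators.
        importlib.import_module(_LAZY_MODULES[tool_type])
    try:
        return _REGISTRY[tool_type]
    except KeyError:
        raise KeyError(f"Unknown tool type: {tool_type}") from None


@register_tool("EchoTool")
class EchoTool:
    """Toy tool used only to exercise the registry."""

    def run(self, arguments: dict) -> dict:
        return {"echo": arguments}
```

In this sketch a lookup such as `get_tool_class("EchoTool")` returns the registered class directly, and a type that is only listed in `_LAZY_MODULES` triggers a single import the first time it is requested.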
Tool Implementation Classes
Available tool classes (in alphabetical order):
ADMETAITool, AgenticTool, AlphaFoldRESTTool, BoltzTool, ChEMBLTool, ClinicalTrialsDetailsTool, ClinicalTrialsSearchTool, ComposeTool, DatasetTool, DiseaseTargetScoreTool, EFOTool, EmbeddingDatabase, EmbeddingSync, EnrichrTool, EuropePMCTool, FDACountAdditiveReactionsTool, FDADrugAdverseEventTool, FDADrugLabelGetDrugGenericNameTool, FDADrugLabelSearchIDTool, FDADrugLabelSearchTool, FDADrugLabelTool, GWASAssociationByID, GWASAssociationSearch, GWASAssociationsForSNP, GWASAssociationsForStudy, GWASAssociationsForTrait, GWASSNPByID, GWASSNPSearch, GWASSNPsForGene, GWASStudiesForTrait, GWASStudyByID, GWASStudySearch, GWASVariantsForTrait, GeneOntologyTool, GetSPLBySetIDTool, HPAGetGeneJSONTool, HPAGetGeneXMLTool, HumanBaseTool, MCPAutoLoaderTool, MCPClientTool, MedlinePlusRESTTool, MonarchDiseasesForMultiplePhenoTool, MonarchTool, OpenAlexTool, OpentargetGeneticsTool, OpentargetTool, OpentargetToolDrugNameMatch, PackageTool, PubChemRESTTool, PubTatorTool, RCSBTool, ReactomeRESTTool, RemoteTool, SearchSPLTool, SemanticScholarTool, ToolFinderEmbedding, ToolFinderKeyword, ToolFinderLLM, URLHTMLTagTool, URLToPDFTextTool, USPTODownloaderTool, USPTOOpenDataPortalTool, UniProtRESTTool, XMLDatasetTool
Data and Configuration
data/*.json: Tool configuration manifests, one per data source or category
data/packages/*: Package-related extension configurations
data/remote_tools/*: Remote tool/MCP definitions
toolsets/: Tool collections organized by scenario (bioinformatics/, research/, software_dev/)
MCP Integration and Servers
smcp.py: FastMCP wrapper that provides SMCP and create_smcp_server
smcp_server.py: Wraps the MCP server entry points (exposed as command-line commands through `pyproject.toml`)
mcp_integration.py: Injects the load_mcp_tools and discover_mcp_tools methods into ToolUniverse
mcp_tool_registry.py: MCP tool registry for URLs and tool discovery
smcp_tooluniverse_server.py in the repository root: Simplified launcher script for quickly starting a server locally
External Ecosystem and Extension Examples
remote/: External system integrations including:
expert_feedback/: Human expert feedback system
expert_feedback_mcp/: MCP-enabled expert feedback
boltz/: Boltz protein folding integration
depmap_24q2/: DepMap cancer dependency data integration
immune_compass/: Immune system analysis tools
pinnacle/: Pinnacle platform integration
transcriptformer/: Transcriptformer model integration
uspto_downloader/: USPTO patent downloader service
Execution Flow (from Configuration to Invocation)
Configuration Loading
Engine startup reads default_tool_files and data/*.json to build tool manifest
Each JSON entry defines a tool instance: name, type, description, parameter (JSON Schema), endpoints, etc.
Tool Registration & Mapping
tool_registry.py maintains “tool type → tool class” mappings
Supports both full import discovery and lazy loading mappings (smart config-to-module matching)
Instantiation & Default Configuration
Based on type, finds corresponding class (e.g., FDADrugLabelTool)
Merges BaseTool default configurations with entry-specific config
Execution & Validation
ToolUniverse.tools.tool_name(**params):
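Locate the instance by name → validate parameters (required fields) → call the concrete implementation
Unified error handling and return structure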
Composition/Discovery & Graphs
Use compose_tool.py or compose_scripts/ for orchestration
Leverage tool_finder_* (keyword/embedding/LLM) for tool retrieval
Visualize tool relationships and call chains via scripts or tool_graph_web_ui.py
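The flow above (manifest entry → type mapping → instantiation → validation → call) can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the engine in execute_function.py: `TOOL_CONFIGS`, `TYPE_MAPPINGS`, `AddTool`, and the `{"success": ..., "result"/"error": ...}` return shape are all hypothetical, and the free function `run_tool` here only borrows its name from the method mentioned earlier.

```python
from typing import Any, Dict

# Hypothetical manifest entry using the per-property "required" flag
# shown in the configuration example in this document.
TOOL_CONFIGS: Dict[str, Dict[str, Any]] = {
    "add_numbers": {
        "name": "add_numbers",
        "type": "AddTool",
        "parameter": {
            "type": "object",
            "properties": {
                "a": {"type": "number", "required": True},
                "b": {"type": "number", "required": True},
            },
        },
    }
}


class AddTool:
    """Toy tool class; real tools would inherit from BaseTool."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def run(self, arguments: Dict[str, Any]) -> Any:
        return arguments["a"] + arguments["b"]


# Hypothetical "tool type -> tool class" mapping.
TYPE_MAPPINGS = {"AddTool": AddTool}


def run_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Locate a tool by name, check required fields, then call it."""
    config = TOOL_CONFIGS.get(name)
    if config is None:
        return {"success": False, "error": f"Tool '{name}' not found"}
    properties = config["parameter"]["properties"]
    missing = [key for key, spec in properties.items()
               if spec.get("required") and key not in arguments]
    if missing:
        return {"success": False,
                "error": "Missing required parameters: " + ", ".join(missing)}
    tool = TYPE_MAPPINGS[config["type"]](config)
    try:
        return {"success": True, "result": tool.run(arguments)}
    except Exception as exc:  # unified error structure for any tool failure
        return {"success": False, "error": str(exc)}
```

Returning a dictionary for both success and failure is one way to give callers a single shape to inspect; the actual structure returned by ToolUniverse may differ.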
MCP Integration
Server side:
smcp.py provides the SMCP object for one-step exposure of all ToolUniverse tools
smcp_server.py and the root-level smcp_tooluniverse_server.py provide convenient startup
pyproject.toml exposes commands: tooluniverse-smcp, tooluniverse-smcp-stdio, tooluniverse-smcp-server, etc.
Client side/remote tools:
mcp_client_tool.py and mcp_integration.py support discovery and dynamic registration from remote MCP servers
MCPAutoLoaderTool automatically discovers and batch-registers tools from remote URLs, with a configurable prefix and timeout
list_mcp_connections() shows the loaded remote connections and their tool counts
Configuration and Data Conventions
**Tool configuration structure** (data/*.json files):
{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "description": "Get generic name for an FDA drug label",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  },
  "endpoint": "https://api.fda.gov/drug/label.json",
  "method": "GET"
}
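Note that the entry above marks a parameter as required with a per-property `"required": true` flag, whereas standard JSON Schema lists required names in an object-level `required` array. The sketch below reads the entry and collects required parameter names from either form; `required_parameters` is a hypothetical helper, not a BaseTool method.

```python
import json
from typing import Any, Dict, List

CONFIG_TEXT = """
{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "description": "Get generic name for an FDA drug label",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  },
  "endpoint": "https://api.fda.gov/drug/label.json",
  "method": "GET"
}
"""


def required_parameters(tool_config: Dict[str, Any]) -> List[str]:
    """Collect required parameter names from a tool configuration entry."""
    schema = tool_config.get("parameter", {})
    # Standard JSON Schema form: an object-level list of names.
    names = list(schema.get("required", []))
    # Per-property form used in the entry above.
    for name, spec in schema.get("properties", {}).items():
        if spec.get("required") is True and name not in names:
            names.append(name)
    return names


config = json.loads(CONFIG_TEXT)
print(config["name"], required_parameters(config))
```

For the entry above, the helper reports `drug_name` as the only required parameter.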
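Naming and mapping conventions:
*_tools.json usually corresponds to a *_tool.py module
tool_registry.py performs smart matching
Classes can be registered explicitly with @register_tool at definition time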
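The file naming convention can be expressed as a one-line transformation. The sketch below covers only the plain `*_tools.json` → `*_tool` case; the smart matching performed by tool_registry.py is not documented here and may handle more cases. `module_for_config` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def module_for_config(config_file: str) -> Optional[str]:
    """Map a '*_tools.json' config file name to its '*_tool' module name.

    Returns None when the file does not follow the naming convention.
    """
    stem = Path(config_file).stem  # e.g. "xxx_tools"
    suffix = "_tools"
    if not stem.endswith(suffix):
        return None
    return stem[: -len(suffix)] + "_tool"


print(module_for_config("data/xxx_tools.json"))
```

Files that do not match the convention return `None`, which is the case where explicit registration with `@register_tool` would be needed.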
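Extension Points
Adding a new data source tool:
Create an xxx_tool.py file under src/tooluniverse/ that defines a class inheriting from BaseTool
Register it with @register_tool('YourToolType'), or rely on the naming convention
Add one or more tool entries to data/xxx_tools.json
Integrating remote MCP tools:
Use MCPAutoLoaderTool with a server URL for automatic discovery
Or use ToolUniverse.load_mcp_tools([…]) for dynamic loading at runtime
Composition and workflows:
For complex call chains, use compose_tool.py or add scripts to compose_scripts/
Use tool_finder_* for retrieval and routing assistance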
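A composition script chains several tool calls into one workflow. The sketch below assumes an entry point of the form `compose(arguments, tooluniverse, call_tool)`; that signature is an assumption and should be checked against compose_tool.py. The tool names `search_literature` and `summarize_text` are hypothetical, and `fake_call_tool` stands in for the engine so the example runs on its own.

```python
from typing import Any, Callable, Dict


def compose(arguments: Dict[str, Any],
            tooluniverse: Any,
            call_tool: Callable[[str, Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Two-step workflow: search for papers, then summarize their titles."""
    query = arguments["query"]
    # Step 1: retrieve candidate papers (hypothetical tool name).
    papers = call_tool("search_literature", {"query": query, "limit": 3})
    titles = [paper["title"] for paper in papers]
    # Step 2: feed the first step's output into a second tool.
    summary = call_tool("summarize_text", {"text": "\n".join(titles)})
    return {"query": query, "titles": titles, "summary": summary}


def fake_call_tool(name: str, args: Dict[str, Any]) -> Any:
    """Stand-in for the engine's tool dispatcher, for demonstration only."""
    if name == "search_literature":
        return [{"title": f"Paper {i} on {args['query']}"}
                for i in range(args["limit"])]
    if name == "summarize_text":
        return f"{len(args['text'].splitlines())} titles"
    raise KeyError(name)


result = compose({"query": "BRCA1"}, None, fake_call_tool)
print(result["summary"])
```

Passing the dispatcher in as `call_tool` keeps the workflow independent of how tools are loaded, which also makes it easy to test with a stub as shown.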
Directory Quick Reference
Core package: src/tooluniverse/
Tool implementations: the *_tool.py files in the same directory
Tool configurations: src/tooluniverse/data/*.json
Tool collections: src/tooluniverse/toolsets/
Composition scripts: src/tooluniverse/compose_scripts/
MCP and servers: src/tooluniverse/smcp.py, src/tooluniverse/smcp_server.py, and smcp_tooluniverse_server.py in the repository root
External integrations: src/tooluniverse/remote/
Visualization and graphs: src/tooluniverse/scripts/, src/tooluniverse/tool_graph_web_ui.py
Temporary/cache output: the user cache directory (macOS: ~/Library/Caches/ToolUniverse, Linux: ~/.cache/tooluniverse, Windows: %LOCALAPPDATA%\ToolUniverse\Cache)
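The per-platform cache locations listed above can be resolved with a short helper. This is a sketch, not the resolution logic used by ToolUniverse: `cache_dir` is a hypothetical function, and the fallback to `AppData/Local` under the home directory when `LOCALAPPDATA` is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(platform: str = sys.platform,
              home: Optional[Path] = None,
              env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache directory for the given platform identifier."""
    home = Path.home() if home is None else Path(home)
    env = os.environ if env is None else env
    if platform == "darwin":  # macOS
        return home / "Library" / "Caches" / "ToolUniverse"
    if platform.startswith("win"):  # Windows
        local_app_data = env.get("LOCALAPPDATA")
        # Assumption: fall back to the conventional location if unset.
        base = Path(local_app_data) if local_app_data else home / "AppData" / "Local"
        return base / "ToolUniverse" / "Cache"
    return home / ".cache" / "tooluniverse"  # Linux and other platforms


print(cache_dir())
```

The `platform`, `home`, and `env` parameters default to the running system but can be overridden, which makes the function straightforward to test for all three platforms from one machine.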
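Summary
ToolUniverse provides a complete ecosystem, from tool discovery and execution to remote integration (MCP), built on a clear registration mechanism, standardized JSON configuration, and a broad set of tool modules. New data sources or capabilities can be added quickly by adding modules and configuration, without modifying the engine. The composition and visualization tools support building interpretable, reusable scientific workflows.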