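ToolUniverse Architecture

Overview

ToolUniverse uses a modular, registry-based architecture centered on a single ToolUniverse engine. Through tool registration, configuration, and automatic discovery, it connects to a large number of scientific databases and APIs and gives agents, applications, and MCP clients a consistent interface.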

┌────────────────────┐
│ Applications/Agents│  ← Your business logic, conversational systems, scripts
└──────────┬─────────┘
           │ Python API/MCP
┌──────────▼─────────┐
│  ToolUniverse Core │  ← Tool loading, registration, routing, execution
└──────────┬─────────┘
           │ Registry/Config
┌──────────▼─────────┐
│ Tool Implementation│  ← OpenFDA, OpenTargets, UniProt, PubChem, GWAS...
│     Modules        │
└──────────┬─────────┘
           │ HTTP/GraphQL/Local
┌──────────▼─────────┐
│External Services/DB│
└────────────────────┘

Repository Structure Tree

ToolUniverse/
├── src/tooluniverse/                          # Core package directory
│   ├── __init__.py                           # Package exports, lazy loading control
│   │
│   ├── # Core Engine & Registry
│   ├── execute_function.py                   # ToolUniverse main engine class
│   ├── base_tool.py                         # BaseTool base class & exceptions
│   ├── tool_registry.py                     # Tool registration & discovery
│   ├── default_config.py                    # Default tool file configurations
│   ├── logging_config.py                    # Logging setup
│   ├── utils.py                             # Utility functions
│   │
│   ├── # Tool Implementation Modules
│   ├── openfda_tool.py                      # FDA drug labels & data
│   ├── openfda_adv_tool.py                  # FDA adverse events
│   ├── ctg_tool.py                          # ClinicalTrials.gov
│   ├── graphql_tool.py                      # OpenTargets GraphQL APIs
│   ├── uniprot_tool.py                      # UniProt protein database
│   ├── pubchem_tool.py                      # PubChem chemical database
│   ├── reactome_tool.py                     # Reactome pathway database
│   ├── europe_pmc_tool.py                   # Europe PMC literature
│   ├── semantic_scholar_tool.py             # Semantic Scholar papers
│   ├── gwas_tool.py                         # GWAS Catalog genetics
│   ├── hpa_tool.py                          # Human Protein Atlas
│   ├── rcsb_pdb_tool.py                     # Protein Data Bank
│   ├── medlineplus_tool.py                  # MedlinePlus health info
│   ├── restful_tool.py                      # Generic REST APIs (Monarch)
│   ├── url_tool.py                          # Web scraping & PDF extraction
│   ├── pubtator_tool.py                     # PubTator literature mining
│   ├── xml_tool.py                          # XML data processing
│   ├── admetai_tool.py                      # ADMET AI predictions
│   ├── alphafold_tool.py                    # AlphaFold protein structures
│   ├── chem_tool.py                         # ChEMBL chemical bioactivity
│   ├── compose_tool.py                      # Tool composition & workflows
│   ├── package_tool.py                      # Local package tools
│   ├── dataset_tool.py                      # Local dataset access
│   ├── mcp_client_tool.py                   # MCP client for remote tools
│   ├── remote_tool.py                       # Remote tool abstractions
│   ├── agentic_tool.py                      # Agentic behavior tools
│   ├── enrichr_tool.py                      # Enrichr gene set analysis
│   ├── efo_tool.py                          # Experimental Factor Ontology
│   ├── gene_ontology_tool.py                # Gene Ontology
│   ├── humanbase_tool.py                    # HumanBase networks
│   ├── dailymed_tool.py                     # DailyMed drug labels
│   ├── uspto_tool.py                        # USPTO patent data
│   ├── uspto_downloader_tool.py             # USPTO bulk downloads
│   ├── openalex_tool.py                     # OpenAlex scholarly data
│   ├── boltz_tool.py                        # Boltz protein folding
│   │
│   ├── # Tool Discovery & Search
│   ├── tool_finder_keyword.py               # Keyword-based tool search
│   ├── tool_finder_embedding.py             # Embedding-based tool search
│   ├── tool_finder_llm.py                   # LLM-powered tool discovery
│   ├── embedding_database.py                # Tool embedding database
│   ├── embedding_sync.py                    # Embedding synchronization
│   │
│   ├── # MCP Integration & Servers
│   ├── smcp.py                              # FastMCP wrapper (SMCP class)
│   ├── smcp_server.py                       # MCP server entry points
│   ├── mcp_integration.py                   # ToolUniverse MCP methods injection
│   ├── mcp_tool_registry.py                 # MCP tool registry & URLs
│   │
│   ├── # Configuration & Data
│   ├── data/                                # Tool configurations
│   │   ├── *.json                          # Tool instance definitions
│   │   ├── packages/                       # Package-related configs
│   │   └── remote_tools/                   # Remote/MCP tool definitions
│   │
│   ├── # Tool Collections & Workflows
│   ├── toolsets/                           # Organized tool collections
│   │   ├── bioinformatics/                # Bioinformatics toolset
│   │   ├── research/                      # Research toolset
│   │   └── software_dev/                  # Software development tools
│   │
│   ├── compose_scripts/                    # Workflow composition scripts
│   │   ├── __init__.py
│   │   ├── biomarker_discovery.py         # Biomarker discovery workflow
│   │   ├── comprehensive_drug_discovery.py # Drug discovery pipeline
│   │   ├── drug_safety_analyzer.py        # Drug safety analysis
│   │   ├── literature_tool.py             # Literature analysis
│   │   ├── output_summarizer.py           # Result summarization
│   │   ├── tool_description_optimizer.py  # Tool description optimization
│   │   ├── tool_discover.py               # Tool discovery workflows
│   │   └── tool_graph_composer.py         # Tool graph composition
│   │
│   ├── # External Integrations & Examples
│   ├── remote/                             # External system integrations
│   │   ├── expert_feedback/               # Human expert feedback system
│   │   ├── expert_feedback_mcp/           # MCP-enabled expert feedback
│   │   ├── boltz/                         # Boltz integration
│   │   ├── depmap_24q2/                   # DepMap data integration
│   │   ├── immune_compass/                # Immune system tools
│   │   ├── pinnacle/                      # Pinnacle integration
│   │   ├── transcriptformer/              # Transcriptformer model
│   │   └── uspto_downloader/              # USPTO downloader service
│   │
│   ├── # Visualization & UI
│   ├── scripts/                           # Utility scripts
│   │   ├── generate_tool_graph.py         # Tool graph generation
│   │   └── visualize_tool_graph.py        # Tool graph visualization
│   ├── tool_graph_web_ui.py               # Web-based tool graph UI
│   │
│   ├── # Configuration Templates
│   ├── template/                          # Configuration templates
│   │   ├── file_save_hook_config.json     # File save hook template
│   │   └── hook_config.json               # General hook template
│   │
│   ├── # Output Processing
│   ├── output_hook.py                     # Output processing hooks
│   ├── extended_hooks.py                  # Extended hook functionality
│   │
│   └── # Testing
│       └── test/                          # Unit & integration tests
│           ├── *.py                       # Test modules
│           ├── *.xml                      # Test data
│           └── *.parquet                  # Test datasets
├── # Documentation
├── docs/                                  # Sphinx documentation
│   ├── _build/                           # Built documentation
│   ├── _static/                          # Static assets
│   ├── _templates/                       # Doc templates
│   ├── api/                              # API documentation
│   ├── expand_tooluniverse/              # Extension guides
│   ├── guide/                            # User guides
│   ├── reference/                        # Reference docs
│   ├── tutorials/                        # Tutorials
│   └── *.rst                             # Documentation source
├── # Root-level Files
├── pyproject.toml                        # Project config, dependencies, CLI
├── smcp_tooluniverse_server.py          # Simplified MCP server launcher
├── README.md                             # Project overview
├── README_USAGE.md                       # Usage documentation
├── LICENSE                               # License file
├── uv.lock                              # UV lock file
├── # Build & Meta
├── build_docs.sh                        # Documentation build script
├── internal/                            # Internal data & utilities
├── img/                                 # Images & assets
└── generated_tool_*                     # Generated tool files

Core Components

Engine and Registry

  • execute_function.py: Core ToolUniverse engine class responsible for:

      • Reading tool configurations (local JSON, default configs) and building all_tools/all_tool_dict

      • Mapping tool types to concrete classes (tool_type_mappings) and instantiating them

      • Tool execution routing (run_tool), validation, and result processing

      • Handling MCP auto-loaders and temporary clients (with mcp_integration.py)

  • base_tool.py: BaseTool base class and exception types. Supports:

      • Loading default configurations from the tooluniverse.data package

      • Parameter validation, required parameter extraction, function call validation

  • tool_registry.py: Tool registration and discovery:

      • @register_tool decorator for registering tool classes

      • Lazy loading registry (on-demand module imports) and full discovery

      • Smart matching of configuration JSON to modules and tool types

  • default_config.py: List of default tool configuration files

  • logging_config.py, utils.py: Logging setup and utility functions
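
The lazy-loading idea (on-demand module imports) can be illustrated with a minimal sketch. This is not the code in tool_registry.py: LazyRegistry and the "FractionTool" mapping are hypothetical names, and a standard-library class stands in for a tool class so the example runs without ToolUniverse installed.

```python
import importlib


class LazyRegistry:
    """Hypothetical registry: maps a tool type to (module, class name)
    and imports the module only when the type is first requested."""

    def __init__(self, lazy_map):
        self._lazy_map = dict(lazy_map)
        self._loaded = {}

    def get(self, tool_type):
        if tool_type not in self._loaded:
            module_name, class_name = self._lazy_map[tool_type]
            module = importlib.import_module(module_name)  # deferred import
            self._loaded[tool_type] = getattr(module, class_name)
        return self._loaded[tool_type]


# A standard-library class stands in for a real tool class (assumption).
registry = LazyRegistry({"FractionTool": ("fractions", "Fraction")})
fraction_cls = registry.get("FractionTool")
print(fraction_cls(1, 2) + fraction_cls(1, 3))
```

Deferring the import keeps startup cheap when only a few of the many tool modules are actually used in a session.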

Tool Implementation Classes

Available tool classes (in alphabetical order):

ADMETAITool, AgenticTool, AlphaFoldRESTTool, BoltzTool, ChEMBLTool, ClinicalTrialsDetailsTool, ClinicalTrialsSearchTool, ComposeTool, DatasetTool, DiseaseTargetScoreTool, EFOTool, EmbeddingDatabase, EmbeddingSync, EnrichrTool, EuropePMCTool, FDACountAdditiveReactionsTool, FDADrugAdverseEventTool, FDADrugLabelGetDrugGenericNameTool, FDADrugLabelSearchIDTool, FDADrugLabelSearchTool, FDADrugLabelTool, GWASAssociationByID, GWASAssociationSearch, GWASAssociationsForSNP, GWASAssociationsForStudy, GWASAssociationsForTrait, GWASSNPByID, GWASSNPSearch, GWASSNPsForGene, GWASStudiesForTrait, GWASStudyByID, GWASStudySearch, GWASVariantsForTrait, GeneOntologyTool, GetSPLBySetIDTool, HPAGetGeneJSONTool, HPAGetGeneXMLTool, HumanBaseTool, MCPAutoLoaderTool, MCPClientTool, MedlinePlusRESTTool, MonarchDiseasesForMultiplePhenoTool, MonarchTool, OpenAlexTool, OpentargetGeneticsTool, OpentargetTool, OpentargetToolDrugNameMatch, PackageTool, PubChemRESTTool, PubTatorTool, RCSBTool, ReactomeRESTTool, RemoteTool, SearchSPLTool, SemanticScholarTool, ToolFinderEmbedding, ToolFinderKeyword, ToolFinderLLM, URLHTMLTagTool, URLToPDFTextTool, USPTODownloaderTool, USPTOOpenDataPortalTool, UniProtRESTTool, XMLDatasetTool

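Data and Configuration

  • data/*.json: Tool configuration manifests, one per data source or category

  • data/packages/*: Package-related extension configurations

  • data/remote_tools/*: Remote tool/MCP definitions

  • toolsets/: Tool collections organized by scenario (bioinformatics/, research/, software_dev/)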

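MCP Integration and Servers

  • smcp.py: Wrapper around FastMCP that provides SMCP and create_smcp_server

  • smcp_server.py: Wraps the MCP server entry points (exposed as CLI commands through pyproject.toml)

  • mcp_integration.py: Injects the load_mcp_tools and discover_mcp_tools methods into ToolUniverse

  • mcp_tool_registry.py: MCP tool registry for URLs and tool discovery

  • smcp_tooluniverse_server.py in the repository root: Simplified launcher script for starting a server locally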

External Ecosystem and Extension Examples

  • remote/: External system integrations including:

      • expert_feedback/: Human expert feedback system

      • expert_feedback_mcp/: MCP-enabled expert feedback

      • boltz/: Boltz protein folding integration

      • depmap_24q2/: DepMap cancer dependency data integration

      • immune_compass/: Immune system analysis tools

      • pinnacle/: Pinnacle platform integration

      • transcriptformer/: Transcriptformer model integration

      • uspto_downloader/: USPTO patent downloader service

Execution Flow (from Configuration to Invocation)

  1. Configuration Loading

      • Engine startup reads default_tool_files and data/*.json to build the tool manifest

      • Each JSON entry defines a tool instance: name, type, description, parameter (JSON Schema), endpoints, etc.

  2. Tool Registration & Mapping

      • tool_registry.py maintains "tool type → tool class" mappings

      • Supports both full import discovery and lazy loading mappings (smart config-to-module matching)

  3. Instantiation & Default Configuration

      • Based on type, finds the corresponding class (e.g., FDADrugLabelTool)

      • Merges BaseTool default configurations with entry-specific config

  4. Execution & Validation

      • ToolUniverse.tools.tool_name(**params):

          • Locates the instance by name → validates parameters (required fields) → calls the concrete implementation

          • Unified error handling and return structure

  5. Composition/Discovery & Graphs

      • Use compose_tool.py or compose_scripts/ for orchestration

      • Leverage tool_finder_* (keyword/embedding/LLM) for tool retrieval

      • Visualize tool relationships and call chains via scripts or tool_graph_web_ui.py
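
Steps 1 through 4 can be sketched end to end in a few lines. This is a toy model under stated assumptions, not the engine in execute_function.py: MiniEngine, UppercaseTool, the BaseTool stand-in, and the run_tool(name, arguments) signature are all hypothetical. Only the attribute names all_tools, all_tool_dict, and tool_type_mappings are borrowed from the description above.

```python
import json


class BaseTool:
    """Hypothetical stand-in for the BaseTool class in base_tool.py."""

    def __init__(self, tool_config):
        self.tool_config = tool_config

    def required_parameters(self):
        # Collect parameters flagged "required": true in the JSON entry.
        properties = self.tool_config.get("parameter", {}).get("properties", {})
        return [name for name, spec in properties.items() if spec.get("required")]

    def run(self, arguments):
        raise NotImplementedError


class UppercaseTool(BaseTool):
    """Toy tool used only for this illustration."""

    def run(self, arguments):
        return {"result": arguments["text"].upper()}


class MiniEngine:
    def __init__(self, manifest_json, tool_type_mappings):
        # Step 1: configuration loading.
        self.all_tools = json.loads(manifest_json)
        self.all_tool_dict = {tool["name"]: tool for tool in self.all_tools}
        # Step 2: "tool type -> tool class" mapping.
        self.tool_type_mappings = tool_type_mappings
        self._instances = {}

    def run_tool(self, name, arguments):
        if name not in self.all_tool_dict:
            return {"error": f"Unknown tool: {name}"}
        # Step 3: instantiate on first use.
        if name not in self._instances:
            config = self.all_tool_dict[name]
            self._instances[name] = self.tool_type_mappings[config["type"]](config)
        tool = self._instances[name]
        # Step 4: validate required parameters, then execute.
        missing = [p for p in tool.required_parameters() if p not in arguments]
        if missing:
            return {"error": f"Missing required parameters: {missing}"}
        return tool.run(arguments)


MANIFEST = json.dumps([
    {
        "name": "shout",
        "type": "UppercaseTool",
        "description": "Upper-case a string",
        "parameter": {
            "type": "object",
            "properties": {"text": {"type": "string", "required": True}},
        },
    }
])

engine = MiniEngine(MANIFEST, {"UppercaseTool": UppercaseTool})
print(engine.run_tool("shout", {"text": "brca1"}))
print(engine.run_tool("shout", {}))
```

Returning an error dictionary instead of raising is one way to get the "unified error handling and return structure" mentioned in step 4; whether the real engine does exactly this is an assumption.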

MCP Integration

Server side:

  • smcp.py provides the SMCP object for one-click exposure of all ToolUniverse tools

  • smcp_server.py and the root-level smcp_tooluniverse_server.py provide convenient startup

  • pyproject.toml exposes commands: tooluniverse-smcp, tooluniverse-smcp-stdio, tooluniverse-smcp-server, etc.

Client/remote tools:

  • mcp_client_tool.py and mcp_integration.py support discovery and dynamic registration from remote MCP servers

  • MCPAutoLoaderTool auto-discovers and batch-registers the remote tools at a URL, with a configurable prefix and timeout

  • list_mcp_connections() shows the loaded remote connections and their tool counts
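
The prefix-based batch registration can be pictured with a small offline sketch. Everything here is hypothetical: register_remote_tools, the remote_name field, and the hard-coded discovered list are illustration only. The real MCPAutoLoaderTool talks to an MCP server and also applies a timeout, which this sketch leaves out.

```python
def register_remote_tools(all_tool_dict, discovered, prefix):
    """Add discovered remote tool descriptions to a local manifest
    under prefixed names and return the names that were added."""
    added = []
    for tool in discovered:
        local_name = f"{prefix}{tool['name']}"
        if local_name in all_tool_dict:
            continue  # keep the existing entry instead of overwriting it
        all_tool_dict[local_name] = {
            **tool,
            "name": local_name,
            "remote_name": tool["name"],  # hypothetical field for routing back
        }
        added.append(local_name)
    return added


# Hard-coded stand-in for what a remote server might report (assumption).
discovered = [
    {"name": "predict_structure", "description": "Toy remote tool"},
    {"name": "score_ligand", "description": "Toy remote tool"},
]
manifest = {}
print(register_remote_tools(manifest, discovered, prefix="mcp_"))
```

A prefix keeps remote names from colliding with local tools that happen to share a name.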

Configuration and Data Conventions

Tool configuration structure (data/*.json files):

{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "description": "Get generic name for an FDA drug label",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  },
  "endpoint": "https://api.fda.gov/drug/label.json",
  "method": "GET"
}
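
A quick structural check of such an entry might look like the sketch below. check_tool_entry and the EXPECTED_KEYS tuple are hypothetical helpers, and the choice of which keys count as mandatory is an assumption. It is not ToolUniverse's own validation.

```python
import json

ENTRY_JSON = """
{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "description": "Get generic name for an FDA drug label",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  },
  "endpoint": "https://api.fda.gov/drug/label.json",
  "method": "GET"
}
"""

# Assumed minimum set of keys; the real loader may require more or fewer.
EXPECTED_KEYS = ("name", "type", "description", "parameter")


def check_tool_entry(entry):
    """Return a list of problems; an empty list means the entry looks sane."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in entry]
    parameter = entry.get("parameter")
    if isinstance(parameter, dict):
        if parameter.get("type") != "object":
            problems.append("parameter.type should be 'object'")
        for name, spec in parameter.get("properties", {}).items():
            if "type" not in spec:
                problems.append(f"property {name!r} has no type")
    return problems


entry = json.loads(ENTRY_JSON)
print(check_tool_entry(entry))
print(check_tool_entry({"name": "incomplete"}))
```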

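Naming and mapping conventions:

  • *_tools.json usually corresponds to a *_tool.py module

  • tool_registry.py performs smart matching

  • @register_tool can be used at class definition time for explicit registration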
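
The *_tools.json to *_tool.py convention amounts to a simple name rewrite. The helper below is a hypothetical sketch of that one rule, not the smart matching in tool_registry.py, and the file names in the usage lines are made up to fit the pattern.

```python
from pathlib import PurePosixPath
from typing import Optional


def guess_module_name(config_path: str) -> Optional[str]:
    """Map '<name>_tools.json' to '<name>_tool'; return None if the
    file name does not follow the convention."""
    stem = PurePosixPath(config_path).stem  # file name without ".json"
    suffix = "_tools"
    if stem.endswith(suffix) and len(stem) > len(suffix):
        return stem[: -len(suffix)] + "_tool"
    return None


# Illustrative file names only (assumption).
print(guess_module_name("src/tooluniverse/data/uniprot_tools.json"))
print(guess_module_name("src/tooluniverse/data/special.json"))
```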

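Extension Points

Adding a new data source tool:

  1. Create an xxx_tool.py file under src/tooluniverse/ with a class that inherits from BaseTool

  2. Register it with @register_tool('YourToolType'), or rely on the naming convention

  3. Add one or more tool entries to data/xxx_tools.json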

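Integrating remote MCP tools:

  • Use MCPAutoLoaderTool with a server URL for automatic discovery

  • Or use ToolUniverse.load_mcp_tools([…]) for dynamic loading at runtime

Composition and workflows:

  • For complex call chains, use compose_tool.py or add scripts to compose_scripts/

  • Use tool_finder_* to assist with retrieval and routing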
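
A composition script is essentially a function that chains tool calls and passes results along. The sketch below assumes a compose(arguments, call_tool) shape and two made-up tool names; the actual signature expected by compose_tool.py may differ, and fake_call_tool replaces the engine so the example runs offline.

```python
def compose(arguments, call_tool):
    """Chain two tool calls: search first, then summarize the hits."""
    hits = call_tool("literature_search", {"query": arguments["topic"]})
    summary = call_tool("summarize_titles", {"titles": hits["titles"]})
    return {
        "topic": arguments["topic"],
        "n_papers": len(hits["titles"]),
        "summary": summary["text"],
    }


def fake_call_tool(name, args):
    """Offline stand-in for the engine's tool dispatch (hypothetical tools)."""
    if name == "literature_search":
        return {"titles": [f"{args['query']} paper A", f"{args['query']} paper B"]}
    if name == "summarize_titles":
        return {"text": "; ".join(args["titles"])}
    raise KeyError(name)


print(compose({"topic": "EGFR"}, fake_call_tool))
```

Passing call_tool in as a parameter keeps the workflow independent of how tools are located or executed, which also makes it easy to test with fakes.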

Directory Quick Reference

  • Core package: src/tooluniverse/

  • Tool implementations: the *_tool.py files in the same directory

  • Tool configurations: src/tooluniverse/data/*.json

  • Tool collections: src/tooluniverse/toolsets/

  • Composition scripts: src/tooluniverse/compose_scripts/

  • MCP and servers: src/tooluniverse/smcp.py, src/tooluniverse/smcp_server.py, and smcp_tooluniverse_server.py in the repository root

  • External integrations: src/tooluniverse/remote/

  • Visualization and graphs: src/tooluniverse/scripts/, src/tooluniverse/tool_graph_web_ui.py

  • Temporary/cache output: user cache directory (macOS: ~/Library/Caches/ToolUniverse, Linux: ~/.cache/tooluniverse, Windows: %LOCALAPPDATA%\ToolUniverse\Cache)
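
The per-platform cache locations listed above could be resolved along these lines. tooluniverse_cache_dir is a hypothetical helper written for this page, not a function exported by the package, and the fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path, PureWindowsPath


def tooluniverse_cache_dir(platform: str = sys.platform) -> str:
    """Return the cache directory for the given sys.platform value."""
    if platform == "darwin":
        return str(Path.home() / "Library" / "Caches" / "ToolUniverse")
    if platform.startswith("win"):
        # Fallback when LOCALAPPDATA is unset is an assumption.
        local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
        return str(PureWindowsPath(local) / "ToolUniverse" / "Cache")
    # Linux and other POSIX systems.
    return str(Path.home() / ".cache" / "tooluniverse")


print(tooluniverse_cache_dir())
```

Taking the platform as a parameter makes each branch testable from any operating system.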

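Summary

With a clear registration mechanism, standardized JSON configuration, and a broad set of tool modules, ToolUniverse provides a complete ecosystem that spans tool discovery, execution, and remote integration (MCP). You can add new data sources or capabilities quickly by adding modules and configuration, without modifying the engine. The composition and visualization tools support building interpretable, reusable scientific workflows.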