ToolUniverse Architecture

This document provides a comprehensive overview of ToolUniverse’s code architecture, directory organization, core components, tool discovery/execution flow, MCP integration, and extension points.

Overview

ToolUniverse follows a modular, registry-based architecture centered on the unified ToolUniverse engine. It connects to a large collection of scientific databases and APIs through tool registration, configuration, and auto-discovery, and provides a consistent interface for upper-layer agents, applications, and MCP clients.

┌────────────────────┐
│ Applications/Agents│  ← Your business logic, conversational systems, scripts
└──────────┬─────────┘
           │ SDK/MCP
┌──────────▼─────────┐
│  ToolUniverse Core │  ← Tool loading, registration, routing, execution
└──────────┬─────────┘
           │ Registry/Config
┌──────────▼─────────┐
│ Tool Implementation│  ← OpenFDA, OpenTargets, UniProt, PubChem, GWAS...
│     Modules        │
└──────────┬─────────┘
           │ HTTP/GraphQL/Local
┌──────────▼─────────┐
│External Services/DB│
└────────────────────┘

Repository Structure Tree

ToolUniverse/
├── src/tooluniverse/                         # Core package directory
│   ├── __init__.py                           # Package exports, lazy loading control
│   │
│   ├── # Core Engine & Registry
│   ├── execute_function.py                   # ToolUniverse main engine class
│   ├── base_tool.py                          # BaseTool base class & exceptions
│   ├── tool_registry.py                      # Tool registration & discovery
│   ├── default_config.py                     # Default tool file configurations
│   ├── logging_config.py                     # Logging setup
│   └── utils.py                              # Utility functions
│   │
│   ├── # Tool Implementation Modules
│   ├── openfda_tool.py                       # FDA drug labels & data
│   ├── openfda_adv_tool.py                   # FDA adverse events
│   ├── ctg_tool.py                           # ClinicalTrials.gov
│   ├── graphql_tool.py                       # OpenTargets GraphQL APIs
│   ├── uniprot_tool.py                       # UniProt protein database
│   ├── pubchem_tool.py                       # PubChem chemical database
│   ├── reactome_tool.py                      # Reactome pathway database
│   ├── europe_pmc_tool.py                    # Europe PMC literature
│   ├── semantic_scholar_tool.py              # Semantic Scholar papers
│   ├── gwas_tool.py                          # GWAS Catalog genetics
│   ├── hpa_tool.py                           # Human Protein Atlas
│   ├── rcsb_pdb_tool.py                      # Protein Data Bank
│   ├── medlineplus_tool.py                   # MedlinePlus health info
│   ├── restful_tool.py                       # Generic REST APIs (Monarch)
│   ├── url_tool.py                           # Web scraping & PDF extraction
│   ├── pubtator_tool.py                      # PubTator literature mining
│   ├── xml_tool.py                           # XML data processing
│   ├── admetai_tool.py                       # ADMET AI predictions
│   ├── alphafold_tool.py                     # AlphaFold protein structures
│   ├── chem_tool.py                          # ChEMBL chemical bioactivity
│   ├── compose_tool.py                       # Tool composition & workflows
│   ├── package_tool.py                       # Local package tools
│   ├── dataset_tool.py                       # Local dataset access
│   ├── mcp_client_tool.py                    # MCP client for remote tools
│   ├── remote_tool.py                        # Remote tool abstractions
│   ├── agentic_tool.py                       # Agentic behavior tools
│   ├── enrichr_tool.py                       # Enrichr gene set analysis
│   ├── efo_tool.py                           # Experimental Factor Ontology
│   ├── gene_ontology_tool.py                 # Gene Ontology
│   ├── humanbase_tool.py                     # HumanBase networks
│   ├── dailymed_tool.py                      # DailyMed drug labels
│   ├── uspto_tool.py                         # USPTO patent data
│   ├── uspto_downloader_tool.py              # USPTO bulk downloads
│   ├── openalex_tool.py                      # OpenAlex scholarly data
│   └── boltz_tool.py                         # Boltz protein folding
│   │
│   ├── # Tool Discovery & Search
│   ├── tool_finder_keyword.py                # Keyword-based tool search
│   ├── tool_finder_embedding.py              # Embedding-based tool search
│   ├── tool_finder_llm.py                    # LLM-powered tool discovery
│   ├── embedding_database.py                 # Tool embedding database
│   └── embedding_sync.py                     # Embedding synchronization
│   │
│   ├── # MCP Integration & Servers
│   ├── smcp.py                               # FastMCP wrapper (SMCP class)
│   ├── smcp_server.py                        # MCP server entry points
│   ├── mcp_integration.py                    # ToolUniverse MCP methods injection
│   └── mcp_tool_registry.py                  # MCP tool registry & URLs
│   │
│   ├── # Configuration & Data
│   ├── data/                                 # Tool configurations
│   │   ├── *.json                            # Tool instance definitions
│   │   ├── packages/                         # Package-related configs
│   │   └── remote_tools/                     # Remote/MCP tool definitions
│   │
│   ├── # Tool Collections & Workflows
│   ├── toolsets/                             # Organized tool collections
│   │   ├── bioinformatics/                   # Bioinformatics toolset
│   │   ├── research/                         # Research toolset
│   │   └── software_dev/                     # Software development tools
│   │
│   ├── compose_scripts/                      # Workflow composition scripts
│   │   ├── __init__.py
│   │   ├── biomarker_discovery.py            # Biomarker discovery workflow
│   │   ├── comprehensive_drug_discovery.py   # Drug discovery pipeline
│   │   ├── drug_safety_analyzer.py           # Drug safety analysis
│   │   ├── literature_tool.py                # Literature analysis
│   │   ├── output_summarizer.py              # Result summarization
│   │   ├── tool_description_optimizer.py     # Tool description optimization
│   │   ├── tool_discover.py                  # Tool discovery workflows
│   │   └── tool_graph_composer.py            # Tool graph composition
│   │
│   ├── # External Integrations & Examples
│   ├── remote/                               # External system integrations
│   │   ├── expert_feedback/                  # Human expert feedback system
│   │   ├── expert_feedback_mcp/              # MCP-enabled expert feedback
│   │   ├── boltz/                            # Boltz integration
│   │   ├── depmap_24q2/                      # DepMap data integration
│   │   ├── immune_compass/                   # Immune system tools
│   │   ├── pinnacle/                         # Pinnacle integration
│   │   ├── transcriptformer/                 # Transcriptformer model
│   │   └── uspto_downloader/                 # USPTO downloader service
│   │
│   ├── # Visualization & UI
│   ├── scripts/                              # Utility scripts
│   │   ├── generate_tool_graph.py            # Tool graph generation
│   │   └── visualize_tool_graph.py           # Tool graph visualization
│   ├── tool_graph_web_ui.py                  # Web-based tool graph UI
│   │
│   ├── # Configuration Templates
│   ├── template/                             # Configuration templates
│   │   ├── file_save_hook_config.json        # File save hook template
│   │   └── hook_config.json                  # General hook template
│   │
│   ├── # Output Processing
│   ├── output_hook.py                        # Output processing hooks
│   ├── extended_hooks.py                     # Extended hook functionality
│   │
│   └── # Testing
│       └── test/                             # Unit & integration tests
│           ├── *.py                          # Test modules
│           ├── *.xml                         # Test data
│           └── *.parquet                     # Test datasets
│
├── # Documentation
├── docs/                                     # Sphinx documentation
│   ├── _build/                               # Built documentation
│   ├── _static/                              # Static assets
│   ├── _templates/                           # Doc templates
│   ├── api/                                  # API documentation
│   ├── expand_tooluniverse/                  # Extension guides
│   ├── guide/                                # User guides
│   ├── reference/                            # Reference docs
│   ├── tutorials/                            # Tutorials
│   └── *.rst                                 # Documentation source
│
├── # Root-level Files
├── pyproject.toml                            # Project config, dependencies, CLI
├── smcp_tooluniverse_server.py               # Simplified MCP server launcher
├── README.md                                 # Project overview
├── README_USAGE.md                           # Usage documentation
├── LICENSE                                   # License file
├── uv.lock                                   # UV lock file
│
├── # Build & Meta
├── build_docs.sh                             # Documentation build script
├── internal/                                 # Internal data & utilities
├── img/                                      # Images & assets
└── generated_tool_*                          # Generated tool files

Core Components

Engine & Registry

  • execute_function.py: Core ToolUniverse engine class responsible for:

    • Reading tool configurations (local JSON, default configs) and building all_tools/all_tool_dict

    • Mapping tool types to concrete classes (tool_type_mappings) and instantiation

    • Tool execution routing (run_tool), validation, and result processing

    • Handling MCP auto-loaders, temporary clients (with mcp_integration.py)

  • base_tool.py: BaseTool base class and exception types. Supports:

    • Loading default configurations from tooluniverse.data package

    • Parameter validation, required parameter extraction, function call validation

  • tool_registry.py: Tool registration and discovery:

    • @register_tool decorator for registering tool classes

    • Lazy loading registry (on-demand module imports) and full discovery

    • Smart matching of configuration JSON to modules and tool types

  • default_config.py: Default tool configuration file list

  • logging_config.py, utils.py: Logging setup and utility functions

Tool Implementation Classes

Available tool classes (alphabetically organized):

ADMETAITool, AgenticTool, AlphaFoldRESTTool, BoltzTool, ChEMBLTool, ClinicalTrialsDetailsTool, ClinicalTrialsSearchTool, ComposeTool, DatasetTool, DiseaseTargetScoreTool, EFOTool, EmbeddingDatabase, EmbeddingSync, EnrichrTool, EuropePMCTool, FDACountAdditiveReactionsTool, FDADrugAdverseEventTool, FDADrugLabelGetDrugGenericNameTool, FDADrugLabelSearchIDTool, FDADrugLabelSearchTool, FDADrugLabelTool, GWASAssociationByID, GWASAssociationSearch, GWASAssociationsForSNP, GWASAssociationsForStudy, GWASAssociationsForTrait, GWASSNPByID, GWASSNPSearch, GWASSNPsForGene, GWASStudiesForTrait, GWASStudyByID, GWASStudySearch, GWASVariantsForTrait, GeneOntologyTool, GetSPLBySetIDTool, HPAGetGeneJSONTool, HPAGetGeneXMLTool, HumanBaseTool, MCPAutoLoaderTool, MCPClientTool, MedlinePlusRESTTool, MonarchDiseasesForMultiplePhenoTool, MonarchTool, OpenAlexTool, OpentargetGeneticsTool, OpentargetTool, OpentargetToolDrugNameMatch, PackageTool, PubChemRESTTool, PubTatorTool, RCSBTool, ReactomeRESTTool, RemoteTool, SearchSPLTool, SemanticScholarTool, ToolFinderEmbedding, ToolFinderKeyword, ToolFinderLLM, URLHTMLTagTool, URLToPDFTextTool, USPTODownloaderTool, USPTOOpenDataPortalTool, UniProtRESTTool, XMLDatasetTool

Data & Configuration

  • data/*.json: Tool configuration manifests for each data source or category

  • data/packages/*: Package-related extension configurations

  • data/remote_tools/*: Remote tool/MCP definitions

  • toolsets/: Scenario-organized tool collections (bioinformatics/, research/, software_dev/)

MCP Integration & Servers

  • smcp.py: FastMCP wrapper providing SMCP and create_smcp_server

  • smcp_server.py: Package MCP server entry points (exposed via pyproject.toml CLI)

  • mcp_integration.py: Injects load_mcp_tools, discover_mcp_tools methods into ToolUniverse

  • mcp_tool_registry.py: MCP tool registry for URLs and tool discovery

  • Root smcp_tooluniverse_server.py: Simplified startup script for local quick server startup

External Ecosystem & Extension Examples

  • remote/: External system integrations including:

    • expert_feedback/: Human expert feedback system

    • expert_feedback_mcp/: MCP-enabled expert feedback

    • boltz/: Boltz protein folding integration

    • depmap_24q2/: DepMap cancer dependency data integration

    • immune_compass/: Immune system analysis tools

    • pinnacle/: Pinnacle platform integration

    • transcriptformer/: Transcriptformer model integration

    • uspto_downloader/: USPTO patent downloader service

Execution Flow (Configuration to Invocation)

  1. Configuration Loading

    • Engine startup reads default_tool_files and data/*.json to build tool manifest

    • Each JSON entry defines a tool instance: name, type, description, parameter (JSON Schema), endpoints, etc.

  2. Tool Registration & Mapping

    • tool_registry.py maintains “tool type → tool class” mappings

    • Supports both full import discovery and lazy loading mappings (smart config-to-module matching)

  3. Instantiation & Default Configuration

    • Based on type, finds corresponding class (e.g., FDADrugLabelTool)

    • Merges BaseTool default configurations with entry-specific config

  4. Execution & Validation

    • ToolUniverse.run_tool(tool_name, params): locate instance by name → parameter validation (required fields) → call concrete implementation

    • Unified error handling and return structure

  5. Composition/Discovery & Graphs

    • Use compose_tool.py or compose_scripts/ for orchestration

    • Leverage tool_finder_* (keyword/embedding/LLM) for tool retrieval

    • Visualize tool relationships and call chains via scripts or tool_graph_web_ui.py
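Steps 1 through 4 can be sketched as a small self-contained pipeline. This is an illustration under stated assumptions, not the engine in execute_function.py: EchoTool, the echo_text entry, the run method name, and the {"result": ...} / {"error": ...} return shape are all hypothetical. Only the names all_tools, all_tool_dict, tool_type_mappings, and run_tool are borrowed from the description above.

```python
class EchoTool:
    """Stand-in tool class; a real tool would call an external service."""

    def __init__(self, config):
        self.config = config

    def run(self, arguments):
        return {"echo": arguments["text"]}


# Step 1: configuration entries, as they might look after loading JSON.
all_tools = [
    {
        "name": "echo_text",
        "type": "EchoTool",
        "parameter": {
            "type": "object",
            "properties": {"text": {"type": "string", "required": True}},
        },
    }
]
all_tool_dict = {entry["name"]: entry for entry in all_tools}

# Step 2: tool type -> tool class mapping.
tool_type_mappings = {"EchoTool": EchoTool}


def run_tool(tool_name, params):
    """Locate, validate, instantiate, and call a tool; always return a dict."""
    config = all_tool_dict.get(tool_name)
    if config is None:
        return {"error": f"Tool '{tool_name}' not found"}

    # Required-field check against the entry's parameter schema.
    properties = config["parameter"]["properties"]
    missing = [name for name, spec in properties.items()
               if spec.get("required") and name not in params]
    if missing:
        return {"error": f"Missing required parameters: {missing}"}

    tool = tool_type_mappings[config["type"]](config)  # Step 3: instantiate
    try:
        return {"result": tool.run(params)}            # Step 4: execute
    except Exception as exc:
        # One error shape for every failure inside a tool.
        return {"error": str(exc)}


print(run_tool("echo_text", {"text": "hello"}))
print(run_tool("echo_text", {}))
```

Returning a dictionary on both success and failure means callers such as agents or MCP clients never need tool-specific exception handling.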

MCP Integration

Server Side:

  • smcp.py provides SMCP object for one-click exposure of all ToolUniverse tools

  • smcp_server.py and root smcp_tooluniverse_server.py provide convenient startup

  • pyproject.toml exposes commands: tooluniverse-mcp, tooluniverse-smcp*, etc.

Client/Remote Tools:

  • mcp_client_tool.py, mcp_integration.py support discovery and dynamic registration from remote MCP servers

  • MCPAutoLoaderTool can auto-discover and batch-register remote tools by URL with configurable prefixes and timeouts

  • list_mcp_connections() shows loaded remote connections and tool counts
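The role of a configurable prefix in batch registration can be shown with a small sketch. It does not use MCPAutoLoaderTool or any network call: register_remote_tools, the tool dictionaries, and the "remote" flag are hypothetical, and the remote tool list is hard-coded where a real loader would fetch it from a server URL. The point is only that prefixing remote names keeps them from colliding with local tools.

```python
def register_remote_tools(local_tools, remote_tools, prefix=""):
    """Add remote tool descriptions under prefixed names; return names added."""
    added = []
    for tool in remote_tools:
        name = f"{prefix}{tool['name']}"
        if name in local_tools:
            continue  # keep the existing entry rather than overwrite it
        local_tools[name] = {**tool, "name": name, "remote": True}
        added.append(name)
    return added


# A local tool and a remote tool share the name "search".
local_tools = {"search": {"name": "search"}}
remote_tools = [
    {"name": "search", "description": "Remote search"},
    {"name": "fetch", "description": "Remote fetch"},
]

added = register_remote_tools(local_tools, remote_tools, prefix="mcp_")
print(added)
print(sorted(local_tools))
```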

Configuration & Data Conventions

Tool Configuration Structure (data/*.json files):

{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "description": "Get generic name for an FDA drug label",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  },
  "endpoint": "https://api.fda.gov/drug/label.json",
  "method": "GET"
}
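In the entry above, the required flag sits on the individual property rather than in a top-level "required" array. A minimal sketch of how required parameters might be extracted and checked under that convention follows. The helper names required_parameters and validate_arguments are hypothetical and are not the BaseTool API, and the type check is deliberately coarse (strings only).

```python
import json

# A trimmed copy of the configuration entry shown above.
CONFIG_TEXT = """
{
  "name": "FDADrugLabelGetDrugGenericName",
  "type": "FDADrugLabelGetDrugGenericNameTool",
  "parameter": {
    "type": "object",
    "properties": {
      "drug_name": {"type": "string", "required": true}
    }
  }
}
"""


def required_parameters(config):
    """Return names of properties flagged with "required": true."""
    properties = config["parameter"]["properties"]
    return [name for name, spec in properties.items() if spec.get("required")]


def validate_arguments(config, arguments):
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    properties = config["parameter"]["properties"]
    for name in required_parameters(config):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"parameter {name} should be a string")
    return problems


config = json.loads(CONFIG_TEXT)
print(required_parameters(config))
print(validate_arguments(config, {"drug_name": "aspirin"}))
print(validate_arguments(config, {}))
```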

Naming & Mapping Conventions:

  • *_tools.json typically corresponds to *_tool.py modules

  • tool_registry.py performs smart matching

  • Can use @register_tool for explicit registration at class definition
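The filename convention can be sketched as a simple string transformation. This covers only the typical *_tools.json to *_tool.py case described above; the smart matching in tool_registry.py may handle more cases, and module_name_for_config is a hypothetical helper rather than part of the package.

```python
from pathlib import Path


def module_name_for_config(config_path):
    """Map 'xxx_tools.json' to the module name 'xxx_tool', else None."""
    stem = Path(config_path).stem            # e.g. "xxx_tools"
    if stem.endswith("_tools"):
        return stem[: -len("_tools")] + "_tool"
    return None                              # no conventional match


print(module_name_for_config("data/xxx_tools.json"))
print(module_name_for_config("data/notes.json"))
```

When a configuration file does not follow the convention, explicit registration with @register_tool removes the need for any name matching.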

Extension Points

Adding New Data Source Tools:

  1. Create xxx_tool.py in src/tooluniverse/ inheriting from BaseTool

  2. Use @register_tool('YourToolType') for registration, or rely on naming conventions

  3. Add one or more tool entries in data/xxx_tools.json
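The three steps can be sketched end to end. To keep the example runnable without ToolUniverse installed, register_tool and BaseTool below are simplified stand-ins for the ones provided by tool_registry.py and base_tool.py. The run method name, the YourTool class, and the your_tool_search entry are assumptions made for illustration.

```python
TOOL_REGISTRY = {}  # stand-in for the registry kept by tool_registry.py


def register_tool(tool_type):
    """Stand-in decorator: record the class under its tool type name."""
    def decorator(cls):
        TOOL_REGISTRY[tool_type] = cls
        return cls
    return decorator


class BaseTool:
    """Stand-in base class; the real one lives in base_tool.py."""

    def __init__(self, tool_config):
        self.tool_config = tool_config

    def run(self, arguments):
        raise NotImplementedError


# Steps 1 and 2: a new tool class (would live in xxx_tool.py), registered by type.
@register_tool("YourToolType")
class YourTool(BaseTool):
    def run(self, arguments):
        # A real tool would query its data source here.
        return {"tool": self.tool_config["name"], "query": arguments["query"]}


# Step 3: an entry as it might appear in data/xxx_tools.json.
entry = {
    "name": "your_tool_search",
    "type": "YourToolType",
    "description": "Hypothetical search tool",
    "parameter": {
        "type": "object",
        "properties": {"query": {"type": "string", "required": True}},
    },
}

# The engine resolves the class from the entry's type and instantiates it.
tool = TOOL_REGISTRY[entry["type"]](entry)
print(tool.run({"query": "BRCA1"}))
```

Because the entry's "type" is the only link between configuration and code, several JSON entries can share one class and differ only in their configuration.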

Integrating Remote MCP Tools:

  • Use MCPAutoLoaderTool with server URL for auto-discovery

  • Or use ToolUniverse.load_mcp_tools([…]) for runtime dynamic loading

Composition & Workflows:

  • Use compose_tool.py or add scripts in compose_scripts/ for complex call chains

  • Leverage tool_finder_* for retrieval and routing assistance
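A call chain of the kind a composition script expresses might look like the sketch below. The compose signature, the call_tool callable, and both tool names (lookup_record, summarize_text) are hypothetical; fake_call_tool replaces the engine so the example runs on its own.

```python
def compose(arguments, call_tool):
    """Hypothetical two-step workflow: look up a record, then summarize it."""
    record = call_tool("lookup_record", {"id": arguments["id"]})
    summary = call_tool("summarize_text", {"text": record["text"]})
    return {"id": arguments["id"], "summary": summary["summary"]}


def fake_call_tool(name, params):
    """Stand-in for the engine: dispatch to two tiny in-memory tools."""
    tools = {
        "lookup_record": lambda p: {"text": f"Record {p['id']} describes a protein."},
        "summarize_text": lambda p: {"summary": p["text"].split(".")[0]},
    }
    return tools[name](params)


print(compose({"id": "P12345"}, fake_call_tool))
```

Passing the tool-calling function in as a parameter keeps the workflow independent of how tools are located and executed, so the same script can be tested with stand-ins and run against the real engine.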

Directory Quick Reference

  • Core Package: src/tooluniverse/

  • Tool Implementations: Various *_tool.py files in same directory

  • Tool Configurations: src/tooluniverse/data/*.json

  • Tool Collections: src/tooluniverse/toolsets/

  • Composition Scripts: src/tooluniverse/compose_scripts/

  • MCP & Servers: src/tooluniverse/smcp.py, src/tooluniverse/smcp_server.py, root smcp_tooluniverse_server.py

  • External Integrations: src/tooluniverse/remote/

  • Visualization & Graphs: src/tooluniverse/scripts/, src/tooluniverse/tool_graph_web_ui.py

  • Tests: src/tooluniverse/test/

Summary

ToolUniverse provides a complete ecosystem, from tool discovery and execution to remote integration (MCP), through clear registry mechanisms, standardized JSON configurations, and rich tool modules. You can add new data sources or capabilities by adding modules and configurations without modifying the engine. The composition and visualization tools enable building explainable, reusable scientific workflows.