ToolUniverse Architecture
This document describes ToolUniverse's code architecture: directory organization, core components, the tool discovery and execution flow, MCP integration, and extension points.
Overview
ToolUniverse follows a modular, registry-based architecture centered on the unified ToolUniverse engine. It connects to a large number of scientific databases and APIs through tool registration, configuration, and auto-discovery, and provides a consistent interface for upper-layer agents, applications, and MCP clients.
┌────────────────────┐
│ Applications/Agents│  ← Your business logic, conversational systems, scripts
└──────────┬─────────┘
           │ SDK/MCP
┌──────────▼─────────┐
│ ToolUniverse Core  │  ← Tool loading, registration, routing, execution
└──────────┬─────────┘
           │ Registry/Config
┌──────────▼─────────┐
│ Tool Implementation│  ← OpenFDA, OpenTargets, UniProt, PubChem, GWAS...
│      Modules       │
└──────────┬─────────┘
           │ HTTP/GraphQL/Local
┌──────────▼─────────┐
│External Services/DB│
└────────────────────┘
Repository Structure Tree
ToolUniverse/
├── src/tooluniverse/  # Core package directory
│   ├── __init__.py  # Package exports, lazy loading control
│   │
│   ├── # Core Engine & Registry
│   ├── execute_function.py  # ToolUniverse main engine class
│   ├── base_tool.py  # BaseTool base class & exceptions
│   ├── tool_registry.py  # Tool registration & discovery
│   ├── default_config.py  # Default tool file configurations
│   ├── logging_config.py  # Logging setup
│   ├── utils.py  # Utility functions
│   │
│   ├── # Tool Implementation Modules
│   ├── openfda_tool.py  # FDA drug labels & data
│   ├── openfda_adv_tool.py  # FDA adverse events
│   ├── ctg_tool.py  # ClinicalTrials.gov
│   ├── graphql_tool.py  # OpenTargets GraphQL APIs
│   ├── uniprot_tool.py  # UniProt protein database
│   ├── pubchem_tool.py  # PubChem chemical database
│   ├── reactome_tool.py  # Reactome pathway database
│   ├── europe_pmc_tool.py  # Europe PMC literature
│   ├── semantic_scholar_tool.py  # Semantic Scholar papers
│   ├── gwas_tool.py  # GWAS Catalog genetics
│   ├── hpa_tool.py  # Human Protein Atlas
│   ├── rcsb_pdb_tool.py  # Protein Data Bank
│   ├── medlineplus_tool.py  # MedlinePlus health info
│   ├── restful_tool.py  # Generic REST APIs (Monarch)
│   ├── url_tool.py  # Web scraping & PDF extraction
│   ├── pubtator_tool.py  # PubTator literature mining
│   ├── xml_tool.py  # XML data processing
│   ├── admetai_tool.py  # ADMET AI predictions
│   ├── alphafold_tool.py  # AlphaFold protein structures
│   ├── chem_tool.py  # ChEMBL chemical bioactivity
│   ├── compose_tool.py  # Tool composition & workflows
│   ├── package_tool.py  # Local package tools
│   ├── dataset_tool.py  # Local dataset access
│   ├── mcp_client_tool.py  # MCP client for remote tools
│   ├── remote_tool.py  # Remote tool abstractions
│   ├── agentic_tool.py  # Agentic behavior tools
│   ├── enrichr_tool.py  # Enrichr gene set analysis
│   ├── efo_tool.py  # Experimental Factor Ontology
│   ├── gene_ontology_tool.py  # Gene Ontology
│   ├── humanbase_tool.py  # HumanBase networks
│   ├── dailymed_tool.py  # DailyMed drug labels
│   ├── uspto_tool.py  # USPTO patent data
│   ├── uspto_downloader_tool.py  # USPTO bulk downloads
│   ├── openalex_tool.py  # OpenAlex scholarly data
│   ├── boltz_tool.py  # Boltz protein folding
│   │
│   ├── # Tool Discovery & Search
│   ├── tool_finder_keyword.py  # Keyword-based tool search
│   ├── tool_finder_embedding.py  # Embedding-based tool search
│   ├── tool_finder_llm.py  # LLM-powered tool discovery
│   ├── embedding_database.py  # Tool embedding database
│   ├── embedding_sync.py  # Embedding synchronization
│   │
│   ├── # MCP Integration & Servers
│   ├── smcp.py  # FastMCP wrapper (SMCP class)
│   ├── smcp_server.py  # MCP server entry points
│   ├── mcp_integration.py  # ToolUniverse MCP methods injection
│   ├── mcp_tool_registry.py  # MCP tool registry & URLs
│   │
│   ├── # Configuration & Data
│   ├── data/  # Tool configurations
│   │   ├── *.json  # Tool instance definitions
│   │   ├── packages/  # Package-related configs
│   │   └── remote_tools/  # Remote/MCP tool definitions
│   │
│   ├── # Tool Collections & Workflows
│   ├── toolsets/  # Organized tool collections
│   │   ├── bioinformatics/  # Bioinformatics toolset
│   │   ├── research/  # Research toolset
│   │   └── software_dev/  # Software development tools
│   │
│   ├── compose_scripts/  # Workflow composition scripts
│   │   ├── __init__.py
│   │   ├── biomarker_discovery.py  # Biomarker discovery workflow
│   │   ├── comprehensive_drug_discovery.py  # Drug discovery pipeline
│   │   ├── drug_safety_analyzer.py  # Drug safety analysis
│   │   ├── literature_tool.py  # Literature analysis
│   │   ├── output_summarizer.py  # Result summarization
│   │   ├── tool_description_optimizer.py  # Tool description optimization
│   │   ├── tool_discover.py  # Tool discovery workflows
│   │   └── tool_graph_composer.py  # Tool graph composition
│   │
│   ├── # External Integrations & Examples
│   ├── remote/  # External system integrations
│   │   ├── expert_feedback/  # Human expert feedback system
│   │   ├── expert_feedback_mcp/  # MCP-enabled expert feedback
│   │   ├── boltz/  # Boltz integration
│   │   ├── depmap_24q2/  # DepMap data integration
│   │   ├── immune_compass/  # Immune system tools
│   │   ├── pinnacle/  # Pinnacle integration
│   │   ├── transcriptformer/  # Transcriptformer model
│   │   └── uspto_downloader/  # USPTO downloader service
│   │
│   ├── # Visualization & UI
│   ├── scripts/  # Utility scripts
│   │   ├── generate_tool_graph.py  # Tool graph generation
│   │   └── visualize_tool_graph.py  # Tool graph visualization
│   ├── tool_graph_web_ui.py  # Web-based tool graph UI
│   │
│   ├── # Configuration Templates
│   ├── template/  # Configuration templates
│   │   ├── file_save_hook_config.json  # File save hook template
│   │   └── hook_config.json  # General hook template
│   │
│   ├── # Output Processing
│   ├── output_hook.py  # Output processing hooks
│   ├── extended_hooks.py  # Extended hook functionality
│   │
│   ├── # Testing
│   └── test/  # Unit & integration tests
│       ├── *.py  # Test modules
│       ├── *.xml  # Test data
│       └── *.parquet  # Test datasets
│
├── # Documentation
├── docs/  # Sphinx documentation
│   ├── _build/  # Built documentation
│   ├── _static/  # Static assets
│   ├── _templates/  # Doc templates
│   ├── api/  # API documentation
│   ├── expand_tooluniverse/  # Extension guides
│   ├── guide/  # User guides
│   ├── reference/  # Reference docs
│   ├── tutorials/  # Tutorials
│   └── *.rst  # Documentation source
│
├── # Root-level Files
├── pyproject.toml  # Project config, dependencies, CLI
├── smcp_tooluniverse_server.py  # Simplified MCP server launcher
├── README.md  # Project overview
├── README_USAGE.md  # Usage documentation
├── LICENSE  # License file
├── uv.lock  # UV lock file
│
├── # Build & Meta
├── build_docs.sh  # Documentation build script
├── internal/  # Internal data & utilities
├── img/  # Images & assets
└── generated_tool_*  # Generated tool files
Core Components
Engine & Registry
execute_function.py: Core ToolUniverse engine class, responsible for:
- Reading tool configurations (local JSON, default configs) and building all_tools/all_tool_dict
- Mapping tool types to concrete classes (tool_type_mappings) and instantiating them
- Tool execution routing (run_tool), validation, and result processing
- Handling MCP auto-loaders and temporary clients (together with mcp_integration.py)
base_tool.py: BaseTool base class and exception types. Supports:
- Loading default configurations from the tooluniverse.data package
- Parameter validation, required parameter extraction, and function call validation
tool_registry.py: Tool registration and discovery:
- @register_tool decorator for registering tool classes
- Lazy loading registry (on-demand module imports) and full discovery
- Smart matching of configuration JSON to modules and tool types
default_config.py: Default tool configuration file list
logging_config.py, utils.py: Logging setup and utility functions
Tool Implementation Classes
Available tool classes (alphabetically organized):
ADMETAITool, AgenticTool, AlphaFoldRESTTool, BoltzTool, ChEMBLTool, ClinicalTrialsDetailsTool, ClinicalTrialsSearchTool, ComposeTool, DatasetTool, DiseaseTargetScoreTool, EFOTool, EmbeddingDatabase, EmbeddingSync, EnrichrTool, EuropePMCTool, FDACountAdditiveReactionsTool, FDADrugAdverseEventTool, FDADrugLabelGetDrugGenericNameTool, FDADrugLabelSearchIDTool, FDADrugLabelSearchTool, FDADrugLabelTool, GWASAssociationByID, GWASAssociationSearch, GWASAssociationsForSNP, GWASAssociationsForStudy, GWASAssociationsForTrait, GWASSNPByID, GWASSNPSearch, GWASSNPsForGene, GWASStudiesForTrait, GWASStudyByID, GWASStudySearch, GWASVariantsForTrait, GeneOntologyTool, GetSPLBySetIDTool, HPAGetGeneJSONTool, HPAGetGeneXMLTool, HumanBaseTool, MCPAutoLoaderTool, MCPClientTool, MedlinePlusRESTTool, MonarchDiseasesForMultiplePhenoTool, MonarchTool, OpenAlexTool, OpentargetGeneticsTool, OpentargetTool, OpentargetToolDrugNameMatch, PackageTool, PubChemRESTTool, PubTatorTool, RCSBTool, ReactomeRESTTool, RemoteTool, SearchSPLTool, SemanticScholarTool, ToolFinderEmbedding, ToolFinderKeyword, ToolFinderLLM, URLHTMLTagTool, URLToPDFTextTool, USPTODownloaderTool, USPTOOpenDataPortalTool, UniProtRESTTool, XMLDatasetTool
Data & Configuration
data/*.json: Tool configuration manifests for each data source or category
data/packages/*: Package-related extension configurations
data/remote_tools/*: Remote tool/MCP definitions
toolsets/: Scenario-organized tool collections (bioinformatics/, research/, software_dev/)
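As a rough illustration of how JSON manifests such as data/*.json could be turned into the all_tools list and all_tool_dict lookup mentioned earlier, here is a minimal sketch. The function name `load_tool_manifests` is hypothetical, and the assumption that each file holds a JSON list of entries (or a single object) is ours, not something this document specifies.

```python
import json
import tempfile
from pathlib import Path


def load_tool_manifests(data_dir):
    """Read every *.json file in data_dir and index tool entries by name.

    Assumes each file holds a JSON list of tool entries (a single object
    is also accepted). Duplicate tool names are rejected.
    """
    all_tools = []
    all_tool_dict = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        entries = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(entries, dict):
            entries = [entries]
        for entry in entries:
            name = entry["name"]
            if name in all_tool_dict:
                raise ValueError(f"duplicate tool name {name!r} in {path.name}")
            all_tools.append(entry)
            all_tool_dict[name] = entry
    return all_tools, all_tool_dict


# Demo with a throwaway directory instead of the real data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_tools.json").write_text(
        json.dumps([{"name": "demo_lookup", "type": "DemoTool"}]),
        encoding="utf-8",
    )
    tools, tool_dict = load_tool_manifests(tmp)

print(sorted(tool_dict))  # ['demo_lookup']
```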
MCP Integration & Servers
smcp.py: FastMCP wrapper providing SMCP and create_smcp_server
smcp_server.py: Package MCP server entry points (exposed via pyproject.toml CLI)
mcp_integration.py: Injects load_mcp_tools, discover_mcp_tools methods into ToolUniverse
mcp_tool_registry.py: MCP tool registry for URLs and tool discovery
Root smcp_tooluniverse_server.py: Simplified startup script for local quick server startup
External Ecosystem & Extension Examples
remote/: External system integrations, including:
- expert_feedback/: Human expert feedback system
- expert_feedback_mcp/: MCP-enabled expert feedback
- boltz/: Boltz protein folding integration
- depmap_24q2/: DepMap cancer dependency data integration
- immune_compass/: Immune system analysis tools
- pinnacle/: Pinnacle platform integration
- transcriptformer/: Transcriptformer model integration
- uspto_downloader/: USPTO patent downloader service
Execution Flow (Configuration to Invocation)
1. Configuration Loading
   - Engine startup reads default_tool_files and data/*.json to build the tool manifest
   - Each JSON entry defines a tool instance: name, type, description, parameter (JSON Schema), endpoints, etc.
2. Tool Registration & Mapping
   - tool_registry.py maintains "tool type → tool class" mappings
   - Supports both full import discovery and lazy loading mappings (smart config-to-module matching)
3. Instantiation & Default Configuration
   - Based on type, finds the corresponding class (e.g., FDADrugLabelTool)
   - Merges BaseTool default configurations with the entry-specific config
4. Execution & Validation
   - ToolUniverse.run_tool(tool_name, params): locate instance by name → parameter validation (required fields) → call concrete implementation
   - Unified error handling and return structure
5. Composition/Discovery & Graphs
   - Use compose_tool.py or compose_scripts/ for orchestration
   - Leverage tool_finder_* (keyword/embedding/LLM) for tool retrieval
   - Visualize tool relationships and call chains via scripts or tool_graph_web_ui.py
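The locate, validate, call sequence with unified error handling can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code: `run_tool_sketch`, the `{"ok": ..., ...}` result shape, and the `required`/`func` keys of each entry are all hypothetical.

```python
def run_tool_sketch(tools, tool_name, params):
    """Locate -> validate -> call, always returning a uniform result dict."""
    # 1. Locate the tool by name.
    entry = tools.get(tool_name)
    if entry is None:
        return {"ok": False, "error": f"unknown tool: {tool_name}"}

    # 2. Validate that every required parameter is present.
    missing = [name for name in entry["required"] if name not in params]
    if missing:
        return {"ok": False, "error": f"missing required parameters: {missing}"}

    # 3. Call the concrete implementation; report failures instead of raising.
    try:
        return {"ok": True, "result": entry["func"](**params)}
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# Hypothetical in-memory tool table used only for this demo.
tools = {
    "divide": {"required": ["a", "b"], "func": lambda a, b: a / b},
}

print(run_tool_sketch(tools, "divide", {"a": 6, "b": 3}))  # {'ok': True, 'result': 2.0}
print(run_tool_sketch(tools, "divide", {"a": 6}))          # missing-parameter error
print(run_tool_sketch(tools, "divide", {"a": 6, "b": 0}))  # ZeroDivisionError is reported, not raised
```

Returning the same dictionary shape on success and failure lets callers such as agents branch on one field instead of handling exceptions from many different tool modules.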
MCP Integration
Server Side:
- smcp.py provides the SMCP object for one-click exposure of all ToolUniverse tools
- smcp_server.py and the root smcp_tooluniverse_server.py provide convenient startup
- pyproject.toml exposes commands: tooluniverse-mcp, tooluniverse-smcp*, etc.
Client/Remote Tools:
- mcp_client_tool.py and mcp_integration.py support discovery and dynamic registration from remote MCP servers
- MCPAutoLoaderTool can auto-discover and batch-register remote tools by URL, with configurable prefixes and timeouts
- list_mcp_connections() shows loaded remote connections and tool counts
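To make batch registration with a prefix more concrete, the sketch below takes tool descriptors in the shape an MCP server returns from `tools/list` (each with `name`, `description`, and `inputSchema`) and adds them to a local registry. Everything else is an assumption for illustration: the function `register_remote_tools`, the `server_url` and `remote_name` fields, and the example URL are hypothetical and do not describe MCPAutoLoaderTool's actual behavior.

```python
def register_remote_tools(registry, listed_tools, server_url, prefix="mcp_"):
    """Add one local entry per remote tool; return the names that were added."""
    added = []
    for tool in listed_tools:
        local_name = f"{prefix}{tool['name']}"
        if local_name in registry:
            continue  # keep the existing entry rather than overwrite it
        registry[local_name] = {
            "name": local_name,
            "type": "MCPClientTool",
            "description": tool.get("description", ""),
            # MCP describes arguments with a JSON Schema under "inputSchema".
            "parameter": tool.get("inputSchema", {"type": "object", "properties": {}}),
            "server_url": server_url,     # hypothetical field
            "remote_name": tool["name"],  # hypothetical field
        }
        added.append(local_name)
    return added


# Descriptors as they might appear in a tools/list result (made-up example data).
listed = [
    {
        "name": "get_feedback",
        "description": "Ask a human expert for feedback",
        "inputSchema": {"type": "object", "properties": {"question": {"type": "string"}}},
    }
]

registry = {}
print(register_remote_tools(registry, listed, "http://localhost:8000/mcp", prefix="expert_"))
# ['expert_get_feedback']
```

A prefix keeps names from different servers from colliding, and storing the original remote name separately means the call can still be forwarded under the name the server expects.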
Configuration & Data Conventions
Tool Configuration Structure (data/*.json files):
{
    "name": "FDADrugLabelGetDrugGenericName",
    "type": "FDADrugLabelGetDrugGenericNameTool",
    "description": "Get generic name for an FDA drug label",
    "parameter": {
        "type": "object",
        "properties": {
            "drug_name": {"type": "string", "required": true}
        }
    },
    "endpoint": "https://api.fda.gov/drug/label.json",
    "method": "GET"
}
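Given an entry shaped like the one above, required-parameter extraction and basic type checks might look like the following sketch. The helper names `required_parameters` and `validate_arguments` are hypothetical, and the per-property `"required": true` flag is read exactly as it appears in the example entry; BaseTool's real validation may differ.

```python
# Map JSON Schema type names to Python types for a basic check.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def required_parameters(config):
    """Names of properties flagged with "required": true in the entry."""
    props = config.get("parameter", {}).get("properties", {})
    return [name for name, spec in props.items() if spec.get("required") is True]


def validate_arguments(config, arguments):
    """Return a list of problems; an empty list means the call looks valid."""
    props = config.get("parameter", {}).get("properties", {})
    problems = [
        f"missing required parameter: {name}"
        for name in required_parameters(config)
        if name not in arguments
    ]
    for name, value in arguments.items():
        if name not in props:
            problems.append(f"unknown parameter: {name}")
            continue
        type_name = props[name].get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # no declared or recognized type, nothing to check
        # bool is a subclass of int in Python, so reject it explicitly.
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            problems.append(f"{name} should be of type {type_name}")
    return problems


config = {
    "name": "FDADrugLabelGetDrugGenericName",
    "parameter": {
        "type": "object",
        "properties": {"drug_name": {"type": "string", "required": True}},
    },
}

print(validate_arguments(config, {"drug_name": "aspirin"}))  # []
print(validate_arguments(config, {}))  # ['missing required parameter: drug_name']
```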
Naming & Mapping Conventions:
- *_tools.json typically corresponds to a *_tool.py module
- tool_registry.py performs smart matching between the two
- @register_tool can be used for explicit registration at class definition
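The naming convention can be expressed as a small lookup. This is only a guess at the simplest form of such matching, with a hypothetical function name `module_for_config`; the matching in tool_registry.py is described as "smart" and likely handles more cases.

```python
from pathlib import Path


def module_for_config(config_filename, available_modules):
    """Guess the implementing module for a config file by naming convention."""
    stem = Path(config_filename).stem  # e.g. "uniprot_tools"
    candidates = []
    if stem.endswith("_tools"):
        candidates.append(stem[:-1])  # "uniprot_tools" -> "uniprot_tool"
    candidates.append(stem)
    candidates.append(f"{stem}_tool")
    for candidate in candidates:
        if candidate in available_modules:
            return candidate
    return None  # no conventional match; explicit registration would be needed


modules = {"uniprot_tool", "pubchem_tool", "gwas_tool"}
print(module_for_config("uniprot_tools.json", modules))  # uniprot_tool
print(module_for_config("custom_sources.json", modules))  # None
```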
Extension Points
Adding New Data Source Tools:
Create xxx_tool.py in src/tooluniverse/ with a class that inherits from BaseTool
Use @register_tool("YourToolType") for registration, or rely on naming conventions
Add one or more tool entries in data/xxx_tools.json
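The three steps above could look roughly like this. To keep the example runnable without the package installed, `register_tool` and `BaseTool` are minimal stand-ins defined inline, not the real implementations from tool_registry.py and base_tool.py; the constructor and `run` signatures, the tool `SequenceLengthTool`, and its config entry are all hypothetical.

```python
TOOL_REGISTRY = {}  # stand-in for the registry kept by tool_registry.py


def register_tool(tool_type):
    """Stand-in decorator: record the class under its tool type name."""
    def decorator(cls):
        TOOL_REGISTRY[tool_type] = cls
        return cls
    return decorator


class BaseTool:
    """Stand-in base class: keeps the JSON config entry for the instance."""

    def __init__(self, tool_config):
        self.tool_config = tool_config

    def run(self, arguments):
        raise NotImplementedError


# Steps 1 and 2: a new tool class, registered under an explicit type name.
@register_tool("SequenceLengthTool")
class SequenceLengthTool(BaseTool):
    """Hypothetical local tool: length of a protein sequence."""

    def run(self, arguments):
        sequence = arguments["sequence"].strip()
        return {"length": len(sequence)}


# Step 3: the matching entry, as it might appear in data/xxx_tools.json.
entry = {
    "name": "get_sequence_length",
    "type": "SequenceLengthTool",
    "description": "Return the length of a protein sequence",
    "parameter": {
        "type": "object",
        "properties": {"sequence": {"type": "string", "required": True}},
    },
}

# The "type" field selects the class; the entry becomes the instance config.
tool = TOOL_REGISTRY[entry["type"]](entry)
print(tool.run({"sequence": "MKT"}))  # {'length': 3}
```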
Integrating Remote MCP Tools:
Use MCPAutoLoaderTool with server URL for auto-discovery
Or use ToolUniverse.load_mcp_tools([…]) for runtime dynamic loading
Composition & Workflows:
Use compose_tool.py or add scripts in compose_scripts/ for complex call chains
Leverage tool_finder_* for retrieval and routing assistance
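A call chain that first retrieves a tool and then invokes it might be sketched as follows. This is a simplified, self-contained illustration: the keyword scoring is a naive stand-in for tool_finder_keyword.py, and the `compose(arguments, call_tool, tool_dict)` signature and the fake `call_tool` callback are assumptions made for the example.

```python
def find_tools_by_keyword(tool_dict, query, limit=3):
    """Rank tools by how many query words occur in their name and description."""
    words = query.lower().split()
    scored = []
    for name, config in tool_dict.items():
        text = f"{name} {config.get('description', '')}".lower()
        score = sum(word in text for word in words)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


def compose(arguments, call_tool, tool_dict):
    """Two-step chain: pick the best-matching tool, then call it."""
    matches = find_tools_by_keyword(tool_dict, arguments["task"])
    if not matches:
        return {"ok": False, "error": "no matching tool"}
    chosen = matches[0]
    return {"ok": True, "tool": chosen, "result": call_tool(chosen, arguments["params"])}


# Made-up tool descriptions and a fake executor for the demo.
tool_dict = {
    "protein_lookup": {"description": "Look up a protein by accession"},
    "drug_label_search": {"description": "Search drug labels by name"},
}


def fake_call_tool(name, params):
    return f"{name} called with {params}"


outcome = compose(
    {"task": "search drug labels", "params": {"drug_name": "aspirin"}},
    fake_call_tool,
    tool_dict,
)
print(outcome["tool"])  # drug_label_search
```

Passing the executor in as a callback keeps the composition logic independent of how tools are actually run, so the same chain can be tested with a fake executor, as done here.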
Directory Quick Reference
Core Package: src/tooluniverse/
Tool Implementations: Various *_tool.py files in same directory
Tool Configurations: src/tooluniverse/data/*.json
Tool Collections: src/tooluniverse/toolsets/
Composition Scripts: src/tooluniverse/compose_scripts/
MCP & Servers: src/tooluniverse/smcp.py, src/tooluniverse/smcp_server.py, root smcp_tooluniverse_server.py
External Integrations: src/tooluniverse/remote/
Visualization & Graphs: src/tooluniverse/scripts/, src/tooluniverse/tool_graph_web_ui.py
Tests: src/tooluniverse/test/
Summary
ToolUniverse provides a complete ecosystem, from tool discovery and execution to remote integration (MCP), built on clear registry mechanisms, standardized JSON configurations, and a rich set of tool modules. You can add new data sources or capabilities by adding modules and configurations, without modifying the engine. The composition and visualization tools support building explainable, reusable scientific workflows.