HTTP API - Remote Access¶
The ToolUniverse HTTP API server provides remote access to all ToolUniverse methods via HTTP/REST endpoints.
Key Feature: When you add or modify methods in ToolUniverse, the server and client automatically support them with zero manual updates!
Quick Start¶
Start Server¶
On the machine with ToolUniverse installed:
# Install full ToolUniverse package
pip install tooluniverse
# Start HTTP API server (defaults to loopback 127.0.0.1)
tooluniverse-http-api --port 8080
The server binds to 127.0.0.1 by default and is reachable only from the
local machine. Exposing it to the network requires authentication — see
Authentication and Network Exposure below.
Authentication and Network Exposure¶
The HTTP API can execute any ToolUniverse method, including the Python code executor, so it must never be reachable on an untrusted network without authentication.
Default bind is loopback (127.0.0.1). The server refuses to bind to a non-loopback address (e.g. 0.0.0.0) unless TOOLUNIVERSE_API_TOKEN is set.
Set a token to enable Bearer authentication. When TOOLUNIVERSE_API_TOKEN is set, every request must include Authorization: Bearer <token> (only /health is exempt). Generate a strong random value, e.g. python -c "import secrets; print(secrets.token_urlsafe(32))".
# Expose to the network with authentication required
export TOOLUNIVERSE_API_TOKEN="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
tooluniverse-http-api --host 0.0.0.0 --port 8080
Clients authenticate by passing the token (or by setting TOOLUNIVERSE_API_TOKEN
in the client’s environment, which is picked up automatically):
from tooluniverse import ToolUniverseClient
client = ToolUniverseClient("http://your-server:8080", api_token="<token>")
The same TOOLUNIVERSE_API_TOKEN enables Bearer authentication on the SMCP
(MCP, streamable-http) server as well.
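The two rules described in this section (no non-loopback bind without a token, and a Bearer header on every request except /health) can be sketched in a few lines. This is an illustrative sketch only, not the server's actual code; the function names bind_allowed and is_authorized are hypothetical.

```python
import hmac
import ipaddress
from typing import Mapping, Optional


def bind_allowed(host: str, token: Optional[str]) -> bool:
    """Allow loopback binds always; other addresses only when a token is set."""
    if host == "localhost":
        return True
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat it as non-loopback.
        is_loopback = False
    return is_loopback or bool(token)


def is_authorized(path: str, headers: Mapping[str, str], token: Optional[str]) -> bool:
    """Check the Authorization header; /health is exempt."""
    if not token or path == "/health":
        return True
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {token}"
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Comparing with hmac.compare_digest rather than == is a common precaution for secrets; whether the real server does the same is not stated in this document.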
Use Client¶
Install minimal client on any machine:
pip install tooluniverse[client] # Only needs: requests + pydantic
from tooluniverse import ToolUniverseClient
client = ToolUniverseClient("http://your-server:8080")
client.load_tools(tool_type=['uniprot', 'ChEMBL'])
result = client.run_one_function({
"name": "UniProt_get_entry_by_accession",
"arguments": {"accession": "P05067"}
})
Client Usage Details¶
Official Client Features¶
The official ToolUniverseClient includes:
Automatic method discovery: client.list_available_methods()
Built-in help: client.help("method_name")
Health check: client.health_check()
Context manager support: with ToolUniverseClient(url) as client:
All ToolUniverse methods via dynamic proxy
Custom Client Implementation¶
If you need a custom client, ensure discovery methods are real methods (not proxied):
import requests

class CustomToolUniverseClient:
    def __init__(self, url):
        self.base_url = url.rstrip("/")

    # Real methods for discovery endpoints (not proxied)
    def list_available_methods(self):
        """Uses GET /api/methods (not POST /api/call)"""
        response = requests.get(f"{self.base_url}/api/methods")
        return response.json()["methods"]

    def health_check(self):
        """Uses GET /health"""
        response = requests.get(f"{self.base_url}/health")
        return response.json()

    # Proxy for ToolUniverse methods
    def __getattr__(self, method_name):
        """Only ToolUniverse methods use POST /api/call"""
        def proxy(**kwargs):
            response = requests.post(
                f"{self.base_url}/api/call",
                json={"method": method_name, "kwargs": kwargs}
            )
            result = response.json()
            if not result.get("success"):
                raise Exception(f"{result['error_type']}: {result['error']}")
            return result["result"]
        return proxy
Important: list_available_methods() must be a real method, not proxied through __getattr__, because it needs to use GET /api/methods, not POST /api/call.
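The custom client above sends no credentials. When the server runs with TOOLUNIVERSE_API_TOKEN set, each request also needs the Authorization: Bearer <token> header. The following is a minimal sketch using only the standard library; the helper name build_call_request is hypothetical, and it only constructs the request without sending it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_call_request(base_url: str, method: str, kwargs: Dict[str, Any],
                       api_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/call request."""
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Same header format the server expects when a token is configured.
        headers["Authorization"] = f"Bearer {api_token}"
    body = json.dumps({"method": method, "kwargs": kwargs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/call", data=body, headers=headers, method="POST"
    )


req = build_call_request("http://your-server:8080/", "load_tools",
                         {"tool_type": ["uniprot"]}, api_token="<token>")
```

With requests, the equivalent is to pass headers={"Authorization": f"Bearer {token}"} to requests.get and requests.post.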
How It Works¶
Auto-Discovery (Server)¶
The server uses Python introspection to automatically discover all public methods:
import inspect
from tooluniverse import ToolUniverse

# Automatically discovers ALL public methods
for name, method in inspect.getmembers(ToolUniverse, inspect.isfunction):
    if not name.startswith('_'):
        # Extract signature, parameters, docstring
        signature = inspect.signature(method)
        docstring = inspect.getdoc(method)
        # Methods are now callable via HTTP!
Result: 49+ methods including load_tools, prepare_tool_prompts, tool_specification, run_one_function, etc.
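To see the discovery step in isolation, the sketch below runs the same introspection against a small stand-in class, so it works without ToolUniverse installed. DemoUniverse and discover_methods are hypothetical names used only for illustration; the real server may collect different metadata.

```python
import inspect
from typing import Any, Dict


class DemoUniverse:
    """Hypothetical stand-in for ToolUniverse, used only for illustration."""

    def load_tools(self, tool_type=None):
        """Load tools of the given types."""

    def run_one_function(self, function_call):
        """Run a single tool call."""

    def _internal_helper(self):
        """Private: must not be exposed."""


def discover_methods(cls: type) -> Dict[str, Dict[str, Any]]:
    """Map each public method name to its parameter names and docstring."""
    discovered = {}
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder methods
        params = [p for p in inspect.signature(func).parameters if p != "self"]
        discovered[name] = {"parameters": params, "doc": inspect.getdoc(func)}
    return discovered


methods = discover_methods(DemoUniverse)
```

Because the lookup happens at runtime, a method added to the class later shows up in the result without any change to discover_methods.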
Dynamic Proxying (Client)¶
The client uses __getattr__ magic to intercept any method call:
class ToolUniverseClient:
    def __getattr__(self, method_name):
        # Intercepts ANY method call
        def proxy(**kwargs):
            # Forwards to server via HTTP
            return requests.post(url, json={
                "method": method_name,
                "kwargs": kwargs
            })
        return proxy
Workflow Example:
When you call client.load_tools(tool_type=['uniprot']):
1. Python looks for load_tools attribute → NOT FOUND
2. Calls __getattr__("load_tools") → Returns proxy function
3. Proxy called with tool_type=['uniprot']
4. HTTP POST to server: {"method": "load_tools", "kwargs": {...}}
5. Server calls tu.load_tools(tool_type=['uniprot'])
6. Result returned to client
Result: ANY method works automatically, even future ones you haven’t written yet!
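The workflow above can be reproduced without a server by swapping the HTTP call for a recording function. In this sketch, SketchClient and the send callable are hypothetical; the point is only that real methods are found by normal attribute lookup, while everything else falls through to __getattr__.

```python
from typing import Any, Callable, Dict, List


class SketchClient:
    """Illustrative client; `send` is a hypothetical transport callable."""

    def __init__(self, send: Callable[[str, str, Dict[str, Any]], Any]):
        self._send = send

    def list_available_methods(self):
        # Real method: found by normal lookup, so __getattr__ never runs.
        return self._send("GET", "/api/methods", {})

    def __getattr__(self, method_name: str):
        # Called only when normal attribute lookup fails.
        if method_name.startswith("_"):
            raise AttributeError(method_name)

        def proxy(**kwargs):
            payload = {"method": method_name, "kwargs": kwargs}
            return self._send("POST", "/api/call", payload)

        return proxy


calls: List[tuple] = []


def fake_send(verb, path, payload):
    """Record the request instead of sending it over HTTP."""
    calls.append((verb, path, payload))
    return "ok"


client = SketchClient(fake_send)
client.load_tools(tool_type=["uniprot"])   # proxied to POST /api/call
client.list_available_methods()            # real method, GET /api/methods
```

Rejecting names that start with an underscore in __getattr__ is a defensive choice in this sketch: it keeps lookups for special attributes from being turned into remote calls.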
API Endpoints¶
The server exposes the following REST endpoints:
| Endpoint | Method | Purpose | Client Usage |
|---|---|---|---|
| /health | GET | Server health check | client.health_check() |
| /api/methods | GET | List all ToolUniverse methods | client.list_available_methods() |
| /api/call | POST | Call any ToolUniverse method | client.method_name(**kwargs) |
| | POST | Reset ToolUniverse instance | |
| /docs | GET | Interactive Swagger UI docs | Open in browser |
| /redoc | GET | Alternative ReDoc docs | Open in browser |
Key distinction:
Discovery endpoints (/health, /api/methods): Use GET, handled by client as real methods
Execution endpoint (/api/call): Use POST, calls actual ToolUniverse methods
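On the server side, the execution endpoint has to look up the requested method by name and wrap the outcome in a response. The sketch below assumes the response fields success, result, error_type and error that the custom client reads; handle_call, DemoUniverse, and the choice of AttributeError as the error type are hypothetical, not the server's actual implementation.

```python
from typing import Any, Dict


class DemoUniverse:
    """Hypothetical stand-in for a ToolUniverse instance."""

    def load_tools(self, tool_type=None):
        return {"loaded": list(tool_type or [])}


def handle_call(instance: Any, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch {"method": ..., "kwargs": ...} to a public method of `instance`."""
    name = payload.get("method", "")
    # Never expose private attributes, even if they exist.
    target = None if name.startswith("_") else getattr(instance, name, None)
    if not callable(target):
        return {
            "success": False,
            "error_type": "AttributeError",
            "error": f"Method '{name}' not found on {type(instance).__name__}",
        }
    try:
        return {"success": True, "result": target(**payload.get("kwargs", {}))}
    except Exception as exc:  # report the failure instead of crashing the server
        return {"success": False, "error_type": type(exc).__name__, "error": str(exc)}


ok = handle_call(DemoUniverse(), {"method": "load_tools", "kwargs": {"tool_type": ["uniprot"]}})
missing = handle_call(DemoUniverse(), {"method": "list_available_methods", "kwargs": {}})
```

The second call shows why discovery helpers must not be sent to /api/call: the name does not exist on the instance, so the dispatcher can only answer with a "not found" error.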
Example Usage¶
See examples/http_api_usage_example.py for comprehensive examples including:
Listing methods and getting help
Loading tools and getting specifications
Executing tools and checking health
Tool prompts preparation
Production Deployment¶
Important
The --host 0.0.0.0 examples below expose the server to the network and
therefore require TOOLUNIVERSE_API_TOKEN to be set — the server
refuses to bind to a non-loopback address without it. Prefix each command
with TOOLUNIVERSE_API_TOKEN=<token> (or export it). See
Authentication and Network Exposure.
GPU-Optimized Configuration (Recommended)¶
For GPU-based inference workloads (default, recommended):
# Single worker with async thread pool (default: 20 threads)
tooluniverse-http-api --host 0.0.0.0 --port 8080
# High concurrency: increase thread pool size
tooluniverse-http-api --host 0.0.0.0 --port 8080 --thread-pool-size 50
# Very high concurrency
tooluniverse-http-api --host 0.0.0.0 --port 8080 --thread-pool-size 100
Why single worker for GPU?
Single ToolUniverse instance → Single GPU model in memory (~2GB)
Multiple workers → One GPU model copy per worker (e.g. 8 workers × ~2GB ≈ 16GB of GPU memory)
High concurrency via async thread pool (20-100 concurrent operations)
Efficient GPU memory usage
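The single-worker pattern described above, one shared instance serving many blocking calls through a thread pool, can be sketched with asyncio and concurrent.futures. SharedModel is a hypothetical stand-in for the single ToolUniverse instance, and whether the server uses exactly this mechanism is an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class SharedModel:
    """Hypothetical stand-in for the single ToolUniverse instance."""

    def run(self, x: int) -> int:
        time.sleep(0.05)  # simulate a blocking tool or model call
        return x * 2


async def serve_batch(model: SharedModel, inputs, pool_size: int = 20):
    """Run blocking calls concurrently against one shared instance."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [loop.run_in_executor(pool, model.run, x) for x in inputs]
        # gather preserves input order in its result list
        return await asyncio.gather(*futures)


results = asyncio.run(serve_batch(SharedModel(), range(10)))
```

All ten calls share one SharedModel object, so only one copy of whatever it holds is in memory, while the pool size bounds how many calls run at once.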
Multi-Worker Configuration (CPU-Only Workloads)¶
Only use multiple workers for CPU-only workloads without GPU:
# Multiple workers (only for CPU-only operations)
tooluniverse-http-api --host 0.0.0.0 --port 8080 --workers 8
Warning: Multiple workers create separate ToolUniverse instances, each consuming GPU memory if GPU is used.
Development Mode¶
For development with auto-reload:
tooluniverse-http-api --host 127.0.0.1 --port 8080 --reload
Installation¶
Server:
Server: pip install tooluniverse
Client: pip install tooluniverse[client] (only requests + pydantic)
Testing¶
# Run tests
pytest tests/test_http_api_server.py -v
# Run examples
python examples/http_api_usage_example.py
Implementation Files¶
Core Implementation¶
src/tooluniverse/http_api_server.py - FastAPI server with auto-discovery
src/tooluniverse/http_api_server_cli.py - CLI entry point
src/tooluniverse/http_client.py - Auto-proxying client (minimal dependencies)
Examples & Tests¶
examples/http_api_usage_example.py - 7 comprehensive usage examples
tests/test_http_api_server.py - Comprehensive test suite
Configuration Options¶
Command-Line Arguments¶
tooluniverse-http-api --help
Available options:
--host - Host to bind to (default: 127.0.0.1)
--port - Port to bind to (default: 8080)
--workers - Number of worker processes (default: 1, recommended for GPU)
--thread-pool-size - Async thread pool size per worker (default: 20)
--log-level - Log level: debug, info, warning, error, critical
--reload - Enable auto-reload for development
Environment Variables¶
# Set thread pool size via environment variable
export TOOLUNIVERSE_THREAD_POOL_SIZE=50
tooluniverse-http-api --port 8080
Performance Tuning¶
For GPU workloads, scale concurrency with thread pool size:
# Low traffic (20 concurrent requests)
tooluniverse-http-api --thread-pool-size 20
# Medium traffic (50 concurrent requests)
tooluniverse-http-api --thread-pool-size 50
# High traffic (100 concurrent requests)
tooluniverse-http-api --thread-pool-size 100
Rule of thumb: set thread_pool_size to 2 to 5 times GPU_batch_size
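As a worked instance of the rule of thumb, the helper below turns a batch size into the suggested range. The function name thread_pool_range is hypothetical, and GPU_batch_size is taken to mean the number of requests the GPU-backed tool processes together, which this document does not define further.

```python
from typing import Tuple


def thread_pool_range(gpu_batch_size: int) -> Tuple[int, int]:
    """Return the (low, high) thread pool sizes suggested by the 2x to 5x rule."""
    if gpu_batch_size < 1:
        raise ValueError("gpu_batch_size must be at least 1")
    return 2 * gpu_batch_size, 5 * gpu_batch_size


low, high = thread_pool_range(10)  # (20, 50)
```

For a batch size of 10 this gives 20 to 50 threads, which lines up with the low and medium traffic settings listed above.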
Benefits¶
Zero Maintenance - Add ToolUniverse methods → They automatically work over HTTP
Minimal Client - Only needs requests + pydantic (no full ToolUniverse installation)
Full API Access - All 49+ ToolUniverse methods available remotely
Stateful - Server maintains ToolUniverse instance across requests
Type Discovery - Client can query available methods at runtime
Automatic - Both server and client use introspection/magic methods
Flexible Install - Server needs full package, client uses tooluniverse[client]
GPU-Optimized - Single worker with async thread pool for efficient GPU usage
High Concurrency - 20-100+ concurrent operations via async thread pool
Docker Deployment¶
With GPU Support¶
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
WORKDIR /app
COPY . .
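# The CUDA runtime image ships without Python, so install it first
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install tooluniverse uvicorn fastapi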
EXPOSE 8080
# Single worker with high thread pool for GPU
CMD ["tooluniverse-http-api", \
"--host", "0.0.0.0", \
"--port", "8080", \
"--workers", "1", \
"--thread-pool-size", "50"]
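Run with GPU (the container binds to 0.0.0.0, so pass TOOLUNIVERSE_API_TOKEN through from the host environment):
docker run --gpus all -e TOOLUNIVERSE_API_TOKEN -p 8080:8080 tooluniverse-api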
Without GPU (CPU-Only)¶
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install tooluniverse uvicorn fastapi
EXPOSE 8080
# Multiple workers for CPU workloads
CMD ["tooluniverse-http-api", \
"--host", "0.0.0.0", \
"--port", "8080", \
"--workers", "4"]
Monitoring¶
GPU Memory Usage¶
Check GPU memory usage:
# Monitor GPU in real-time
watch -n 1 nvidia-smi
Expected output with single worker:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU PID Type Process name GPU Memory |
|============================================================================|
| 0 12345 C python3 tooluniverse-http-api 2048MiB |
+-----------------------------------------------------------------------------+
Only ONE process should be using GPU memory.
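The single-process check can also be scripted. The sketch below parses CSV output of the kind produced by nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits; the sample text is illustrative rather than captured from a real machine, and parse_compute_apps is a hypothetical helper name.

```python
import csv
import io
from typing import List, Tuple

# Illustrative sample of the CSV output (one row per GPU compute process).
SAMPLE_OUTPUT = "12345, python3, 2048\n"


def parse_compute_apps(text: str) -> List[Tuple[int, str, int]]:
    """Parse (pid, process name, used memory in MiB) rows from the CSV output."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) == 3:
            pid, name, used_mib = (field.strip() for field in record)
            rows.append((int(pid), name, int(used_mib)))
    return rows


apps = parse_compute_apps(SAMPLE_OUTPUT)
single_gpu_process = len(apps) == 1
```

In a real check, the text would come from running the nvidia-smi command (for example with subprocess.run) instead of the sample string.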
Server Health¶
# Check server health
curl http://localhost:8080/health
# Monitor logs
tooluniverse-http-api --log-level info
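For scripted monitoring, the health check can be wrapped in a small retry loop. In this sketch, wait_until_healthy and http_health_check are hypothetical helpers, and treating HTTP 200 from /health as "healthy" is an assumption about the server's response.

```python
import time
import urllib.request
from typing import Callable


def http_health_check(base_url: str) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed healthy)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=2) as resp:
        return resp.status == 200


def wait_until_healthy(check: Callable[[], bool], attempts: int = 5,
                       delay: float = 0.5) -> bool:
    """Call `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            pass  # connection refused or timed out: server not ready yet
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be wait_until_healthy(lambda: http_health_check("http://localhost:8080")), for example right after starting the server in a deployment script.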
Interactive Documentation¶
Once the server is running, you can access interactive API documentation:
Swagger UI: http://server:8080/docs
ReDoc: http://server:8080/redoc
These provide a web interface to explore and test all API endpoints.
Troubleshooting¶
Most Common Issue¶
Error: “Method ‘list_available_methods’ not found on ToolUniverse”
This is the most common error when using a custom client.
What’s happening:
Your custom client uses __getattr__ to proxy ALL method calls to POST /api/call, including list_available_methods(). But list_available_methods() is NOT a ToolUniverse method - it’s a client-side utility that should use GET /api/methods.
Why this happens:
# Your custom client code:
client.list_available_methods()
↓
__getattr__("list_available_methods") intercepts it
↓
POST /api/call {"method": "list_available_methods", "kwargs": {}}
↓
Server tries: tu.list_available_methods()
↓
❌ ERROR: "Method 'list_available_methods' not found on ToolUniverse"
The fix:
Wrong: Custom client that proxies everything
class ToolUniverseClient:
    def __getattr__(self, method_name):
        # This proxies EVERYTHING, including list_available_methods!
        def proxy(**kwargs):
            return requests.post(url, json={"method": method_name, "kwargs": kwargs})
        return proxy

client.list_available_methods()  # ❌ Tries to call it on ToolUniverse (doesn't exist)
Solution: Use the official client
from tooluniverse import ToolUniverseClient
client = ToolUniverseClient("http://server:8080")
methods = client.list_available_methods() # ✅ Works correctly
Or if implementing a custom client, see the “Custom Client Implementation” section above for the correct pattern.
Key insight: list_available_methods() must be a real method using GET /api/methods, not proxied through __getattr__ to POST /api/call.
Server Not Starting
# Check if port is in use
lsof -i :8080
# Use different port
tooluniverse-http-api --port 8081
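Where lsof is not available, the same check can be done from Python with a plain TCP connection attempt. This is a minimal sketch; port_in_use is a hypothetical helper and only detects listeners that accept connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(8080) returns True before the server is started, another process already holds the port and a different --port value is needed.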
High Memory Usage
If you see multiple worker processes consuming GPU memory:
# Use single worker (default)
tooluniverse-http-api --workers 1
# Increase thread pool instead
tooluniverse-http-api --thread-pool-size 50
Connection Refused
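Ensure the server is reachable from the client machine. Binding to a non-loopback address requires TOOLUNIVERSE_API_TOKEN to be set:
# Listen on all interfaces (token required)
export TOOLUNIVERSE_API_TOKEN="<token>"
tooluniverse-http-api --host 0.0.0.0 --port 8080
# From the client machine, check that no firewall blocks the port
curl http://server:8080/health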
Tool_RAG Tensor Copy Error
If you encounter a tensor copy error when using Tool_RAG:
RuntimeError: Trying to copy tensor from CPU to CUDA device...
This was a device mismatch bug where cached embeddings (CPU) didn’t match the model device (GPU).
Fixed in latest version:
Tool_RAG model automatically loads on GPU if available
Cached embeddings are automatically moved to model’s device
Device compatibility checks before tensor operations
Solution: Update to the latest version:
pip install --upgrade tooluniverse[embedding]
The model will now automatically use GPU when available for faster inference.