ChatGPT APIΒΆ

Building AI Scientists with ChatGPT API and ToolUniverse

OverviewΒΆ

ChatGPT API integration enables powerful programmatic scientific research through OpenAI’s function calling capabilities. This approach provides a scalable, automatable interface for research while leveraging ChatGPT’s reasoning and ToolUniverse’s comprehensive scientific tools ecosystem.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   ChatGPT API   β”‚ ← Function Calling & Reasoning
β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚ Functions (OpenAI schema)
          β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
β”‚  ToolUniverse   β”‚ ← Tool Registry & Executor
β”‚   (Python SDK)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
β”‚ 600+ Scientific β”‚ ← Scientific Tools Ecosystem
β”‚     Tools       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Benefits of ChatGPT API Integration:

  • Function Calling: Structured tool selection and execution with JSON arguments

  • Iterative Reasoning: Feedback loop using tool results to refine subsequent steps

  • Scalability: Dynamically grow the function list with discovered tools

  • Automation: Build research workflows and batch processing systems

  • Reusability: Clear system prompt and minimal boilerplate for new tasks

PrerequisitesΒΆ

Before setting up ChatGPT API integration, ensure you have:

  • OpenAI API Key: Valid API key with function calling access

  • ToolUniverse: Installed

  • Python 3.10+: For ToolUniverse and OpenAI SDK

  • API Keys: For specific tools or optional hooks (if required by your research domain)

Installation and SetupΒΆ

Step 1: Install Required PackagesΒΆ

Install ToolUniverse and OpenAI SDK:

pip install tooluniverse openai

Verify installation:

python -c "import tooluniverse, openai; print('Packages installed successfully')"

Step 2: Set Up API KeyΒΆ

Configure your OpenAI API key:

export OPENAI_API_KEY='your-api-key-here'

Or in Python:

import os
os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

Step 3: Initialize ToolUniverseΒΆ

import os
import json
import openai
from tooluniverse import ToolUniverse

# Initialize components
client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
tu = ToolUniverse()
tu.load_tools()

Step 4: Configure System PromptΒΆ

Define the system prompt for iterative tool discovery and usage:

SYSTEM_PROMPT = (
    "You are a scientific assistant using ToolUniverse via function calling.\n"
    "Your job is to solve the user's task by: (1) discovering relevant tools, "
    "(2) selecting and calling the best tool for each step, and (3) iterating on results until the final answer is ready.\n\n"
    "Principles\n"
    "- If you need tools, first call the tool finder Tool_Finder with a short, precise description of the task.\n"
    "- Only call tools listed in the provided function list. Do not invent tools.\n"
    "- One step at a time: choose the single best next tool and provide minimal sufficient arguments.\n"
    "- After each tool result, update your plan. Decide: call another tool (with refined arguments), re-run the tool finder to expand available tools if necessary, or produce the final answer.\n"
    "- Handle tool errors gracefully: adjust parameters, pick an alternative tool, or ask for missing required inputs.\n"
    "- Prefer high-signal outputs: extract key findings, IDs, links, citations if available.\n"
    "- Stop when the answer is sufficient and accurate. Otherwise, continue iterating.\n\n"
    "Output Contract\n"
    "- During the process, respond with function calls only (no extra text), using valid JSON arguments.\n"
    "- When finished, return an assistant message (no function_call) with: \n"
    "  1) a concise answer; and\n"
    "  2) a short bullet summary of the steps/tools used."
)

Step 5: Set Up Dynamic Tool DiscoveryΒΆ

Start with the tool finder and expand functions dynamically:

# Start with only the tool finder
functions = tu.get_tool_specification_by_names(['Tool_Finder'], format='openai')
tool_finders = {'Tool_Finder'}

Step 6: Implement Chat LoopΒΆ

Create the iterative chat loop with function calling:

def run_with_dynamic_tools(query: str, max_iters: int = 12):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for _ in range(max_iters):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            functions=functions,
            function_call="auto"
        )
        msg = resp.choices[0].message
        messages.append(msg)

        if msg.function_call:
            name = msg.function_call.name
            args = json.loads(msg.function_call.arguments or "{}")
            result = tu.run({"name": name, "arguments": args})

            # Expand functions if tool finder was called
            if name in tool_finders:
                try:
                    tool_names = [t['name'] for t in result][:8]
                    if tool_names:
                        functions += tu.get_tool_specification_by_names(tool_names, format='openai')
                except Exception:
                    pass

            messages.append({
                "role": "function",
                "name": name,
                "content": json.dumps(result, ensure_ascii=False)
            })
            continue

        return msg.content

    return "Reached max iterations without a final answer"

Step 7: Verify IntegrationΒΆ

Test the integration with a simple research query:

# Test basic functionality
result = run_with_dynamic_tools("Find recent CRISPR gene-editing papers and summarize key findings")
print(result)

Scientific Research CapabilitiesΒΆ

Drug Discovery and DevelopmentΒΆ

ChatGPT API with ToolUniverse enables comprehensive drug discovery workflows:

Target Identification: - Disease analysis and EFO ID lookup - Target discovery and validation - Literature-based target assessment

Drug Analysis: - Drug information retrieval from multiple databases - Safety profile analysis - Drug interaction checking - Clinical trial data access

Example Workflow:

result = run_with_dynamic_tools("Find FDA-approved drugs that target the EGFR protein and analyze their safety profiles")

Genomics and Molecular BiologyΒΆ

Access comprehensive genomics tools for molecular research:

Gene Analysis: - Gene information from UniProt - Protein structure analysis - Expression pattern analysis - Pathway involvement

Molecular Interactions: - Protein-protein interactions - Drug-target interactions - Pathway analysis - Functional annotation

Example Workflow:

result = run_with_dynamic_tools("Analyze the BRCA1 gene and its role in cancer development")

Literature Research and AnalysisΒΆ

Comprehensive literature search and analysis capabilities:

Literature Search: - PubMed, Europe PMC, and Semantic Scholar - Citation analysis and trend detection

Content Analysis: - Abstract summarization - Key finding extraction - Gap identification

Example Workflow:

result = run_with_dynamic_tools("Find recent papers about COVID-19 vaccine efficacy published in the last year")

Clinical Research and TrialsΒΆ

Access clinical trial data and regulatory information:

Clinical Trials: - ClinicalTrials.gov searches - Trial status and results analysis - Patient population assessment

Regulatory Information: - FDA drug approvals - Safety warnings and adverse events - Labeling information

Example Workflow:

result = run_with_dynamic_tools("Search clinical trials for diabetes treatment and analyze their outcomes")

Multi-Step Research WorkflowsΒΆ

ChatGPT API excels at complex, multi-step research workflows:

Hypothesis-Driven Research: 1. Formulate research hypothesis 2. Design experimental approach 3. Gather supporting evidence 4. Validate findings 5. Generate conclusions

Comparative Analysis: 1. Identify comparison targets 2. Gather data for each target 3. Perform comparative analysis 4. Identify differences and similarities 5. Draw conclusions

Settings and ConfigurationΒΆ

Tool Discovery OptionsΒΆ

ToolUniverse offers multiple discovery methods:

# 1) Keyword search (fast, recommended)
results = tu.run({
    'name': 'Tool_Finder_Keyword',
    'arguments': {
        'description': 'protein analysis',
        'limit': 5
    }
})

# 2) LLM-based search (requires API key, more intelligent)
results = tu.run({
    'name': 'Tool_Finder_LLM',
    'arguments': {
        'description': 'Find tools for drug safety analysis',
        'limit': 5
    }
})

# 3) Pattern search
results = tu.find_tools_by_pattern("ChEMBL")

Advanced ConfigurationΒΆ

Batch Processing: Handle multiple research tasks efficiently:

def batch_research(queries: list):
    results = []
    for query in queries:
        result = run_with_dynamic_tools(query)
        results.append(result)
    return results

Periodic Tool Discovery: Re-run the finder on intermediate results:

# Example: expand tools every 3 iterations
if iteration % 3 == 0:
    new_tools = tu.run({
        'name': 'Tool_Finder',
        'arguments': {'description': str(result), 'limit': 3}
    })
    functions += tu.get_tool_specification_by_names(
        [t['name'] for t in new_tools], format='openai'
    )

Custom Tool Sets: Load specific tools for focused research:

# Literature-focused research
literature_tools = [
    'EuropePMC_search_articles',
    'openalex_literature_search',
    'PubTator3_LiteratureSearch'
]
functions = tu.get_tool_specification_by_names(literature_tools, format='openai')

Full Example (All Steps Combined):

import os
import json
import openai
from tooluniverse import ToolUniverse

os.environ['OPENAI_API_KEY'] = 'your-api-key-here'

def run_with_dynamic_tools(query: str, max_iters: int = 12):
    client = openai.OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
    tu = ToolUniverse(); tu.load_tools()

    SYSTEM_PROMPT = (
        "You are a scientific assistant using ToolUniverse via function calling.\n"
        "Your job is to solve the user's task by: (1) discovering relevant tools, "
        "(2) selecting and calling the best tool for each step, and (3) iterating on results until the final answer is ready.\n\n"
        "Principles\n"
        "- If you need tools, first call the tool finder Tool_Finder with a short, precise description of the task.\n"
        "- Only call tools listed in the provided function list. Do not invent tools.\n"
        "- One step at a time: choose the single best next tool and provide minimal sufficient arguments.\n"
        "- After each tool result, update your plan. Decide: call another tool (with refined arguments), re-run the tool finder to expand available tools if necessary, or produce the final answer.\n"
        "- Handle tool errors gracefully: adjust parameters, pick an alternative tool, or ask for missing required inputs.\n"
        "- Prefer high-signal outputs: extract key findings, IDs, links, citations if available.\n"
        "- Stop when the answer is sufficient and accurate. Otherwise, continue iterating.\n\n"
        "Output Contract\n"
        "- During the process, respond with function calls only (no extra text), using valid JSON arguments.\n"
        "- When finished, return an assistant message (no function_call) with: \n"
        "  1) a concise answer; and\n"
        "  2) a short bullet summary of the steps/tools used."
    )

    functions = tu.get_tool_specification_by_names(['Tool_Finder'], format='openai')
    tool_finders = {'Tool_Finder'}
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    for _ in range(max_iters):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=messages,
            functions=functions,
            function_call="auto"
        )
        msg = resp.choices[0].message
        messages.append(msg)

        if msg.function_call:
            name = msg.function_call.name
            args = json.loads(msg.function_call.arguments or "{}")
            result = tu.run({"name": name, "arguments": args})

            if name in tool_finders:
                try:
                    tool_names = [t['name'] for t in result][:8]
                    if tool_names:
                        functions += tu.get_tool_specification_by_names(tool_names, format='openai')
                except Exception:
                    pass

            messages.append({
                "role": "function",
                "name": name,
                "content": json.dumps(result, ensure_ascii=False)
            })
            continue

        return msg.content

    return "Reached max iterations without a final answer"

# Example usage
print(run_with_dynamic_tools("Find recent CRISPR gene-editing papers and summarize key findings"))

TroubleshootingΒΆ

Common Issues and SolutionsΒΆ

API Key Errors: - Ensure OPENAI_API_KEY is set correctly - Verify your API key has function calling access - Check your OpenAI account billing status

Tool Not Found: - Use the tool finder to discover available tools - Verify tool names match exactly - Check if tools require specific API keys

Function Call Errors: - Inspect tool parameter schemas using tu.tool_specification(β€œtool_name”) - Ensure JSON arguments are properly formatted - Handle missing required parameters gracefully

Performance Issues: - Limit the number of tools added to the function list - Use Tool_Finder_Keyword instead of Tool_Finder_LLM for faster discovery - Implement timeout handling for long-running tools

Debug Mode:

# Inspect a tool's OpenAI function schema
spec = tu.tool_specification("EuropePMC_search_articles")
import json; print(json.dumps(spec, indent=2))

TipsΒΆ

Tool Selection: Start with the tool finder, then add discovered tools incrementally.

Error Handling: Always wrap tool execution in try-catch blocks for production use.

Prompt Engineering: Customize the system prompt for domain-specific research tasks.

Batch Processing: Use the pattern for processing multiple queries efficiently.

Rate Limiting: Implement delays between API calls if you hit OpenAI rate limits.

Tip

Start Simple: Begin with basic research queries to understand the integration, then progress to complex multi-step workflows as you become familiar with the capabilities.

Note

Function Calling: ChatGPT API provides a powerful programmatic interface for scientific research, enabling both interactive exploration and automated batch processing of research tasks.