TxAgent: An AI agent for therapeutic reasoning across a universe of tools

Author Affiliations
1 Department of Biomedical Informatics, Harvard Medical School, Boston, MA
2 Cardiovascular Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
3 MIT Lincoln Laboratory, Lexington, MA
4 Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA
5 Broad Institute of MIT and Harvard, Cambridge, MA
6 Harvard Data Science Initiative, Cambridge, MA

TxAgent model

Abstract

Precision therapeutics require multimodal adaptive models that generate personalized treatment recommendations. We introduce TxAgent, an AI agent that leverages multi-step reasoning and real-time biomedical knowledge retrieval across a toolbox of 211 tools to analyze drug interactions, contraindications, and patient-specific treatment strategies. TxAgent evaluates how drugs interact at molecular, pharmacokinetic, and clinical levels, identifies contraindications based on patient comorbidities and concurrent medications, and tailors treatment strategies to individual patient characteristics, including age, genetic factors, and disease progression. TxAgent retrieves and synthesizes evidence from multiple biomedical sources, assesses interactions between drugs and patient conditions, and refines treatment recommendations through iterative reasoning. It selects tools based on task objectives and executes structured function calls to solve therapeutic tasks that require clinical reasoning and cross-source validation. The ToolUniverse consolidates 211 tools from trusted sources, including all US FDA-approved drugs since 1939 and validated clinical insights from Open Targets. TxAgent outperforms leading LLMs, tool-use models, and reasoning agents across five new benchmarks: DrugPC, BrandPC, GenericPC, TreatmentPC, and DescriptionPC, covering 3,168 drug reasoning tasks and 456 personalized treatment scenarios. It achieves 92.1% accuracy in open-ended drug reasoning tasks, surpassing GPT-4o by up to 25.8% and outperforming DeepSeek-R1 (671B) in structured multi-step reasoning. TxAgent generalizes across drug name variants and descriptions, maintaining a variance of <0.01 between brand, generic, and description-based drug references, exceeding existing tool-use LLMs by over 55%. 
By integrating multi-step inference, real-time knowledge grounding, and tool-assisted decision-making, TxAgent ensures that treatment recommendations align with established clinical guidelines and real-world evidence, reducing the risk of adverse events and improving therapeutic decision-making.
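The robustness claim above (a variance below 0.01 across brand, generic, and description-based drug references) can be made concrete with a small calculation. The sketch below is illustrative only: the answers and predictions are toy placeholders, not benchmark data, and the use of population variance over per-variant accuracies is an assumption, since the exact definition is not given here.

```python
from statistics import pvariance


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def name_variant_variance(accuracy_by_variant):
    """Population variance of accuracy across drug-name variants (assumed metric)."""
    return pvariance(list(accuracy_by_variant.values()))


# Toy placeholder data: the same four questions asked with three naming styles.
answers = ["A", "B", "C", "D"]
toy_predictions = {
    "brand": ["A", "B", "C", "D"],
    "generic": ["A", "B", "C", "A"],
    "description": ["A", "B", "C", "D"],
}

toy_accuracy = {
    variant: accuracy(preds, answers) for variant, preds in toy_predictions.items()
}
toy_variance = name_variant_variance(toy_accuracy)
```

A model that answers equally well regardless of how the drug is named yields a variance close to zero; a model that depends on memorized brand names would show a larger spread.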

A Simple Guide to Using TxAgent

Install ToolUniverse: pip install tooluniverse
Install TxAgent: pip install txagent
Run TxAgent demo/script: https://github.com/mims-harvard/TxAgent

TxAgent capabilities

  • Knowledge grounding using tool calls: TxAgent utilizes tools to obtain verified knowledge and provides outputs based on it.
  • Goal-oriented tool selection: TxAgent proactively requests tools from ToolUniverse using the ToolRAG model and selects and applies the most suitable tool from the available candidates.
  • Problem solving with multi-step reasoning: TxAgent manages complex tasks or unexpected responses from tools through multiple iterations of thought and function calls.
  • Leveraging constantly updated knowledge bases: TxAgent accesses continuously updated databases via tools to handle problems that go beyond its intrinsic knowledge.
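The capabilities above can be sketched as a toy loop that retrieves candidate tools, calls one per round, and accumulates the results as evidence. This is not TxAgent's implementation: the `Tool` class, the tool names, and the word-overlap ranking that stands in for ToolRAG are all hypothetical, and a real agent would use an LLM to decide which tool to call and when to stop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical tool record: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def retrieve_tools(query, tools, top_k=2):
    """Toy stand-in for ToolRAG: rank tools by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(tool):
        return len(query_words & set(tool.description.lower().split()))

    return sorted(tools, key=overlap, reverse=True)[:top_k]


def run_agent(question, tools, max_rounds=5):
    """Call one not-yet-used candidate tool per round and collect evidence."""
    evidence = []
    for _ in range(max_rounds):
        used = {name for name, _ in evidence}
        unused = [t for t in retrieve_tools(question, tools) if t.name not in used]
        if not unused:
            break  # every retrieved candidate has been consulted
        tool = unused[0]
        evidence.append((tool.name, tool.run(question)))
    return evidence


# Hypothetical tools that return stub strings instead of querying real databases.
toy_tools = [
    Tool("toy_warnings_lookup", "warnings and contraindications for a drug",
         lambda q: "stub warnings record"),
    Tool("toy_interaction_lookup", "interactions between a drug and other drugs",
         lambda q: "stub interaction record"),
    Tool("toy_dosage_lookup", "dosage and administration for a drug",
         lambda q: "stub dosage record"),
]

toy_evidence = run_agent("What warnings and contraindications apply to this drug", toy_tools)
```

The final answer would then be written from `toy_evidence` rather than from the model's memory, which is the point of knowledge grounding.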

ToolUniverse

ToolUniverse is a critical component of TxAgent: it gives the agent access to a broad range of biomedical knowledge for solving complex therapeutic reasoning tasks. ToolUniverse includes 211 biomedical tools that address various aspects of drugs and diseases. These tools are linked to trusted sources, including all US FDA-approved drugs since 1939 and validated clinical insights from Open Targets and the Monarch Initiative.
Install ToolUniverse with one line of code: pip install tooluniverse

Training TxAgent: TxAgent-Instruct dataset

Three multi-agent systems (ToolGen, QuestionGen, and TraceGen) construct the TxAgent-Instruct training dataset, which is used to instruction-tune an LLM so that it acquires the capabilities of TxAgent. TxAgent-Instruct is a large-scale, diverse, synthetic dataset of multi-step reasoning and function calls, anchored in biomedical knowledge. It consists of 378,027 instruction-tuning samples, each generated by breaking a complete reasoning trace down into step-by-step training data. These samples are derived from 85,340 multi-step reasoning traces, which collectively include 177,626 reasoning steps and 281,695 function calls.
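The idea of breaking a complete reasoning trace into step-by-step training data can be sketched as follows. This is a minimal illustration under an assumed rule (one sample per step, with the question and all earlier steps as context); the exact splitting rule and sample format used for TxAgent-Instruct are not specified here, and the field names `context` and `target` are hypothetical.

```python
def split_trace(question, steps):
    """Turn one reasoning trace into one training sample per step.

    Each sample pairs the question plus all earlier steps (the context)
    with the next step the model should produce (the target).
    """
    samples = []
    for i, step in enumerate(steps):
        samples.append({"context": [question, *steps[:i]], "target": step})
    return samples


# Toy trace with placeholder steps (not taken from the dataset).
toy_samples = split_trace(
    "toy question",
    ["step 1: call a tool", "step 2: read the result", "step 3: answer"],
)
```

With the counts reported above, 378,027 samples from 85,340 traces works out to roughly 4.4 samples per trace on average.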

Building TxAgent-Instruct: multi-agent systems

  • ToolGen: A tool generation multi-agent system that transforms APIs into 211 agent-compatible tools, aggregating them into the ToolUniverse.
  • QuestionGen: A question generation multi-agent system designed to extract critical information from documents (e.g., FDA drug documentation) and generate relevant questions.
  • TraceGen: A reasoning trace generation multi-agent system, where a Helper agent and a Tool Provider module assist the Solver agent in generating step-by-step reasoning and function calls to solve a problem.
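To illustrate what an "agent-compatible tool" might look like, the sketch below derives a name, description, and parameter schema from a plain Python function. ToolGen itself is a multi-agent system and this mechanical wrapper is not its method; the function `lookup_drug_label`, the schema layout, and the type mapping are all hypothetical.

```python
import inspect

# Assumed mapping from Python annotations to schema type names.
TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func):
    """Build a function-call style description from a function's signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": TYPE_NAMES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def lookup_drug_label(drug_name: str, max_results: int = 5) -> list:
    """Hypothetical API: return label sections for a drug."""
    return []


toy_tool_spec = describe_tool(lookup_drug_label)
```

A description in this shape gives a model enough information to emit a structured function call with the correct argument names and types.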

Drug reasoning on 11 tasks

The DrugPC (Drug Prescribing Card) benchmark includes 3,168 questions covering 11 common drug-related tasks. TxAgent outperforms larger open-source LLMs and GPT-4o, as well as existing tool-use LLMs, across the 11 tasks, in both open-ended and multiple-choice settings. The tasks cover drug overview, ingredients, warnings and safety, dependence and abuse, dosage and administration, use in specific populations, pharmacology, clinical information, nonclinical toxicology, patient-focused information, and storage and supply.

Specialized treatment recommendations

The TreatmentPC (Treatment Prescribing Cards) benchmark includes 456 questions on specialized treatment recommendations. While multiple indications can be applied to a single disease, patients with specific conditions, such as pregnancy or comorbidities, require specialized treatment approaches, including customized drug selection and dosage adjustments. The TreatmentPC benchmark evaluates such scenarios with questions that account for the varying conditions under which drugs apply.
TxAgent outperforms larger LLMs such as GPT-4o and Llama 3.1-70B-Instruct, as well as tool-use LLMs, in both open-ended and multiple-choice settings. It also achieves superior performance compared to the full DeepSeek-R1 model (671B) and its two distilled versions based on Llama-3.1-8B and Llama-3.3-70B.

TxAgent demos

Get the TxAgent code! Launch the TxAgent demo to see how TxAgent can assist in therapeutic reasoning across a universe of tools.

Contact

If you have any questions or suggestions, please email Shanghua Gao and Marinka Zitnik.