TDC Dataset Tools

Configuration File: tdc_dataset_tools.json
Tool Type: Local
Tools Count: 2

This page contains all tools defined in the tdc_dataset_tools.json configuration file.

Available Tools

TDC_list_datasets (Type: TDCDatasetTool)

Tool Information:

  • Name: TDC_list_datasets

  • Type: TDCDatasetTool

  • Description: List the available dataset names for a Therapeutics Data Commons (TDC) problem, via the PyTDC package (tdc.utils.retrieve_dataset_names). Use this to discover valid ‘name’ values before calling TDC_load_dataset. single_pred problems: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred problems: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome. Returns normalized (lowercase) dataset names that are accepted case-insensitively by TDC_load_dataset. Requires the optional ‘PyTDC’ package; returns a clean error if it is not installed.

Parameters:

  • problem (string) (required) TDC problem (case-insensitive). single_pred: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome.

Example Usage:

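from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # register the available tools, including the TDC tools

# List the dataset names available for the ADME problem.
query = {
    "name": "TDC_list_datasets",
    "arguments": {
        "problem": "ADME"
    }
}
result = tu.run(query)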

TDC_load_dataset (Type: TDCDatasetTool)

Tool Information:

  • Name: TDC_load_dataset

  • Type: TDCDatasetTool

  • Description: Load a named Therapeutics Data Commons (TDC) benchmark dataset locally via the PyTDC package and return a summary plus a small sample of rows. This tool retrieves benchmark datasets; it is distinct from ‘TDC_predict_oracle_score’, which scores SMILES strings with oracles. TDC datasets are organized by problem: single_pred problems are ‘ADME’, ‘Tox’, ‘HTS’, ‘QM’, ‘Yields’, ‘Epitope’, ‘Develop’; multi_pred problems are ‘DTI’, ‘DDI’, ‘PPI’, ‘GDA’, ‘DrugRes’, ‘DrugSyn’, ‘PeptideMHC’, ‘AntibodyAff’, ‘MTI’, ‘Catalyst’, ‘TCREpitopeBinding’, ‘TrialOutcome’. Returns n_rows, columns, a label summary (class distribution for classification labels or summary statistics for regression labels), train/valid/test split sizes, and a capped head() sample (default 5 rows, max 20). Datasets are downloaded on first use, which requires network access; single_pred datasets are the most reliable. Requires the optional ‘PyTDC’ package; returns a clean error if it is not installed or if a problem module cannot be imported in this environment. Use TDC_list_datasets to discover valid names for a problem.
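The summary fields listed above can be illustrated with a minimal, dependency-free sketch. It assumes rows are plain dicts and that the label column is named `Y` (PyTDC single_pred datasets typically use `Y`, but treat that as an assumption here). The helpers `summarize_labels` and `summarize_dataset`, and the rule "integer-valued labels with at most 10 distinct values count as classification", are hypothetical; the tool's actual classification/regression rule is not documented on this page.

```python
# Hypothetical sketch of the dataset summary described above. The helper
# names, the "Y" label column, and the classification heuristic are
# assumptions for illustration, not the tool's actual implementation.
from collections import Counter
from statistics import mean, median, pstdev

MAX_SAMPLE_ROWS = 20      # documented cap on the head() sample
DEFAULT_SAMPLE_ROWS = 5   # documented default


def summarize_labels(labels, max_classes=10):
    """Class counts for few-valued integer labels, summary statistics otherwise."""
    is_discrete = all(float(v).is_integer() for v in labels)
    if is_discrete and len(set(labels)) <= max_classes:
        counts = Counter(labels)
        return {"type": "classification", "class_counts": dict(sorted(counts.items()))}
    values = [float(v) for v in labels]
    return {
        "type": "regression",
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
    }


def summarize_dataset(rows, splits, label_col="Y", sample_rows=DEFAULT_SAMPLE_ROWS):
    """Build a summary from a list of row dicts and a {split_name: rows} dict."""
    n = max(0, min(sample_rows, MAX_SAMPLE_ROWS))  # cap the sample at 20 rows
    return {
        "n_rows": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
        "label_summary": summarize_labels([r[label_col] for r in rows]),
        "split_sizes": {name: len(part) for name, part in splits.items()},
        "sample": rows[:n],
    }


# Toy rows with arbitrary labels, for illustration only.
rows = [
    {"Drug_ID": "d1", "Drug": "CCO", "Y": 0},
    {"Drug_ID": "d2", "Drug": "c1ccccc1", "Y": 1},
    {"Drug_ID": "d3", "Drug": "CC(=O)O", "Y": 1},
]
summary = summarize_dataset(rows, {"train": rows[:2], "valid": [], "test": rows[2:]})
print(summary["label_summary"])  # {'type': 'classification', 'class_counts': {0: 1, 1: 2}}
print(summary["split_sizes"])    # {'train': 2, 'valid': 0, 'test': 1}
```

Capping the sample inside the summary function means an oversized `sample_rows` request degrades to 20 rows instead of failing.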

Parameters:

  • problem (string) (required) TDC problem (case-insensitive). single_pred: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome.

  • name (string) (required) Dataset name within the problem (case-insensitive). Example: ‘Caco2_Wang’ for ADME, ‘hERG’ for Tox. Use TDC_list_datasets to discover valid names.

  • sample_rows (integer) (optional) Number of head rows to include in the sample (default 5, max 20).
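Dataset names are matched case-insensitively, so ‘caco2_wang’ and ‘Caco2_Wang’ refer to the same dataset. The sketch below shows one way such a lookup could work against the lowercase names that TDC_list_datasets returns. The helper `match_dataset_name` is hypothetical, and the close-match suggestions (via the standard library's `difflib.get_close_matches`) are an illustrative addition, not documented tool behavior.

```python
# Hypothetical sketch of case-insensitive dataset-name matching.
# `available` stands in for the lowercase names TDC_list_datasets returns;
# the helper and its error message are illustrative only.
import difflib


def match_dataset_name(name: str, available: list[str]) -> str:
    """Return the available name matching `name`, ignoring case.

    Raises ValueError, with close-match suggestions, if nothing matches.
    """
    lookup = {n.lower(): n for n in available}
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]
    hints = difflib.get_close_matches(key, list(lookup), n=3)
    msg = f"Unknown dataset {name!r}"
    if hints:
        msg += f"; did you mean: {', '.join(hints)}?"
    raise ValueError(msg)


# A few ADME names for illustration; call TDC_list_datasets for the real list.
available = ["caco2_wang", "hia_hou", "pgp_broccatelli"]
print(match_dataset_name("Caco2_Wang", available))  # caco2_wang
```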

Example Usage:

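from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # register the available tools, including the TDC tools

# Load the Caco2_Wang ADME dataset and return a 10-row sample.
query = {
    "name": "TDC_load_dataset",
    "arguments": {
        "problem": "ADME",
        "name": "Caco2_Wang",
        "sample_rows": 10
    }
}
result = tu.run(query)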