TDC Dataset Tools
Configuration File: tdc_dataset_tools.json
Tool Type: Local
Tools Count: 2
This page contains all tools defined in the tdc_dataset_tools.json configuration file.
Available Tools
TDC_list_datasets (Type: TDCDatasetTool)
TDC_list_datasets tool specification
Tool Information:
Name: TDC_list_datasets
Type: TDCDatasetTool
Description: List the available dataset names for a Therapeutics Data Commons (TDC) problem, via the PyTDC package (tdc.utils.retrieve_dataset_names). Use this to discover valid ‘name’ values before calling TDC_load_dataset. single_pred problems: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred problems: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome. Returns normalized (lowercase) dataset names, which TDC_load_dataset accepts case-insensitively. Requires the optional ‘PyTDC’ package; returns a clean error if it is not installed.
Parameters:
problem (string) (required): TDC problem (case-insensitive). single_pred: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome.
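The description implies a simple lookup: a problem string is matched case-insensitively and belongs to either the single_pred or the multi_pred group. A minimal sketch in plain Python is shown here; `SINGLE_PRED`, `MULTI_PRED` and `resolve_problem` are hypothetical names used for illustration, not part of PyTDC or of this tool, which delegates the actual listing to tdc.utils.retrieve_dataset_names.

```python
# Hypothetical client-side helper: the problem names come from this page,
# the function itself is an illustration, not the tool's implementation.
SINGLE_PRED = ["ADME", "Tox", "HTS", "QM", "Yields", "Epitope", "Develop"]
MULTI_PRED = ["DTI", "DDI", "PPI", "GDA", "DrugRes", "DrugSyn", "PeptideMHC",
              "AntibodyAff", "MTI", "Catalyst", "TCREpitopeBinding", "TrialOutcome"]


def resolve_problem(problem: str) -> tuple[str, str]:
    """Return (group, canonical_name) for a case-insensitive TDC problem string."""
    key = problem.strip().lower()
    for group, names in (("single_pred", SINGLE_PRED), ("multi_pred", MULTI_PRED)):
        for canonical in names:
            if canonical.lower() == key:
                return group, canonical
    raise ValueError(f"unknown TDC problem: {problem!r}")


print(resolve_problem("adme"))        # ('single_pred', 'ADME')
print(resolve_problem("peptidemhc"))  # ('multi_pred', 'PeptideMHC')
```

Checking the group locally is mainly useful because, per the TDC_load_dataset description, single_pred datasets are the most reliable to load.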
Example Usage:
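from tooluniverse import ToolUniverse

# Load the tool registry (TDC tools also need the optional PyTDC package)
tu = ToolUniverse()
tu.load_tools()

query = {
    "name": "TDC_list_datasets",
    "arguments": {
        "problem": "ADME"
    }
}
result = tu.run(query)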
TDC_load_dataset (Type: TDCDatasetTool)
TDC_load_dataset tool specification
Tool Information:
Name: TDC_load_dataset
Type: TDCDatasetTool
Description: Load a named Therapeutics Data Commons (TDC) benchmark dataset locally via the PyTDC package and return a summary plus a small sample of rows. This tool is distinct from ‘TDC_predict_oracle_score’, which scores SMILES strings with oracles; this tool retrieves benchmark DATASETS. TDC datasets are organized by problem: single_pred problems are ‘ADME’, ‘Tox’, ‘HTS’, ‘QM’, ‘Yields’, ‘Epitope’, ‘Develop’; multi_pred problems are ‘DTI’, ‘DDI’, ‘PPI’, ‘GDA’, ‘DrugRes’, ‘DrugSyn’, ‘PeptideMHC’, ‘AntibodyAff’, ‘MTI’, ‘Catalyst’, ‘TCREpitopeBinding’, ‘TrialOutcome’. Returns n_rows, columns, a label summary (class distribution for classification labels or summary statistics for regression labels), train/valid/test split sizes, and a capped head() sample (default 5 rows, max 20). Datasets are downloaded over the network on first use; single_pred datasets are the most reliable. Requires the optional ‘PyTDC’ package; returns a clean error if it is not installed or if a problem module cannot be imported in this environment. Use TDC_list_datasets to discover valid names for a problem.
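The label summary is described as a class distribution for classification labels or summary statistics for regression labels. A minimal sketch of that split, using only the standard library, is shown here. `summarize_labels` is a hypothetical name, and the rule for deciding that labels are categorical (all integers, at most 20 distinct values) is an assumption for illustration; the tool's actual rule and output keys are not documented on this page.

```python
import statistics
from collections import Counter


def summarize_labels(labels, max_classes=20):
    """Hypothetical label summary: class counts for discrete labels, statistics otherwise.

    Assumption (not from the tool): labels count as classification targets when
    every value is an int and there are at most `max_classes` distinct values.
    """
    values = list(labels)
    if all(isinstance(v, int) for v in values) and len(set(values)) <= max_classes:
        # Classification: report how many rows carry each label value.
        return {"task": "classification", "counts": dict(Counter(values))}
    # Regression: report basic summary statistics of the continuous label.
    return {
        "task": "regression",
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


print(summarize_labels([0, 1, 1, 0, 1]))
# {'task': 'classification', 'counts': {0: 2, 1: 3}}
print(summarize_labels([-5.2, -4.8, -6.1])["task"])  # regression
```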
Parameters:
problem (string) (required): TDC problem (case-insensitive). single_pred: ADME, Tox, HTS, QM, Yields, Epitope, Develop. multi_pred: DTI, DDI, PPI, GDA, DrugRes, DrugSyn, PeptideMHC, AntibodyAff, MTI, Catalyst, TCREpitopeBinding, TrialOutcome.
name (string) (required): Dataset name within the problem (case-insensitive), e.g. ‘Caco2_Wang’ for ADME or ‘hERG’ for Tox. Use TDC_list_datasets to discover valid names.
sample_rows (integer) (optional): Number of head rows to include in the sample (default 5, max 20).
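Because names are matched case-insensitively and sample_rows is capped at 20, a caller can validate a request before sending it. The sketch here builds the query dict in the same shape as the example usage; `build_load_query` and `MAX_SAMPLE_ROWS` are hypothetical, and clamping sample_rows client-side to the range 1..20 is an assumption (the page does not say whether the tool clamps or rejects out-of-range values).

```python
MAX_SAMPLE_ROWS = 20  # documented cap for sample_rows


def build_load_query(problem, name, sample_rows=None, known_names=None):
    """Build a TDC_load_dataset query dict (hypothetical client-side helper).

    known_names: optional names previously returned by TDC_list_datasets;
    `name` is checked against them case-insensitively.
    """
    if known_names is not None:
        if name.lower() not in {n.lower() for n in known_names}:
            raise ValueError(f"{name!r} is not a known {problem} dataset")
    arguments = {"problem": problem, "name": name}
    if sample_rows is not None:
        # Assumption: clamp locally to 1..20 rather than relying on the tool to do so.
        arguments["sample_rows"] = max(1, min(int(sample_rows), MAX_SAMPLE_ROWS))
    # When sample_rows is omitted, the tool's own default (5 rows) applies.
    return {"name": "TDC_load_dataset", "arguments": arguments}


query = build_load_query("ADME", "Caco2_Wang", sample_rows=50, known_names=["caco2_wang"])
print(query["arguments"])
# {'problem': 'ADME', 'name': 'Caco2_Wang', 'sample_rows': 20}
```

The resulting dict can be passed to tu.run exactly like the hand-written query in the example usage.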
Example Usage:
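from tooluniverse import ToolUniverse

# Load the tool registry (TDC tools also need the optional PyTDC package)
tu = ToolUniverse()
tu.load_tools()

query = {
    "name": "TDC_load_dataset",
    "arguments": {
        "problem": "ADME",
        "name": "Caco2_Wang",
        "sample_rows": 5
    }
}
result = tu.run(query)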