Therapeutics Data Commons

Machine Learning Datasets for Therapeutics

Therapeutics Data Commons (TDC) is an open and extensive data hub that includes more than 50 machine learning-ready datasets across more than 20 therapeutic tasks. The tasks span target discovery, activity screening, efficacy, safety, and manufacturing, and cover small molecules, antibodies, miRNA, and other therapeutic products.

TDC is a community-driven effort. If you want to contribute to TDC, please fill out this form!

3 Lines of Code

TDC has minimal dependencies, so installation is hassle-free. Its data loaders are simplified so that you can access ML-ready datasets in only 3 lines of code.

From Bench to Bedside

TDC covers a wide range of established and emerging therapeutic tasks across the entire drug discovery and development pipeline.

Numerous Data Functions

TDC provides extensive data functions, such as evaluators, realistic data splits, data processing, molecule generation oracles, and much more.
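To make the idea of a "realistic split" concrete, here is a minimal stand-alone sketch of one such strategy, a cold split, in which every distinct entity (for example a drug) lands in exactly one of train, validation, or test, so the model is evaluated on entities it never saw during training. This is not TDC's implementation: the function name `cold_split`, its parameters, the record layout, and the 70/10/20 fractions are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def cold_split(records, key, frac=(0.7, 0.1, 0.2), seed=42):
    """Hypothetical cold split: group records by `key` and assign whole
    groups to train/valid/test, so no entity appears in two partitions."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    # Sort first so the shuffle is reproducible for a given seed.
    entities = sorted(groups)
    random.Random(seed).shuffle(entities)

    n = len(entities)
    n_train = int(frac[0] * n)
    n_valid = int(frac[1] * n)
    parts = {
        "train": entities[:n_train],
        "valid": entities[n_train:n_train + n_valid],
        "test": entities[n_train + n_valid:],  # remainder goes to test
    }
    # Expand each partition's entities back into their records.
    return {name: [rec for e in ents for rec in groups[e]]
            for name, ents in parts.items()}


# Toy data: 10 hypothetical drugs, each paired with 2 hypothetical targets.
records = [{"drug": f"D{i}", "target": f"T{j}"} for i in range(10) for j in range(2)]
split = cold_split(records, key="drug")
print({name: len(rows) for name, rows in split.items()})
# {'train': 14, 'valid': 2, 'test': 4}
```

Splitting by whole groups rather than by individual rows is what prevents information about a test drug from leaking into training, which is why this kind of split gives a more realistic estimate of performance on new compounds than a purely random split.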

Loading a dataset is as simple as...

