OpenML Tools

  • Configuration File: openml_tools.json

  • Tool Type: Local

  • Tools Count: 2

This page contains all tools defined in the openml_tools.json configuration file.

Available Tools

OpenML_get_dataset (Type: BaseRESTTool)

OpenML_get_dataset tool specification

Tool Information:

  • Name: OpenML_get_dataset

  • Type: BaseRESTTool

  • Description: Get detailed metadata for a specific OpenML dataset by its numeric ID. Returns comprehensive information including description, data format, creator, collection date, license, download URLs (ARFF and Parquet), default target attribute, tags, citation info, and version history. Well-known dataset IDs include: 61 (iris), 40996 (Fashion-MNIST), 554 (mnist_784), 1169 (airlines), 31 (credit-g), 1590 (adult). Use OpenML_search_datasets to discover dataset IDs.

Parameters:

  • data_id (integer) (required) OpenML dataset ID number (e.g., 61 for iris, 554 for MNIST, 31 for credit-g). Obtain from OpenML_search_datasets.

Example Usage:

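# Assumes `tu` is an already-initialized tool runner that exposes run().
query = {
    "name": "OpenML_get_dataset",
    "arguments": {
        "data_id": 10  # numeric OpenML dataset ID
    }
}
result = tu.run(query)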
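
The query dictionary above can also be built programmatically. The following is a minimal sketch, not part of the tool itself: the helper name build_get_dataset_query and the lookup table are hypothetical, the IDs in the table are copied from the tool description, and the check that IDs are positive is an assumption.

```python
# Hypothetical helper; not part of openml_tools.json.
# The name-to-ID table is copied from the OpenML_get_dataset description.
WELL_KNOWN_DATASET_IDS = {
    "iris": 61,
    "Fashion-MNIST": 40996,
    "mnist_784": 554,
    "airlines": 1169,
    "credit-g": 31,
    "adult": 1590,
}


def build_get_dataset_query(dataset):
    """Return a query dict for OpenML_get_dataset.

    `dataset` is either an integer dataset ID or one of the
    well-known dataset names listed above.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(dataset, bool):
        raise TypeError("dataset must be an int ID or a known name")
    if isinstance(dataset, str):
        if dataset not in WELL_KNOWN_DATASET_IDS:
            raise ValueError(f"unknown dataset name: {dataset!r}")
        data_id = WELL_KNOWN_DATASET_IDS[dataset]
    elif isinstance(dataset, int):
        data_id = dataset
    else:
        raise TypeError("dataset must be an int ID or a known name")
    # Assumption: valid OpenML dataset IDs are positive integers.
    if data_id <= 0:
        raise ValueError("data_id must be a positive integer")
    return {"name": "OpenML_get_dataset", "arguments": {"data_id": data_id}}


query = build_get_dataset_query("iris")
```

The resulting dictionary has the same shape as the example above and can be passed to tu.run(query) in the same way.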

OpenML_search_datasets (Type: BaseRESTTool)

OpenML_search_datasets tool specification

Tool Information:

  • Name: OpenML_search_datasets

  • Type: BaseRESTTool

  • Description: Search the OpenML platform for machine learning benchmark datasets. OpenML hosts thousands of curated datasets with standardized metadata, quality metrics, and associated ML tasks/runs. Returns dataset IDs, names, versions, formats, and quality statistics (number of instances, features, classes, missing values). Use this to discover datasets for benchmarking, AutoML experiments, or finding datasets by name. Results are paginated; use limit to control page size. Use OpenML_get_dataset for full metadata on a specific dataset.

Parameters:

  • limit (integer) (required) Maximum number of datasets to return (default 20, max 10000). Controls pagination page size.

Example Usage:

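# Assumes `tu` is an already-initialized tool runner that exposes run().
query = {
    "name": "OpenML_search_datasets",
    "arguments": {
        "limit": 10  # return at most 10 datasets
    }
}
result = tu.run(query)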
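
Because limit has a documented maximum of 10000, it may be worth validating the value before the query is sent. The sketch below is illustrative only: the helper name build_search_query is hypothetical, the default of 20 and the maximum of 10000 come from the parameter description, and the lower bound of 1 is an assumption.

```python
# Hypothetical helper; not part of openml_tools.json.
DEFAULT_LIMIT = 20   # default stated in the parameter description
MAX_LIMIT = 10000    # maximum stated in the parameter description


def build_search_query(limit=DEFAULT_LIMIT):
    """Return a query dict for OpenML_search_datasets with a checked limit."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise TypeError("limit must be an integer")
    # Assumption: a limit below 1 is not meaningful.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {"name": "OpenML_search_datasets", "arguments": {"limit": limit}}


query = build_search_query(10)
```

As with the example above, the returned dictionary can be passed to tu.run(query).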