Literature Search Tools Tutorial¶

This comprehensive tutorial demonstrates how to use the literature search tools available in ToolUniverse for finding academic papers, preprints, and research articles across multiple databases.

Overview¶

ToolUniverse provides 15 powerful literature search tools that cover different aspects of academic research. All tools have been optimized with enhanced data extraction, comprehensive metadata, and data quality indicators:

Preprint Archives: - ArXiv - Physics, mathematics, computer science, and other fields - BioRxiv - Biology preprints - MedRxiv - Medical preprints - HAL - French research archive

Academic Databases: - Crossref - Scholarly articles with DOI metadata - PubMed - Medical and life sciences literature - Europe PMC - Biomedical literature - Semantic Scholar - AI-powered academic search - OpenAlex - Open academic graph - DBLP - Computer science bibliography

Open Access Tools: - DOAJ - Directory of Open Access Journals - Unpaywall - Open access status checker - CORE - World’s largest open access research papers collection - PMC - PubMed Central full-text biomedical literature - Zenodo - Open research data and publications repository

Tool Overview Table¶

Tool Name

Database

Best For

Key Features

ArXiv_search_ papers

ArXiv

Preprints

Physics, Math, CS, Biology

Crossref_search_ works

Crossref

Scholarly Articles

DOI metadata, Rich metadata, HTML cleaning

PubMed_search_ articles

PubMed

Medical/Life Sciences

Medical literature, MeSH keywords

SemanticScholar_ search_papers

Semantic Scholar

AI-powered Search

AI ranking, Citations, Rate limit handling

openalex_ literature_ search

OpenAlex

Comprehensive Academic Search

Open access, Year filtering, Abstract reconstruction

EuropePMC_ search_articles

Europe PMC

Biomedical Literature

European research, Core+Lite modes

DBLP_search_ publications

DBLP

Computer Science

CS bibliography, Conference papers

DOAJ_search_ articles

DOAJ

Open Access

Articles & Journals, HTML cleaning

BioRxiv_search_ preprints

BioRxiv

Biology Preprints

Biology preprints, Abstracts

MedRxiv_search_ preprints

MedRxiv

Medical Preprints

Medical preprints, Abstracts

HAL_search_ archive

HAL

French Research Archive

French academic papers

Unpaywall_ check_oa_status

Unpaywall

Open Access Status

OA status checking

Enhanced Features¶

All literature search tools have been optimized with the following enhancements:

Comprehensive Data Extraction: - Authors information with proper formatting - DOI (Digital Object Identifier) extraction - Citation counts where available - Open access status indicators - Keywords and subject terms - Article type classification - Publisher information - Data quality indicators for transparency

Improved Error Handling: - Enhanced rate limit handling with automatic retry - User-friendly error messages - Graceful handling of missing data - Consistent fallback values

Data Quality Transparency: - Each result includes a data_quality object - Boolean indicators for field availability - Clear indication of missing vs. unavailable data - Consistent data structure across all tools

Advanced Metadata: - HTML tag cleaning for abstracts - Abstract reconstruction from inverted indexes (OpenAlex) - Multi-mode API calls for comprehensive data (Europe PMC) - Enhanced author and affiliation extraction

Getting Started¶

First, let’s initialize ToolUniverse and load the literature search tools:

from tooluniverse import ToolUniverse

# Initialize ToolUniverse
tu = ToolUniverse()
tu.load_tools()

print("Available literature search tools:")
literature_tools = [
    "ArXiv_search_papers",
    "Crossref_search_works",
    "DBLP_search_publications",
    "PubMed_search_articles",
    "DOAJ_search_articles",
    "Unpaywall_check_oa_status",
    "BioRxiv_search_preprints",
    "MedRxiv_search_preprints",
    "HAL_search_archive",
    "SemanticScholar_search_papers",
    "openalex_literature_search",
    "EuropePMC_search_articles",
    "CORE_search_papers",
    "PMC_search_papers",
    "Zenodo_search_records"
]

for tool in literature_tools:
    print(f"- {tool}")

Basic Usage Pattern¶

All literature search tools follow a similar usage pattern:

# Basic search pattern
result = tu.run({
    "name": "tool_name",
    "arguments": {
    "query": "your search terms",
    "limit": 5  # number of results
    }
})

# Check if results were found
if isinstance(result, list) and len(result) > 0:
    print(f"Found {len(result)} results")
    for i, paper in enumerate(result, 1):
    print(f"{i}. {paper.get('title', 'No title')}")
    print(f"   Authors: {', '.join(paper.get('authors', [])[:3])}")
    print(f"   Year: {paper.get('year', 'Unknown')}")

    # Show data quality information
    if 'data_quality' in paper:
        quality = paper['data_quality']
        available_fields = [k for k, v in quality.items() if v]
        print(f"   Available data: {', '.join(available_fields)}")

    # Show additional metadata if available
    if paper.get('doi'):
        print(f"   DOI: {paper['doi']}")
    if paper.get('citations') or paper.get('citation_count'):
        citations = paper.get('citations') or paper.get('citation_count')
        print(f"   Citations: {citations}")
    if paper.get('open_access') is not None:
        print(f"   Open Access: {paper['open_access']}")
    print()
else:
    print(f"No results found or error: {result}")

Tool-Specific Examples¶

Preprint Archives¶

Search for preprints in specific fields:

# Biology preprints
biorxiv_results = tu.run({
    "name": "BioRxiv_search_preprints",
    "arguments": {
    "query": "CRISPR gene editing",
    "max_results": 2
    }
})

# Medical preprints
medrxiv_results = tu.run({
    "name": "MedRxiv_search_preprints",
    "arguments": {
    "query": "COVID-19 treatment",
    "max_results": 2
    }
})

# French research archive
hal_results = tu.run({
    "name": "HAL_search_archive",
    "arguments": {
    "query": "mathematics statistics",
    "max_results": 2
    }
})

Open Access Tools¶

Find open access articles and check access status:

# Search DOAJ for open access articles
doaj_articles = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "renewable energy",
    "max_results": 3,
    "type": "articles"
    }
})

# Search for open access journals
doaj_journals = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "biology",
    "max_results": 2,
    "type": "journals"
    }
})

# Check open access status
oa_status = tu.run({
    "name": "Unpaywall_check_oa_status",
    "arguments": {
    "doi": "10.1038/nature12373",
    "email": "your-email@example.com"
    }
})

Advanced Usage Patterns¶

Combining Multiple Searches¶

Search across multiple databases for comprehensive results:

def comprehensive_search(query, max_results=5):
    """Search across multiple literature databases."""
    results = {}

    # Search different databases
    databases = [
    ("ArXiv", "ArXiv_search_papers"),
    ("Crossref", "Crossref_search_works"),
    ("Semantic Scholar", "SemanticScholar_search_papers"),
    ("OpenAlex", "openalex_literature_search")
    ]

    for db_name, tool_name in databases:
    try:
        result = tu.run({
    "name": tool_name,
    "arguments": {"query": query, "limit": max_results}
        })
        results[db_name] = result if isinstance(result, list) else []
    except Exception as e:
        print(f"Error searching {db_name}: {e}")
        results[db_name] = []

    return results

# Use the comprehensive search
all_results = comprehensive_search("machine learning", 3)
for db, papers in all_results.items():
    print(f"\n{db}: {len(papers)} papers found")

Filtering and Sorting¶

Different tools offer various filtering and sorting options:

# ArXiv with sorting
sorted_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
    "query": "quantum physics",
    "limit": 5,
    "sort_by": "submittedDate",
    "sort_order": "descending"
    }
})

# OpenAlex with year filtering
recent_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
    "search_keywords": "artificial intelligence",
    "max_results": 5,
    "year_from": 2022,
    "year_to": 2024,
    "open_access": True
    }
})

# Crossref with type filtering
journal_articles = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
    "query": "machine learning",
    "limit": 5,
    "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})

Error Handling¶

Always include proper error handling for robust applications:

def safe_search(tool_name, arguments):
    """Safely search with error handling."""
    try:
    result = tu.run({
        "name": tool_name,
        "arguments": arguments
    })

    if isinstance(result, list):
        return {"success": True, "data": result, "count": len(result)}
    elif isinstance(result, dict) and "error" in result:
        return {"success": False, "error": result["error"]}
    else:
        return {"success": False, "error": "Unexpected result format"}

    except Exception as e:
    return {"success": False, "error": str(e)}

# Use safe search
result = safe_search("ArXiv_search_papers", {
    "query": "machine learning",
    "limit": 3
})

if result["success"]:
    print(f"Found {result['count']} papers")
    for paper in result["data"]:
    print(f"- {paper.get('title', 'No title')}")
else:
    print(f"Search failed: {result['error']}")

Best Practices¶

  1. Choose the Right Tool: - Use ArXiv for preprints in physics, math, CS - Use PubMed for medical/life sciences - Use Semantic Scholar for AI-powered ranking - Use OpenAlex for comprehensive academic search

  2. Optimize Your Queries: - Use specific, relevant keywords - Combine terms with appropriate operators - Use filters to narrow results

  3. Handle Rate Limits: - Some APIs have rate limits - Implement delays between requests if needed - Use error handling for 429 (Too Many Requests) errors

  4. Process Results Efficiently: - Check result types before processing - Extract only needed fields - Implement pagination for large result sets

  5. Combine Multiple Sources: - Use different tools for comprehensive coverage - Cross-reference results for validation - Merge and deduplicate results

Complete Example¶

Here’s a complete example that demonstrates searching across multiple literature databases:

#!/usr/bin/env python3
"""
Comprehensive Literature Search Example
"""

from tooluniverse import ToolUniverse
import json

def main():
    # Initialize ToolUniverse
    tu = ToolUniverse()
    tu.load_tools()

    # Define search query
    query = "machine learning deep learning"

    # Define tools to search
    search_tools = [
    {
        "name": "ArXiv_search_papers",
        "description": "ArXiv Preprints",
        "args": {"query": query, "limit": 2, "sort_by": "relevance"}
    },
    {
        "name": "Crossref_search_works",
        "description": "Crossref Articles",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "SemanticScholar_search_papers",
        "description": "Semantic Scholar",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "openalex_literature_search",
        "description": "OpenAlex",
        "args": {"search_keywords": query, "max_results": 2}
    }
    ]

    print(f"Searching for: '{query}'")
    print("=" * 50)

    all_results = []

    for tool in search_tools:
    print(f"\nSearching {tool['description']}...")

    try:
        result = tu.run({
    "name": tool["name"],
    "arguments": tool["args"]
        })

        if isinstance(result, list) and len(result) > 0:
    print(f"✅ Found {len(result)} results")
    all_results.extend(result)

    # Show first result
    first_paper = result[0]
    print(f"📄 Sample: {first_paper.get('title', 'No title')[:60]}...")
        else:
    print(f"❌ No results or error: {result}")

    except Exception as e:
        print(f"❌ Exception: {str(e)[:100]}...")

    print(f"\n📊 Total papers found: {len(all_results)}")

    # Save results to file
    with open("literature_search_results.json", "w") as f:
    json.dump(all_results, f, indent=2, ensure_ascii=False)

    print("đŸ’Ÿ Results saved to literature_search_results.json")

if __name__ == "__main__":
    main()

This tutorial provides a comprehensive guide to using ToolUniverse’s literature search tools. For more specific examples and advanced usage patterns, refer to the individual tool documentation in the API reference.

Troubleshooting¶

Common Issues and Solutions:

  1. API Rate Limits: - Some services have rate limits - Implement delays between requests - Use error handling for 429 errors

  2. Missing Results: - Check query spelling and keywords - Try different search terms - Verify tool parameters

  3. Authentication Errors: - Some tools require API keys - Check your .env file configuration - Verify API key validity

  4. Network Issues: - Check internet connection - Implement retry logic - Handle timeout exceptions

For more help, see the troubleshooting guide.

Quick Reference¶

Common Parameters¶

Most search tools accept these common parameters:

Parameter

Type

Description

Example

query

string

Search terms

“machine learning”

limit

integer

Max results

5

max_results

integer

Max results (alternative)

5

sort_by

string

Sort order

“relevance”

sort_order

string

Sort direction

“descending”

filter

string

Result filters

“type:journal- article”

year_from

integer

Start year

2020

year_to

integer

End year

2024

open_access

boolean

Open access only

True

type

string

Content type

“articles”

email

string

Contact email (required)

“user@example. com”

Quick Usage Examples¶

ArXiv Search: .. code-block:: python

result = tu.run({

“name”: “ArXiv_search_papers”, “arguments”: { “query”: “machine learning”, “limit”: 5, “sort_by”: “relevance” }

})

Crossref Search: .. code-block:: python

result = tu.run({

“name”: “Crossref_search_works”, “arguments”: { “query”: “artificial intelligence”, “limit”: 5, “filter”: “type:journal-article” }

})

PubMed Search: .. code-block:: python

result = tu.run({

“name”: “PubMed_search_articles”, “arguments”: { “query”: “cancer research”, “limit”: 5 }

})

Semantic Scholar Search: .. code-block:: python

result = tu.run({

“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 5 }

})

OpenAlex Search: .. code-block:: python

result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 5, “year_from”: 2020, “open_access”: True }

})

Preprint Archives: .. code-block:: python

# BioRxiv (Biology) result = tu.run({

“name”: “BioRxiv_search_preprints”, “arguments”: { “query”: “CRISPR”, “max_results”: 5 }

})

# MedRxiv (Medical) result = tu.run({

“name”: “MedRxiv_search_preprints”, “arguments”: { “query”: “COVID-19”, “max_results”: 5 }

})

# HAL (French Archive) result = tu.run({

“name”: “HAL_search_archive”, “arguments”: { “query”: “mathematics”, “max_results”: 5 }

})

Open Access Tools: .. code-block:: python

# DOAJ Search result = tu.run({

“name”: “DOAJ_search_articles”, “arguments”: { “query”: “renewable energy”, “max_results”: 5, “type”: “articles” }

})

# Unpaywall Check result = tu.run({

“name”: “Unpaywall_check_oa_status”, “arguments”: { “doi”: “10.1038/nature12373”, “email”: “your-email@example.com” }

})

Tool Selection Guide¶

Choose the right tool for your research field:

  1. Physics/Math/CS: ArXiv

  2. Medical/Life Sciences: PubMed, Europe PMC, PMC

  3. General Academic: Crossref, Semantic Scholar, OpenAlex

  4. Computer Science: DBLP

  5. Preprints: BioRxiv, MedRxiv, HAL

  6. Open Access: DOAJ, Unpaywall, CORE

  7. Full-Text Biomedical: PMC

  8. Comprehensive Open Access: CORE

  9. Research Data & Datasets: Zenodo

Optimized Tool Examples¶

Here are examples showing the enhanced features of the optimized tools:

Europe PMC with Enhanced Data: .. code-block:: python

# Europe PMC now provides comprehensive metadata result = tu.run({

“name”: “EuropePMC_search_articles”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”DOI: {paper.get(‘doi’)}”) print(f”Citations: {paper.get(‘citations’)}”) print(f”Open Access: {paper.get(‘open_access’)}”) print(f”Keywords: {paper.get(‘keywords’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

OpenAlex with Abstract Reconstruction: .. code-block:: python

# OpenAlex now reconstructs abstracts from inverted index result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)[:200]}
”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Venue: {paper.get(‘venue’)}”) print(f”Citation Count: {paper.get(‘citation_count’)}”) print(f”Keywords: {paper.get(‘keywords’)}”)

Semantic Scholar with Rate Limit Handling: .. code-block:: python

# Semantic Scholar now handles rate limits gracefully result = tu.run({

“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 2, “api_key”: “your_api_key_here” # Optional for higher limits }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

Crossref with HTML Cleaning: .. code-block:: python

# Crossref now cleans HTML tags from abstracts result = tu.run({

“name”: “Crossref_search_works”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Clean Abstract: {paper.get(‘abstract’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Publisher: {paper.get(‘publisher’)}”) print(f”Article Type: {paper.get(‘article_type’)}”)

Data Quality Analysis: .. code-block:: python

def analyze_data_quality(results):

“””Analyze data quality across multiple tools.””” if not isinstance(results, list): return

total_papers = len(results) quality_stats = { ‘has_abstract’: 0, ‘has_authors’: 0, ‘has_doi’: 0, ‘has_citations’: 0, ‘has_keywords’: 0 }

for paper in results: if ‘data_quality’ in paper:

for field, available in paper[‘data_quality’].items():

if field in quality_stats and available:

quality_stats[field] += 1

print(f”Data Quality Analysis ({total_papers} papers):”) for field, count in quality_stats.items(): percentage = (count / total_papers) * 100 print(f” {field}: {count}/{total_papers} ({percentage:.1f}%)”)

# Use with any search results result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “machine learning”, “max_results”: 5 }

})

if isinstance(result, list):

analyze_data_quality(result)