Literature Search Tools Tutorial

This comprehensive tutorial demonstrates how to use the literature search tools available in ToolUniverse for finding academic papers, preprints, and research articles across multiple databases.

Overview

ToolUniverse provides 15 powerful literature search tools that cover different aspects of academic research. All tools have been optimized with enhanced data extraction, comprehensive metadata, and data quality indicators:

Preprint Archives: - ArXiv - Physics, mathematics, computer science, and other fields - BioRxiv - Biology preprints - MedRxiv - Medical preprints - HAL - French research archive

Academic Databases: - Crossref - Scholarly articles with DOI metadata - PubMed - Medical and life sciences literature - Europe PMC - Biomedical literature - Semantic Scholar - AI-powered academic search - OpenAlex - Open academic graph - DBLP - Computer science bibliography

Open Access Tools: - DOAJ - Directory of Open Access Journals - Unpaywall - Open access status checker - CORE - World’s largest open access research papers collection - PMC - PubMed Central full-text biomedical literature - Zenodo - Open research data and publications repository

Tool Overview Table

Enhanced Features

All literature search tools have been optimized with the following enhancements:

Comprehensive Data Extraction: - Authors information with proper formatting - DOI (Digital Object Identifier) extraction - Citation counts where available - Open access status indicators - Keywords and subject terms - Article type classification - Publisher information - Data quality indicators for transparency

Improved Error Handling: - Enhanced rate limit handling with automatic retry - User-friendly error messages - Graceful handling of missing data - Consistent fallback values

Data Quality Transparency: - Each result includes a data_quality object - Boolean indicators for field availability - Clear indication of missing vs. unavailable data - Consistent data structure across all tools

Advanced Metadata: - HTML tag cleaning for abstracts - Abstract reconstruction from inverted indexes (OpenAlex) - Multi-mode API calls for comprehensive data (Europe PMC) - Enhanced author and affiliation extraction

Getting Started

First, let’s initialize ToolUniverse and load the literature search tools:

from tooluniverse import ToolUniverse

# Initialize ToolUniverse
tu = ToolUniverse()
tu.load_tools()

print("Available literature search tools:")
literature_tools = [
    "ArXiv_search_papers",
    "Crossref_search_works",
    "DBLP_search_publications",
    "PubMed_search_articles",
    "DOAJ_search_articles",
    "Unpaywall_check_oa_status",
    "BioRxiv_search_preprints",
    "MedRxiv_search_preprints",
    "HAL_search_archive",
    "SemanticScholar_search_papers",
    "openalex_literature_search",
    "EuropePMC_search_articles",
    "CORE_search_papers",
    "PMC_search_papers",
    "Zenodo_search_records"
]

for tool in literature_tools:
    print(f"- {tool}")

Basic Usage Pattern

All literature search tools follow a similar usage pattern:

# Basic search pattern
result = tu.run({
    "name": "tool_name",
    "arguments": {
    "query": "your search terms",
    "limit": 5  # number of results
    }
})

# Check if results were found
if isinstance(result, list) and len(result) > 0:
    print(f"Found {len(result)} results")
    for i, paper in enumerate(result, 1):
    print(f"{i}. {paper.get('title', 'No title')}")
    print(f"   Authors: {', '.join(paper.get('authors', [])[:3])}")
    print(f"   Year: {paper.get('year', 'Unknown')}")

    # Show data quality information
    if 'data_quality' in paper:
        quality = paper['data_quality']
        available_fields = [k for k, v in quality.items() if v]
        print(f"   Available data: {', '.join(available_fields)}")

    # Show additional metadata if available
    if paper.get('doi'):
        print(f"   DOI: {paper['doi']}")
    if paper.get('citations') or paper.get('citation_count'):
        citations = paper.get('citations') or paper.get('citation_count')
        print(f"   Citations: {citations}")
    if paper.get('open_access') is not None:
        print(f"   Open Access: {paper['open_access']}")
    print()
else:
    print(f"No results found or error: {result}")

Tool-Specific Examples

Preprint Archives

Search for preprints in specific fields:

# Biology preprints
biorxiv_results = tu.run({
    "name": "BioRxiv_search_preprints",
    "arguments": {
    "query": "CRISPR gene editing",
    "max_results": 2
    }
})

# Medical preprints
medrxiv_results = tu.run({
    "name": "MedRxiv_search_preprints",
    "arguments": {
    "query": "COVID-19 treatment",
    "max_results": 2
    }
})

# French research archive
hal_results = tu.run({
    "name": "HAL_search_archive",
    "arguments": {
    "query": "mathematics statistics",
    "max_results": 2
    }
})

Open Access Tools

Find open access articles and check access status:

# Search DOAJ for open access articles
doaj_articles = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "renewable energy",
    "max_results": 3,
    "type": "articles"
    }
})

# Search for open access journals
doaj_journals = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "biology",
    "max_results": 2,
    "type": "journals"
    }
})

# Check open access status
oa_status = tu.run({
    "name": "Unpaywall_check_oa_status",
    "arguments": {
    "doi": "10.1038/nature12373",
    "email": "your-email@example.com"
    }
})

Advanced Usage Patterns

Combining Multiple Searches

Search across multiple databases for comprehensive results:

def comprehensive_search(query, max_results=5):
    """Search across multiple literature databases."""
    results = {}

    # Search different databases
    databases = [
    ("ArXiv", "ArXiv_search_papers"),
    ("Crossref", "Crossref_search_works"),
    ("Semantic Scholar", "SemanticScholar_search_papers"),
    ("OpenAlex", "openalex_literature_search")
    ]

    for db_name, tool_name in databases:
    try:
        result = tu.run({
    "name": tool_name,
    "arguments": {"query": query, "limit": max_results}
        })
        results[db_name] = result if isinstance(result, list) else []
    except Exception as e:
        print(f"Error searching {db_name}: {e}")
        results[db_name] = []

    return results

# Use the comprehensive search
all_results = comprehensive_search("machine learning", 3)
for db, papers in all_results.items():
    print(f"\n{db}: {len(papers)} papers found")

Filtering and Sorting

Different tools offer various filtering and sorting options:

# ArXiv with sorting
sorted_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
    "query": "quantum physics",
    "limit": 5,
    "sort_by": "submittedDate",
    "sort_order": "descending"
    }
})

# OpenAlex with year filtering
recent_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
    "search_keywords": "artificial intelligence",
    "max_results": 5,
    "year_from": 2022,
    "year_to": 2024,
    "open_access": True
    }
})

# Crossref with type filtering
journal_articles = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
    "query": "machine learning",
    "limit": 5,
    "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})

Error Handling

Always include proper error handling for robust applications:

def safe_search(tool_name, arguments):
    """Safely search with error handling."""
    try:
    result = tu.run({
        "name": tool_name,
        "arguments": arguments
    })

    if isinstance(result, list):
        return {"success": True, "data": result, "count": len(result)}
    elif isinstance(result, dict) and "error" in result:
        return {"success": False, "error": result["error"]}
    else:
        return {"success": False, "error": "Unexpected result format"}

    except Exception as e:
    return {"success": False, "error": str(e)}

# Use safe search
result = safe_search("ArXiv_search_papers", {
    "query": "machine learning",
    "limit": 3
})

if result["success"]:
    print(f"Found {result['count']} papers")
    for paper in result["data"]:
    print(f"- {paper.get('title', 'No title')}")
else:
    print(f"Search failed: {result['error']}")

Best Practices

  1. Choose the Right Tool:

  • Use ArXiv for preprints in physics, math, CS

  • Use PubMed for medical/life sciences

  • Use Semantic Scholar for AI-powered ranking

  • Use OpenAlex for comprehensive academic search

  1. Optimize Your Queries:

  • Use specific, relevant keywords

  • Combine terms with appropriate operators

  • Use filters to narrow results

  1. Handle Rate Limits:

  • Some APIs have rate limits

  • Implement delays between requests if needed

  • Use error handling for 429 (Too Many Requests) errors

  1. Process Results Efficiently:

  • Check result types before processing

  • Extract only needed fields

  • Implement pagination for large result sets

  1. Combine Multiple Sources:

  • Use different tools for comprehensive coverage

  • Cross-reference results for validation

  • Merge and deduplicate results

Complete Example

Here’s a complete example that demonstrates searching across multiple literature databases:

#!/usr/bin/env python3
"""
Comprehensive Literature Search Example
"""

from tooluniverse import ToolUniverse
import json

def main():
    # Initialize ToolUniverse
    tu = ToolUniverse()
    tu.load_tools()

    # Define search query
    query = "machine learning deep learning"

    # Define tools to search
    search_tools = [
    {
        "name": "ArXiv_search_papers",
        "description": "ArXiv Preprints",
        "args": {"query": query, "limit": 2, "sort_by": "relevance"}
    },
    {
        "name": "Crossref_search_works",
        "description": "Crossref Articles",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "SemanticScholar_search_papers",
        "description": "Semantic Scholar",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "openalex_literature_search",
        "description": "OpenAlex",
        "args": {"search_keywords": query, "max_results": 2}
    }
    ]

    print(f"Searching for: '{query}'")
    print("=" * 50)

    all_results = []

    for tool in search_tools:
    print(f"\nSearching {tool['description']}...")

    try:
        result = tu.run({
    "name": tool["name"],
    "arguments": tool["args"]
        })

        if isinstance(result, list) and len(result) > 0:
    print(f"✅ Found {len(result)} results")
    all_results.extend(result)

    # Show first result
    first_paper = result[0]
    print(f"📄 Sample: {first_paper.get('title', 'No title')[:60]}...")
        else:
    print(f"❌ No results or error: {result}")

    except Exception as e:
        print(f"❌ Exception: {str(e)[:100]}...")

    print(f"\n📊 Total papers found: {len(all_results)}")

    # Save results to file
    with open("literature_search_results.json", "w") as f:
    json.dump(all_results, f, indent=2, ensure_ascii=False)

    print("💾 Results saved to literature_search_results.json")

if __name__ == "__main__":
    main()

This tutorial provides a comprehensive guide to using ToolUniverse’s literature search tools. For more specific examples and advanced usage patterns, refer to the individual tool documentation in the API reference.

Troubleshooting

Common Issues and Solutions:

  1. API Rate Limits:

  • Some services have rate limits

  • Implement delays between requests

  • Use error handling for 429 errors

  1. Missing Results:

  • Check query spelling and keywords

  • Try different search terms

  • Verify tool parameters

  1. Authentication Errors:

  • Some tools require API keys

  • Check your .env file configuration

  • Verify API key validity

  1. Network Issues:

  • Check internet connection

  • Implement retry logic

  • Handle timeout exceptions

For more help, see the troubleshooting guide.

Quick Reference

Common Parameters

Most search tools accept these common parameters:

Quick Usage Examples

ArXiv Search: .. code-block:: python

result = tu.run({

“name”: “ArXiv_search_papers”, “arguments”: { “query”: “machine learning”, “limit”: 5, “sort_by”: “relevance” }

})

Crossref Search: .. code-block:: python

result = tu.run({

“name”: “Crossref_search_works”, “arguments”: { “query”: “artificial intelligence”, “limit”: 5, “filter”: “type:journal-article” }

})

PubMed Search: .. code-block:: python

result = tu.run({

“name”: “PubMed_search_articles”, “arguments”: { “query”: “cancer research”, “limit”: 5 }

})

Semantic Scholar Search: .. code-block:: python

result = tu.run({

“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 5 }

})

OpenAlex Search: .. code-block:: python

result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 5, “year_from”: 2020, “open_access”: True }

})

Preprint Archives: .. code-block:: python

# BioRxiv (Biology) result = tu.run({

“name”: “BioRxiv_search_preprints”, “arguments”: { “query”: “CRISPR”, “max_results”: 5 }

})

# MedRxiv (Medical) result = tu.run({

“name”: “MedRxiv_search_preprints”, “arguments”: { “query”: “COVID-19”, “max_results”: 5 }

})

# HAL (French Archive) result = tu.run({

“name”: “HAL_search_archive”, “arguments”: { “query”: “mathematics”, “max_results”: 5 }

})

Open Access Tools: .. code-block:: python

# DOAJ Search result = tu.run({

“name”: “DOAJ_search_articles”, “arguments”: { “query”: “renewable energy”, “max_results”: 5, “type”: “articles” }

})

# Unpaywall Check result = tu.run({

“name”: “Unpaywall_check_oa_status”, “arguments”: { “doi”: “10.1038/nature12373”, “email”: “your-email@example.com” }

})

Tool Selection Guide

Choose the right tool for your research field:

  1. Physics/Math/CS: ArXiv

  2. Medical/Life Sciences: PubMed, Europe PMC, PMC

  3. General Academic: Crossref, Semantic Scholar, OpenAlex

  4. Computer Science: DBLP

  5. Preprints: BioRxiv, MedRxiv, HAL

  6. Open Access: DOAJ, Unpaywall, CORE

  7. Full-Text Biomedical: PMC

  8. Comprehensive Open Access: CORE

  9. Research Data & Datasets: Zenodo

Optimized Tool Examples

Here are examples showing the enhanced features of the optimized tools:

Europe PMC with Enhanced Data: .. code-block:: python

# Europe PMC now provides comprehensive metadata result = tu.run({

“name”: “EuropePMC_search_articles”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”DOI: {paper.get(‘doi’)}”) print(f”Citations: {paper.get(‘citations’)}”) print(f”Open Access: {paper.get(‘open_access’)}”) print(f”Keywords: {paper.get(‘keywords’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

OpenAlex with Abstract Reconstruction: .. code-block:: python

# OpenAlex now reconstructs abstracts from inverted index result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)[:200]}…”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Venue: {paper.get(‘venue’)}”) print(f”Citation Count: {paper.get(‘citation_count’)}”) print(f”Keywords: {paper.get(‘keywords’)}”)

Semantic Scholar with Rate Limit Handling: .. code-block:: python

# Semantic Scholar now handles rate limits gracefully result = tu.run({

“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 2, “api_key”: “your_api_key_here” # Optional for higher limits }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

Crossref with HTML Cleaning: .. code-block:: python

# Crossref now cleans HTML tags from abstracts result = tu.run({

“name”: “Crossref_search_works”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:

paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Clean Abstract: {paper.get(‘abstract’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Publisher: {paper.get(‘publisher’)}”) print(f”Article Type: {paper.get(‘article_type’)}”)

Data Quality Analysis: .. code-block:: python

def analyze_data_quality(results):

“””Analyze data quality across multiple tools.””” if not isinstance(results, list): return

total_papers = len(results) quality_stats = { ‘has_abstract’: 0, ‘has_authors’: 0, ‘has_doi’: 0, ‘has_citations’: 0, ‘has_keywords’: 0 }

for paper in results: if ‘data_quality’ in paper:

for field, available in paper[‘data_quality’].items():

if field in quality_stats and available:

quality_stats[field] += 1

print(f”Data Quality Analysis ({total_papers} papers):”) for field, count in quality_stats.items(): percentage = (count / total_papers) * 100 print(f” {field}: {count}/{total_papers} ({percentage:.1f}%)”)

# Use with any search results result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “machine learning”, “max_results”: 5 }

})

if isinstance(result, list):

analyze_data_quality(result)