Literature Search Tools Tutorial¶

This comprehensive tutorial demonstrates how to use the literature search tools available in ToolUniverse for finding academic papers, preprints, and research articles across multiple databases.

Overview¶

ToolUniverse provides 15 powerful literature search tools that cover different aspects of academic research. All tools have been optimized with enhanced data extraction, comprehensive metadata, and data quality indicators:

Preprint Archives: - ArXiv - Physics, mathematics, computer science, and other fields - BioRxiv - Biology preprints - MedRxiv - Medical preprints - HAL - French research archive

Academic Databases: - Crossref - Scholarly articles with DOI metadata - PubMed - Medical and life sciences literature - Europe PMC - Biomedical literature - Semantic Scholar - AI-powered academic search - OpenAlex - Open academic graph - DBLP - Computer science bibliography

Open Access Tools: - DOAJ - Directory of Open Access Journals - Unpaywall - Open access status checker - CORE - World’s largest open access research papers collection - PMC - PubMed Central full-text biomedical literature - Zenodo - Open research data and publications repository

Tool Overview Table¶

Tool Name	Database	Best For	Key Features
ArXiv_search_ papers	ArXiv	Preprints	Physics, Math, CS, Biology
Crossref_search_ works	Crossref	Scholarly Articles	DOI metadata, Rich metadata, HTML cleaning
PubMed_search_ articles	PubMed	Medical/Life Sciences	Medical literature, MeSH keywords
SemanticScholar_ search_papers	Semantic Scholar	AI-powered Search	AI ranking, Citations, Rate limit handling
openalex_ literature_ search	OpenAlex	Comprehensive Academic Search	Open access, Year filtering, Abstract reconstruction
EuropePMC_ search_articles	Europe PMC	Biomedical Literature	European research, Core+Lite modes
DBLP_search_ publications	DBLP	Computer Science	CS bibliography, Conference papers
DOAJ_search_ articles	DOAJ	Open Access	Articles & Journals, HTML cleaning
BioRxiv_search_ preprints	BioRxiv	Biology Preprints	Biology preprints, Abstracts
MedRxiv_search_ preprints	MedRxiv	Medical Preprints	Medical preprints, Abstracts
HAL_search_ archive	HAL	French Research Archive	French academic papers
Unpaywall_ check_oa_status	Unpaywall	Open Access Status	OA status checking

Enhanced Features¶

All literature search tools have been optimized with the following enhancements:

Comprehensive Data Extraction: - Authors information with proper formatting - DOI (Digital Object Identifier) extraction - Citation counts where available - Open access status indicators - Keywords and subject terms - Article type classification - Publisher information - Data quality indicators for transparency

Improved Error Handling: - Enhanced rate limit handling with automatic retry - User-friendly error messages - Graceful handling of missing data - Consistent fallback values

Data Quality Transparency: - Each result includes a data_quality object - Boolean indicators for field availability - Clear indication of missing vs. unavailable data - Consistent data structure across all tools

Advanced Metadata: - HTML tag cleaning for abstracts - Abstract reconstruction from inverted indexes (OpenAlex) - Multi-mode API calls for comprehensive data (Europe PMC) - Enhanced author and affiliation extraction

Getting Started¶

First, let’s initialize ToolUniverse and load the literature search tools:

from tooluniverse import ToolUniverse

# Initialize ToolUniverse
tu = ToolUniverse()
tu.load_tools()

print("Available literature search tools:")
literature_tools = [
    "ArXiv_search_papers",
    "Crossref_search_works",
    "DBLP_search_publications",
    "PubMed_search_articles",
    "DOAJ_search_articles",
    "Unpaywall_check_oa_status",
    "BioRxiv_search_preprints",
    "MedRxiv_search_preprints",
    "HAL_search_archive",
    "SemanticScholar_search_papers",
    "openalex_literature_search",
    "EuropePMC_search_articles",
    "CORE_search_papers",
    "PMC_search_papers",
    "Zenodo_search_records"
]

for tool in literature_tools:
    print(f"- {tool}")

Basic Usage Pattern¶

All literature search tools follow a similar usage pattern:

# Basic search pattern
result = tu.run({
    "name": "tool_name",
    "arguments": {
    "query": "your search terms",
    "limit": 5  # number of results
    }
})

# Check if results were found
if isinstance(result, list) and len(result) > 0:
    print(f"Found {len(result)} results")
    for i, paper in enumerate(result, 1):
    print(f"{i}. {paper.get('title', 'No title')}")
    print(f"   Authors: {', '.join(paper.get('authors', [])[:3])}")
    print(f"   Year: {paper.get('year', 'Unknown')}")

    # Show data quality information
    if 'data_quality' in paper:
        quality = paper['data_quality']
        available_fields = [k for k, v in quality.items() if v]
        print(f"   Available data: {', '.join(available_fields)}")

    # Show additional metadata if available
    if paper.get('doi'):
        print(f"   DOI: {paper['doi']}")
    if paper.get('citations') or paper.get('citation_count'):
        citations = paper.get('citations') or paper.get('citation_count')
        print(f"   Citations: {citations}")
    if paper.get('open_access') is not None:
        print(f"   Open Access: {paper['open_access']}")
    print()
else:
    print(f"No results found or error: {result}")

Tool-Specific Examples¶

ArXiv Search¶

Search for preprints in physics, mathematics, computer science, and other fields:

# Search for machine learning papers
arxiv_results = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
    "query": "machine learning deep learning",
    "limit": 3,
    "sort_by": "relevance"
    }
})

# Search with date filtering
recent_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
    "query": "quantum computing",
    "limit": 2,
    "sort_by": "submittedDate",
    "sort_order": "descending"
    }
})

Crossref Search¶

Search for scholarly articles with rich metadata:

# Basic search
crossref_results = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
    "query": "artificial intelligence",
    "limit": 3
    }
})

# Search with filters
filtered_results = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
    "query": "machine learning",
    "limit": 2,
    "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})

PubMed Search¶

Search medical and life sciences literature:

# Search for medical research
pubmed_results = tu.run({
    "name": "PubMed_search_articles",
    "arguments": {
    "query": "cancer immunotherapy",
    "limit": 3
    }
})

# Search for COVID-19 research
covid_results = tu.run({
    "name": "PubMed_search_articles",
    "arguments": {
    "query": "COVID-19 vaccine efficacy",
    "limit": 2
    }
})

Semantic Scholar Search¶

AI-powered academic search with enhanced results:

# Search with AI-powered ranking
semantic_results = tu.run({
    "name": "SemanticScholar_search_papers",
    "arguments": {
    "query": "deep learning neural networks",
    "limit": 3
    }
})

OpenAlex Search¶

Comprehensive academic search with advanced filtering:

# Basic search
openalex_results = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
    "search_keywords": "artificial intelligence",
    "max_results": 3
    }
})

# Search with year filtering
recent_ai_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
    "search_keywords": "machine learning",
    "max_results": 2,
    "year_from": 2020,
    "open_access": True
    }
})

Preprint Archives¶

Search for preprints in specific fields:

# Biology preprints
biorxiv_results = tu.run({
    "name": "BioRxiv_search_preprints",
    "arguments": {
    "query": "CRISPR gene editing",
    "max_results": 2
    }
})

# Medical preprints
medrxiv_results = tu.run({
    "name": "MedRxiv_search_preprints",
    "arguments": {
    "query": "COVID-19 treatment",
    "max_results": 2
    }
})

# French research archive
hal_results = tu.run({
    "name": "HAL_search_archive",
    "arguments": {
    "query": "mathematics statistics",
    "max_results": 2
    }
})

Open Access Tools¶

Find open access articles and check access status:

# Search DOAJ for open access articles
doaj_articles = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "renewable energy",
    "max_results": 3,
    "type": "articles"
    }
})

# Search for open access journals
doaj_journals = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
    "query": "biology",
    "max_results": 2,
    "type": "journals"
    }
})

# Check open access status
oa_status = tu.run({
    "name": "Unpaywall_check_oa_status",
    "arguments": {
    "doi": "10.1038/nature12373",
    "email": "your-email@example.com"
    }
})

Advanced Usage Patterns¶

Combining Multiple Searches¶

Search across multiple databases for comprehensive results:

def comprehensive_search(query, max_results=5):
    """Search across multiple literature databases."""
    results = {}

    # Search different databases
    databases = [
    ("ArXiv", "ArXiv_search_papers"),
    ("Crossref", "Crossref_search_works"),
    ("Semantic Scholar", "SemanticScholar_search_papers"),
    ("OpenAlex", "openalex_literature_search")
    ]

    for db_name, tool_name in databases:
    try:
        result = tu.run({
    "name": tool_name,
    "arguments": {"query": query, "limit": max_results}
        })
        results[db_name] = result if isinstance(result, list) else []
    except Exception as e:
        print(f"Error searching {db_name}: {e}")
        results[db_name] = []

    return results

# Use the comprehensive search
all_results = comprehensive_search("machine learning", 3)
for db, papers in all_results.items():
    print(f"\n{db}: {len(papers)} papers found")

Filtering and Sorting¶

Different tools offer various filtering and sorting options:

# ArXiv with sorting
sorted_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
    "query": "quantum physics",
    "limit": 5,
    "sort_by": "submittedDate",
    "sort_order": "descending"
    }
})

# OpenAlex with year filtering
recent_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
    "search_keywords": "artificial intelligence",
    "max_results": 5,
    "year_from": 2022,
    "year_to": 2024,
    "open_access": True
    }
})

# Crossref with type filtering
journal_articles = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
    "query": "machine learning",
    "limit": 5,
    "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})

Error Handling¶

Always include proper error handling for robust applications:

def safe_search(tool_name, arguments):
    """Safely search with error handling."""
    try:
    result = tu.run({
        "name": tool_name,
        "arguments": arguments
    })

    if isinstance(result, list):
        return {"success": True, "data": result, "count": len(result)}
    elif isinstance(result, dict) and "error" in result:
        return {"success": False, "error": result["error"]}
    else:
        return {"success": False, "error": "Unexpected result format"}

    except Exception as e:
    return {"success": False, "error": str(e)}

# Use safe search
result = safe_search("ArXiv_search_papers", {
    "query": "machine learning",
    "limit": 3
})

if result["success"]:
    print(f"Found {result['count']} papers")
    for paper in result["data"]:
    print(f"- {paper.get('title', 'No title')}")
else:
    print(f"Search failed: {result['error']}")

Best Practices¶

Choose the Right Tool: - Use ArXiv for preprints in physics, math, CS - Use PubMed for medical/life sciences - Use Semantic Scholar for AI-powered ranking - Use OpenAlex for comprehensive academic search
Optimize Your Queries: - Use specific, relevant keywords - Combine terms with appropriate operators - Use filters to narrow results
Handle Rate Limits: - Some APIs have rate limits - Implement delays between requests if needed - Use error handling for 429 (Too Many Requests) errors
Process Results Efficiently: - Check result types before processing - Extract only needed fields - Implement pagination for large result sets
Combine Multiple Sources: - Use different tools for comprehensive coverage - Cross-reference results for validation - Merge and deduplicate results

Complete Example¶

Here’s a complete example that demonstrates searching across multiple literature databases:

#!/usr/bin/env python3
"""
Comprehensive Literature Search Example
"""

from tooluniverse import ToolUniverse
import json

def main():
    # Initialize ToolUniverse
    tu = ToolUniverse()
    tu.load_tools()

    # Define search query
    query = "machine learning deep learning"

    # Define tools to search
    search_tools = [
    {
        "name": "ArXiv_search_papers",
        "description": "ArXiv Preprints",
        "args": {"query": query, "limit": 2, "sort_by": "relevance"}
    },
    {
        "name": "Crossref_search_works",
        "description": "Crossref Articles",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "SemanticScholar_search_papers",
        "description": "Semantic Scholar",
        "args": {"query": query, "limit": 2}
    },
    {
        "name": "openalex_literature_search",
        "description": "OpenAlex",
        "args": {"search_keywords": query, "max_results": 2}
    }
    ]

    print(f"Searching for: '{query}'")
    print("=" * 50)

    all_results = []

    for tool in search_tools:
    print(f"\nSearching {tool['description']}...")

    try:
        result = tu.run({
    "name": tool["name"],
    "arguments": tool["args"]
        })

        if isinstance(result, list) and len(result) > 0:
    print(f"✅ Found {len(result)} results")
    all_results.extend(result)

    # Show first result
    first_paper = result[0]
    print(f"📄 Sample: {first_paper.get('title', 'No title')[:60]}...")
        else:
    print(f"❌ No results or error: {result}")

    except Exception as e:
        print(f"❌ Exception: {str(e)[:100]}...")

    print(f"\n📊 Total papers found: {len(all_results)}")

    # Save results to file
    with open("literature_search_results.json", "w") as f:
    json.dump(all_results, f, indent=2, ensure_ascii=False)

    print("💾 Results saved to literature_search_results.json")

if __name__ == "__main__":
    main()

This tutorial provides a comprehensive guide to using ToolUniverse’s literature search tools. For more specific examples and advanced usage patterns, refer to the individual tool documentation in the API reference.

Troubleshooting¶

Common Issues and Solutions:

API Rate Limits: - Some services have rate limits - Implement delays between requests - Use error handling for 429 errors
Missing Results: - Check query spelling and keywords - Try different search terms - Verify tool parameters
Authentication Errors: - Some tools require API keys - Check your .env file configuration - Verify API key validity
Network Issues: - Check internet connection - Implement retry logic - Handle timeout exceptions

For more help, see the troubleshooting guide.

Quick Reference¶

Common Parameters¶

Most search tools accept these common parameters:

Parameter	Type	Description	Example
query	string	Search terms	“machine learning”
limit	integer	Max results	5
max_results	integer	Max results (alternative)	5
sort_by	string	Sort order	“relevance”
sort_order	string	Sort direction	“descending”
filter	string	Result filters	“type:journal- article”
year_from	integer	Start year	2020
year_to	integer	End year	2024
open_access	boolean	Open access only	True
type	string	Content type	“articles”
email	string	Contact email (required)	“user@example. com”

Quick Usage Examples¶

ArXiv Search: .. code-block:: python

result = tu.run({
“name”: “ArXiv_search_papers”, “arguments”: { “query”: “machine learning”, “limit”: 5, “sort_by”: “relevance” }

})

Crossref Search: .. code-block:: python

result = tu.run({
“name”: “Crossref_search_works”, “arguments”: { “query”: “artificial intelligence”, “limit”: 5, “filter”: “type:journal-article” }

})

PubMed Search: .. code-block:: python

result = tu.run({
“name”: “PubMed_search_articles”, “arguments”: { “query”: “cancer research”, “limit”: 5 }

})

Semantic Scholar Search: .. code-block:: python

result = tu.run({
“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 5 }

})

OpenAlex Search: .. code-block:: python

result = tu.run({
“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 5, “year_from”: 2020, “open_access”: True }

})

Preprint Archives: .. code-block:: python

# BioRxiv (Biology) result = tu.run({

“name”: “BioRxiv_search_preprints”, “arguments”: { “query”: “CRISPR”, “max_results”: 5 }

})

# MedRxiv (Medical) result = tu.run({

“name”: “MedRxiv_search_preprints”, “arguments”: { “query”: “COVID-19”, “max_results”: 5 }

})

# HAL (French Archive) result = tu.run({

“name”: “HAL_search_archive”, “arguments”: { “query”: “mathematics”, “max_results”: 5 }

})

Open Access Tools: .. code-block:: python

# DOAJ Search result = tu.run({

“name”: “DOAJ_search_articles”, “arguments”: { “query”: “renewable energy”, “max_results”: 5, “type”: “articles” }

})

# Unpaywall Check result = tu.run({

“name”: “Unpaywall_check_oa_status”, “arguments”: { “doi”: “10.1038/nature12373”, “email”: “your-email@example.com” }

})

CORE Search¶

Search the world’s largest collection of open access research papers:

# Basic CORE search
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
    "query": "machine learning",
    "limit": 5
    }
})

# CORE search with year filter
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
    "query": "artificial intelligence",
    "limit": 3,
    "year_from": 2020,
    "year_to": 2024
    }
})

# CORE search with language filter
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
    "query": "climate change",
    "limit": 5,
    "language": "en"
    }
})

PMC Search¶

Search PubMed Central full-text biomedical literature:

# Basic PMC search
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
    "query": "cancer research",
    "limit": 5
    }
})

# PMC search with date filter
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
    "query": "COVID-19",
    "limit": 3,
    "date_from": "2020/01/01",
    "date_to": "2024/12/31"
    }
})

# PMC search with article type filter
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
    "query": "diabetes treatment",
    "limit": 5,
    "article_type": "research-article"
    }
})

Zenodo Search¶

Search Zenodo for research data, publications, and datasets:

# Basic Zenodo search
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
    "query": "machine learning",
    "max_results": 5
    }
})

# Zenodo search with community filter
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
    "query": "climate change",
    "max_results": 3,
    "community": "zenodo"
    }
})

# Zenodo search for COVID-19 datasets
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
    "query": "COVID-19 dataset",
    "max_results": 5
    }
})

Tool Selection Guide¶

Choose the right tool for your research field:

Physics/Math/CS: ArXiv
Medical/Life Sciences: PubMed, Europe PMC, PMC
General Academic: Crossref, Semantic Scholar, OpenAlex
Computer Science: DBLP
Preprints: BioRxiv, MedRxiv, HAL
Open Access: DOAJ, Unpaywall, CORE
Full-Text Biomedical: PMC
Comprehensive Open Access: CORE
Research Data & Datasets: Zenodo

Optimized Tool Examples¶

Here are examples showing the enhanced features of the optimized tools:

Europe PMC with Enhanced Data: .. code-block:: python

# Europe PMC now provides comprehensive metadata result = tu.run({

“name”: “EuropePMC_search_articles”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:
paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”DOI: {paper.get(‘doi’)}”) print(f”Citations: {paper.get(‘citations’)}”) print(f”Open Access: {paper.get(‘open_access’)}”) print(f”Keywords: {paper.get(‘keywords’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

OpenAlex with Abstract Reconstruction: .. code-block:: python

# OpenAlex now reconstructs abstracts from inverted index result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “artificial intelligence”, “max_results”: 2 }

})

if isinstance(result, list) and result:
paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)[:200]}…”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Venue: {paper.get(‘venue’)}”) print(f”Citation Count: {paper.get(‘citation_count’)}”) print(f”Keywords: {paper.get(‘keywords’)}”)

Semantic Scholar with Rate Limit Handling: .. code-block:: python

# Semantic Scholar now handles rate limits gracefully result = tu.run({

“name”: “SemanticScholar_search_papers”, “arguments”: { “query”: “deep learning”, “limit”: 2, “api_key”: “your_api_key_here” # Optional for higher limits }

})

if isinstance(result, list) and result:
paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Abstract: {paper.get(‘abstract’)}”) print(f”Journal: {paper.get(‘journal’)}”) print(f”Data Quality: {paper.get(‘data_quality’)}”)

Crossref with HTML Cleaning: .. code-block:: python

# Crossref now cleans HTML tags from abstracts result = tu.run({

“name”: “Crossref_search_works”, “arguments”: { “query”: “machine learning”, “limit”: 2 }

})

if isinstance(result, list) and result:
paper = result[0] print(f”Title: {paper.get(‘title’)}”) print(f”Clean Abstract: {paper.get(‘abstract’)}”) print(f”Authors: {paper.get(‘authors’)}”) print(f”Publisher: {paper.get(‘publisher’)}”) print(f”Article Type: {paper.get(‘article_type’)}”)

Data Quality Analysis: .. code-block:: python

def analyze_data_quality(results):
“””Analyze data quality across multiple tools.””” if not isinstance(results, list): return

total_papers = len(results) quality_stats = { ‘has_abstract’: 0, ‘has_authors’: 0, ‘has_doi’: 0, ‘has_citations’: 0, ‘has_keywords’: 0 }

for paper in results: if ‘data_quality’ in paper:

for field, available in paper[‘data_quality’].items():

if field in quality_stats and available:
quality_stats[field] += 1

print(f”Data Quality Analysis ({total_papers} papers):”) for field, count in quality_stats.items(): percentage = (count / total_papers) * 100 print(f” {field}: {count}/{total_papers} ({percentage:.1f}%)”)

# Use with any search results result = tu.run({

“name”: “openalex_literature_search”, “arguments”: { “search_keywords”: “machine learning”, “max_results”: 5 }

})

if isinstance(result, list):
analyze_data_quality(result)