Literature Search Tools Tutorial
This tutorial shows how to use the literature search tools in ToolUniverse to find academic papers, preprints, and research articles across multiple databases.
Overview
ToolUniverse provides 15 literature search tools, grouped below by the kind of source they cover. Each tool extracts detailed metadata and reports data quality indicators.
Preprint Archives:
- ArXiv - Physics, mathematics, computer science, and other fields
- BioRxiv - Biology preprints
- MedRxiv - Medical preprints
- HAL - French research archive
Academic Databases:
- Crossref - Scholarly articles with DOI metadata
- PubMed - Medical and life sciences literature
- Europe PMC - Biomedical literature
- Semantic Scholar - AI-powered academic search
- OpenAlex - Open academic graph
- DBLP - Computer science bibliography
Open Access Tools:
- DOAJ - Directory of Open Access Journals
- Unpaywall - Open access status checker
- CORE - World's largest open access research papers collection
- PMC - PubMed Central full-text biomedical literature
- Zenodo - Open research data and publications repository
Tool Overview Table
Tool Name | Database | Best For | Key Features |
---|---|---|---|
ArXiv_search_papers | ArXiv | Preprints | Physics, Math, CS, Biology |
Crossref_search_works | Crossref | Scholarly Articles | DOI metadata, Rich metadata, HTML cleaning |
PubMed_search_articles | PubMed | Medical/Life Sciences | Medical literature, MeSH keywords |
SemanticScholar_search_papers | Semantic Scholar | AI-powered Search | AI ranking, Citations, Rate limit handling |
openalex_literature_search | OpenAlex | Comprehensive Academic Search | Open access, Year filtering, Abstract reconstruction |
EuropePMC_search_articles | Europe PMC | Biomedical Literature | European research, Core+Lite modes |
DBLP_search_publications | DBLP | Computer Science | CS bibliography, Conference papers |
DOAJ_search_articles | DOAJ | Open Access | Articles & Journals, HTML cleaning |
BioRxiv_search_preprints | BioRxiv | Biology Preprints | Biology preprints, Abstracts |
MedRxiv_search_preprints | MedRxiv | Medical Preprints | Medical preprints, Abstracts |
HAL_search_archive | HAL | French Research Archive | French academic papers |
Unpaywall_check_oa_status | Unpaywall | Open Access Status | OA status checking |
Enhanced Features
All literature search tools have been optimized with the following enhancements:
Comprehensive Data Extraction:
- Author information with proper formatting
- DOI (Digital Object Identifier) extraction
- Citation counts where available
- Open access status indicators
- Keywords and subject terms
- Article type classification
- Publisher information
- Data quality indicators for transparency
Improved Error Handling:
- Enhanced rate limit handling with automatic retry
- User-friendly error messages
- Graceful handling of missing data
- Consistent fallback values
Data Quality Transparency:
- Each result includes a data_quality object
- Boolean indicators for field availability
- Clear indication of missing vs. unavailable data
- Consistent data structure across all tools
Advanced Metadata:
- HTML tag cleaning for abstracts
- Abstract reconstruction from inverted indexes (OpenAlex)
- Multi-mode API calls for comprehensive data (Europe PMC)
- Enhanced author and affiliation extraction
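As a rough illustration of how the data_quality object might be consumed, the sketch below lists the fields that a record reports as unavailable. The sample record and its has_* keys are assumptions modelled on the field names used later in this tutorial; real tool output may differ.

```python
def missing_fields(paper):
    """Return the data_quality flags that are False for one result record."""
    quality = paper.get("data_quality") or {}
    return sorted(name for name, available in quality.items() if not available)


# Hypothetical record, for illustration only (not real tool output)
sample_paper = {
    "title": "Example paper",
    "data_quality": {"has_abstract": True, "has_doi": False, "has_citations": False},
}

print(missing_fields(sample_paper))
```

A record without a data_quality object simply yields an empty list, so the helper can be applied to results from any tool.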
Getting Started
First, let's initialize ToolUniverse and load the literature search tools:
from tooluniverse import ToolUniverse

# Initialize ToolUniverse
tu = ToolUniverse()
tu.load_tools()

print("Available literature search tools:")
literature_tools = [
    "ArXiv_search_papers",
    "Crossref_search_works",
    "DBLP_search_publications",
    "PubMed_search_articles",
    "DOAJ_search_articles",
    "Unpaywall_check_oa_status",
    "BioRxiv_search_preprints",
    "MedRxiv_search_preprints",
    "HAL_search_archive",
    "SemanticScholar_search_papers",
    "openalex_literature_search",
    "EuropePMC_search_articles",
    "CORE_search_papers",
    "PMC_search_papers",
    "Zenodo_search_records"
]
for tool in literature_tools:
    print(f"- {tool}")
Basic Usage Pattern
All literature search tools follow a similar usage pattern:
# Basic search pattern
result = tu.run({
    "name": "tool_name",
    "arguments": {
        "query": "your search terms",
        "limit": 5  # number of results
    }
})

# Check if results were found
if isinstance(result, list) and len(result) > 0:
    print(f"Found {len(result)} results")
    for i, paper in enumerate(result, 1):
        print(f"{i}. {paper.get('title', 'No title')}")
        print(f"   Authors: {', '.join(paper.get('authors', [])[:3])}")
        print(f"   Year: {paper.get('year', 'Unknown')}")
        # Show data quality information
        if 'data_quality' in paper:
            quality = paper['data_quality']
            available_fields = [k for k, v in quality.items() if v]
            print(f"   Available data: {', '.join(available_fields)}")
        # Show additional metadata if available
        if paper.get('doi'):
            print(f"   DOI: {paper['doi']}")
        if paper.get('citations') or paper.get('citation_count'):
            citations = paper.get('citations') or paper.get('citation_count')
            print(f"   Citations: {citations}")
        if paper.get('open_access') is not None:
            print(f"   Open Access: {paper['open_access']}")
        print()
else:
    print(f"No results found or error: {result}")
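Results may expose the citation count under either citations or citation_count, as the pattern above assumes. A small helper, sketched here under that assumption (it is not part of ToolUniverse), keeps a genuine count of 0 distinct from a missing value, which a plain `or` chain cannot do.

```python
def get_citation_count(paper):
    """Return the citation count from whichever key is present, else None."""
    for key in ("citations", "citation_count"):
        value = paper.get(key)
        if value is not None:
            return value
    return None


# Hypothetical records, for illustration only
print(get_citation_count({"citations": 0}))
print(get_citation_count({"citation_count": 12}))
print(get_citation_count({"title": "No count reported"}))
```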
Tool-Specific Examples
ArXiv Search
Search for preprints in physics, mathematics, computer science, and other fields:
# Search for machine learning papers
arxiv_results = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
        "query": "machine learning deep learning",
        "limit": 3,
        "sort_by": "relevance"
    }
})

# Sort by submission date, newest first
recent_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
        "query": "quantum computing",
        "limit": 2,
        "sort_by": "submittedDate",
        "sort_order": "descending"
    }
})
Crossref Search
Search for scholarly articles with rich metadata:
# Basic search
crossref_results = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
        "query": "artificial intelligence",
        "limit": 3
    }
})

# Search with filters
filtered_results = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
        "query": "machine learning",
        "limit": 2,
        "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})
PubMed Search
Search medical and life sciences literature:
# Search for medical research
pubmed_results = tu.run({
    "name": "PubMed_search_articles",
    "arguments": {
        "query": "cancer immunotherapy",
        "limit": 3
    }
})

# Search for COVID-19 research
covid_results = tu.run({
    "name": "PubMed_search_articles",
    "arguments": {
        "query": "COVID-19 vaccine efficacy",
        "limit": 2
    }
})
Semantic Scholar Search
AI-powered academic search with enhanced results:
# Search with AI-powered ranking
semantic_results = tu.run({
    "name": "SemanticScholar_search_papers",
    "arguments": {
        "query": "deep learning neural networks",
        "limit": 3
    }
})
OpenAlex Search
Comprehensive academic search with advanced filtering:
# Basic search
openalex_results = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
        "search_keywords": "artificial intelligence",
        "max_results": 3
    }
})

# Search with year filtering
recent_ai_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
        "search_keywords": "machine learning",
        "max_results": 2,
        "year_from": 2020,
        "open_access": True
    }
})
Preprint Archives
Search for preprints in specific fields:
# Biology preprints
biorxiv_results = tu.run({
    "name": "BioRxiv_search_preprints",
    "arguments": {
        "query": "CRISPR gene editing",
        "max_results": 2
    }
})

# Medical preprints
medrxiv_results = tu.run({
    "name": "MedRxiv_search_preprints",
    "arguments": {
        "query": "COVID-19 treatment",
        "max_results": 2
    }
})

# French research archive
hal_results = tu.run({
    "name": "HAL_search_archive",
    "arguments": {
        "query": "mathematics statistics",
        "max_results": 2
    }
})
Open Access Tools
Find open access articles and check access status:
# Search DOAJ for open access articles
doaj_articles = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
        "query": "renewable energy",
        "max_results": 3,
        "type": "articles"
    }
})

# Search for open access journals
doaj_journals = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {
        "query": "biology",
        "max_results": 2,
        "type": "journals"
    }
})

# Check open access status
oa_status = tu.run({
    "name": "Unpaywall_check_oa_status",
    "arguments": {
        "doi": "10.1038/nature12373",
        "email": "your-email@example.com"
    }
})
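The Unpaywall example passes a bare DOI. If DOIs in your own data arrive as URLs or with a doi: prefix, a helper along these lines can reduce them to that form first. This is an illustrative sketch, not part of ToolUniverse; whether the tool also accepts prefixed forms is not stated in this tutorial. Lowercasing is safe because DOIs are case-insensitive.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip a leading resolver URL or 'doi:' prefix and lowercase the DOI."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().lower()


print(normalize_doi("https://doi.org/10.1038/NATURE12373"))
```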
Advanced Usage Patterns
Combining Multiple Searches
Search across multiple databases for comprehensive results:
def comprehensive_search(query, max_results=5):
    """Search across multiple literature databases."""
    results = {}
    # Each entry: display name, tool name, and that tool's own argument names
    databases = [
        ("ArXiv", "ArXiv_search_papers",
         {"query": query, "limit": max_results}),
        ("Crossref", "Crossref_search_works",
         {"query": query, "limit": max_results}),
        ("Semantic Scholar", "SemanticScholar_search_papers",
         {"query": query, "limit": max_results}),
        # OpenAlex takes search_keywords/max_results rather than query/limit
        ("OpenAlex", "openalex_literature_search",
         {"search_keywords": query, "max_results": max_results}),
    ]
    for db_name, tool_name, arguments in databases:
        try:
            result = tu.run({"name": tool_name, "arguments": arguments})
            results[db_name] = result if isinstance(result, list) else []
        except Exception as e:
            print(f"Error searching {db_name}: {e}")
            results[db_name] = []
    return results

# Use the comprehensive search
all_results = comprehensive_search("machine learning", 3)
for db, papers in all_results.items():
    print(f"\n{db}: {len(papers)} papers found")
Filtering and Sorting
Different tools offer various filtering and sorting options:
# ArXiv with sorting
sorted_papers = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {
        "query": "quantum physics",
        "limit": 5,
        "sort_by": "submittedDate",
        "sort_order": "descending"
    }
})

# OpenAlex with year filtering
recent_papers = tu.run({
    "name": "openalex_literature_search",
    "arguments": {
        "search_keywords": "artificial intelligence",
        "max_results": 5,
        "year_from": 2022,
        "year_to": 2024,
        "open_access": True
    }
})

# Crossref with type filtering
journal_articles = tu.run({
    "name": "Crossref_search_works",
    "arguments": {
        "query": "machine learning",
        "limit": 5,
        "filter": "type:journal-article,from-pub-date:2020-01-01"
    }
})
Error Handling
Always include proper error handling for robust applications:
def safe_search(tool_name, arguments):
    """Safely search with error handling."""
    try:
        result = tu.run({
            "name": tool_name,
            "arguments": arguments
        })
        if isinstance(result, list):
            return {"success": True, "data": result, "count": len(result)}
        elif isinstance(result, dict) and "error" in result:
            return {"success": False, "error": result["error"]}
        else:
            return {"success": False, "error": "Unexpected result format"}
    except Exception as e:
        return {"success": False, "error": str(e)}

# Use safe search
result = safe_search("ArXiv_search_papers", {
    "query": "machine learning",
    "limit": 3
})
if result["success"]:
    print(f"Found {result['count']} papers")
    for paper in result["data"]:
        print(f"- {paper.get('title', 'No title')}")
else:
    print(f"Search failed: {result['error']}")
Best Practices
Choose the Right Tool:
- Use ArXiv for preprints in physics, math, CS
- Use PubMed for medical/life sciences
- Use Semantic Scholar for AI-powered ranking
- Use OpenAlex for comprehensive academic search
Optimize Your Queries:
- Use specific, relevant keywords
- Combine terms with appropriate operators
- Use filters to narrow results
Handle Rate Limits:
- Some APIs have rate limits
- Implement delays between requests if needed
- Use error handling for 429 (Too Many Requests) errors
Process Results Efficiently:
- Check result types before processing
- Extract only needed fields
- Implement pagination for large result sets
Combine Multiple Sources:
- Use different tools for comprehensive coverage
- Cross-reference results for validation
- Merge and deduplicate results
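One possible way to merge and deduplicate results from several tools is sketched below. It assumes each record is a dict with optional doi and title keys, as in the examples in this tutorial, and keeps the first record seen for each DOI, falling back to a whitespace- and case-normalized title when no DOI is present.

```python
def deduplicate(papers):
    """Keep the first record per DOI, or per normalized title if no DOI."""
    seen = set()
    unique = []
    for paper in papers:
        doi = (paper.get("doi") or "").strip().lower()
        title = " ".join((paper.get("title") or "").lower().split())
        if not doi and not title:
            # Nothing to compare on, so keep the record as-is
            unique.append(paper)
            continue
        key = ("doi", doi) if doi else ("title", title)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


# Hypothetical merged results, for illustration only
merged = [
    {"title": "A Study", "doi": "10.1/ABC"},
    {"title": "A study (reprint)", "doi": "10.1/abc"},
    {"title": "No DOI  Paper"},
    {"title": "no doi paper"},
]
print(len(deduplicate(merged)))
```

This simple key will not match a record that has a DOI in one source against the same paper listed without a DOI in another; stricter matching would need a title comparison across both groups.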
Complete Example
Here's a complete example that demonstrates searching across multiple literature databases:
#!/usr/bin/env python3
"""
Comprehensive Literature Search Example
"""
import json

from tooluniverse import ToolUniverse


def main():
    # Initialize ToolUniverse
    tu = ToolUniverse()
    tu.load_tools()
    # Define search query
    query = "machine learning deep learning"
    # Define tools to search, each with its own argument names
    search_tools = [
        {
            "name": "ArXiv_search_papers",
            "description": "ArXiv Preprints",
            "args": {"query": query, "limit": 2, "sort_by": "relevance"}
        },
        {
            "name": "Crossref_search_works",
            "description": "Crossref Articles",
            "args": {"query": query, "limit": 2}
        },
        {
            "name": "SemanticScholar_search_papers",
            "description": "Semantic Scholar",
            "args": {"query": query, "limit": 2}
        },
        {
            "name": "openalex_literature_search",
            "description": "OpenAlex",
            "args": {"search_keywords": query, "max_results": 2}
        }
    ]
    print(f"Searching for: '{query}'")
    print("=" * 50)
    all_results = []
    for tool in search_tools:
        print(f"\nSearching {tool['description']}...")
        try:
            result = tu.run({
                "name": tool["name"],
                "arguments": tool["args"]
            })
            if isinstance(result, list) and len(result) > 0:
                print(f"Found {len(result)} results")
                all_results.extend(result)
                # Show first result (title may be missing or None)
                first_paper = result[0]
                print(f"Sample: {(first_paper.get('title') or 'No title')[:60]}...")
            else:
                print(f"No results or error: {result}")
        except Exception as e:
            print(f"Exception: {str(e)[:100]}...")
    print(f"\nTotal papers found: {len(all_results)}")
    # Save results to file; UTF-8 is needed because ensure_ascii=False
    with open("literature_search_results.json", "w", encoding="utf-8") as f:
        json.dump(all_results, f, indent=2, ensure_ascii=False)
    print("Results saved to literature_search_results.json")


if __name__ == "__main__":
    main()
This tutorial provides a comprehensive guide to using ToolUniverse's literature search tools. For more specific examples and advanced usage patterns, refer to the individual tool documentation in the API reference.
Troubleshooting
Common Issues and Solutions:
API Rate Limits:
- Some services have rate limits
- Implement delays between requests
- Use error handling for 429 errors
Missing Results:
- Check query spelling and keywords
- Try different search terms
- Verify tool parameters
Authentication Errors:
- Some tools require API keys
- Check your .env file configuration
- Verify API key validity
Network Issues:
- Check internet connection
- Implement retry logic
- Handle timeout exceptions
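The delay-and-retry advice above can be sketched as a small wrapper with exponential backoff. The function name and parameters are hypothetical, and the wrapper takes the callable to run (for example tu.run) as an argument so that it stays independent of ToolUniverse. As a simplification, it treats both exceptions and any dict containing an "error" key as retryable; real code would likely retry only transient failures such as HTTP 429 or timeouts.

```python
import time


def run_with_retry(run, request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run(request), waiting base_delay * 2**n between failed attempts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            result = run(request)
        except Exception as exc:
            last_error = exc
        else:
            if not (isinstance(result, dict) and "error" in result):
                return result
            last_error = RuntimeError(str(result["error"]))
        # Back off before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise last_error


# Usage sketch (assumes `tu` was initialized as shown in Getting Started):
# papers = run_with_retry(tu.run, {
#     "name": "SemanticScholar_search_papers",
#     "arguments": {"query": "deep learning", "limit": 3},
# })
```

Passing sleep as a parameter makes the wrapper easy to exercise in tests without real delays.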
For more help, see the troubleshooting guide.
Quick Reference
Common Parameters
Most search tools accept a subset of these common parameters; names vary by tool (for example, limit versus max_results):
Parameter | Type | Description | Example |
---|---|---|---|
query | string | Search terms | "machine learning" |
limit | integer | Max results | 5 |
max_results | integer | Max results (alternative to limit) | 5 |
sort_by | string | Sort field | "relevance" |
sort_order | string | Sort direction | "descending" |
filter | string | Result filters | "type:journal-article" |
year_from | integer | Start year | 2020 |
year_to | integer | End year | 2024 |
open_access | boolean | Open access only | True |
type | string | Content type | "articles" |
email | string | Contact email (required) | "user@example.com" |
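Because the examples in this tutorial use different argument names for the same idea (query versus search_keywords, limit versus max_results), a small lookup can build the right dictionary per tool. The sketch below is a hypothetical helper, not a ToolUniverse API; its mappings are taken only from the examples on this page and should be checked against each tool's own documentation.

```python
# Tools whose examples in this tutorial use "max_results" instead of "limit"
MAX_RESULTS_TOOLS = {
    "openalex_literature_search",
    "BioRxiv_search_preprints",
    "MedRxiv_search_preprints",
    "HAL_search_archive",
    "DOAJ_search_articles",
    "Zenodo_search_records",
}

# Tools whose examples use a different key for the search terms
QUERY_KEY_OVERRIDES = {"openalex_literature_search": "search_keywords"}


def build_arguments(tool_name, query, count):
    """Return an arguments dict using the key names shown for tool_name."""
    query_key = QUERY_KEY_OVERRIDES.get(tool_name, "query")
    count_key = "max_results" if tool_name in MAX_RESULTS_TOOLS else "limit"
    return {query_key: query, count_key: count}


print(build_arguments("openalex_literature_search", "machine learning", 3))
print(build_arguments("ArXiv_search_papers", "machine learning", 3))
```

Unpaywall_check_oa_status is not covered here, since it takes doi and email rather than a search query.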
Quick Usage Examples
ArXiv Search:
result = tu.run({
    "name": "ArXiv_search_papers",
    "arguments": {"query": "machine learning", "limit": 5, "sort_by": "relevance"}
})
Crossref Search:
result = tu.run({
    "name": "Crossref_search_works",
    "arguments": {"query": "artificial intelligence", "limit": 5, "filter": "type:journal-article"}
})
PubMed Search:
result = tu.run({
    "name": "PubMed_search_articles",
    "arguments": {"query": "cancer research", "limit": 5}
})
Semantic Scholar Search:
result = tu.run({
    "name": "SemanticScholar_search_papers",
    "arguments": {"query": "deep learning", "limit": 5}
})
OpenAlex Search:
result = tu.run({
    "name": "openalex_literature_search",
    "arguments": {"search_keywords": "artificial intelligence", "max_results": 5, "year_from": 2020, "open_access": True}
})
Preprint Archives:
# BioRxiv (Biology)
result = tu.run({
    "name": "BioRxiv_search_preprints",
    "arguments": {"query": "CRISPR", "max_results": 5}
})
# MedRxiv (Medical)
result = tu.run({
    "name": "MedRxiv_search_preprints",
    "arguments": {"query": "COVID-19", "max_results": 5}
})
# HAL (French Archive)
result = tu.run({
    "name": "HAL_search_archive",
    "arguments": {"query": "mathematics", "max_results": 5}
})
Open Access Tools:
# DOAJ Search
result = tu.run({
    "name": "DOAJ_search_articles",
    "arguments": {"query": "renewable energy", "max_results": 5, "type": "articles"}
})
# Unpaywall Check
result = tu.run({
    "name": "Unpaywall_check_oa_status",
    "arguments": {"doi": "10.1038/nature12373", "email": "your-email@example.com"}
})
CORE Search
Search the world's largest collection of open access research papers:
# Basic CORE search
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
        "query": "machine learning",
        "limit": 5
    }
})

# CORE search with year filter
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
        "query": "artificial intelligence",
        "limit": 3,
        "year_from": 2020,
        "year_to": 2024
    }
})

# CORE search with language filter
result = tu.run({
    "name": "CORE_search_papers",
    "arguments": {
        "query": "climate change",
        "limit": 5,
        "language": "en"
    }
})
PMC Search
Search PubMed Central full-text biomedical literature:
# Basic PMC search
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
        "query": "cancer research",
        "limit": 5
    }
})

# PMC search with date filter
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
        "query": "COVID-19",
        "limit": 3,
        "date_from": "2020/01/01",
        "date_to": "2024/12/31"
    }
})

# PMC search with article type filter
result = tu.run({
    "name": "PMC_search_papers",
    "arguments": {
        "query": "diabetes treatment",
        "limit": 5,
        "article_type": "research-article"
    }
})
Zenodo Search
Search Zenodo for research data, publications, and datasets:
# Basic Zenodo search
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
        "query": "machine learning",
        "max_results": 5
    }
})

# Zenodo search with community filter
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
        "query": "climate change",
        "max_results": 3,
        "community": "zenodo"
    }
})

# Zenodo search for COVID-19 datasets
result = tu.run({
    "name": "Zenodo_search_records",
    "arguments": {
        "query": "COVID-19 dataset",
        "max_results": 5
    }
})
Tool Selection Guide
Choose the right tool for your research field:
Physics/Math/CS: ArXiv
Medical/Life Sciences: PubMed, Europe PMC, PMC
General Academic: Crossref, Semantic Scholar, OpenAlex
Computer Science: DBLP
Preprints: BioRxiv, MedRxiv, HAL
Open Access: DOAJ, Unpaywall, CORE
Full-Text Biomedical: PMC
Comprehensive Open Access: CORE
Research Data & Datasets: Zenodo
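If the selection needs to happen in code, the guide above could be captured as a plain lookup table. The sketch below is one hypothetical way to do that; the field labels and the tools_for helper are illustrative, and the tool names are the ones listed in Getting Started.

```python
TOOLS_BY_FIELD = {
    "physics/math/cs": ["ArXiv_search_papers"],
    "medical/life sciences": [
        "PubMed_search_articles",
        "EuropePMC_search_articles",
        "PMC_search_papers",
    ],
    "general academic": [
        "Crossref_search_works",
        "SemanticScholar_search_papers",
        "openalex_literature_search",
    ],
    "computer science": ["DBLP_search_publications"],
    "preprints": [
        "BioRxiv_search_preprints",
        "MedRxiv_search_preprints",
        "HAL_search_archive",
    ],
    "open access": [
        "DOAJ_search_articles",
        "Unpaywall_check_oa_status",
        "CORE_search_papers",
    ],
    "research data & datasets": ["Zenodo_search_records"],
}


def tools_for(field):
    """Return the suggested tool names for a field label, or an empty list."""
    return TOOLS_BY_FIELD.get(field.strip().lower(), [])


print(tools_for("Computer Science"))
```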
Optimized Tool Examples
Here are examples showing the enhanced features of the optimized tools:
Europe PMC with Enhanced Data:
# Europe PMC now provides comprehensive metadata
result = tu.run({
    "name": "EuropePMC_search_articles",
    "arguments": {"query": "machine learning", "limit": 2}
})
if isinstance(result, list) and result:
    paper = result[0]
    print(f"Title: {paper.get('title')}")
    print(f"Authors: {paper.get('authors')}")
    print(f"Journal: {paper.get('journal')}")
    print(f"DOI: {paper.get('doi')}")
    print(f"Citations: {paper.get('citations')}")
    print(f"Open Access: {paper.get('open_access')}")
    print(f"Keywords: {paper.get('keywords')}")
    print(f"Data Quality: {paper.get('data_quality')}")
OpenAlex with Abstract Reconstruction:
# OpenAlex now reconstructs abstracts from inverted index
result = tu.run({
    "name": "openalex_literature_search",
    "arguments": {"search_keywords": "artificial intelligence", "max_results": 2}
})
if isinstance(result, list) and result:
    paper = result[0]
    # Guard against a missing abstract before slicing
    abstract = paper.get('abstract') or ''
    print(f"Title: {paper.get('title')}")
    print(f"Abstract: {abstract[:200]}...")
    print(f"Authors: {paper.get('authors')}")
    print(f"Venue: {paper.get('venue')}")
    print(f"Citation Count: {paper.get('citation_count')}")
    print(f"Keywords: {paper.get('keywords')}")
Semantic Scholar with Rate Limit Handling:
# Semantic Scholar now handles rate limits gracefully
result = tu.run({
    "name": "SemanticScholar_search_papers",
    "arguments": {
        "query": "deep learning",
        "limit": 2,
        "api_key": "your_api_key_here"  # Optional for higher limits
    }
})
if isinstance(result, list) and result:
    paper = result[0]
    print(f"Title: {paper.get('title')}")
    print(f"Abstract: {paper.get('abstract')}")
    print(f"Journal: {paper.get('journal')}")
    print(f"Data Quality: {paper.get('data_quality')}")
Crossref with HTML Cleaning:
# Crossref now cleans HTML tags from abstracts
result = tu.run({
    "name": "Crossref_search_works",
    "arguments": {"query": "machine learning", "limit": 2}
})
if isinstance(result, list) and result:
    paper = result[0]
    print(f"Title: {paper.get('title')}")
    print(f"Clean Abstract: {paper.get('abstract')}")
    print(f"Authors: {paper.get('authors')}")
    print(f"Publisher: {paper.get('publisher')}")
    print(f"Article Type: {paper.get('article_type')}")
Data Quality Analysis:
def analyze_data_quality(results):
    """Analyze data quality across multiple tools."""
    # Skip non-list results and empty lists (avoids division by zero below)
    if not isinstance(results, list) or not results:
        return
    total_papers = len(results)
    quality_stats = {
        "has_abstract": 0,
        "has_authors": 0,
        "has_doi": 0,
        "has_citations": 0,
        "has_keywords": 0
    }
    for paper in results:
        if "data_quality" in paper:
            for field, available in paper["data_quality"].items():
                if field in quality_stats and available:
                    quality_stats[field] += 1
    print(f"Data Quality Analysis ({total_papers} papers):")
    for field, count in quality_stats.items():
        percentage = (count / total_papers) * 100
        print(f"  {field}: {count}/{total_papers} ({percentage:.1f}%)")

# Use with any search results
result = tu.run({
    "name": "openalex_literature_search",
    "arguments": {"search_keywords": "machine learning", "max_results": 5}
})
if isinstance(result, list):
    analyze_data_quality(result)