Text & Data Mining

What Is Text and Data Mining?

In text and data mining, large collections of texts are systematically searched and analyzed to identify previously unknown, overlapping patterns and relations (dependency chains), for example in computational linguistics, biomedicine, or bioinformatics.

What Is the Problem with This?

This text and/or data extraction is based on the systematic duplication and storage of content that may be protected by copyright. For this reason, providers usually classify this practice as impermissible. Publishers monitor access frequency and will block access for individual IP addresses or even the entire UZH IP range in such cases. This applies even if the provider does not explicitly publish download limits on its website or in its contract terms.

What Are You Allowed to Do?

Text and data mining (TDM) often requires additional license-based access via an application programming interface (API). APIs provide TDM-capable infrastructures and machine-readable data formats. In addition to openly available data services, some APIs can be used under the UZH-ZB licenses, sometimes with an individual API key.

Please review the list of APIs below, along with the corresponding legal and technical usage conditions, before starting any project. If a platform is not listed or if you are unsure whether access is permitted, please contact us early at emedia@ub.uzh.ch.

Cambridge University Press

License

✅ Included in the UZH-ZB license

✅ Non‑commercial research, teaching, and learning purposes only

✅ Full texts accessible within existing subscriptions 

Content & formats 

  • Cambridge Core full texts and metadata from journals, e-books, and most reference works (coverage may depend on the specific titles for content licensed via Zentralbibliothek)
  • No REST API is provided; access is via the browser
  • Formats: HTML, PDF (XML upon request)

Limitations

❌ Bulk downloading is not permitted and is monitored.

⚠️ Excessive downloads may result in blocking access for all users.

⚠️ For bulk TDM projects, see below.

Documentation

Text and Data Mining Policy and FAQ

How to request access 

No API key is required. Contact e-medien@zb.uzh.ch

Large-scale or bulk TDM projects (XML feed) require prior permission from the publisher. Contact e-medien@zb.uzh.ch with the following details:

  • Scope of content required, e.g., journals, books, collections
  • Approximate volume and timeline
  • Preferred file format, e.g., XML and/or PDF
  • Whether you have an FTP server available 

EBSCO Databases

License

❌ As an aggregator, EBSCO integrates content from multiple publishers. For copyright reasons, text and data mining (TDM) is not permitted. 

✅ The API is intended for the integration of search, discovery, and bibliographic metadata functions from licensed EBSCOhost databases into local research and teaching applications.

Content & formats 

  • Bibliographic metadata (title, authors, abstracts, subject terms), result lists  
  • Depending on the license, access to full texts  
  • Output formats: XML  
  • Access via REST or SOAP

Limitations

This API is not identical to the EDS API (EBSCO Discovery Service).

API usage requires a local technical infrastructure for deploying and using the EBSCO Integration Toolkit (EIT). 

Not all EBSCOhost databases are API‑enabled (see list)

Redistribution and storage of full texts are restricted and subject to copyright conditions (see copyright notice in the respective full text). 

Bulk downloads, analysis of large full‑text corpora, and training of AI models are not permitted. 

Documentation

EBSCOhost API (EBSCO Connect) 
Making Requests with REST 
Making Requests with SOAP 

How to request access 

Requires an EBSCOhost / EIT profile. Please contact your library: emedia@ub.uzh.ch / e‑medien@zb.uzh.ch

EMBASE (Elsevier)

License

❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited.

Factiva (DowJones)

License

❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. 

Limitations

⚠️ Excessive downloads may result in blocking access for all users. Dow Jones does not publish numeric limits. Only download as many articles as you can read without automated tools. 

IEEE Xplore

License

✅ Included in the UZH-ZB license. Depending on TDM volume, costs may occur.

✅ Non-commercial research, teaching, and learning purposes only.

⚠️ Using AI with licensed full texts (Journals)

Content & formats 

  • Metadata Search REST API with abstracts (XML, JSON)  
  • Open Access API: OA full-text articles 
  • Full‑Text Access API: licensed full texts (PDF, XML) 
  • DOI Lookup API: metadata queries 
  • Dynamic query tool for basic searches; software development kits for PHP, Python 3, Python 2, and Java

Limitations

DOI Lookup API: max. 25 DOIs/query; no numerical rate limit, typical API throttling applies
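Because the DOI Lookup API accepts at most 25 DOIs per query, a longer DOI list has to be split into batches before any request is sent. The following is a minimal sketch of that step only. The function name `batch_dois` and the sample DOIs are hypothetical placeholders, and the actual request is left out because it depends on the IEEE SDK and your API key.

```python
from typing import Iterator, List

MAX_DOIS_PER_QUERY = 25  # limit stated above for the DOI Lookup API


def batch_dois(dois: List[str], size: int = MAX_DOIS_PER_QUERY) -> Iterator[List[str]]:
    """Yield consecutive slices of `dois`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(dois), size):
        yield dois[start:start + size]


# Hypothetical placeholder DOIs, for illustration only.
sample = [f"10.0000/example.{i}" for i in range(60)]
batches = list(batch_dois(sample))
print([len(b) for b in batches])  # prints [25, 25, 10]
```

Each batch can then be passed to one DOI Lookup query, which keeps every request within the documented limit.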

Documentation

Available APIs & Use Cases 
Interactive Documentation 

How to request access 

API key required on registration. Requests must be made to emedia@ub.uzh.ch. Use is subject to acceptance of the terms of use. Please submit a project description and the planned data‑mining scope for provider approval. 

How To Get Started

IOP

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only.

⚠️ Using AI with licensed full texts (Journals)

Content & formats 

  • Direct requests using a DOI list: metadata (XML) are free of charge; full texts (XML: for more recent articles, excluding the Conference Series; PDF: available for most articles) are fee‑based. 
  • Data delivery via sFTP.

Limitations

Systematic downloading (scraping) will be blocked.

Documentation

TDM Policy 

How to request access 

Requests must be made to IOP with project details and required formats (PDF/XML). Please provide: name, email address, licensing institution (UZH), and planned data-mining scope (list of DOIs, date ranges per journal). Contact: contentsupport@ioppublishing.org (Cc: emedia@ub.uzh.ch).

LexisNexis API Web Services

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only 

Content & formats 

REST API for LexisNexis full texts and alerts (based on a saved search, a topic, a publication or a regulatory category).

Formats: JSON (data exchange), XML 

Limitations

12'000 search queries; 600'000 documents/24h

No bulk download

Documentation

Good Python skills are required. The provider does not supply documentation. A Jupyter Notebook developed by UB Economics is available upon request.

How to request access 

⚠️ To obtain API access, a usage agreement must be signed with the University Library and the Zentralbibliothek Zürich. Please send your project description to betriebswirtschaft@ub.uzh.ch

Oxford University Press

License

Content to follow

Content & formats 

Content to follow

Limitations

Content to follow

Documentation

Content to follow

How to request access 

Content to follow

Reaxys (Elsevier)

License

❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. 

Science Direct API (Elsevier)

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions.

⚠️ Using AI with licensed full texts  

Content & formats 

REST APIs:

  • ScienceDirect Search v2 for metadata search (JSON/XML) 
  • Article Retrieval API for detailed metadata and full texts of articles or book chapters (structured XML, plain text), where access rights permit 
  • ❌ Using the EMBASE, Reaxys and SciVal APIs requires an additional paid license. Scopus API: see below 

Fetch API: ScienceDirect Journals Data for DOI lookup

Limitations

Quota limits reset every 7 days (limits differ by API; see the response headers for details)

TDM permitted only via the API (no web scraping) 
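The quota information mentioned above is returned in the headers of each API response. The sketch below assumes the header names `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset`, with the reset time given as a Unix timestamp in seconds. These names are not stated on this page, so verify them in the Elsevier Developer Portal before relying on them. The `parse_quota` helper and the sample values are hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, NamedTuple, Optional


class Quota(NamedTuple):
    limit: Optional[int]
    remaining: Optional[int]
    reset_at: Optional[datetime]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; return None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_quota(headers: Mapping[str, str]) -> Quota:
    """Extract quota information from response headers (header names are assumed)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    reset = _to_int(lowered.get("x-ratelimit-reset"))
    reset_at = (
        datetime.fromtimestamp(reset, tz=timezone.utc) if reset is not None else None
    )
    return Quota(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset_at=reset_at,
    )


# Made-up header values, for illustration only.
example = {
    "X-RateLimit-Limit": "20000",
    "X-RateLimit-Remaining": "19987",
    "X-RateLimit-Reset": "1700000000",
}
quota = parse_quota(example)
print(quota.remaining, quota.reset_at)
```

A job could call such a helper after every response and pause once `remaining` approaches zero, rather than running into the quota before the weekly reset.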

Documentation

ScienceDirect Search v2 API 

Article Retrieval 

ScienceDirect Journals Data 

API overview

Software development kit for Elsevier Developers (GitHub) 

How to request access 

An account in the Elsevier Developer Portal and an API key are required, as well as an X‑ELS‑Insttoken for off‑campus access (VPN and EZproxy are not supported). (How to Get Started)

Please contact your library: emedia@ub.uzh.ch / e-medien@zb.uzh.ch

SciVal (Elsevier)

License

❌ UZH‑ZB does not have access to either the API or the user interface. 

Scopus (Elsevier)

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only.  

Content & formats 

REST API for metadata and abstracts, with full texts where access rights permit

Limitations

Quota limits reset every 7 days (limits differ by API; see the response headers for details)

TDM permitted only via the API (no web scraping) 

Documentation

Scopus Search API 

Available APIs 

Software development kit for Elsevier Developers (GitHub) 

How to request access 

An account in the Elsevier Developer Portal and an API key are required, as well as an X‑ELS‑Insttoken for off‑campus access (VPN and EZproxy are not supported). (How to Get Started)

Please contact your library: emedia@ub.uzh.ch / e-medien@zb.uzh.ch 

Springer Nature

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions.

⚠️ Using AI with licensed full texts (E-books)

Content & formats 

  • Meta API: metadata and abstracts (journal articles, e‑books, protocols), free of charge 
  • Open Access API: Open‑access full texts, free of charge 
  • Full Text API: Licensed full‑text corpus for TDM projects (JATS XML), fee‑based

Limitations

1 query/sec without API key; 150 queries/min with API key. API account limits: Basic = 100 records, Premium = 500 records per pagination cycle
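These limits translate into a minimum spacing between requests: 1 second without an API key, and 60 / 150 = 0.4 seconds with one. The sketch below shows one way to enforce that spacing on the client side. The `Pacer` class and `min_interval` function are hypothetical helpers, not part of any Springer Nature client library, and the request itself is omitted.

```python
import time
from typing import Callable, Optional


def min_interval(has_api_key: bool) -> float:
    """Seconds to wait between queries under the limits stated above."""
    # 150 queries/min with a key -> 60 / 150 = 0.4 s; 1 query/sec without.
    return 60.0 / 150.0 if has_api_key else 1.0


class Pacer:
    """Blocks just long enough to keep calls at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep if needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


pacer = Pacer(min_interval(has_api_key=True))
for page in range(3):
    pacer.wait()
    # A real request to the API would go here (omitted in this sketch).
```

The clock and sleep functions are injectable so that the pacing logic can be checked without actually waiting.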

Documentation

Springer Nature TDM Policy  

Getting API Access 

API-Client for TDM available for installation (full-featured Python client library) 

Python Wrapper covering all APIs (with demo and example code) 

API subscription plans 

How to request access 

API key required 

Swissdox

License

❌ API usage is not included in the UZH-ZB license. See Specialized Databases (Law). Two versions are available: Swissdox Essentials (with legal restrictions) and Swissdox Professional (more extensive search capabilities).

✅ For bulk downloads or TDM projects, the Swissdox@LiRI API developed by the Linguistic Research Infrastructure (LiRI) must be used. Non‑commercial research only. No redistribution to third parties. Local hosting of the data is required.

Content & formats 

Swissdox@LiRI API provides query‑specific full texts subject to copyright restrictions. Most articles are in German and French, with fewer in Italian, Romansh, and English, primarily from the last 25 years (media coverage). Queries are submitted in YAML format; results are made available via a download link. 

Limitations

Larger research projects require sufficient computing capacity, because query runtime scales with the volume of data.

Documentation

LiRI info page (LiRI) 

API Swissdox@LiRI 

API-Wiki 

Questions regarding queries should be directed to the Swissdox@LiRI platform.

How to request access 

Project registration required: see How to Get Started. Please observe the terms of use. 

Swissdox@LiRI: Login is available only to members of supporting institutions or via a project voucher.

Swissdox Professional (UZH-ZB license): temporary access (5 days) available upon request (e-medien@zb.uzh.ch).

Web of Science Expanded (Clarivate)

License

❌ API usage is not included in the UZH-ZB license. 

Content & formats 

Full bibliographic datasets in JSON from the Web of Science Core Collection with times-cited counts of Web of Science documents. Suitable for TDM projects. 

Limitations

5 queries/sec, 5’000/24h

Documentation

API Expanded

Available APIs

How to request access 

Access requires a Developer Portal account and an API key with UZH email address (How to get started). Alternatively, you can log into your WoS account (login).  

Register the application, then request an API key. Select and subscribe to the desired API plan.

⚠️ You must accept the Terms of Use and Product / Service Terms. Access credentials may require administrative approval.  

Web of Science Starter API (Clarivate)

License

✅ Included in the UZH-ZB license

✅ Non-commercial research, teaching, and learning purposes only. 

Content & formats 

Does not provide complete bibliographic records. Intended for metadata checks, basic searches and validation. 

Queryable metadata includes DOI, author names, source title / journal title, basic publication information (year, volume, issue, pages), abstract (available in many but not all cases), ISSN, and ISBN. 

Output format: JSON

Limitations

5 queries/sec, 5’000/24h
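A quick calculation shows which of the two limits matters for planning. The sketch below assumes that every request counts as one query against both limits, which is an assumption to confirm in the Starter API documentation. The function name `plan_run` and the example job size are hypothetical.

```python
QUERIES_PER_SECOND = 5   # per-second limit stated above
QUERIES_PER_DAY = 5_000  # per-24h limit stated above


def plan_run(total_queries: int) -> dict:
    """Estimate the minimum duration of a job under both limits."""
    if total_queries < 0:
        raise ValueError("total_queries must not be negative")
    # Integer ceiling division: number of 24h windows needed.
    days = (total_queries + QUERIES_PER_DAY - 1) // QUERIES_PER_DAY
    # Shortest time in which one full daily allowance can be used up.
    seconds_per_full_day = QUERIES_PER_DAY / QUERIES_PER_SECOND
    return {"days": days, "seconds_per_full_day": seconds_per_full_day}


plan = plan_run(12_000)
print(plan)  # prints {'days': 3, 'seconds_per_full_day': 1000.0}
```

At 5 queries per second, the daily allowance of 5'000 queries can be used up in about 17 minutes, so for larger jobs the 24-hour limit is the binding constraint.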

Documentation

Starter API

Available APIs

How to request access 

Access requires a Developer Portal account and an API key with UZH email address (How to get started). Alternatively, you can log into your WoS account (login).  

Register the application, then request an API key. Select and subscribe to the desired API plan.

⚠️ You must accept the Terms of Use and Product / Service Terms. Access credentials may require administrative approval.

Wiley Cochrane Library

License

❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. 

Wiley Online Library

License

✅ Included in the UZH-ZB license. Non-journal content (e-books and reference works) is not included.

✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions.

⚠️ Using AI with licensed full texts (Journals)

Content & formats 

  • Journal PDF full texts 
  • DOI‑based queries 
  • PDF downloads for individual or multiple articles (bulk) 
  • TDM API endpoint with optional Python client

Limitations

3 queries/sec  

Non-journal TDM projects require a paid TDM agreement (XML feed) (see below) 
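For journal articles, a DOI-based request within the 3 queries/sec limit could be prepared roughly as follows. The base URL `https://api.wiley.com/onlinelibrary/tdm/v1/articles/` and the `Wiley-TDM-Client-Token` header name are not stated on this page. Both are assumptions that should be confirmed against Wiley's API documentation, and the DOI and token values are placeholders.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Assumed base URL; confirm against Wiley's TDM documentation.
WILEY_TDM_BASE = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"
MIN_DELAY_SECONDS = 1.0 / 3.0  # spacing implied by the 3 queries/sec limit


def build_request(doi: str, token: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one DOI-based full-text request."""
    # The slash inside a DOI is percent-encoded so it stays one path segment.
    url = WILEY_TDM_BASE + quote(doi, safe="")
    headers = {"Wiley-TDM-Client-Token": token}  # assumed header name
    return url, headers


# Placeholder DOI and token, for illustration only.
url, headers = build_request("10.1000/example.doi", "YOUR-TOKEN")
print(url)
```

The resulting URL and headers can be handed to any HTTP client, with a pause of at least `MIN_DELAY_SECONDS` between consecutive requests.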

Documentation

Wiley TDM Policy 

API GitHub Documentation

Python Client 

How to request access 

  • A Crossref TDM token is required for text and data mining. Text and data mining is governed by the UZH‑ZB consortial framework agreement and follows the Wiley Text and Data Mining Agreement. Accepting the Wiley click‑through license is nevertheless required in order to obtain an API token (How To Get Started).
