Text & Data Mining
What Is Text and Data Mining?
In text and data mining, large collections of texts are systematically searched and analyzed to identify previously unknown, overlapping patterns and relations (dependency chains), for example in computational linguistics, biomedicine, or bioinformatics.
What Is the Problem with This?
This text and/or data extraction is based on the systematic duplication and storage of content that may be protected by copyright. For this reason, providers usually classify this practice as impermissible. Publishers monitor access frequency and will block access for individual IP addresses or even the entire UZH IP range in such cases. This applies even if the provider does not explicitly publish download limits on its website or in its contract terms.
What Are You Allowed to Do?
Text and data mining (TDM) often requires additional license-based access via an application programming interface (API). APIs provide TDM-capable infrastructures and machine-readable data formats. In addition to openly available data services, some can be used under the UZH-ZB licenses, sometimes with an individual API key.
Please review the list of APIs below, along with the corresponding legal and technical usage conditions, before starting any project. If a platform is not listed or if you are unsure whether access is permitted, please contact us early at emedia@ub.uzh.ch.
Cambridge University Press
| License |
✅ Included in the UZH-ZB license ✅ Non‑commercial research, teaching, and learning purposes only ✅ Full texts accessible within existing subscriptions |
|
Content & formats |
|
| Limitations |
❌ Bulk downloading is not permitted and is monitored. ⚠️ Excessive downloads may result in access being blocked for all users. ⚠️ For bulk TDM projects, see below. |
| Documentation | Text and Data Mining Policy and FAQ |
|
How to request access |
No API key is required. Contact e-medien@zb.uzh.ch. Large-scale or bulk TDM projects (XML feed) require prior permission from the publisher. Contact e-medien@zb.uzh.ch with details to:
|
EBSCO Databases
| License |
❌ As an aggregator, EBSCO integrates content from multiple publishers. For copyright reasons, text and data mining (TDM) is not permitted. ✅ The API is intended for the integration of search, discovery, and bibliographic metadata functions from licensed EBSCOhost databases into local research and teaching applications. |
|
Content & formats |
|
| Limitations |
This API is not identical to the EDS API (EBSCO Discovery Service). API usage requires a local technical infrastructure for deploying and using the EBSCO Integration Toolkit (EIT). Not all EBSCOhost databases are API‑enabled (see list). Redistribution and storage of full texts are restricted and subject to copyright conditions (see the copyright notice in the respective full text). Bulk downloads, analysis of large full‑text corpora, and training of AI models are not permitted. |
| Documentation |
EBSCOhost API (EBSCO Connect) |
|
How to request access |
Requires an EBSCOhost / EIT profile. Please contact your library: emedia@ub.uzh.ch / e‑medien@zb.uzh.ch |
EMBASE (Elsevier)
| License |
❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. |
Factiva (Dow Jones)
| License |
❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. |
| Limitations |
⚠️ Excessive downloads may result in access being blocked for all users. Dow Jones does not publish numeric limits. Download only as many articles as you could read without automated tools. |
IEEE Xplore
| License |
✅ Included in the UZH-ZB license. Depending on the TDM volume, costs may be incurred. ✅ Non-commercial research, teaching, and learning purposes only. |
|
Content & formats |
|
| Limitations |
DOI Lookup API: max. 25 DOIs/query; no numerical rate limit, but typical API throttling applies. |
| Documentation | |
|
How to request access |
API key required on registration. Requests must be made to emedia@ub.uzh.ch. Use is subject to acceptance of the terms of use. Please submit a project description and the planned data‑mining scope for provider approval. |
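The 25-DOI limit per query listed under Limitations means that longer DOI lists have to be split before they are submitted. The following is a minimal sketch of that batching step only; the function name `batch_dois` and the placeholder DOIs are illustrative assumptions, and no call to the IEEE Xplore API itself is shown.

```python
from typing import Iterator, List

# Limit stated above for the DOI Lookup API.
MAX_DOIS_PER_QUERY = 25


def batch_dois(dois: List[str], size: int = MAX_DOIS_PER_QUERY) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` DOIs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(dois), size):
        yield dois[start:start + size]


if __name__ == "__main__":
    # Placeholder DOIs for illustration only; they do not refer to real articles.
    sample = [f"10.0000/example.{i}" for i in range(60)]
    for batch in batch_dois(sample):
        print(len(batch))
```

Each yielded batch can then be sent as one query, with pauses between queries to stay clear of throttling.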
IOP
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only. |
|
Content & formats |
|
| Limitations |
Systematic downloading (scraping) will be blocked. |
| Documentation | |
|
How to request access |
Requests must be made to IOP with project details and required formats (PDF/XML). Please provide: name, email address, licensing institution (UZH), and planned data-mining scope (list of DOIs, date ranges per journal). Contact: contentsupport@ioppublishing.org (cc: emedia@ub.uzh.ch). |
LexisNexis API Web Services
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only |
|
Content & formats |
REST API for LexisNexis full texts and alerts (based on a saved search, a topic, a publication or a regulatory category). Formats: JSON (data exchange), XML |
| Limitations |
12'000 search queries; 600'000 documents/24h. No bulk download. |
| Documentation |
Good Python skills are required. The provider does not supply documentation. A Jupyter Notebook developed by UB Economics is available upon request. |
|
How to request access |
⚠️ To obtain API access, a usage agreement must be signed with the University Library and the Zentralbibliothek Zürich. Please send your project description to betriebswirtschaft@ub.uzh.ch. |
Oxford University Press
| License |
Content to follow |
|
Content & formats |
Content to follow |
| Limitations |
Content to follow |
| Documentation |
Content to follow |
|
How to request access |
Content to follow |
Reaxys (Elsevier)
| License |
❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. |
ScienceDirect API (Elsevier)
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions. |
|
Content & formats |
REST-APIs:
Fetch-API: ScienceDirect Journals Data for DOI lookup |
| Limitations |
Quota limits reset every 7 days (limits differ by API; see the response headers for details). TDM is permitted only via the API (no web scraping). |
| Documentation | |
|
How to request access |
An account in the Elsevier Developer Portal and an API key are required, as well as an X‑ELS‑Insttoken for off‑campus access (VPN and EZproxy are not supported). (How to Get Started) Please contact your library: emedia@ub.uzh.ch / e-medien@zb.uzh.ch |
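As a rough illustration of the two points above (authenticating with an API key plus institutional token, and checking the quota in the response headers), the sketch below builds the request headers and parses quota values from a response. `X-ELS-Insttoken` is the header named above; the names `X-ELS-APIKey` and `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset` are assumptions that should be verified against the Elsevier Developer Portal documentation. No network request is made here.

```python
from typing import Dict, Mapping, Optional

# Assumed quota header names; verify against the provider documentation.
QUOTA_HEADERS = {
    "limit": "X-RateLimit-Limit",
    "remaining": "X-RateLimit-Remaining",
    "reset": "X-RateLimit-Reset",
}


def build_headers(api_key: str, insttoken: Optional[str] = None) -> Dict[str, str]:
    """Return request headers; the insttoken is only added when one is supplied."""
    headers = {"X-ELS-APIKey": api_key, "Accept": "application/json"}
    if insttoken:
        headers["X-ELS-Insttoken"] = insttoken
    return headers


def read_quota(response_headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Extract quota values from response headers (case-insensitive lookup).

    Missing or non-numeric values are returned as None.
    """
    lowered = {name.lower(): value for name, value in response_headers.items()}
    quota: Dict[str, Optional[int]] = {}
    for key, header in QUOTA_HEADERS.items():
        raw = lowered.get(header.lower())
        quota[key] = int(raw) if raw is not None and raw.strip().isdigit() else None
    return quota
```

Logging the remaining quota after each response makes it easier to pause a job before the weekly limit is exhausted.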
SciVal (Elsevier)
| License |
❌ UZH‑ZB does not have access to either the API or the user interface. |
Scopus (Elsevier)
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only. |
|
Content & formats |
REST API for metadata and abstracts with full texts where access rights permit |
| Limitations |
Quota limits reset every 7 days (limits differ by API; see the response headers for details). TDM is permitted only via the API (no web scraping). |
| Documentation | |
|
How to request access |
An account in the Elsevier Developer Portal and an API key are required, as well as an X‑ELS‑Insttoken for off‑campus access (VPN and EZproxy are not supported). (How to Get Started)
Please contact your library: emedia@ub.uzh.ch / e-medien@zb.uzh.ch |
Springer Nature
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions. |
|
Content & formats |
|
| Limitations |
1 query/sec without API key; 150 queries/min with API key. API account limits: Basic = 100 records, Premium = 500 records per pagination cycle |
| Documentation |
API-Client for TDM available for installation (full-featured Python client library) Python Wrapper covering all APIs (with demo and example code) |
|
How to request access |
API key required |
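The rate limits stated above (1 query/sec without an API key, 150 queries/min with one) can be respected on the client side by spacing requests evenly. The following is a minimal sketch under that assumption; the class name `Throttle` is hypothetical and not part of any Springer Nature client library, and even spacing is a deliberately conservative choice that never sends bursts.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls evenly so that at most `max_calls` occur per `per_seconds`."""

    def __init__(
        self,
        max_calls: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls < 1 or per_seconds <= 0:
            raise ValueError("max_calls must be >= 1 and per_seconds > 0")
        self.min_interval = per_seconds / max_calls
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # scheduled time of the previous call

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Limits taken from the Limitations row above.
without_key = Throttle(max_calls=1, per_seconds=1.0)    # 1.0 s between queries
with_key = Throttle(max_calls=150, per_seconds=60.0)    # 0.4 s between queries
```

Calling `with_key.wait()` immediately before each request keeps a loop within the stated limit.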
Swissdox
| License |
❌ API usage is not included in the UZH-ZB license. See Specialized Databases (Law). Two versions are available: Swissdox Essentials (with legal restrictions) and Swissdox Professional (more extensive search capabilities). |
|
Content & formats |
Swissdox@LiRI API provides query‑specific full texts subject to copyright restrictions. Most articles are in German and French, with fewer in Italian, Romansh, and English, primarily from the last 25 years (media coverage). Queries are submitted in YAML format; results are made available via a download link. |
| Limitations |
For larger research projects, sufficient computing capacity is required; query runtime scales with the volume of data. |
| Documentation |
LiRI info page (LiRI). Questions regarding queries should be directed to the Swissdox@LiRI platform. |
|
How to request access |
Project registration required: see How to Get Started. Please observe the terms of use.
Swissdox@LiRI: Login is restricted to members of supporting institutions or via a project voucher. |
Web of Science Expanded (Clarivate)
| License |
❌ API usage is not included in the UZH-ZB license. |
|
Content & formats |
Full bibliographic datasets in JSON from the Web of Science Core Collection with times-cited counts of Web of Science documents. Suitable for TDM projects. |
| Limitations |
5 queries/sec, 5’000/24h |
| Documentation | |
|
How to request access |
Access requires a Developer Portal account and an API key with UZH email address (How to get started). Alternatively, you can log into your WoS account (login). Register the application, then request an API key. Select and subscribe to the desired API plan. ⚠️ You must accept the Terms of Use and Product / Service Terms. Access credentials may require administrative approval. |
Web of Science Starter API (Clarivate)
| License |
✅ Included in the UZH-ZB license ✅ Non-commercial research, teaching, and learning purposes only. |
|
Content & formats |
Does not provide complete bibliographic records. Intended for metadata checks, basic searches and validation. Queryable metadata includes DOI, author names, source title / journal title, basic publication information (year, volume, issue, pages), abstract (available in many but not all cases), ISSN, and ISBN. Output format: JSON |
| Limitations |
5 queries/sec, 5’000/24h |
| Documentation | |
|
How to request access |
Access requires a Developer Portal account and an API key with UZH email address (How to get started). Alternatively, you can log into your WoS account (login). Register the application, then request an API key. Select and subscribe to the desired API plan. ⚠️ You must accept the Terms of Use and Product / Service Terms. Access credentials may require administrative approval. |
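The limits stated above (5 queries/sec and 5’000 queries per 24 hours) make it worth estimating how long a job will take before starting it. The sketch below does that arithmetic; the function name `plan_queries` is illustrative, and it assumes that every query counts equally against both limits.

```python
import math
from typing import Tuple

# Limits taken from the Limitations row above.
QUERIES_PER_SECOND = 5
QUERIES_PER_DAY = 5000


def plan_queries(total_queries: int) -> Tuple[int, float]:
    """Return (days needed, minimum seconds of querying on the busiest day)."""
    if total_queries < 0:
        raise ValueError("total_queries must not be negative")
    days = math.ceil(total_queries / QUERIES_PER_DAY)
    busiest_day = min(total_queries, QUERIES_PER_DAY)
    return days, busiest_day / QUERIES_PER_SECOND


if __name__ == "__main__":
    days, seconds = plan_queries(12000)
    print(f"{days} days, at least {seconds:.0f} s of querying on a full day")
```

For 12’000 queries, the daily quota rather than the per-second limit is the bottleneck: the job spans three days, while a full day's quota takes only about 17 minutes to send at the maximum rate.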
Wiley Cochrane Library
| License |
❌ API usage is not included in the UZH-ZB license. Text and data mining is strictly prohibited. |
Wiley Online Library
| License |
✅ Included in the UZH-ZB license. Non-journal content (e-books and reference works) is not included. ✅ Non-commercial research, teaching, and learning purposes only. Full texts within existing subscriptions. |
|
Content & formats |
|
| Limitations |
3 queries/sec. Non-journal TDM projects require a paid TDM agreement (XML feed) (see below). |
| Documentation | |
|
How to request access |
|