Find and Use Data

[Image: screenshot of the re3data.org platform]

Researchers produce large amounts of data every day. Some of that data is deposited in repositories, i.e. digital archives for research data. Thousands of openly accessible repositories exist, and you can use them both to deposit your own data and to find existing datasets.

How Do I Find Existing Data?

Repositories can be generic, institutional or discipline-specific. Discipline-specific repositories are usually the best place to find existing data for reuse. To start your search:
 

  • Ask the community: Which repositories are generally used by other researchers in your discipline?
     
  • re3data.org is another good starting point for your search. It is currently the largest and most important registry of research data repositories worldwide.

A Quick Way to Relevant Data

Use the following filters on re3data.org:

  • AID (author identifier) = authors are uniquely identifiable
  • Data Licenses = data are licensed for reuse
  • Data Access = data are openly accessible
  • Metadata Standards = established metadata standards are used
  • PID (persistent identifier) = the repository assigns persistent identifiers to its objects, e.g. DOIs.
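On the website you simply tick these filters. If you instead collect repository descriptions in a script, the five filters amount to a single "all criteria must hold" check. The Python sketch below assumes hypothetical records: the class, field names and sample repositories are illustrative and are not the actual re3data.org schema.

```python
from dataclasses import dataclass


# Hypothetical repository record; field names are assumptions for illustration.
@dataclass
class Repository:
    name: str
    author_identifiers: bool        # AID: authors are uniquely identifiable
    data_licenses: list[str]        # licenses that allow reuse
    open_access: bool               # Data Access: data are openly accessible
    metadata_standards: list[str]   # established metadata standards in use
    pid_systems: list[str]          # PID: e.g. ["DOI"]


def meets_reuse_criteria(repo: Repository) -> bool:
    """Return True only if the repository passes all five filters."""
    return (
        repo.author_identifiers
        and bool(repo.data_licenses)
        and repo.open_access
        and bool(repo.metadata_standards)
        and bool(repo.pid_systems)
    )


def filter_repositories(repos: list[Repository]) -> list[str]:
    """Return the names of repositories that pass every filter."""
    return [r.name for r in repos if meets_reuse_criteria(r)]


# Invented sample records: RepoB fails the AID filter.
repos = [
    Repository("RepoA", True, ["CC BY 4.0"], True, ["DataCite Metadata Schema"], ["DOI"]),
    Repository("RepoB", False, ["CC0"], True, ["Dublin Core"], ["Handle"]),
]
print(filter_repositories(repos))  # ['RepoA']
```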

How Do I Use and Process Data?

Before you can analyse data you have found, you usually need to preprocess it. Preprocessing can include the following steps:

1. Prepare and save data

  • digitize (e.g. digital editions)
  • translate and transcribe
  • store on servers for further analysis
  • validate
  • reprocess (e.g. change format or use subparts)
  • anonymize
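As a minimal sketch of three of these steps (validate, reprocess, anonymize), the Python example below reads a small CSV export, rejects rows with missing values, keeps only a subset of columns and outputs the result as JSON. The column names and sample rows are invented for illustration. Dropping names and e-mail addresses removes only direct identifiers; whether the remaining data is truly anonymous depends on the dataset.

```python
import csv
import io
import json

# Invented survey export; the column names are assumptions for illustration.
RAW_CSV = """participant_name,email,age,answer
Anna Beispiel,anna@example.org,34,yes
Ben Test,ben@example.org,,no
"""

REQUIRED = ["age", "answer"]  # validate: these fields must be non-empty
KEEP = ["age", "answer"]      # reprocess/anonymize: names and e-mails are dropped


def prepare(raw: str) -> tuple[list[dict], list[int]]:
    """Validate rows, keep a column subset and drop direct identifiers."""
    reader = csv.DictReader(io.StringIO(raw))
    cleaned, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if any(not row[field] for field in REQUIRED):
            rejected.append(line_no)  # validation failed: missing value
            continue
        record = {field: row[field] for field in KEEP}
        record["age"] = int(record["age"])  # normalize the type
        cleaned.append(record)
    return cleaned, rejected


records, bad_lines = prepare(RAW_CSV)
print(json.dumps(records))            # change format: CSV -> JSON
print("rejected lines:", bad_lines)   # rejected lines: [3]
```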

2. Document and describe data

  • How was the data annotated?
  • In what format is the data available?
  • Which software was used?
  • How does the data link to other existing data?
  • What is in the data (e.g. variables, content)?
  • Where does the data come from (why was it created and by whom)?
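One lightweight way to make sure each of these questions is answered is to keep the answers in a structured record stored next to the data. The Python sketch below uses hypothetical field names (not a formal metadata standard) and placeholder values, and reports which questions are still unanswered.

```python
import json

# Checklist questions mapped to hypothetical field names.
QUESTIONS = {
    "annotation": "How was the data annotated?",
    "format": "In what format is the data available?",
    "software": "Which software was used?",
    "related_data": "How does the data link to other existing data?",
    "content": "What is in the data (e.g. variables, content)?",
    "provenance": "Where does the data come from (why was it created and by whom)?",
}


def missing_answers(doc: dict) -> list[str]:
    """Return the checklist questions whose field is absent or empty."""
    return [question for key, question in QUESTIONS.items() if not doc.get(key)]


# Illustrative record; all values are placeholders.
doc = {
    "annotation": "Answers coded manually by two annotators",
    "format": "CSV, UTF-8, one row per participant",
    "software": ["Python 3"],
    "related_data": [],  # still empty, so it is reported below
    "content": {"age": "age in years", "answer": "yes/no response"},
    "provenance": {"creator": "Max Muster", "purpose": "Survey about xy"},
}

for question in missing_answers(doc):
    print("Still undocumented:", question)
print(json.dumps(doc, indent=2))  # e.g. save next to the data files
```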

3. Analyse the data

In this stage, you interpret and visualize the data, write up results for publication and cite data sources. You also prepare the data for sharing, publishing and long-term archiving.
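A common step when preparing data for long-term archiving is to record a checksum for every file, so that later copies can be checked for corruption. The Python sketch below is one possible approach, not a requirement of any particular repository; the manifest file name and sample file are assumptions. It writes one SHA-256 digest per file in the two-space layout that GNU `sha256sum -c` can verify when run from inside the data directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Write one '<digest>  <relative path>' line per file below data_dir."""
    lines = [
        f"{sha256_of(p)}  {p.relative_to(data_dir).as_posix()}"
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Usage with a throwaway directory holding one small, invented data file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "survey.csv").write_text("age,answer\n34,yes\n", encoding="utf-8")
    manifest = Path(tmp) / "manifest-sha256.txt"  # hypothetical file name
    write_manifest(data_dir, manifest)
    manifest_text = manifest.read_text(encoding="utf-8")

print(manifest_text)
```

The manifest is written outside the data directory so that it does not list itself.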


How Do I Cite Data Sources?

Some repositories specify how their data should be cited. In all other cases, we recommend the citation format specified by DataCite (p. 9):
 

Creator (PublicationYear): Title. Publisher. (resource type). Identifier. 

Example:
Max Muster (2019): Survey about xy. MegaRepository. (Dataset). https://doi.org/10.3456/7.8910.
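If you cite many datasets, the DataCite pattern can be filled in programmatically. The Python sketch below simply assembles the fields in the order given above; the class and function names are hypothetical, the values are placeholders modeled on the example, and it does not handle multiple creators.

```python
from dataclasses import dataclass


# Hypothetical container for the fields in the citation pattern.
@dataclass
class DatasetRecord:
    creator: str
    publication_year: int
    title: str
    publisher: str
    resource_type: str
    identifier: str  # ideally a resolvable DOI link


def datacite_citation(rec: DatasetRecord) -> str:
    """Creator (PublicationYear): Title. Publisher. (resource type). Identifier."""
    return (
        f"{rec.creator} ({rec.publication_year}): {rec.title}. "
        f"{rec.publisher}. ({rec.resource_type}). {rec.identifier}."
    )


example = DatasetRecord(
    creator="Max Muster",
    publication_year=2019,
    title="Survey about xy",
    publisher="MegaRepository",
    resource_type="Dataset",
    identifier="https://doi.org/10.3456/7.8910",
)
print(datacite_citation(example))
# Max Muster (2019): Survey about xy. MegaRepository. (Dataset). https://doi.org/10.3456/7.8910.
```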