Data Documentation

Good data documentation is essential for research reproducibility and data reusability. Data documentation provides information about the context, structure, provenance, and content of a dataset (or a file) with the aim of increasing its usefulness. Data documentation is therefore a crucial part of making data FAIR (findable, accessible, interoperable, and reusable).

What Is Data Documentation?

Data documentation is sometimes also called metadata, that is, data about data. Metadata describes basic characteristics of the data, such as:

Who created the data?
What does the data file contain?
When was the data generated?
Where was the data generated?
Why was the data generated?
How was the data generated?
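
As an illustration, the six questions above could be captured in a small structured record. The following is a minimal sketch only: the class name, the field names, and all example values are assumptions made for this example and do not come from any metadata standard.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class DatasetDescription:
    """Answers to the six basic questions; all field names are hypothetical."""
    who: str    # creator(s) of the data
    what: str   # content of the data file
    when: str   # date or period of generation
    where: str  # place of generation
    why: str    # purpose of the data collection
    how: str    # method or instrument used


# Invented example values, for illustration only.
description = DatasetDescription(
    who="Jane Doe, Example University",
    what="Hourly air temperature readings, one row per measurement",
    when="2023-01-01/2023-12-31",
    where="Rooftop weather station, Example City",
    why="Baseline for an urban heat island study",
    how="Calibrated digital thermometer, logged automatically",
)

# Serialize to JSON so that both people and programs can read the record.
record_json = json.dumps(asdict(description), indent=2)
print(record_json)
```

Keeping the answers in one record like this makes it easy to spot a question that has not been answered yet.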

Metadata or Data Documentation?

Metadata can be maintained through a data archive or repository, where you describe the characteristics of your data according to the information the repository requires.

Alternatively, you can write your data documentation as a README file, which contains additional information for the reuse of your data.

As a rule, both are recommended: the information in the data repository is machine-readable and can thus be used for meta-analyses, while the README file facilitates the further use of the data by humans.
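
One way to keep the two forms consistent is to derive both from a single record. The sketch below assumes a hypothetical metadata dictionary (keys and values are invented) and renders it once as a plain-text README for human readers and once as JSON for programs. In practice the repository usually collects its metadata through its own form, so this only illustrates the idea.

```python
import json

# Hypothetical metadata record; keys and values are invented for illustration.
metadata = {
    "title": "Example temperature dataset",
    "creator": "Jane Doe",
    "description": "Hourly air temperature readings for 2023.",
    "license": "CC BY 4.0",
}


def to_readme(record: dict) -> str:
    """Render the record as plain text aimed at human readers."""
    lines = [record["title"], "=" * len(record["title"]), ""]
    for key, value in record.items():
        if key != "title":
            lines.append(f"{key.capitalize()}: {value}")
    return "\n".join(lines) + "\n"


def to_json(record: dict) -> str:
    """Render the same record as JSON aimed at programs."""
    return json.dumps(record, indent=2, sort_keys=True)


readme_text = to_readme(metadata)
json_text = to_json(metadata)
print(readme_text)
print(json_text)
```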

Start with data documentation when collecting the data.

How to Create Data Documentation?

Start documenting your data as soon as you begin collecting it. This makes it easier to trace the complete data generation process later and helps you create well-structured data documentation at the time of publication.

Give the documentation an initial structure: Your data documentation does not need to be fully structured right from the start. However, a basic structure helps you gather, from the beginning, all the metadata needed to make your data reusable.

The Stanford Libraries provide a good introduction.
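
A simple checklist can serve as such an initial structure. The sketch below reports which fields of a draft record are still empty, so that gaps are visible while the data is being collected. The list of required fields is a hypothetical example and should be adapted to the requirements of your repository or discipline.

```python
# Hypothetical list of fields to collect; adapt it to your own requirements.
REQUIRED_FIELDS = ["creator", "title", "collection_date", "method", "license"]


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or still empty."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


# A record as it might look early in a project (invented values).
draft = {
    "creator": "Jane Doe",
    "title": "Example temperature dataset",
    "collection_date": "",
}

todo = missing_fields(draft)
print(todo)  # ['collection_date', 'method', 'license']
```

Running such a check regularly during data collection turns the documentation into an incremental task rather than a chore at publication time.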

Use metadata standards: Well-structured metadata or data documentation supports the long-term discoverability, understandability, and preservation of your research data. Discipline-specific repositories typically require highly structured metadata to enable highly granular searching of the repository.

Use templates to create your metadata.

Templates for Data Documentation

Templates for creating data documentation can be found here:

Cornell University's README file: This is a Word document that asks the most important questions for comprehensive data documentation. You can then generate a PDF from it and share it together with your data.

CESSDA Metadata Schema: This template allows you to capture project-level information about your data. To do this, answer the questions under "Project-level documentation".

DataCite Metadata Generator: This online tool creates XML-based data documentation from the answers you enter in the generator.
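
To give an impression of what XML-based data documentation looks like, the sketch below builds a minimal DataCite-style record with Python's standard library. It is not the output of the generator mentioned above. The element names are intended to mirror the mandatory properties of the DataCite Metadata Schema 4, but this is an assumption of the example and should be validated against the official schema before use. The DOI and all other values are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the DataCite Metadata Schema, version 4;
# verify it against the official schema before relying on it.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def build_datacite_xml(doi, creator, title, publisher, year):
    """Build a minimal DataCite-style record and return it as an XML string."""
    # Register the namespace as the default so the output has no prefixes.
    ET.register_namespace("", DATACITE_NS)

    def tag(name):
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    identifier = ET.SubElement(resource, tag("identifier"), identifierType="DOI")
    identifier.text = doi
    creators = ET.SubElement(resource, tag("creators"))
    creator_el = ET.SubElement(creators, tag("creator"))
    ET.SubElement(creator_el, tag("creatorName")).text = creator
    titles = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles, tag("title")).text = title
    ET.SubElement(resource, tag("publisher")).text = publisher
    ET.SubElement(resource, tag("publicationYear")).text = str(year)
    resource_type = ET.SubElement(
        resource, tag("resourceType"), resourceTypeGeneral="Dataset"
    )
    resource_type.text = "Dataset"
    return ET.tostring(resource, encoding="unicode")


# Invented placeholder values, for illustration only.
xml_text = build_datacite_xml(
    doi="10.1234/example",
    creator="Doe, Jane",
    title="Example temperature dataset",
    publisher="Example University",
    year=2023,
)
print(xml_text)
```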

Metadata Standards

Metadata standards are also referred to as "schemas". Schemas can be either generic or discipline-specific.

Well-known metadata standards include Dublin Core, a set of 15 terms (such as creator and title). The Data Documentation Initiative (DDI) provides an XML-based schema for the content, transport, representation, and archiving of metadata in the social sciences. To find discipline-specific metadata schemas, look at: