Codebook

A codebook is a reference document that describes the content, structure, and layout of a dataset. It provides essential information about the study, data file(s), variables, categories, and other elements that make up a complete dataset. Its primary purpose is to make data understandable and reusable.

A codebook typically explains:

  • What the dataset contains
  • How the data were collected or created
  • What each variable or field represents
  • How values are coded
  • Any special considerations or limitations

Codebooks are widely used across disciplines, including social sciences, humanities, health sciences, engineering, and natural sciences. They are particularly useful when working with structured or tabular data but can also be adapted for qualitative, experimental, or computational datasets.

Why create a codebook?

A readable codebook that accompanies your dataset helps keep your data understandable and reusable well into the future. It provides authoritative, citable information, as well as guidance on how to read, analyse, interpret, and verify the data for accuracy and replication purposes.

A well-maintained codebook also supports the FAIR principles (Findable, Accessible, Interoperable, Reusable): it makes your data interpretable by others without requiring direct contact with the original researcher, and it facilitates data archiving and sharing.

Without a codebook, datasets can quickly become difficult (or even impossible) to interpret.

When should you create a codebook?

Ideally, you should begin creating your codebook at the start of your project and update it throughout the research lifecycle. This ensures that important information is not lost and reduces documentation work at the end of the project.

What should a codebook contain?

The level of detail will depend on your discipline and project, but a codebook commonly includes the following sections:

Dataset Information
  • Dataset title
  • Project description
  • Authors and contributors
  • Institution
  • Contact information
  • Date(s) of data collection or creation
  • Version information
  • Related publications or outputs

Data Collection or Creation
  • Methodology or experimental design
  • Instruments or tools used
  • Software used
  • Sampling strategy (if applicable)
  • Data sources
  • Processing steps (if relevant)

File Structure

Describe the organization of your data:

  • File names and descriptions
  • Folder structure
  • Relationships between files
  • File formats
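
As a rough illustration, a first draft of the file listing can be generated automatically and then completed by hand. The sketch below uses only the Python standard library. The function names (`build_file_inventory`, `format_inventory`) and the output columns are hypothetical choices made for this example, not part of any standard.

```python
from pathlib import Path


def build_file_inventory(root):
    """Return one record per file found under `root` (searched recursively).

    The `description` field is left empty on purpose: it has to be
    written by someone who knows the data.
    """
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records.append({
                # Path relative to the dataset folder, with forward slashes
                "file": path.relative_to(root).as_posix(),
                # File extension without the dot, or "unknown" if there is none
                "format": path.suffix.lstrip(".").lower() or "unknown",
                "size_bytes": path.stat().st_size,
                "description": "",
            })
    return records


def format_inventory(records):
    """Render the inventory as tab-separated text for pasting into a codebook."""
    lines = ["file\tformat\tsize_bytes\tdescription"]
    for r in records:
        lines.append(
            f"{r['file']}\t{r['format']}\t{r['size_bytes']}\t{r['description']}"
        )
    return "\n".join(lines)
```

Calling `print(format_inventory(build_file_inventory("my_dataset")))` (where `my_dataset` is a placeholder folder name) gives a table whose descriptions, and the relationships between files, still need to be filled in manually.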

Variable-Level Documentation

For each variable, record the following elements where applicable:

Variable name
The short name as it appears in the dataset (usually no spaces or special characters)

Variable label / description
A description of what the variable measures

Variable type
Whether the variable is categorical (nominal/ordinal), continuous, binary, text, date, etc.

Values and value labels
The possible values (e.g. numeric codes) and their meaning (e.g. 1 = "Female", 2 = "Male", 3 = "Non-binary")

Unit of measurement
The unit in which the variable is expressed (e.g. kg, %, number of participants)

Missing value codes
How missing data are coded, and their type (e.g. non-response, filter question, data entry error, etc.)

Derived/computed variables
Whether the variable results from a computation, recoding, standardisation, or transformation of other variables

Source or classification
Whether the variable is based on an existing standard, classification, or external source (e.g. ISCO, NUTS, ICD codes)

Variable base / universe
Which observations or sub-population the variable applies to (e.g. applies only to respondents aged 18+)

Links to related variables
Whether the variable must be interpreted alongside other variables (e.g. components of a multiple-choice question, follow-up questions)

Weights
Whether any weighting variable applies, how it was constructed, and when it should be used

ID variables
Which variable(s) serve as unique observation identifiers

Technical information
Variable format, width, number of decimals, software-specific encoding details
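
To make the elements above more concrete, here is a minimal sketch of how one variable entry could be stored in a structured form and used to check a data column against its documented codes. The class `VariableEntry`, the function `undocumented_values`, the missing value code -9, and the observed values are all hypothetical and chosen only for illustration. A real codebook would normally carry more of the fields listed above.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One variable-level codebook entry (hypothetical structure)."""
    name: str
    label: str
    var_type: str                                       # e.g. "categorical (nominal)"
    value_labels: dict = field(default_factory=dict)    # code -> meaning
    missing_codes: dict = field(default_factory=dict)   # code -> type of missingness
    unit: str = ""
    universe: str = "all observations"


def undocumented_values(entry, values):
    """Return the sorted distinct values that the entry does not explain.

    Only meaningful for coded (categorical) variables.
    """
    known = set(entry.value_labels) | set(entry.missing_codes)
    return sorted(set(values) - known)


gender = VariableEntry(
    name="gender",
    label="Self-reported gender of the respondent",
    var_type="categorical (nominal)",
    value_labels={1: "Female", 2: "Male", 3: "Non-binary"},
    missing_codes={-9: "Non-response"},  # assumed code, for illustration
)

# Made-up column of observed values, for illustration only
observed = [1, 2, 2, 3, -9, 4, 1]
print(undocumented_values(gender, observed))  # prints [4]
```

A check like this is one way to notice early that the data and the codebook have drifted apart, for example when a new category was added during data collection but never documented.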

Codebooks for qualitative data

Codebooks are not limited to quantitative research. When coding qualitative data, a codebook serves to clearly document the meaning of the codes you create. In this context, a codebook typically includes:

  • Code name: A short label for the code
  • Definition: A precise description of what the code captures
  • Inclusion and exclusion criteria: What kinds of content should (and should not) be assigned this code
  • Example quotes or passages: Illustrative excerpts from the data
  • Notes: Relationships to other codes, hierarchy, or evolution of the code during the analysis
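
The same structure can be kept in a simple script or spreadsheet if you do not use dedicated software. The following sketch shows one possible way to store a code entry and print it as plain text. The class `QualitativeCode`, the function `render_code`, and the example code "WORKLOAD" with its quote are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QualitativeCode:
    """One entry of a qualitative codebook (hypothetical structure)."""
    name: str
    definition: str
    include: str
    exclude: str
    examples: list = field(default_factory=list)
    notes: str = ""


def render_code(code):
    """Format one code as a plain-text codebook entry."""
    lines = [
        f"Code name: {code.name}",
        f"Definition: {code.definition}",
        f"Include: {code.include}",
        f"Exclude: {code.exclude}",
    ]
    for quote in code.examples:
        lines.append(f'Example: "{quote}"')
    if code.notes:  # the Notes line is skipped when there is nothing to report
        lines.append(f"Notes: {code.notes}")
    return "\n".join(lines)


# Invented entry, for illustration only
workload = QualitativeCode(
    name="WORKLOAD",
    definition="Statements about the amount of work the participant has to do",
    include="Mentions of overtime, deadlines, or task volume",
    exclude="General job satisfaction without reference to workload",
    examples=["I regularly stay two hours after closing time."],
)
print(render_code(workload))
```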

Qualitative analysis tools (e.g. ATLAS.ti, NVivo, and MAXQDA) allow users to create a codebook by exporting codes and their associated comments.

When depositing data in a repository, include your codebook as a separate file alongside the dataset.

Tools

Several tools can help you create or generate a codebook:

  • SPSS: can generate a codebook automatically from a .sav file via Analyze > Reports > Codebook or using the DISPLAY DICTIONARY command
  • Stata / SAS / R: offer built-in or package-based commands to extract variable metadata (e.g. codebook and describe in Stata, PROC CONTENTS in SAS, and the codebook package in R)
  • Colectica: a codebook creation and management tool supporting the DDI standard, with a plugin for Excel
  • Nesstar Publisher: produces PDF codebooks and DDI-Codebook XML
  • DDI standard: if you want your codebook to be machine-readable and interoperable, the Data Documentation Initiative (DDI) provides a widely used metadata standard for describing social, behavioural, and other research data
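
If none of these tools is available, a first draft can also be produced with a short script. The sketch below is not how any of the tools above work; it is a minimal standard-library Python example that summarises each column of a CSV file so that labels and value meanings can then be added by hand. The function name `draft_codebook`, the `max_listed` threshold, and the sample data are assumptions made for this example.

```python
import csv
import io


def draft_codebook(csv_text, max_listed=10):
    """Summarise each column of a CSV: non-empty count and distinct values.

    Values are kept as text, exactly as they appear in the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    draft = []
    for name in reader.fieldnames or []:
        # Empty cells (and cells missing from short rows) are not counted
        values = [row[name] for row in rows if row[name] not in (None, "")]
        distinct = sorted(set(values))
        draft.append({
            "variable": name,
            "n_non_empty": len(values),
            "n_distinct": len(distinct),
            # List the values only when there are few of them (likely codes)
            "values": distinct if len(distinct) <= max_listed else [],
            "label": "",  # to be written by the researcher
        })
    return draft


# Made-up sample data, for illustration only
sample = "id,gender,age\n1,1,34\n2,2,\n3,1,58\n"
for entry in draft_codebook(sample):
    print(entry)
```

The output only describes what is in the file. The meaning of each variable and code, the missing value conventions, and the units still have to be documented by someone who knows the data.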
