27 January 2026
A codebook typically explains:
Codebooks are widely used across disciplines, including social sciences, humanities, health sciences, engineering, and natural sciences. They are particularly useful when working with structured or tabular data but can also be adapted for qualitative, experimental, or computational datasets.
A readable codebook that accompanies your dataset helps keep your data understandable and reusable well into the future. It provides authoritative, citable information, as well as guidance on how to read, analyse, interpret, and verify data for accuracy and replication purposes.
A well-maintained codebook also supports the FAIR principles (Findable, Accessible, Interoperable, Reusable): it makes your data interpretable by others without requiring direct contact with the original researcher, and it facilitates data archiving and sharing.
Without a codebook, datasets can quickly become difficult (or even impossible) to interpret.
Ideally, you should begin creating your codebook at the start of your project and update it throughout the research lifecycle. This ensures that important information is not lost and reduces documentation work at the end of the project.
The level of detail will depend on your discipline and project, but a codebook commonly includes the following sections:
Describe the organization of your data:
Variable name: The short name as it appears in the dataset (usually no spaces or special characters)
Variable label / description: A description of what the variable measures
Variable type: Whether the variable is categorical (nominal/ordinal), continuous, binary, text, date, etc.
Values and value labels: The possible values (e.g. numeric codes) and their meaning (e.g. 1 = "Female", 2 = "Male", 3 = "Non-binary")
Unit of measurement: The unit in which the variable is expressed (e.g. kg, %, number of participants)
Missing value codes: How missing data are coded, and their type (e.g. non-response, filter question, data entry error, etc.)
Derived/computed variables: Whether the variable results from a computation, recoding, standardisation, or transformation of other variables
Source or classification: Whether the variable is based on an existing standard, classification, or external source (e.g. ISCO, NUTS, ICD codes)
Variable base / universe: Which observations or sub-population the variable applies to (e.g. applies only to respondents aged 18+)
Links to related variables: Whether the variable must be interpreted alongside other variables (e.g. components of a multiple-choice question, follow-up questions)
Weights: Whether any weighting variable applies, how it was constructed, and when it should be used
ID variables: Which variable(s) serve as unique observation identifiers
Technical information: Variable format, width, number of decimals, software-specific encoding details
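To make the variable-level fields above more concrete, here is a minimal Python sketch of how a single codebook entry might be represented and used to decode raw values. The class name `VariableEntry`, its field names, and the missing value codes (-9, -8) are assumptions made for illustration; they do not come from any standard or from a particular tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableEntry:
    """One codebook entry. Field names are illustrative, not a standard."""
    name: str                      # short name as it appears in the dataset
    label: str                     # what the variable measures
    var_type: str                  # e.g. "categorical (nominal)", "continuous"
    value_labels: dict = field(default_factory=dict)   # code -> meaning
    unit: Optional[str] = None     # e.g. "kg", "%"
    missing_codes: dict = field(default_factory=dict)  # code -> type of missingness

    def decode(self, raw):
        """Return the label for a raw code, None for a missing value code,
        or the raw value itself when no label is defined."""
        if raw in self.missing_codes:
            return None
        return self.value_labels.get(raw, raw)


# Hypothetical entry, reusing the value labels from the example above.
gender = VariableEntry(
    name="gender",
    label="Self-reported gender of the respondent",
    var_type="categorical (nominal)",
    value_labels={1: "Female", 2: "Male", 3: "Non-binary"},
    missing_codes={-9: "non-response", -8: "filter question"},
)

decoded = [gender.decode(v) for v in [1, 3, -9, 2]]
print(decoded)
```

Keeping value labels and missing value codes in separate mappings is one possible design choice: it lets an analysis script treat missing data differently from valid categories without hard-coding the codes.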
Codebooks are not limited to quantitative research. When coding qualitative data, a codebook serves to clearly document the meaning of the codes you create. In this context, a codebook typically includes:
Qualitative analysis tools (e.g. ATLAS.ti, NVivo, and MAXQDA) allow users to create a codebook by exporting codes and their related comments.
When depositing data in a repository, include your codebook as a separate file alongside the dataset.
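As a rough illustration of keeping the codebook in its own file, the sketch below writes a small codebook to CSV using only the Python standard library. The column names, the example variables, and the file name `survey_codebook.csv` mentioned afterwards are hypothetical; a repository or discipline may expect a different layout or format.

```python
import csv
import io

# Hypothetical codebook rows; column names are illustrative only.
rows = [
    {"variable": "resp_id", "label": "Unique respondent identifier",
     "type": "text", "unit": "", "missing_codes": ""},
    {"variable": "weight_kg", "label": "Body weight at interview",
     "type": "continuous", "unit": "kg", "missing_codes": "-9 = non-response"},
]


def write_codebook(rows, handle):
    """Write codebook rows as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
write_codebook(rows, buffer)
codebook_csv = buffer.getvalue()
print(codebook_csv)
```

To produce an actual file next to the dataset, the same function can be called with a real file handle, for example one opened with `open("survey_codebook.csv", "w", newline="", encoding="utf-8")`.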
Several tools can help you create or generate a codebook: