Codebook

A codebook is a document (typically a table) describing the variables found in a data set. Its purpose is to record detailed information on each variable.

The following information can commonly be found in a codebook:

  • ID variable(s): which variable(s) contain the unique observation identifier (number or alphanumeric combination)?
  • Data collection variables: which variables contain data collection information (date of collection, place, researcher, etc.)?
  • Variable name and description: what is the variable name in the data set? What is its full description? Variable names are typically short to facilitate analysis and must follow software-specific rules (for instance, no special characters or spaces). Full descriptions identify the variable in more detail and may include definitions or explanations of acronyms. If the variable is a survey question, the exact wording of the question and of its instructions may also be given here.
  • Variable type: is the variable categorical, ordinal, continuous or text? Recording the type makes it possible to check that the variable is identified as such in the software used for storage or analysis.
  • Variable values: what are the possible values of the variable (categories or numeric range)? If the variable is categorical, what are the labels corresponding to each category? For instance, gender may be encoded as 1/2, with 1 corresponding to "Women" and 2 to "Men".
  • Variable unit: what is the variable unit (percentage, kilograms, number of people, etc.)?
  • Missing values: how are missing values indicated? Recording the codes makes it possible to check that these values are identified as missing in the software used for storage or analysis. Different types of missing values can be indicated in different ways, for instance to distinguish observations for which a specific variable should be empty (for consistency reasons or due to a filter) from observations where a value was expected but none was encoded (data input mistake, non-response, etc.).
  • Variable processing: is the variable the result of a data processing step? Is it a score, an index or the result of a computation? Was it recoded based on other variables? Was it standardised or otherwise transformed?
  • Variable base: which population is the variable based on? Is the data filtered or limited to a sub-group of observations? What is the base size?
  • Variable links: is the variable standalone or should it be analysed together with other variables? For instance, a survey question that allows several answers needs to be encoded in several related variables, and a follow-up question needs to be analysed taking the previous answer into account.
  • Weights: are there any weight variables? How were they created? When should they be used?
  • Typologies or classifications: is the variable based on an existing classification? What is it and what are the sources or references?
  • Technical information: what is the variable width and specific variable type in the software used for storage/analysis? What are the decimal and thousands separators? What is the number of decimals?
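
Several of the items above (variable type, values and labels, unit, missing values) can also be stored in a machine-readable form and used to check a data set against its codebook. The following Python sketch illustrates the idea under stated assumptions: the CodebookEntry class, the variable names (resp_id, gender, weight_kg), the missing-value codes (-9, -8) and the sample rows are all hypothetical and chosen only for illustration; they do not come from any particular software or standard.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CodebookEntry:
    """One row of a codebook: the documentation of a single variable."""
    name: str            # short variable name used in the data set
    description: str     # full description or question wording
    var_type: str        # "categorical", "ordinal", "continuous" or "text"
    labels: dict = field(default_factory=dict)         # value -> label
    valid_range: Optional[Tuple[float, float]] = None  # (min, max)
    unit: str = ""
    missing_codes: dict = field(default_factory=dict)  # code -> reason


# Hypothetical codebook for a three-variable survey extract.
CODEBOOK = [
    CodebookEntry("resp_id", "Unique respondent identifier", "text"),
    CodebookEntry("gender", "Gender of the respondent", "categorical",
                  labels={1: "Women", 2: "Men"},
                  missing_codes={-9: "non-response"}),
    CodebookEntry("weight_kg", "Self-reported body weight", "continuous",
                  valid_range=(30, 250), unit="kilograms",
                  missing_codes={-9: "non-response",
                                 -8: "not applicable (filtered)"}),
]


def check_value(entry, value):
    """Return 'ok', 'missing' or 'invalid' for one value of one variable."""
    if value in entry.missing_codes:
        return "missing"
    if entry.var_type in ("categorical", "ordinal"):
        return "ok" if value in entry.labels else "invalid"
    if entry.var_type == "continuous" and entry.valid_range is not None:
        low, high = entry.valid_range
        return "ok" if low <= value <= high else "invalid"
    return "ok"  # free text or undocumented range: not checked in this sketch


def check_rows(rows, codebook):
    """List (row index, variable, value) for each value the codebook rejects."""
    problems = []
    for i, row in enumerate(rows):
        for entry in codebook:
            if check_value(entry, row[entry.name]) == "invalid":
                problems.append((i, entry.name, row[entry.name]))
    return problems


# Hypothetical observations, two of which contradict the codebook.
rows = [
    {"resp_id": "A001", "gender": 1, "weight_kg": 62.5},
    {"resp_id": "A002", "gender": 3, "weight_kg": -9},    # 3 is undocumented
    {"resp_id": "A003", "gender": -9, "weight_kg": 400},  # 400 is out of range
]
problems = check_rows(rows, CODEBOOK)
print(problems)  # [(1, 'gender', 3), (2, 'weight_kg', 400)]
```

In this sketch the missing-value codes are tested first, so that -9 is reported as missing and not as an invalid category or an out-of-range weight. A real codebook would usually be kept in a table or spreadsheet and loaded from there, not written out in code.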

Useful resources:

A semi-automated codebook generator plugin for Excel: https://www.colectica.com/

Additional reading:

https://ukdataservice.ac.uk/media/622417/managingsharing.pdf

https://libguides.library.kent.edu/SPSS/Codebooks

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/CodebookCookbook/CodebookCookbook.pdf

 
