PINDARE

Research data differ in physical format, volume, size, file format, and variable type. This page outlines key considerations to help you manage these differences effectively throughout your research lifecycle.

1. Physical format and structure

Research data often come as digital files containing numbers or text, but they may also include non-digital or non-standard formats, such as sound recordings, high-resolution images, video, biological samples, or archaeological artefacts.

Digital files may be:

Structured: Data organized in a tabular or relational model (e.g., spreadsheets, databases)
Unstructured: Content without a predefined schema (e.g., text corpora, multimedia, web content)

Regardless of data format:

Assign unique identifiers to each physical or digital item
Create a digital inventory with detailed descriptions and metadata to support traceability and reuse
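
The two recommendations above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the item descriptions, the column names (`id`, `description`, `medium`), and the helper functions are all hypothetical, and random UUIDs are only one of several reasonable identifier schemes.

```python
import csv
import io
import uuid

# Hypothetical items; descriptions and media are illustrative only.
items = [
    {"description": "Interview recording, participant 01", "medium": "audio (digital)"},
    {"description": "Soil sample, site A", "medium": "physical sample"},
]

def build_inventory(items):
    """Return inventory rows, each carrying its own unique identifier."""
    rows = []
    for item in items:
        row = {"id": str(uuid.uuid4())}  # random identifier, assumed scheme
        row.update(item)
        rows.append(row)
    return rows

def inventory_to_csv(rows):
    """Serialize the inventory as CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "description", "medium"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

inventory = build_inventory(items)
print(inventory_to_csv(inventory))
```

In practice the inventory would probably carry more metadata (location, date collected, responsible person); the point is only that every item, physical or digital, gets one row and one identifier.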

2. Volume of data

The volume of data refers to the number of data points, items, or observations you collect, rather than their total size in megabytes or gigabytes. Volume influences:

Data cleaning and preprocessing time
Complexity of data modeling or statistical analysis
Required tools and computing capacity

Examples:

High-volume, small-size data: survey responses from thousands of users (lightweight text files)
Low-volume, large-size data: a few high-resolution MRI scans or satellite images (heavy files)

Anticipating volume helps:

Structure your data collection protocol
Select appropriate storage solutions and database models
Determine when automation or advanced data management tools are necessary
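
To make the distinction between volume and size concrete, the sketch below counts observations and bytes separately for a small synthetic survey file. The data and the function name `volume_and_size` are invented for illustration.

```python
import csv
import io

def volume_and_size(csv_text):
    """Return (number of observations, size in bytes) for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row; it is not an observation
    n_rows = sum(1 for _ in reader)
    n_bytes = len(csv_text.encode("utf-8"))
    return n_rows, n_bytes

# Synthetic survey: many observations, but only a few kilobytes on disk.
survey = "respondent,answer\n" + "".join(f"{i},yes\n" for i in range(1000))
rows, size = volume_and_size(survey)
print(f"{rows} observations, {size} bytes")
```

A thousand responses here occupy less space than a single small image, which is why volume and size need to be planned for separately.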

3. File size and storage needs

Data size refers to the amount of digital storage space your data occupies. This has direct implications for:

Storage infrastructure: local drives, institutional servers, or cloud services
Data backup strategies
Accessibility and transfer times

Estimate storage requirements in MB, GB, or TB at different project stages. Plan ahead for growth, especially in data-intensive disciplines like genomics, digital imaging, or remote sensing.
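
A rough estimate can be worked out with simple arithmetic. The sketch below is one possible way to do it; the function, its parameters, and all the numbers are assumptions chosen for illustration, and it uses decimal units (1 GB = 1000 MB).

```python
def estimate_storage_gb(n_files, avg_file_mb, backup_copies=2, growth_factor=1.0):
    """Rough total storage in GB: primary copy plus backups, scaled for expected growth."""
    primary_gb = n_files * avg_file_mb / 1000  # decimal units: 1 GB = 1000 MB
    return primary_gb * (1 + backup_copies) * growth_factor

# Hypothetical imaging project: 200 files of about 500 MB each,
# two backup copies, and 50 % growth expected over the project.
estimate = estimate_storage_gb(n_files=200, avg_file_mb=500,
                               backup_copies=2, growth_factor=1.5)
print(f"Plan for roughly {estimate:.0f} GB")
```

Repeating the calculation for each project stage (raw, processed, archived) gives a simple table of requirements to discuss with your storage provider.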

Choose efficient file formats for large datasets. Losslessly compressed or binary formats can reduce storage space and transfer times without loss of fidelity; lossy compression, by contrast, discards information.
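
As a small illustration of compression without loss of fidelity, the sketch below compresses synthetic tabular data with gzip from the Python standard library and checks that decompression returns exactly the original bytes. The data are invented, and real compression ratios depend heavily on the content.

```python
import gzip

# Synthetic, highly repetitive data; real datasets usually compress less well.
raw = ("timestamp,value\n" + "2025-01-01T00:00:00,0.0\n" * 10_000).encode("utf-8")

compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

# Lossless: the restored bytes are identical to the original.
assert restored == raw

ratio = len(compressed) / len(raw)
print(f"Compressed to {ratio:.1%} of the original size")
```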

4. Digital file formats

Choosing the right file format ensures long-term usability, interoperability, and preservation. Favor:

Open, non-proprietary formats (e.g., .csv, .xml, .json, .txt)
Lossless compression where data integrity is critical

When a proprietary format is required by your software (e.g., .sav, .psd, .mat), also produce:

Portable backups in widely supported open formats to safeguard accessibility

Selection criteria include:

Team expertise
Accepted standards in your research community
Compatibility with repository or funder requirements

5. How to select a data format (Adapted from ANDS)

Follow these best practices:

Decide early: Agree on formats before data collection begins
Compare proprietary and open formats for accessibility, functionality, and sustainability
Anticipate obsolescence: software and formats may not be supported forever
Dual-format storage: consider saving in both proprietary and open formats to reduce risk
High-resolution data may require format conversion for online display or transmission
Ask colleagues or your data steward about preferred formats in your field

Recommended universal backup formats: .csv, .tab, .txt, .rtf
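
A minimal sketch of writing the same records to two of these backup formats (.csv and tab-delimited .tab) and confirming that both read back unchanged is given below. The records, field names, and helper functions are hypothetical; exporting from a proprietary tool itself is not shown.

```python
import csv
import io

# Hypothetical records standing in for data exported from a proprietary tool.
records = [
    {"id": "1", "score": "4.5", "note": "first, pilot run"},
    {"id": "2", "score": "3.0", "note": "repeat"},
]
FIELDS = ["id", "score", "note"]

def dump(rows, delimiter):
    """Serialize rows as delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def load(text, delimiter):
    """Parse delimited text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

csv_text = dump(records, ",")   # content for a .csv file
tab_text = dump(records, "\t")  # content for a .tab file

# Round-trip check: both backups reproduce the original records.
assert load(csv_text, ",") == records
assert load(tab_text, "\t") == records
```

Note that plain-text formats store every value as text, so variable types need to be documented separately.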

> Need help choosing a format? Consult DMP – data formats for preservation

6. Variable types in structured data

Correctly identifying variable types improves how your data is interpreted and analyzed by software tools.

Quantitative variables

Discrete: Whole numbers (e.g., number of publications)
Continuous: Real numbers on a scale (e.g., time, distance)

Qualitative (categorical) variables

Nominal: Unordered categories (e.g., language, country)
Ordinal: Ordered categories (e.g., Likert scale, academic level)

Many tools also support:

String or character variables: Free text entries, notes, open-ended responses

Clearly documenting variable types ensures accurate processing, facilitates interoperability, and supports statistical integrity.
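
One way to document variable types in machine-readable form is a small codebook that software can also use to check values. The sketch below assumes the five types listed above; the class names, variable names, and category levels are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class VariableType(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    STRING = "string"

@dataclass(frozen=True)
class Variable:
    name: str
    vtype: VariableType
    levels: tuple = ()  # allowed categories; order is meaningful for ordinal

# Hypothetical codebook entries.
codebook = [
    Variable("n_publications", VariableType.DISCRETE),
    Variable("distance_km", VariableType.CONTINUOUS),
    Variable("country", VariableType.NOMINAL, ("BE", "FR", "NL")),
    Variable("agreement", VariableType.ORDINAL, ("disagree", "neutral", "agree")),
    Variable("comments", VariableType.STRING),
]

def validate(variable, value):
    """Return True if value is plausible for the variable's declared type."""
    if variable.vtype is VariableType.DISCRETE:
        return isinstance(value, int) and not isinstance(value, bool)
    if variable.vtype is VariableType.CONTINUOUS:
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if variable.vtype in (VariableType.NOMINAL, VariableType.ORDINAL):
        return value in variable.levels
    return isinstance(value, str)  # free-text variables

print(validate(codebook[0], 3), validate(codebook[2], "DE"))
```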

7. Discipline-specific formats and integrated metadata

Certain research disciplines use specialized file formats that already integrate structured metadata directly within the file. These formats:

Facilitate automated metadata extraction
Enhance interoperability with community-specific tools and platforms
Support standardized documentation, boosting reuse and reproducibility

Common examples by discipline:

Discipline	Format	Metadata Features
Social Sciences	DDI (.xml)	Documents study-level metadata, variable-level details, methodology
Genomics / Bioinformatics	FASTQ, BAM, VCF	Includes sequence reads, base quality scores, alignments, variant calls
Geospatial Sciences	GeoTIFF, NetCDF, Shapefile	Captures geolocation, spatial resolution, time stamps
Digital Humanities	TEI (.xml)	Encodes text structure, annotations, provenance
Engineering / CAD	STEP, IGES, DXF	Stores design metadata, units, geometry standards
Astronomy	FITS	Integrates metadata headers with observational data
Imaging (Medical)	DICOM	Embeds patient, modality, and capture metadata
Environmental Science	HDF5, NetCDF	Handles multidimensional sensor datasets with metadata
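
To illustrate automated metadata extraction from a format with integrated metadata, the sketch below reads the title from the header of a tiny TEI document using only the Python standard library. The sample document is invented and heavily simplified; real TEI files carry far richer headers, and the function name `extract_title` is an assumption.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, minimal TEI document for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Dear reader.</p></body></text>
</TEI>"""

def extract_title(tei_xml):
    """Return the title stored in the TEI header, or None if it is absent."""
    root = ET.fromstring(tei_xml)
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    return title.text if title is not None else None

print(extract_title(sample))
```

The same idea applies to the other formats in the table, although most of them need a dedicated library rather than a generic XML parser.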
