November 13, 2025
A major step in preparing for data collection is identifying the population of interest. This means determining what unit or observation should be studied, as well as specifying the relevant timeframe, geographical location, or study conditions. For instance, the population of interest could be rainbow trout over 20 cm found in the Mackenzie River between 2016 and 2018, or Belgian men who are patients at a specific hospital and take medication for high blood pressure.
In some cases, the entire population can be studied. However, this is often not feasible due to logistical, time, budgetary, or ethical constraints. In such cases, a sampling stage is necessary. Sampling involves selecting a subset of the population that will be used to estimate the characteristics of the whole.
To sample from a population, a sampling frame must be identified. This is a list of all units in the population from which a sample can be drawn. For example, possible sampling frames for the above cases might be all trout in the Mackenzie River that meet the criteria, or a list of patient addresses obtained from the hospital.
An important characteristic of a sample is its representativeness. If the sample is to be used to estimate characteristics of the entire population, it must adequately reflect the population’s traits. This means all observations in the sample must belong to the population of interest and, taken together, capture its diversity.
Ideally, a sample is representative of the population in terms of all relevant variables. Random sampling is one method that helps achieve this. When random sampling is not possible or appropriate, alternative strategies aim to ensure representativeness based on key parameters. For instance, a sample of rainbow trout might be representative in terms of weight, sex, or age. A sample of male patients with high blood pressure might be representative in terms of age, body mass index, and education level.
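As a minimal sketch of random sampling, the example below draws a simple random sample from a sampling frame, so that every unit has the same chance of selection. The function name `simple_random_sample`, the patient identifiers, and the sample size of 50 are hypothetical and chosen only for illustration; they are not part of any prescribed procedure.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability."""
    units = list(frame)
    if n > len(units):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    # A fixed seed makes the draw reproducible, which helps document the method.
    rng = random.Random(seed)
    return rng.sample(units, n)


# Hypothetical sampling frame: 500 anonymized patient identifiers.
frame = [f"patient-{i:04d}" for i in range(1, 501)]
sample = simple_random_sample(frame, n=50, seed=42)
```

Recording the seed alongside the sampling frame allows the same sample to be redrawn later, which supports the reproducibility of the study.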
Sample size directly affects the accuracy and validity of research results. It is important to determine in advance what constitutes a sufficient sample size for your study.
The ideal sample size can be calculated before data collection based on factors such as expected effect size, population variability, desired level of significance, and acceptable margin of error. In general, larger samples yield more precise estimates but also require more resources. The ideal sample size is therefore a trade-off between statistical precision and practical constraints.
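To illustrate one such calculation, the sketch below estimates the sample size needed to estimate a proportion within a given margin of error. It is one common textbook approach among several, and it makes some assumptions: a simple random sample, the normal approximation n0 = z² p (1 − p) / e², and an optional finite population correction. The function name and the example values (5% margin, 95% confidence, a frame of 500 units) are hypothetical. Studies that compare groups or test an effect size typically need a power analysis instead.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    p is the anticipated proportion; 0.5 is the most conservative choice
    because it maximizes the variance p * (1 - p).
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: fewer units are needed when the
        # population itself is small.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


n_large = sample_size_for_proportion(margin=0.05)                  # 385
n_small = sample_size_for_proportion(margin=0.05, population=500)  # 218
```

Halving the margin of error roughly quadruples the required sample size, which makes the trade-off between precision and resources concrete.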
When a sampling frame is available, different sampling methods can be used to select a representative sample. Common methods include:
These probabilistic methods aim to estimate population parameters and can be combined into complex, multi-stage sampling designs for more elaborate studies.
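As a rough illustration of a multi-stage design, the sketch below first selects clusters at random and then draws a simple random sample of units within each selected cluster. The cluster names (river reaches), the unit labels, and the function `two_stage_sample` are hypothetical. The sketch also assumes equal selection probabilities at each stage, which a real design may need to replace with weights.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample units within each."""
    rng = random.Random(seed)
    # Sort the cluster names so the draw is reproducible for a given seed.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        # Take every unit if the cluster is smaller than the requested size.
        k = min(n_per_cluster, len(units))
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical frame: six river reaches, each listing 40 tagged trout.
clusters = {
    f"reach-{r}": [f"reach-{r}/trout-{i}" for i in range(1, 41)]
    for r in "ABCDEF"
}
two_stage = two_stage_sample(clusters, n_clusters=3, n_per_cluster=10, seed=1)
```

Sampling clusters first reduces fieldwork, since only the selected reaches need to be visited, but units within a cluster tend to resemble each other, so the resulting estimates are usually less precise than those from a simple random sample of the same size.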