This chapter outlines the decisions that need to be made when designing a cross-cultural probability survey sample and encourages
cross-cultural survey organizers to allow sample designs to differ among participating countries, but at the same time ensure
standardization on the principles of probability sampling.
Please note that this chapter assumes that the reader has a basic understanding of statistics and terms such as variance and $S^2$. Please refer to Further Reading or an introductory statistics textbook if a statistics refresher is needed.
Additional Coverage Information
- Suggestions for creating area probability sampling frames
- Characteristics of good PSUs:
- Possess clearly identifiable boundaries that are stable over time.
- Cover the target population completely.
- Establish a measure of size for sampling purposes.
- Maintain auxiliary data for stratification (see Guideline 3) purposes.
- Are large in number.
- Establish a uniform definition of what constitutes a housing unit. A commonly used definition in the US is: "A physical structure intended as a dwelling that has its own entrance separate from other units in the structure and an area where meals may be prepared and served" [8].
- Excluding ineligible elements.
- Before the sample selection, these elements can be easily removed from the sampling frame. After sample selection, a measurement question is needed, such as "Is this structure a housing unit?" or "Is this telephone number a household number?" Exclude all ineligible elements found after selection from the survey data set.
- Removing duplicate elements from the sampling frame.
- These elements can be easily removed from the sampling frame prior to sample selection. After selection, a measurement question is needed, such as "Does this housing unit have more than one household?" or "Is there more than one telephone number serving this household?" Weight duplicated elements by the inverse of their chance of selection (see Data Processing and Statistical Adjustment).
- Identifying clustered elements prior to selection.
- It is nearly impossible to identify clustered elements before selection. After sample selection, a measurement question is needed, such as "How many adults live in this household?" Weight all clustered elements by the inverse of their chance of selection (see Data Processing and Statistical Adjustment).
- Ways to reduce undercoverage [8].
- Redefine the target population.
- If all participating countries are having the same difficulty in capturing a subgroup of the target population (e.g., persons living in military bases and prisons), redefine the target population so those groups are excluded across all countries.
- An increasingly common form of housing seen in international studies is workers' quarters. Survey designers may want to explicitly state in the definition of the target population whether workers' quarters should be included or excluded.
- Half-Open Interval (heavily used technique in area probability samples).
- Procedure most often used to help correct for missed housing units in the original listing of an area probability frame.
- Steps:
- Construct the frame in a prespecified order.
- Define a system that links the ordered elements on the frame in either a forward (top-down) or backward (bottom-up) direction. Make the list circular, so that the first and last units on the list are linked.
- Sample from the frame.
- For each sampled element, check for missing elements between the selected element and the next linked element on the sampling frame. For example, when using an area probability frame, the survey organization will instruct interviewers to look closely for any missed housing units between the selected housing unit and the next one on the list, before attempting to interview the selected housing unit.
- If missing elements are found, include these elements as part of the sample selection.
- If a large number of missing elements are found (e.g., a new apartment complex in between a sampled housing unit and the next housing unit on the frame), select a subsample of the missing elements.
- The major weakness of the Half-Open Interval is that it increases the uncertainty of the sample size.
- Using Multiple Sampling Frames.
- Network or Multiplicity Sampling (a method to connect missing elements to elements on the frame, using a network rule)
- This method can help cover hard-to-reach populations such as the homeless, but is rarely used in practice.
- Steps:
- Choose the most efficient sampling frame
- Draw a sample
- Devise a well-defined network, e.g., families defined by blood relations and adoption
- For each sampled element, gather a list of all eligible members of its network and select all or a subsample of the network members.
- There are two methods to deal with clustering:
- Select all eligible sampling units within the cluster.
- Select a subsample of units from within each cluster.
- Example: Surveys often need to randomly select one or multiple people from within a household.
- Within-household respondent selection probability methods.
- Random selection of any household member (no demographic controls).
- Full listing of eligible household members by the interviewer, with one selected at random.
- Kish grid with age and gender controls. Interviewers ask for the ages of all males who live in the household and then for the ages of all females who live in the household. Interviewers consult selection tables developed by Kish [13] that select an adult member of the household depending on the number of adult males and females.
- Stratified probability-based subsampling with unequal probabilities of selection (e.g., white males over the age of 35 might have a probability of selection of 0.17 while African-American females under the age of 17 might have a probability of selection of 0.84). Survey data are weighted to account for unequal probabilities of selection.
- Other quasi-probability or non-probability methods, such as last birthday, youngest male, or convenience selections, should be avoided to maintain the ability to make accurate inferences about the target population.
- Last (or next) birthday method.
- Interviewer asks to speak to the person in the household who had the most recent birthday or who has the next birthday.
- Not necessarily a probability sampling method, because the people who first answer the survey request are more likely to say that they had the most recent birthday or will have the next birthday.
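The simplest probability method listed above (a full listing of eligible household members with one selected at random) can be sketched as follows. This is only an illustration: the function name `select_respondent` and the roster format are assumptions made for the example, and the returned weight is the inverse of the within-household chance of selection described earlier for clustered elements.

```python
import random


def select_respondent(roster, rng=None):
    """Select one eligible household member with equal probability.

    roster: list of eligible members taken from the full household listing
            (hypothetical format: any list of identifiers).
    Returns (selected_member, weight), where weight is the inverse of the
    within-household selection probability, i.e. the number of eligibles.
    """
    if not roster:
        raise ValueError("roster must contain at least one eligible member")
    rng = rng or random.Random()
    selected = rng.choice(roster)   # each member has probability 1 / len(roster)
    weight = float(len(roster))     # inverse of that probability
    return selected, weight


# Hypothetical household with three eligible adults.
member, weight = select_respondent(["adult_1", "adult_2", "adult_3"],
                                   rng=random.Random(2008))
```

Because the interviewer has no discretion over who is chosen, the selection probability is known for every household, which is what the birthday and convenience methods cannot guarantee.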
Additional Information on Different Sampling Techniques
Simple Random Sampling (SRS)
- SRS uses a sampling frame numbered 1 to N (the total number of elements on the frame). Random numbers from 1 to N are selected from a table of random numbers or a random number generator.
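- Formula for estimating the sampling variance of a simple random sample:
$var(\bar{y}) = (1 - f)\,\frac{s^2}{n}$, where $s^2$ is the sample element variance, $n$ is the sample size, and $(1 - f) = (1 - n/N)$ is the finite population correction.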
- The finite population correction indicates that unlike the assumption made in standard statistical theory that population is infinite, the survey population is finite in size and the sample is selected without replacement [11].
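As a rough illustration of the bullets above, the sketch below draws a simple random sample of frame positions and estimates the sampling variance of the mean with the finite population correction. It assumes the usual estimator (1 - n/N) * s^2 / n; the function names `draw_srs` and `srs_variance_of_mean` are invented for this example.

```python
import random
import statistics


def draw_srs(N, n, rng=None):
    """Select n distinct frame positions from 1..N without replacement."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(1, N + 1), n))


def srs_variance_of_mean(values, N):
    """Estimate var(ybar) = (1 - n/N) * s^2 / n for an SRS of size n = len(values)."""
    n = len(values)
    s2 = statistics.variance(values)   # sample element variance (divisor n - 1)
    fpc = 1 - n / N                    # finite population correction
    return fpc * s2 / n


# Hypothetical frame of 10,000 elements, sample of 1,000.
positions = draw_srs(N=10_000, n=1_000, rng=random.Random(2008))
```

When the sampling fraction n/N is small, the correction is close to 1 and has little effect on the estimate.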
Systematic Sampling
- Steps of Systematic Sampling.
- Compute the selection interval (k) as the ratio of the population size, N, to the sample size, n. In a formula, $k = N/n$.
- Choose a random number from 1 to k.
- Select the element of that random number from the frame and every kth element thereafter.
- Example 1.
- Imagine the size of the sampling frame is 10,000 and the sample size is 1,000, making the sampling interval $k = 10{,}000/1{,}000 = 10$. The sampler then selects a random number between 1 and 10, for instance, 6. The sampler will then make selections in this order: 6, 16, 26, 36, ..., 9996.
- Additional steps if the selection interval is a fraction:
- Compute the selection numbers by adding the fractional sampling interval each time.
- Drop the decimal portion of the selection numbers.
- Example 2.
- The size of the sampling frame is 10,400 and the sample size is 1,000, making the sampling interval $k = 10{,}400/1{,}000 = 10.4$. The sampler selects a random number between 1 and 10.4, for instance, 6. The selection numbers would then be: 6, 16.4, 26.8, 37.2, ..., 10395.6. After dropping the decimal portion, the selection numbers become: 6, 16, 26, 37, ..., 10395.
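The steps above, including the fractional-interval case, can be sketched in a few lines. The function name `systematic_sample` is an assumption for illustration; exact fractions are used instead of floating point so that dropping the decimal portion is not disturbed by rounding error.

```python
from fractions import Fraction
import math
import random


def systematic_sample(N, n, start=None, rng=None):
    """Return n selection numbers from a frame of size N.

    The interval k = N / n may be fractional. Selection numbers are formed by
    adding k repeatedly to the random start and dropping the decimal portion.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = Fraction(N, n)
    if start is None:
        rng = rng or random.Random()
        start = rng.uniform(1, float(k))   # random start between 1 and k
    start = Fraction(start)                # exact representation of the start
    return [math.floor(start + i * k) for i in range(n)]


# The two examples from the text, both with a random start of 6.
example_1 = systematic_sample(10_000, 1_000, start=6)
example_2 = systematic_sample(10_400, 1_000, start=6)
```

With a start of 6, the first list begins 6, 16, 26, 36 and the second begins 6, 16, 26, 37, matching the two worked examples.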
Stratified Sampling
- Stratified sampling steps:
- Find information for every element on the frame that can be used to partition the elements into strata. Use information that is correlated to the measure(s) of interest. Each element on the frame can be placed in one and only one group.
- Sort the frame by strata.
- Compute a sample size (see Guideline 4).
- Determine the number of sample selections in each respective stratum (allocation).
- Select the allocated number of elements from within each stratum.
- There are 3 main types of allocation:
- Proportionate allocation.
- Selecting the sample so that elements in each stratum have the same probability of selection. Another way to conceive of a proportionate allocation is that the sampler selects a sample of size $n_h$ from each stratum h such that the proportion of elements in the sample from stratum h, $n_h/n$, is the same as the proportion of elements on the frame from stratum h, $N_h/N$.
- Equal allocation.
- An allocation where the same number of elements are selected from each stratum.
- If one knows that all strata have equal distributions of the statistic of interest on the sampling frame, an equal allocation will create the highest level of precision in the sample estimate.
- Optimal allocation.
- An allocation that produces the highest precision (i.e., narrowest confidence intervals) for the sample mean of any statistic of interest.
- The sampler needs accurate estimates of the distributions of the frame elements for each stratum on the statistic of interest.
- Limitations of optimal allocation.
- Does not produce large gains in precision when the statistic of interest is a proportion.
- Creates gains in precision for only one variable at a time.
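The three allocation types can be compared with a small sketch. The text does not say which formula it has in mind for optimal allocation; the example assumes Neyman allocation (stratum sample sizes proportional to N_h times the stratum standard deviation S_h, with equal interview costs across strata). Function names, stratum labels, and numbers are hypothetical, and the results are left unrounded.

```python
def proportionate_allocation(stratum_sizes, n):
    """n_h = n * N_h / N, so n_h / n equals N_h / N in every stratum."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}


def equal_allocation(stratum_sizes, n):
    """The same number of selections in every stratum."""
    H = len(stratum_sizes)
    return {h: n / H for h in stratum_sizes}


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Assumed form of optimal allocation: n_h proportional to N_h * S_h."""
    products = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(products.values())
    return {h: n * p / total for h, p in products.items()}


# Hypothetical frame: two strata, with more variable responses in the urban stratum.
sizes = {"urban": 6000, "rural": 4000}
sds = {"urban": 2.0, "rural": 1.0}
prop = proportionate_allocation(sizes, 500)
equal = equal_allocation(sizes, 500)
neyman = neyman_allocation(sizes, sds, 500)
```

The Neyman form needs a standard deviation per stratum for one chosen variable, which is why the gains apply to only one variable at a time.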
Cluster Sampling
- Within-cluster homogeneity.
- When selecting humans, it is important to consider that humans within a cluster tend to be more similar than humans across clusters because of:
- Environment.
- Self-selection.
- Interaction with one another.
- Since elements within a cluster tend to be alike, we receive less new information about the population when we select another element
from that cluster rather than from another cluster. This lack of new information makes a cluster sample less precise than a stratified or
even simple random sample. The rate of homogeneity (roh) is a way to measure this clustering effect.
Design Effect
- For a cluster sample, the design effect is the effect of having chosen sampled clusters instead of elements. Due to within-cluster homogeneity,
a clustered sample cannot assure representation of specified population subgroups as well as SRS, and will have a higher design effect.
- In general, clustering increases the design effect, while stratification decreases it.
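- Formulas:
- $deff = \frac{var(\bar{y})_{complex}}{var(\bar{y})_{SRS}}$, the ratio of the sampling variance under the complex design to the sampling variance under a simple random sample of the same size.
- $deff = 1 + (b - 1)\,roh$, where b is the number of selections within each cluster and roh is the rate of homogeneity.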
- In order to estimate the design effect for a new study, the roh is calculated from an earlier survey on a similar topic within a similar
target population.
- Subsampling within selected clusters (multi-stage sampling).
- n = a*b, where n is the sample size, a is the number of clusters selected and b is the number of selections
within each cluster.
- Pros: reduces the design effect and makes estimates more precise.
- Cons: increases total costs because interviewers must be sent to more areas.
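The trade-off in the subsampling bullets above can be illustrated numerically. The sketch assumes the common approximation deff = 1 + (b - 1) * roh for clusters of b selections each; the roh value and the candidate designs are hypothetical.

```python
def design_effect(b, roh):
    """Approximate design effect for b selections per cluster: 1 + (b - 1) * roh."""
    return 1 + (b - 1) * roh


def compare_designs(n, roh, cluster_counts):
    """For a fixed sample size n = a * b, return (a, b, deff) for each candidate a."""
    rows = []
    for a in cluster_counts:
        b = n / a                      # selections per cluster implied by n = a * b
        rows.append((a, b, design_effect(b, roh)))
    return rows


# Hypothetical: 1,000 interviews, roh = 0.05 taken from an earlier similar survey.
rows = compare_designs(n=1000, roh=0.05, cluster_counts=[50, 100, 200])
```

Spreading the same 1,000 interviews over more clusters lowers b and therefore the design effect, at the price of sending interviewers to more areas.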
- Situations where clusters are all of equal size rarely occur. Probability proportional to size (PPS) sampling can control the sample size while ensuring that each element on the sampling frame has an equal chance of selection.
- Probabilities at either the first or second stage can be changed to ensure equal probabilities of selection for all elements.
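- Imagine a two-stage cluster design where the clusters were blocks and the elements were housing units. The PPS formula would be: $f = f_{block} \times f_{hu} = \frac{a \cdot B_\alpha}{\sum_\alpha B_\alpha} \times \frac{b}{B_\alpha} = \frac{a \cdot b}{\sum_\alpha B_\alpha}$, where $B_\alpha$ is the number of housing units in block $\alpha$, a is the number of blocks selected, and b is the number of housing units selected within each selected block.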
Example
| Block # | Housing Units in Block | Cumulative Housing Units |
|---|---|---|
| 1 | 25 | 25 |
| 2 | 30 | 55 |
| 3 | 35 | 90 |
| 4 | 40 | 130 |
| 5 | 20 | 150 |
- The sampler has the above list of blocks and wants to select three blocks (a), keep the sample size constant at 15 housing units, and ensure that each housing unit has the same probability of selection of one in ten (f = 15/150). Using cumulative totals, numbers can be assigned to each block: Block 1 is assigned numbers 1-25, Block 2 26-55, Block 3 56-90, Block 4 91-130, and Block 5 131-150. From here, systematic sampling can be used to obtain a simple, without-replacement sample of blocks based on the housing units within each block. Based on the frame size of 150 and the number of selections being three, the selection interval is 50. Suppose the sampler chooses a random start of 29. In this case, the selection numbers would be 29, 79, and 129, corresponding to selections of Block 2, Block 3, and Block 4. To determine the selection probability of the housing units within Block 2, use the formula:
$f = f_{block} \times f_{hu}$, where $f_{block}$ is the probability of selecting the block and $f_{hu}$ is the probability of selecting a housing unit within the selected block. For Block 2, $\frac{1}{10} = \frac{3 \times 30}{150} \times f_{hu}$, so $f_{hu} = \frac{1}{10} \times \frac{150}{90} = \frac{1}{6}$.
Since the selection probability of housing units within Block 2 is 1/6, the number of housing units selected within Block 2 (b) will be 30*1/6 or 5. Going through the same calculations for Blocks 3 and 4 will show that each block will have five selections.
- Potential problems and solutions with PPS sampling.
- Problem: The same cluster may be chosen more than once.
Solution: Use systematic selection with PPS [14].
- Problem: Some of the clusters may not be large enough to produce subsamples of the required size.
Solution: Link clusters to create new clusters that are all of sufficient size.
- Problem: Some of the clusters are too large and the probability of selecting the cluster is greater than one.
Solution: Remove the cluster from the list and choose elements from it directly.
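The block example in this section can be reproduced with a short sketch of systematic PPS selection on cumulative totals. The function names are invented for illustration, and the code does not handle the problem cases listed above: a cluster larger than the selection interval could be hit more than once and would need to be treated separately.

```python
from fractions import Fraction


def pps_systematic(sizes, a, start):
    """Select a clusters by systematic selection on cumulative measures of size.

    sizes: dict mapping cluster id to its measure of size, in frame order.
    start: random start between 1 and the selection interval (total / a).
    """
    total = sum(sizes.values())
    interval = Fraction(total, a)
    # Upper bound of the number range assigned to each cluster (25, 55, 90, ...).
    bounds, cumulative = [], 0
    for cluster, size in sizes.items():
        cumulative += size
        bounds.append((cluster, cumulative))
    selected = []
    for i in range(a):
        number = start + i * interval
        for cluster, upper in bounds:
            if number <= upper:        # first range containing the selection number
                selected.append(cluster)
                break
    return selected


def within_cluster_rate(sizes, cluster, a, f):
    """Second-stage rate f_hu such that f_block * f_hu equals the overall rate f."""
    total = sum(sizes.values())
    f_block = Fraction(a * sizes[cluster], total)
    return Fraction(f) / f_block


# Blocks and housing-unit counts from the table in the text.
blocks = {1: 25, 2: 30, 3: 35, 4: 40, 5: 20}
chosen = pps_systematic(blocks, a=3, start=29)
rates = {c: within_cluster_rate(blocks, c, 3, Fraction(15, 150)) for c in chosen}
takes = {c: blocks[c] * rates[c] for c in chosen}
```

With a random start of 29 this selects Blocks 2, 3, and 4, and the second-stage rates of 1/6, 1/7, and 1/8 each yield five housing units per block.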
Two-Phase Sampling
- Suggested steps [8]:
- Phase 1 — Conduct a survey on a probability sample, using a relatively cheap data collection method subject to higher nonresponse rates than more expensive methods (see Data Collection).
- Once the survey is completed, select a probability subsample of the nonrespondents to the Phase 1 survey.
- Phase 2 — Use a more expensive method that generally produces lower nonresponse on the subsample.
- Combine the results of the two surveys, with appropriate selection weights to account for unequal probabilities of selection between the selected respondents.
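The selection weights in the last step can be sketched under simplifying assumptions: every Phase 1 case shares one base weight, the Phase 2 cases are an equal-probability subsample of the Phase 1 nonrespondents, and no further nonresponse adjustment is made. The function name and the numbers are hypothetical.

```python
def two_phase_weights(phase1_weight, n_nonrespondents, n_subsampled):
    """Return (weight for Phase 1 respondents, weight for Phase 2 cases).

    Phase 2 cases were selected twice: once into the Phase 1 sample and once
    into the nonrespondent subsample, so their weight is the Phase 1 weight
    divided by the subsampling rate.
    """
    if not 0 < n_subsampled <= n_nonrespondents:
        raise ValueError("need 0 < n_subsampled <= n_nonrespondents")
    subsampling_rate = n_subsampled / n_nonrespondents
    return phase1_weight, phase1_weight / subsampling_rate


# Hypothetical: 1-in-100 Phase 1 sample, 400 nonrespondents, 100 followed up.
w_phase1, w_phase2 = two_phase_weights(100.0, 400, 100)
```

In this illustration each Phase 2 case stands in for four Phase 1 nonrespondents, so its weight is four times the Phase 1 weight.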
Panel Designs
- Three concerns about panel designs:
- The effort and costs of tracking down respondents who move over the duration of the panel survey.
- The change in the elements on the sampling frame over time. For example, in a cross-cultural panel survey of persons age 65 and older, some members of the original sampling frame will die, while other people will become eligible for selection.
- The repeated questioning of the same subjects over time may change how the subjects act and answer the questions (i.e., panel conditioning effect).
Sample Size Determination
- Recommended steps.
- Have the survey sponsor specify the desired level of precision.
- Convert the desired 95% confidence interval into a sampling variance of the mean, $var(\bar{y})$.
- Example: The survey sponsor indicates they would like a 95% confidence interval of .08 around the statistic of interest. Since the half width of a 95% confidence interval (CI) is $\frac{1}{2}(95\%\ CI) = 1.96 \times se(\bar{y})$, this formula can be rearranged with basic algebra to calculate the precision (sampling variance of the mean) from this confidence interval: $var(\bar{y}) = \left(\frac{.5\,(95\%\ CI)}{1.96}\right)^2 = \left(\frac{.04}{1.96}\right)^2 \approx .0004165$.
- Obtain an estimate of $S^2$ (population element variance).
- If the statistic of interest is not a proportion, find an estimate of $S^2$ from a previous survey on the same target population or a small pilot test.
- If the statistic of interest is a proportion, the sampler can use the expected value of the proportion (p), even if it is a guess, to estimate $S^2$ by using the formula $s^2 = p(1-p)$.
- Estimate the needed number of completed interviews for a simple random sample (SRS) by dividing the estimate of $S^2$ by the sampling variance of the mean.
- Example: the obtained estimate of $S^2$ is .6246. Therefore the needed number of completed interviews for an SRS ($n_{srs}$) is: $n_{srs} = \frac{S^2}{var(\bar{y})} = \frac{.6246}{(.04/1.96)^2} \approx 1{,}500$.
- Multiply the number of completed interviews by the design effect to account for a non-SRS design.
- Example: the design effect of a stratified clustered sample is 1.25. Taking into account the design effect, the number of completed interviews for this complex (i.e., stratified clustered) sample is: $n_{complex} = n_{srs} \times deff = 1{,}500 \times 1.25 = 1{,}875$.
- Calculate the necessary sample size by dividing the number of completed interviews by the expected response rate, eligibility rate, and coverage rate.
- The sample size must account for three additional factors:
- Not all sampled elements will want to participate in the survey.
- Not all sampled elements, given the target population, will be eligible to participate.
- The frame will likely fail to cover all elements in the survey population.
- The sampler can estimate these three rates by looking at the rates obtained in previous surveys with the same survey population and survey design.
- Example: The expected response rate is 75%, the expected eligibility rate is 90%, and the expected coverage rate is 95%. Therefore, the necessary sample size is: $\frac{1{,}875}{.75 \times .90 \times .95} \approx 2{,}924$.
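The recommended steps can be chained into one calculation. The sketch below assumes that the sponsor's ".08" is the full width of the 95% confidence interval (so the half width is .04) and that 1.96 is the appropriate multiplier; the function name `required_sample_size` and the dictionary keys are invented for this example.

```python
import math


def required_sample_size(ci_width, s2, deff, response_rate,
                         eligibility_rate, coverage_rate, z=1.96):
    """Work from a desired 95% CI width to the number of elements to sample."""
    half_width = ci_width / 2
    var_mean = (half_width / z) ** 2          # sampling variance of the mean
    n_srs = s2 / var_mean                     # completed interviews under SRS
    n_complex = n_srs * deff                  # completed interviews, complex design
    n_sample = n_complex / (response_rate * eligibility_rate * coverage_rate)
    return {
        "var_mean": var_mean,
        "n_srs": n_srs,
        "n_complex": n_complex,
        "n_sample": math.ceil(n_sample),      # round up to a whole element
    }


# Inputs taken from the examples in this section.
result = required_sample_size(ci_width=0.08, s2=0.6246, deff=1.25,
                              response_rate=0.75, eligibility_rate=0.90,
                              coverage_rate=0.95)
```

The function rounds only at the final step, whereas the worked examples round after each step; with these inputs both approaches arrive at 2,924 sampled elements.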
Footnotes
1 Not all survey methodologists agree with the opinions expressed by these authors regarding enumeration in rural, poor areas. Those who disagree argue that the poor enumerations are mainly due to low expectations and insufficient training and supervision.
Glossary
- Convenience sample
- A sample of elements that are selected because it is convenient to use them, not because they are representative of the target population.
- Coverage
- The proportion of the target population that is accounted for on the sampling frame.
- Coverage rate
- The number of elements on the sampling frame divided by the estimated number of elements in the target population.
- Design effect
- The impact of the complex survey design on sampling variance measured as the ratio of the sampling variance under the complex design to the sampling variance computed as a simple random sample.
- Element
- A single unit of the sampling frame.
- Eligibility Rate
- The number of eligible sample elements divided by the total number of elements on the sampling frame.
- Fixed panel design
- A longitudinal study which attempts to collect survey data on the same sample elements at intervals over a period of time. After the initial sample selection, no additions to the sample are made.
- Fixed panel plus births design
- A longitudinal study in which a panel of individuals is interviewed at intervals over a period of time and additional elements are added to the sample.
- Interviewer Variance
- That component of overall variability in survey statistics that can be accounted for by the interviewers.
- Majority Country
- A country with low per capita income (the majority of countries).
- Nonresponse
- A failure to elicit responses from sample persons due to lack of contact or cooperation.
- Panel survey
- A survey in which data are obtained from the same respondents over time.
- Poststratification
- A statistical adjustment that assures that sample estimates of totals or percentages (e.g., the estimate of the percentage of men living in Mexico based on the sample) equal population totals or percentages (e.g., the estimate of the percentage of men living in Mexico based on Census data). The adjustment cells for poststratification are formed in a similar way as strata in sample selection, but variables can be used that were not on the original sampling frame at the time of selection.
- Primary Sampling Unit (PSU)
- A unit sampled at the first stage of selection.
- Probability proportional to size (PPS)
- A sampling method in which each sampling unit's probability of selection is proportional to its measure of size (e.g., the number of housing units in a block).
- Probability sampling
- A sampling method where each element on the sampling frame has a known, non-zero chance of selection.
- Quota Sampling
- A non-probability sampling method that sets specific sample size quotas or target sample sizes for subclasses of the target population. The sample quotas are generally based on simple demographic characteristics (e.g., quotas for gender, age groups, and geographic region subclasses).
- Random-digit-dialing (RDD)
- A method of selecting telephone numbers in which the target population consists of all possible telephone numbers, and all telephone numbers have an equal probability of selection.
- Repeated panel design
- A series of fixed panel surveys that may or may not overlap in time. Generally, each panel is designed to represent the same target population definition applied at a different point in time.
- Replicates
- Probability subsamples of the full sample design.
- Residency rule
- A rule to help interviewers determine which persons to include in the household listing, based on what the informant reports.
- Response rate
- The number of completed interviews divided by the total estimated number of eligible sample persons.
- Rotating panel design
- A study where elements are repeatedly measured a set number of times, then replaced by new randomly chosen elements. Typically, the newly-chosen elements are also measured repeatedly for the appropriate number of times.
- Sampling frames
- Lists or materials used to identify all elements (e.g., persons, households, establishments) of a survey population from which the sample will be selected. These lists or materials can include maps of areas in which the elements can be found, lists of members of a professional association, and registries of addresses or persons.
- Sampling units
- Elements or clusters of elements considered for selection in some stage of sampling. For a sample with only one stage of selection, the sampling units are the same as the elements. In multi-stage samples (e.g., enumeration areas, then households within selected enumeration areas, and finally adults within selected households), different sampling units exist, while only the last is an element. The term primary sampling units (PSUs) refers to the sampling units chosen in the first stage of selection. The term secondary sampling units (SSUs) refers to sampling units within the PSUs that are chosen in the second stage of selection.
- Sampling variance
- A measure of the variability of the sample estimates of a population parameter, if all possible samples of the same size were selected from the sampling frame.
- Secondary Sampling Unit (SSU)
- A unit sampled at the second stage of selection.
- Split panel design
- A design that contains a blend of cross-sectional and panel samples at each new wave of data collection.
- Strata
- Non-overlapping groups that comprise all of the elements on the sampling frame.
- Substitution
- A technique where each nonresponding sample element from the initial sample is replaced by another element of the target population, typically not an element selected in the initial sample.
- Survey population
- The actual population from which the survey data are collected, given the restrictions from data collection operations.
- Target population
- The finite population for which the survey sponsor wants to make inferences using the sample statistics.
References
[1] Bergsten, J. W. (1980). Some sample survey designs in Syria, Nepal and Somalia. Paper presented at the Proceedings of the Survey Research Methods Section, American Statistical Association.
[2] Binder, D. (1998). Longitudinal surveys: Why are these surveys different from all other surveys? Survey Methodology, 24(2), 101-108.
[3] Chikwanha, A. B. (2005). Conducting surveys and quality control in Africa—Insights from the Afrobarometer. Paper presented at the WAPOR/ISSC Conference.
[4] Cochran, W. G. (1977). Sampling Techniques. New York: Wiley & Sons.
[5] Dunzhu, S., Wang, F. S., Courtright, P., Liu, L., Tenzing, C., Noertjojo, K., et al. (2003). Blindness and eye diseases in Tibet: Findings from a randomised, population based survey. Br J Ophthalmol 87(12), 1443-1448.
[6] Häder, S., & Gabler, S. (2003). Sampling and estimation. In J. Harkness et al. (Eds.), Cross-Cultural Survey Methods. New York: Wiley.
[7] Groves, R. M. (1989). Survey Errors and Survey Costs. Hoboken, NJ: Wiley & Sons.
[8] Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey Methodology. Hoboken, NJ: Wiley & Sons.
[9] Heeringa, S. G., & O'Muircheartaigh, C. (2008, June). Sampling design for cross-cultural and cross-national studies. Paper to be presented at the 3MC Conference, Berlin, Germany.
[10] Inglehart, R. (1997). Modernization and postmodernization: Cultural, economic and political change in 43 societies. Princeton, NJ: Princeton Univ. Press.
[11] Kalton, G. (1983). Introduction to Survey Sampling. Newbury Park, CA: Sage Publications.
[12] Kasprzyk, D. (1988). The Survey of Income and Program Participation: An overview and discussion of research issues. Washington, DC: U.S. Bureau of the Census.
[13] Kish, L. (1949). A procedure for objective respondent selection within the household. Journal of the American Statistical Association, 44, 380-387.
[14] Kish, L. (1965). Survey Sampling. New York: Wiley & Sons.
[15] Kish, L. (1987). Statistical Design for Research. New York: Wiley & Sons.
[16] Kish, L. (1994). Multipopulation survey designs: Five types with seven shared aspects. International Statistical Review, 62, 167-186.
[17] Lavallée, P., Michaud, S., & Webber, M. (1993). The Survey of Labour and Income Dynamics, Design issues for a new longitudinal survey in Canada. Bulletin of the International Statistical Institute, 49th Session, Contributed Papers, Book 2, 99-100.
[18] Lynn, P. (2005). Longitudinal surveys methodology. Retrieved May 23, 2008, from http://www.eustat.es/prodserv/datos/Sem45_i.pdf.
[19] Lynn, P., Häder, S., Gabler, S., & Laaksonen, S. (2007). Methods for achieving equivalence of samples in cross-national surveys: The European Social Survey Experience. Journal of Official Statistics, 23(1), 107-124.
[20] Okafor, R., Adeleke, I., & Oparac, A. (2007). An appraisal of the conduct and provisional results of the Nigerian Population and Housing Census of 2006. Paper presented at the Proceedings of the Survey Research Methods Section, American Statistical Association.
[21] Peracchi, F. (2002). The European Community Household Panel: A review. Empirical Economics, 27, 63-90.
[22] Principles and recommendations for population and housing censuses, revision 1 (1998), para. 2.330. New York: United Nations.
[23] Tucker, C., Lepkowski, J. M., & Piekarski, L. (2002). The current efficiency of list assisted telephone sampling designs. Public Opinion Quarterly, 66, 321-38.
[24] Üstun, T. B., Chatterji, S., Mechbal, A., & Murray, C. J. L. (2005). Chapter X: Quality assurance in surveys: Standards, guidelines, and procedures. In United Nations Statistical Division, United Nations Department of Economic and Social Affairs (Eds.), Household Surveys in Developing and Transition Countries. New York: United Nations.
[25] Yansaneh, I. (2005). Chapter 2: Overview of sample design issues for household surveys in developing and transition countries. In United Nations Statistical Division, United Nations Department of Economic and Social Affairs (Eds.), Household Surveys in Developing and Transition Countries. New York: United Nations.
Further Reading
Sampling
Cochran, W. G. (1977). Sampling Techniques. New York: Wiley & Sons.
Kalton, G. (1983). Introduction to Survey Sampling. Newbury Park, CA: Sage Publications.
Kish, L. (1965). Survey Sampling. New York: Wiley & Sons.
Statistics
Snedecor, G. W., & Cochran, W. G. (1989). Statistical Methods, Eighth Edition, Iowa State University Press.