XI. Data Collection

Webpage last modified: 2008-Sep-10

Introduction

Collecting comparable data in multiple nations and cultures is a highly complex task, in which one can expect to encounter a variety of languages and cultural contexts. Even in a single locale, the target population may not be one homogeneous population but a collection of language and cultural groups. Some of the languages involved may not even have a standard written form. The study may need to take wide variations in respondent literacy into account. The terrain may be difficult (e.g., remote islands or mountainous regions). Weather and seasonal impediments (e.g., monsoons) may make the harmonization of fielding times across different locales impractical. Some populations may be inaccessible because of migration patterns or only accessible under special circumstances (e.g., miners in camps, or populations in which the men go on long hunting or fishing trips). Other individuals may have refugee or undocumented status. People living in shanty-type housing may not be included on a given sample frame. While homeless populations are often not included by definition, the number and definition of the "homeless" may differ considerably from location to location. Outside events such as natural disasters or political upheavals may also pose major challenges for data collection.

Countries also vary widely in both their survey research infrastructures and in their laws and unwritten rules and customs pertaining to data collection and data access. Certain modes of administration may be inappropriate or not feasible in some situations. In addition, the size and composition of nonresponse will likely vary due to differences in contactability and cooperation. Some countries officially prohibit survey research (e.g., North Korea and Burma) or, to date, severely restrict data collection on some topics, or restrict publication of results (e.g., China and Iran) [19].

While a survey conducted in a single country might face one or more of the challenges mentioned above, the probability of encountering multiple hurdles is much higher in a large-scale, cross-national study. What is atypical in the one-country context often becomes the norm in cross-national contexts. Moreover, the assumed homogeneity and common ground that may, broadly speaking, hold for a one-country study contrasts with the obvious heterogeneity of populations, languages, and contexts encountered in multinational studies. Because of the heterogeneity of target populations in cross-cultural surveys, allowing some flexibility in data collection protocols can reduce costs and error.

These guidelines are intended to advise data collection decision-makers within each participating country. However, it should be noted that, in some cases, a coordinating center dictates data collection decisions across all countries involved. The European Social Survey, for example, mandates the mode in each country, while the ISSP allows a certain amount of flexibility. See Study, Organizational, and Operational Structure for more details.

Because difficulties in data collection can be extreme in majority countries, these guidelines heavily emphasize the challenges of data collection in such contexts.

Guidelines

Goal: To collect data that are comparable across survey locations while minimizing total survey error and survey costs.

  1. Assess the feasibility of conducting the research in each target country and culture.
    Rationale
    Procedural steps
    Lessons learned
  2. Select a mode of administration that is appropriate for the survey topic and feasible for the country or culture.
    Rationale
    Procedural steps
    Lessons learned
  3. If face-to-face interviewing is selected, establish procedures for dealing with issues specific to this mode.
    Rationale
    Procedural steps
    Lessons learned
  4. Establish a clear protocol for managing the survey sample.
    Rationale
    Procedural steps
    Lessons learned
  5. Reduce nonresponse as much as possible.
    Rationale
    Procedural steps
    Lessons learned
  6. Time data collection activities appropriately.
    Rationale
    Procedural steps
    Lessons learned
  7. Institute and follow appropriate quality control measures.
    Rationale
    Procedural steps
    Lessons learned
  8. Document data collection activities.
    Rationale
    Procedural steps
  9. When possible, conduct validation studies to estimate bias.
    Rationale
    Procedural steps
    Lessons learned

Appendix A

Appendix B

Appendix D

Glossary

Behavior coding
Systematic coding of the interviewer-respondent interaction in order to identify problems that arise during the question-answer process.
Bias
A systematic difference between the survey estimate of the population parameter and the true value in the population.
Call record
A written record of the time and outcome of each call attempt to a sample case.
Cluster sample (clustering)
A sample design in which a group of population elements in geographically proximal locations are selected as a whole.
Coverage
The proportion of the target population that is accounted for on the sampling frame.
Coverage bias
Bias due to a mismatch between the target population and the sampling frame.
Coversheet
Electronic or printed materials associated with each case that identify information about the case, e.g., the sample address, the unique identification number associated with a case, and the interviewer to whom a case is assigned. The coversheet often also contains an introduction to the study, instructions on how to screen sample members and randomly select the respondent, and space to record the date, time, outcome, and notes for every attempt.
Disposition code
A code that indicates the result of a specific call attempt or the outcome assigned to a sample element at the end of data collection (e.g., noncontact, refusal, ineligible, complete interview).
Focus group
Small group discussions under the guidance of a moderator, often used in qualitative research, that can also be used to test survey questionnaires and survey protocols.
Gross sample
All eligible and ineligible elements of a sample.
Half open interval
A method of updating lists of addresses by adding previously omitted units to the sample when the units are identified geographically next to a selected unit.
Hours Per Interview (HPI)
A measure of study efficiency, calculated as the total number of interviewer hours spent during production (including travel, reluctance handling, listing, completing an interview, and other administrative tasks) divided by the total number of interviews.
Imputation
Computational methods that assign one or more estimated answers for each item that previously had missing, incomplete or implausible data.
Item nonresponse
The lack of information on individual data items for a sample element where other data items were successfully obtained.
Majority country
A country with low per capita income (the majority of countries).
Measurement error
Survey error (variance or bias) due to the measurement process; that is, error introduced by the survey instrument, the interviewer, or the respondent.
Minority country
A country with high per capita income (the minority of countries).
Mode
Method of data collection.
Noncontact rate
The proportion of cases selected in a sample that could not be reached.
Non-interview
A sample element is selected, but an interview does not take place (for example, due to noncontact, refusal, or ineligibility).
Nonresponse
A failure to elicit responses from sample persons due to lack of contact or cooperation.
Nonresponse bias
Bias that is introduced when not all sample members participate in the survey and those that do not (the nonrespondents) differ from the respondents on the measure of interest.
Outcome rate
Response rate, refusal rate, or noncontact rate.
Outlier
An atypical observation which does not appear to follow the distribution of the rest of a dataset.
Paradata
Process data collected during data collection, such as timestamps, keystrokes, interviewer observations, etc.
Post-survey adjustments
Adjustments to reduce the impact of error on estimates.
Probability sample
A sample in which every element of the target population has a known, non-zero probability of being selected.
Process indicator
An indicator that refers to aspects of data collection (e.g., HPI, refusal rates, etc.).
Progress indicator
An indicator that refers to aspects of reaching the goal (e.g., number of complete interviews).
Randomized response technique (RRT)
A technique to reduce social desirability bias and item nonresponse due to sensitive questions. In this technique, the interviewer asks the respondent two questions—a sensitive question and a question believed to be not sensitive; both questions contain the same response options. One of these questions is randomly selected, but the interviewer is not aware of the outcome of the selection; thus, the impact of the interviewer on the response to the sensitive question is minimized.
Recontact
Having another staff member (often a supervisor) attempt to speak with the respondent after the interview is reported, in order to verify that the interview was completed according to the specified protocol.
Refusal rate
The proportion of all sample elements in which a housing unit or potential respondent refuses to take part in the study.
Reinterview
The process or action of interviewing the same respondent twice to assess reliability (simple response variance).
Reluctance aversion (techniques)
Techniques that can reduce potential respondents' reluctance to participate, thereby increasing the overall response rate.
Response latency
A method of examining potential problems in responding to particular items, measured by the time between the interviewer asking a question and the response.
Response rate
The number of completed interviews divided by the total estimated number of eligible sample persons.
Sample element
A selected unit of the target population that may be eligible or ineligible.
Sample management system
A computerized and/or paper-based system used to assign and monitor sample cases and record documentation for sample records (e.g., time and outcome of each contact attempt).
Sample persons
Persons selected from a sampling frame to participate in a particular survey.
Sampling frame
Lists or materials used to identify all sample elements (e.g., persons, households, establishments) of a survey population from which the sample will be selected. These lists or materials can include maps of areas in which the elements can be found, lists of members of a professional association, and registries of addresses or persons.
Silent monitoring
Monitoring without the awareness of the interviewer.
Social desirability bias
A tendency for respondents to overreport desirable attributes or attitudes and underreport undesirable attributes or attitudes.
Standardized interviewing technique
An interviewing technique in which interviewers read every question exactly as worded, cannot interpret questions or responses, and cannot offer much clarification.
Statistical process control charts
Charts that use statistical techniques to identify problems in processes and opportunities for improvement of processes.
Survey error
The total error of a survey statistic; specifically, the sum of the variance and the bias squared.
Survey estimate
The value yielded by a survey.
Target population
The finite population for which the survey sponsor wants to make inferences using the sample statistics.
Vignettes
Brief stories/scenarios describing hypothetical situations or persons and their behaviors to which respondents are asked to react in order to allow the researcher to explore contextual influences on respondents' response formation processes.
Weighting
A post-survey adjustment that may account for differential coverage, sampling, and/or nonresponse processes.
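
As a rough illustration of how several of the quantities defined above fit together, the following Python sketch computes the response rate, refusal rate, noncontact rate, Hours Per Interview (HPI), and survey error from a handful of totals. The FieldTotals structure, the function names, and the example figures are hypothetical and are not part of the Guidelines. The sketch also assumes that every sample element ends in exactly one of four final dispositions and that eligibility is known for every case; studies with cases of unknown eligibility would need to estimate the eligible count instead.

```python
from dataclasses import dataclass


@dataclass
class FieldTotals:
    """Final case counts and interviewer effort for one location (hypothetical structure)."""
    completes: int            # complete interviews
    refusals: int             # housing unit or potential respondent refused
    noncontacts: int          # cases that could not be reached
    ineligibles: int          # sample elements found to be ineligible
    interviewer_hours: float  # all production hours, including travel and admin tasks

    @property
    def sample_size(self) -> int:
        # Assumption: these four dispositions cover every selected sample element.
        return self.completes + self.refusals + self.noncontacts + self.ineligibles

    @property
    def eligible(self) -> int:
        # Assumption: eligibility is known for every case, so no estimation is needed.
        return self.sample_size - self.ineligibles


def response_rate(t: FieldTotals) -> float:
    """Completed interviews divided by the number of eligible sample persons."""
    return t.completes / t.eligible


def refusal_rate(t: FieldTotals) -> float:
    """Refusals as a proportion of all sample elements."""
    return t.refusals / t.sample_size


def noncontact_rate(t: FieldTotals) -> float:
    """Cases that could not be reached as a proportion of all selected cases."""
    return t.noncontacts / t.sample_size


def hours_per_interview(t: FieldTotals) -> float:
    """Total interviewer hours divided by the total number of interviews."""
    return t.interviewer_hours / t.completes


def survey_error(variance: float, bias: float) -> float:
    """Total error of a survey statistic: variance plus squared bias."""
    return variance + bias ** 2


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    totals = FieldTotals(completes=600, refusals=200, noncontacts=150,
                         ineligibles=50, interviewer_hours=1800.0)
    print(f"Response rate:   {response_rate(totals):.3f}")
    print(f"Refusal rate:    {refusal_rate(totals):.3f}")
    print(f"Noncontact rate: {noncontact_rate(totals):.3f}")
    print(f"HPI:             {hours_per_interview(totals):.2f}")
    print(f"Survey error:    {survey_error(variance=4.0, bias=1.5):.2f}")
```

Note that the denominators differ: following the definitions above, the response rate is taken over eligible sample persons only, while the refusal and noncontact rates are taken over all sample elements. Other conventions exist, so any cross-national comparison should confirm that each location computes its outcome rates the same way.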

References

[1] American Association for Public Opinion Research. (2003). Interviewer falsification in survey research: Current best methods for prevention, detection and repair of its effects. Lenexa, KS: AAPOR.

[2] Biemer, P., & Lyberg, L. (2003). Introduction to survey quality. Hoboken, NJ: Wiley.

[3] Carlson, R. O. (1958). To talk with kings. Public Opinion Quarterly, 22(3), 224.

[4] Chikwanha, A. B. (2005). Conducting surveys and quality control in Africa: Insights from the Afrobarometer. Ljubljana, Slovenia: WAPOR/ISSC Conference.

[5] Choldin, H. M., Kahn, A. M., & Ara, B. H. (1983). Cultural complications in fertility interviewing. In M. Bulmer & D. P. Warwick (Eds.), Social research in developing countries. New York: Wiley.

[6] Cohen, G., & Duffy, J. C. (2002). Are nonrespondents to health surveys less healthy than respondents? Journal of Official Statistics, 18(1), 13-23.

[7] Couper, M. P., Holland, L., & Groves, R. M. (1992). Developing systematic procedures for monitoring in a centralized telephone facility. Journal of Official Statistics, 8(1), 63-76.

[8] de Leeuw, E. D. (2005). To mix or not to mix data collection modes in surveys. Journal of Official Statistics, 21(2), 233-255.

[9] de Leeuw, E. D., & de Heer, W. (2002). Trends in household survey nonresponse: A longitudinal and international comparison. In R. M. Groves, D. Dillman, J. Eltinge, & R. Little (Eds.), Survey nonresponse (pp. 41-54). New York: Wiley.

[10] Forsman, G., & Schreiner, I. (1991). The design and analysis of reinterview: An overview. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement errors in surveys (pp. 279-301). New York: Wiley.

[11] Gallup, Inc. (2007). Gallup World Poll research design. Retrieved April 27, 2007, from http://media.gallup.com/WorldPoll/PDF/WPResearchDesign091007bleeds.pdf

[12] Gasquet, I., Falissard, B., & Ravaud, P. (2001). Impact of reminders and method of questionnaire distribution on patient response to mail-back satisfaction survey. Journal of Clinical Epidemiology, 54, 1174-1180.

[13] Groves, R. M. (2006). Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), 646-675.

[14] Groves, R. M., Cialdini, R. B., & Couper, M. P. (1992). Understanding the decision to participate in a survey. Public Opinion Quarterly, 56(4), 475-495.

[15] Groves, R. M., & Couper, M. P. (1998). Nonresponse in household interview surveys. New York: Wiley.

[16] Groves, R. M., Dillman, D., Eltinge, J. L., & Little, R. J. A. (2002). Survey nonresponse. New York: Wiley.

[17] Groves, R. M., & Heeringa, S. G. (2006). Responsive design for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3), 439-457.

[18] Harkness, J. A. (1999). In pursuit of quality: Issues for cross-national survey research. International Journal of Social Research Methodology, 2(2), 125-140.

[19] Heath, A., Fisher, S., & Smith, S. (2005). The globalization of public opinion research. Annual Review of Political Science, 8, 297-333.

[20] Howell, D. (forthcoming). Enhancing quality and comparability in the comparative study of electoral systems. Unpublished manuscript.

[21] Institute for Democracy in South Africa, & Center for Democracy and Development-Ghana, Michigan State University. (2005-2006). Afro-barometer survey manual: Round 3 surveys.

[22] Kish, L. (1949). A procedure for objective respondent selection within the household. Journal of the American Statistical Association, 44(247), 380-387.

[23] Lavrakas, P. J. (1993). Telephone survey methods: Sampling, selection, and supervision (2nd ed.). Thousand Oaks, CA: Sage Publications, Inc.

[24] Lin, I. F., & Schaeffer, N. C. (1995). Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly, 59(2), 236-258.

[25] Mudryk, W., Burgess, M. J., & Xiao, P. (1996). Quality control of CATI operations in Statistics Canada. Unpublished manuscript.

[26] Office of Management and Budget. (2006). Standards and guidelines for statistical surveys. Washington, DC: Office of Information and Regulatory Affairs, OMB. Retrieved May 15, 2008, from http://www.whitehouse.gov/omb/inforeg/statpolicy/standards_stat_surveys.pdf

[27] Singer, E. (2002). The use of incentives to reduce nonresponse in household surveys. In R. M. Groves, D. A. Dillman, J. L. Eltinge & R. J. A. Little (Eds.), Survey nonresponse (pp. 163-178). New York: Wiley.

[28] Skjåk, K. K., & Harkness, J. A. (2003). Data collection methods. In J. A. Harkness, F. J. R. Van de Vijver, & P. P. Mohler (Eds.), Cross-cultural survey methods (pp. 179-193). Hoboken, NJ: Wiley.

[29] Stoop, I. A. L. (2005). The hunt for the last respondent: Nonresponse in sample surveys. The Hague: Social and Cultural Planning Office.

[30] United Nations. (2005). Household surveys in developing and transition countries (Series F No. 96). New York: United Nations.

[31] van den Brakel, J. A., Vis-Visschers, R., & Schmeets, J. J. G. (2006). An experiment with data collection modes and incentives in the Dutch Family and Fertility Survey for Young Moroccans and Turks. Field Methods, 18(3), 321-334.

[32] Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63-69.

Further Reading

Axinn, W. G. (1989). Interviewers and data quality in a less developed setting. Journal of Official Statistics, 5(3), 265-280.

Axinn, W. G., & Pearce, L. D. (2006). Mixed method data collection strategies. New York: Cambridge University Press.

Casley, D. J., & Lury, D. A. (1981). Data collection in developing countries. Oxford: Oxford University Press.

de Leeuw, E. D. (1992). Data quality in mail, telephone, and face to face surveys. Unpublished doctoral dissertation, Vrije Universiteit, Amsterdam.

Groves, R. M. (1989). Survey errors and survey costs. New York: Wiley.

Groves, R. M. (2006). Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), 646-675.

Groves, R. M., Couper, M. P., Presser, S., Singer, E., Tourangeau, R., Acosta, G. P., et al. (2006). Experiments in producing nonresponse bias. Public Opinion Quarterly, 70(5), 720-736.

Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. New York: Wiley.

Jäckle, A., Roberts, C., & Lynn, P. (2006). Telephone versus face-to-face interviewing: Mode effects on data quality and likely causes. (No. ISER Working Paper 2006-41). Colchester: University of Essex.

Lyberg, L. E., Biemer, P., Collins, M., de Leeuw, E. D., Dippo, C., Schwarz, N., et al. (1997). Survey measurement and process quality. New York: Wiley.

Saris, W. E. (1998). The effects of measurement error in cross cultural research. Cross-cultural survey equivalence. Mannheim: ZUMA.


© 2008 The authors of the Guidelines hold the copyright. Please contact us if you wish to publish any of this material in any form.