Webpage last modified: 2008-Sep-25
The following guidelines detail the steps taken after the data are collected (see Data Collection). Each country's data must be processed (coded, entered, and edited), and then statistical adjustment (response rate calculation, missing value imputation, survey weight creation, and variance estimation) can be performed. After the processing activities, the data from each country can be harmonized with those from the other countries and, after the adjustment activities, the data can be disseminated to the public as a cross-cultural dataset (see Harmonization of Survey and Statistical Data and Dissemination of Survey and Statistical Data). Substantive analysis can then be performed on the disseminated dataset.
Although the steps are the same, their order differs for paper and computer-assisted questionnaires. For paper surveys, the sequential steps are as follows: code data, enter data, perform edit checks, impute missing values, create weights, build data files, and estimate variances. For computer-assisted surveys, entering the data, performing edit checks, and building data files occur while the data are being collected. The remaining steps then occur in the following order: code data, impute missing values, create weights, and estimate variances. Because computer-assisted interviewing performs several of these steps in parallel with data collection, it eliminates much of the processing burden (e.g., little additional keying is needed, and consistency checks are built into the instrument).
Unfortunately, processing and adjustment activities often are not given adequate attention and are thus under-budgeted (e.g., editing could consume up to 40% of an entire survey budget) [3] [14]. As at other stages of survey research, coders, editors, and other data processing operators may introduce error into the data, possibly even systematic error [3]. Additionally, it is common for only a few errors to be responsible for the majority of changes in estimates [14]. To lessen the effort (and possibly minimize error), checks could be performed throughout the field period, while the respondent is still available, rather than waiting until the end of data collection [14].
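Since a few errors can account for most of the change in estimates, one common way to focus editing effort is selective (score-based) editing, a technique not named in the text above. The sketch below is illustrative only: it scores each case by the weighted distance between its reported value and a reference value, relative to an estimated total, and sends only high-scoring cases for review. The names `EditRecord`, `score_cases`, and `cases_to_review`, the choice of reference value (e.g., a prior-wave value or a class median), and the threshold are all hypothetical assumptions.

```python
from typing import NamedTuple


class EditRecord(NamedTuple):
    case_id: str
    weight: float    # survey weight of the case
    reported: float  # value as reported/captured
    expected: float  # hypothetical reference value, e.g. from a prior wave


def score_cases(records: list[EditRecord], estimated_total: float) -> list[tuple[str, float]]:
    """Rank cases by their potential impact on an estimated total, largest first."""
    scored = [
        (r.case_id, r.weight * abs(r.reported - r.expected) / estimated_total)
        for r in records
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def cases_to_review(records: list[EditRecord], estimated_total: float, threshold: float) -> list[str]:
    """Return only the cases whose score reaches the (hypothetical) review threshold."""
    return [cid for cid, score in score_cases(records, estimated_total) if score >= threshold]


records = [
    EditRecord("A", 10, 100, 100),
    EditRecord("B", 10, 500, 100),
    EditRecord("C", 50, 120, 100),
]
print(cases_to_review(records, 10_000, 0.05))  # ['B', 'C']
```

Cases below the threshold are left to automatic checks, which is how effort can be concentrated on the few errors that matter most to the estimates.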
These guidelines are divided into Data Processing Steps and Statistical Adjustment Steps. Quality control and documentation guidelines apply to both sets of steps.
Goal: To convert the data collected during the field period into a file that can (1) be used within the organization for quality assessment of the survey implementation and (2) be made accessible to outside users for substantive research.
Before raw responses can be analyzed statistically, they must be converted into a meaningful numeric form. This process is called coding. Precoding should occur during questionnaire and instrument development; that is, coding conventions and formats should be determined based on prior knowledge of the survey items (see Survey Instrument Design). Once the data are collected, coding decisions are revisited and possibly revised to characterize the data appropriately. Coding can be automated, manual, or both. Both automated and manual coding should be evaluated at the variable, code, and coder level to detect potential error [3].
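One way to evaluate coding at the variable, code, and coder level is to have a check coder independently recode a sample of items and compare the two sets of codes. The sketch below assumes such double-coded data; `CodedItem`, `disagreement_rates`, and the field names are hypothetical. Cohen's kappa is used as one common chance-corrected agreement measure, not as a method prescribed by these guidelines.

```python
from collections import Counter
from typing import NamedTuple


class CodedItem(NamedTuple):
    coder: str
    variable: str
    production_code: str    # code assigned by the production coder
    verification_code: str  # code assigned independently by a check coder


def disagreement_rates(items: list[CodedItem], level: str) -> dict[str, float]:
    """Share of items where the two codes differ, grouped by
    'coder', 'variable', or 'verification_code' (the code level)."""
    totals, misses = Counter(), Counter()
    for item in items:
        key = getattr(item, level)
        totals[key] += 1
        misses[key] += int(item.production_code != item.verification_code)
    return {key: misses[key] / totals[key] for key in totals}


def cohens_kappa(codes_a: list[str], codes_b: list[str]) -> float:
    """Chance-corrected agreement between two coders who coded the same items."""
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("need two equally long, non-empty code lists")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:  # both coders used one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


items = [
    CodedItem("c1", "occupation", "11", "11"),
    CodedItem("c1", "occupation", "12", "13"),
    CodedItem("c2", "industry", "21", "21"),
]
print(disagreement_rates(items, "coder"))  # {'c1': 0.5, 'c2': 0.0}
```

A coder or code with an unusually high disagreement rate points to where retraining or clearer coding rules may be needed.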
Like coding, data entry/capture is necessary for statistical analysis. One advantage of computer-assisted questionnaires is that they eliminate a separate data entry step, reducing the likelihood of additional processing error. When computer-assisted questionnaires are not possible, keying is often the first method of data entry that comes to mind. As technology advances, however, other alternatives should be considered, such as optical character recognition, intelligent character recognition, optical mark recognition, voice recognition entry, and touchtone data entry. Similarly, developing technology offers additional data capture possibilities, such as facsimile transmission, electronic data interchange, and e-mail transmission.
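When data are keyed, a widely used quality check is to key the same forms twice, independently, and compare the two files. The sketch below is a minimal illustration of that comparison, assuming each keying is stored as a mapping from case ID to field values; `compare_keyings` and the data layout are hypothetical. Cases keyed only once are ignored here and would need separate follow-up.

```python
from typing import Optional

Keyed = dict[str, dict[str, str]]  # case_id -> field name -> keyed value
Discrepancy = tuple[str, str, Optional[str], Optional[str]]


def compare_keyings(first: Keyed, second: Keyed) -> tuple[list[Discrepancy], float]:
    """Compare two independent keyings of the same forms field by field.

    Returns the fields that disagree (to be resolved against the paper form)
    and the share of compared fields that disagree.
    """
    discrepancies: list[Discrepancy] = []
    compared = 0
    for case_id in sorted(first.keys() & second.keys()):
        a, b = first[case_id], second[case_id]
        for field in sorted(a.keys() | b.keys()):
            compared += 1
            if a.get(field) != b.get(field):
                discrepancies.append((case_id, field, a.get(field), b.get(field)))
    rate = len(discrepancies) / compared if compared else 0.0
    return discrepancies, rate


first = {"001": {"q1": "2", "q2": "5"}, "002": {"q1": "1", "q2": "3"}}
second = {"001": {"q1": "2", "q2": "6"}, "002": {"q1": "1", "q2": "3"}}
print(compare_keyings(first, second))  # ([('001', 'q2', '5', '6')], 0.25)
```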
Editing during pre-production and data collection is a better allocation of resources than fixing errors during post-production. There can be several stages of editing [3]. In computer-assisted surveys, the application can notify the interviewers (or respondents, if self-administered) of inconsistent or implausible responses. This gives respondents a chance to review, clarify, or correct their responses. Paper surveys can include instructions telling respondents to review their responses. Prior to data entry/capture, survey organizations can manually look for obvious errors, such as blanks. Then, during data entry/capture, editing software can be used to check for errors at both the variable and case level. Most editing takes place after data entry/capture and is described below.
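The distinction between variable-level and case-level checks can be made concrete with a small sketch. Everything in it is hypothetical: the variable names, the ranges in `RANGE_RULES`, and the two consistency rules are invented for illustration, and real edit specifications would come from the questionnaire. Variable-level checks look at one answer at a time (blanks, out-of-range values); case-level checks look at combinations of answers within the same case.

```python
from typing import Any

# Hypothetical edit rules for illustration only.
RANGE_RULES: dict[str, tuple[int, int]] = {
    "age": (15, 99),
    "household_size": (1, 30),
}


def variable_level_failures(case: dict[str, Any]) -> list[str]:
    """Flag blanks and out-of-range values, one variable at a time."""
    failures = []
    for var, (low, high) in RANGE_RULES.items():
        value = case.get(var)
        if value is None:
            failures.append(f"{var}: blank")
        elif not low <= value <= high:
            failures.append(f"{var}: {value} outside {low}-{high}")
    return failures


def case_level_failures(case: dict[str, Any]) -> list[str]:
    """Flag combinations of answers that are implausible within one case."""
    failures = []
    size, children = case.get("household_size"), case.get("children_in_household")
    # Assumes the respondent is counted in household_size and is not one of the children.
    if size is not None and children is not None and children >= size:
        failures.append("children_in_household must be smaller than household_size")
    age, years = case.get("age"), case.get("years_in_current_job")
    # Hypothetical rule: nobody starts working before age 10.
    if age is not None and years is not None and years > age - 10:
        failures.append("years_in_current_job implausible for age")
    return failures


def edit_case(case: dict[str, Any]) -> list[str]:
    return variable_level_failures(case) + case_level_failures(case)


print(edit_case({"household_size": 3}))  # ['age: blank']
```

In a computer-assisted instrument, the same rules could be evaluated as answers are entered so that the interviewer or respondent can resolve them on the spot.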
Goal: To facilitate estimates of target population attributes based on sample survey data.
Response rates are one measure of survey quality and can be used to adjust survey estimates to help correct for nonresponse. Therefore, reporting response rates and other outcome rates based on an established industry standard, such as the AAPOR Standard Definitions [1], is an important part of dissemination and publication.
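As an illustration of computing outcome rates from final disposition counts, the sketch below implements two of the AAPOR response rates [1]: RR1, which treats every case of unknown eligibility as eligible, and RR3, which counts only an estimated proportion `e` of those cases as eligible. The field names are our own; the comments map them to the AAPOR disposition groups. How `e` is estimated is a separate decision that should itself be documented (see Smith, 2003, in the further reading).

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final case counts grouped as in the AAPOR Standard Definitions."""
    complete: int           # I: complete interviews
    partial: int            # P: partial interviews
    refusal: int            # R: refusals and break-offs
    non_contact: int        # NC: non-contacts
    other: int              # O: other eligible nonrespondents
    unknown_household: int  # UH: unknown if household/occupied
    unknown_other: int      # UO: unknown eligibility, other


def response_rate_1(d: Dispositions) -> float:
    """RR1 = I / ((I + P) + (R + NC + O) + (UH + UO))."""
    denom = ((d.complete + d.partial)
             + (d.refusal + d.non_contact + d.other)
             + (d.unknown_household + d.unknown_other))
    return d.complete / denom


def response_rate_3(d: Dispositions, e: float) -> float:
    """RR3 = I / ((I + P) + (R + NC + O) + e * (UH + UO))."""
    denom = ((d.complete + d.partial)
             + (d.refusal + d.non_contact + d.other)
             + e * (d.unknown_household + d.unknown_other))
    return d.complete / denom


d = Dispositions(600, 50, 150, 100, 20, 60, 20)
print(response_rate_1(d))  # 0.6
```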
Depending on the quality of the sampling frame, the sample design, and patterns of nonresponse, the distribution among groups of observations in a survey dataset may differ substantially from the distribution in the population. Groups whose share of the sample exceeds their share of the population are said to be overrepresented; groups whose share falls short are underrepresented. Sampling statisticians create weights to reduce the bias these differences introduce into the estimates and to compensate for noncoverage and nonresponse. An overall survey weight for each interviewed element typically contains three adjustments: 1) a base weight to adjust for unequal probabilities of selection ($w_{base}$); 2) an adjustment for sample nonresponse ($adj_{nr}$); and 3) a poststratification adjustment ($adj_{ps}$) for the difference between the weighted sample distribution and the population distribution on variables that are considered to be related to key outcomes. If all three adjustments are needed, the overall weight ($w$) is the product of these three adjustments, or:

$$w = w_{base} \times adj_{nr} \times adj_{ps}$$
However, it is not always necessary to create all three weight adjustments when creating an overall survey weight. Create the adjustments only as needed. For example, if all elements had equal probabilities of selection, a base weight would not be necessary. The overall survey weight would then be the product of any nonresponse adjustment and any poststratification adjustment.
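The three-part weight can be sketched as follows, under several stated assumptions: the sample file contains only eligible sampled elements; the base weight is the inverse of the selection probability; the nonresponse adjustment is a simple weighting-class ratio (base-weight total of all eligible elements divided by that of respondents); and the poststratification adjustment ratios known population counts to the nonresponse-adjusted weight totals. `SampledUnit`, `survey_weights`, and the class labels are hypothetical, and this is one common approach rather than the only one (see Bethlehem [2]).

```python
from collections import defaultdict
from typing import NamedTuple


class SampledUnit(NamedTuple):
    prob: float      # probability of selection
    respondent: bool
    nr_class: str    # hypothetical weighting class for the nonresponse adjustment
    ps_class: str    # hypothetical poststratum


def survey_weights(units: list[SampledUnit], population: dict[str, float]) -> list[float]:
    """Return w_base * adj_nr * adj_ps for respondents and 0.0 for nonrespondents."""
    base = [1.0 / u.prob for u in units]  # base weight: inverse selection probability

    # Nonresponse adjustment per class: eligible base-weight total / respondent total.
    eligible: dict[str, float] = defaultdict(float)
    responding: dict[str, float] = defaultdict(float)
    for u, w in zip(units, base):
        eligible[u.nr_class] += w
        if u.respondent:
            responding[u.nr_class] += w
    adj_nr = {c: eligible[c] / responding[c] for c in eligible if responding[c] > 0}

    # Poststratification per poststratum: population count / adjusted weight total.
    ps_total: dict[str, float] = defaultdict(float)
    for u, w in zip(units, base):
        if u.respondent:
            ps_total[u.ps_class] += w * adj_nr[u.nr_class]
    adj_ps = {c: population[c] / total for c, total in ps_total.items()}

    return [
        w * adj_nr[u.nr_class] * adj_ps[u.ps_class] if u.respondent else 0.0
        for u, w in zip(units, base)
    ]


units = [
    SampledUnit(0.1, True, "a", "m"),
    SampledUnit(0.1, True, "a", "m"),
    SampledUnit(0.1, True, "a", "f"),
    SampledUnit(0.1, False, "a", "f"),
]
weights = survey_weights(units, {"m": 30, "f": 20})  # approximately [15, 15, 20, 0]
```

If all elements had equal selection probabilities, the base weight would be constant and could be dropped, as the text notes. A nonresponse class with no respondents would need to be collapsed with a neighboring class before this step.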
Imputation is most often used to replace item-missing data rather than to compensate for unit nonresponse, which is usually handled through weighting. The aim is to reduce the bias that item-missing data cause in the estimate of the statistic of interest.
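A simple imputation method often used in surveys is random hot-deck imputation within imputation classes: each missing value is replaced by the reported value of a randomly chosen donor in the same class. The sketch below is illustrative only; `hot_deck_impute`, the class labels, and the fixed seed are assumptions. It also returns imputation flags, which support the documentation of imputed values. A single random draw understates variance; multiple imputation (Little and Rubin [11]; Raghunathan et al., 2001) is one way to reflect imputation uncertainty.

```python
import random
from collections import defaultdict
from typing import Optional


def hot_deck_impute(
    values: list[Optional[float]], classes: list[str], seed: int = 0
) -> tuple[list[float], list[bool]]:
    """Random hot-deck imputation within imputation classes.

    Missing values (None) are filled with the value of a random donor from
    the same class; the second list flags which values were imputed.
    """
    rng = random.Random(seed)
    donors: dict[str, list[float]] = defaultdict(list)
    for value, cls in zip(values, classes):
        if value is not None:
            donors[cls].append(value)

    completed: list[float] = []
    flags: list[bool] = []
    for value, cls in zip(values, classes):
        if value is None:
            if not donors[cls]:
                raise ValueError(f"no donors in imputation class {cls!r}")
            completed.append(rng.choice(donors[cls]))
            flags.append(True)
        else:
            completed.append(value)
            flags.append(False)
    return completed, flags


completed, flags = hot_deck_impute([1.0, None, 3.0, None, 10.0], ["a", "a", "a", "b", "b"])
```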
The survey sample design determines the level of precision (i.e., the extent of sampling variance). Unfortunately, many statistical texts discuss sampling variance formulae only for simple random sampling without replacement. Similarly, statistical software packages assume simple random sampling without replacement unless the user instructs otherwise. However, compared with a simple random sample design, stratification generally decreases sampling variance, while clustering increases it (see Sample Design for an in-depth explanation of simple random sampling, stratification, and clustering). If the correct formulas or appropriate statistical software procedures and commands are not applied, the precision (i.e., sampling variance) of the statistic(s) of interest can be underestimated or overestimated. Analysts should therefore ensure that they apply the correct methods to calculate sampling variance, based on the sampling design.
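To show why ignoring the design can understate variance, the sketch below computes a weighted mean and its Taylor-linearization variance for a stratified, clustered sample, then repeats the calculation as if the same data came from a simple random sample. It assumes primary sampling units (PSUs) are treated as selected with replacement within strata and omits the finite population correction; `Obs`, `weighted_mean_and_variance`, and `as_if_srs` are hypothetical names. The ratio of the two variances is the design effect.

```python
from collections import defaultdict
from typing import NamedTuple


class Obs(NamedTuple):
    stratum: str
    psu: str       # primary sampling unit (cluster) within the stratum
    weight: float
    y: float


def weighted_mean_and_variance(sample: list[Obs]) -> tuple[float, float]:
    """Weighted mean with a Taylor-linearization variance (PSUs treated as
    sampled with replacement within strata; no finite population correction)."""
    total_weight = sum(o.weight for o in sample)
    mean = sum(o.weight * o.y for o in sample) / total_weight

    # Sum the linearized values z = w * (y - mean) / W within each PSU.
    psu_totals: dict[tuple[str, str], float] = defaultdict(float)
    for o in sample:
        psu_totals[(o.stratum, o.psu)] += o.weight * (o.y - mean) / total_weight

    by_stratum: dict[str, list[float]] = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variation within each stratum, summed over strata.
    variance = 0.0
    for stratum, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, variance


def as_if_srs(sample: list[Obs]) -> list[Obs]:
    """The same data with strata and clusters ignored (each element its own PSU)."""
    return [Obs("all", str(i), o.weight, o.y) for i, o in enumerate(sample)]


# Two strata, two PSUs each, three elements per PSU; values are alike within PSUs.
sample = ([Obs("A", "1", 1.0, 1.0)] * 3 + [Obs("A", "2", 1.0, 5.0)] * 3
          + [Obs("B", "1", 1.0, 2.0)] * 3 + [Obs("B", "2", 1.0, 6.0)] * 3)
mean, var_design = weighted_mean_and_variance(sample)
_, var_srs = weighted_mean_and_variance(as_if_srs(sample))
design_effect = var_design / var_srs
```

Because the values are homogeneous within clusters, the design-based variance here is several times the naive one, which is the kind of underestimate the text warns about.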
Ensuring quality is a vital part of each stage of the survey lifecycle. Even after data collection is complete, the survey organization must continue to implement quality control measures to help reduce or eliminate any errors that could arise during the post-production procedures discussed above. If the emphasis on quality is relaxed during these latter activities, all of the time and money spent on maintaining quality during the previous stages of the survey lifecycle will be compromised.
Over the course of many years, various researchers will analyze the same survey dataset. In order to provide these different users with a clear sense of how and why the data were collected, it is critical that all properties of the dataset be documented.
Documentation will help secondary data users better understand post-survey statistical adjustments that can become quite complex, such as imputation and sample weighting. A better understanding of these adjustments will help ensure that secondary data users correctly interpret the data. In addition, post-survey documentation will indicate whether the survey organization met the benchmarks agreed to in the contract between the coordinating center and the survey organization.
| Half sample | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 |
|---|---|---|---|---|
| 1 | + | + | + | - |
| 2 | + | - | - | - |
| 3 | - | - | + | - |
| 4 | - | + | - | - |
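The table above appears to be a half-sample assignment matrix for balanced repeated replication (BRR) with four strata and two PSUs per stratum; its stratum columns are mutually orthogonal, which is what makes the half samples balanced. The sketch below assumes that "+" means the stratum's first PSU enters the half sample and "-" means its second PSU does. It also includes Fay's method [6] as an option, where the dropped PSU keeps a reduced weight factor `fay_k` instead of zero. `PsuObs`, `brr_variance`, and `weighted_total` are hypothetical names.

```python
from typing import Callable, NamedTuple, Sequence

# Rows = half samples 1-4, columns = strata 1-4, copied from the table.
# Assumed convention: +1 keeps the stratum's first PSU, -1 keeps its second PSU.
HALF_SAMPLES = [
    [+1, +1, +1, -1],
    [+1, -1, -1, -1],
    [-1, -1, +1, -1],
    [-1, +1, -1, -1],
]


class PsuObs(NamedTuple):
    stratum: int  # 1..4, matching the table's columns
    psu: int      # 1 or 2 within the stratum
    weight: float
    y: float


Estimator = Callable[[Sequence[float], Sequence[float]], float]


def weighted_total(weights: Sequence[float], ys: Sequence[float]) -> float:
    return sum(w * y for w, y in zip(weights, ys))


def brr_variance(
    sample: Sequence[PsuObs],
    estimator: Estimator,
    fay_k: float = 0.0,
    half_samples: Sequence[Sequence[int]] = HALF_SAMPLES,
) -> tuple[float, float]:
    """Return (full-sample estimate, replication variance).

    fay_k = 0 gives classic BRR (kept PSU weight x2, dropped PSU x0);
    0 < fay_k < 1 gives Fay's method (kept x(2 - k), dropped x k).
    """
    if not 0.0 <= fay_k < 1.0:
        raise ValueError("fay_k must be in [0, 1)")
    ys = [o.y for o in sample]
    full = estimator([o.weight for o in sample], ys)
    sum_sq = 0.0
    for signs in half_samples:
        kept = {s + 1: (1 if sign > 0 else 2) for s, sign in enumerate(signs)}
        rep_weights = [
            o.weight * ((2.0 - fay_k) if o.psu == kept[o.stratum] else fay_k)
            for o in sample
        ]
        sum_sq += (estimator(rep_weights, ys) - full) ** 2
    return full, sum_sq / (len(half_samples) * (1.0 - fay_k) ** 2)


sample = [PsuObs(1, 1, 1.0, 10.0), PsuObs(1, 2, 1.0, 14.0),
          PsuObs(2, 1, 1.0, 3.0), PsuObs(2, 2, 1.0, 5.0),
          PsuObs(3, 1, 1.0, 7.0), PsuObs(3, 2, 1.0, 7.0),
          PsuObs(4, 1, 1.0, 1.0), PsuObs(4, 2, 1.0, 4.0)]
estimate, variance = brr_variance(sample, weighted_total)
```

For a linear statistic such as this total, the balanced design reproduces the usual two-PSU-per-stratum variance, the sum over strata of the squared PSU differences (29 in this example); the replication approach is most useful for nonlinear statistics, where any estimator function can be passed in.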
[1] AAPOR Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, Version 4. Retrieved January 9, 2008, from http://www.aapor.org/uploads/Standard_Definitions_04_08_Final.pdf
[2] Bethlehem, J. G. (2002). Weighting nonresponse adjustments based on auxiliary information. In R. M. Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (chap. 18). New York: Wiley.
[3] Biemer, P., & Lyberg, L. (2003). Introduction to survey quality. Hoboken, NJ: Wiley.
[4] Federal Committee on Statistical Methodology. (1983). Statistical policy working paper 9: Contracting for surveys. Washington, DC: Office of Management and Budget.
[5] Fellegi, I. P., & Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71(353), 17-35.
[6] Judkins, D. R. (1990). Fay's method for variance estimation. Journal of Official Statistics, 6(3), 223-239.
[7] Kalton, G. (1983). Compensating for missing survey data. University of Michigan, Survey Research Center, Institute for Social Research.
[8] Kalton, G., & Kasprzyk, D. (1986). Treatment of missing survey data. Survey Methodology, 12, 1-16.
[9] Kish, L. (1965). Survey sampling. New York: Wiley & Sons.
[10] Lepkowski, J., & Bowles, J. (1996). Sampling error software for personal computers. The Survey Statistician, 35, 10-17.
[11] Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
[12] Marker, D. A., Judkins, D. R., & Winglee, M. (2001). Large-scale imputation for complex surveys. In R. M. Groves, D. A. Dillman, J. L. Eltinge, & R. J. A. Little (Eds.), Survey nonresponse (chap. 22). New York: Wiley.
[13] Office of Management and Budget. (2006). Standards and guidelines for statistical surveys. Washington, DC: Office of Information and Regulatory Affairs, OMB. Retrieved June 9, 2008, from http://www.whitehouse.gov/omb/inforeg/statpolicy/standards_stat_surveys.pdf
[14] Statistics Canada. (2003). Statistics Canada quality guidelines. Ottawa: Statistics Canada. Retrieved June 9, 2008, from http://www.statcan.ca/english/freepub/12-539-XIE/index.htm
[15] United Nations. (2005). Household surveys in developing and transition countries. New York: United Nations, Department of Economic and Social Affairs.
[16] Wurdeman, K. (1993). Quality of data keying for major operations of the 1990 census. Unpublished manuscript.
Groves, R. M. (1989). Survey errors and survey costs. Hoboken, NJ: Wiley & Sons.
Groves, R. M., Dillman, D. A., Eltinge, J. L., & Little, R. J. A. (Eds.). (2002). Survey nonresponse. New York: Wiley.
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: Wiley & Sons.
Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-665.
Kish, L., & Hess, I. (1959). On variances of ratios and their differences in multistage samples. Journal of the American Statistical Association, 54, 416-446.
Lessler, J., & Kalsbeek, W. (1992). Nonsampling error in surveys. New York: Wiley.
Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N., et al. (Eds.). (1997). Survey measurement and process quality. New York: Wiley.
Raghunathan, T. E., Lepkowski, J. M., van Hoewyk, J., & Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27, 85-95.
Smith, T. W. (2003). A review of methods to estimate the status of cases with unknown eligibility. Report of the Standard Definitions Committee for the American Association for Public Opinion Research.
Worcester, R., Lagos, M., & Basanez, M. (2000). Problems and progress in cross-national studies: Lessons learned the hard way. Paper presented at the WAPOR/AAPOR annual conference, Portland, OR.
© 2008 The authors of the Guidelines hold the copyright. Please contact us if you wish to publish any of this material in any form.