| Pretesting Method | What it is | Strengths | Weaknesses | Most Common Use |
|---|---|---|---|---|
| Field Methods |
| Field pilot study (for an overview, see [10]) | A miniature version of the main data collection | realistic; allows for testing all field procedures; allows for feedback from interviewers, field managers, respondents, and data analysts | very costly; requires a large sample size relative to the other techniques; needs to be planned and conducted in advance to allow time for changes | field work test |
| Interviewer debriefings (for an overview, see [9]) | Small group discussion with interviewers to talk about their experiences | uses interviewers' expertise on what makes a question difficult in a particular situation and with particular types of respondents | interviewers themselves may be responsible for the respondents' confusion or problems with a question | field work test |
| Respondent debriefings | Respondents' comments on specific questions or the survey as a whole (usually collected during a field pilot study) | cheap (conducted as part of the field pilot study); allows for identification of question-specific problems; large sample size allows for confidence in results; realistic (field setting) | in some cultures, respondents may not want to admit confusion and inability to understand a question; increases respondent burden as the length of the interview increases; respondents may find it hard to recall items that were problematic | field work test |
| Behavior coding (e.g., [13]; also, [10]) | Systematic coding of the interviewer-respondent interaction in order to identify problems that arise during the question-answer process | direct observation of the question-answer process; comparability when standard codes are employed; replicable; allows for use of universal codes as well as study-specific codes; quantitative; requires a medium sample size (30 interviews are considered sufficient to detect problems) | time and labor intensive; requires well-trained coders and consistent use of the coding scheme; does not identify the exact problem in a question with many codes | questionnaire testing; field management |
| Focus groups (see [6] for an overview; also [10]) | A small group of people brought together to discuss specific topics in a relatively unstructured manner, led by a moderator who keeps the conversation moving in the intended direction | useful when there is no information on the topic of interest; uses the same types of respondents who are the target population for the survey; allows for immediate follow-up; requires a small sample size (10-12 participants) | mainly qualitative; results should be carefully interpreted due to small sample size; requires well-trained moderators; small group dynamics may influence the results | questionnaire development |
| Cognitive Laboratory Methods (see [9]) |
| Vignettes (e.g., [19]) | Brief stories or scenarios describing hypothetical situations or persons and their behaviors, to which respondents are asked to react so that the researcher can explore contextual influences on respondents' response formation processes | allows for quantitative analyses; suitable for sensitive topics; requires a small sample size relative to the other techniques | disconnect between a hypothetical situation and the respondent's actual views and behaviors; cultures may differ in their ability to think hypothetically (e.g., [3]) | questionnaire development; concept understanding test |
| Concurrent think-aloud [2][6] | Respondents' reports of the thoughts they have while answering a survey question | open format with potential for unanticipated information; lack of interviewer bias when probes are not used | unnatural; high respondent burden; may affect the natural response formation process and thus provide an unrealistic picture of how respondents answer questions in the field; coding may be burdensome; assumes respondents are able to identify and report what information they used to come up with a response to the survey question; respondents may begin to overinterpret the questions and come up with problems that do not exist in the natural context | questionnaire development |
| Retrospective think-aloud [1] | Interview with respondents, after they have completed a survey, about how they came up with answers to specific questions | does not interfere with the response formation process | assumes respondents are able to identify and report what information they used to come up with a response to the survey question; assumes information is still available in short-term memory | questionnaire development |
| Other |
| Expert review (for an overview, see [10]) | Review of draft materials by experienced methodologists, analysts, and translators | cost efficient; quick; can identify a wide variety of problems in the survey questionnaire (from typos to skip patterns); requires a very small sample of experts (usually 2-3) | subjective; no "real" respondents involved | questionnaire development |
| Question Appraisal System (for example, [23]) | A systematic appraisal of survey questions that allows the user to identify potential problems in the wording or structure of the questions that may lead to difficulties in question administration, miscommunication, or other failings | cost efficient; provides a sense of reliability due to standardization | identifies a problem without pointing to a solution | questionnaire development |
| Usability Testing [11][22] | Testing of the functionalities of CAPI, CATI, sample management systems, or printed materials such as respondent and interviewer booklets, show cards, etc. | direct user assessment of the tools that will be used during data collection; can be cheap (can be conducted with employees of the survey organization); requires small sample sizes | time consuming | field work test |
| Statistical Modeling |
| Multi-trait-multi-method Database (see [20]) | Database of MTMM studies that provides estimates of reliability and validity for over 1000 questionnaire items | provides quantitative measures of question quality | costly and labor intensive; questions are considered in isolation, so question order effects might be ignored | |
| Item Response Theory Approach [18] | Statistical models that allow researchers to examine how different items discriminate across respondents with the same value on a trait | provides a quantitative measure of item functioning; suitable for scale development | requires data collection; questions considered in isolation | |
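
As a rough illustration of the "quantitative measure of item functioning" mentioned in the Item Response Theory row, the sketch below uses the two-parameter logistic (2PL) model, which is one common IRT model. The table does not say which model is meant, so the choice of 2PL is an assumption, and the item names and parameter values are hypothetical. The sketch compares how sharply two items separate a respondent low on the trait from one high on the trait.

```python
import math


def item_response_probability(theta, discrimination, difficulty):
    """2PL model: probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical items with the same difficulty but different discrimination.
items = {
    "weak_item": {"discrimination": 0.5, "difficulty": 0.0},
    "strong_item": {"discrimination": 2.0, "difficulty": 0.0},
}


def probability_gap(item, low_theta=-1.0, high_theta=1.0):
    """Difference in endorsement probability between a high- and a low-trait respondent."""
    p_low = item_response_probability(low_theta, **item)
    p_high = item_response_probability(high_theta, **item)
    return p_high - p_low


# A larger gap means the item separates the two respondents more sharply.
gaps = {name: probability_gap(params) for name, params in items.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.3f}")
```

In practice the discrimination and difficulty parameters are estimated from collected response data rather than set by hand, which is why the table lists "requires data collection" as a weakness.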