Table 1. Pretesting methods and their strengths and weaknesses.

Field Methods

Field pilot study (for an overview, see [10])
  What it is: A miniature version of the main data collection.
  Strengths: realistic; allows testing of all field procedures; allows feedback from interviewers, field managers, respondents, and data analysts.
  Weaknesses: very costly; requires a large sample size relative to the other techniques; needs to be planned and conducted in advance to allow time for changes.
  Most common use: field work test.

Interviewer debriefings (for an overview, see [9])
  What it is: Small-group discussion with interviewers about their experiences.
  Strengths: uses interviewers' expertise on what makes a question difficult in a particular situation and with particular types of respondents.
  Weaknesses: interviewers themselves may be responsible for the respondents' confusion or problems with a question.
  Most common use: field work test.

Respondent debriefings
  What it is: Respondents' comments on specific questions or on the survey as a whole (usually collected during a field pilot study).
  Strengths: cheap, since it is conducted as part of the field pilot study; allows identification of question-specific problems; the large sample size allows confidence in the results; realistic (field setting).
  Weaknesses: in some cultures, respondents may not want to admit confusion or an inability to understand a question; increases respondent burden, since it lengthens the interview; respondents may find it hard to recall which items were problematic.
  Most common use: field work test.

Behavior coding (e.g., [13]; also [10])
  What it is: Systematic coding of the interviewer-respondent interaction in order to identify problems that arise during the question-answer process.
  Strengths: direct observation of the question-answer process; comparability when standard codes are employed; replicable; allows the use of universal codes as well as study-specific codes; quantitative; requires a medium sample size (30 interviews are considered sufficient to detect problems).
  Weaknesses: time and labor intensive; requires well-trained coders and consistent use of the coding scheme; does not identify the exact problem in a question that receives many codes.
  Most common use: questionnaire testing; field management.

Focus groups (see [6] for an overview; also [10])
  What it is: A small group of people brought together to discuss specific topics in a relatively unstructured manner, led by a moderator who keeps the conversation moving in the intended direction.
  Strengths: useful when there is no information on the topic of interest; uses the same types of respondents who make up the target population for the survey; allows immediate follow-up; requires a small sample size (10-12 participants).
  Weaknesses: mainly qualitative; results should be interpreted carefully due to the small sample size; requires well-trained moderators; small-group dynamics may influence the results.
  Most common use: questionnaire development.

Cognitive Laboratory Methods (see [9])

Vignettes (e.g., [19])
  What it is: Brief stories or scenarios describing hypothetical situations or persons and their behaviors, to which respondents are asked to react; they allow the researcher to explore contextual influences on respondents' response formation processes.
  Strengths: allows quantitative analyses; suitable for sensitive topics; requires a small sample size relative to the other techniques.
  Weaknesses: possible disconnect between a hypothetical situation and the respondent's actual views and behaviors; cultures may differ in their ability to think hypothetically (e.g., [3]).
  Most common use: questionnaire development; concept understanding test.

Concurrent think-aloud [2][6]
  What it is: Respondents report the thoughts they are having while answering a survey question.
  Strengths: open format with potential for unanticipated information; no interviewer bias when probes are not used.
  Weaknesses: unnatural; high respondent burden; may affect the natural response formation process and thus give an unrealistic picture of how respondents answer questions in the field; coding may be burdensome; assumes respondents are able to identify and report what information they used to arrive at a response to the survey question; respondents may begin to over-interpret the questions and report problems that do not exist in the natural context.
  Most common use: questionnaire development.

Retrospective think-aloud [1]
  What it is: An interview with respondents, after they have completed a survey, about how they arrived at their answers to specific questions.
  Strengths: does not interfere with the response formation process.
  Weaknesses: assumes respondents are able to identify and report what information they used to arrive at a response to the survey question; assumes the information is still available in short-term memory.
  Most common use: questionnaire development.

Other

Expert review (for an overview, see [10])
  What it is: Review of draft materials by experienced methodologists, analysts, and translators.
  Strengths: cost efficient; quick; can identify a wide variety of problems in the survey questionnaire (from typos to skip patterns); requires a very small sample of experts (usually 2-3).
  Weaknesses: subjective; no "real" respondents involved.
  Most common use: questionnaire development.

Question Appraisal System (for example, [23])
  What it is: A systematic appraisal of survey questions that allows the user to identify potential problems in the wording or structure of the questions that may lead to difficulties in question administration, miscommunication, or other failings.
  Strengths: cost efficient; standardization provides a sense of reliability.
  Weaknesses: identifies a problem without pointing to a solution.
  Most common use: questionnaire development.

Usability testing [11][22]
  What it is: Testing the functionality of CAPI, CATI, and sample management systems, or of printed materials such as respondent and interviewer booklets, show cards, etc.
  Strengths: direct user assessment of the tools that will be used during data collection; can be cheap, since it can be conducted with employees of the survey organization; requires small sample sizes.
  Weaknesses: time consuming.
  Most common use: field work test.

Statistical Modeling

Multi-trait-multi-method (MTMM) database (see [20])
  What it is: A database of MTMM studies that provides estimates of reliability and validity for over 1,000 questionnaire items.
  Strengths: provides quantitative measures of question quality.
  Weaknesses: costly and labor intensive; questions are considered in isolation, so question order effects might be ignored.

Item Response Theory approach [18]
  What it is: Statistical models that allow researchers to examine how different items discriminate across respondents with the same value on a trait.
  Strengths: provides a quantitative measure of item functioning; suitable for scale development.
  Weaknesses: requires data collection; questions are considered in isolation.
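
The behavior coding entry describes a quantitative, code-based analysis of about 30 interviews. A minimal sketch of how such codes are often summarized follows: for each question, compute the share of administrations that received at least one problem code, and flag questions above a cut-off. The code labels (M, RC, IA, I), the 15% threshold, and the functions problem_rates and flag_questions are all hypothetical illustrations; the table does not prescribe a coding scheme or threshold.

```python
from collections import Counter

# Hypothetical problem codes: major wording change (M), request for
# clarification (RC), inadequate answer (IA), interruption (I).
# Codes such as "E" (exact reading) and "AA" (adequate answer) count as unproblematic.
PROBLEM_CODES = {"M", "RC", "IA", "I"}
THRESHOLD = 0.15  # assumed flagging cut-off, not taken from the table


def problem_rates(interviews):
    """interviews: list of dicts mapping question id -> list of behavior codes.
    Returns, per question, the share of administrations with any problem code."""
    asked = Counter()
    flagged = Counter()
    for interview in interviews:
        for question, codes in interview.items():
            asked[question] += 1
            if PROBLEM_CODES.intersection(codes):
                flagged[question] += 1
    return {q: flagged[q] / asked[q] for q in asked}


def flag_questions(interviews, threshold=THRESHOLD):
    """Return the ids of questions whose problem rate meets the threshold."""
    rates = problem_rates(interviews)
    return sorted(q for q, rate in rates.items() if rate >= threshold)


# Synthetic data for 30 coded interviews (invented for illustration).
interviews = []
for i in range(30):
    interviews.append({
        "Q1": ["E", "AA"],
        "Q2": ["E", "RC"] if i < 6 else ["E", "AA"],  # 6 of 30 interviews
        "Q3": ["M", "AA"] if i < 3 else ["E", "AA"],  # 3 of 30 interviews
    })

print(problem_rates(interviews))   # {'Q1': 0.0, 'Q2': 0.2, 'Q3': 0.1}
print(flag_questions(interviews))  # ['Q2']
```

Note that the sketch only says that Q2 is problematic, not why; this mirrors the weakness listed in the table, where behavior coding does not identify the exact problem in a question.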

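The Item Response Theory entry refers to statistical models of item functioning. As one common example (an assumption; the table does not name a specific model), the two-parameter logistic (2PL) model gives the probability of endorsing an item as P(theta) = 1 / (1 + exp(-a(theta - b))), where a is the item's discrimination and b its location. The sketch below computes this probability and the item information a^2 P(1 - P), and shows how respondents with the same trait value theta can have different endorsement probabilities when an item is calibrated differently in two groups. All parameter values are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Endorsement probability under the two-parameter logistic (2PL) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait value theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Two hypothetical items with the same location but different discrimination.
weak_item = {"a": 0.5, "b": 0.0}
strong_item = {"a": 2.0, "b": 0.0}
print(item_information(0.0, **weak_item))    # 0.0625
print(item_information(0.0, **strong_item))  # 1.0

# The same hypothetical item calibrated separately in two cultural groups:
# respondents with identical theta get different endorsement probabilities.
group_a = {"a": 1.5, "b": 0.0}
group_b = {"a": 1.5, "b": 0.8}
print(p_2pl(0.0, **group_a))            # 0.5
print(round(p_2pl(0.0, **group_b), 3))  # 0.231
```

The more discriminating item carries 16 times as much information at theta = 0, meaning it separates respondents near that trait level more sharply. The group comparison illustrates the kind of cross-group difference in item functioning that such models can reveal; in practice the parameters would be estimated from collected data, which is why the table lists data collection as a requirement.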