Glossary

Webpage last modified: 2008-Jun-20

Consolidated list of definitions across all modules

Adaptation
Changing existing materials (e.g., management plans, contracts, training manuals, questionnaires, etc.) by deliberately altering some content or design component to make the resulting materials more suitable for another sociocultural context or a particular population.
Adaptive behavior
Interviewer behavior that is tailored to the actual situation encountered.
ADQ (Ask different questions)
An approach to question design where researchers collect data across populations or countries using the most salient population-specific questions on a given topic that are demonstrated to tap a construct that is germane or shared across populations.
ASCII files
Data files in American Standard Code for Information Interchange (ASCII) format.
ASQ (Ask the same questions)
An approach to question design where researchers collect data across populations/countries by first deciding on a common source questionnaire in one language and then producing whatever other language versions are needed on the basis of translation.
Attitudinal question
A question asking a respondent to evaluate a particular entity with some degree of favor or disfavor.
Audio computer-assisted self-interview (ACASI)
A mode in which the respondent uses a computer that displays the questions on screen and plays audio recordings of the questions to the respondent, who then enters his or her answers.
Base weight
The inverse of the probability of selection.
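As a minimal sketch of this definition, the weight can be computed directly from a known selection probability. The function name `base_weight` is illustrative and not taken from the Guidelines.

```python
def base_weight(selection_probability: float) -> float:
    """Return the base weight: the inverse of the probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in the interval (0, 1]")
    return 1.0 / selection_probability


# An element selected with probability 0.25 stands in for 4 frame elements.
weight = base_weight(0.25)
```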
Behavior coding
Systematic coding of the interviewer-respondent interaction in order to identify problems that arise during the question-answer process.
Behavioral question
A question asking a respondent to report actions or behaviors.
Bias
A systematic difference between the survey estimate of the population parameter and the true value in the population.
Bid (Consortium bid)
Bottom coding
A type of coding in which values that fall below a predetermined minimum value are reassigned to that minimum value or are recoded as missing data.
Bridge language
A language, common to both interviewers and respondents, that is used for data collection but may not be the first language of either person.
Call record
A written record of the time and outcome of each call attempt to a sample case.
Closed-ended question
A survey question in which the respondent is presented with a set of response alternatives from which to choose an answer.
Cluster sample (clustering)
A sample design in which a group of population elements in geographically proximal locations is selected as a whole.
Clustering
A sample design where the elements of the sampling frame are grouped into clusters. The clusters are then sampled and data is collected from one or more elements within each sampled cluster.
Codebook
A document that provides question-level metadata that are matched to variables in a dataset. Metadata include the elements of a data dictionary, as well as basic study documentation, question text, universes (the characteristics of respondents who were asked the question), the number of respondents who answered the question, and response frequencies or statistics.
Coding
Translating nonnumeric data into numeric fields.
Cognitive interviews
A pretesting method designed to uncover problems in survey items by having respondents think out loud while answering the question.
Cohen's kappa
A statistical measure of agreement between two coders (raters) assigning categorical codes that accounts for the agreement expected by chance.
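For two coders who each assign one categorical code per item, kappa is commonly computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each coder's marginal code frequencies. The sketch below assumes that two-coder setting; the function and variable names are illustrative.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions per code.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.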
Complex survey data (or designs)
Survey data sets (or designs) based on stratified single or multistage samples with weights designed to compensate for unequal probabilities of selection or nonresponse.
Concurrent mode
A mixed mode design in which one group of respondents uses one mode and another group of respondents uses another.
Confidential, confidentiality
Securing the identity of, and any information provided by, the respondent to ensure to the greatest extent possible that public identification of an individual participating in the study and/or his or her individual responses does not occur.
Consent, informed consent, written consent, oral consent
A process by which a sample member voluntarily confirms his or her willingness to participate in a study, after having been informed of all aspects of the study that are relevant to the sample member's decision to participate. Informed consent can be obtained with a written consent form or orally (or implied if the respondent returns a mail survey), depending on the study protocol.
Constructed variable
A recoded variable, one created by data producers or archives based on the data originally collected. An example might be the creation of a variable called POVERTY from information collected on the income of respondents.
Contact rate
The proportion of all cases in which some responsible member of the housing unit was reached by the survey.
Context effects
The impact of question context, such as the order or layout of questions, on survey responses.
Convenience sample
A sample of elements that are selected because it is convenient to use them, not because they are representative of the target population.
Conversational interviewing
Interviewing style in which interviewers read questions as they are worded but are allowed to use their own words to clarify the meaning of the questions.
Conversion process
Data processing procedures used to create harmonized variables from original input variables.
Cooperation rate
The proportion of all eligible units ever contacted that were interviewed.
Coordinating center
A research center that facilitates and organizes cross-national research activities.
Coverage
The proportion of the target population that is accounted for on the sampling frame.
Coverage bias
Bias due to a mismatch between the target population and the sampling frame.
Coverage rate
The number of elements on the sampling frame divided by the estimated number of elements in the target population.
Coversheet
Electronic or printed materials associated with each case that identify information about the case, e.g., the sample address, the unique identification number associated with a case, and the interviewer to whom a case is assigned. The coversheet often also contains an introduction to the study, instructions on how to screen sample members and randomly select the respondent, and space to record the date, time, outcome, and notes for every contact attempt.
Crosswalks
A description, usually presented in tabular format, of all the relationships between variables in individual data files and their counterparts in the harmonized file.
Data capture
Method of data collection.
Data cleaning
Identifying and correcting errors (defined by editing rules) in the dataset.
Data dictionary
Question or variable-level metadata, including variable names, labels, and data types.
Data Documentation Initiative (DDI)
An international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML.
Data entry
The process of transferring verbal or written responses to an electronic form, for use by a computer.
Data life cycle
The history of a data collection from initial proposal planning and writing to final dissemination of the data, research findings, and preservation strategies.
Decentering
A model of comparative question design in which two different cultures are asked the same questions but the questions are developed simultaneously in each language. Thus, the process removes culture-specific elements from both versions.
Design effect
The impact of the complex survey design on sampling variance measured as the ratio of the sampling variance under the complex design to the sampling variance computed as a simple random sample.
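The ratio in this definition can be sketched as follows, assuming both sampling variances have already been estimated for the same statistic and the same sample size. The function and argument names are illustrative.

```python
def design_effect(variance_complex: float, variance_srs: float) -> float:
    """Ratio of the sampling variance under the complex design to the
    sampling variance under simple random sampling."""
    if variance_srs <= 0:
        raise ValueError("the simple random sample variance must be positive")
    return variance_complex / variance_srs


# A ratio above 1 means the complex design is less precise than a simple
# random sample of the same size; a ratio below 1 means it is more precise.
deff = design_effect(variance_complex=0.0006, variance_srs=0.0004)
```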
DIF (Differential item functioning)
Item bias as a result of systematic differences in responses across cultures due to features of the item or measure itself, such as poor translation or ambiguous wording.
Disclosure analysis
The process of protecting the confidentiality of data. It involves limiting the amount of detailed information disseminated and/or masking data via noise addition, data swapping, generation of simulated or synthetic data, etc.
Disposition code
A code that indicates the result of a specific call attempt or the outcome assigned to a sample element at the end of data collection (e.g., noncontact, refusal, ineligible, complete interview).
Double-barreled (questions)
Survey questions that inadvertently ask about two objects at once.
Editing
Altering data recorded by the interviewer or respondent to improve the quality of the data (e.g., checking consistency, correcting mistakes, following up on suspicious values, deleting duplicates, etc.). Sometimes this term also includes coding and imputation, or the placement of a number into a field where data were missing.
Element
A single unit of the sampling frame.
Eligibility rate
The number of eligible sample elements divided by the total number of elements on the sampling frame.
Emic (culture-specific) question
A question based on concepts or constructs that are culture-specific in constellation or significance and cannot be assumed to be shared across populations.
Ethics review committee or human subjects review board
A group or committee that is given the responsibility by an institution to review that institution's research projects involving human subjects. The primary purpose of the review is to assure the protection of the safety, rights and welfare of the human subjects.
Etic (common) question
A question based on concepts or constructs that are universal and shared across cultures.
Ex-ante
The process of creating harmonized variables at the outset of data collection, based on using the same questionnaire or agreed definitions in the harmonization process.
Ex-post
The process of creating harmonized variables from data that already exists.
Fact sheet
A sheet, pamphlet, or brochure that provides important information about the study to assist respondents in making an informed decision about participation. Elements of a fact sheet may include the following: the purpose of the study, sponsorship, uses of the data, role of the respondent, sample selection procedures, benefits and risks of participation, and confidentiality.
Factual question
A question in which a true value exists for a particular respondent.
Field pilot study
Small scale rehearsals of the data collection conducted before the main survey.
Fixed panel design
A longitudinal study which attempts to collect survey data on the same sample elements at intervals over a period of time. After the initial sample selection, no additions to the sample are made.
Fixed panel plus births design
A longitudinal study in which a panel of individuals is interviewed at intervals over a period of time and additional elements are added to the sample.
Focus group
Small group discussions under the guidance of a moderator, often used in qualitative research, that can also be used to test survey questionnaires and survey protocols.
Full translation
Each translator translates all the material to be translated.
Gross sample
All eligible and ineligible elements of a sample.
Hadamard matrix
Square arrays of + and - that define balanced half samples. Such matrices exist for any multiple of four. A plus [+] means keep the first PSU and a minus [-] means keep the second PSU in the stratum. For example, a half sample defined by the row + + + - keeps the first PSU in strata 1, 2, and 3 and the second PSU in stratum 4.
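The sketch below builds such a matrix with the Sylvester doubling construction, which is one standard construction and only covers orders that are powers of two; it is not necessarily the matrix the Guidelines used. It then translates a row of signs into the PSU kept in each stratum. The function names are illustrative, with +1 standing for + and -1 for -.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of +1/-1 entries by repeated doubling.

    Assumption: `order` is a power of two (Sylvester construction only).
    """
    if order < 1 or order & (order - 1):
        raise ValueError("this construction needs an order that is a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        top = [row + row for row in matrix]
        bottom = [row + [-value for value in row] for row in matrix]
        matrix = top + bottom
    return matrix


def half_sample(row):
    """Map one row of signs to the PSU (1 or 2) kept in each stratum."""
    return [1 if sign > 0 else 2 for sign in row]


rows = sylvester_hadamard(4)
kept_psus = [half_sample(row) for row in rows]
```

Any two distinct rows are orthogonal (their elementwise products sum to zero), which is the property that makes the resulting half samples balanced.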
Half open interval
A method of updating lists of addresses by adding previously omitted units to the sample when the units are identified geographically next to a selected unit.
Hours Per Interview (HPI)
A measure of study efficiency, calculated as the total number of interviewer hours spent during production (including travel, reluctance handling, listing, completing an interview, and other administrative tasks) divided by the total number of interviews.
Imputation
Computational methods that assign one or more estimated answers for each item that previously had missing, incomplete or implausible data.
Inconsistent responses
Inappropriate responses to branched questions. For instance, one question might ask if the respondent attended church last week; a response of "no" should skip the questions about church attendance and code the answers to those questions as "inapplicable." If questions after church attendance were coded any other way than "inapplicable," this would be inconsistent with the skip patterns of the survey instrument.
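A check for this kind of inconsistency might look like the sketch below. The record layout, the variable names (`attended_church`, `services_attended`), and the `"inapplicable"` code are hypothetical and chosen only to mirror the church attendance example.

```python
INAPPLICABLE = "inapplicable"  # assumed code for skipped questions


def find_skip_inconsistencies(records, filter_variable, skip_value, follow_ups):
    """Return (record index, variable) pairs where a follow-up question was
    answered even though the filter answer should have skipped it."""
    problems = []
    for index, record in enumerate(records):
        if record[filter_variable] == skip_value:
            for variable in follow_ups:
                if record[variable] != INAPPLICABLE:
                    problems.append((index, variable))
    return problems


records = [
    {"attended_church": "yes", "services_attended": 2},
    {"attended_church": "no", "services_attended": INAPPLICABLE},
    {"attended_church": "no", "services_attended": 1},  # inconsistent
]
issues = find_skip_inconsistencies(
    records, "attended_church", "no", ["services_attended"]
)
```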
Interface design
Aspects of computer-assisted survey design focused on the interviewer's or respondent's experience and interaction with the computer and instrument.
Interviewer design effect (Deffint)
The extent to which interviewer variance increases the variance of the sample mean of a simple random sample.
Interviewer effect
Measurement errors, both systematic and variable, for which interviewers are responsible.
Interviewer falsification
Intentionally departing from the interviewer guidelines in a way that could result in the contamination of the data. Falsification includes: 1) Fabricating all or part of an interview, that is, recording data that are not provided by a designated survey respondent and reporting them as answers of that respondent; 2) Deliberately misreporting disposition codes and falsifying process data (e.g., recording a refusal case as ineligible for the sample; reporting a fictitious contact attempt); 3) Deliberately miscoding the answer to a question in order to avoid follow-up questions; 4) Deliberately interviewing a nonsampled person in order to reduce the effort required to complete an interview; or otherwise intentionally misrepresenting the data collection process to the survey management.
Interviewer variance
That component of overall variability in survey statistics that can be accounted for by the interviewers.
IRT (Item response theory)
A theory that guides statistical techniques used to detect survey or test questions that have item bias or differential response functioning (see DIF).
Item-missing data
The absence of information on individual data items for a sample case successfully measured on other items.
Loaded questions/words
Questions that are worded in a way that invites respondents to respond in a particular way.
Majority country
A country with low per capita income or developing country (the majority of countries).
Mean square error (MSE)
The total error of a survey statistic; specifically, the sum of the variance and the bias squared.
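The decomposition in this definition can be written out directly, assuming the variance and the bias of the statistic are known or have been estimated. The function name is illustrative.

```python
def mean_square_error(variance: float, bias: float) -> float:
    """Total error of a statistic: its variance plus its squared bias."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    return variance + bias ** 2


# The sign of the bias does not matter because the bias is squared.
mse = mean_square_error(variance=4.0, bias=-3.0)
```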
Measurement equivalence
Equivalence of the calibration system used in the questionnaire and the translation.
Measurement error
Survey error (variance or bias) due to the measurement process; that is, error introduced by the survey instrument, the interviewer, or the respondent.
Metadata
Data that describes other data. The term encompasses a broad spectrum of information about the survey, from study title to sample design, details such as interviewer briefing notes, contextual data and/or information such as legal regulations, customs, and economic indicators.
Microdata
Data about variables within a behavioral unit, such as an individual or a corporation. Microdata is often contrasted with aggregate data, which is about groups of behavioral units, such as individuals grouped by race, sex, or class, or corporations grouped by economic sector.
Minority country
A country with high per capita income or developed country (the minority of countries).
Mode
Method of data collection.
MTMM (Multi-trait multi-method)
A structural equation modeling technique used for construct validation and testing the equivalence of measures in cross-cultural research (the constructs are the "traits" and the cultures are the "methods").
Nomenclatures
Set of code numbers.
Noncontact rate
The proportion of cases selected in a sample that could not be reached.
Non-interview
A sample element is selected, but an interview does not take place (for example, due to noncontact, refusal, or ineligibility).
Nonresponse
A failure to elicit responses from sample persons due to lack of contact or cooperation.
Nonresponse bias
Bias that is introduced when not all sample members participate in the survey and those that do not (the nonrespondents) differ from the respondents on the measure of interest.
Nonresponse error
Error (variance or bias) that is introduced when not all sample members participate in the survey (unit nonresponse) or not all survey items are answered (item nonresponse) by a sample member.
Open-ended question
A survey or interview format that allows respondents to answer questions in their own words. Unlike a closed question format, it does not provide a limited set of predefined answers.
Outcome rate
Response rate, refusal rate, or noncontact rate.
Outlier
An atypical observation which does not appear to follow the distribution of the rest of a dataset.
Overediting
Extensive editing that becomes too financially costly for the amount of error that is being reduced.
Paradata
Process data collected during data collection, such as call records, interviewer observations, etc.
Pilot study
A pretesting technique that involves all procedures and materials that will be involved in data collection; a dress rehearsal before the actual data collection begins.
Pledge of confidentiality
An agreement (typically in written form) to maintain the confidentiality of survey data that is signed by persons involved in data collection, post-survey processing or analysis.
'Portable' file
A file that can be used by a variety of software packages on a variety of hardware platforms.
Poststratification
A statistical adjustment that assures that sample estimates of totals or percentages (e.g., the estimate of the percentage of men living in Mexico based on the sample) equal population totals or percentages (e.g., the estimate of the percentage of men living in Mexico based on Census data). The adjustment cells for poststratification are formed in a similar way as strata in sample selection, but variables can be used that were not on the original sampling frame at the time of selection.
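A minimal sketch of the adjustment, assuming one set of non-overlapping adjustment cells and that each cell's weighted sample total is positive: every case's weight is multiplied by its cell's ratio of population total to weighted sample total. The cell labels and figures are made up for illustration.

```python
def poststratification_factors(weighted_sample_totals, population_totals):
    """Return, per adjustment cell, the factor that scales the weighted
    sample total to the known population total."""
    factors = {}
    for cell, population_total in population_totals.items():
        sample_total = weighted_sample_totals[cell]
        if sample_total <= 0:
            raise ValueError(f"cell {cell!r} has no weighted sample cases")
        factors[cell] = population_total / sample_total
    return factors


# Hypothetical totals: the sample under-represents men.
sample_totals = {"men": 400.0, "women": 600.0}
census_totals = {"men": 490.0, "women": 510.0}
factors = poststratification_factors(sample_totals, census_totals)
adjusted = {cell: sample_totals[cell] * factors[cell] for cell in factors}
```

After the adjustment, the weighted sample totals in each cell match the population totals.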
Post-survey adjustments
Adjustments to reduce the impact of error on estimates.
Precoding
Determining, when the questionnaire and survey instrument are designed, the coding conventions and formats of survey items (especially closed-ended questions) based on existing coding frames or prior knowledge of the survey population.
Prescribed (behaviors)
Interviewer behaviors that must be carried out exactly as specified.
Pretesting
A collection of techniques and activities that allow researchers to evaluate survey questions and/or survey procedures before data collection begins.
Primacy
Context effects in which the placement of the item at the beginning of a list of response options increases the likelihood that it will be selected by the respondent.
Primary sampling unit (PSU)
A unit sampled at the first stage of selection.
Probability proportional to size
A sampling method that changes "the first- and second- stage selection chances in such a way that when multiplied together the probability is equal for every element, and the sample size is the same from one sample to the next."
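The equal-probability property in the quoted definition can be checked numerically. The sketch assumes a two-stage design in which a fixed number of PSUs is drawn with probability proportional to each PSU's size and a fixed number of elements is then drawn within each selected PSU; all names and figures are illustrative.

```python
def overall_selection_probability(psus_selected, psu_size, total_size, take_per_psu):
    """Element selection probability under two-stage PPS sampling."""
    first_stage = psus_selected * psu_size / total_size  # PSU chance, grows with size
    second_stage = take_per_psu / psu_size               # element chance, shrinks with size
    if first_stage > 1 or second_stage > 1:
        raise ValueError("a stage probability exceeds 1; adjust the design")
    return first_stage * second_stage


# PSU size cancels out, so elements in a small and a large PSU have
# (up to floating-point rounding) the same overall chance of selection.
small = overall_selection_probability(10, 200, 10_000, 20)
large = overall_selection_probability(10, 500, 10_000, 20)
```

Because the PSU size cancels, the overall probability reduces to (PSUs selected x take per PSU) / total size, and the sample size is the same from one sample to the next.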
Probability sample
A sample in which every element of the target population has a known, non-zero probability of being selected.
Process indicator
An indicator that refers to aspects of data collection (e.g., HPI, refusal rates, etc.).
Progress indicator
An indicator that refers to aspects of reaching the goal (e.g., number of complete interviews).
Proxy interview
An interview with anyone other than the person about whom information is being sought (e.g., parent, spouse).
Public use data files
A data file, stripped of respondent identifiers, that is distributed for the public to analyze.
Quality assurance
Statement of confidence that quality requirements will be fulfilled.
Quality control
Process focused on fulfilling quality requirements.
Quota sampling
A non-probability sampling method that sets specific sample size quotas or target sample sizes for subclasses of the target population. The sample quotas are generally based on simple demographic characteristics (e.g., quotas for gender, age groups and geographic region subclasses).
Random-digit-dialing (RDD)
A method of selecting telephone numbers in which the target population consists of all possible telephone numbers, and all telephone numbers have an equal probability of selection.
Randomized response technique (RRT)
A technique to reduce social desirability bias and item nonresponse due to sensitive questions. In this technique, the interviewer asks the respondent two questions—a sensitive question and a question believed to be not sensitive; both questions contain the same response options. One of these questions is randomly selected, but the interviewer is not aware of the outcome of the selection; thus, the impact of the interviewer on the response to the sensitive question is minimized.
Recency
Context effects in which the placement of the item at the end of a list of response options increases the likelihood that it will be selected by the respondent.
Recontact
Having another staff member (often a supervisor) attempt to speak with the respondent after the interview is reported, in order to verify that the interview was completed according to the specified protocol.
Refusal rate
The proportion of all sample elements for which a housing unit or potential respondent refuses to take part in the study.
Reinterview
The process or action of interviewing the same respondent twice to assess reliability (simple response variance).
Reluctance aversion (techniques), reluctance handling
Techniques that can reduce potential respondents' reluctance to participate, thereby increasing the overall response rate.
Repeated panel design
A series of fixed panel surveys that may or may not overlap in time. Generally, each panel is designed to represent the same target population definition applied at a different point in time.
Replicates
Probability subsamples of the full sample design.
Residency rule
A rule to help interviewers determine which persons to include in the household listing, based on what the informant reports.
Response distributions
A description of the response values and the probability with which each was selected.
Response latency
A method of examining potential problems in responding to particular items, measured by the time between the interviewer asking a question and the response.
Response rate
The number of completed interviews divided by the total estimated number of eligible sample persons.
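As a minimal sketch of this ratio, assuming the number of eligible sample persons has already been estimated (for example, from known eligible cases plus an estimated share of cases with unknown eligibility). The function name is illustrative.

```python
def response_rate(completed_interviews: int, estimated_eligible: float) -> float:
    """Completed interviews divided by the estimated number of eligible
    sample persons."""
    if estimated_eligible <= 0:
        raise ValueError("the estimated number of eligible persons must be positive")
    if completed_interviews > estimated_eligible:
        raise ValueError("completed interviews cannot exceed eligible persons")
    return completed_interviews / estimated_eligible


rate = response_rate(completed_interviews=600, estimated_eligible=1000)
```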
Restricted-use data files
A file that includes individually identifiable information that is confidential and may be protected by law. Restricted-use data files are not required to include variables that have undergone coarsening disclosure risk edits. These files are available to researchers under controlled conditions.
Rotating panel design
A study where elements are repeatedly measured a set number of times, then replaced by new randomly chosen elements. Typically, the newly-chosen elements are also measured repeatedly for the appropriate number of times.
Sample design
Information on the target and final sample sizes, strata definitions and the sample selection methodology.
Sample element or sample line
A selected unit of the target population that may be eligible or ineligible.
Sample management system
A computerized and/or paper-based system used to assign and monitor sample cases and record documentation for sample records (e.g., time and outcome of each contact attempt).
Sample persons
Persons selected from a sampling frame to participate in a particular survey.
Sampling bias
The systematic difference between the expected value (over all conceptual trials) of an unweighted sample estimate and the target population value.
Sampling error computational units (SECUs)
PSUs in 'one PSU per stratum' sampling designs that are grouped in pairs, after data collection, for purposes of estimating approximate sampling variances.
Sampling frame/sample frame
Lists or materials used to identify all elements (e.g., persons, households, establishments) of a survey population from which the sample will be selected. These lists or materials can include maps of areas in which the elements can be found, lists of members of a professional association and registries of addresses or persons.
Sampling units
Elements or clusters of elements considered for selection in some stage of sampling. For a sample with only one stage of selection, the sampling units are the same as the elements. In multi-stage samples (e.g., enumeration areas, then households within selected enumeration areas, and finally adults within selected households), different sampling units exist, while only the last is an element. The term primary sampling units (PSUs) refers to the sampling units chosen in the first stage of selection. The term secondary sampling units (SSUs) refers to sampling units within the PSUs that are chosen in the second stage of selection.
Sampling variance
A measure of the variability of the sample estimates of a population parameter, if all possible samples of the same size were selected from the sampling frame.
Sequential mixed mode
A mixed mode design in which additional modes are offered as part of a nonresponse follow-up program.
Silent monitoring
Monitoring without the interviewer being aware of the monitoring.
Social desirability bias
A tendency for respondents to overreport attributes or attitudes considered socially desirable (e.g. voting) and underreport undesirable attributes or attitudes (e.g. illegal behavior).
Socio-demographic questions
Background questions about respondent characteristics such as age, marital status, employment status, and education.
Source language
The language in which a questionnaire is available from which a translation is made. This is usually but not always the language in which the questionnaire was designed.
Source variables
Original variables chosen as part of the harmonization process.
Split panel design
A design that contains a blend of cross-sectional and panel samples at each new wave of data collection.
Split translation
Each translator translates only a part of the total material to be translated.
Standardized interviewing technique
An interviewing technique in which interviewers read every question exactly as worded, cannot interpret questions or responses, and cannot offer much in the way of clarification if it is not scripted.
Statistical data
Data from a survey or administrative source used to produce statistics.
Statistical process control charts
Charts that use statistical techniques to identify problems in processes and opportunities for improvement of processes.
Strata
Non-overlapping groups that comprise all of the elements on the sampling frame.
Stratification
A sample design that divides the sampling frame into mutually exclusive and exhaustive strata and places each element on the frame into one of the strata. Independent selections are then made from each stratum, one by one, to ensure representation of subgroups of the population in the sample.
Stratum
A mutually exclusive group of elements on a sampling frame.
Substitution
A technique where each nonresponding sample element from the initial sample is replaced by another element of the target population, typically not an element selected in the initial sample.
Survey data
Information collected by researchers, which encompasses any measurement procedures that involve asking questions of respondents.
Survey error
The total error of a survey statistic; specifically, the sum of the variance and the bias squared.
Survey estimate
The value yielded by a survey.
Survey population
The actual population from which the survey data are collected, given the restrictions from data collection operations.
Survey weight
A statistical adjustment created to compensate for complex survey designs with features including, but not limited to, unequal likelihoods of selection, differences in response rates across key subgroups, and deviations from distributions on critical variables found in the target population from external sources, such as a national Census.
Surveyspeak
The special features of survey language (lower pronominal anaphor, for example) as found in source and target language questionnaires.
Tailor(ing)
The practice of adapting interviewer behavior to the respondent's expressed concerns and other cues, in order to provide feedback to the respondent that addresses his or her perceived reasons for not wanting to participate.
Target language
The language a questionnaire is translated into.
Target population
The finite population for which the survey sponsor wants to make inferences using the sample statistics.
Target variables
Variables created during the harmonization process.
Top coding
A type of coding in which values that exceed a predetermined maximum value are reassigned to that maximal value or are recoded as missing data.
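Both variants named in this definition (reassigning to the maximum, or recoding as missing) can be sketched as below. Using `None` as the missing-data marker is an assumption made for illustration, as are the function name and the income figures.

```python
from typing import List, Optional


def top_code(values: List[float], maximum: float,
             as_missing: bool = False) -> List[Optional[float]]:
    """Reassign values above `maximum` to `maximum`, or to None (missing)
    when `as_missing` is True. Other values are left unchanged."""
    replacement = None if as_missing else maximum
    return [replacement if value > maximum else value for value in values]


incomes = [25_000, 90_000, 1_200_000]
capped = top_code(incomes, maximum=150_000)
masked = top_code(incomes, maximum=150_000, as_missing=True)
```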
Transformation algorithms
Changing all the values of a variable by using some mathematical operation.
Translation reviewer
In a team translation procedure, the person knowledgeable about surveys and also translation who leads and coordinates the review sessions held to refine draft translations.
Trusted digital repository
A repository whose mission is to provide reliable, long-term access to managed digital resources to its designated community, both now and in the future.
Undocumented codes
Codes that are not authorized for a particular question. For instance, if a question that records the sex of the respondent has documented codes of "1" for female and "2" for male and "9" for "missing data," a code of "3" would be an "undocumented code."
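A check for undocumented codes can be sketched as a set difference between the codes found in the data and the documented codes. The function name is illustrative; the codes mirror the example in the definition.

```python
def find_undocumented_codes(observed_values, documented_codes):
    """Return the sorted codes that appear in the data but are not in the
    set of documented codes for the question."""
    return sorted(set(observed_values) - set(documented_codes))


# Documented codes for sex of respondent: 1 = female, 2 = male, 9 = missing.
sex_codes = {1, 2, 9}
undocumented = find_undocumented_codes([1, 2, 2, 9, 3, 1], sex_codes)
```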
Unit nonresponse
A sample case that has little or no information because the individual declined the invitation to participate in the survey. Also known as a nonrespondent.
Universe
Another term for population. A group of persons (or institutions, events, or other subjects of study) that one wishes to describe or about which one wishes to generalize. To generalize about a population, one often studies a sample that is meant to be representative of the population.
Usability evaluation
Evaluation of a computer-assisted survey instrument to assess the impact of design on interviewer or respondent performance. Methods of evaluation include review by usability experts and observation of users working with the computer and survey instrument.
Vignettes
Brief stories/scenarios describing hypothetical situations or persons and their behaviors to which respondents are asked to react in order to allow the researcher to explore contextual influences on respondents' response formation processes.
Weighting
A post-survey adjustment that may account for differential coverage, sampling, and/or nonresponse processes.
Working group
Experts working together to oversee the implementation of a particular aspect of the survey lifecycle (e.g., sampling, questionnaire design, training, quality control, etc.).
XML (eXtensible Markup Language)
The eXtensible Markup Language (XML) is a simple dialect of SGML. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML was designed for ease of implementation and for interoperability with both SGML and HTML.

© 2008 The authors of the Guidelines hold the copyright. Please contact us if you wish to publish any of this material in any form.