XIV. Dissemination of Survey and Statistical Data

Webpage last modified: 2008-Sep-15

Introduction

Dissemination of survey and statistical data requires careful consideration of several aspects of the process of making data and documentation files available to secondary analysts. More is involved in the dissemination process than merely sending files stored on removable media to interested researchers, or putting files up on a server for others to download. Data producers and archives must assure analysts that the data they provide accurately reflects the efforts of the data collection process and is trustworthy, fully documented, and securely preserved for future use. Many international organizations also embrace these objectives. The International Monetary Fund, for example, although focused on macro-economic data, established a set of guidelines for member countries to follow in order to provide the public with "comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data" [3].

Guidelines

Goal: To ensure that survey and statistical research teams in all countries involved in a project follow accepted standards for the preservation and dissemination of data to members of the social science research community.

  1. Preserve copies of all key data and documentation files produced at the end of the data collection process, as well as those made available for secondary analyses.
    Rationale
    Procedural steps
    Lessons learned
  2. Conduct effective disclosure analysis to protect respondent confidentiality.
    Rationale
    Procedural steps
    Lessons learned
  3. Consider the production of both public- and restricted-use data files.
    Rationale
    Procedural steps
    Lessons learned
  4. Produce data files that are easy for researchers to use.
    Rationale
    Procedural steps
    Lessons learned
  5. Develop finding aids to guide users in their quest to locate data collections they want to use.
    Rationale
    Procedural steps
    Lessons learned
  6. Create comprehensive training, outreach, and user support programs to inform the research community about the dataset.
    Rationale
    Procedural steps
    Lessons learned
  7. Produce comprehensive documentation for all public-use data files.
    Rationale
    Procedural steps
    Lessons learned
  8. Make quality control an integral part of all dissemination steps.
    Rationale
    Procedural steps
    Lessons learned

Glossary

ASCII files
Data files in American Standard Code for Information Interchange (ASCII) format.
Bottom coding
A type of coding in which values that fall below the predetermined minimum value are reassigned to that minimum value or are recoded as missing data.
Confidentiality
Securing the identity of, as well as any information provided by, the respondent, in order to ensure to the greatest extent possible that public identification of an individual participating in the study and/or his or her individual responses does not occur.
Constructed variable
A recoded variable, one created by data producers or archives based on the data originally collected. An example might be the creation of a variable called POVERTY from information collected on the income of respondents.
Data Documentation Initiative (DDI)
An international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML.
Data life cycle
The history of a data collection from initial proposal planning and writing to final dissemination of the data, research findings, and preservation strategies.
Disclosure analysis
The process of protecting the confidentiality of data. It involves limiting the amount of detailed information disseminated and/or masking data via noise addition, data swapping, generation of simulated or synthetic data, etc.
Inconsistent responses
Inappropriate responses to branched questions. For instance, one question might ask if the respondent attended church last week; a response of "no" should skip the questions about church attendance and code the answers to those questions as "inapplicable." If those questions were coded any other way than "inapplicable," this would be inconsistent with the skip patterns of the survey instrument.
Metadata
Data that describes other data. The term encompasses a broad spectrum of information about the survey, from study title to sample design, details such as interviewer briefing notes, contextual data and/or information such as legal regulations, customs, and economic indicators.
Microdata
Data about variables within a behavioral unit, such as an individual or a corporation. Microdata is often contrasted with aggregate data, which is about groups of behavioral units, such as individuals grouped by race, sex, or class, or corporations grouped by economic sector.
Missing data
The lack of information on individual data items for a sample element where other data items were successfully obtained.
'Portable' file
A file that can be used by a variety of software on a variety of hardware platforms.
Public-use data files
Data files, stripped of respondent identifiers, that are distributed for the public to analyze.
Restricted-use data files
Data files that include individually identifiable information that is confidential and protected by law. Restricted-use data files are not required to include variables that have undergone coarsening disclosure risk edits. These files are available to researchers under controlled conditions.
Statistical data
Data from a survey or administrative source used to produce statistics.
Survey data
Information collected by researchers which encompasses any measurement procedures that involve asking questions of respondents.
Top coding
A type of coding in which values that exceed the predetermined maximum value are reassigned to that maximum value or are recoded as missing data.
Trusted digital repository
A repository whose mission is to provide reliable, long-term access to managed digital resources to its designated community, both now and in the future.
Undocumented codes
Codes that are not authorized for a particular question. For instance, if a question that records the sex of the respondent has documented codes of "1" for female and "2" for male and "9" for "missing data," a code of "3" would be an "undocumented code."
XML (eXtensible Markup Language)
The eXtensible Markup Language (XML) is a simple dialect of SGML. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML was designed for ease of implementation and for interoperability with both SGML and HTML.
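
Several of the glossary entries above describe recoding and checking operations that can be illustrated in a few lines of code. The following Python sketch is only an illustration under stated assumptions, not a procedure prescribed by these guidelines: the function names, the thresholds, and the sample values are all hypothetical. It shows top coding and bottom coding (using the "reassign to the limit" option rather than recoding as missing data), a check for undocumented codes, and a check for inconsistent responses to a branched question.

```python
def top_code(values, maximum):
    """Reassign every value above `maximum` to `maximum` (top coding)."""
    return [min(v, maximum) for v in values]


def bottom_code(values, minimum):
    """Reassign every value below `minimum` to `minimum` (bottom coding)."""
    return [max(v, minimum) for v in values]


def undocumented_codes(values, documented):
    """Return, in sorted order, the codes that appear in the data
    but are not listed among the documented codes."""
    return sorted(set(values) - set(documented))


def inconsistent_cases(filter_answers, follow_up_answers,
                       skip_value, inapplicable):
    """Return the positions of cases where the filter answer should have
    triggered a skip, yet the follow-up is not coded as inapplicable."""
    return [
        i
        for i, (f, u) in enumerate(zip(filter_answers, follow_up_answers))
        if f == skip_value and u != inapplicable
    ]


# Hypothetical sample data, for illustration only.
incomes = [12000, 48000, 250000, 900000]
print(top_code(incomes, 150000))       # [12000, 48000, 150000, 150000]

ages = [15, 17, 34, 90]
print(bottom_code(ages, 18))           # [18, 18, 34, 90]

# Documented codes: 1 = female, 2 = male, 9 = missing data.
sex = [1, 2, 2, 3, 9]
print(undocumented_codes(sex, {1, 2, 9}))   # [3]

attended = ["no", "yes", "no"]
frequency = ["inapplicable", "weekly", "weekly"]
print(inconsistent_cases(attended, frequency, "no", "inapplicable"))  # [2]
```

In practice, checks like these would be run against the full data file before release, and any cases they flag would be reviewed rather than changed automatically.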

References

[1] Data Documentation Initiative (DDI). Retrieved Sept. 15, 2008 from http://www.ddialliance.org/

[2] HIPAAps. Examples of Privacy Violations. Retrieved Sept. 15, 2008 from http://www.hipaaps.com/examples.html

[3] International Monetary Fund’s Dissemination Standards Bulletin Board. Retrieved Sept. 15, 2008 from http://dsbb.imf.org/Applications/web/dsbbhome/

[4] Inter-University Consortium for Political and Social Research (ICPSR). Data Sharing for Demographic Research. Retrieved Sept. 15, 2008 from http://www.icpsr.umich.edu/DSDR/rduc/

[5] National Digital Archive of Datasets (NDAD). Retrieved Sept. 15, 2008 from http://www.ndad.nationalarchives.gov.uk/

[6] O'Rourke, J. M., Roehrig, S., Heeringa, S. G., Reed, B. G., Birdsall, W. C., Overcashier, M., et al. (2006). Solving problems of disclosure risk while retaining key analytic uses of publicly released microdata. Journal of Empirical Research on Human Research Ethics, 1(3), 63-84.

[7] Royal Statistical Society & the UK Data Archive. (2002). Preserving & sharing statistical material. UK Data Archive: Essex. Retrieved Sept. 15, 2008 from http://www.data-archive.ac.uk/news/publications/PreservingSharing.pdf

[8] Sicinski, A. (1970). "Don't Know" answers in cross-national surveys. The Public Opinion Quarterly, 34(1), 126-129.

[9] Van Diessen, R. & Steenbergen, J. (2002). Long Term Preservation Study of the DNEP Project — an Overview of the Results. Amsterdam: IBM Netherlands. Retrieved Sept. 15, 2008 from http://www-05.ibm.com/nl/dias/resource/overview.pdf

Further Reading

Allum, P. & Agca, M. (2001). Economic Data Dissemination: What Influences Country Performance on Frequency and Timeliness? IMF Working Paper No. 01/173. Retrieved Sept. 15, 2008 from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=880222

Handbook on Civil Registration and Vital Statistics Systems. Policies and Protocols for the Release and Archiving of Individual Records. Department of Economic and Social Affairs, United Nations Statistics Division. Handbooks on Civil Registration and Vital Statistics Systems. Studies in Methods Series F, No. 70, 1998. Retrieved Sept. 15, 2008 from http://unstats.un.org/unsd/publication/SeriesF/SeriesF_70E.pdf

International Federation of Data Organizations Data Access and Conditions. Retrieved Sept. 15, 2008 from http://www.ifdo.org/data/data_access_conditions.html

Inter-university Consortium for Political and Social Research (ICPSR). The Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle, Version 3, 2005. Retrieved Sept. 15, 2008 from http://www.icpsr.com/ICPSR/access/dataprep.pdf

The Dataverse Network Project. Retrieved Sept. 15, 2008 from http://thedata.org/

United Nations Statistics Division. Retrieved Sept. 15, 2008 from http://unstats.un.org/unsd/default.htm

© 2008 The authors of the Guidelines hold the copyright. Please contact us if you wish to publish any of this material in any form.