Social Science Dataset Preparation Services

Over the last 18 years, Sociometrics has developed a comprehensive set of procedures for preparing and archiving large social and behavioral science datasets. These procedures give users of secondary data substantial added value and ensure the highest data quality. Our data preparation and archiving systems use database applications, statistical programs, and proprietary executable files that automate most archiving processes and establish data quality standards that very few organizations have achieved.

Sociometrics now offers custom dataset preparation services to researchers and data holders who wish to produce a fully documented version of their dataset. Our data preparation procedures are designed to produce datasets that are suitable for public sharing and distribution. Data prepared with these standardized procedures are easy to understand and use, and are accessible to researchers at all levels of experience. We offer two levels of dataset preparation: 1) basic, which results in a fully documented and easy-to-use dataset (see the description of services below); and 2) comprehensive, which adds variable search and retrieval and data extract capabilities to a basic dataset. The cost of dataset preparation depends chiefly on the number of variables and the level of preparation; see Cost Estimates below.

Dataset Submissions

Submissions should conform to the following criteria:

  1. Dataset file(s) should be available in one of the following formats:
    SPSS Portable
    SAS Unix Dataset and format library
    Raw data file and SPSS or SAS Syntax Statements that define the raw data file
    Raw data file and a machine-readable codebook
  2. The dataset file should include nominal variable labels and value labels for each variable, especially where constructed or calculated variables have been added to the dataset. Datasets with partially documented variables but complete codebooks that define all variables will be evaluated for potential submission.
  3. Paper or machine-readable documentation should accompany the dataset, including: codebook, data collection instrument, and reports detailing the sampling rationale and study description. Additional documents are welcomed (e.g., publications).

Basic Data Preparation Services

  • Data consistency and relational data checks, investigator-specified recodes and other data corrections

  • Evaluation and/or protection of data confidentiality

  • Application of Sociometrics' substantive Topic and Type codes to each variable in a dataset. Topic and Type codes define key substantive areas addressed in a dataset (e.g., Race/Ethnicity, Childbearing, Contraception, Marriage and Cohabitation). Substantive coding indexes variables coded under the same topic, which makes it easier to search for and retrieve variables in large datasets.

    A sample Topic and Type Code list is available from Sociometrics' Data Archive on Adolescent Pregnancy and Pregnancy Prevention.

  • Adding or editing variable and value labels for consistency and accuracy

  • Creation of a new raw data file in standard 80-character record format

  • Creation of SPSS and SAS Syntax Programs that define the raw data file with variable names, variable labels, value labels, missing data declarations, recodes, and formats.

  • Creation of SPSS Portable, SAS Dataset, or STATA Dataset

  • Creation of SPSS Univariate Frequency Output and Data Dictionary

  • Creation of User's Guide to the Machine-Readable Files

  • Creation of PDF and MS Word versions of the codebook, data collection instrument, and other documentation
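
To make two of the items above more concrete, the following is a minimal Python sketch of how a raw data file in 80-character record format and a matching SPSS syntax program might be generated. It is an illustration only, not Sociometrics' proprietary software. The `Variable` class, the function names, the sample variables (CASEID, SEX, AGE), and the file name `study.dat` are all hypothetical. The sketch assumes integer-coded values, right-justified fields, and labels that contain no apostrophes.

```python
from dataclasses import dataclass, field

RECORD_WIDTH = 80  # columns per record in the raw data file


@dataclass
class Variable:
    """Hypothetical description of one variable in the dataset."""
    name: str
    width: int
    label: str = ""
    value_labels: dict = field(default_factory=dict)  # code -> label
    missing: tuple = ()                               # codes declared missing


def assign_columns(variables):
    """Give each variable a (record, start, end) position, 1-based.

    A field is never split across records: if it does not fit in the
    columns left on the current record, it starts a new record.
    """
    layout, record, col = [], 1, 1
    for var in variables:
        if var.width > RECORD_WIDTH:
            raise ValueError(f"{var.name} is wider than one record")
        if col + var.width - 1 > RECORD_WIDTH:
            record, col = record + 1, 1
        layout.append((var, record, col, col + var.width - 1))
        col += var.width
    return layout


def write_case(case, layout):
    """Return the 80-character lines for one case (dict of name -> value)."""
    n_records = layout[-1][1]
    lines = [[" "] * RECORD_WIDTH for _ in range(n_records)]
    for var, record, start, end in layout:
        value = case.get(var.name)
        text = "" if value is None else str(value)  # blank field if absent
        if len(text) > var.width:
            raise ValueError(f"value {text!r} does not fit in {var.name}")
        lines[record - 1][start - 1:end] = text.rjust(var.width)
    return ["".join(line) for line in lines]


def spss_syntax(layout, data_file):
    """Build SPSS syntax that defines the raw data file."""
    n_records = layout[-1][1]
    out = [f"DATA LIST FILE='{data_file}' FIXED RECORDS={n_records}"]
    for rec in range(1, n_records + 1):
        cols = " ".join(
            f"{v.name} {s}" if s == e else f"{v.name} {s}-{e}"
            for v, r, s, e in layout if r == rec
        )
        out.append(f"  /{rec} {cols}")
    out[-1] += "."  # the period terminates the DATA LIST command
    for var, _, _, _ in layout:
        if var.label:
            out.append(f"VARIABLE LABELS {var.name} '{var.label}'.")
        if var.value_labels:
            pairs = " ".join(f"{c} '{t}'" for c, t in var.value_labels.items())
            out.append(f"VALUE LABELS {var.name} {pairs}.")
        if var.missing:
            codes = ", ".join(str(c) for c in var.missing)
            out.append(f"MISSING VALUES {var.name} ({codes}).")
    return "\n".join(out)


# Hypothetical example: three variables and one case.
variables = [
    Variable("CASEID", 4, "Case identifier"),
    Variable("SEX", 1, "Sex of respondent", {1: "Male", 2: "Female"}, (9,)),
    Variable("AGE", 2, "Age in years", missing=(99,)),
]
layout = assign_columns(variables)
records = write_case({"CASEID": 12, "SEX": 2, "AGE": 17}, layout)
syntax = spss_syntax(layout, "study.dat")
print(syntax)
```

Keeping the column layout in one place and deriving both the data lines and the syntax from it is one way to ensure that the syntax program always matches the raw data file it defines.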

Comprehensive Data Preparation Services

  • Linking of instrument pages or codebook pages to variables

  • Application of Sociometrics' Search and Retrieval software interface that allows data users to search an entire dataset, select variables for analysis, and export a list of variables for use in Sociometrics' Data Extract Program. Variable searching may be performed by keyword or substantive topic and type.

  • Application of Sociometrics' Data Extract Program, which uses output from the Search & Retrieval program to produce SPSS or SAS syntax programs for selections of variables from a dataset. Syntax programs created by the Data Extract Program include all of the syntax required to define the variables selected via the Search & Retrieval program: variable names, variable labels, value labels, missing value declarations, recodes, and formats.

    Together, the Search & Retrieval and Data Extract programs allow users to peruse very large datasets (or collections of datasets), select a subset of variables, and then create complete SPSS or SAS syntax for that subset. By letting users create smaller extracts of very large datasets, the two programs facilitate analysis on personal computing systems.
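
The search-then-extract workflow described above can be sketched in a few lines of Python. This is a hedged illustration of the general idea, not the actual Search & Retrieval or Data Extract software. The catalog structure, the function names, the variable names, the column positions, and the file name `study.dat` are all assumptions made for the example; the topic names are taken from the examples given earlier.

```python
# Hypothetical variable catalog: name, label, topic code, and the
# (start, end) columns the variable occupies in the raw data file.
CATALOG = [
    {"name": "CASEID", "label": "Case identifier",
     "topic": None, "cols": (1, 4)},
    {"name": "RACE", "label": "Race/ethnicity of respondent",
     "topic": "Race/Ethnicity", "cols": (5, 5)},
    {"name": "NBIRTHS", "label": "Number of live births",
     "topic": "Childbearing", "cols": (6, 7)},
    {"name": "AGE1BRTH", "label": "Age at first birth",
     "topic": "Childbearing", "cols": (8, 9)},
    {"name": "PILLUSE", "label": "Ever used birth control pill",
     "topic": "Contraception", "cols": (10, 10)},
]


def search(catalog, keyword=None, topic=None):
    """Return names of variables matching a keyword and/or a topic.

    The keyword is matched case-insensitively against the variable label.
    When both criteria are given, a variable must satisfy both.
    """
    hits = []
    for var in catalog:
        if keyword is not None and keyword.lower() not in var["label"].lower():
            continue
        if topic is not None and var["topic"] != topic:
            continue
        hits.append(var["name"])
    return hits


def extract_syntax(catalog, selected, data_file):
    """Build SPSS syntax that reads only the selected variables."""
    wanted = set(selected)
    chosen = [v for v in catalog if v["name"] in wanted]
    unknown = wanted - {v["name"] for v in chosen}
    if unknown:
        raise KeyError(f"not in catalog: {sorted(unknown)}")
    fields = " ".join(
        f"{v['name']} {v['cols'][0]}" if v["cols"][0] == v["cols"][1]
        else f"{v['name']} {v['cols'][0]}-{v['cols'][1]}"
        for v in chosen
    )
    lines = [f"DATA LIST FILE='{data_file}' FIXED", f"  / {fields}."]
    lines += [f"VARIABLE LABELS {v['name']} '{v['label']}'." for v in chosen]
    return "\n".join(lines)


# Search by topic, then generate syntax for just those variables.
selection = search(CATALOG, topic="Childbearing")
print(extract_syntax(CATALOG, selection, "study.dat"))
```

Because the generated syntax names only the selected columns, the statistical package reads a small extract even when the underlying raw data file is very large. A fuller version would also carry value labels, missing value declarations, recodes, and formats through to the extract.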

Cost Estimates

The cost of data preparation is determined when services are discussed. It is calculated from the following elements: the number of variables in the dataset; the file complexity; the level of archiving desired (basic or comprehensive); whether existing documentation such as codebooks, instruments, and methodological descriptions is sufficient; and whether supporting documentation must be digitized or scanned.

Contact Information

For further information or to confirm or request an estimate, contact:

Sociometrics
650.949.3282
650.949.3299 (fax)
socio@socio.com
Sociometrics Corporation
170 State Street, Suite 260
Los Altos, California 94022