Contents of the downloadable zip or gzip dataset files, including basic file naming conventions

Individual files within a zipped or gzipped file are named using a combination of identifiers. The extension of a file indicates its content.

Filenames are a combination of: Archive Short Name + Data Set # + . + Extension

Archive Short Name is a three-letter name or acronym for the archive:
AGE  Data Archive on Social Research on Aging (DASRA)
CAM  Complementary and Alternative Medicine Data Archive (CAMDA)
CDA  Contextual Data Archive (CDA)
CWP  Child Well-Being and Poverty Archive (CWPDA)
DAP  Data Archive on Adolescent Pregnancy and Pregnancy Prevention (DAAPPP)
FAM  American Family Data Archive (AFDA)
MDA  Maternal Drug Abuse Data Archive (MDA)
PET  Promoting Evaluation, Teaching, and Research on AIDS Data Archive (PETRA)
RAD  Research Archive on Disability in the U.S. (RADIUS)
STD  AIDS/STD Data and Instrument Archive (AIDS)

Data Set # is a 2- or 4-character identifier, with leading zeros if numeric:
0103  Data set 01-03
04    Data set 04
A1    Data set A1
AA    Data set AA

Extension indicates the type of file contents:
RAW         Raw data file
DAT         Raw data file (for some datasets produced before 1996)
SPS         SPSS syntax program - Windows and UNIX (for all datasets produced after 2003 and Add Health Wave III and merged)
SPX         SPSS syntax program - UNIX
SPW         SPSS syntax program - Windows
SPC         SPSS syntax program - PC+ (for datasets produced before 1996)
SAS         SAS syntax program
DIC         SPSS dictionary file
FRQ         SPSS frequency/statistics file
POR         SPSS Portable system file
TPT or XPT  SAS Transport file (for datasets produced after 2003 and Add Health Wave III and merged)
PDF         User's Guides, Instruments, and Codebooks are in PDF format
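
As an illustration of the naming convention above, the following Python sketch splits a filename into its three parts. It is only a sketch: the function name parse_filename and the lookup tables are assumptions made for this example, not part of any Sociometrics tool, and some files (for example PDF guides) may not follow the pattern exactly.

```python
import re

# Archive short names, taken from the list above.
ARCHIVES = {"AGE", "CAM", "CDA", "CWP", "DAP", "FAM", "MDA", "PET", "RAD", "STD"}

# Extension -> short description, abbreviated from the list above.
EXTENSIONS = {
    "RAW": "Raw data file",
    "DAT": "Raw data file (older datasets)",
    "SPS": "SPSS syntax program",
    "SPX": "SPSS syntax program - UNIX",
    "SPW": "SPSS syntax program - Windows",
    "SPC": "SPSS syntax program - PC+",
    "SAS": "SAS syntax program",
    "DIC": "SPSS dictionary file",
    "FRQ": "SPSS frequency/statistics file",
    "POR": "SPSS Portable system file",
    "TPT": "SAS Transport file",
    "XPT": "SAS Transport file",
    "PDF": "User's Guide, instrument, or codebook",
}

# Three letters, a 2- or 4-character data set number, a dot, a 3-character extension.
_PATTERN = re.compile(
    r"^([A-Za-z]{3})([A-Za-z0-9]{4}|[A-Za-z0-9]{2})\.([A-Za-z0-9]{3})$"
)


def parse_filename(name):
    """Split a product filename into (archive, data set #, extension).

    Hypothetical helper: returns None when the name does not follow
    the convention or uses an unknown archive name or extension.
    """
    match = _PATTERN.match(name.strip())
    if match is None:
        return None
    archive, dataset, ext = (part.upper() for part in match.groups())
    if archive not in ARCHIVES or ext not in EXTENSIONS:
        return None
    return archive, dataset, ext
```

For example, parse_filename("DAP0103.RAW") yields the archive DAP, data set 0103, and extension RAW, which the table above identifies as a raw data file.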

Data set contents are compressed into the following forms:
ZIP  Windows zipped set of all ASCII/DOS product files and Word Documents
EXE  Windows compressed set of all ASCII/DOS product files and Word Documents
TAR  Unix compressed set of all ASCII/DOS product files and Word Documents
GZ   Unix zipped set of all ASCII/DOS product files and Word Documents
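
To see which product files a downloaded ZIP contains, something like the following Python sketch may help. It uses only the standard zipfile module; the function name group_members_by_extension is an assumption for this example, and it covers ZIP downloads only, not the EXE, TAR, or GZ forms.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_members_by_extension(zip_path):
    """Return {EXTENSION: [member names]} for the files in a ZIP archive.

    zip_path may be a path or an open binary file object.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep files only
            # Upper-case the extension so "raw" and "RAW" land in one group.
            ext = PurePosixPath(info.filename).suffix.lstrip(".").upper()
            groups[ext].append(info.filename)
    return dict(groups)
```

The resulting dictionary can be checked against the extension list above, for example to confirm that a raw data file and at least one syntax program are present before starting work.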

Raw data file: Raw data files are named with the extension .RAW. The raw data file is an ASCII file. Many raw data files are too large to open in a word processing or database application. The text format of this file is described in the "data list" section of the SPSS syntax program file and the "Input" section of the SAS syntax program file. Very old datasets from the DAAPPP archive may contain raw data files with the extension .DAT.
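
For users who want to inspect a raw data file without SPSS or SAS, the following Python sketch shows one way to read fixed-column ASCII records line by line. The column layout here (CASEID, AGE, SEX and their positions) is entirely hypothetical; the real positions must be copied from the "data list" or "Input" section of the dataset's own syntax file, and datasets with more than one record per case would need additional handling.

```python
# Hypothetical layout: (variable name, first column, last column),
# 1-based and inclusive, as column ranges are usually written in a
# data list specification (e.g. CASEID 1-5 AGE 6-7 SEX 8).
LAYOUT = [("CASEID", 1, 5), ("AGE", 6, 7), ("SEX", 8, 8)]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped string fields."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


def read_raw(path, layout=LAYOUT):
    """Yield one dict per line, so a large file is never loaded whole."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            yield parse_record(line.rstrip("\r\n"), layout)
```

Because read_raw is a generator, it can be used on files that are too large to open in a word processor; values are returned as text and would still need converting to numbers and checking against the missing value codes in the dictionary file.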

SPSS Syntax Programs: Each dataset may include several versions of SPSS syntax files. Each SPSS file will read in the raw data file and create an active SPSS system file. All SPSS files are TEXT files.

Files named with the extension .SPC are very old SPSS/PC+ syntax programs; SPC files require editing to run on current Windows or UNIX operating systems.

Files named with the extension .SPX are SPSS/UNIX syntax programs. Older SPX files have different job control language at the top of the program and may require minor editing to run on SPSS/Windows.

Files named with the extension .SPW are SPSS/Windows syntax programs. Data sets produced after 2003 contain a single SPSS file named with the extension .SPS.

All SPSS syntax programs are text files that may be edited in the customer's statistical application or a word processing application. These files must be saved as text files if they are edited. Sociometrics will provide research support for older syntax files and can assist users with editing these files.

SAS Syntax Program: SAS syntax files are named with extension .SAS. The SAS file consists of syntax to read the raw data file and create a SAS data set file. Editing of SAS files is described below. All SAS files are TEXT files.

Data Dictionary: Files named with extension .DIC contain a sequential list of variable and value labels. The DIC file consists of DISPLAY DICTIONARY output from the dataset's SPSS system file. Variable names and labels, value labels, missing value designations, print and write formats are clearly displayed. The dictionary file is a TEXT file.

Statistics: Files named with the extension .FRQ contain unweighted frequencies or other descriptive statistics for each variable. Only descriptive statistics are provided for variables with more than 50 value categories, such as respondent identification number. The frequency output file is a TEXT file.

SPSS Portable System File: Data sets include SPSS portable system files named with the extension .POR. Portable system files may be imported into SPSS/Windows or SPSS/UNIX. Syntax for importing portable system files into the SPSS application is provided below. The portable file is an ASCII file (but we don't recommend opening it in a text editor).

SAS Transport Files: Several data sets include SAS Transport dataset files. The extension for SAS transport files is .TPT or .XPT. All datasets produced after 2003 will include SAS Transport Files. SAS transport files are created for cross-platform compatibility. The original SAS dataset was created in SAS version 8.01. StatTransfer was used to create the transport file. Opening transport files is described below. SAS transport files are binary files.

PDF Codebook and Instrument Files: PDF codebooks and instruments, where available, are included in the download file. Very old datasets (pre-1996) may not have machine-readable instruments; Sociometrics retains paper versions of all dataset instruments. These may be ordered from Sociometrics.

User's Guides: Data set-specific User's Guides are included in each download file. The User's Guide provides information on study methodology, sampling and weighting, and a complete list of variable names and labels. Guides are in PDF format and are named with the dataset identification number.
