Data cleaning was the final step in processing the data. In general terms, cleaning involved the standardization of values, including:
- correction of spelling errors
- standardization of variant or archaic spellings
- standardization of punctuation
- standardization of abbreviations
- standardization of word order
Also important was the exclusion of values that did not fit the form or logically possible content of a particular variable, such as a number entered for a variable that requires an alphabetic value. Whenever possible, errors of this sort have been corrected by consulting the original records, a process that continues today. Some variables (such as residence and occupation) have also been subjected to a coding process, undertaken to clarify and simplify the range of data values. Any coding of a variable is described in detail in the codebooks, which contain the most detailed information about the variables in the data sets.
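
The cleaning steps described above can be illustrated with a short Python sketch. It is not the procedure actually used for these data sets. The lookup tables, the sample values, the numeric occupation codes, and the function names (standardize, is_valid_alphabetic, code_occupation) are all hypothetical and chosen only for illustration. The actual codes are those given in the codebooks.

```python
import re

# Hypothetical lookup tables, for illustration only; the project's real
# mappings and codes are documented in the codebooks, not here.
SPELLING_VARIANTS = {"labourer": "laborer", "taylor": "tailor", "smyth": "smith"}
ABBREVIATIONS = {"wm": "william", "jno": "john"}
OCCUPATION_CODES = {"laborer": 1, "tailor": 2}


def standardize(value: str) -> str:
    """Standardize word order, punctuation, abbreviations, and spelling."""
    text = value.strip().lower()
    # Word order: turn "surname, forename" into "forename surname".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first.strip()} {last.strip()}"
    # Punctuation: drop everything that is not a word character or space.
    text = re.sub(r"[^\w\s]", "", text)
    words = text.split()
    # Abbreviations first, then variant or archaic spellings.
    words = [ABBREVIATIONS.get(word, word) for word in words]
    words = [SPELLING_VARIANTS.get(word, word) for word in words]
    return " ".join(words)


def is_valid_alphabetic(value: str) -> bool:
    """Return True only if the value consists of letters and spaces."""
    stripped = value.strip()
    return bool(stripped) and all(ch.isalpha() or ch.isspace() for ch in stripped)


def code_occupation(value: str):
    """Map a standardized occupation to a code, or None if it is unknown."""
    return OCCUPATION_CODES.get(standardize(value))
```

In this sketch a value such as "Smyth, Wm." is standardized to "william smith", while a value such as "1874" fails the alphabetic check and would be flagged for comparison against the original record rather than silently altered.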