The Early Indicators project created the following three data sets:

The census data were collected from U.S. Federal Census schedules from 1850 to 1940. The U.S. Constitution requires a population census every 10 years to apportion seats in the U.S. House of Representatives. The results also determine each state's votes in the Electoral College and are used to apportion seats in state and local legislatures. Census data provide a detailed picture of family structure and socio-economic conditions.

Because of privacy concerns, Congress has restricted access to Federal Census schedules for 72 years after each census is taken. As a result, the most recent census manuscript available to us is from 1940. The 1850 census was the first to enumerate by name people other than heads of household and to record their ages, occupations, birthplaces, and the value of their real estate. Most Civil War soldiers in our samples were born around 1840, so we use the 1850 census to examine their early lives.
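As a rough illustration of the 72-year rule, the sketch below computes the most recent decennial census whose schedules are open in a given year. It assumes, as a simplification, that a census becomes public in the calendar year 72 years after it was taken (the National Archives actually releases schedules on a specific date in that year). The function name `latest_public_census` is hypothetical.

```python
def latest_public_census(current_year: int, restriction_years: int = 72) -> int:
    """Return the newest decennial census year open to the public.

    Simplifying assumption: a census taken in year Y is treated as public
    for the whole of year Y + restriction_years.
    """
    newest_open_year = current_year - restriction_years
    # Federal censuses are taken in years ending in 0 (1850, 1860, ...),
    # so round down to the nearest decade.
    return (newest_open_year // 10) * 10


print(latest_public_census(2013))  # 1940
```

Under this rule the 1940 census became public in 2012, which matches it being the most recent manuscript available here.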

When Early Indicators began census linking in 1992, the most recent census publicly available was from 1910. The Union Army sample of 332 companies was searched and linked to the 1850, 1860, 1900, and 1910 censuses. Linking relied on the technology then available, which included microfilm and the Soundex indexing system. In time, however, advances in technology, additional census releases, and the advent of Ancestry.com and FamilySearch.org enabled researchers to link soldiers more efficiently and accurately. As a result, they were able to link the last companies of the Union Army sample (27 companies from Indiana and Wisconsin) and all subsequent samples to every available census from 1850 to 1930 (all except the lost 1890 census). The Union Army sample is currently being re-linked to incorporate the additional census years, including the newly released 1940 census. These data, known as the UA Census Redo, will be made available on this website in the future.
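Soundex indexes surnames by sound, so that spelling variants such as "Robert" and "Rupert" file under the same code. The sketch below follows the standard American Soundex rules described by the National Archives: keep the first letter, code the remaining consonants as digits, merge repeated codes (including those separated only by H or W), let vowels separate repeats, and pad to four characters. It is meant to show how the index groups names, not to reproduce the project's own linking tools.

```python
# Digit codes for consonants; vowels (A, E, I, O, U, Y) and H, W are not coded.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    # The first letter's own code still suppresses an identical code after it
    # (e.g. "Pfister" -> P236, not P123).
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do NOT separate identical codes.
            continue
        code = SOUNDEX_CODES.get(ch, "")
        if not code:
            # A vowel separates identical codes, so the next one is kept.
            previous = ""
            continue
        if code != previous:
            result += code
            if len(result) == 4:
                break
        previous = code
    return (result + "000")[:4]


for surname in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
    print(surname, soundex(surname))
```

Because "Robert" and "Rupert" both code to R163, a researcher working from a Soundex index would find both spellings on the same index cards.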

Each possible match for a soldier in the census manuscripts was compared with previously collected information to determine the strength of the match. Researchers assigned a quality code to indicate the reliability of the link, that is, the degree to which information from the pension records (PEN) and Compiled Military Service Records (CMSR) agrees with information in the census. Quality codes range from 1 to 4, with 1 indicating the strongest link and 4 the weakest.
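Data users often restrict an analysis to stronger links. The sketch below shows one way to do that with the 1-to-4 quality scale. The record type `CensusLink`, its field names, the toy records, and the default cutoff of 2 are all hypothetical illustrations; consult the data set's codebook for the actual variable names.

```python
from dataclasses import dataclass


@dataclass
class CensusLink:
    soldier_id: str    # hypothetical soldier identifier
    census_year: int
    quality: int       # 1 = strongest link ... 4 = weakest


def strong_links(links, max_quality=2):
    """Keep links whose quality code is at most max_quality (1 is best)."""
    if not 1 <= max_quality <= 4:
        raise ValueError("quality codes range from 1 to 4")
    return [link for link in links if link.quality <= max_quality]


# Toy records invented for illustration only.
links = [
    CensusLink("A001", 1850, 1),
    CensusLink("A001", 1860, 3),
    CensusLink("B002", 1850, 2),
    CensusLink("B002", 1900, 4),
]
print([(l.soldier_id, l.census_year) for l in strong_links(links)])
# [('A001', 1850), ('B002', 1850)]
```

Running results at several cutoffs (for example 1, 2, and 4) is a simple way to check whether conclusions depend on the weaker matches.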

Although great effort has been made to keep the quality codes concise, specific, and objective, some subjectivity enters each assignment, particularly for weaker matches, and many subtleties of census research cannot be codified. In all cases, an individual found in the census must match a soldier's pension information on name and approximate age. The codes should nonetheless prove valuable guides for data users.
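The minimum requirement for any link, a name match plus an approximate age match, can be sketched as a simple filter. This is not the project's procedure: researchers judged names by hand, whereas this sketch assumes a hypothetical rule (identical normalized surname and same first initial of the given name) and a hypothetical tolerance of 2 years on birth year.

```python
def normalize(name: str) -> str:
    """Uppercase and drop anything that is not a letter (hypothetical rule)."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def meets_minimum_criteria(pension_surname, pension_given, pension_birth_year,
                           census_surname, census_given, census_age,
                           census_year, tolerance=2):
    """True if name and approximate age both match (assumed rules)."""
    same_surname = normalize(pension_surname) == normalize(census_surname)
    given_p, given_c = normalize(pension_given), normalize(census_given)
    same_initial = bool(given_p) and bool(given_c) and given_p[0] == given_c[0]
    # A census age implies a birth year of census_year - age, give or take
    # a year depending on whether the birthday fell before census day.
    implied_birth_year = census_year - census_age
    age_ok = abs(implied_birth_year - pension_birth_year) <= tolerance
    return same_surname and same_initial and age_ok


# A soldier born in 1840, found as a 10-year-old in the 1850 census.
print(meets_minimum_criteria("O'Brien", "John", 1840, "OBrien", "Jno.", 10, 1850))
```

A candidate that passes this check is only eligible for linking; how strong the link is would still be expressed by the quality code.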

Census Linkages by Sample and Years Searched

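Sample          1850 1860 1870 1880 1900 1910 1920 1930 1940
UA              X    X              X    X
UA CEN redo     X    X    X    X    X    X    X    X    X
Old USCT                  X    X    X    X    X    X
USCT                      X    X    X    X    X    X    X
Andersonville   X    X    X    X    X    X    X    X
Urban           X    X    X    X    X    X    X    X
Oldest Old      X    X    X    X    X    X    X    X    X

UA = Union Army; USCT = U.S. Colored Troops. There is no 1890 column because that census was lost.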
1850 Census