2 Data

Here, we introduce the data sets used in this analysis. Because election results do not include the demographic or economic factors that we examine, we joined several data sets together to build the analysis and the visualizations.

Our main data set, which came from the US Census, contains every county in Texas, its voting statistics, and the number of votes each candidate received in that county. After downloading this data, we cleaned it by removing variables that had no bearing on voting results and by rewriting the county names in a single consistent style. This was the principal data set to which we joined all of the following data sets. Next, we added voter registration figures from 2016, gathered from the Texas Secretary of State, and combined them with our TX data set.
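The cleaning and joining step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used for the analysis: the helper `normalize_county`, the column names, and the toy values are all hypothetical, and the canonical style chosen here (upper case, no "County" suffix) is one option among several.

```r
# Hypothetical helper: put county names into one canonical style
# (trimmed, no trailing "County" suffix, upper case).
normalize_county <- function(x) {
  x <- trimws(x)
  x <- sub("\\s+county$", "", x, ignore.case = TRUE)
  toupper(x)
}

# Toy rows standing in for the real files; all values are made up.
votes <- data.frame(
  county = c("Travis County", "harris", " Bexar County "),
  total_votes = c(100, 200, 150)
)
registration <- data.frame(
  county = c("TRAVIS", "Harris County", "Bexar"),
  registered = c(160, 400, 250)
)

# Apply the same normalization to both tables before joining.
votes$county <- normalize_county(votes$county)
registration$county <- normalize_county(registration$county)

# Left join: keep every county in the main table, even if it has
# no match in the registration table.
tx <- merge(votes, registration, by = "county", all.x = TRUE)
```

A left join is used so that the main table keeps one row per county; any county whose name failed to match would show up with a missing `registered` value, which makes naming mismatches easy to spot.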

After that, we added 2016 population estimates from the Texas Demographic Center (TDC). We also joined these to our main TX data so that we could show the proportion of each county's population that was registered to vote, as well as the proportion of registered voters who actually voted.
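The two proportions can be illustrated with a short base R sketch. The column names and numbers below are hypothetical, and we assume here that the registration proportion uses the total population estimate as its denominator and that turnout is votes cast divided by registered voters.

```r
# Made-up values for two placeholder counties.
tx <- data.frame(
  county = c("A", "B"),
  population = c(1000, 500),
  registered = c(600, 250),
  total_votes = c(300, 200)
)

# Share of the estimated population that is registered to vote.
tx$prop_registered <- tx$registered / tx$population

# Share of registered voters who actually cast a vote.
tx$turnout <- tx$total_votes / tx$registered
```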

Next, we added two county-level data sets from the Census: 2016 demographic data and 2016 income data. The demographic data included the number of people of each race or ethnicity and the number of foreign-born people in each county.

The income data included only the average per-capita income of each county. We joined both of these data sets to the main TX data set. Lastly, we included two data sets for visualization: the spatial county map of Texas, obtained through the tidycensus package in R, and a data set of the major cities in Texas, whose coordinates we acquired from Google Maps, to give a better sense of where the major city centers are located.