Saturday, April 9, 2016

Regressing Kasich

I have been posting regression models that fit and predict the presidential primary races by state. Most recently, on the GOP side, four models that use three to five variables each correctly fit almost all of the states that had voted as of last week. On the Democratic side, four models that use four or five variables each also correctly fit almost all of the states that had voted up to, but not including, Wisconsin, and all four correctly predicted Sanders' win there. I have so far ignored Kasich, since it was clear early on that he would not win enough state-level votes to secure the presidential nomination on the first ballot. However, since the algorithm I developed makes it easy to plug in other candidates, I decided to put Kasich through the models.

There is an important difference between all of the other models I have produced so far, on both sides, and the predictions to the right for Kasich. In prior models, the dependent variable was the leading candidate's percent minus the second candidate's percent: Clinton minus Sanders, and Trump minus Cruz. The resulting fits and predictions are therefore of the difference between those two candidates, not of the actual percent of the vote for either candidate. For Kasich, I have built the models solely on his percent of the vote in each state. As with the prior models, I started from a base of about 3,000 variables and favored models with good statistical values, such as low residuals, low BIC, and high adjusted R-squared, as well as diversity in the "type" of variable used (jobs vs. demographics, etc.). To the right you will find the fits and predictions for all 50 states, and below you can find the statistical data for each model. All models are significant at p<.001, and all coefficients are significant at p<.01 (most at p<.001). The search produced a number of very good models, from which I present one 2-variable, one 3-variable, and one 4-variable model. I did not test models with more than four variables.
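To make the selection criteria above concrete, here is a minimal sketch of ranking candidate models by BIC and adjusted R-squared. It is not the algorithm I used: it does an exhaustive search over a tiny pool of made-up variables, which would not scale to 3,000 of them, and every name in it (`fit_ols`, `rank_models`, `var0` through `var5`) and all of the data are hypothetical. The BIC here is the usual Gaussian form, n·ln(RSS/n) + k·ln(n), which is correct only up to a constant that is the same for every model.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Fit y ~ intercept + columns; return coefficients, adjusted R^2, and BIC."""
    n, k = len(y), len(columns) + 1  # k counts the intercept
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
    bic = n * math.log(rss / n) + k * math.log(n)  # Gaussian BIC up to a constant
    return beta, adj_r2, bic


def rank_models(candidates, y, max_vars):
    """Score every subset of up to max_vars variables; lowest BIC comes first."""
    results = []
    for size in range(1, max_vars + 1):
        for subset in itertools.combinations(sorted(candidates), size):
            _, adj_r2, bic = fit_ols([candidates[v] for v in subset], y)
            results.append((bic, adj_r2, subset))
    return sorted(results)


# Made-up data: 30 "states", 6 candidate variables, and a vote share that
# truly depends only on var1 and var4 plus noise.
random.seed(0)
n_states = 30
candidates = {f"var{j}": [random.gauss(0, 1) for _ in range(n_states)] for j in range(6)}
y = [20 + 5 * candidates["var1"][i] - 3 * candidates["var4"][i] + random.gauss(0, 1)
     for i in range(n_states)]

ranking = rank_models(candidates, y, max_vars=3)
best_bic, best_adj_r2, best_subset = ranking[0]
print(best_subset, round(best_adj_r2, 3), round(best_bic, 1))
```

The top-ranked subset should contain the two variables that actually drive the synthetic vote share. BIC penalizes each extra variable by ln(n), which is why a 2-variable model can outrank a 3-variable model that fits slightly better.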

There are some interesting, and unlikely, predictions in some of the models. For example, Kasich2 and Kasich3 (the second and third models) both predict that Kasich will get 50% in New Jersey, which is highly unlikely. Similarly, both models predict a result in the negative teens for North Dakota. Clearly he cannot get negative votes, but those models are very pessimistic about his chances there. The two models give fairly similar predictions even though they share only one variable: the percent of women in the wholesale drug and chemical business in each state.
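The negative predictions arise because an ordinary linear regression on a percent is unbounded: nothing in the model keeps its output between 0 and 100. One simple way to report such output, sketched below, is to clip it to the valid range. This is only an illustration and not something I applied to the models above; the function name `clip_share`, the state labels, and the numbers are all made up.

```python
def clip_share(predicted_percent, low=0.0, high=100.0):
    """Clamp a predicted vote share to the valid percent range."""
    return max(low, min(high, predicted_percent))


# Hypothetical raw model output, in percent (not the actual predictions).
raw = {"State A": -14.0, "State B": 50.0, "State C": 103.5}
clipped = {state: clip_share(value) for state, value in raw.items()}
print(clipped)
```

Clipping changes only how a prediction is reported. A strongly negative raw value is still informative, since it says the model expects a very weak showing.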

One of the unique features of the Kasich models, compared to both the Clinton-Sanders and the Trump-Cruz models, is that the best Kasich models repeatedly included high-ranking variables that describe women in the workplace. In the previous models, the dominant jobs variables always described men in the workplace. The only male variable in the Kasich models is the broadest of the jobs measures I used, the change in the number of men's jobs from 2000 to 2013. The women's variables are specific to the last few years and do not measure a change in jobs over time. For men, a decrease in the number of jobs, as shown in Kasich1, indicates that Kasich will do better in that state. In Kasich2 and Kasich3, more women relative to men in specific jobs, such as wholesale drugs and chemicals, tends to signal better performance for Kasich.

There were several demographic variables in the high-ranking models. The most common were those measuring a change in population and the percent of White Evangelicals. For population change, the second and third models indicate that Kasich does better in states with decreasing populations, whether from general population decline or from out-migration. For religion, specifically the measure of White Evangelicals, Kasich does better in states with fewer of them. Economically, Kasich does better in states with higher costs of living. Interestingly, he does better in states where Black families have lower incomes. No other family income measure had significant predictive value in the Kasich models.
