Thursday, April 7, 2016

Republican Presidential Primary Models: April

Last week I published new models for the Democratic presidential nomination race--here are the new Republican models, updated to include Wisconsin and a broader set of variables. As with the new Democratic models, I de-emphasized theory-building and looked for the models that best fit the states that have voted so far. In particular, I allowed very specific types of jobs variables, like "men in agriculture, forestry and mining (2009-14)" and "change in the number of men in mining jobs (2000-13)." While one can plausibly build a theoretical case for why jobs data are related to the Republican race, it is far more difficult to explain why these specific jobs variables produced statistically significant regression models while related jobs variables did not. However, even with these jobs variables included in the analysis, the best models were largely similar to the last set of models from March 23.

The first table, to the right, shows the state-level predictions/fits for the four models. The first column is the state, and the 2nd column shows the results of the votes that have taken place so far, as a simple subtraction of Trump-Cruz. It does not account for votes that other contenders, such as Rubio or Kasich, have received. A positive number means that Trump beat Cruz by this margin, while a negative number means that Cruz beat Trump by this margin. The numbers highlighted in pink in the next 4 columns are the states that each model fits incorrectly. So, for example, Rep1 got Louisiana and Maine wrong--it predicted that Maine would go for Trump, when it actually went for Cruz. Similarly, Rep4 gets 1 state wrong--also Maine. Rep2 and Rep3 correctly fit all 32 states that have voted so far in the Republican race. The next table, at the bottom of the page, lists the specific variables used in each model, along with statistical information about each model.
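For anyone who wants to reproduce the bookkeeping, the margin and the "wrong state" count can be sketched in a few lines of Python. This is only a rough sketch: the state names, vote shares, and fitted values are made-up placeholders rather than the figures from the table, the function names are hypothetical, and it assumes a state counts as "wrong" when the fitted margin has the opposite sign from the actual margin.

```python
def margin(trump_pct, cruz_pct):
    """Trump minus Cruz: positive means Trump won, negative means Cruz won."""
    return trump_pct - cruz_pct


def misfit_states(actual, fitted):
    """Return the states where the fitted margin picks the wrong winner.

    Assumption: a state is 'wrong' when the fitted and actual margins
    have opposite signs.
    """
    return sorted(s for s in actual if actual[s] * fitted[s] < 0)


# Hypothetical placeholder numbers, NOT the values from the table.
actual = {
    "State A": margin(45.0, 30.0),   # Trump by 15
    "State B": margin(33.0, 46.0),   # Cruz by 13
    "State C": margin(38.0, 41.0),   # Cruz by 3
}
fitted = {"State A": 12.0, "State B": -9.5, "State C": 2.0}

wrong = misfit_states(actual, fitted)
print(wrong)  # ['State C']
```

Here the model calls State C for Trump by 2 points when Cruz actually won it by 3, so it is the one state that would be highlighted.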

The first model, Rep1, is the most efficient model--it uses only 3 variables, and only gets 2 states wrong, as mentioned above. It uses the percent of young women in the state, employment, and men in agriculture, forestry and mining. As with the previous models, employment & unemployment are important predictors of the Trump-Cruz race. In the prior models, I used unemployment, and the beta-coefficient was positive--meaning that in states where unemployment was high, Trump tended to beat Cruz, and vice versa. In the new models, I used employment, and as predicted, this coefficient is negative, meaning that in states where employment is low, Trump does well, but in states with high employment, Cruz does better. The same pattern can be seen in the specific jobs variables found in each model. In Rep1, as the number of men in agriculture, forestry and mining (AFM) jobs goes down, Trump does better. In Rep2, as the number of men in mining jobs declines, Trump does better. Not all jobs had this pattern, or showed this level of statistical significance. The effects of women's employment were also not nearly as statistically significant in the Republican race as the effects of men's employment. In Rep1, the beta-coefficients show that the AFM jobs variable is the strongest predictor, while the general employment variable is about half as strong. A test of the VIF (variance inflation factor) showed that while these two variables describe similar things, they are not strongly collinear in this model (VIF<2 for both variables).
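For readers unfamiliar with the VIF check: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R^2), so a value near 1 means the predictor shares little variance with the others. The sketch below is a simplification that handles only two predictors, where R^2 is just the squared Pearson correlation between them; Rep1 has three predictors, so the full check would regress each one on the other two. The numbers and variable names are made-up placeholders, not the state data used in the models.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is r squared."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical placeholder values, NOT the state data used in the models.
employment = [58.0, 61.5, 63.0, 59.5, 65.0, 60.0]
afm_jobs = [2.1, 1.4, 3.0, 0.9, 2.2, 2.8]

vif = vif_two_predictors(employment, afm_jobs)
print(round(vif, 2))  # well below 2 for these placeholder values
```

With only two predictors the VIF is the same for both, which is why a single number is printed here.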

All of the models have an economic variable, in addition to the jobs variables. In Rep1 & Rep4, the economic variable is employment. In Rep2 & Rep3 it is family income. The results are consistent with the employment variable--i.e., in Rep1 & Rep4, when employment goes down, Trump does better, and in Rep2 & Rep3, when family income goes down, Trump does better. In that sense, all of the jobs and economic variables show a pleasing consistency--the worse the economy and jobs are, the better Trump does in that state.

Rep2 & Rep3 are the most accurate models, in terms of correctly fitting all 32 states and having the lowest residuals. But that comes at the cost of having to use 5 variables. Both use two "political" variables, one "jobs" variable, a "cultural/demographic" variable, and an economic variable. Both models use a "tea party" measure: the strength of the tea party in Congress (the House) in 2011-12. In those states where the tea party did better, Trump does better. So although Cruz has a strong history with the tea party, this could indicate that establishment voters, in the states where they are stronger, are willing to accept Cruz in order to avoid Trump.

Rep2 and Rep3 both use a second political variable--Rep2 uses the difference between Clinton and Obama in the 2008 primary race, and Rep3 uses the percent of Democrats in the state-level senate (2014). The latter is positive, meaning that the more Democrats in your state senate, the better Trump does. The former is a simple subtraction of Clinton-Obama, so a positive value indicates a win for Clinton. This beta-coefficient in Rep2 is positive, indicating that Trump does well in states where Clinton did well in the 2008 primary. Rep4 also has a political variable, the results of the Republican vs Democrat presidential contests in 2000 & 2004, averaged as a simple subtraction: Republican % - Democrat % in that state, so a positive value indicates a Republican win by that margin over the Democrat. This beta-coefficient is negative, meaning that stronger Democrat wins in a state predict stronger Trump wins. These latter two variables would seem to indicate that where you have a stronger Republican party, measured by stronger Republican margins in state and federal elections, Cruz does better. Perhaps this is indicative of Democrats being willing to cross over to Trump, but not Cruz, and of Independents, who might vote Republican or Democrat, voting for Trump (or being unable to vote at all in closed primary states, where they are required to register with a specific party).
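Since the signs of these margin variables are easy to mix up, here is a small Python sketch of how the two subtractions work. The function names are hypothetical and the percentages are made-up placeholders, not actual election results.

```python
def clinton_minus_obama(clinton_pct, obama_pct):
    """2008 Democratic primary margin: positive means a Clinton win."""
    return clinton_pct - obama_pct


def avg_republican_margin(results):
    """Average of (Republican % - Democrat %) over several elections.

    `results` is a list of (republican_pct, democrat_pct) pairs,
    e.g. one pair for 2000 and one for 2004.
    """
    margins = [rep - dem for rep, dem in results]
    return sum(margins) / len(margins)


# Hypothetical placeholder percentages, NOT actual election results.
print(clinton_minus_obama(55.0, 43.0))                       # 12.0
print(avg_republican_margin([(56.0, 42.0), (60.0, 39.0)]))   # 17.5
```

A positive coefficient on the first variable and a negative coefficient on the second both point the same way: Trump does better where Clinton won in 2008 and where Republicans won by less (or lost) in 2000 & 2004.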

Rep2, Rep3, and Rep4 also have "cultural/demographic" variables. Rep2 has a measure of race, the percent of the population that is Black. This beta-coefficient is positive, meaning that states with more African-Americans give Trump larger margins. Rep3 has a "Southern Culture Index" that I created--it is also positive, indicating that states with more "Southern Culture" tend to vote for Trump. This index is a combination of death rates, teen birth rates, slave population in 1860, and the percent of the population that is White Evangelical. Rep4 has a unique variable, built from data published by the British newspaper The Guardian, that counts how many citizens were killed by law enforcement in that state. This beta-coefficient is also positive, indicating that the more citizens killed by cops in your state, the better Trump does. Predictably, this number is higher in Southern states, consistent with the prior two demographic/cultural measures.
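One common way to combine measures that are on very different scales into a single index is to standardize each one and average the z-scores. The sketch below illustrates that general technique only; it is an assumption for illustration, not necessarily how the Southern Culture Index was actually built, and the equal weights, the component names, and the three states' values are all hypothetical placeholders.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_index(components):
    """Average the z-scores of each component, state by state.

    Assumption: equal weights. The post does not say how the four
    components were actually combined.
    """
    standardized = [z_scores(col) for col in components.values()]
    return [mean(state_vals) for state_vals in zip(*standardized)]


# Hypothetical placeholder values for three states, NOT real data.
components = {
    "death_rate": [700.0, 800.0, 900.0],
    "teen_birth_rate": [20.0, 30.0, 40.0],
    "slave_pop_1860_pct": [0.0, 10.0, 20.0],
    "white_evangelical_pct": [10.0, 20.0, 30.0],
}
index = composite_index(components)
print([round(v, 2) for v in index])  # [-1.22, 0.0, 1.22]
```

Standardizing first keeps a component with large raw numbers, like a death rate per 100,000, from swamping a component measured in percent.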

There are very few "prediction" differences between these models and the models from March 23. Most significantly, for the April 5th Wisconsin vote, all four of the newest models show a Cruz win, while two of the three prior models showed a Cruz win. The "correct" model (M1R) is actually the same as the second model above, Rep2, and the beta-coefficients are very similar--this is expected, since the only difference in the new analysis is the inclusion of Wisconsin. Otherwise, most states show the same winner in both sets of models. For example, all models, new and old, show strong wins for Trump in California, Connecticut, and New York, while giving Cruz wins in Montana and South Dakota. Some states have mixed predictions in the models, like Nebraska and New Mexico, so it's anybody's guess there. Most models have Indiana going for Cruz (barely).
