The first model, Rep1, is the most efficient model--it uses only 3 variables and gets only 2 states wrong, as mentioned above. It uses the percent of young women in the state, employment, and men in agriculture, forestry and mining (AFM). As with the previous models, employment and unemployment are important predictors of the Trump-Cruz race. In the prior models, I used unemployment, and the beta-coefficient was positive--meaning that in states where unemployment was high, Trump tended to beat Cruz, and vice versa. In the new models, I used employment, and as predicted, this coefficient is negative, indicating that in states where employment is low, Trump does well, while in states with high employment, Cruz does better. The same pattern shows up in the specific jobs variables in each model. In Rep1, as the number of men in AFM jobs goes down, Trump does better. In Rep2, as the number of men in mining jobs declines, Trump does better. Not all job categories had this pattern or showed this level of statistical significance. The effect of women's employment was also not nearly as statistically significant in the Republican race as the effect of men's employment. In Rep1, the beta-coefficients show that the AFM jobs variable is the strongest predictor, while the general employment variable is about half as strong. A test of the VIF (variance inflation factor) showed that while these two variables describe similar things, they are not strongly collinear in this model (VIF < 2 for both variables).
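For readers unfamiliar with the VIF check, here is a minimal sketch of how it can be computed. The VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix, which is what the code uses. The data values and variable names below are made up purely for illustration and are not the numbers behind Rep1.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (fails if singular)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Hypothetical values for eight states (NOT the real data).
young_women = [12.1, 13.4, 11.8, 12.9, 13.0, 12.2, 11.5, 13.8]
employment = [58.2, 61.5, 55.9, 63.0, 60.1, 57.4, 59.8, 62.2]
afm_men = [1.2, 3.5, 0.8, 4.1, 2.2, 1.9, 5.0, 2.7]

for name, value in zip(["young_women", "employment", "afm_men"],
                       vif([young_women, employment, afm_men])):
    print(f"{name}: VIF = {value:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others; values well above that signal overlap among the predictors.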
All of the models have an economic variable, in addition to the jobs variables. In Rep1 & Rep4, the economic variable is employment. In Rep2 & Rep3 it is family income. The two economic variables point the same way: in Rep1 & Rep4, when employment goes down, Trump does better, and in Rep2 & Rep3, when family income goes down, Trump does better. In that sense, all of the jobs and economic variables show a pleasing consistency--the worse the economy and the job market are in a state, the better Trump does there.
Rep2 & Rep3 are the most accurate models, in that they correctly fit all 32 states and have the lowest residuals. But that comes at the cost of using 5 variables. Both use two "political" variables, one "jobs" variable, a "cultural/demographic" variable, and an economic variable. Both models use a "tea party" measure: the strength of the tea party in Congress (the House) in 2011-12. In states where the tea party did better, Trump does better. So although Cruz has a strong history with the tea party, this could indicate that in states with more establishment voters, those voters are willing to back Cruz in order to avoid Trump.
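As a rough sketch of what "correctly fitting" a state and "lowest residuals" can mean in practice, the snippet below counts states where the fitted winner matches the actual winner and computes the mean absolute residual. It assumes the outcome is a margin (Trump % minus Cruz %), which is my illustrative assumption here, and the numbers are invented rather than taken from the models.

```python
def fit_summary(actual, fitted):
    """Return (states with the correct fitted winner, mean absolute residual).

    Both lists hold margins, assumed to be Trump % minus Cruz %, so a
    positive value means a Trump win.
    """
    correct = sum(1 for a, f in zip(actual, fitted) if (a > 0) == (f > 0))
    residuals = [a - f for a, f in zip(actual, fitted)]
    mean_abs_residual = sum(abs(r) for r in residuals) / len(residuals)
    return correct, mean_abs_residual

# Made-up margins for four hypothetical states.
actual = [10, -4, 6, -12]
fitted = [8, -1, -2, -9]
correct, mean_abs_residual = fit_summary(actual, fitted)
print(correct, mean_abs_residual)  # prints: 3 4.0
```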
Rep2 and Rep3 each use a second political variable: Rep2 uses the difference between Clinton and Obama in the 2008 primary race, and Rep3 uses the percent of Democrats in the state-level senate (2014). The latter is positive, meaning that the more Democrats in a state's senate, the better Trump does. The former is a simple subtraction, Clinton minus Obama, so a positive value indicates a win for Clinton. This beta-coefficient in Rep2 is positive, indicating that Trump does well in states where Clinton did well in the 2008 primary. Rep4 also has a political variable, the results of the Republican vs. Democrat presidential contests in 2000 & 2004. It is the average of a simple subtraction, Republican % - Democrat % in that state, so a positive value indicates a Republican win by that margin over the Democrat. This beta-coefficient is negative, meaning that stronger Democratic wins in a state predict stronger Trump wins. These latter two variables would seem to indicate that where the Republican party is stronger, as measured by Republican margins in state and federal elections, Cruz does better. Perhaps this indicates that Democrats are willing to cross over to Trump but not to Cruz, and that Independents, who might vote Republican or Democratic, are voting for Trump (or are unable to vote at all in closed-primary states, where they are required to register with a specific party).
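The two margin variables described above are simple arithmetic, and a small sketch makes the sign conventions explicit. The function names and the example percentages are hypothetical, chosen only to show how the subtraction and averaging work.

```python
def clinton_minus_obama(clinton_pct, obama_pct):
    """2008 primary margin: positive means Clinton won the state."""
    return clinton_pct - obama_pct

def avg_republican_margin(results):
    """Average of Republican % minus Democrat % across elections.

    results: list of (republican_pct, democrat_pct) pairs, one per
    election (here, 2000 and 2004). Positive means a Republican win.
    """
    margins = [rep - dem for rep, dem in results]
    return sum(margins) / len(margins)

# Invented percentages for a hypothetical state.
print(clinton_minus_obama(55, 43))                 # 12: a Clinton win
print(avg_republican_margin([(50, 46), (52, 47)]))  # 4.5: Republican-leaning
```

With a negative beta-coefficient on the second variable, a state with a negative average margin (Democratic wins) contributes toward a larger fitted Trump result.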
Rep2, Rep3, and Rep4 also have "cultural/demographic" variables. Rep2 has a measure of race, the percent of the population that is Black. This beta-coefficient is positive, meaning that states with more African-Americans give Trump bigger wins. Rep3 uses a "Southern Culture Index" that I created--it is also positive, indicating that states with more "Southern Culture" tend to vote for Trump. This index is a combination of death rates, teen birth rates, slave population in 1860, and the percent of the population that is White Evangelical. Rep4 has a unique variable, based on data from the British newspaper The Guardian, that counts how many citizens were killed by law enforcement in that state. This beta-coefficient is also positive, indicating that the more citizens killed by cops in a state, the better Trump does. Predictably, this number is higher in Southern states, consistent with the prior two demographic/cultural measures.
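To illustrate how four measures on very different scales might be combined into a single index, here is one common approach: standardize each component to z-scores and average them per state. This is only an assumed method for illustration, not necessarily how the Southern Culture Index was actually built, and the component values are invented.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite_index(components):
    """Average the z-scores of each component, state by state.

    components: dict of component name -> list of values, with every
    list in the same state order.
    """
    standardized = [zscores(vals) for vals in components.values()]
    return [mean(state_vals) for state_vals in zip(*standardized)]

# Invented values for four hypothetical states (NOT the real data).
components = {
    "death_rate": [700, 800, 900, 1000],
    "teen_birth_rate": [15, 25, 30, 45],
    "slave_population_1860": [0, 0, 20, 45],
    "white_evangelical_pct": [10, 15, 30, 40],
}
index = composite_index(components)
for state, value in zip(["A", "B", "C", "D"], index):
    print(f"State {state}: {value:+.2f}")
```

Averaging z-scores weights each component equally; a different weighting would give a different index.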
There are very few "prediction" differences between these models and the models from March 23. Most significantly, for the April 5th Wisconsin vote, all four of the newest models show a Cruz win, while two of the three prior models showed a Cruz win. The "correct" model (M1R) is actually the same as the second model above, Rep2, and the beta-coefficients are very similar--this is expected, since the only difference in the new analysis is the inclusion of Wisconsin. However, most states show the same wins for both candidates. For example, all models, new and old, show strong wins for Trump in California, Connecticut, and New York, while giving Cruz wins in Montana and South Dakota. Some states have mixed predictions across the models, like Nebraska and New Mexico, so it's anybody's guess there. Most models have Indiana going for Cruz (barely).
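The comparison across models amounts to tallying, for each state, how many models call it for each candidate. A minimal sketch of that tally follows, again assuming each model outputs a Trump-minus-Cruz margin. The state labels and margins are placeholders, not actual model output.

```python
from collections import Counter

def tally(predictions):
    """Count predicted winners per state across models.

    predictions: dict of state -> dict of model name -> predicted margin
    (assumed to be Trump % minus Cruz %). A margin of exactly zero is
    counted for Cruz here, an arbitrary choice for this sketch.
    """
    return {
        state: Counter("Trump" if margin > 0 else "Cruz"
                       for margin in by_model.values())
        for state, by_model in predictions.items()
    }

# Placeholder margins for two hypothetical states.
predictions = {
    "State A": {"Rep1": 12.0, "Rep2": 9.5, "Rep3": 14.1, "Rep4": 7.3},
    "State B": {"Rep1": 2.1, "Rep2": -1.4, "Rep3": -0.6, "Rep4": 3.0},
}
for state, counts in tally(predictions).items():
    print(state, dict(counts))
```

A state like the hypothetical "State B", where the models split evenly, is the kind of case described above as anybody's guess.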