Friday, March 18, 2016

"Southern Culture" Index--Part 2

Earlier this week I described my first attempt to create a "Southern Culture" Index built from non-economic variables, for use in a model to help predict the primaries. The goal was to have a single "cultural" factor capturing regional differences in US voting patterns, as distinct from economic factors, which I treat separately. Two weeks ago I posted my early attempt at a regression model fitting the Democratic primaries held up to that point between Clinton and Sanders (and O'Malley). That model used two economic factors--cost of living and unemployment rate--along with rates of college attendance, and correctly fit 14 of the 15 primary elections. Neither cultural nor race/ethnicity variables improved the model.

This post on creating a factor index for "Southern Culture" is more technical than the first, but it also proposes a revised and improved model. I used exploratory factor analysis (principal axis factor extraction) in R, specifically the psych package, to reduce 15 variables down to a 4-variable model with good statistical properties--better than those of the factors I mapped in my last post. Here is the current map representing the factor I am calling the "Southern Culture" Index, with states divided into four groups: the darkest red states are "most Southern" and lighter shades are "less Southern."

Using R and the original 15 variables, chosen because the literature suggested they correlate with Southern states, I wrote a script that generated every possible combination of those variables and tested each resulting model against nine common measures of goodness of fit for exploratory factor analysis (EFA) (one heavily cited reference on these issues is Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55). This produced just over 5,000 combinations of 2-6 variables per model. Typically with EFA a researcher wants to reduce a set of variables to a smaller number of factors. Say, for instance, you have a survey of 100 personality-related questions, and you want to distill a small set of personality "factors"--introversion, agreeableness, etc. You could run the answers to all of the survey questions through a statistical analysis and let the software find a few factors based on patterns in how your respondents answered the 100 questions. In this case, I wanted to generate just one factor, which is why I took the approach that I did.
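The combinatorial search described above can be sketched in a few lines. My actual script was in R; this is a hedged Python illustration with placeholder variable names (the full enumeration of 2-6 variable subsets of 15 is 9,933 combinations, so the ~5,000 models I ended up testing presumably reflect additional screening before fitting):

```python
# Enumerate every 2-6 variable subset of a pool of 15 candidate variables,
# each subset a candidate single-factor EFA model. Names are placeholders,
# not the actual measures used in the analysis.
from itertools import combinations
from math import comb

variables = [f"var{i}" for i in range(1, 16)]  # 15 candidate variables

models = []
for k in range(2, 7):                  # model sizes 2 through 6
    models.extend(combinations(variables, k))

# Sanity check: the count matches the closed-form sum of binomials
assert len(models) == sum(comb(15, k) for k in range(2, 7))
print(len(models))
```

In the real workflow, each element of `models` would then be fit with psych's `fa()` and its fit statistics collected for filtering.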

I filtered the 5000 models based on specific cutoff criteria in the literature:

  • Moderate correlation of the variables: ideally between 0.3 and 0.8
  • Bartlett's test of sphericity: the p-value should be less than 0.05
  • Kaiser-Meyer-Olkin index (KMO): ideally above 0.8
  • Tucker-Lewis Index (TLI, also called NNFI): ideally over 0.95
  • Root mean square error of approximation (RMSEA): ideally less than 0.05
  • Root mean square of the residuals (corrected for degrees of freedom, cRMSR): ideally less than 0.05
  • Bayesian Information Criterion (BIC): there is no standard cutoff, but lower raw values are better--i.e., if you have negative values, the more negative the better.
  • The communalities (h^2) of the variables in the factor: at least 0.32 (Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics. Boston: Allyn and Bacon.), but the closer to 1 the better.
  • R2, the proportion of the variance explained by the factor: the closer to 1 the better; I chose a cutoff of 0.5 (a factor that explains 50% of the variance).
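
Applying these cutoffs amounts to a simple filter-and-rank pass over the table of fitted models. A hedged Python sketch (my actual filtering was done in Excel; the first entry below uses the fit statistics reported later in this post for the final model, while the second is an invented failing model):

```python
# Filter candidate models by the cutoff criteria, then rank by BIC.
def passes_cutoffs(m):
    return (m["bartlett_p"] < 0.05          # Bartlett's sphericity
            and m["kmo"] >= 0.8             # Kaiser-Meyer-Olkin
            and m["tli"] >= 0.95            # Tucker-Lewis Index
            and m["rmsea"] <= 0.05
            and m["crmsr"] <= 0.05
            and m["r2"] >= 0.5              # variance explained
            and min(m["communalities"]) >= 0.32)

models = [
    {"vars": ("evangelical", "death_rate", "slave_pop", "teen_birth"),
     "bartlett_p": 0.0001, "kmo": 0.81, "tli": 1.04, "rmsea": 0.0,
     "crmsr": 0.03, "r2": 0.67, "bic": -7.07,
     "communalities": (0.71, 0.90, 0.41, 0.65)},
    {"vars": ("income", "union"),           # invented failing model
     "bartlett_p": 0.20, "kmo": 0.55, "tli": 0.80, "rmsea": 0.12,
     "crmsr": 0.09, "r2": 0.30, "bic": 3.2,
     "communalities": (0.25, 0.30)},
]

kept = [m for m in models if passes_cutoffs(m)]
kept.sort(key=lambda m: m["bic"])           # lower BIC ranks first
print([m["vars"] for m in kept])
```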

Using psych in R, the results are produced with the "fa" command, specifying principal axis factoring with fm="pa". Since I wanted only one factor, rotation didn't matter, so I specified rotate="none". Neither KMO nor Bartlett's test is produced automatically by "fa", but both are available as separate commands: KMO(your variables) and cortest.bartlett(your variables). In each case, "your variables" means only the specific variables in the model you are testing, not your entire dataset.

I pulled all of the resulting models into Excel to filter and peruse. After applying the criteria above, I was left with 400 models. One variable that I felt had to be included was the percent of the population in 1860 who were slaves. Few things have so fundamentally shaped the history of the South over the last 200 years as an entire way of life built on slavery. That social model did not simply end with the Civil War: similar cultural practices--Jim Crow laws, lynchings, etc.--continued for generations to subjugate racial minorities, and arguably still have a profound impact on Southern culture today.

Even though I wanted a "cultural," non-economic model, I still included 5 economic and employment variables in this analysis, such as median family income (2014), income growth, unemployment, and employment change in certain industries, such as manufacturing, over the last 15 years. Many of the filtered models contained these economic variables, and while several highly ranked models included them, models without economic variables were also highly ranked. After excluding all models that contained economic variables or that omitted the slave-population measure, I had fewer than 40 models remaining. These I ranked by BIC (lowest value), TLI (highest value), and KMO (highest value).

After applying the cutoff values listed above, the resulting model differs from the one I posted earlier this week. It has far better statistical properties and is composed of the following 4 variables: White Evangelicals, death rates (2005-2014), slave population (1860), and teen birth rates (2014, 15-19 year olds). The factor loadings for each variable were strong and positive, meaning, in this case, that higher incidence of each measure is more strongly associated with "Southern Culture," and lower incidence more weakly associated with it. More specifically, research indicates that each of these variables is strongly associated with Southern states: higher rates of White Evangelicals, higher rates of teen births, shorter life spans (higher death rates), and of course the history of slave ownership. The principal axis extraction pulled out these four variables, of the 15 tested, as best explaining the variability of these measures across all 50 states (I did not include Washington DC or Puerto Rico). The analysis in R produced the following results:

Variable             Factor Loading   Communality (h^2)
White Evangelicals   0.84             0.71
Death rates          0.95             0.90
Slave population     0.64             0.41
Teen birth rates     0.80             0.65
  • Bartlett's sphericity: p<0.001
  • KMO: 0.81
  • RMSEA: 0
  • cRMSR: 0.03
  • TLI: 1.04
  • BIC: -7.07
  • R2: 0.67
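
To give a sense of how loadings like these turn into an index that can be mapped, here is a hedged Python sketch. It uses the crude sum-score approach (z-score each variable across states, weight by loading, sum), which is not the regression-method factor scores psych actually computes, and the three-state raw values are made up for illustration:

```python
# Approximate a "Southern Culture" index from the reported loadings:
# standardize each variable across states, weight by its loading, sum.
import statistics

loadings = {"evangelical": 0.84, "death_rate": 0.95,
            "slave_pop": 0.64, "teen_birth": 0.80}

# hypothetical raw data for three states (illustrative values only)
data = {
    "evangelical": {"MS": 41, "VT": 9, "OH": 22},
    "death_rate":  {"MS": 960, "VT": 720, "OH": 830},
    "slave_pop":   {"MS": 55, "VT": 0, "OH": 0},
    "teen_birth":  {"MS": 38, "VT": 10, "OH": 22},
}

states = ["MS", "VT", "OH"]
index = {s: 0.0 for s in states}
for var, loading in loadings.items():
    vals = [data[var][s] for s in states]
    mu, sd = statistics.mean(vals), statistics.pstdev(vals)
    for s in states:
        index[s] += loading * (data[var][s] - mu) / sd

ranked = sorted(states, key=index.get, reverse=True)
print(ranked)   # most to least "Southern" under this toy data
```

With real data, the index values would then be binned into the four shades used on the map.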
I did not include these tests in my filter, but R produced them, so I reproduce them here:

Correlation of scores with factors: ............. 0.97

Multiple R square of scores with factors: ....... 0.93

Minimum correlation of possible factor scores: ... 0.87

Wednesday, March 16, 2016

"Southern Culture" Index

In trying to generate models to predict/fit this year's election cycle, I wanted to eliminate cultural factors in order to focus on economic factors. Doing so meant I had to find a legitimate way to control for regional patterns of cultural difference--for example, differences between the South, Midwest, Northeast, and West, presuming such cultural differences exist. Various demographic maps lend visual credibility to the existence of regional differences, although rigorously disentangling economic from cultural factors is challenging. Prior literature identifies various factors associated with "Southern Culture," including several attempts to create a "Southern Culture Index."

Below are seven maps that show regional differences based on some of the more common factors mentioned in the academic literature as distinguishing the South from other parts of the country. In total, I found 20 variables to include in an exploratory factor analysis, covering voting patterns (Republican in the South, Democratic in the North), occupational differences (manufacturing in the South, science and finance in the North), and income differences (higher in the North and along the coasts, lower in the South). However, many of these variables produced weak statistical results as factors, so I did not create maps for them. The maps below represent the variables that produced the strongest results for a "Southern Culture" Index. In addition, I produce four more maps of the best index models generated from various combinations of these 20 variables.

To summarize the data, the South faces a number of challenges compared to the North, Midwest, and West. For example, in addition to the lower incomes mentioned above (although a lower cost of living helps to compensate for the income disparity), there are lower rates of college graduation and union membership. The South has significantly higher rates of firearm deaths, teen births, and death rates from various causes. Some researchers have identified a relationship between rates of violence in a region and high rates of Scotch-Irish ancestry there, both of which are found in the South. Similarly, the South has a long history of human rights abuses in the form of slavery, a history which continues to shape the region. These factors all contributed to the strongest indices. However, the final models did not use Scotch-Irish ancestry, income, or cost of living. The best models used combinations of the following six variables: death rates, all causes (CDC, 2010-2015), firearm death rates (CDC, 2010-2014), union membership (BLS, 2015), teen birth rates, 15-19 years (CDC, 2015), white Evangelicals (PRRI, 2015), and slave ownership as a percent of the population (1860).

I generated the following maps using the open-source software QGIS, and I used the open-source software R for the factor analysis to generate the indices. The psych package has several nice factor analysis features. The first seven maps show the individual factors, while the final four maps show the best index models (combinations of factors), along with technical information about the strength of each model. As can be seen, all four models produce very similar results. Red states are the most "South-like," blue states are the least "South-like," and purple states have mid-range "South-like" characteristics, according to each of the four models.

Death rates, all causes

Firearm death rates

Scotch-Irish ancestry

Slave ownership as a percent of the population (1860)

Teen birth rates

Union membership

White Evangelicals

4 factor index: White Evangelical + Union Membership + Death Rate + Slave Ownership

4 factor index: White Evangelical + Union Membership + Firearm Death Rates + Slave Ownership

5 factor index: White Evangelical + Union Membership + Firearm Death Rates + Teen Birth Rates + Slave Ownership

4 factor index: White Evangelical + Union Membership + Teen Birth Rates + Slave Ownership

Thursday, March 3, 2016

Regressing the Democratic Primary (Part 2)

Yesterday I posted a regression analysis that correctly predicts 13 of the 15 Democratic primaries/caucuses that have occurred so far this year. One of the most interesting aspects of the model is that it used no polling data or historical voting patterns--just 3 economic variables: median earnings from 2014, the cost of living for 2015, and unemployment for December 2015 (all the latest available data for these measures). Based on a Facebook conversation that ensued, I added education and race to the model--specifically, the percent of residents in each state with a bachelor's degree or higher, and the percent Black population. On a specific recommendation, I also tried one interaction term, race with unemployment. The outcome variable is the difference between Sanders' votes and Clinton's votes. So, for example, in Vermont, Sanders beat Clinton by 72.5%, but in Alabama, Clinton beat Sanders by 58.6%.

I tried 36 different combinations of these 6 variables. Race was an important predictor--in fact, used by itself, it correctly predicted 12/15 of the races. However, in several models it dropped out (it failed to reach statistical significance), and in others it did not improve predictability. Education, on the other hand, proved to be a useful predictor. On its own, it was one of the worst predictors, missing almost half of the races. But when combined with economic variables--specifically, cost of living and unemployment--it produced the only model that missed just one state, Oklahoma. The rest of the models missed 2 or more states.

In addition to accuracy of prediction, I also produced results for AIC, BIC, and the residuals. The 36 models, the p-value significance of each variable, and the AIC/BIC/residuals data are in the image below. The column for B*U is the interaction term %Black population * Unemployment. The last column is the number of states incorrectly predicted, and the table is sorted first by states predicted and then by lowest AIC. In statistics, you can use AIC and BIC to compare different regression models--the lower the value, the better the case that it is the better model (lowest values highlighted in green). Similarly, lower residuals also tend to indicate a better model. As you can see from the chart, the top model does not have the best AIC/BIC/residuals, despite having the best prediction record. In the models where I did not use a specific variable, that cell is highlighted in red with an "x." In models where a variable failed to reach statistical significance (p ≥ 0.05), I have crossed out the value and made the font red.
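
For readers unfamiliar with these criteria: both AIC and BIC trade goodness of fit against model complexity, and the two can disagree because BIC penalizes extra parameters more heavily. A hedged Python sketch using the usual Gaussian-likelihood formulas (the numbers are invented, not values from my table):

```python
# Compute AIC and BIC from a model's residual sum of squares (SSE),
# sample size n, and parameter count k (predictors + intercept).
import math

def aic_bic(sse, n, k):
    ll_term = n * math.log(sse / n)     # -2 log-likelihood, up to a constant
    return ll_term + 2 * k, ll_term + k * math.log(n)

aic_a, bic_a = aic_bic(sse=120.0, n=15, k=4)   # e.g. 3 predictors + intercept
aic_b, bic_b = aic_bic(sse=150.0, n=15, k=3)   # a smaller, worse-fitting model

# Lower values favor a model on either criterion.
print(aic_a < aic_b, bic_a < bic_b)
```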

This image shows the predicted values of Sanders' wins in each state based on this model (cost of living, unemployment, and college education). The only state it missed is Oklahoma. A positive value is a win for Clinton (highlighted in red), and a negative value is a win for Sanders (highlighted in blue). In the "model prediction" column, the correct predictions are highlighted in green.

This image is from the actual R-output for this model, showing the p-values, model significance, adjusted R-square, the coefficients for each variable, etc.

(Addendum--Predictions)

This final image is a list of all 50 states + DC with the original data used to calculate the models, and predictions for the outcome of the rest of the primaries/caucuses. Model 1 is just the 2 economic variables + education. Model 2 is those same 3 variables, plus median earnings and the percent of the state population that self-identified as Black in the American Community Survey, 5-year estimate (2010-2014). It is, arguably, the 2nd-best model--one problem with it is that both race and education drop out of statistical significance. However, removing them from the analysis creates an inferior model, so for the purpose of comparison, I left this model intact alongside Model 1.

Wednesday, March 2, 2016

Economic Factors and the Democratic Primary

Political prognosticators use many factors to attempt predictions of elections, and there are many theories of what factors should predict elections. Over the last several weeks, the US has been starting to pick presidential candidates at the state level, through caucuses and primaries. On the Democratic side, 15 states have gone through this process and apportioned their delegates, at this point (March 2, 2016) narrowing the field to two candidates, Bernie Sanders and Hillary Clinton.

Of the factors proposed to predict how elections will go, economic factors and historical voting patterns are at the top of the list. Using these as the basis for a model of Democratic primary outcomes, I sorted through approximately 20 economic factors, presidential voting data since 1992, state-level voting data from 2014, and federal congressional voting data from 2014. I also incorporated polling data, primarily from 538.com, which compiles and lists public polls, supplemented with polling data from other sources when 538 did not list a particular state.

Using only economic and past-voting variables as a basis, I constructed a regression model that explains 75% of the variation (the adjusted r-square) in the primary and caucus results from the 15 states that have voted so far--this model correctly predicted 13/15 of the elections (missing Oklahoma and Massachusetts). In fact, the final model that I chose uses only 3 economic variables and no past voting or current polling data. A second model, using the same 3 economic variables plus the statewide results of the last presidential election (2012), explains 82% of the variation; however, it predicted only 12/15 of the primaries/caucuses. This second model gave results closer to the actual state-level outcomes, but it missed the very tight race in Iowa (in addition to Oklahoma and Massachusetts, also missed by the first model). The dependent/outcome variable for this second model, instead of the difference between the Clinton and Sanders votes, was the percent of the vote going to Sanders.

Model 1: Only economic factors (R-square=0.752, p<0.0001)

Difference between Clinton vs Sanders = Unemployment (Dec 2015) + Median Earnings (2014) + Cost of Living (2015)

Difference = -13.6 + 2805.6 x Unemployment + 0.0021 x Earnings - 0.0023 x COL

The three economic variables that I used were 1) median earnings for 2014 (this data is not yet available for 2015), 2) unemployment for December 2015 (the latest data available), and 3) cost-of-living variation for 2015. Using the statistics package R, I used these three economic variables as my predictor/independent factors, and the raw difference between the percent of the vote given to Clinton vs Sanders in each respective statewide primary/caucus as my outcome. Using only these three economic variables, the regression model correctly predicted that Vermont, New Hampshire, and Colorado would go for Bernie Sanders, while incorrectly predicting that Massachusetts would also go for Sanders. The model predicted that all other states would go for Clinton, which was correct except for Oklahoma.
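
The fitted Model 1 equation above can be evaluated as a plain linear predictor. A hedged Python sketch--note that the post does not spell out the units of each input (e.g., whether unemployment is a proportion or a percentage), so treat the sign of the result, not its magnitude, as the prediction, and don't read anything into sample inputs you supply:

```python
# The published Model 1 coefficients as a linear predictor.
# A positive result predicts a Clinton win; a negative result, Sanders.
def model1(unemployment, earnings, cost_of_living):
    return (-13.6
            + 2805.6 * unemployment
            + 0.0021 * earnings
            - 0.0023 * cost_of_living)

# Direction checks implied by the coefficients: higher unemployment
# pushes the prediction toward Clinton; higher cost of living, toward
# Sanders (all else equal).
print(model1(0.06, 0, 0) > model1(0.05, 0, 0))   # True
print(model1(0, 0, 110) < model1(0, 0, 100))     # True
```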

What is particularly interesting about this model is that it does not use any cultural or political variables--not even past voting data or the expensive polling that newspapers and parties invest in. I expected that votes in the past several presidential elections would make the model more accurate, and while adding the 2008 presidential election results produced a mathematically better fit (smaller residuals and a larger r-square), it actually did slightly worse at predicting the outcomes. Similarly, I presumed that the results of the midterm elections might be a good predictor, since there are common social patterns between midterms and primaries--specifically, both tend to draw only high-information, highly motivated voters. However, neither the federal congressional elections of 2014 nor the state-level house/senate votes created a better model. In fact, each of those midterm-election variables produced a far worse model.

As for polling, I did not factor it into any of the final models. Part of the difficulty was determining which polls to use. Considering that no one pollster produces data for all of the states, and no two pollsters use exactly the same methods, I did not believe it was reasonable, in the end, to include the polling data. In the table below where I show my data, I include an estimated average of the most recent polls listed at 538 for each state. Another interesting feature of the economics-based regression model is that it has more predictive value than the polls--while the model predicted 13/15 state outcomes, the poll averages predicted only 12/15, missing New Hampshire, Oklahoma, and Massachusetts. If you count the margin of error in the Iowa polls as an incorrect prediction, the polling averages predicted only 11/15. These averages did not take into account that certain individual polls may have had more predictive success than the average.

For each of the three economic variables, the correlations show that the better a state's economy was doing, the more likely they were to vote for Sanders. For example, the higher the median earnings for 2014, the more likely those states were to vote for Sanders. Similarly, the lower the unemployment rates for Dec 2015, the more likely they were to vote for Sanders. On the other hand, the higher the cost of living in a state, the more likely they were to vote for Clinton.

Finally, I have not tested the model for the results of the Republican primaries/caucuses, or any prior elections. Below is the data I used for this analysis (the "Model 1" values are the predicted difference between Sanders and Clinton, with a negative value favoring Sanders and a positive value favoring Clinton; the "Model 2" values are the predicted final percent in that state going for Sanders):

Saturday, February 13, 2016

SCOTUS Vacancies During an Election Year

SCOTUS Justice Scalia died earlier today. One of the first CBS Online interviews was with a Cato Institute blogger who claimed it was unprecedented for a president to nominate a new SCOTUS justice during an election year. However, this is not true--though I can give the blogger a break, since the interview came within 30 minutes of the news breaking.

Searching for SCOTUS vacancies that coincided with an election year, and limiting myself to roughly the last 50 years, I found up to 3 instances. There were earlier instances--FDR made 2 SCOTUS nominations during 2 different election years, and Woodrow Wilson made 3 nominations during election years--but I want to focus on the post-WWII years. Since 1956, Eisenhower, Nixon, and Reagan have each nominated a justice during (or just before) an election year--all three presidents were Republicans with a Democrat Senate. Two were in their first term, and one was in his second term, about to be followed by his vice president.

The first was Eisenhower's nomination of William Brennan. Justice Minton retired on Oct 15, 1956, and Eisenhower acted quickly--he appointed Brennan by recess appointment the very next day, nominated him on Jan 14, 1957, and he was confirmed shortly thereafter. Eisenhower had been reelected by that point, in November, easily defeating the Democrat contender, Adlai Stevenson, by a 15% margin in the popular vote.

The second instance, only marginally relevant, was Nixon's appointments in late 1971. In September 1971, two SCOTUS justices, Black and Harlan, announced their retirements within days of each other, both for health reasons. Justice Black died shortly thereafter, and Harlan died in December. Nixon floated several nominees who suffered humiliating defeats; however, by mid-December, Justices Powell and Rehnquist had both been confirmed. The reason Nixon's appointments aren't quite as relevant is that this all occurred at the end of the year prior to the election, not during the election year itself. Nixon was reelected in November of 1972, wiping the floor with Democrat George McGovern by almost a 25% margin of the popular vote.

The final instance was in 1987, when Reagan nominated Justice Kennedy. Again, this instance is only marginally relevant, since it occurred the year before the election. Justice Powell retired in 1987, and a very contentious confirmation process followed, in which Bork was shot down; Justice Kennedy, the current "swing vote," was nominated on Nov 30 and sworn in in February 1988.

This was Reagan's second-to-last year in office, as he was nearing the end of his second term, with George H. W. Bush, his vice president, about to be elected the following year. In that sense, this is the closest example to what might happen this year under Obama. First, it's the most recent example, historically speaking. Second, both presidents were in their second terms (Nixon and Eisenhower were in their first). Third, as with the other examples, both Reagan and Obama face(d) a Senate of the opposite party--Reagan had a Democrat senate at that point, while Obama now has a Republican senate. However, all of this started for Reagan in June of the previous year, when Powell announced his retirement, and Kennedy wasn't sworn in for another 8 months. Mitch McConnell, the current Senate Majority Leader, has already announced that Republicans will stonewall any attempt by Obama to get a new SCOTUS justice appointed.

There is a fourth instance that might be closer than the other three but differs in fundamental ways: Justice Warren announced his retirement in June of 1968, the year Nixon ran against Humphrey and Wallace. The retirement was to take effect when Johnson appointed Warren's successor. That process was stymied by Strom Thurmond in what is known colloquially as the "Thurmond Rule." Justice Burger was nominated in May of 1969, the year after Nixon won the election. I don't consider this a reasonable parallel to the situation of Scalia's death, in the sense that there was no vacant seat in Warren's case--he agreed to stay on until his replacement was found, so there was no national imperative. In the case of Scalia, the new term is about to begin and the seat is empty. Further, Warren announced his retirement in June, while we are 4 months earlier in the cycle at this point.

Friday, November 20, 2015

Two Problems Solved with One Fix--Disabling HTML5 in Chromium

Ever since I "downgraded" from the wretchedness that was Windows 8 back to Windows 7 (Dell did not give me the choice of getting my new Inspiron laptop with Windows 7, so 8 was foisted onto me), one of the problems I have faced with Chromium (the open-source version of Chrome) is that about 80% of the videos I try to watch play an awful screeching static sound rather than the actual audio. I've searched for 2 years for a solution, with no success. Another annoyance is that when I go full-screen in videos, there is a pop-up idiot warning that I have gone full-screen, and it won't go away unless I click "approve." Having briefly worked in internet security, I never click pop-ups anywhere, anytime, for any reason--if I can't 'escape' out of a pop-up, or use AdBlock or NotScripts to get rid of it, I go to Task Manager and shut down the entire browser. I NEVER click pop-ups, and neither should you.

Anyway, today I found a way to get rid of the "you are in full screen" pop-up: disable Chromium's HTML5 media library, "ffmpegsumo.dll," by renaming it to something else. Whenever I want to rename a file so the computer can't access it, I put "RENAME" at the beginning of the name--so in this case it became "RENAME-ffmpegsumo.dll"--that way I can always find it easily if I need to undo the step.

The exciting news is that this also solved the problem of the awful, screeching, static noise!! Now I don't have to switch to Firefox every time I search for online videos!

Saturday, June 27, 2015

ANOVA-Regression, and the GLM--Comic Pedagogy

In my Intro to Statistics course, one of the tasks I feel obliged to do is to show students how two of the main topics of the course--linear regression and ANOVA--are linked, since they seem completely unrelated, other than the fact that we spend 75% of the semester on these two tests. Regression was first explored in the 1880s and 1890s by Galton and Pearson, who applied the procedure to heredity; similarly, ANOVA (analysis of variance), pioneered by Fisher, was also applied to genetics some decades later, in the 1920s. The two tests have somewhat different assumptions that must be met before they can be applied correctly, and they require different types of data. Because of these and other differences, it isn't obvious that both tests are based on the same math: linear/matrix algebra. Later statistical research uncovered their linkages, and now both are subsumed under the General Linear Model.

After studying the two separately in the Intro to Statistics class, I present this finding--that not only are the two tests based on the same math, but some statistical packages are moving to unify them. For example, in SPSS, there are some ANOVA tests that you can no longer find under the ANOVA menu--you must look under the GLM menu. Additionally, any data that ordinarily seems amenable only to an ANOVA test can be recoded to be amenable to regression. At the end of this lecture, I show my students one of my favorite geeky math cartoons--it sometimes goes around on Pi Day. I explain that after hearing that ANOVA and linear regression are fundamentally the same test, subsumed under the GLM, their faces, I'm sure, all look like this, and that they will rush out to tweet the discovery to all of their friends and use the information to pick up dates at parties.
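
The recoding trick is simple enough to demonstrate numerically: represent group membership with dummy (0/1) variables, run an ordinary regression, and the F statistic comes out identical to the one-way ANOVA F. A small illustration with made-up data (Python/numpy here rather than the SPSS or R tools mentioned above):

```python
# One-way ANOVA and dummy-coded regression yield the same F statistic:
# both are special cases of the General Linear Model.
import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 9.0, 8.0]),
          np.array([1.0, 2.0, 3.0])]
y = np.concatenate(groups)
n, k = len(y), len(groups)

# --- classical one-way ANOVA ---
grand = y.mean()
ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between-group
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)       # within-group
f_anova = (ssb / (k - 1)) / (ssw / (n - k))

# --- the same test as a regression on dummy-coded group membership ---
X = np.zeros((n, k))
X[:, 0] = 1.0                 # intercept (reference group = group 1)
X[3:6, 1] = 1.0               # dummy for group 2
X[6:9, 2] = 1.0               # dummy for group 3
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = ((y - X @ beta) ** 2).sum()
sst = ((y - grand) ** 2).sum()
f_reg = ((sst - sse) / (k - 1)) / (sse / (n - k))

assert np.isclose(f_anova, f_reg)
print(round(f_anova, 3))      # → 27.0 for this toy data
```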

Friday, May 1, 2015

Prager "University" and Racist "Educational" Videos

I recently ran across a video from Prager "University," of which I had never heard--it's actually just a conservative think tank founded by Dennis Prager, which has tacked the word "University" onto its name. The title of the 5-minute video is "Don't Judge Blacks Differently," filed under the "Political Science" section of the "University." The production is a combination of a still shot of the single speaker and primitive South Park-style animation. It takes a fairly typical conservative approach to race--i.e., the "color-blind" approach: if we ignore race, then the problem isn't really there, and it's academics who are the "real racists." However, while claiming to be academic by affiliating itself with a "university," the video not only fails to present any evidence-based claims, but the theories it uses to present its ideology are contradicted by decades of sociological and economic research.

Because of the farcical nature of the video, I created a spoof of it, using the same footage but overlaying it with the built-in Microsoft text-to-speech voice of Anna, via the software Balabolka. The text-to-speech isn't always as clear as I had hoped, and I am considering adding a subtitle track. But that would be a lot of work, so maybe I will, maybe I won't...

I don't have a title for the remade video, but here it is on Youtube:

Prager Spoof Video of "Don't Judge Blacks Differently"

Wednesday, April 1, 2015

Social Movement Tactics--Integrating Symbols for Mutual Benefit

I had an unfortunate Facebook encounter recently. I posted the image above, and an interlocutor argued that the function of the image was to "erase" the Black civil rights movement through the LGBTQ movement's "co-opting" of the iconic imagery of segregated water fountains. The catalyst for the image was the March passage of Indiana's version of the Religious Freedom Restoration Act, which caused a tidal wave of opposition nationwide and is rightly seen primarily as a counter-offensive against the expansion of LGBTQ rights, including a recent federal court decision striking down Indiana's attempt to enforce its anti-gay-marriage law. I have seen similar arguments about individuals using other Black civil rights symbols in the current series of protests against Indiana's RFRA law. Language such as "co-opting," "dilution," and "erasing" of the symbols of racial injustice is being deployed to keep LGBTQ activists from using any such imagery. Granted, I believe that such a practice has occurred in the wake of Ferguson and the "Black Lives Matter" slogan, when its counterpart, "All Lives Matter," became a point of controversy.

The issue in that case seems to be based on the false narrative of "reverse racism." This idea relies on an overly-simplistic understanding of the concept of "racism," which presumes that any type of discrimination or prejudice by one "race" or ethnic group against another can be called "racism." However, racism is neither a behavior, nor a belief/opinion--it is a systemic pattern of oppression built on unequal social power relations. While this blog post is not the place to argue that complex case, and while it is true that members of any race can have prejudices about another, and can discriminate against another, there is no such thing as "reverse racism," nor is there such a thing as "Black against White racism," specifically because of the fundamental structural inequalities between Blacks and Whites in the United States. Racism is about broad structural power, not individual acts.

So when the "Black Lives Matter" language was altered to "All Lives Matter," I believe that activists rightly argued that this represents an erasing of the importance of the race component of the broad social problem that started the "Black Lives Matter" movement. Given that the "Black Lives Matter" slogan was 1) new, so did not have the culturally iconic nature of symbols from the 1950s civil rights movements, such as the image of the segregated water fountains; and 2) that the revised slogan failed to link to any oppressed group or specific incident of injustice, I believe the opposition to "All Lives Matter" was justified. In that sense, "dilution" and "erasure" seem like an appropriate description. While it's certainly true that "All Lives Matter," movement success depends on the ability to highlight specific repression and motivate target groups. The revised slogan seems, at best, to merge all social problems into an abstraction, and at worst, tries to argue that, "sure, Black people get harassed by police, but so do White people, and you don't see us whining about it." That argument, ubiquitous among those who enjoy, but fail to identify, White privilege, fails to recognize the profound and specific disenfranchisement of race minorities in the US, and rightly needs to be vigorously confronted and rebutted.

I don't know where the specific image above came from--it seems to be a recent design. I can only track it back to a Facebook page from March 26, 2015. A similar image from 2010 points back to the 2008 California Proposition 8 campaign to prevent gay marriage in that state. However, the process of integrating movement symbology is not new. For example, the Black Power movement's primary symbol of the raised fist, which we see used at the 1968 Olympics by Tommie Smith and John Carlos, and other Black Power motifs of the 1960s-1970s, draws from a long history of "raised fist" imagery.

It more widely represents movements of mass solidarity against repressive states. For example, the following images are from the early 1900s: one for the Russian Communist party, a 1917 union poster, and the cover of a 1948 Mexican resistance magazine. The Black Power movement's utilization of this earlier imagery, I would argue, neither functions to, nor is intended to, "dilute" or "erase" the prior movements that had used these symbols; rather, it pays homage to the fact that the movements share in similar repression, and thus functions to link into the broader social consciousness and reinforce the connections between the movements.

Within social movements theory, a widely recognized phenomenon is the "protest cycle," in which several movements arise in tandem with each other, often preceded by the creation of a salient "master frame." For example, in the mid-to-late 1800s, Black civil rights gained significant ground during the Reconstruction Era, only to be quickly submerged and largely rolled back. The first-wave feminist movement had much success in the late 1800s-early 1900s, then seemed to disappear. In the 1950s, the Black civil rights movement was revived, and the inclusion of university students, plus other explicit strategies to link it to other groups, broadened the movement into a national force and created much broader and longer-lasting effects than had the previous, isolated movements. Meanwhile, a broad anti-nuclear weapons and peace movement had been simmering in response to WWII, the Korean War, and the Cold War. Martin Luther King, Jr. was able to connect those largely White, middle-class movements to a broader discussion about minority repression in the US, as evidenced by his focus on non-violent action in the protests he organized. This linked his movement's constituency and goals to much wider groups than had previously been incorporated.

Specifically, the master frame that developed within this cultural milieu was rooted in the idea of civil rights and justice for all people, and was able to link many movements together--peace movements argued true peace requires social equality and justice; environmental movements argued that civil rights included a sustainable set of public policies that created livable spaces for all people; clearly, women and sexual minorities were able to draw from this frame to understand their own treatment by society, and join the broader movement for rights; similarly, class disenfranchisement can be understood as a basic civil rights-type issue of being treated with equality. When these varied groups were able to see each of their forms of repression under a common rubric, or 'frame,' then they were able to more effectively share experiences, leadership and tactics to resist, to protest, and to work together for broader social change. This synergy helped to produce the decade of mass protests in the 1960s, a 'protest cycle.'

Rather than "diluting" or "erasing" each other, these varied movements were able to create a larger mass movement by sharing their symbols, integrating them, and working together. The image at the top of the page is unmistakably a reference to the iconic image of the 1950s segregated water fountains, and the Black civil rights struggle that went into remediating the "separate but equal" fallacy. In a similar way, the Indiana RFRA creates the fallacy that all people will be treated equally, even though certain groups may be relegated to separate facilities of public accommodation if one of those facilities is operated by a private individual who is offended by members of specific social groups. While current law is claimed to prohibit discrimination based on race--so purportedly even a religious objection to interracial marriage would not allow a restaurant to restrict access to an interracial couple (a disputed contention)--no such protections exist for sexual minorities. Such groups, or any other non-protected classes of people, could legally be excluded from facilities of public accommodation--i.e., privately owned businesses which serve the public.

The historic image of the segregated water fountain, overlaid with LGBTQ symbolism, points to the connections between these movements that were recognized at least as far back as the 1960s. The linkage reminds the audience of the racist history of the United States, especially in the context of the current upsurge in recognition of continued race minority disenfranchisement in the wake of the Black Lives Matter movement. That reminder sensitizes the audience to that movement, while also raising the spectre of a revisitation of the past, both for the potential implications of this law for LGBTQ individuals and for any other minority group, including race minorities. I would argue, therefore, that while there are ways the majority can dilute and erase the power of minority resistance to oppression, such as the "All Lives Matter" counter-campaign, there are also powerful ways that movements can link arms to support each other, show their solidarity, and benefit from each other's power. The segregated water fountain graphic above, from my perspective, works toward the goal of solidarity by connecting, in the popular consciousness, the same master frame of civil rights that unites all oppressed groups, and the broad social repression they face.

Wednesday, March 18, 2015

Campaign Finances: City-County Council Elections, 2015 (Update to yesterday's post)

Yesterday I posted an analysis of prior votes in the Indianapolis city-county council (CCC) elections to make predictions about this November's election. Those predictions were based solely on prior voting estimates from the 2011 CCC election (the boundaries have since changed, so only an estimate is available) and the 2014 state elections (updated boundaries are applicable). In a sociology textbook from which I used to teach, the author quoted a common political science dictum: the candidate who raises the most contribution money wins 90% of the time. A recent analysis at the federal level indicates that in the 2012 election, "94 percent of biggest House race spenders won," and "82 percent of biggest Senate race spenders won."

I don't know how that translates to local elections. In our case (Indianapolis), based on city-published financial records of city-council filers as of 3/12/2015, those with the largest war chests are likely to win anyway--each is either an incumbent, the only filer in the race, or already holds a wide historical voting margin. In that sense, the financial data as of mid-March doesn't give us much new information.

Below, I provide two tables of the same information, covering all of the major-party candidate filers (the deadline has passed, so this is the final list, unless independents or minor-party candidates file)--the table on the left is sorted by contribution amount, and the table on the right is sorted by district. In both tables, I have highlighted the district in red if Republicans seem likely (or assured, by virtue of being the only candidate) to win the seat, blue if a Democrat is likely (or assured) to win the seat, and green if prior voting history does not give me confidence to predict the race--there are three districts in that category: 2, 3 & 21. Of those, only district 3 has candidates with more than $1,000 in contributions reported so far, and there are two incumbents running (Hickman and Scales). Historically, district 3 has voted largely Republican, and the two Republican filers, combined, have almost 250% more money than the Democrat. So far, no empirical data is looking good for Ms. Hickman.

At the far right, there is a small table with the total Republican vs Democrat finances--so far Republicans have reported more than twice the contributions as Democrats.

Tuesday, March 17, 2015

Indianapolis City-County Council 2015 Election Predictions

I am making an early attempt at predictions for the Indianapolis City-County Council (CCC) elections this Nov 3, 2015. Currently, Democrats have a majority--15 to 14. However, two factors are radically changing the status quo, and may put Republicans back in power: 1) the supermajority-Republican Indiana state legislature eliminated the 4 "at large" seats in Marion County, and 2) the Republican political attorney David Brooks created the district maps that will be used until the early 2020s. The latter was the source of lawsuits, as Democrats alleged that the maps were drawn prior to the designated time in order to get them passed before the Democrats took office--not to mention the quarter of a million dollars spent on maps that likely could have been created for a fraction of that cost by non-partisan city employees in our current GIS (mapping) office. One might also mention that the Republican City Council President at the time, Ryan Vaughn, though he lost his CCC presidency in the following election when the majority shifted to the Democrats, was subsequently hired by Republican Mayor Ballard to be his Chief of Staff.

The other issue, the elimination of the "at large" seats, was also controversial, widely viewed as a partisan attempt to eliminate the consistently Democrat-elected at-large council seats and an invasion of city affairs by state legislators. Not counting the current 4 Democrat "at large" seats, Republicans would currently hold the Council majority, 14 to 11. Both of these factors are partisan efforts that give Republicans an advantage in future CCC elections. Spoiler alert: my conclusion is that Republicans have a good chance of regaining the majority in 2015.

Some districts are easy to predict: in districts 1, 7, 8, 9, 10, 11, 14, 23 and 24, either only one candidate filed by the major-party deadline last February, or both filers are from the same party. Two of those districts are Republican, and seven are Democrat. Theoretically, a minor-party candidate or independent could file by July 15, 2015, but a win would be highly unlikely, considering that Libertarians--legally a "major party"--rarely exceed 5% in city council elections.

Several other races carry a natural advantage because an incumbent is running. The changing of the district boundaries makes the incumbent advantage less important than it might otherwise be, since, in some cases, the incumbent's previous voters may have been largely shifted into another district, and in 3 cases two incumbents are running against each other: districts 3, 13 and 22. Districts where an incumbent is running, and where past voting gives that same party an advantage, I am predicting as "likely" for that party. In most other races, I largely use past voting to predict the outcome.

Generating past voting data for the current CCC election is difficult for two reasons. First, it is an off-off-cycle election: not only is it not on the 2-year mid-term cycle--always an even-year election--where we elect many of our state and federal officials, but Indianapolis city elections are on an odd-year cycle, with no candidates on the ballot other than for local races. Historically, this gives an advantage to Republicans, since there is exceptionally low turnout among all voters, but especially among Democrat base voters. I looked at the most recent (2014) election (as did Paul Ogden in his race predictions), which would have a similarly low turnout compared to presidential-year elections, though odd-year elections suffer from even worse turnout than mid-term elections. The second problem is that one can't look directly at the previous CCC election, because of the boundary changes described above. Below (Appendix 1), I summarize how I estimate the voting for current districts from the 2011 CCC election.

The following tables show the district-level votes for the 2011 and 2014 elections. The 2011 percentages (left-hand table) are unweighted averages of the combined district-level and at-large council races--these are estimates, generated with the process described below (Appendix 1). The 2014 percentages (middle table) are unweighted averages of the auditor, treasurer, and secretary of state races--these are the actual voting percentages (not estimates). I have highlighted in red those party outcomes with less than 45% of the vote, and in green those with greater than 55%. I have also highlighted in purple those Libertarian results of over 5% (calculations were done in MS Excel). As can be seen, there is a satisfying consistency between the 2011 estimates and the 2014 actual voting averages.

The right-hand table has my predictions for the 2015 race, based on past election results. Currently I am estimating 10 likely or secure seats for Democrats and 12 likely or secure seats for Republicans, with 3 seats I am not willing to predict (districts 2, 3 & 21). In most of the districts where a substantial lead exists for either party, there is only one filer, so the race is not contested (except by multiple filers from the same party--districts 1 & 8, both Democrat)--those are designated with an "x" in the appropriate party column for the respective district (7 Democrat, 2 Republican). There are several races with an incumbent in a district where past voting gives that same party an advantage (1 Democrat, 7 Republican).

There are also several races with multiple incumbents running against each other--of those, 1 seems to favor Democrats and 1 Republicans, based on prior vote margins. District 2 has a Democratic incumbent, but prior voting gives Republicans a significant advantage (12%) in my 2011 CCC estimates, though only a small advantage in the 2014 election (1%). District 3 gives Republicans a significant advantage in previous elections (19% & 9%), but one incumbent is a prior at-large council-person (Hickman, Dem) running against a district councilor (Scales, Rep). I do not know whether that will impact voting decisions, so I am leaving that race unpredicted. Finally, district 21 gives Democrats an advantage in the 2011 voting estimates (5%), but Republicans an advantage in the 2014 election (6%). The difference could be a preference for specific candidates, or an estimation error on my part. Either way, I am unwilling to predict that election, although if pressed I would say it leans Republican based on the 2014 voting, as I might also guess for district 3.

In any case, if my "likely or secure" predictions are accurate, Democrats would have to win all 3 of the unpredicted elections to retain the majority for the 2016 CCC, which seems unlikely. However, my predictions are based solely on previous elections. I refer the reader to Paul Ogden's site mentioned above, since he brings in pertinent local information, such as personality and funding factors.

Update: 1) Subsequently I posted campaign finances for the CCC candidates as of 3/12/2015. 2) I corrected an error in the 1st paragraph--I said that Vaughn lost the election, when I meant that he lost the majority presidency.

Appendix 1: 2011 Election Estimates Procedure Summary

For the 2015 prediction of CCC races, I used geospatial mapping (QGIS) techniques to estimate how past voting patterns would map onto the new precincts. Briefly: 1) I used GIS shapefiles of the old precincts with the 2011 CCC voting data; 2) overlaid those onto block-level census TIGER shapefiles; 3) calculated a factor variable that estimated the approximate mean number of voters in each block for each of the 2011 city-wide elections (mayor, at-large CCC, district-level CCC), by party (Dem, Rep, Lib); and 4) overlaid the new precinct-district maps onto the block-level voting estimates, summing those values to get first precinct-level, and then district-level, estimates. To check the reliability of this method, I tested the correlation of straight-party voting for both Republicans and Democrats by precinct against the actual voting data from the 2014 election, obtaining r=0.88. I also compared the 2011 district-level voting estimates to the 2014 election figures for "registered voters," with differences ranging from 6% to 24% and an average difference of 16%. While in absolute terms these are large differences, presumably the relative disturbance effects when comparing Democrat vs. Republican voters would be small, since I am comparing party voting within each district, not comparing districts to each other. (Some of this current work is based on mapping I did in 2011-2012 to explore the outcomes of the newly drawn Brooks maps.)
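The block-allocation logic of steps 1-4 can be sketched in a few lines. This is a deliberately simplified, hypothetical version using plain dictionaries--the actual work used QGIS with shapefiles, and all names here are illustrative:

```python
def interpolate_votes(precinct_votes, block_population,
                      block_to_old_precinct, block_to_new_district):
    # Steps 1-3: total the population of each old precinct ...
    precinct_pop = {}
    for block, pop in block_population.items():
        p = block_to_old_precinct[block]
        precinct_pop[p] = precinct_pop.get(p, 0) + pop

    # ... then allocate each precinct's votes to its blocks by population share.
    block_votes = {}
    for block, pop in block_population.items():
        p = block_to_old_precinct[block]
        share = pop / precinct_pop[p] if precinct_pop[p] else 0.0
        block_votes[block] = {party: v * share
                              for party, v in precinct_votes[p].items()}

    # Step 4: sum the block-level estimates up to the new districts.
    district_votes = {}
    for block, votes in block_votes.items():
        d = block_to_new_district[block]
        tally = district_votes.setdefault(d, {})
        for party, v in votes.items():
            tally[party] = tally.get(party, 0.0) + v
    return district_votes
```

For example, a precinct of 100 Dem / 50 Rep votes split across two blocks holding 60% and 40% of its population would contribute 60/30 and 40/20 votes to whichever new districts those blocks fall in.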

Sunday, January 11, 2015

Growth Rate of University Graduates by Country (2000-2012)

Quick Data Note: Based on a Facebook conversation about U.S. college attendance, I looked at OECD data for 2000-2012 to see how the U.S. compared to other high-income countries. While our average annual university graduation rate per capita is slightly above the OECD average for this time period (3.0%, compared to 2.5% per year in a 30-country average), the growth rate of the percent of graduates per year is far below average. Comparing 2000-2005 to 2006-2012, the US tertiary education graduation growth rate (all programs, including advanced research programs) was 17.6%, versus the OECD average of 27.4%. This implies that while we continue to produce university graduates at a slightly greater rate than average, that growth is dropping dramatically--barely two-thirds the rate of the average high-income country, and close to the bottom of the pack. Here is a chart comparing 15 countries. Given that U.S. university costs are 2-3x the average for the high-income countries, most of which provide tertiary education completely free for their citizens, it is not unreasonable to suggest that our college graduation rates will continue to decline unless we implement measures to dramatically reduce college costs, putting us at a further global economic disadvantage.
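One plausible way to compute such a period-over-period comparison is to take the mean annual rate within each window and compute the percent change between them. A hypothetical sketch (the exact OECD methodology may differ; the function name and windows are illustrative):

```python
def period_growth(rates_by_year, early=(2000, 2005), late=(2006, 2012)):
    """Percent change between the mean annual rates of two periods (inclusive)."""
    def mean(lo, hi):
        vals = [rates_by_year[y] for y in range(lo, hi + 1)]
        return sum(vals) / len(vals)
    early_mean = mean(*early)
    late_mean = mean(*late)
    return 100 * (late_mean - early_mean) / early_mean
```

A country whose mean rate rose from 2.0% to 2.5% across the two windows, for instance, would show 25% period growth.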

Wednesday, October 29, 2014

Soda, Telomeres, Aging, and Statistics

Anderson Cooper recently highlighted a pre-print analysis of "telomere length" and drinking sweetened carbonated beverages (soda, pop, or coke, in the vernacular) on his Ridiculist. He even includes an interview with neurosurgeon/health reporter Sanjay Gupta. I'm currently teaching a statistics course, so I'm always on the lookout for cutting-edge, peer-reviewed research that may have statistics at the appropriate level for my class, and data that would interest university undergraduates. I downloaded the paper and pulled out the Results table.

The researchers' hypothesis is framed with the theoretical belief that telomere length is related to aging. I won't address that issue in this blog post, except to say that it's a controversial (and, in my personal opinion, poorly defended) proposition. I will limit my comments just to the statistics presented in this research, and specifically, just to Table 3--spoiler alert, I wonder if the editors of AJPH were impaired when they let it into the journal.

In the table, we see Models 1 and 2, described in the notes. Model 1 includes just age, gender and energy (which I couldn't find defined anywhere--perhaps it's simple daily caloric intake?), while Model 2 adds a mish-mash of "healthy habits," such as healthy eating, BMI, smoking and alcohol, as well as some extra socioeconomic demographics, such as race, education, poverty level, etc. They compare four drinks--carbonated sugar-sweetened, noncarbonated sugar-sweetened, diet, and 100% fruit juice. They provide the quartiles, the b (regression coefficient, similar to the "m" or "slope" in the dreaded high-school algebra linear equation "y=mx+b"), and the 95% confidence intervals.

My first clue that something is amiss is that there is not a consistent linearity in the quartiles. (They also don't provide Q0, the minimum--we don't necessarily need it, but then why provide Q4, the maximum? It's a consistency issue that doesn't affect the analysis, but it makes the table feel unbalanced and sketchy to me.) Their regression is self-described as linear. However, the quartiles themselves are decidedly NOT linear--at least not for anything except the noncarbonated sugar-sweetened beverages and the combined sugar-sweetened beverages index. That is problematic for me.

Let's ignore the non-linearity question for a moment and just look at their base analysis, which compares the median values of the four beverages with their published b coefficients and confidence intervals. Let's ignore Model 1, since it's just demographics. Once you control for the everyday behaviors of the people in their study (they use data from the NHANES), only one variable is claimed to reach statistical significance: people who consume sugar-sweetened carbonated beverages apparently have shorter telomeres. While it would be nice to overstate the other Model 2 coefficients--claiming, say, that people who drink fruit juice have longer telomeres, and that diet drinks have neither positive nor negative effects on telomeres--it's statistically inappropriate to make those claims, since neither of those measurements achieved statistical significance (p<0.05), so we can ignore them entirely. So let's compare the two extremes, based solely on the published median telomere lengths: sugar-sweetened soda at 1.13 and 100% fruit juice at 1.08 (the diet soda median equals that of fruit juice). On the surface those two numbers look "different"--clearly they are "different numbers," but that doesn't mean that in "reality," in the general population, they are different, since this output is based on a sample, and is therefore an estimate. That's what "statistical analysis" does, by definition--it creates reasonable estimates about the general population based on samples.

Let's assume the sample meets standard scientific guidelines of randomness, etc., so we just have to determine whether 1.08 vs. 1.13 translates into an "actual" difference when applied to a general population estimate. According to the p-values of the coefficients, it does indeed appear to be different, but we'll get to that later. For now let's stick to the medians. Notice the spread from Q1, Q2 (the median), and Q3 to Q4. Since they aren't linear, I'm not quite sure what the Q2 value actually represents. As any of my undergraduate statistics students can tell you, one of the assumptions that must be met before you can do a regression analysis is linearity--this non-linearity of quartiles makes me suspicious. Putting that question aside, I wonder whether the differences between the quartiles are "actual" differences, or whether the non-linearity indicates mere natural variation, with no regular increase in telomere length from Q0 to Q4. We can't know that based solely on this study, but personally, I don't see that the assumption is met, given this table. Let's then take a leap and ask: what if the actual Q2 measurement is anomalous, and a better estimate of Q2 would be the average of Q1 and Q3? In that case, the "estimated" Q2 for sugar-sweetened carbonated beverages is not 1.13 at all, but [(1.04+1.09)/2] 1.065, which is shorter than the fruit juice median. This is in fact what the researchers predict with their regression equation--shorter telomeres with sugar-sweetened carbonated beverages. But it isn't self-evident, since the actual median shows soda/pop with longer telomeres than fruit juice. Based on the presented medians alone, one could interpret that drinking sugar-sweetened soda is actually better for you than drinking fruit juice! Granted, I'm not going to make that claim--although there is significant evidence that drinking fruit juice is not much healthier, if at all, than drinking soda.
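That midpoint estimate is trivial to verify (the 1.04 and 1.09 quartile values are quoted above from the paper's Table 3):

```python
def midpoint_estimate(q1, q3):
    """Estimate Q2 as the midpoint of Q1 and Q3, as described above."""
    return (q1 + q3) / 2

# Sugar-sweetened carbonated beverages: Q1 = 1.04, Q3 = 1.09
soda_q2_est = midpoint_estimate(1.04, 1.09)  # 1.065, below the 1.08 fruit juice median
```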

Finally, let's look at the b coefficients. I always make a point of telling my statistics students to ignore any published data that doesn't include confidence intervals, since hiding confidence intervals (by not publishing them) is a GREAT way to completely misrepresent a data analysis to your benefit. One can't interpret any parametric estimate (like a regression coefficient) without its confidence interval. In this case, it seems that in Model 2, sugar-sweetened carbonated drinks shorten telomeres (b=-0.010), while fruit juice seems to actually lengthen telomeres (b=+0.016). BUT! Remember that these are statistical estimates, not "real" numbers--the real value for soda-related telomere shortening lies somewhere between -0.020 and -0.001, with 95% confidence (i.e., with a 5% chance of a Type 1 error--claiming the result is real when it isn't). So the authors aren't really claiming that the "actual" telomere shortening is exactly -0.010, but that it is almost certainly somewhere between -0.020 and -0.001.

Similarly, the alleged telomere-lengthening effect of fruit juice isn't "exactly" +0.016, but likely somewhere between 0.000 and +0.033. For my Introduction to Statistics students, I have them look at the maximum confidence value of the lower measurement and the minimum confidence value of the higher measurement before making an assessment. This means the lowest likely value of telomere lengthening from fruit juice is actually 0.000 (i.e., no change at all), vs. a maximum telomere shortening from soda of -0.001. Looking at the p-values, that appears "statistically" to be a measurable difference, but in reality, personally, I would interpret it as not at all an important clinical difference. (I won't get into "effect sizes" in this particular blog post, but since they don't publish the standard deviations, we can't calculate those--my guess is they would come out to be completely insignificant.)
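That classroom rule can be written down as a simple interval comparison--a sketch, with the CI values quoted above from Table 3 (the function name is mine, not the paper's):

```python
def cis_overlap(lower_ci, higher_ci):
    """Each argument is a (low, high) 95% CI, ordered by point estimate.
    Overlap means the data cannot cleanly separate the two estimates."""
    return lower_ci[1] >= higher_ci[0]

soda_ci = (-0.020, -0.001)   # b = -0.010 (shortening)
juice_ci = (0.000, 0.033)    # b = +0.016 (lengthening)
# -0.001 < 0.000: the two intervals fail to overlap by the narrowest of margins
```

Note this is a conservative heuristic for eyeballing tables, not a substitute for a formal test of the difference between coefficients.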

What makes this difference perhaps even less relevant is rounding. If the soda telomere maximum confidence level is b=-0.001, that could potentially be -0.0005 before rounding. Similarly, the minimum telomere lengthening of b=0.000 could be -0.0005; in other words, the two could be basically the same value, depending on how they rounded! On the one hand, I would have liked to see one extra decimal place to rule out that possibility. On the other hand, one can simply drop a decimal place, and then the maximum soda level becomes b=0.00 and the minimum fruit juice level also becomes b=0.00. So, in actuality, I don't really need to see that extra decimal place at all--going by this table alone, I don't think these results support their conclusions.
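The rounding point can be made concrete: a figure reported to three decimal places stands in for a half-unit range on either side. A small sketch (the helper is hypothetical):

```python
def rounded_range(reported, decimals=3):
    """Range of underlying values consistent with a figure rounded to
    `decimals` places (half a unit in the last place, either side)."""
    half = 0.5 * 10 ** (-decimals)
    return (reported - half, reported + half)

soda_upper = rounded_range(-0.001)  # roughly (-0.0015, -0.0005)
juice_lower = rounded_range(0.000)  # roughly (-0.0005, 0.0005)
# The two ranges meet at -0.0005, so the underlying values could coincide.
```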

Sunday, October 26, 2014

Just over a week is left before the 2014 election, with major shake-ups likely in U.S. Senate and Governor races. Why are 6 seats important? Because that's all the GOP needs to take control of the Senate by 1 vote.

Comparing 6 different prediction sites, all agree that Republicans will pick up at least 6 seats, the most likely being Alaska, Arkansas, Iowa, Kentucky, Louisiana and South Dakota. The only exception is Real Clear Politics, which is making the most conservative estimates (scientifically conservative, not politically conservative), declaring much of the polling so close that it is still within the margin of error, and so keeping those races as "Toss-ups" (although it has published a "no toss-up map" that agrees with the other 5 prediction sites that the GOP will pick up 6 seats). Among the other differences between the sites, one is Kansas, which Politico and the Washington Post are calling likely Republican, whereas 538 and Princeton are calling it "leans Independent." Both of the latter, along with the Washington Post, say Georgia slightly leans Democrat and that both Colorado and Louisiana lean Republican--races the other 3 sites still call toss-ups. Sabato at the UVa Center for Politics is still counting Kansas as a toss-up.

However, that number "6" is contingent on a couple of things. First, it presumes that Republicans keep all of the seats they currently control. But three of those seats are actually far closer than expected--Georgia, Kansas and Kentucky. In Kansas, Governor Brownback has made the Republican brand so toxic that the incumbent Republican senator may get kicked out of office, replaced by an independent who has not stated whom he would support for Senate leader, other than that it would be neither Reid nor McConnell. So while McConnell's seat may end up being safe from challenger Lundergan Grimes, unless Republicans vote in a new Senate leader other than McConnell, Orman might refuse to caucus with them, potentially giving Democrats a hail-Mary if the race is closer than expected. The second problem for Republicans is Georgia, where they are losing safe incumbent Saxby Chambliss, and that race may now turn into a runoff that 3 of the prediction sites are calling a toss-up and the other 3 are saying leans slightly Democrat.

The second contingency behind "6 seats to victory" is that the close races will cut evenly between Democrats and Republicans. If the six states listed above all go for the Republicans, which all of the sites agree is likely, and they keep Georgia and Kansas, then they are safe. However, if either of those two states goes for the Democrats, then Republicans will need Colorado and/or Louisiana for the definitive win. Three prediction sites (Politico, Sabato and RCP) are not calling either of these races, leaving them as toss-ups as of Oct 26, while the other three (538, Princeton and WaPo) are calling them likely Republican (in its "no toss-ups" model, RCP calls both for the Republicans). If Republicans bring in BOTH Colorado AND Louisiana, then they don't need Georgia or Kansas at all. But if three of these four states (GA, KS, CO, LA) go for the Democrats, the situation gets complicated.

First, if control of the Senate comes down to one seat, then Georgia or Louisiana may have to break the tie, and neither of those races may be decided on election night, because both states' systems can force a runoff: in Georgia, if no candidate breaks 50%, the runoff would be held in mid-January, while Louisiana's potential runoff would be in early December. Second, if three of these borderline states go to Democrats, then the Senate would be tied, meaning Vice President Biden would be needed to break any tie votes.

So what all of this means is that if the Senate goes the way these six sites predict, based on polling as of October 26, then control of the Senate in 2015-2016 will likely be Republican. The spoilers are Georgia, Kansas, Colorado and Louisiana. Republicans only need two of these for the win. But if Democrats get three, Biden will be needed as a tie-breaker, and if they get all four, then Democrats remain in control of the Senate.
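The seat arithmetic above can be sketched with a simple enumeration. This is a hedged illustration of the scenario the post describes, not official election math: it assumes Republicans sit at 49 seats after the six likely pickups, that only the four swing states remain in play, and that a 50-50 tie goes to the Democrats via Vice President Biden.

```python
from itertools import product

# Hypothetical premise from the scenario above: after the six likely
# pickups, Republicans hold 49 seats and four races remain open.
SWING_STATES = ["GA", "KS", "CO", "LA"]
GOP_BASELINE = 49

def senate_control(winners):
    """winners: one 'R' or 'D' per swing state, in SWING_STATES order."""
    gop_seats = GOP_BASELINE + winners.count("R")
    if gop_seats >= 51:
        return "Republican"
    if gop_seats == 50:
        return "50-50 tie, broken by VP Biden for the Democrats"
    return "Democratic"

# Tally who controls the chamber across all 16 possible outcomes.
tally = {}
for winners in product("RD", repeat=len(SWING_STATES)):
    outcome = senate_control(winners)
    tally[outcome] = tally.get(outcome, 0) + 1

print(tally)
```

Of the 16 combinations, Republicans control the chamber in the 11 where they win at least two of the four states, the four single-win outcomes produce the Biden tie-breaker, and only a Democratic sweep of all four keeps the Senate Democratic outright, matching the paragraph above.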

Friday, October 17, 2014

Health and Human Services Spending since 2003

The recent Texas Ebola outbreak has caused a political frenzy, with mutual blame-casting by Republicans and Democrats. I looked up the official outlays as published by the U.S. Treasury. While we have had an increasing budget for several decades in terms of raw dollars, once adjusted for inflation and population growth, those budgets look increasingly anemic. Below are four separate budgets from Health and Human Services, which is the agency most directly tied to U.S. health, and specifically to preparedness for infectious disease control. The first two graphs show the total relevant budgets for the National Institutes of Health and the Centers for Disease Control from 2003-2014, adjusted for inflation (in 2014 dollars) and population growth. Both budgets have dropped over that period: the CDC budget from $20.07 per person in 2003 to $19.95 today, and the NIH budget from $101.33 per person in 2003 to $97.57 per person today.

The second set of charts organizes the data slightly differently. Within the Health and Human Services budget, there are three separate agencies that engage in "health care research and training": NIH, CDC and HRSA. Two of these, HRSA and the CDC, also have budgets for "health care services." I have aggregated these two categories of outlays and depicted both the inflation-adjusted spending per capita and spending as a percent of real GDP. The health care research and training budget (inflation-adjusted, per capita) has doubled since 1979, but compared to real GDP, spending on health care research and training has actually gone down, from 0.76% to 0.65%; per capita, this budget has declined since 2003. Similarly, the HRSA and CDC budgets for direct health care services have gone up (inflation-adjusted, per capita), not quite doubling from $24.35 to $40.21, but have decreased relative to real GDP, from 0.37% to 0.25%. Per capita, this budget has remained approximately the same since 2003.
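The adjustment used throughout these charts can be sketched as follows. This is a minimal illustration, assuming the table's conventions: outlays are in thousands of nominal dollars, population is in millions, and 233.92 is the 2014 CPI value the figures are deflated to.

```python
CPI_2014 = 233.92  # CPI level used as the deflation target (2014)

def adjusted_per_capita(outlay_thousands, cpi, population_millions):
    """Convert nominal outlays (thousands of dollars) into
    inflation-adjusted 2014 dollars per person."""
    # thousands of $ divided by thousands of people = $ per person
    nominal_per_capita = outlay_thousands / (population_millions * 1000)
    return nominal_per_capita * (CPI_2014 / cpi)

# 1979 NIH outlays from the table: $2,869,567k, CPI 68.3, pop 225.06M
print(round(adjusted_per_capita(2869567, 68.3, 225.06), 2))   # ~43.67
# 2014 NIH outlays: $31,124,000k, CPI 233.92, pop 319M
print(round(adjusted_per_capita(31124000, 233.92, 319.0), 2))  # ~97.57
```

Both outputs reproduce the per-capita NIH figures in the table below, which is a useful sanity check on the deflation and population arithmetic.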

Per-capita columns are inflation-adjusted to 2014 dollars. Outlay columns are in thousands of nominal dollars. R&T = health research and training; Svc = health care services; CPI = consumer price index; Pop = U.S. population in millions.

Year | CDC $/cap | NIH $/cap | CDC total | NIH total | R&T $/cap | R&T % GDP | Svc $/cap | Svc % GDP | HRSA Svc | HRSA R&T | CDC Svc | CDC R&T | NIH R&T | CPI | Pop
1979 | 3.63 | 43.67 | 238335 | 2869567 | 49.41 | 0.76% | 24.35 | 0.37% | 1417014 | 321648 | 182947 | 55388 | 2869567 | 68.3 | 225.06
1980 | 3.46 | 42.64 | 261502 | 3222305 | 48.94 | 0.75% | 22.48 | 0.35% | 1500392 | 412741 | 198362 | 63140 | 3222305 | 77.8 | 227.22
1981 | 3.52 | 42.23 | 300028 | 3603805 | 48.52 | 0.74% | 19.81 | 0.30% | 1470106 | 457519 | 220341 | 79687 | 3603805 | 87 | 229.47
1982 | 3.46 | 39.24 | 323137 | 3664695 | 44.22 | 0.68% | 19.08 | 0.29% | 1545563 | 378315 | 236376 | 86761 | 3664695 | 94.3 | 231.66
1983 | 3.70 | 38.36 | 361590 | 3749649 | 42.37 | 0.61% | 15.09 | 0.22% | 1174389 | 330608 | 300839 | 60751 | 3749649 | 97.8 | 233.79
1984 | 3.51 | 40.47 | 360128 | 4157294 | 42.70 | 0.58% | 14.66 | 0.20% | 1202976 | 171545 | 303007 | 57121 | 4157294 | 101.9 | 235.82
1985 | 3.43 | 43.52 | 368489 | 4670264 | 45.92 | 0.60% | 14.26 | 0.18% | 1223635 | 195573 | 306481 | 62008 | 4670264 | 105.5 | 237.92
1986 | 3.82 | 45.46 | 429378 | 5114537 | 48.17 | 0.61% | 14.29 | 0.18% | 1249201 | 234533 | 358427 | 70951 | 5114537 | 109.6 | 240.13
1987 | 4.05 | 45.34 | 466027 | 5222194 | 48.10 | 0.58% | 14.22 | 0.17% | 1234942 | 254067 | 402556 | 63471 | 5222194 | 111.2 | 242.29
1988 | 5.08 | 52.38 | 613764 | 6333941 | 54.77 | 0.64% | 15.38 | 0.18% | 1308047 | 226809 | 551422 | 62342 | 6333941 | 115.7 | 244.5
1989 | 6.45 | 54.72 | 823573 | 6991990 | 57.29 | 0.65% | 15.24 | 0.17% | 1229863 | 223020 | 717685 | 105888 | 6991990 | 121.1 | 246.82
1990 | 7.61 | 55.11 | 1034752 | 7491894 | 57.56 | 0.65% | 16.70 | 0.19% | 1370516 | 198856 | 899583 | 135169 | 7491894 | 127.4 | 249.62
1991 | 7.75 | 52.67 | 1128102 | 7667156 | 54.90 | 0.61% | 17.06 | 0.19% | 1467465 | 211557 | 1015393 | 112709 | 7667156 | 134.6 | 252.98
1992 | 7.91 | 55.34 | 1198123 | 8380743 | 57.84 | 0.61% | 19.15 | 0.20% | 1831699 | 248467 | 1068990 | 129133 | 8380743 | 138.1 | 256.51
1993 | 8.92 | 60.21 | 1412925 | 9540319 | 62.80 | 0.65% | 20.51 | 0.21% | 1993519 | 254149 | 1256552 | 156373 | 9540319 | 142.6 | 259.92
1994 | 9.55 | 61.75 | 1570939 | 10155278 | 64.32 | 0.64% | 22.37 | 0.22% | 2257521 | 271948 | 1421135 | 149804 | 10155278 | 146.2 | 263.13
1995 | 10.44 | 63.61 | 1786000 | 10883000 | 65.95 | 0.64% | 22.43 | 0.22% | 2213000 | 239000 | 1625000 | 161000 | 10883000 | 150.3 | 266.28
1996 | 12.19 | 57.46 | 2167000 | 10217000 | 59.97 | 0.56% | 31.18 | 0.29% | 3537000 | 288000 | 2008000 | 159000 | 10217000 | 154.4 | 269.39
1997 | 12.13 | 60.39 | 2249000 | 11199000 | 63.15 | 0.56% | 27.53 | 0.25% | 3023000 | 345000 | 2082000 | 167000 | 11199000 | 159.1 | 272.65
1998 | 12.65 | 65.59 | 2410000 | 12500000 | 68.15 | 0.58% | 27.44 | 0.23% | 3042000 | 265000 | 2187000 | 223000 | 12500000 | 161.6 | 275.85
1999 | 12.40 | 70.49 | 2430000 | 13815000 | 73.05 | 0.59% | 29.03 | 0.24% | 3481000 | 280000 | 2208000 | 222000 | 13815000 | 164.3 | 279.04
2000 | 12.44 | 75.71 | 2532000 | 15415000 | 78.36 | 0.62% | 30.47 | 0.24% | 3884000 | 328000 | 2321000 | 211000 | 15415000 | 168.8 | 282.16
2001 | 15.27 | 80.88 | 3257000 | 17253000 | 84.37 | 0.66% | 33.05 | 0.26% | 4065000 | 472000 | 2985000 | 272000 | 17253000 | 175.1 | 284.97
2002 | 16.36 | 93.91 | 3563000 | 20450000 | 98.53 | 0.76% | 37.49 | 0.29% | 5012000 | 596000 | 3152000 | 411000 | 20450000 | 177.1 | 287.63
2003 | 20.07 | 101.33 | 4523000 | 22834000 | 106.17 | 0.78% | 41.79 | 0.31% | 5323000 | 661000 | 4094000 | 429000 | 22834000 | 181.7 | 290.11
2004 | 18.60 | 110.54 | 4311000 | 25626000 | 116.72 | 0.84% | 40.80 | 0.29% | 5482000 | 1097000 | 3976000 | 335000 | 25626000 | 185.2 | 292.81
2005 | 18.79 | 112.58 | 4528000 | 27123000 | 116.65 | 0.81% | 42.11 | 0.29% | 5884000 | 712000 | 4260000 | 268000 | 27123000 | 190.7 | 295.52
2006 | 18.25 | 109.79 | 4615000 | 27771000 | 114.05 | 0.77% | 40.35 | 0.27% | 6087000 | 582000 | 4119000 | 496000 | 27771000 | 198.3 | 298.38
2007 | 21.02 | 107.95 | 5480000 | 28138000 | 111.95 | 0.75% | 42.07 | 0.28% | 5886000 | 644000 | 5080000 | 400000 | 28138000 | 202.42 | 301.23
2008 | 20.95 | 106.13 | 5749000 | 29123000 | 109.31 | 0.75% | 43.07 | 0.30% | 6268000 | 672000 | 5550000 | 199000 | 29123000 | 211.08 | 304.09
2009 | 22.14 | 107.79 | 6130000 | 29847000 | 110.97 | 0.76% | 44.89 | 0.31% | 6465000 | 713000 | 5964000 | 166000 | 29847000 | 211.14 | 306.77
2010 | 23.81 | 115.40 | 6822000 | 33068000 | 119.39 | 0.80% | 48.41 | 0.32% | 7078000 | 1116000 | 6794000 | 28000 | 33068000 | 216.69 | 309.33
2011 | 22.00 | 117.17 | 6454000 | 34370000 | 119.78 | 0.79% | 48.32 | 0.32% | 7544000 | 942000 | 6630000 | -176000 | 34370000 | 220.22 | 311.59
2012 | 21.59 | 107.77 | 6567000 | 32781000 | 112.48 | 0.73% | 45.08 | 0.29% | 7755000 | 824000 | 5957000 | 610000 | 32781000 | 226.67 | 313.91
2013 | 20.29 | 99.52 | 6316000 | 30976000 | 104.83 | 0.66% | 40.88 | 0.26% | 7303000 | 753000 | 5419000 | 897000 | 30976000 | 230.28 | 316.16
2014 | 19.95 | 97.57 | 6364000 | 31124000 | 103.92 | 0.65% | 40.21 | 0.25% | 7667000 | 823000 | 5159000 | 1205000 | 31124000 | 233.92 | 319

Saturday, October 11, 2014

District 29, 2014: Delph vs. Ford

Last February, the current state senator from District 29, Mike Delph, had a Twitter meltdown over the same-sex marriage constitutional amendment shenanigans at the Indiana statehouse, which arguably (along with some other issues) led the Senate leader to impose several sanctions against Delph: moving him to the back of the chamber, where he was forced to sit with the Democrats; removing him from leadership roles; and taking away his press secretary. Shortly thereafter, a Democratic competitor, JD Ford, emerged in his district.

The district has traditionally been solidly Republican, but with the 2010 redistricting and the steady urbanization of that region, I wondered if the situation had changed. Using TIGER geographic redistricting shapefiles, Indiana election results for 2012, and ACS population data, I created some estimates of who voted for whom at the precinct level in two specific races--Attorney General and School Superintendent. As background for these races: AG Zoeller was running for his second term, having been active in far-right political causes (like submitting anti-gay-marriage amicus briefs in various cases around the country), and School Superintendent Tony Bennett had been a charter-school activist while in office. Zoeller won his 2012 run, but Bennett lost to Glenda Ritz, despite the GOP sweeping the rest of the offices in the state, including winning supermajorities in both the state Senate and House.

In what is now the newly redistricted Senate District 29 (some of these counts are estimates based on precinct line changes), Zoeller (R) won approximately 57% of the vote, while Ritz (D) won approximately 51% of the vote. There are several factors working in Delph's favor. First, he has an incumbent advantage. Second, he has the party advantage of the "six-year presidential itch": in the sixth year of an incumbent president's term, the out-party (in this case, Republicans) tends to do well. Third, he has a turnout advantage--midterms tend to favor the GOP. Fourth, he has the numbers in this district--Zoeller beat his competitor by a wide margin.
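Precinct-level estimates across changed boundary lines can be built in more than one way; a common approach is to weight each old precinct's votes by the share of its population that falls inside the new district. This is a hedged sketch of that idea, not the post's actual method or data--the precinct numbers and overlap fractions below are purely illustrative.

```python
def estimate_district_share(precincts):
    """Estimate a candidate's vote share inside a new district.

    precincts: list of (candidate_votes, total_votes, overlap_fraction),
    where overlap_fraction is the share of the old precinct's population
    that falls inside the new district boundary.
    """
    # Weight each precinct's votes by how much of it lies in the district.
    candidate = sum(v * f for v, _, f in precincts)
    total = sum(t * f for _, t, f in precincts)
    return candidate / total

# Toy example: one precinct fully inside the district, one only 40% inside.
sample = [(600, 1000, 1.0), (300, 800, 0.4)]
print(round(estimate_district_share(sample), 3))  # 0.545
```

The overlap fractions would come from intersecting the old precinct shapefiles with the new district boundary (e.g., in a GIS tool), which is where the "estimates based on precinct line changes" caveat above comes from.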

However, several factors may also come into play. The sanctions imposed by the Senate majority leader, and Delph's very public meltdown over the same-sex marriage issue--which was accompanied by several very insulting remarks about what makes a "true" Christian (e.g., "any professed Christian minister that teaches any sin is acceptable is NOT acting in true love but in eternal condemnation #truth") and arguably "crazy" comments on several other issues--may cause voters in his district to question his ability to represent them. While midterms already have miserably low turnout (usually about a third of registered voters in the Marion County area), some typically stalwart GOP voters in his district may be incentivized to stay home rather than be forced to choose between Delph and the openly gay Democrat. And as demonstrated in the 2012 elections, D29 voters are more than willing to vote for a Democrat they like rather than simply casting a party-line vote.

Any of these factors could shift the vote in Ford's direction. Additionally, the demographic numbers that we have for the district are largely from the 2008-2012 ACS, but that entire area is changing dramatically and rapidly, largely in the direction of younger voters and racial minorities, so the influx of new residents, if they register and vote, will mostly be potential Ford voters.

In terms of raw numbers, there are about 100k residents in D29 over 18 (2012 ACS 5-year estimate), with an average age of 36, 52% female, and 69% White. But those estimates are two to six years old, and a lot has changed in that area. According to my redistricting estimates above, about 60k votes were cast there in the 2012 election, so perhaps half of that number will show up for the midterm. Much of the new housing in the area, because of the recession and housing-price bust, has been apartments, which will bring in voters more inclined toward Democrats (younger, poorer, single, female).

Maps of the redistricting, by % of vote for AG and School Superintendent, are below.