Friday, July 15, 2016

Getting Shot Dead by Police: Analyzing Guardian Data

Two recently publicized studies of police shootings by race appear to contradict each other. One, a study published by Ross in the online peer-reviewed journal PLOS ONE, looks at county-level data throughout the country from 2011-2014, finding that unarmed Blacks are 3.5 times more likely than unarmed Whites to be shot dead by police. The second, published by Fryer at NBER, found that Blacks were no more likely than Whites to be shot dead by police, when controlling for whether the victim was armed (this could be any type of weapon). It should be noted that neither of these sources is a 'standard' academic outlet. NBER is not peer-reviewed--its publications are 'working papers' by (typically) respected economists. PLOS ONE is peer-reviewed and generally respected, but because it is a newer, online-only format that doesn't specialize in one specific discipline, there is extra skepticism about its consistency and reliability.

In this present analysis, I use two data sources: first, the Guardian's The Counted, and second, the data from the Ross PLOS ONE article above. Both have publicly available raw data, whereas the NBER paper does not. The Guardian data is available at GitHub, and covers all of 2015 through July 15, 2016. The Ross data is available from Google Docs, and covers 2011-2014.

I limited my analysis to just those incidents where the victim was shot dead by police, and where the victim was either unarmed, or armed with a gun (or what could be misinterpreted as a gun, such as a realistic-looking toy gun). I use the phrase "shot dead" to specifically refer to the fact that the victim was killed by a firearm. The Guardian data lists all persons "killed" by police or in police custody by any means. The Ross data only lists police "shootings", but includes victims who were shot but did not die, and victims who were shot and died. The results are in Table 1 below.

In the top half of Table 1, from the column labelled "X/White:Firearm," the Guardian data shows that Blacks are 2.3 times more likely than Whites to be shot dead by police if the victim is carrying a firearm, and 4.1 times more likely if they are not carrying a firearm. In the bottom half of the table, the Ross data (PLOS ONE) shows that Blacks are 3.3 times more likely than Whites to be shot dead by police if the victim is carrying a firearm, and 4.8 times more likely if they are not carrying a firearm. Hispanics are also at somewhat greater risk in both sets of data. In the Guardian data, Asians are far less likely to be shot dead by police in any circumstance, while Native Americans are at far greater risk if they are carrying a firearm (the Ross data only covers Black, White & Hispanic).

Both sets of data show that, in absolute numbers, Whites are shot dead by police more frequently than Blacks, and Blacks are shot dead by police more frequently than Hispanics. This holds true whether or not the victim had a firearm, although the Ross data shows that from 2011-2014, the same number of unarmed Blacks and Whites were shot dead by police. Columns 5 & 6 show the rates at which Whites, Blacks & Hispanics are shot dead by police per million of their race/ethnic group. So Blacks with firearms are shot dead by police at a rate of 5.04 per million Blacks, and Blacks without firearms are shot dead by police at a rate of 1.4 per million Blacks. The final two columns show the Black and Hispanic rates of death by police shooting relative to the White rate.
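The per-million rates and the "relative to White" ratios described above come down to two small calculations. Here is a minimal sketch in Python; the function names and the input counts are hypothetical placeholders for illustration, not the values from Table 1 or Table 2.

```python
def rate_per_million(deaths, population):
    """Deaths per one million members of a group."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round inputs give exact results.
    return deaths * 1_000_000 / population


def rate_ratio(group_rate, reference_rate):
    """How many times higher (or lower) a group's rate is than the reference rate."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return group_rate / reference_rate


# Hypothetical inputs, NOT taken from the tables in this post.
group_rate = rate_per_million(deaths=30, population=6_000_000)
reference_rate = rate_per_million(deaths=40, population=20_000_000)
print(group_rate, reference_rate, rate_ratio(group_rate, reference_rate))
```

The point of the sketch is that a group can have fewer deaths in absolute terms (30 vs. 40 here) and still have a higher rate, because the rate divides by the size of the group.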

Both of these data sets fail to support the findings published by Fryer at NBER. His study covers only 10 specific communities, and his core analysis focuses only on Houston. He also asks very specific questions that differ from the simple "rates at which Whites vs Blacks vs Hispanics are shot dead by police." The New York Times discussion of his results is here. Criticisms of the study can be found at Vox, by Feldman, and by Simonsohn.

Table 2 shows the population values I used to calculate the rates per million. This data was retrieved July 15, 2016, using Census FactFinder. One of the difficulties of these types of race-ethnicity analyses is that while the Guardian and Ross use three categories (Black, White & Hispanic), the Census treats race (Black, White, etc.) and Hispanic ethnicity as separate categories. This means that there are actually four Census categories behind what the Guardian and Ross list: Black Hispanic, Black non-Hispanic, White Hispanic and White non-Hispanic. The Ross data does provide a way to separate these out; however, it does not clarify how race & ethnicity were determined. In this case, I calculated White using non-Hispanic White, Black using non-Hispanic Black, and Hispanic as the sum of all categories noting Hispanic ethnicity (White Hispanic, Black Hispanic, Asian Hispanic, etc.).
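The mapping from Census race-by-ethnicity cells to the three groups can be sketched as follows. This is only an illustration of the rule described above: the function name, the data layout, and the population figures are all hypothetical, and the sketch covers only the three groups shared by both data sets.

```python
def build_denominators(census_cells):
    """Collapse Census (race, is_hispanic) cells into White, Black, Hispanic.

    census_cells maps (race, is_hispanic) -> population count.
    White and Black use only their non-Hispanic cells; Hispanic sums
    every cell flagged Hispanic, regardless of race.
    """
    totals = {"White": 0, "Black": 0, "Hispanic": 0}
    for (race, is_hispanic), population in census_cells.items():
        if is_hispanic:
            totals["Hispanic"] += population
        elif race in ("White", "Black"):
            totals[race] += population
        # Other non-Hispanic races are outside this three-way scheme.
    return totals


# Hypothetical populations, NOT the Census values used in Table 2.
example_cells = {
    ("White", False): 1000,
    ("White", True): 200,
    ("Black", False): 300,
    ("Black", True): 20,
    ("Asian", False): 150,
    ("Asian", True): 5,
}
print(build_denominators(example_cells))
```

Counting each person in exactly one group this way keeps the three denominators from overlapping.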

Friday, July 1, 2016

Predicting the Presidential Election from Polling

The pundits are currently saying that national-level polling about the presidential election is unreliable this far out--the election is just over 4 months away. However, state-level polling can be useful. For the first part of my analysis below, I looked at the polling from the 2012 election between Obama and Romney up to July 15 to see how well their results conformed to the actual election for that state.

For my methodology, I decided to compare three polls per state. The polls had to be conducted between May and July 15, they had to be of "likely voters," and the sample size had to be above 500. I used data from the site Real Clear Politics, which has polling data going back several elections for each state. In some cases, like Virginia, there were many polls conducted in that time frame, and more than 3 that polled only "likely voters." In that case, I used the 3 polls closest to, but before, July 15. The one exception to these criteria is that I always included PPP's results as one of the 3 polls. According to a study by a Fordham political science professor, PPP had the best polling results for that election.

For the 2012 election, there were only 15 states where the difference between Obama and Romney was less than 10%--I considered only those 15 states in this analysis. As can be seen from the table, in 2012, Obama won 11 of those 15 states, and Romney won only 4. The 3rd column, labelled "PPP Poll: Date," gives the date of the listed PPP poll, followed by its results for Obama, Romney, and the difference between them. If Obama polled higher, the "Obama-Romney Difference" column is blue. If Romney polled higher, it's red. To the right of that is the second poll used, with the date the poll was completed and its results. In the furthest right section is the 3rd poll and its results. Blanks mean there were not enough polls that met my criteria, so some states (Nevada and Minnesota) have only the PPP poll. Two other states (Missouri and Georgia) have only 2 polls.

Using this methodology, looking at the 3 polls for each of these 15 states, if at least two polls agreed on a winner, they did in fact correctly predict the winner for that state, even as early as the May/June/July polling. For these 15 states, the pre-July 15 polling by PPP only got 2 of these states wrong: Missouri and North Carolina, and both were within the margin of error, so were statistical ties (for simplicity of presentation, I did not include margin of error in the table). Using this as a guide, I propose that this methodology is reasonably useful to predict the 2016 election.
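The "at least two polls agree" rule can be written down as a small function. This is a sketch under my own assumptions: the function name, the use of signed margins (first candidate minus second), and the example margins are all hypothetical and are not taken from the table.

```python
def predict_winner(margins, first="Obama", second="Romney", min_agree=2):
    """Call a state for a candidate only if enough polls agree.

    margins holds one number per poll: the first candidate's share minus
    the second candidate's share. A margin of 0 counts for neither.
    Returns the predicted winner, or None when the polls do not give
    either candidate at least min_agree leads.
    """
    first_leads = sum(1 for m in margins if m > 0)
    second_leads = sum(1 for m in margins if m < 0)
    if first_leads >= min_agree and first_leads > second_leads:
        return first
    if second_leads >= min_agree and second_leads > first_leads:
        return second
    return None


# Hypothetical margins, NOT values from the polling tables.
print(predict_winner([3, -1, 2]))   # two of three polls favor the first candidate
print(predict_winner([-4, -2]))     # both available polls favor the second
print(predict_winner([5]))          # a single poll is not enough to call the state
```

Returning None for states with a single poll matches how states with only the PPP poll are left uncalled under this rule.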

I followed the same methodology described above to collect the 2016 polling data available so far from Real Clear Politics. There aren't nearly as many state-level polls this year as in 2012. This could partly be because my 2012 method allowed polls up through July 15, and many of the above polls were, in fact, from early July--I am currently writing this on July 5th, which may explain the relative lack of polls. The table below shows the results.

In order to win the presidency, the candidate must reach 270 electoral votes. If we assume that Clinton will win all of the 15 states (and DC) that Obama won by more than 10%, that gives her 191 electoral votes. If we assume that Trump will win all of the 20 states that Romney won by more than 10%, that gives him 154 electoral votes. If we then look at the 2016 polling data, and give any state where at least 2 polls agree on a candidate to that candidate as a win, then so far one state is going to Trump (GA), and 4 states are going for Clinton (OH, NH, IA, and WI). That puts Clinton at 229 electoral votes, 41 short of what she needs, and Trump at 170, 100 from what he needs.

Let's assume that MO & AZ go to Trump (Obama lost those by more than 9% in 2012), and MI & MN go to Clinton (Obama won those by more than 7.5%). Clinton is then at 255, while Trump is at 191. In this scenario, the only real "battleground states" left are NC, FL, VA, NV, CO & PA. If Clinton wins either PA or FL, and Trump wins all of the other states, then Clinton still wins the election. Or if Clinton wins VA+NV or VA+CO, then Clinton wins the election. As of July 5, 538 (Nate Silver's site) is predicting that Clinton will win every single one of those states (MI, MN, NV, PA, CO, VA, FL & NC), with Trump winning only MO & AZ.
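The electoral arithmetic in the last two paragraphs can be checked with a short script. The starting totals (191 and 154) and the state assignments come from the text above; the per-state electoral vote counts are the 2012-2020 apportionment, which I am supplying here as an assumption since the post does not list them. The names in the code are illustrative only.

```python
# Electoral votes per state under the 2012-2020 apportionment
# (supplied for this sketch; not listed in the post itself).
ELECTORAL_VOTES = {
    "OH": 18, "NH": 4, "IA": 6, "WI": 10, "GA": 16,
    "MO": 10, "AZ": 11, "MI": 16, "MN": 10,
    "NC": 15, "FL": 29, "VA": 13, "NV": 6, "CO": 9, "PA": 20,
}
NEEDED = 270
CLINTON_BASE = 191   # states (and DC) Obama won by more than 10%
TRUMP_BASE = 154     # states Romney won by more than 10%

CLINTON_POLLED = ["OH", "NH", "IA", "WI"]   # at least 2 polls agree
TRUMP_POLLED = ["GA"]
CLINTON_ASSUMED = ["MI", "MN"]
TRUMP_ASSUMED = ["MO", "AZ"]


def total(base, states):
    """Base electoral votes plus the votes of the listed states."""
    return base + sum(ELECTORAL_VOTES[s] for s in states)


def clinton_total(extra_states=()):
    """Clinton's total if she also wins the given battleground states."""
    return total(CLINTON_BASE, CLINTON_POLLED + CLINTON_ASSUMED + list(extra_states))


print(total(CLINTON_BASE, CLINTON_POLLED), total(TRUMP_BASE, TRUMP_POLLED))
print(clinton_total(), total(TRUMP_BASE, TRUMP_POLLED + TRUMP_ASSUMED))
for scenario in (["PA"], ["FL"], ["VA", "NV"], ["VA", "CO"]):
    print(scenario, clinton_total(scenario), clinton_total(scenario) >= NEEDED)
```

Keeping the polled, assumed, and battleground states in separate lists makes it easy to move a state from one bucket to another as new polls come in.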

What is surprising for me is that in an earlier analysis I showed that since WWII, the US likes to switch its presidential parties every 8 years, with the only exceptions being the Reagan-Bush long GOP tenure, and the short Carter Democratic tenure. I also noted that those years had unique economic situations--unusually high/low GDP growth and unemployment rates--that helped to explain these departures from typical election patterns. In our present case, pundits are telling us that dramatic demographic shifts are giving Democrats an advantage this year. But regardless, as we have seen with the success of the Trump candidacy, it is dangerous to predict anything political this year.


Correction: In the first version of this post, I had the incorrect values for the Georgia polls, showing that Clinton was predicted to win there. This has been fixed, and the relevant analysis corrected.