Wednesday, October 23, 2019

Interpreting Election Polling: Pay Attention to Margins of Error

I have to confess--I had absolutely no doubt Clinton would win the 2016 presidential election. Despite months of working with datasets of state-level polling, despite teaching statistics and sociological research methods, and despite closely following several sites that monitor state-level polls, I had no doubts. Zero doubts. In hindsight, however, it is clear I was not wearing my statistician's hat when I made that prediction--I was relying on intuition and emotion.

Probability gave Clinton the odds to win (according to 538, their final estimate was 71-29 in Clinton's favor, one of the most conservative estimates among news and polling sites). However, as any gambler knows, odds are just that--odds, chances, possibilities. If you flip a coin that's weighted to come up heads 71% of the time, it will still come up tails 29% of the time. A 71% chance of winning is reasonably good odds, but clearly not a sure thing. And in the case of election polling, those odds can fluctuate from week to week based on current events and public mood. In fact, as of the end of September, 538 gave Clinton only 56-44 odds of winning.
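
To see what a 71-29 probability means in practice, here is a minimal R sketch simulating repeated "elections" with those odds (only the 71/29 split comes from 538's final forecast; everything else is illustrative):

    # Simulate 10,000 elections where Clinton has a 71% chance of winning
    set.seed(2016)
    outcomes <- sample(c("Clinton", "Trump"), size = 10000,
                       replace = TRUE, prob = c(0.71, 0.29))
    prop.table(table(outcomes))
    # Roughly 29% of the simulated elections go to Trump--an unlikely
    # outcome on any single draw, but far from an impossible one.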

Ironically, in my classes, as the campaign season heated up in late 2015 and early 2016, I warned students about the perils of ignoring the tedious details of polling results. I specifically showed the students a bar graph of a respected poll comparing Trump vs. Clinton--it looked like Clinton was far ahead... until you added in the confidence intervals. In the first graph below, from a Michigan poll, the blue bar is the Clinton estimate and the red is Trump. The pollster is Public Policy Polling (538 gives them a B+ rating). In this poll, Clinton appears to have a 5-point lead over Trump. However, the margin of error for this poll (once you read the fine print) is 3.2%. That means the Trump estimate, shown at 41%, could actually be 3.2 points above or below that, a range of 37.8-44.2% (with 95% confidence). Similarly, Clinton, shown at 46%, could be anywhere from 42.8-49.2%. Notice on the graph that the lower bound for Clinton overlaps with Trump's upper bound. This is what we call overlapping confidence intervals. In practice, it means the candidates are in a statistical tie. So while I was teaching this to my students, I ignored my own advice. Clearly a case of "do as I say, not as I do."
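
A rough sketch of the margin-of-error arithmetic in R, using the numbers from the Michigan poll above (the 3.2% margin of error corresponds to a 95% confidence level):

    clinton <- 46; trump <- 41; moe <- 3.2
    c(trump - moe, trump + moe)       # 37.8 to 44.2
    c(clinton - moe, clinton + moe)   # 42.8 to 49.2
    # The intervals overlap: Clinton's lower bound (42.8) falls below
    # Trump's upper bound (44.2), so the race is a statistical tie.
    (clinton - moe) < (trump + moe)   # TRUE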

Bring this idea to the national level. Many (especially Democrats, and clearly Clinton) believed Michigan, Wisconsin, and Pennsylvania were a so-called "blue wall" that would reliably vote for a Democrat for president. Clearly this was not the case. Their 46 electoral votes would have swung the election in Clinton's favor. Instead, they all went for Trump, though barely--none by a margin of more than 1%, and across all three states combined, fewer than 78,000 votes swung the election to Trump. The graph below represents polling from just these three states (MI, WI, PA), limited to pollsters that 538 rates B or higher, conducted in November, where the margin of error was reported. I include the raw data and graphed the results with error bars. Contrary to conspiracy theories proposing interference with the election results on the grounds that the results were far from the polling estimates, the results actually fell within the polls' margins of error for Clinton and Trump. The only exception is one Wisconsin poll by PPP--while those error bars do not overlap, they come very close, which should make any skeptical poll watcher cautious. What does this tell us about 2020? When interpreting election polls, never ignore the confidence intervals (and don't let your gut feelings pull you away from the data).
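
For readers who want to reproduce this kind of graph, here is a hedged ggplot2 sketch; the data frame holds made-up placeholder values, so substitute the actual estimates and margins of error from the polls:

    library(ggplot2)
    polls <- data.frame(
      state     = rep(c("MI", "WI", "PA"), each = 2),
      candidate = rep(c("Clinton", "Trump"), times = 3),
      estimate  = c(46, 41, 46, 40, 46, 44),     # placeholder values
      moe       = c(3.2, 3.2, 3.5, 3.5, 3.0, 3.0)
    )
    ggplot(polls, aes(x = state, y = estimate, fill = candidate)) +
      geom_col(position = position_dodge(width = 0.9)) +
      geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe),
                    position = position_dodge(width = 0.9), width = 0.2)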

Tuesday, July 30, 2019

Mapping in R & Jupyter Notebook (Python)

For a decade I have been doing all my mapping in Quantum GIS. However, I recently tried to do some spatial regression, and could not figure out how to make QGIS do it. This forced me to try other options, and I discovered that there are great packages available for mapping in both Jupyter Notebook (using Python), and R, with the latter having several packages for doing spatial regression (I'm sure such packages exist in Python as well, but since I've been using R for 15 years, that's where I'm more comfortable).

A long-running project of mine is mapping police violence--specifically, the number of people killed by police in the United States every year. Importing data from Mapping Police Violence and the U.S. Census, I generated maps in both Jupyter and R. The first set of four maps is a county-level depiction of the rate of killings: the number of people killed by police in each county relative to that county's estimated 2015 population. The Mapping Police Violence data covers all of 2013-2018. These maps were generated in Jupyter, which automatically created the nice legend on the side. The first map shows killings of all race/ethnic groups combined, while the next three separate the killings by Black, Hispanic, and White victims.

This next map is a screenshot of a map created in RStudio, using the packages rgdal and leaflet. What I like about this is that it not only maps the county-level data as before, but also lets you overlay any available base map underneath it--in this case OpenStreetMap, the leaflet default. This allows the user to zoom in and out and scroll all around the country.
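
A minimal sketch of that rgdal + leaflet workflow, assuming a county shapefile with a per-capita rate column (the file and column names here are placeholders):

    library(rgdal)
    library(leaflet)
    counties <- readOGR("us_counties.shp")          # county polygons
    pal <- colorNumeric("YlOrRd", domain = counties$kill_rate)
    leaflet(counties) %>%
      addTiles() %>%                                # OpenStreetMap, the default
      addPolygons(fillColor = ~pal(kill_rate), weight = 0.5,
                  color = "grey", fillOpacity = 0.7) %>%
      addLegend(pal = pal, values = ~kill_rate, title = "Killings per capita")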

The next part of this project was to do some spatial regression. Several packages were available, and I chose spgwr because it specifically included geographically weighted regression and a way to map the results with the sp package. The first issue was the regression itself. Since just over half of US counties have no recorded killings by police from 2013-2018 (setting aside the shocking fact that just under half of all US counties DO have recorded killings by police in this six-year period), there are a lot of zeroes in this data. This means the usual Gaussian, OLS approach will not work, since no transformation can make the data look normally distributed. Several distributions allow modelling count data--for example, Poisson, quasi-Poisson, and negative binomial--with the latter being the best fit for my data. Another reason I used the spgwr package is that it allows geographically weighted modelling with a negative binomial approach.
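
A minimal sketch of why the choice of distribution matters, using made-up data with the same flavor (a count outcome full of zeroes); all variable names are placeholders:

    library(MASS)                        # glm.nb for negative binomial
    set.seed(42)
    d <- data.frame(poverty = runif(300, 5, 40))
    d$killed <- rnbinom(300, size = 0.8, mu = exp(-1 + 0.04 * d$poverty))
    mean(d$killed == 0)                  # a large share of zero counts
    m_pois <- glm(killed ~ poverty, data = d, family = poisson)
    m_nb   <- glm.nb(killed ~ poverty, data = d)
    deviance(m_pois) / df.residual(m_pois)  # >> 1 signals overdispersion
    AIC(m_pois, m_nb)                    # negative binomial fits better here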

One final issue is the statistical question of whether p-values have any meaning for geographically weighted variables, since spatial autocorrelation is a serious problem--i.e., effects are unlikely to be discretely localized inside county borders, but are likely to be spread over many counties, if not across most of a given state and across state lines as well. More discussion can be found from the author of the spgwr package (Roger Bivand, Norwegian School of Economics) here, and from the creators of ArcGIS here--both removed p-values from their software after initially including them for negative binomial approaches to geographically weighted regression.

On the other hand, they still include p-values for OLS models. Because of this, and for comparison, I generated spatial regression models for both OLS and negative binomial, along with maps, plus p-values for OLS (recognizing they couldn't be trusted--I just wanted to see what they looked like). Out of the roughly 20 demographic variables I originally included (many of which have been shown to be predictive in previous research), the best model, in which all predictors remained statistically significant at p<0.05, included seven county-level variables (generally from the 2015 American Community Survey estimates): divorce rate, education (percent of those 25+ lacking a high school diploma), household crowding (occupancy with more than one family), poverty rate, inequality (GINI), segregation (measured by the White-NonWhite Exposure Index, the likelihood of these two groups regularly interacting in their communities), and percent of Trump voters in 2016. Here are summaries of the OLS (left) and negative binomial models (right).
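
Here is a hedged sketch of how the two geographically weighted fits might look with spgwr. The data object and variable names are placeholders for the merged county data, and passing a negative binomial family (with theta taken from a global MASS::glm.nb fit) into ggwr() is my assumption about the approach, not a documented recipe:

    library(spgwr)
    library(MASS)
    f <- killed ~ divorce + no_hs + crowding + poverty + gini +
      exposure + trump_pct
    # Gaussian (OLS) GWR:
    bw_ols <- gwr.sel(f, data = counties_spdf)
    m_ols  <- gwr(f, data = counties_spdf, bandwidth = bw_ols)
    # Negative binomial GWR, with theta estimated from a global fit:
    theta <- glm.nb(f, data = counties_spdf@data)$theta
    fam   <- negative.binomial(theta)
    bw_nb <- ggwr.sel(f, data = counties_spdf, family = fam)
    m_nb  <- ggwr(f, data = counties_spdf, bandwidth = bw_nb, family = fam)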

While all variables are significant in both, what is interesting is the sign flip for Trump voters and poverty between the models. The OLS model indicates a negative relationship for poverty and Trump voters as predictors of killings by police--in other words, the fewer Trump voters (and the lower the poverty rate), the more police killings, and vice versa. However, given that the OLS model is unreliable because the data do not meet regression assumptions (normally distributed errors), the negative binomial results matter far more: they indicate that higher poverty rates and a higher percentage of Trump voters in a given county predict more killings by police.

Finally, I mapped these results. Since there is no good way to visualize a regression model with seven predictors on a single map, I mapped each predictor separately; here I show only the Trump-voter coefficients. First is the OLS map, which, as mentioned above, indicates that fewer Trump voters predict more killings by police across most of the country--the pink, purple, and blue areas on the map. Only in the orange and yellow areas does a higher rate of Trump voters predict police killings. After that is the map of p-values for the OLS model. Since the software doesn't generate p-values for the negative binomial model, those can't be shown--I show the OLS p-values just to illustrate what the package can do if your data meet OLS regression assumptions. If one could trust the OLS model, the Trump-voter coefficients would be statistically significant only in the green and blue areas (or the yellow areas as well, if you don't mind using p < 0.1). Finally, the last map shows the negative binomial results, indicating that everywhere in the country the relationship between 2016 Trump voters and killings by police in a given county is positive, though the areas in blue have weaker predictive value and the areas in yellow the strongest. Since this model had seven predictors, each is stronger or weaker in different parts of the country.
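
Continuing the hedged sketch above: the fitted gwr/ggwr object stores the local coefficients in its $SDF slot, so mapping one predictor's surface is a one-liner (the column name is an assumption):

    library(sp)
    spplot(m_nb$SDF, "trump_pct",
           main = "Local coefficient: percent Trump voters, 2016")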

Thursday, August 30, 2018

Windows 10, Shared Network Folders Fail

I have a desktop computer I use at home and a work laptop I take back and forth with me. I mostly work from home, transferring relevant files, like lectures and exams, to the work laptop--this requires properly working shared folders. This should be EASY. However, every time Windows rolls out a new update this seems to break, as evidenced by the number of sites and the length of pages discussing this exact issue, spiking every few months with each new update. The situation was apparently so bad that Microsoft discontinued the HomeGroup functionality, believing that would solve the problem.

If you are here, it's likely your shared network folders aren't working. Last week I had to reinstall Windows 10 on my work laptop, and of course I lost my network folders. I spent four whole days Googling and trying various ways to bring back this functionality. While I can't claim to know what the problem is with your folders, this page finally fixed mine--on both computers, go into the Group Policy Editor (gpedit.msc, assuming your version of Windows has this function) and "Enable insecure guest logons."

Guest access in SMB2 disabled by default in Windows 10 Fall Creators Update and Windows Server 2016 version 1709

The only reason I was able to find this solution is that (again, after four days) my computer gave me a new error message about unauthenticated guest access. I hope your solution search is quicker than mine. Good luck!

Wednesday, June 6, 2018

The Bindweed Horror

My house had been abandoned for 10 years when I bought it, so the yard was simply a jungle of monstrous vegetation. After several weeks of cutting and pulling the overgrowth, I thought I had it under control. I wanted a sustainable, natural yard, not a manicured lawn, so after that I largely let it do what it wanted, with just the occasional trimming. I was pleasantly surprised when a vine produced pretty white flowers in the morning, which I (incorrectly) believed was morning glory. For years I allowed it to flourish, tearing it up when it started to take over. This year I realized the plant was not morning glory--it is bindweed. A quick Google search could have settled this early on--bindweed produces a white flower; morning glory comes in other colors. I committed myself to its eradication.

Several websites offer suggestions for removing bindweed. The most extreme suggestion was simply to move away. Its vigorous growth and nefarious root system make it nearly impossible to eradicate. The University of California Extension says, "bindweed is one of the most persistent and difficult-to-control weeds in landscapes and agricultural crops." They suggest cultivating (i.e., pulling) it every 2-3 weeks to prevent germination. Other sites similarly recommend frequent pulling, claiming this will eventually "exhaust the roots." I don't know if this is accurate, but it sounds good to me, so that's what I have been doing for the last month.

The bindweed is mostly confined to the backyard--specifically, the north half, which is far sunnier and separated from the south half by a sidewalk. I have spent about 20 minutes each morning, and another 20 each night, scouring the yard for the telltale diamond-shaped leaves and/or viny appearance. Almost every site I found simply said to pull the weed, with no mention of digging it out. However, after two weeks of picking about 75 bindweed plants per day, I dug one up and discovered the size of the root/rhizome--it's huge, as shown above. The UC site claims a plant can spread 10 feet from its root system. So I switched from simple pulling to digging out as much of each plant as I can. Once you know what to look for, they are very easy to spot in a hunk of upturned dirt--they are thick, white, and make weird 90-180 degree turns, which can make them difficult to pull out of the soil without breaking off.

So far I haven't seen a significant decline in the number of bindweed found per day, but I can only assume the digging will eventually start working. The graph below (May 12-June 5; the update from June 26 is below) shows the number of bindweed plants I picked or dug up per day. The decline in the last few days could be due to a number of factors--perhaps my hard work is paying off, perhaps I didn't spot as many as I should have, or the cooler nights might be inhibiting growth (the last several nights have been in the high 50s, compared to nights in the upper 60s and low 70s for the previous few weeks).


Update: June 27, 2018 (3 weeks later)

For the first two weeks of this project, May 12-24, I was only picking the bindweed from the ground up--i.e., I was not digging up the root system. Starting May 25, seeing no clear decline in the counts, I began digging up the easy roots. As the count still failed to decline significantly, I have become more aggressive in my digging. The picture above shows the perpendicular rhizome--the vertical component snaps off fairly easily, leaving the horizontal rhizome intact. While one website claims that picking the stems will "exhaust the roots," they provided no evidence for this, nor, if it is true, how long it might take. The picture below shows how large these systems can become--it shows what was under a stone block. Bindweed kept coming up along the sides, and when I finally moved the block, this was underneath. The graph below that shows the count to date, along with a trendline. As the trendline shows, the daily count is decreasing, but very slowly.


Update: July 26, 2018 (7 weeks from the original post)

July in Indianapolis this year was very hot and dry. The various cover plants in my yard started to shrivel, leaving larger patches of bare dirt. My twice-daily bindweed digs have decreased significantly, from the 50s at the beginning of July to the 20s by the end. However, I have no way of proving whether it is the digging, simply the dry heat, or just a natural decrease for this time of year. I have caught each plant before it reached 6", mostly before 3", and no plants have flowered. I think I have gotten more proficient at digging out the white, fleshy rhizomes, although sometimes they are too deep to find efficiently, especially in the grassy areas. One artifact in the data: I realized I have been digging up two types of bindweed--field and hedge. The former has lighter-colored and smaller leaves.


Update: Sept 4, 2018 (6 weeks from the last post, 15 weeks since the original post)

While the end of July brought daily dig totals down to the 20s, increased rain and cooler temperatures bumped them back up into the mid-30s for two weeks, before they finally fell into the mid-teens by the end of August. The trajectory seems to continue a steady decline, as seen in the most recent graph. Using a regression model fitted since the beginning of the count (May 19; the regression equation is on the chart), I should already have dropped to 10 plants per day by Sept 2, while in actuality I remain in the mid-to-upper teens as of Sept 4. However, the decline seems to have leveled off around mid-July, so a revised equation, beginning from July 20, predicts that I should average 10 plants/day by Sept 15 and 5 plants/day by Oct 5. By that late in the year, it would be difficult to differentiate the decline from digging the plants up versus the declining temperatures and shorter days. Finally, it seems worth noting, as mentioned in my previous update, that I have two types of bindweed, field and hedge, and while initially I was almost exclusively picking field bindweed, now it's about half and half.
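
For anyone curious about the trendline arithmetic, here is a minimal R sketch with made-up daily counts standing in for my real tallies:

    set.seed(1)
    days  <- 1:100                      # days since May 19
    count <- pmax(0, round(60 - 0.5 * days + rnorm(100, 0, 8)))
    fit   <- lm(count ~ days)
    coef(fit)                           # intercept and daily decline
    (10 - coef(fit)[1]) / coef(fit)[2]  # predicted day the count hits 10/day
    plot(days, count); abline(fit)      # the graph with its trendline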


Update: Oct 25, 2018 (7 weeks from the last post, 5 months since the original post)

This is the final report, at least for 2018. Over the last 10 days I have found only 3 bindweed plants. I would claim success, but my guess is that the significantly cooler temperatures and shorter days are the cause. I can only hope that by May of next year I will have far fewer bindweed than I had this year. At least one internet source claimed that the rhizomes can last for 3 years which, if correct, means I have another 2 years of digging up bindweed. However, about halfway through this year's effort, I began a concerted push to dig all the way down to the rhizome. They are tricky to get, since they grow horizontally, and the plants snap off the rhizome with too much tugging, including the movement of digging down to reach the rhizome, many of which were more than 6" below the surface. Here is the final graph for this year.

Friday, April 6, 2018

Slavery in Mississippi, Native Land Act in South Africa

In the Africa class I teach, a student recently asked me about a map I showed of demographics in South Africa--at the time I did not grasp the question being asked, and it took several hours afterward for me to realize it was a great question that needed to be explored further. At issue is the Natives Land Act of 1913--it legally mandated that Black South Africans could only live in 7% of the country, despite the fact that Whites were only 22% of the population. This law was not repealed until 1991, at which point Whites were only 11% of the population. The first map shown here is the country of South Africa and the areas where Blacks were legally allowed to live. They were not allowed in any other areas of the country unless they had work permits, and this racial movement through the country was strictly policed.

A second map shows the racial population distribution as of the 1970 South African Census. In hindsight, the student's confusion was obvious--if non-White South Africans were only allowed to live in certain locations in the eastern part of South Africa, why were there such high concentrations of non-Whites all throughout South Africa?

The simplest response, but the least informative, is that there were traditional nomadic groups, primarily in the west, that were allowed to continue their ways of life. They did not own property and interacted very little with other segments of the population. They largely resided on lands that were otherwise not very useful to White South Africans, such as the Kalahari Desert, as shown in the third (land use) and fourth (vegetation) maps.

However, arguably a far better answer, in the context of the Apartheid regime instituted in 1948 by the National Party, is that high populations of non-Whites were allowed on White lands because of their usefulness as cheap labor. At first glance, it might seem impossible for the National Party to maintain its tight grip on the country and impose its brutal, racist ideology with such a high population of non-Whites across such large territories of South Africa. However, this same pattern was seen in the United States under slavery.

To demonstrate this I constructed a county-level map of Mississippi from 1860, using a GIS shapefile generously sent to me by the state GIS office (at the time of writing, their automated website was not working--I notified them of the problem, so it may be fixed by the time this analysis is posted). Merging this file with 1860 Census county-level data produced the following map of Mississippi. As can be seen, a number of counties were over 75% enslaved--the numbers shown in each county represent the percentage of that county's total population who were slaves in 1860. In fact, 55% of the total population of Mississippi in 1860 were slaves. With this example in mind, it should be easier to understand how such a large section of South Africa in 1970 could be non-White, when only Whites were allowed to own property in, or even legally be present in (without a work permit), 90% of the country.
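
For those who want to reproduce this kind of map, here is a hedged sketch of the merge-and-map step in R (my original map was made in QGIS; the file and column names below are placeholders):

    library(rgdal)                    # also loads sp
    ms1860 <- readOGR("ms_counties_1860.shp")
    census <- read.csv("ms_census_1860.csv")   # county, total_pop, slave_pop
    ms1860 <- merge(ms1860, census, by = "county")
    ms1860$pct_slave <- 100 * ms1860$slave_pop / ms1860$total_pop
    spplot(ms1860, "pct_slave")       # choropleth of percent enslaved, 1860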

Monday, February 12, 2018

Two-Week Weather Prediction Accuracy

Last month I posted an analysis of the accuracy of several weather forecasting services, focusing just on temperature predictions. By Day 7 of their predictions, the various forecasters were 8-13 degrees (F) off from the actual temperatures for that day. The goal of this new analysis is to test predictions beyond that--days 7-16, their "long-range" forecasts. Most sites do not offer this service; for this test, I used Accuweather and The Weather Channel. Additionally, I compared those predictions to the historical averages provided by the National Weather Service. As can be seen in the charts, the long-range forecasts ranged from 4-12 degrees (F) off, and simply going by the historical averages was about as accurate as the long-range predictions.

Data & Methods
For this analysis, I collected high and low temperature predictions from Accuweather and The Weather Channel from Jan 18-27 (2018), and historical temperature averages from the National Weather Service (NWS). Temperatures were typically collected at the same time of day, around 10am. I missed one day, Jan 19, so this analysis represents 9 days of forecast data, followed by another two weeks (until Feb 11) of recording the actual high and low temperatures as documented by the NWS.

For each of the 9 days of collecting forecast data, I recorded the predicted high and low temperatures the two services provided for days 7-16 into the future. Interestingly, starting with the 16th day of predictions, The Weather Channel no longer attempted to predict the temperature, simply reverting its forecast to the historical average. Accuweather continued to provide a unique forecast for 3 months into the future.

Analysis & Results
The statistical analysis is based on taking the average (absolute value of the) difference between the forecasted temperature and the actual temperature for each of days 7-16 into the future. That difference is plotted in Graph 1. The average of the temperature deviations across all 10 days is plotted in Graph 2.
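
In R, the calculation amounts to a mean absolute error grouped by service and forecast horizon. A minimal sketch, with a tiny placeholder data frame standing in for the real spreadsheet:

    forecasts <- data.frame(            # one row per forecast issued
      service = rep(c("Accuweather", "TWC"), each = 4),
      horizon = rep(c(7, 7, 8, 8), times = 2),    # days ahead
      predicted_high = c(35, 38, 33, 40, 36, 37, 31, 42),
      actual_high    = c(31, 41, 30, 35, 31, 41, 30, 35)
    )
    forecasts$abs_err <- abs(forecasts$predicted_high - forecasts$actual_high)
    aggregate(abs_err ~ service + horizon, data = forecasts, FUN = mean)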

As can be seen in Graph 1, the deviations between the predicted and actual high and low temperatures range from 4-12 degrees (F) over the course of the 10 days of predictions. What is striking in the graph is that there is no trend in the data--in other words, the further into the future the forecast goes, the predictions do not get worse (or better). This implies that by Day 7, the accuracy of the temperature forecasts is as good (or bad) as it is going to get, and that accuracy is not very good. While there is significant variation in how far off the forecasts are over the 10 days of predictions--for example, Accuweather's low predictions range from 5 degrees off at Day 8 to 12 degrees off on Day 12--their low-temperature forecast did not continue to worsen; in fact, it got better. Similarly, on Day 7, The Weather Channel's low-temperature prediction was almost 10 degrees off, but decreased to about 7 degrees off after that, reaching a 6-degree deviation by Day 16. This lack of a trend implies a high degree of randomness in the models used to predict temperatures by Day 7 into the future.

As can be seen in Graph 2, the average deviation in the forecasted temperatures over the 10 days of predictions was about 7 degrees for both the lows and the highs. The average deviation from the historical averages is also about 7 degrees. This implies that with current technology and weather modelling, the long-range temperature forecast is no better than simply relying on the historical averages, at least for the low temperatures. The high-temperature predictions were slightly better--The Weather Channel's deviation was only 6 degrees, while the historical averages were 9 degrees off. However, my guess is that this is simply a data-collection anomaly--in other words, collecting data over a longer time period would likely erase these differences.

Conclusion
While this sample size is limited, it seems reasonable to conclude that the accuracy of long-range temperature forecasts (defined as 7-16 days into the future) is no better than simply relying on the historical temperature averages. While sites like Accuweather provide temperature forecasts as far as 3-months into the future, those forecasts likely are as useless as flipping a coin to guess the temperature, or reading your horoscope to predict how your day will go.

One question I did not pursue, is the impact of climate change on the accuracy of 'historical average' temperatures. For example, assuming that climate has been changing over the last century, what if, instead of using the entire historical average highs and lows as provided by the NWS (I do not know how far back they take their averages), we simply used the highs and lows for the last 10 years? Would that data be closer to the actual temperature than the current historical average?

Saturday, January 13, 2018

Temperature Prediction Accuracy for Five Weather Services

With the recent arctic temperatures here in Indiana, I wondered how accurate weather services are at predicting high and low temperatures. This study represents 12 days of data, collecting 7-day temperature predictions from 5 different weather prediction services: local NBC affiliate WTHR, Intellicast, weather.com, the National Weather Service (NOAA), and Accuweather. All of these predictions are available online, and I have no specific reason for choosing these services, except that over the years I have placed them in my browser bookmarks.

The data shown in the graph represent an average of the high and low temperature predictions for a given day, up to 7 days ahead, in Fahrenheit. Some services offer predictions 10 days out or more, but two (WTHR & NWS) offer only 7 days of predictions, so I chose that as my limiting factor. One exception is that WTHR and NWS do not offer the 7th-day low prediction (at least not when I collected the data, at about 9 AM each day), so the graph below represents a full 7-day prediction for the other 3 services, but only 6.5 days for WTHR and NWS. Further, while data was collected for a total of 12 days, there are not 12 days of data for each of the 7 days of predictions: there are 12 data points for day 1, 11 for day 2, and so on, until there are only 6 data points for day 7. The high and low temperature for each day comes from the NWS official record (click "Get Climate Data" for January 2018--a PDF of the month's summary so far will download). The temperature deviations on the graph represent the average difference between the predicted high and low temperatures and the actual high and low temperatures as recorded by the NWS.
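
The staggered counts come from how forecasts line up with outcomes: a forecast issued on day d for k days ahead is scored against the observation on day d + k, so longer horizons run past the end of the observation window. A small R sketch of that alignment (dates and temperatures are made up):

    issued  <- as.Date("2018-01-02") + 0:11           # 12 days of forecasts
    actuals <- data.frame(date = as.Date("2018-01-03") + 0:11,
                          high = round(rnorm(12, 25, 8)))  # placeholder temps
    grid <- expand.grid(issued = issued, k = 1:7)
    grid$target <- grid$issued + grid$k
    scored <- merge(grid, actuals, by.x = "target", by.y = "date")
    table(scored$k)   # 12 scored forecasts at day 1, tapering to 6 at day 7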

As can be seen, all services perform about the same from Day 1 to Day 5--about 3-6 F deviation per day, on average--although the local NBC affiliate, WTHR, performs slightly worse by days 4-5. By the 6th day, WTHR and the NWS have clearly started to diverge, and by day 7, all services range from 8-13 F off the actual temperature, with the local NBC affiliate being by far the least accurate and Intellicast and Weather.com the most. The 7th-day temperature deviance is even more striking for WTHR and NWS given that they alone do not provide a 7th-day low temperature prediction, yet the other services, which do, still perform better.

This was different from what I expected--I presumed that the local TV station and the NWS predictions would be the best, since they represent local experts on the ground making a prediction, whereas the other web-based services are, I presume, simply mass algorithms produced by a computer for each zip code. However, my assumptions about how the data is produced by each service may not be correct. Further, there may be differences in how each day's temperatures are delimited. For example, in the past WTHR used to cut off its daily low temperatures at midnight, but that has now changed, so that the low temperature prediction extends into the next morning. The lowest nightly temperature often does not arrive until about 6-7 AM. In a technical sense, this is the "next day's" low; however, intuitively, when we look at a low prediction, we are expecting the lowest temperature for a given "next night," not the "previous early morning." The prediction services do not specify their cut-off times for low predictions, though they seem to be relatively consistent. Still, this may affect how accurate my data collection shows each service to be. Further, there may be micro-geographic differences in where temperatures are collected that produce significantly different results--northern Indianapolis may be cooler than southern Indianapolis--and the services do not specify the locations from which they take their temperatures. Since the services do not all provide a "past observations" feature, I decided to use only the National Weather Service's past observations to compare all services. This clearly did not benefit the NWS predictions in this study, given that they performed 2nd-worst of the 5 services.

Saturday, November 18, 2017

My First Python Script: Mapping US Residents Killed by Law Enforcement

I have been mapping with QGIS (and ArcGIS) for almost 10 years. This project represents my first Python script--a map created by linking Python and QGIS with PyQGIS. For a separate project I have been mapping the number of U.S. residents killed by law enforcement. The data comes from the Mapping Police Violence project, which documents where police have killed residents, linking each victim's name, county, and other information to the original news story where the victim's story was told.

Along with the mapping in Python, I have also written an R script, pulling data from a repository and running a regression analysis on a total of about 30 demographic, economic, and political variables. The strongest yet most concise model is

Killed ← log(Population) - Employment + Poverty

Killed: Number of residents killed by police per county from 2013-Oct 2017
Population: The total number of people living in the county in 2015, US Census
Employment: Percent of the county employed from 2011-2015, American Community Survey, US Census
Poverty: Percent of the county living in poverty from 2011-2015, American Community Survey, US Census

According to a pseudo-R^2, these three independent variables explain about 77% of the variation in the data. The model and all IVs were significant at p<0.0001. Since the DV was a count variable, literally counting the number of victims per county, I used a negative binomial model. The results indicate a slightly elevated risk of being killed by police in counties with larger populations, a decreased risk where employment is high, and an increased risk where poverty is high.
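
A hedged sketch of that model in R; the simulated data frame stands in for the real county file, and the coefficient values are invented purely so the code runs:

    library(MASS)
    set.seed(1)
    counties <- data.frame(
      population = exp(rnorm(500, 11, 1)),
      employment = runif(500, 30, 70),
      poverty    = runif(500, 5, 40)
    )
    counties$killed <- rnbinom(500, size = 1,
      mu = exp(-9 + 0.9 * log(counties$population) -
               0.02 * counties$employment + 0.03 * counties$poverty))
    m <- glm.nb(killed ~ log(population) + employment + poverty,
                data = counties)
    summary(m)                         # coefficient table
    1 - m$deviance / m$null.deviance   # one common pseudo-R^2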

The data from Mapping Police Violence, while an amazing resource, needed extensive cleaning. I found 60 counties that did not match the states identified, or where no county was listed at all. I manually found the correct counties through Google searches. (Edit: I have been informed that all of these issues have been corrected on the original website.)

Saturday, October 14, 2017

Mapping Trump's Drop in Approval Rating by State

From January to September, Trump's approval rating has dropped nationally. More than that, it has dropped in every single state. Using data from this week's Morning Consult, which interviewed almost half a million registered voters, the state-level drop for Trump ranges from 11% (Louisiana) to 31% (Illinois). The first map below was generated by taking his approval rating minus his disapproval rating for January and for September, then subtracting the January net rating from September's.
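
The arithmetic, sketched in R for one hypothetical state:

    jan <- c(approve = 55, disapprove = 35)
    sep <- c(approve = 43, disapprove = 50)
    net_jan <- unname(jan["approve"] - jan["disapprove"])   # +20
    net_sep <- unname(sep["approve"] - sep["disapprove"])   # -7
    net_sep - net_jan                                       # a 27-point drop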

In contrast, the second map shows that there are many states where Trump's approval is still above water. Net approval ranges from Hawaii at the low (-33%) to Wyoming at the high (+26%). Twenty-seven states have Trump underwater, and 23 above. Both of these maps were created using Quantum GIS.

Thursday, June 15, 2017

Two-Parent "Nuclear" Family Households in US Slavery

In 2011, Rick Santorum and Michele Bachmann, along with many other GOP candidates, signed a "Marriage Vow" pledge that initially claimed that US slave children were largely raised in two-parent homes. In the article linked in this NPR/New Republic story, John McWhorter defends this claim. The claim recently reappeared in a student-faculty dispute, in which a sociology professor's contract was not renewed, arguably due to posting threatening personal statements about the student on social media.

The professor in question apparently used a test bank from a well-known marriage and family textbook written by a well-established sociology researcher, Andrew Cherlin. While I could not find an online copy of Cherlin's textbook to quote, he has a trade book that makes the following claim: "Most slave children, [Gutman] contends, grew up in two-parent families" (2006, p. 102). As can be found elsewhere--for example, on the blog of another marriage and family scholar, Philip Cohen--the test bank contained a question about the structure of US slave families, and the "correct" answer was "Most slave families were headed by two parents." The student objected to this answer, believing it to be untrue, and even created a presentation arguing that both the textbook and her professor were relying on outdated research (Cohen, above, quotes his own textbook, where he seems to affirm Cherlin's position, saying of slave families that "most children lived with both parents").

While I am not a marriage and family scholar, my reading of the recent peer-reviewed literature on this subject seems to support the student's position, contrary to Cherlin's claim in his textbook and the publisher's test bank. What he refers to above has been deemed the "Gutman Thesis," since this revisionist perspective on slave families comes from Herbert Gutman's research (1976), and was supported by subsequent research by other scholars in the 1970s and 80s (Genovese, Blassingame, Jones, and White, as referenced in Stevenson (1996)). However, by the 1990s, Gutman's ideas had begun to be challenged by a number of slavery scholars, and this more recent trend is not apparent in Cherlin's work, from what I can find.

An outline of this transition away from Gutman can be seen as early as 1995 in Stevenson's "Black Family Structure in Colonial and Antebellum Virginia: Amending the Revisionist Perspective." By this point, data had already appeared that contradicted Gutman, and a consensus had begun to form that the pre-Gutman picture of slave families was better supported--specifically, the position that since slave owners did not recognize the legality of slave families, and typically did not even believe that Blacks could form emotional family bonds, there was no incentive to protect slave families. Rather, there was every incentive to break up slave families when it was profitable to sell fathers to other slave owners, or as soon as children were considered old enough to be removed from their mothers' care, often around 10 years old.

Cracks started to form early around Gutman's framing of slave families. A 1980 dissertation by Crawford (Quantified Memory), examining narratives from interviews with ex-slaves, found that only 51% recalled being born into "two-parent, consolidated family households" (quoted in Malone, Sweet Chariot, 1992). Crawford found that on smaller plantations (15 or fewer slaves), single-parent homes were closer to 75% of families. (Malone goes on to cite Fogel, who apparently misrepresented Crawford's work, claiming far more two-parent families under slavery.)

Similarly, Kulikoff's 1986 Tobacco and Slaves makes the following claims based on his research:

  • 357-59 "Because spouses of African-born slaves were usually separated, African mothers reared their Afro-American children with little help from their husbands. ... First, planters kept women and their small children together but did not keep husbands and teenage children with their immediate family."
  • 369 "Nearly half of all the Afro-Americans owned by four large planters resided in households that included both parents and at least some of their children. More than half of the young children on all four plantations lived with both parents."
  • 371 "Only 18 percent of the blacks on small units in Prince George’s County in 1776 lived in two-parent households. … More than two-fifths of the youths ten to fourteen years of age lived away from parents and siblings. … Although slave fathers played a major role in rearing their children on large units, they were rarely present on smaller farms. … Children under ten years almost always lived with their mothers, and more than half on large plantations lived with both parents.. Between ten and fourteen years of age, large numbers of children left their parents’ homes."
  • 373 "The fact that about 54 percent of all slaves in single-slave households in Prince George’s in 1776 were between seven and fifteen years of age suggests that children of those ages were typically forced to leave home. Young blacks were most frequently forced from large plantations to smaller farms."
These statements are supported by multiple data tables. Two different tables support Crawford's claim above that large plantations had more stable two-parent families, hovering around 45%, while smaller plantations had far lower rates of two-parent households, only around 18%. Additionally, reiterating what was quoted above, the tables show that some children lived in households with no family members present, especially children 10-14 years of age, 15% of whom on large plantations lived in such households.

These findings are supported by Malone (1992) in Sweet Chariot. She explicitly contradicts Gutman et al.'s research, saying:

  • 254 "Revisionist literature of the 1970s and 1980s on the slave family projected the supposition now frequently reflected in college textbooks that most slaves in the United States lived in families consisting of both parents with their children. This idea developed in part from a misreading of slave family-household studied that analyzed only the composition of simple families rather than that of the entire community. If one looks solely at simple families instead of the households making up the entire social body, then a majority of those slaves in families did live in two-parent nuclear units under normal circumstances. But such an approach obscures the fact that at many points in their lives slaves were not part of a standard nuclear family but functioned as solitaires or as a member of other household types. It also fails to perceive the holistic nature of slave society"

Stevenson (1996), in Life in Black and White, continues the attack on the Gutman Thesis with more contemporary data:

  • 161 "Even when the physical basis for a nuclear family among slaves—the presence of a husband, wife, and their children—existed, as it did for a significant minority, this type of family did not function as it did for free people, whether blacks in precolonial Africa or whites in the American South. Slave family life, in particular, differed radically from those of local whites of every ethnicity or class. … Virginia law, for example, did not recognize, promote, or protect the nuclear slave family or slave patriarchy. In fact, the only legal guideline for slave families did much to undermine these concepts …. Since ‘husbands’ had no legal claim to their families, they could not legitimately command their economic resources or offer them protection from abuse or exploitation."
  • 208 "Despite scholarly speculation to the contrary, even the largest local slaveholdings often did not translate into monogamous couples or nuclear core families who resided together on a daily basis."
  • 212 Regarding Stevenson's research on various slave-owning families in Virginia in the late 1700s-early 1800s, including George Washington, "as many as 74 percent of those slave families with children did not have fathers present on a daily basis—46 percent of the slave mothers had abroad husbands, while 28 percent were ‘single’ or had no identifiable spouses. ... 71 percent of his slave mothers lived with their children, but had no husband present."

Building on this research, Burke (2010), in On Slavery's Border: Missouri's Small Slaveholding Households, 1815-1865, draws the following conclusion from her state sample:

  • 226 "In the five counties studied, only 28 percent of slaveholdings in 1850 and 27 percent in 1860 comprised just slave women or slave women and children alone. In fact, a full 47 percent of slaveholdings in 1850 and 50 percent of slaveholdings in 1860 had in residence both male and female adults. … a great number of abroad and single women resided in slaveholdings with slave men who were not their husbands. Many of the men living with abroad women may have been related in ways other than marriage; they were fathers, brothers, and sons."

Finally, a current textbook, Cole (2016), Race and Family: A Structural Approach, summarizes many of the researchers quoted above, saying,

  • 189-90 "[Kulikoff] in one account of interviews with former slaves, 82 percent mentioned their mothers' presence in their childhood, but only 42 percent recalled consistent contact with their fathers. Hence, the master, not the father, was frequently viewed as the provider for the family. ... 47 percent of families on large plantations were nuclear, as opposed to only 18 percent on small plantations. Crawford (1980) reported that single-parent families were 50 percent more prevalent on plantations with 15 or fewer slaves."

The most current research seems to contradict the claims of those who argue that slave families were largely composed of two-parent households. But beyond the specific data claims, there is also the question of why the original claim was made. While it may simply be a case of "following the data where it leads," an important principle of scientific research, the persistence of these claims--particularly their removal from their original research contexts--seems to derive from a desire to minimize the devastation of slavery, which in turn lends credibility to those in society who want to argue positions that facilitate racist beliefs and claims.

Monday, June 12, 2017

Chrome: Disabling Autoplay Video/Audio

Chrome, a Google product, used to have many user-friendly features, but has progressively removed them. One example is the removal of the ability to disable videos that automatically play when you open certain websites. It was a broad feature that turned off plug-ins until you manually clicked on them to make them play. Several other browsers still have this feature, and a perusal of the mass of complaints about this from an internet search shows that Chrome users are simply switching to those other browsers, such as Firefox (for Windows) and Safari (for Apple/Mac).

I HATE ads in general, particularly overlays and popups that prevent you from seeing a web page until you click on something to get rid of them, and any video/audio that plays automatically. As a former IT professional, I know that forcing users to click on a popup/overlay to get rid of it is a GREAT way to download malware to computers, which is why I continually had to remind the staff in the departments whose computers I tried to keep safe to NEVER EVER EVER click on any pop-ups, no matter how benign they seemed. I follow my own advice--if I can't "escape" out of a pop-up/overlay, I simply leave that site permanently. It's a horrible web design practice. I have 5 different ad/pop-up/overlay blockers installed as extensions to prevent these issues, and together they tend to work pretty well:

  • ScriptSafe
  • BehindTheOverlay
  • AdBlock
  • Adblock Plus
  • Ads Blocker for Facebook
But another pernicious issue is video/audio that plays automatically when you open a website. The volume is usually incredibly loud, and frequently scares both me and my dogs. It's also obnoxious because I'm usually listening to music or TV, and sometimes miss whatever is happening on my show or the news. After having to reinstall Windows from scratch due to a crash, I had to find and reinstall all of the programs that prevented autoplay--none of the internet searches were helpful. For example, most sites claimed "Disable HTML5 Autoplay" would stop those videos. It stops maybe half of them. Some sites, like CNN and USA Today, try multiple times in multiple ways to force you to watch videos--you can watch the website continually try to reload as it fights with all of the various extensions designed to prevent video, popups, and overlays.

Another approach is to disable Flash and JavaScript. This tends to work, combined with Disable HTML5 Autoplay, but sites like CNN subvert it by using JavaScript for everything--for example, when I completely turned off JavaScript, the main CNN page would not load anything at all. So I successfully turned off the autoplay videos, but I also couldn't use the site, since the text news and even the headlines wouldn't load either.

SOLUTION: Unfortunately, there was no silver-bullet extension that successfully turned off all autoplay videos, and no setting in the most recent version of Chrome that turns off autoplay. The only way I was able to solve this problem was to uninstall Chrome and find an older version, pre-55. I found Chrome 54 at SlimJet.com. I then did the following, all from Settings --> Advanced Settings --> Privacy --> Content Settings:

  1. Check "Block third-party cookies and site data"
  2. Javascript: "Do not allow any site to run JavaScript" (an option appears in the URL tab of each site that you can mark as exceptions to this rule--it's incredibly handy)
  3. Handlers: "Do not allow any site to handle protocols"
  4. Plugins: "Let me choose when to run plugin content"
  5. Pop-ups: "Do not allow any site to show pop-ups"
  6. Location: "Do not allow any site to track your physical location"
  7. Notifications: "Do not allow any site to show notifications"
  8. Unsandboxed plugin access: "Do not allow any sites to use a plugin to access your computer"
  9. You get the idea--I basically turned off absolutely everything in the "Privacy" section

Wednesday, May 10, 2017

Windows 10 Updates Nag Screen Hijack

Windows has always loved to hijack your computer because Gates wants to micromanage your life. Previous versions would automatically update your computer no matter what you were doing or how important it was (such as being in the middle of a class lecture, interrupting all 60 students for a 30-minute update and forcing you to cancel the lecture). However, you used to be able to change some well-hidden settings and turn off automatic updates, so that you could update your computer when it was convenient. Not so with Windows 10. There is no way to actually just completely turn them off. I have googled many times and tried all the tricks to get this turned off, and while my automatic updates are *mostly* turned off, I still occasionally get an overlay screen that covers my desktop and won't let me do anything else until I click my acknowledgement that I must restart my computer. I'm not forced to restart right then, but the overlay lockout keeps happening until I do. This is unacceptable.

The only thing I've found that works is to disable the permissions on the files with variations of the name "MusNotification." Here is a screenshot of a computer search (using Locate) finding all instances of these files. Every couple of months I have to redo this, since Microsoft seems to realize the original files have been disabled and finds someplace new to put them, and it starts over.

The procedure I used was to right-click each file and open "Properties," then "Security," then "Advanced." First, I took ownership away from "TrustedInstaller" and gave it to "Users." Sometimes that still doesn't let me change permissions, because the file is inheriting permissions from elsewhere, requiring me to click "disable inheritance." Once that is done, I have to go into each of the permission entries and "Deny" full control to all of them: TrustedInstaller, System, All Application Packages, and All Restricted Application Packages. I also changed Administrator and Users to "Read only" (not Read & Execute). This is tedious, since you have to do it for all of the MusNotification and MusNotificationUX instances, and the list grows every couple of months.

Other things I've tried, some of which partially help, are the following

  1. Go into "Services" and "Windows Update"--I can "disable" this option, but then I can't manually update the computer--I have to go back in and re-enable this service for me to update my computer. This is not acceptable.
  2. Metered connection--there is a procedure to change the wireless properties so that Windows treats my wireless connection as a "metered connection" and thus doesn't automatically download updates. However, this ended up messing with my Outlook e-mail, so I couldn't use it.
  3. Going into Group Policy (some versions of Windows 10 don't have this) and updating ConfigAutoUpdate, but this didn't seem to work for me--others had better luck.

Some more information about this is at Superuser.com

Saturday, May 6, 2017

House AHCA Vote, May 4--GOP Win %

On May 4th, the House finally passed a version of the Obamacare repeal by the slimmest of margins--they needed 216 votes; they got 217. All Democrats voted against it, predictably, but 20 Republicans also voted against it. I was interested to see the 2016 election results in their districts.

The GOP House win percentages for those who voted against the AHCA ranged from 48.5% (TX-23) to 71.3% (KY-04). In TX-23, Will Hurd won by a hair--the Libertarian got almost 5%, so the Democrat nearly took the district. Clinton won that district by over 3%. The next-lowest GOP win was Coffman in CO-06, with 50.2%, who beat the Democrat comfortably because the Libertarian took 5%. However, Clinton and Obama won this district in the last 3 elections.

On average, the House Republicans who voted against the AHCA won their districts with 59.4% of the vote, while Clinton averaged 45.4% in those districts. Nationally, Republicans who won their districts averaged 63.4%, with Clinton losing those districts at 38.4%. So while the GOP House members who voted against the AHCA had tighter wins than other Republicans, it was not as close as I expected.

Another possible explanation seems to be the number of people that would lose healthcare coverage from the AHCA. The Center for American Progress crunched the CBO estimates for the original AHCA proposal (which failed), estimating how many people in each congressional district would lose coverage.

In fact, this result is the opposite of what I predicted. In the districts of the 20 Republicans who voted against the AHCA, an average of about 49,580 people would lose coverage, compared to an estimated 53,413 per district in the other GOP-held districts. In Democrat-held districts, that number jumps to 57,317 per district.

The CAP even breaks down those who would lose coverage by group: non-elderly (adults, children, disabled, and Medicaid expansion), Medicaid-elderly, employer-sponsored, and "Exchanges & Other Coverage." There were slightly more elderly covered under Medicaid who would lose coverage in the districts of the GOP members who voted against the AHCA (1,880) than in the rest of the GOP-held districts (1,842), but this difference seems negligible. I found similar results for the employer-sponsored losses (17,635 vs 16,240) and the "Exchanges & Other Coverage" losses (7,150 vs 6,044).

Either there is a problem with the data, or the Republicans in these districts had other reasons for voting against the AHCA.

Sunday, April 9, 2017

Filibuster by Senate Minority Party

The filibuster's history is interesting to visualize. The graph below shows three variables along a timeline extending back to 1917: the number of filibusters (cloture motions filed), the president's party, and the Senate minority party. Recently the Senate GOP "went nuclear" by getting rid of the filibuster for Supreme Court nominees. In addition to the standard rhetoric of obstructionism--leveled by the party in charge of the Senate whenever the minority party resists--the GOP's defense of this maneuver also involved blaming Democrats for "starting it," first with Reagan's SCOTUS nominee, Robert Bork, in 1987, and later when Democrats got rid of the filibuster for federal judges in November 2013. There is a false equivalence in both of these arguments.

In the case of Bork, the GOP argument would only be valid if Democrats had obstructed all of Reagan's nominees and if the obstruction had been along party lines. Neither is the case. Reagan's nomination of Sandra Day O'Connor passed the Senate with a 99-0 vote, his elevation of William Rehnquist passed 65-33, his nomination of Antonin Scalia passed 98-0, and Anthony Kennedy passed 97-0. Nor was Bork rejected along party lines--6 Republicans voted against him, and 2 Democrats voted for him.

As for Democrats getting rid of the federal-judge filibuster at the end of 2013, the data show this was the result of an exploding number of filibusters by Republicans. The historical record shows that Republicans blocked the Obama agenda at every level of government, including obstructing the approval of federal judges and other appointees. This pattern began shortly after Obama took office in 2009 and continued throughout his presidency, culminating in the GOP's refusal to even allow Obama's SCOTUS pick, Merrick Garland, to come up for a vote after the death of Antonin Scalia.

All of this behavior was literally unprecedented. As the graph below shows, for most of the 1900s the use of the filibuster was relatively low--typically fewer than 10 cloture motions filed per year, and in many years none were filed. This began to change in the 1970s. In 1975, the Senate made it easier for cloture to be invoked, reducing the threshold from 2/3 (67 votes) to 3/5 (60 votes). The 1960s were a tumultuous era, the peak of the Civil Rights movement--the higher threshold likely kept filibusters to a minimum, but even so, there were very few filibusters during that decade.

Beginning in the late 1960s, Nixon's (and later Reagan's) "Southern Strategy" successfully changed the way the public voted for Democrats and Republicans--the South transitioned from a Democratic stronghold to a Republican stronghold. Therefore, on the graph, I begin coloring the cloture motion bars blue (Democrat) or red (GOP) in 1968, but do not assign party prior to that (the 2001-2002 Senate is also uncolored--control went back and forth due to a Senate tie). Since it is the minority party that filibusters, the color of the bars represents the minority party (i.e., the 1969-1970 Senate was Democrat-controlled--Republicans were therefore the minority, so the bar is colored red). The background color represents the party of the president (blue for Democrat, red for Republican). There is a gradual increase in filibusters from the 1970s through the early 2000s. The first big jump occurs in the late 2000s, when Democrats were in control of the Senate at the end of the Bush years--conservative Republicans were filibustering Democratic Senate legislation, as well as opposing much of their own president's (Bush's) agenda, such as the expansion of Medicare and immigration reform.
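(For anyone wanting to reproduce this kind of chart, here is a minimal sketch of the encoding in Python/matplotlib--the numbers below are illustrative only; the real chart uses the Senate's full cloture records.)

    # Sketch of the chart encoding: bar color = Senate minority party,
    # background shading = president's party. Illustrative data only.
    import matplotlib.pyplot as plt

    congress_start = [1969, 1971, 1973, 1975, 1977]      # start year of each Congress
    cloture_motions = [6, 20, 31, 27, 13]                # motions filed (illustrative)
    minority_party = ["R", "R", "R", "R", "R"]           # Senate minority party
    presidents = [("R", 1969, 1977), ("D", 1977, 1981)]  # (party, start, end)

    fig, ax = plt.subplots()

    # Shade the background by the president's party
    for party, start, end in presidents:
        ax.axvspan(start, end, color="red" if party == "R" else "blue", alpha=0.1)

    # Color each bar by the minority party (the side doing the filibustering)
    bar_colors = ["red" if p == "R" else "blue" for p in minority_party]
    ax.bar(congress_start, cloture_motions, width=1.6, color=bar_colors)

    ax.set_xlabel("Congress (start year)")
    ax.set_ylabel("Cloture motions filed")
    plt.show()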

This changed significantly once Obama was elected. As the graph shows, Republicans as the Senate minority filibustered twice as much as had ever been seen before (2013-2014), eventually leading Democrats to get rid of the filibuster for the single purpose of allowing federal judges to be approved.

Tuesday, February 28, 2017

The Rise and Fall of Milo

While I am loath to bring more attention to Milo Yiannopoulos, I will do so simply as a way to blog something for the month. I noticed how quickly his name entered my consciousness, and how quickly it left. Surely he is far more important than I recognize, since I hadn't heard of him before the incident with Leslie Jones, which caused Twitter to ban him. I won't recapitulate his long history of targeting and abusing vulnerable populations. What is interesting is how the Google Trends graph of his name has but 3 peaks, and is otherwise fairly flat-lined.

The first small peak occurs in July 2016, when Twitter banned him for hate speech and inciting violence. The second peak, in early February 2017, is when he cancelled a paid speaking engagement at UC Berkeley because he apparently became frightened by some snowflakes. Shortly thereafter came the Bill Maher debacle, where he caught the attention of a Canadian teenager who disapproved of the provocateur's earlier history of supporting pedophiles, and she decided to take down the Breitbart editor by reminding the public about those prior comments (which were always available on the interwebs, were one inclined to do some googling). Milo's sugar-daddies (Breitbart, CPAC, and Simon & Schuster) decided he was no longer able to please them, so they cut him loose.

Milo, not satisfied with a graceful bow-out, decided to very publicly become what he had preached against to win his earlier acclaim--a "victim" of circumstances beyond his control, psychically injured by abusive people. While I can sympathize with his situation, which will certainly take years of concerted effort to heal from, I cannot now suddenly accept that his years of abusing other people are justified by claiming "victims become victimizers." The irony is so delicious. His fame and supporters rested on the fact that he lashed out at people who were weak and oppressed by a patriarchal, heterosexist, racist system. The stigmatization that comes from not falling into line within this system is brutal, not only psychologically, but physically and materially. Yiannopoulos intentionally pretended none of this damaging reality existed. Now that he has been cut out by the gatekeepers of this system, he claims the system is victimizing him and has begun shifting the blame elsewhere. He falls into the camp of those who love and vigorously support a system from which they benefit, but which creates fundamental and profound inequalities for others--until they no longer benefit, and he himself has now become the 'other.' Suddenly he finds flaws with the system and wants some support. Granting it at this stage would seem to make his life far too easy compared to the evil he has perpetrated.

Sunday, February 5, 2017

Median Family Income by County: 1950-2012

I was interested in geographical economic capital flows in the U.S., but had no idea how to search for it, since Googling for "United States," "geography" and "capital" just kept giving me the state capitals. So with some inspiration from FB friends, I grabbed some Census data, spent 2 days hand-transcribing some old PDFs from the pre-computer era, and created this GIF. It shows median family income by county from 1950-2012. From what I can tell, there is no county-level data prior to 1950. I could be wrong--feel free to correct me. The red areas are the poorest that year, and the green areas are the richest.

I used QGIS to make the maps, Excel to manipulate the data, OpenOffice to bring the Excel data into a format QGIS could use, and Paint Shop Pro's Animation Shop to make the GIF.
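(If I were redoing it today, the whole pipeline could be scripted. Here is a minimal sketch using Python's geopandas and imageio, assuming a county shapefile and a CSV of median family income by county and year--all file and column names here are hypothetical.)

    # Minimal sketch of the map-to-GIF pipeline. Assumes a county shapefile
    # and a CSV with hypothetical columns: fips, year, median_income.
    import geopandas as gpd
    import pandas as pd
    import matplotlib.pyplot as plt
    import imageio.v2 as imageio

    counties = gpd.read_file("us_counties.shp")
    income = pd.read_csv("median_family_income.csv", dtype={"fips": str})

    frames = []
    for year in sorted(income["year"].unique()):
        merged = counties.merge(income[income["year"] == year], on="fips")
        # Red = poorest, green = richest, as in the original GIF
        ax = merged.plot(column="median_income", cmap="RdYlGn", figsize=(10, 6))
        ax.set_title(f"Median family income by county, {year}")
        ax.set_axis_off()
        fname = f"frame_{year}.png"
        plt.savefig(fname, dpi=100)
        plt.close()
        frames.append(imageio.imread(fname))

    imageio.mimsave("income_by_county.gif", frames, duration=1.0)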

Saturday, October 15, 2016

Presidential Polling Since the "Trump-Tapes"

This is a quick data note. Since the release of the vulgar "Trump Tapes" from 2005, there has largely been a perception that Trump is out of the race, along with unsubstantiated (and subsequently refuted) rumors of Pence dropping off the ticket. Indeed, Trump has been abandoned by even more Republicans (many "establishment" Republicans had already refused to support him prior to this). However, current polling doesn't necessarily indicate a voter groundswell different from the weeks prior to the release of the tapes.

I've been struggling with how to create an easy-to-read graph of the polling of these changes. On the one hand, the polling has been reasonably consistent that Clinton has a far better chance of winning the electoral vote than Trump, even prior to the release of the tapes. As of October 3rd, Nate Silver's forecast gave Clinton a 72% chance of winning. The tapes came out Friday, Oct 7th, and post-release polling largely wouldn't have been released until either Monday or Tuesday. To be safe, I looked at polling released starting on Tuesday, Oct 11th. The graph I created shows 4 time periods--the first is the Romney-Obama election win margins of 2012, the second is 2016 polling up through Sept 28 (the first debate), the third is from Sept 29-October 10, and the last is polling since Oct 11th. Negative values (below the 0 mark) represent a lead for Republicans, and positive values (above 0) represent a lead for Democrats.

Comparing the Obama-Romney results to pre-October polling, Utah, Texas and Arizona are the "Republican" states that shifted the most. This was up to and including results from the first presidential debate, and all three show fairly radical shifts towards Clinton. Some other states, like North Carolina, Virginia, Michigan and Maine, also showed some movement, with NC & VA moving in Clinton's direction, while MI and ME moved towards Trump. Somewhat more surprisingly, the Utah, Texas and Arizona shifts seem to be holding as of October 15th polling, even shifting further towards Clinton--AZ continues to poll (narrowly) for Clinton, and the NC polling for Clinton is even stronger. Whereas Romney won TX by a nearly 16-point margin, Trump has been polling with only a single-digit, decreasing lead. Other states, like Michigan and Maine, are now firmly in the Clinton camp.

However, the Trump tapes so far are not showing a large impact in these specific states. The graph only includes states where polling has been done since Oct 11th. While there does seem to be some movement in Clinton's direction in several states, the change is slight, and within the margin of error in every case except Michigan. In fact, Florida, Pennsylvania & Wisconsin showed a shift in Trump's favor between the two-week period after the first debate (but before the release of the tapes) and the week following the release. These shifts are also within the margin of error, but they are at least 2% in Trump's favor--certainly not the expected shift towards Clinton following the tape release.
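(A quick way to sanity-check claims like these: compute the margin of error from the sample size, and treat any shift smaller than the combined error of the two polls as statistically indistinguishable from no shift. A minimal sketch, with illustrative numbers rather than any specific poll's:)

    # Is a shift between two polls within the margin of error?
    # MoE for a single proportion at 95% confidence: 1.96 * sqrt(p*(1-p)/n)
    from math import sqrt

    def moe(p, n, z=1.96):
        """95% margin of error for one candidate's share."""
        return z * sqrt(p * (1 - p) / n)

    p1, n1 = 0.46, 900   # candidate's share and sample size, earlier poll
    p2, n2 = 0.44, 850   # later poll (illustrative numbers)

    shift = p2 - p1
    # Conservative combined error for the difference between two independent polls
    combined = sqrt(moe(p1, n1) ** 2 + moe(p2, n2) ** 2)

    print(f"shift = {shift:+.1%}, combined MoE = {combined:.1%}")
    print("within MoE" if abs(shift) < combined else "outside MoE")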

Tuesday, September 6, 2016

WaPo-SurveyMonkey All-State Polling for President

The Washington Post teamed up with SurveyMonkey to do polling of all 50 states on the presidential race, and they released the results today. The table to the right shows those results in the column labelled "2016: WAPO, Sept 6." You can compare this with the actual 2012 state-level results between Romney and Obama, and, in the final column, a summary of polling averages so far this year (since July). There are very few surprises, although there are some, which I have highlighted. For 36 states, earlier polling (where it's been done) matches the WaPo polling, which matches the 2012 election results. There are 14 states where we see some differences. The table is sorted by the WaPo results--Clinton's highest leads are on top, going all the way to her biggest losses at the bottom. Regardless, the 2012 results and the WaPo polls are fairly close, with a correlation of r=+0.92.
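(The correlation is a one-liner if the table is exported to a CSV; the file and column names below are hypothetical:)

    # Pearson correlation between 2012 margins and the WaPo-SurveyMonkey margins
    import pandas as pd

    # hypothetical columns: state, margin_2012, margin_wapo
    df = pd.read_csv("state_margins.csv")
    print(f"r = {df['margin_2012'].corr(df['margin_wapo']):+.2f}")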

First, there are some differences between the 2012 race and the WaPo polling, which I have highlighted in yellow. Not all of these differences are about Dems vs GOP; rather, some are differences in amount. For example, Rhode Island offers no surprises in its choice--Democrats. However, in 2012 Obama won Rhode Island by 27 points, and currently, Hillary is polling at a 10 point lead over Trump. Similarly, in 2012, Romney won Utah by a whopping 47%, but Trump has only an 11 point lead in this WaPo poll. Earlier polling showed his lead at 20 points.

Second, depending on the quality of this WaPo poll, it shows a reversal in a few states, like Mississippi, Arizona, & Iowa. On the one hand, Clinton's 1 point lead in Arizona is within the margin of error, and so somewhat meaningless. But on the other hand, Romney won Arizona by 9 points in 2012, and other polling shows a similarly tight race, with Trump holding just a 1 point lead. What is more curious is Mississippi, where WaPo shows Clinton with a 3 point lead, and Iowa, which shows Trump with a 4 point lead. Both of these would represent reversals from 2012. I can perhaps buy the Iowa switch--I'm far more skeptical about a Mississippi switch to Democrats. However, earlier polling has already shown Trump and Clinton tied in Mississippi--in 2012, Romney won here by 11 points. Perhaps this GOP stronghold is turning purple?!

Third, there are also a few significant differences between today's WaPo results and earlier polling, although only one is a "switch": Ohio--earlier polling gave Clinton an average of a 4 point lead, while today's WaPo results give Trump a 3 point lead. These results are within the margin of error, so the differences are largely uninteresting, but similarly, unhelpful in predicting a winner, other than to say, "it's likely to be close." Two states, Colorado and Wisconsin, had Clinton with a 10 point lead in earlier polls, but today's results give her only a 2 point lead. The latter result puts it within the margin of error, so the race could be tightening significantly there.

Just for funzies, let's use the WaPo results as a blueprint, and see what it would produce in terms of an electoral result (neither Texas nor DC were in this poll--for the sake of argument, let's give Texas to Trump, and DC to Clinton--polling averages give Trump an 8 point lead in Texas). First, if we use it "as is," ignoring margin of error, and leaving out Georgia and North Carolina, where polling has a dead heat (0), Clinton gets 325 electoral votes to Trump's 182--a landslide for Clinton. Second, let's only use states where a candidate has a 5 point or greater lead--that gives Clinton 224 and Trump 158. At the 5-point cutoff, Clinton doesn't garner enough electoral votes to reach the required 270, although, with a 66 electoral-vote advantage, we still have a reasonably likely Clinton win. In this scenario, she only needs a couple of the states that Obama won in 2012, like Florida and Pennsylvania. Trump's path is far more difficult--he would need to win most of these 11 remaining states. For example, if he lost both Florida and Pennsylvania, he only gets to 265 electoral votes. Or, if we combine the 2012 Obama map with earlier pro-Clinton polling--say, Trump loses Ohio, Michigan, Wisconsin, and Colorado--Clinton wins. In all, a Trump win is still a statistical possibility, but the path forward for Clinton continues to be far more mathematically obvious.
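(The tallying logic is simple enough to script. A minimal sketch with an abbreviated, illustrative state list--a real run would include all 50 states plus DC:)

    # Tally electoral votes for states where a candidate's lead meets a cutoff.
    # Margins: positive = Clinton lead, negative = Trump lead (illustrative data).
    polls = {
        "FL": (29, 2), "PA": (20, 4), "OH": (18, -3),
        "MI": (16, 5), "WI": (10, 2), "CO": (9, 2),
    }

    def tally(polls, cutoff=0):
        clinton = sum(ev for ev, m in polls.values() if m > 0 and m >= cutoff)
        trump = sum(ev for ev, m in polls.values() if m < 0 and -m >= cutoff)
        return clinton, trump  # dead heats (margin 0) go to neither side

    for cutoff in (0, 5):
        c, t = tally(polls, cutoff)
        print(f"cutoff {cutoff}%: Clinton {c}, Trump {t} (270 needed)")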

Saturday, September 3, 2016

US Senate Race 2016

So far my 2016 political analysis has been about the presidential race. Now that all of the Senate primaries are over (except Louisiana's--they have that weird "jungle primary," which isn't until election day), it's time to see how the field is lining up. While most Senate primaries were significantly earlier in the year, 9 of the 34 seats up for grabs didn't hold theirs until last month, and 2 of those (Arizona & Florida) weren't until August 30th.

Currently the Senate is in Republican hands, with a margin of 54 to 46 (technically, two of the 46 are Independents--Bernie Sanders & Angus King--but both caucus with Democrats). However, that lead will almost certainly shrink after the November election. Regardless of any purported "drag" effect from Trump, Republicans in this election are defending 24 seats, while Democrats are defending just 10. Historically, it's hard to defend that many seats without losing more than you gain.

This year, Democrats merely have to take 5 seats from Republicans, while holding onto all of their current seats, to gain a slim majority, and so far they have a good chance of doing just that, according to Larry Sabato, the University of Virginia political scientist who runs UVA's Center for Politics. His analysis shows that all of the Democratic seats are safe except for Nevada, where Senate Minority Leader Harry Reid is retiring, and that race is currently too close to call. In contrast, of the 24 Republican Senators up for re-election, Sabato says that 2 are "likely" to lose to Democrats (Wisconsin & Illinois), while 3 are "leaning" Democratic (Indiana, New Hampshire & Pennsylvania). The Indiana race would not originally have been very competitive, but with the entry of Evan Bayh, a popular former Indiana Governor, it is now polling significantly in his favor--the most recent polls have him up by an average of 17% over his challenger.

In fact, depending on who wins the presidency--and it strongly looks like it will be Clinton--the Democrats really just have to win 4 seats, since that would give them a tie, and Vice President Kaine would presumably break any ties in favor of Democrats. That means that even if the Nevada Senate seat goes to a Republican, but the rest of the seats go in the direction of current polling, Democrats will technically control the Senate. In any case, Democratic control of the Senate would be tenuous, with either a 50-50 tie or a 51-49 majority as the most likely scenario.

August polling seems to support Sabato's assessment. The table shows all of the state-level polling since August 10th. Red indicates Republican, blue indicates Democrat. The 3rd column, "Curr," indicates which party currently holds that seat. The bottom section of the table contains the seats that Sabato calls "safe." Indeed, most pollsters haven't even bothered to survey these states. The few that have (Colorado, New York, South Carolina & Utah) show that these seats should remain safely in the hands of the current party. The middle section of the table lists the seats that Sabato says are "likely" to go to a given party. The polling results, on the right-hand side of the table, show clearly smaller margins than the "safe" seats, but the designation of "likely" also seems fair for both Wisconsin and Iowa, the only two states in the "likely" category for which polling exists since August 10th.

The top section of the table contains the seats Sabato calls "leans," plus the one toss-up, Nevada. The polling in yellow shows results that were obtained prior to those states' primaries (designated in the middle column labelled "Prim")--in this case, just Arizona & Florida. While McCain's (AZ) early lead seemed quite large (13%), the only poll since the August 30th primary shows that race is currently tied.

The "lighter" red and blue are polling results that are between 3-5% for Republicans or Democrats, respectively, which I would consider "weak" leads, if at all, since these are likely within the margin of error. The green poll results are within 0-2%, or basically just a tie. My assessment of the polling tends to match Sabato's.

Current state-level polling and UVA's politics site both seem to indicate that Democrats have a good chance of taking the Senate in November, either with a tie or, at most, a 51-49 lead. What seems less likely is the coincidence that Reid's seat remains in Democratic hands AND McCain's seat also falls to Democrats. But even then, the Democratic margin would only be 52-48, a long way from a filibuster-proof majority. And either way, few analyses are predicting a Democratic win in the House, which will undoubtedly lead to a bitter 2 years (if not 4) of Democratic & Republican wrangling for control of the federal budget & political system.