Tuesday, April 30, 2013

Student Plagiarists Who Watch too Many Crime Dramas

Every year I discover a plagiarist in one of my classes. Half of them have clearly watched too many crime dramas, where the guy whom 10 witnesses saw commit murder in broad daylight gets off on a technicality because his lawyer filed every possible appeal for every possible loophole, or because of behind-the-scenes politics, where the judge doesn't like the prosecuting attorney and so lets the perpetrator walk. Apparently this belief in tenacity plus luck as the road to success for any good capitalist drives them to appeal my decision to give them a 0 on an assignment they clearly copied word-for-word from another source. When I find plagiarism, I gather the evidence, present it to the student, tell them the consequence, and inform them of their rights: “if you want to appeal my decision you can go to the Dean’s office,” which is the basic procedure at all of the universities where I have taught.

The best stories about my plagiarists come from the most extreme cases. In two instances, I had a student who, as it turned out, was dating a student who had taken my class during a previous semester or in a different section of the course, and the plagiarist submitted a paper identical to what the other student had submitted to me earlier. One of these students might have got away with it, except that he referenced a textbook from a previous semester that I had stopped using. An obvious give-away. In several instances, students literally copy/pasted an entire Wikipedia article and submitted it as-is. In each of these cases, when I confronted the students with the evidence, they were shocked at my accusation, having no idea where I would get the idea they had copied somebody else’s work, despite the identical papers sitting in front of them: theirs and somebody else’s.

When I inform them of their right to appeal, I strongly urge them not to, because as it stands they are facing a 0 on one assignment, and usually they can still pull out a passing grade. Sometimes it only drops them one letter grade. I explain that if they appeal, they land on the Dean’s radar as a cheater, and an official report is filed in their permanent record. Not to mention that the Dean’s office reserves the right to take further action, such as expelling the student. Here’s where the stories get frustrating, more so than the original plagiarism itself. The students who watch too many crime dramas “hear” my warning as, “this professor is afraid that the Dean’s office will overturn his decision, so if I appeal, not only will I get my grade back, but this professor will get into trouble, so if I just keep appealing, I’ll get off on some kind of technicality, or thanks to a brain-dead administrator who doesn’t like this professor.”

As one of these students later confessed to me, “how could I trust that you were looking out for my best interest, when you were giving me a zero!?” This confession came after the student had appealed my decision and the Dean had confirmed, “Yes, now you have a permanent record of cheating on file, and if it happens again, you are expelled. You should have listened to your professor,” followed by the administrator's very stern lecture about the stupidity of copy/pasting a Wikipedia article, and a pointed question about how the student had got this far in their college career. I tend not to yell at students. Some Deans seem to relish it in certain instances, and dealing with plagiarists is one of those times.

Senators Who Voted Against Gun Background Checks Face Public Blowback

The recent Senate vote that killed common-sense background checks for gun purchases has had major consequences for those who voted against the bill. Those senators have seen their approval ratings collapse, both in the gap between current approval and disapproval and, even more markedly, in comparison with their prior ratings. The tables below summarize recent polling. For each senator, the first row shows current polling, the "Prior rating" row below it shows the most recent prior polling (if available), and the far right-hand column shows the change in net approval from the prior poll to the current one.

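| State | Voted against background checks for guns | Approval | Disapproval | Difference | Change since prior poll |
|---|---|---|---|---|---|
| AZ | Flake | 32 | 51 | -19 | |
| AK | Murkowski | 46 | 41 | +5 | -16 |
| | Prior rating | 54 | 33 | +21 | |
| AK | Begich | 41 | 37 | +4 | -6 |
| | Prior rating | 49 | 39 | +10 | |
| OH | Portman | 26 | 34 | -8 | -18 |
| | Prior rating | 35 | 25 | +10 | |
| NV | Heller | 44 | 41 | +3 | -2 |
| | Prior rating | 47 | 42 | +5 | |
| NH | Ayotte | 44 | 46 | -2 | -15 |
| | Prior rating | 48 | 35 | +13 | |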

On the other hand, Senator Toomey helped create the background checks bill. Senator McCain supported the bill, and is from the same state as Senator Flake above. The most recent PPP question specifically asks about trust ratings comparing McCain to Flake.

| State | Voted FOR background checks for guns | Approval | Disapproval | Difference | Change since prior poll |
|---|---|---|---|---|---|
| PA | Toomey | 48 | 30 | +18 | +7 |
| | Prior rating | 43 | 32 | +11 | |

Even a Fox News poll indicates collapsing support for those senators who voted against background checks for gun purchases: "Likely to support a politician who voted AGAINST expanding background checks"

More LikelyLess Likely

Apr 29 PPP poll: More backlash against Senators on gun vote
Atlantic Wire: How Jeff Flake Became the Most Unpopular Senator in America
Business Insider: We're Starting To See Some Very Real Ramifications From Senators' Votes On Gun Control

Monday, April 29, 2013

Indiana GOP Legislature Micro-Manipulates Marion County Political Structure

In case you weren't paying attention, the GOP-controlled Indiana legislature reshaped the Marion County city-county council with the passage of SB621. Currently we have 4 "at large" councilors. Originally, back in 1969, these seats were created to consolidate the GOP's hold on the city, when it was believed that simple by-district representation would not give them the majority, even though, by population, the mayor would always be expected to be Republican. The process was quite heated, and very obviously a political move to consolidate power. More than 40 years later, the removal of these seats is again a subversive political move: all 4 at-large councilors are currently Democrats, as is the council majority, while the mayor is Republican. Removing the at-large councilors, however, will give the council back to a Republican majority.

Presuming this was a political move, it was incredibly short-sighted in terms of strategy. Large cities across the country, including those in the Midwest, trend Democratic. Indianapolis is no different. Removing these at-large seats will not consolidate Republican power, as they hope. Presuming trends hold in the future, and the current state of the GOP nationally gives no indication of a change, Indianapolis will continue to move Democratic, and SB621 will serve only as a reminder of manipulative political tricks by the GOP in Indiana now that they have a trifecta supermajority.


Friday, April 26, 2013

Testing for Data Normality in R

This tutorial uses the free, open-source statistics software R. It relies primarily on command-line entry, so it is not as simple as SPSS, but it is far more powerful.

For regression analysis, your univariate data (all of the continuous variables used, independent and dependent) has to be normal, equivalent to a bell-shaped, Gaussian distribution. You can visualize the shape of this distribution with a histogram, which can help you establish the normality of your data. Visualization with a Q-Q plot can also help verify normality. However, these "eye-ball" tests have never satisfied my empirical instincts, since what seems "normal enough" to one person may not seem normal to another.

While many in the social sciences still do not implement, or at least report, these tests, the regression analysis may be invalid if the data are not normal. Several tests exist for demonstrating data normality, with the follow-up procedures for non-normality being data transformation and/or removal of outliers. The Law of Large Numbers and the Central Limit Theorem provide some protection for your analysis if you have a large sample (n > 30) even if your data is not normal, as do certain kinds of tests that are considered robust. However, it is good practice to try to get your data as normal as possible, since the math behind these types of analyses assumes univariate data normality. The primary normality tests are as follows:

  1. Visual tests: Histogram and Q-Q plots
  2. Z-score tests for skew and kurtosis (Statistic/Standard Error for that Statistic)
  3. Specialized tests: Shapiro-Wilk, Kolmogorov-Smirnov, Jarque-Bera, etc.
  4. Filliben's test for Q-Q plot correlation

For the numerical tests, concluding that the data are normal typically requires a p-value > 0.05. This causes my students immeasurable confusion, since they have been taught in other classes, and in most parts of my class, that they should look for p < 0.05 to determine if a test is significant. The latter is for tests of means, such as ANOVA, or for the significance of their regression model and coefficients. In those cases, the null hypothesis is: the means are equal. If you are testing that one intervention is effective and another one isn't, or that one group is different from another, you want to reject the null hypothesis, and therefore you are looking for p < 0.05. On the other hand, for data normality tests, the null hypothesis is: the data is normal, or the data is equivalent to a normal distribution. In this case, you want your data to be normal, since otherwise you cannot even proceed to the means test. You want to be able to "fail to reject the null hypothesis," i.e., you want p > 0.05.
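To make the direction of the p-value concrete, here is a minimal sketch in base R. The two vectors are simulated for illustration only, and `normality_decision` is a hypothetical helper name, not something from a package.

```r
# Simulated data only; these vectors are illustrative, not from the post.
set.seed(42)
normal_x <- rnorm(200, mean = 50, sd = 10)  # drawn from a normal distribution
skewed_x <- rexp(200, rate = 1)             # strongly right-skewed

# Shapiro-Wilk: the null hypothesis is "the data are normal".
p_normal <- shapiro.test(normal_x)$p.value
p_skewed <- shapiro.test(skewed_x)$p.value

# Hypothetical helper: turn a p-value into the decision described above.
normality_decision <- function(p, alpha = 0.05) {
  if (p > alpha) "fail to reject normality" else "reject normality"
}

cat("normal sample:", normality_decision(p_normal), "\n")
cat("skewed sample:", normality_decision(p_skewed), "\n")
```

Note that a sample that truly comes from a normal distribution will still be rejected about 5% of the time at alpha = 0.05, so a single p-value should be read alongside the plots.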


Find the optimal transformation using Box-Cox procedure: generates the optimal λ (power)
(Note: The first version of this post recommended the command box.cox.powers, which has since been deprecated, so is no longer available.)
A handy way to obtain a good univariate lambda power value is the AID package, which you will need to install.
The output is a best-guess lambda, plus the normality values for each of 7 common normality tests. Be certain to look at the plots that are generated. If the software can generate a reasonable lambda, then the plots will show a "peak" (either minimum or maximum). Sometimes a reasonable lambda cannot be determined within the -2 to 2 boundaries, and the plots will not peak for any of the normality tests, but will simply appear linear. In this case, it may be that the data cannot be transformed.

Generating a Box-Cox transformation: Generates [(x^λ - 1)/λ], where x is each value, and λ is a power
>bcPower(world$hiv, λ)
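The idea behind the lambda search can be sketched in base R without any add-on package. This is not the AID or car implementation: `box_cox` and `best_lambda` are hypothetical helper names, the data are simulated, and the search simply keeps the lambda in the -2 to 2 range whose transformed data gives the largest Shapiro-Wilk W statistic.

```r
# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda = 0 case.
# Requires strictly positive x.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Grid search over -2..2: keep the lambda whose transformed data has the
# largest Shapiro-Wilk W statistic (closest to normal by that one test).
best_lambda <- function(x, grid = seq(-2, 2, by = 0.05)) {
  w <- sapply(grid, function(l) unname(shapiro.test(box_cox(x, l))$statistic))
  grid[which.max(w)]
}

set.seed(1)
x <- rlnorm(200)                 # simulated log-normal data
lambda_hat <- best_lambda(x)     # should land near 0, i.e. a log transform
x_t <- box_cox(x, lambda_hat)    # transformed values

cat("chosen lambda:", lambda_hat, "\n")
cat("Shapiro-Wilk p after transform:", shapiro.test(x_t)$p.value, "\n")
```

If the W statistic keeps rising all the way to one end of the grid, that corresponds to the "no peak" situation described above, and the data may not be transformable this way.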

Produce histogram
Importing text files seems to create non-numeric data problems. Import from SPSS works OK, as does importing from the clipboard using x.num <- as.numeric(readClipboard())
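A minimal sketch of the histogram step in base R, assuming a numeric vector with missing values already removed. The vector `vals` is simulated here as a stand-in for the variable being tested.

```r
set.seed(7)
vals <- rnorm(150, mean = 10, sd = 2)   # simulated stand-in for the real variable

# Density-scaled histogram so a normal curve can be drawn on the same axis.
h <- hist(vals, freq = FALSE,
          main = "Histogram with normal curve", xlab = "value")

# Overlay the normal density that has the sample's own mean and sd.
curve(dnorm(x, mean = mean(vals), sd = sd(vals)), add = TRUE, lwd = 2)
```

The closer the bars follow the overlaid curve, the more plausible normality is, though as noted earlier this remains an "eye-ball" check.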

Produce qq-plot

QQ-Plot Correlation Coefficient (Filliben, 1975)
>qqp <- qqnorm(hivt) [also produces a qq-plot, but less informative than qqPlot]
Interpretation for sample size at http://www.dm.unibo.it/~simoncin/QQCritVal.pdf
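A minimal base R sketch of the Q-Q plot and its correlation coefficient, using simulated data in place of `hivt`. One caveat, stated as an assumption: `qqnorm` uses R's default plotting positions rather than Filliben's order-statistic medians, so the correlation below is a close approximation to Filliben's statistic, not an exact reproduction.

```r
set.seed(3)
vals <- rnorm(100)                               # simulated stand-in for hivt

# qqnorm() draws the plot and returns theoretical (x) and sample (y) quantiles.
qqp <- qqnorm(vals, main = "Normal Q-Q plot")
qqline(vals)                                     # reference line through the quartiles

# Correlation between theoretical and sample quantiles; values near 1
# indicate the points fall close to a straight line.
qq_r <- cor(qqp$x, qqp$y)
cat("Q-Q correlation:", round(qq_r, 4), "\n")
```

The resulting coefficient is then compared against the critical value for your sample size from a table such as the one linked above.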

Remove missing data—some tests won’t work with missing data
>hivt <- na.omit(world$hiv)

Skewness and Kurtosis
Some argue that any skew and kurtosis with absolute value < 1 represents normal data (assuming excess kurtosis, k-3, the value most software provides, since for a normal distribution unmodified k=3). Most use the Z-scores of skew and kurtosis. If n < 200, most argue that normal data is indicated by |Z| < 1.96, and if n > 200, then |Z| < 2.58 likely represents normal data. For n > 2,000, these cutoffs often produce unreliable results. Z-Skew = [skew/standard error of skew], and Z-Kurtosis = [kurtosis/standard error of kurtosis].

In R, package e1071 produces skew and kurtosis, so it will need to be installed and then loaded. The default for these commands is type = 3, which divides the third moment by the cube of the sample standard deviation (computed with n-1), the version used by MINITAB and BMDP. If you change this to type = 1, you get the results from the standard moment formula, and if you change this to type = 2, you get the "adjusted Fisher-Pearson standardized moment coefficient," which rescales the type 1 skew by sqrt(n(n-1))/(n-2) and matches the values produced by SPSS and SAS. You can also load the package "moments" to get a skew command that matches the hand-calculated values; its kurtosis command returns unmodified kurtosis (k=3 for a normal distribution), so subtract 3 to compare. The e1071 default (type = 3) uses the sample standard deviation (denominator n-1), presuming you are using a sample, while package moments, like e1071 type = 1, uses denominator n, presuming you are using a population. Each of these methods can produce slightly different values, but they are relatively close to each other, and the differences are usually negligible.

Skewness and Z-skew
>skewness(hivt)/sqrt(6/n) [n=sample size]
or >skewness(hivt)/sqrt(6/length(hivt))

Kurtosis and Z-kurtosis
>kurtosis(hivt)/sqrt(24/n) [n=sample size]
or >kurtosis(hivt)/sqrt(24/length(hivt))
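For readers who want to see what these commands compute, here is a base R sketch of the hand calculation. The function names are hypothetical, the moments use denominator n (the "standard formula" version), and the standard errors are the large-sample approximations sqrt(6/n) and sqrt(24/n) used in the commands above.

```r
# Moment-based skewness and excess kurtosis (denominator n).
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
moment_excess_kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3     # subtract 3 so a normal distribution gives 0
}

# Z-scores using the approximate standard errors sqrt(6/n) and sqrt(24/n).
z_scores <- function(x) {
  x <- x[!is.na(x)]               # drop missing values first
  n <- length(x)
  c(z_skew = moment_skew(x) / sqrt(6 / n),
    z_kurt = moment_excess_kurt(x) / sqrt(24 / n))
}

set.seed(11)
skewed <- rexp(150)               # simulated right-skewed data
z <- z_scores(skewed)
print(round(z, 2))                # |Z| well above 1.96 signals non-normality
```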

Specialized Numerical Tests: Empirical Distribution Functions
Kolmogorov-Smirnov test with Lilliefors correction (need to install library nortest)

Kolmogorov-Smirnov (uncorrected)

Shapiro-Wilk test (Royston correction)

Jarque-Bera test (need to install library tseries)

Robust Jarque-Bera test (need to install library lawstat)

Anderson-Darling test (from nortest)

Cramer-von Mises test (from nortest)

Shapiro-Francia test (from nortest)
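The tests listed above can be run in one pass, as in the sketch below. The data are simulated as a stand-in for `hivt`. The base R tests always run; the nortest, tseries, and lawstat tests are run only if those packages are installed, which is a choice made for this sketch so that it does not fail on a fresh R installation.

```r
set.seed(5)
vals <- rnorm(120, mean = 20, sd = 4)    # simulated stand-in for hivt

results <- list()

# Base R (stats package)
results$shapiro_wilk <- shapiro.test(vals)
# Uncorrected K-S against a normal with the sample's own mean and sd
results$ks <- ks.test(vals, "pnorm", mean = mean(vals), sd = sd(vals))

# Add-on packages, used only if installed
if (requireNamespace("nortest", quietly = TRUE)) {
  results$lilliefors       <- nortest::lillie.test(vals)
  results$anderson_darling <- nortest::ad.test(vals)
  results$cramer_von_mises <- nortest::cvm.test(vals)
  results$shapiro_francia  <- nortest::sf.test(vals)
}
if (requireNamespace("tseries", quietly = TRUE)) {
  results$jarque_bera <- tseries::jarque.bera.test(vals)
}
if (requireNamespace("lawstat", quietly = TRUE)) {
  results$robust_jb <- lawstat::rjb.test(vals)
}

# One p-value per test; for each, p > 0.05 means normality is not rejected.
p_values <- sapply(results, function(r) r$p.value)
print(round(p_values, 4))
```

Because the uncorrected K-S test here estimates the mean and standard deviation from the same data, its p-value tends to be too generous, which is the problem the Lilliefors correction addresses.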

Sunday, April 21, 2013

Comparing Between Regression Models: Akaike Information Criterion (AIC)

In preparing for my final week of sociological statistics class, the textbook takes us to "nested regression models," which is simply a way of comparing various multiple regression models with one or more independent variables removed. In the example I'll be using in my class, we'll be looking at a dataset of Florida crime by county as a dependent variable, with the independent variables of urbanization, education, and average income. To evaluate the reliability of the independent variables to be able to predict crime rates, we can generate any of several regression equations, using all of the three variables, two of them, or just one.

Distinguishing the "best" equation is somewhat subjective, but statisticians have developed some criteria to evaluate whether one model is likely better than another. For my class we are using SPSS as our statistical software, since that's the licensed software on our campus (IUPUI). It's expensive, and even with our campus license, you have to "rent" it every semester you want to use it. I personally don't use it for my research, since, while it's a reasonable GUI option, there are many advanced functions that it just can't do, and its flexibility to alter parameters is limited. I use the free, open-source software R, which has a steep learning curve, since it is command-line, but is far more powerful and flexible. I don't have my students learn it because of the learning curve, and because most of them in their future careers will just want simple software that they can point-and-click to get reasonable results. They likely won't want to do any programming to get their results.

In my search for a way to let my students compare "nested regression models" using SPSS, I spent a great deal of time Googling ways to get SPSS to generate AIC, and it just won't, except for logistic regression or the advanced Generalized Linear Models feature. Both are great options, but the former is a specialized technique for probability outcomes, while the latter is not necessarily a good option for an introductory sociological stats class.

I found 5 ways to get an AIC for an SPSS model, and I will teach the students 2 of those ways: one formula, and manually forcing SPSS to produce the regression AIC using syntax. I reproduce the 5 methods below, since there is no simple "checkbox" for regular linear regression in SPSS. Recognize that the linear regression method and the GZM (generalized linear models) method produce different AIC numbers. The absolute AIC number is not relevant; only the difference between the AICs of different models matters. Choose the model that produces the smallest AIC.

In the equations below, n = sample size, k = number of parameters, SSE = sum of squares error (or residual sum of squares as listed in SPSS output)

  1. AIC formula #1 (same result as SPSS linear regression syntax)
    n*Ln(SSE/n) + 2*(k+1)

  2. AIC formula #2 (same result as SPSS GZM)
    2k + n [Ln(2(pi) SSE/n ) + 1]

  3. AIC formula #3 (same result as SPSS GZM)
    Requires you to obtain the log-likelihood, which in SPSS, you can only get using GZM (generalized linear model, see option #5 below)
    2*k – 2* loglikelihood

  4. SPSS method #1: Use linear regression syntax
    REGRESSION
    /STATISTICS COEFF OUTS R ANOVA SELECTION
    /CRITERIA=PIN(.05) POUT(.10)
    /DEPENDENT DependentVariable
    /METHOD=ENTER IndependentVariables separated by a space.

  5. SPSS method #2: Use GZM
    Click the following:
    Analyze --> Generalized Linear Models --> Generalized Linear Models
    Under the "Response" tab, put the outcome variable into the "Dependent Variable" box.
    Under the "Predictors" tab, put all continuous independent variables into the “Covariates” box--if you have any categorical predictors, those go into the "Factors" box.
    Under the "Models" tab, put all listed variables (independents) into the “model” box and make sure that as "Type" they are listed as "Main effects"
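For anyone who wants to check the three formulas outside SPSS, here is a sketch in R. The built-in `mtcars` data stands in for the Florida crime data, which is not reproduced here. Two readings are assumptions on my part, following R's conventions: in formula #1, k counts the predictors (so k+1 includes the intercept), and in formulas #2 and #3, k counts every estimated parameter (intercept, slopes, and the error variance). `aic_by_hand` is a hypothetical helper name.

```r
# mtcars is a stand-in dataset; the two models are nested.
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

aic_by_hand <- function(fit) {
  n   <- nobs(fit)
  sse <- sum(residuals(fit)^2)          # residual sum of squares
  p   <- length(coef(fit)) - 1          # predictors, excluding the intercept
  k   <- p + 2                          # intercept + slopes + error variance
  c(formula1 = n * log(sse / n) + 2 * (p + 1),
    formula2 = 2 * k + n * (log(2 * pi * sse / n) + 1),
    formula3 = 2 * k - 2 * as.numeric(logLik(fit)))
}

aic_table <- rbind(full = aic_by_hand(full), reduced = aic_by_hand(reduced))
print(round(aic_table, 3))

# The absolute values differ by formula, but the gap between models does not.
print(round(aic_table["full", ] - aic_table["reduced", ], 3))
```

Under these assumptions, formulas #2 and #3 agree with R's `AIC()`, formula #1 agrees with `extractAIC()`, and the difference between the two models is the same whichever formula is used, which is why only the difference matters.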

Saturday, April 20, 2013


Indystar paywall cookies: collective-media.net etweb.indystar.com forum.whatismyip.com indystar.com

Thursday, April 18, 2013

Senate Vote Kills Gun Safety Bill--Did they Vote their People's Wishes?

The Senate just voted down a proposal to create a system of universal background checks on gun purchases, which would have closed the so-called "gun show loophole" and curbed "straw purchases." On an issue where 85-90% of the public supports such a bill, did those senators vote the wishes of the people they were elected to represent? The vote was highly partisan: in a 54-46 vote, 5 Democrats and 41 Republicans voted against the bill. Those senators are listed below, with their states.

Some have argued that while the high support for background checks represents a national majority, it does not represent the states these senators come from. However, that is simply not the case: the high support for background checks has held across states, in those polls that list results by state. For example, this recent poll shows that even in states that went for Romney in the last election, there is still high support for background checks. Republican senators in those states still voted against the bill.

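| Senator | State | Population supporting background checks for gun purchases |
|---|---|---|
| Alexander | TN | No data |
| Ayotte | NH | 89% |
| Barrasso | WY | No data |
| Baucus | MT | 79% |
| Begich | AK | 90% |
| Blunt | MO | 85% |
| Boozman | AR | 84% |
| Burr | NC | 90% |
| Chambliss | GA | 91% |
| Coats | IN | 89% |
| Coburn | OK | 87% |
| Cochran | MS | No data |
| Corker | TN | No data |
| Cornyn | TX | No data |
| Crapo | ID | No data |
| Cruz | TX | No data |
| Enzi | WY | No data |
| Fischer | NE | No data |
| Flake | AZ | 90% |
| Graham | SC | No data |
| Grassley | IA | 88% |
| Hatch | UT | 83% |
| Heitkamp | ND | 94% |
| Heller | NV | 86% |
| Hoeven | ND | 94% |
| Inhofe | OK | 87% |
| Isakson | GA | 91% |
| Johanns | NE | No data |
| Johnson | WI | No data |
| Lee | UT | 83% |
| McConnell | KY | 82% |
| Moran | KS | No data |
| Murkowski | AK | 60% |
| Paul | KY | 82% |
| Portman | OH | 83% |
| Pryor | AR | 84% |
| Reid | NV | 86% |
| Risch | ID | No data |
| Roberts | KS | No data |
| Rubio | FL | 91% |
| Scott | SC | No data |
| Sessions | AL | No data |
| Shelby | AL | No data |
| Thune | SD | No data |
| Vitter | LA | 85% |
| Wicker | MS | No data |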