Wednesday, October 29, 2014

Soda, Telomeres, Aging, and Statistics

Anderson Cooper recently highlighted a pre-print analysis of "telomere length" and drinking sweetened carbonated beverages (soda, pop, or coke, in the vernacular) on his Ridiculist. He even includes an interview with neurosurgeon/health reporter Sanjay Gupta. I'm currently teaching a statistics course, so I'm always on the lookout for cutting-edge, peer-reviewed research that may have statistics at the appropriate level for my class, and data that would interest university undergraduates. I downloaded the paper and pulled out the Results table.

The researchers' hypothesis is framed with the theoretical belief that telomere length is related to aging. I won't address that issue in this blog post, except to say that it's a controversial (and, in my personal opinion, poorly defended) proposition. I will limit my comments just to the statistics presented in this research, and specifically, just to Table 3--spoiler alert, I wonder if the editors of AJPH were impaired when they let it into the journal.

In the table, we see Models 1 and 2, which they describe in the notes. Model 1 is just age, gender and energy (which I couldn't find that they define--perhaps it's simple daily caloric intake?), while Model 2 includes a mish-mash of "healthy habits," such as healthy eating, BMI, smoking and alcohol, as well as some extra socioeconomic demographics, such as race, education, poverty level, etc. They compare four drinks--carbonated sugar-sweetened, noncarbonated sugar-sweetened, diet, and 100% fruit juice. They provide the quartiles, the b (regression coefficient; similar to the "m" or "slope" in the dreaded high-school algebra linear equation "y=mx+b"), and the 95% confidence intervals.

My first clue that something is amiss is that there is not a consistent linearity in the quartiles (not to mention that they don't provide Q0, the minimum--we don't necessarily need the Q0, but then why provide Q4, the maximum if you're not going to provide the minimum? it's just a consistency issue that doesn't affect the analysis but makes the table feel unbalanced and sketchy to me). Their regression is self-described as linear. However, the quartiles themselves are decidedly NOT linear--at least not for anything except the noncarbonated sugar-sweetened beverages, and the combined sugar-sweetened beverages index. That is problematic for me.

Let's ignore the non-linearity question for a moment, and just look at their base analysis, which is comparing the median values of the four beverages with their published b coefficients and confidence intervals. Let's ignore Model 1, since it's just demographics. Once you control for the everyday behaviors of the people in their study (they use data from the NHANES), they have only one variable that they claim reaches the level of statistical significance: people who consume sugar-sweetened carbonated beverages apparently have shorter telomeres. While it would be tempting to read more into the other Model 2 coefficients, such as claiming that people who drink fruit juice have longer telomeres, or that diet drinks have neither a positive nor a negative impact on telomeres, it's statistically inappropriate to make those claims: neither of those measurements achieved statistical significance (p<0.05), so we can set them aside. So let's compare the two extremes, based solely on the published medians of telomere lengths: sugar-sweetened soda with 1.13 & 100% fruit juice (diet soda median is equal to fruit juice) with 1.08. On the surface it looks like those two numbers are "different"--clearly they are "different numbers," but that doesn't mean that in "reality" in the general population they are different, since this output is based on a sample, and is therefore an estimate. That's what "statistical analysis" does by definition--creates reasonable estimates of the general population based on samples.

Let's assume the sample meets standard scientific guidelines of randomness, etc., so we just have to determine if 1.08 vs 1.13 translates into an "actual" difference when applied to a general population estimate. According to the p-value of the coefficients, it does indeed appear to be different, but we'll get to that later. For now let's stick to the medians. Notice the spread from Q1, Q2 (the median), Q3 to Q4. Since they aren't linear, I'm not quite sure what the Q2 value actually represents. As any of my undergraduate statistics students can tell you, one of the data assumptions that must be met before you can do a regression analysis is data linearity--this non-linearity of quartiles makes me suspicious. Putting that question aside, I wonder whether the differences between the quartiles are "actually" differences, or whether the non-linearity indicates that these are merely natural variation and there actually are no regular increases in telomere lengths from Q1 to Q4. We can't know that based solely on this study, but personally, I don't see that the assumption is met given this table. Let's then take a leap and say--what if the actual Q2 measurement is anomalous, and perhaps a better estimate of Q2 would be the average of Q1 and Q3? In that case, the "estimated" Q2 for sugar-sweetened carbonated beverages is not 1.13 at all, but [(1.04+1.09)/2] 1.065, which is shorter than the fruit juice median of 1.08. This is, in fact, what the researchers predict with their regression equation--shorter telomeres with sugar-sweetened carbonated beverages. But this isn't necessarily self-evident, since the actual median shows that soda/pop drinkers have longer telomeres than fruit juice drinkers. Based on the presented median lengths, one could interpret that drinking sugar-sweetened soda is actually better for you than drinking fruit juice!
Granted, I'm not going to make that claim--although there is significant evidence that drinking fruit juice is not much healthier, if at all, than drinking soda.
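The arithmetic behind that "estimated Q2" can be checked in a few lines of Python. This is only a sketch of my back-of-the-envelope comparison, using the values quoted above; the variable names are my own, and averaging Q1 and Q3 is my what-if, not something the authors did.

```python
# Values quoted in this post from Table 3 (sugar-sweetened soda vs. 100% fruit juice).
soda_q1, soda_q2, soda_q3 = 1.04, 1.13, 1.09
juice_q2 = 1.08

# What-if estimate: treat the published soda Q2 as anomalous and
# replace it with the average of its neighbors, Q1 and Q3.
estimated_soda_q2 = (soda_q1 + soda_q3) / 2

print(f"published soda Q2: {soda_q2:.3f}")
print(f"estimated soda Q2: {estimated_soda_q2:.3f}")
print(f"fruit juice Q2:    {juice_q2:.3f}")

# The published value is above fruit juice; the what-if estimate is below it.
print("published soda Q2 > juice Q2:", soda_q2 > juice_q2)
print("estimated soda Q2 < juice Q2:", estimated_soda_q2 < juice_q2)
```

The direction of the soda vs. fruit juice comparison flips depending on which Q2 you use, which is the whole point.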

Finally, let's look at the b coefficients. I always make a point of telling my statistics students to ignore any published data that doesn't include confidence intervals, since hiding confidence intervals (by not publishing them) is a GREAT way to completely misrepresent a data analysis to your benefit. One can't interpret any parametric estimate (like a regression coefficient) without the confidence intervals. In this case, it seems that in Model 2, sugar-sweetened carbonated drinks shorten telomeres (b=-0.010) compared to fruit juice, which seems to actually lengthen telomeres (b=+0.016). BUT! Remember that these are statistical estimates, and not "real" numbers--the 95% confidence interval for soda-related telomere shortening runs from -0.020 to -0.001, and we can't know "actually" where in that range the real number falls. (Working at the 95% level means accepting a 5% risk of making a Type 1 error, i.e., claiming this result is real when it isn't.) So in reality, the authors aren't claiming that the "actual" telomere shortening is exactly -0.010, but that it is very likely somewhere between -0.020 and -0.001.

Similarly, the alleged telomere lengthening property of fruit juice isn't "exactly" +0.016, but likely somewhere between 0.000 and +0.033. So for my Introduction to Statistics students, I have them look at the upper confidence limit of the lower measurement, and the lower confidence limit of the higher measurement, before making an assessment. This means that the lowest likely value of telomere lengthening for fruit juice is actually 0.000 (i.e., no change at all), while the smallest likely telomere shortening for soda is -0.001. "Statistically," looking at the p-values, that appears to be a measurable difference, but in reality, personally, I would not interpret that as an important clinical difference at all (I won't get into "effect sizes" in this particular blog post, but since they don't post the standard deviations, we can't calculate those; I would guess they come out to be negligible).
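Here is that classroom rule of thumb as a small Python sketch, using the two Model 2 confidence intervals quoted above. The function name `gap_between` is my own invention for illustration, and comparing interval endpoints like this is an informal eyeball check, not a formal test of the difference between two coefficients.

```python
# 95% confidence intervals quoted in this post (Model 2), as (lower, upper).
soda_ci = (-0.020, -0.001)
juice_ci = (0.000, 0.033)

def gap_between(lower_interval, upper_interval):
    """Distance from the top of the lower interval to the bottom of the
    upper interval. Positive means the intervals do not overlap;
    zero or negative means they touch or overlap."""
    return upper_interval[0] - lower_interval[1]

gap = gap_between(soda_ci, juice_ci)
print(f"upper limit for soda:   {soda_ci[1]:+.3f}")
print(f"lower limit for juice:  {juice_ci[0]:+.3f}")
print(f"gap between intervals:  {gap:.3f}")
```

The intervals technically don't overlap, but the gap between them is a single unit in the last reported decimal place.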

What makes this difference perhaps even less relevant is what happens if one looks at the decimal points and takes rounding into consideration. If the soda upper confidence limit is reported as b=-0.001, the unrounded value could potentially be as close to zero as -0.0005. Similarly, the fruit juice lower confidence limit of b=0.000 could be as low as -0.0005 before rounding; in other words, they could be basically the same value, depending on how they rounded! On the one hand, I would have liked to have seen that one extra decimal place to rule out that possibility. On the other hand, one can simply eliminate a decimal place, and then the upper soda limit becomes b=0.00 and the lower fruit juice limit would also be b=0.00. So, in actuality, I don't really need to see that extra decimal place at all--I don't think these results support their conclusions, going by this table alone.
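For anyone who wants to check the rounding argument, here is a short Python sketch using the standard-library `decimal` module. The helper `unrounded_range` is hypothetical (my own name), and it assumes ordinary round-to-nearest; exactly which way a value sitting on the boundary rounds depends on the rounding rule the authors' software used, which I don't know.

```python
from decimal import Decimal

def unrounded_range(reported: str):
    """Return the (low, high) range of values that could round to the
    reported figure, assuming ordinary round-to-nearest."""
    value = Decimal(reported)
    # Half of one unit in the last reported decimal place,
    # e.g. 0.0005 for a figure reported to three decimals.
    half_unit = Decimal(1).scaleb(value.as_tuple().exponent) / 2
    return value - half_unit, value + half_unit

soda_low, soda_high = unrounded_range("-0.001")   # soda upper confidence limit
juice_low, juice_high = unrounded_range("0.000")  # fruit juice lower confidence limit

print("soda upper limit could be:  ", soda_low, "to", soda_high)
print("juice lower limit could be: ", juice_low, "to", juice_high)
print("ranges meet at the boundary:", soda_high == juice_low)

# Dropping one decimal place: both limits print as zero.
two_places = Decimal("0.01")
soda_2dp = Decimal("-0.001").quantize(two_places)
juice_2dp = Decimal("0.000").quantize(two_places)
print("at two decimals:", soda_2dp, "vs", juice_2dp, "-> equal:", soda_2dp == juice_2dp)
```

The two ranges share the boundary value -0.0005, and at two decimal places both limits are numerically zero.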
