Nice work from the Washington Post



It's easy to find misleading articles in the news. Today, we're going to focus on an article that does a nice job of explaining a complex, data-driven issue.

The article, "One of America’s healthiest trends has had a pretty unexpected side effect," was published in the Washington Post. The basic premise is that, as smoking rates have declined, obesity has been on the rise.

So, what does author Roberto Ferdman do well? 

  • He uses (and links to) established sources, including the Surgeon General, the Cleveland Clinic and the National Institutes of Health
  • He explains the methodology behind the research - including the fact that researchers had to approximate certain effects
  • Perhaps most importantly, he draws a distinction between correlation and causation. He quotes an obesity expert (Yoni Freedhoff) who said, "Obviously, it's hard to establish any causal relationship here, but I would definitely say it's plausible that the fall in smoking contributed to the rise in obesity."
  • Finally, he offers a takeaway that cautions people against reading too much into the study: "What exactly are we supposed to glean from the suggestion that the fall in smoking might have contributed to the rise in obesity? The answer is not that anyone should look back upon the days when more than half of the population smoked regularly with nostalgia. Rather, according to Baum, it's better to view the study's finding as more of a point of interest, a takeaway that allows us to look at how societal changes move like waves that ripple, touching other shifts, even if only slightly."

Read the full Washington Post article here

EVERYDATA meets Sherbit

The average person consumes 34 gigabytes worth of data each day. Some of that data comes from the apps and devices that you use, which are constantly collecting data about your everyday activities.

Recently, we discovered Sherbit - a quantified self app for managing and understanding your personal data. Sherbit claims to let you easily understand and analyze your information, as you can see in the sample chart above. It's an interesting and fun way to look at some of the data in your life, and use it to learn more about yourself.

As a smart consumer of data, here are a few things to consider as you look at data from apps like Sherbit: 

1. Where is your data coming from? If you're analyzing how many steps you take in a day (which is what the chart below shows), ask yourself - are the devices that track your steps accurate? Are they really tracking steps, or are they using a motion tracker or other technology to approximate the number of steps you take each day?

Driving time vs. steps


2. Are you looking at correlation or causation? This chart shows something interesting—the amount of driving time decreased as the number of steps increased. But does that mean that stopping the car caused you to take more steps? Does it mean taking more steps caused you to stop driving? Or is it just a correlation, rather than a causal relationship?  

3. Understand that predictions may only be as good as your prediction model. Let's say as your weight goes down, so does your heart rate. You might assume that your heart rate will keep going down as you lose weight. But will that relationship hold true forever, or will your heart rate stop going down at some point?

4. Make sure you're asking the right question to solve the problem. Imagine you want to lose weight. You might ask: "How many steps do I have to take to lose 1 pound each week?" Or perhaps, "What exercise is the most effective in losing weight?" Maybe even, "What food should I cut out of my diet in order to lose weight?" All of these are valid questions - but each one may lead to a different answer.

To learn more about Sherbit, visit their website.

"Required reading" says Arkansas Business


In a column that referenced John Oliver and Tyler Vigen (whom we quote in our book), Arkansas Business editor Gwen Moritz argued that EVERYDATA "should be required reading in high school and for every journalist and journalism student in the universe."

Here are a few of our other favorite quotes:

"Using sentences and examples that even I can understand, Johnson and co-author Mike Gluck explain the way averages can be used and misused."

"They write about the problem with self-reported data... [and] warn us to consider whether important data might be missing."

"We can either be smart, skeptical consumers of data or suckers. Take your pick."

You can read Gwen's full article here.

Tablet vs Paperback?

In our forthcoming book, Everydata, we very briefly address an interesting study from the University of Oregon that finds "People actually recall more information when they read a printed newspaper versus reading it online." Our purpose in raising the study was not to closely examine the underlying statistical methodology (though we might have something to say about the sample of 45 people) but to introduce the concept that how you receive your data can also affect how you interpret or retain it.


The Shaky Statistics on the Myth of the Holiday Weight Gain

Every year, we see a glut of stories in the media debunking the myth of the holiday weight gain. The general narrative is one of disbelief that the average holiday weight gain is 5 to 10 pounds (see, for example, this SF Gate story) and that the majority of weight gained by the average person in a year is gained during the period between Thanksgiving and Christmas.


We All Scream for Ice Cream.

In this simple yet creative data visualization posted by Randal Olson, the folks at dadaviz map key ingredients to various Ben and Jerry's flavors. From the Everydata perspective, this is a beautiful illustration of the statistical concepts of PERMUTATIONS and COMBINATIONS, reflecting the many ways different items can be combined. This is much more intriguing than the way probability is usually taught. Think of the classic example of flipping two coins: how many different combinations of heads (H) and tails (T) can you get? HH, TT, HT, TH. Everydata frequently retweets Randal's data visualizations (@randal_olson) if you want to follow him on Twitter. Dadaviz (@dadaviz) is a fantastic site for visualization as well.

Some More Ice Cream: My best buddy @probonodude asked a good follow-up question about exactly how one should think about the implied probabilities in this chart.

First, assuming this is the full universe of ingredients, chocolate is in the most Ben and Jerry's flavors.  You can visually see this by the fact that the brown color is most dominant in the picture.

Second, on to permutations. There are 8 ingredients. How many different flavors can we make from those 8 ingredients? If each "flavor" is made using only 1 ingredient, this is an easy problem: we end up with 8 flavors. But if each "flavor" is made by combining any 2 different ingredients, and the order matters, that yields 8 x 7 = 56 ice cream flavors (8 choices for the first ingredient and 7 for the second). If each flavor is made using 3 ingredients, that yields 8 x 7 x 6 = 336 ice cream flavors.
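If you'd like to check the arithmetic above yourself, here is a minimal Python sketch. It assumes only what the text states (8 ingredients, each used at most once per flavor). The ingredient names are placeholders we made up, since the chart's actual labels aren't listed here, and the function name count_flavors is ours as well. It also prints the combination counts, which is what you get if you decide that order doesn't matter (chocolate plus vanilla being the same flavor as vanilla plus chocolate).

```python
from itertools import combinations, permutations
from math import comb, perm  # available in Python 3.8+

N_INGREDIENTS = 8  # the number of ingredients stated in the text

# Placeholder names (an assumption): the real ingredient labels are not given.
ingredients = [f"ingredient_{i}" for i in range(1, N_INGREDIENTS + 1)]


def count_flavors(k):
    """Return (permutations, combinations) of k ingredients chosen from 8."""
    ordered = perm(N_INGREDIENTS, k)    # order matters: 8 * 7 * ... 
    unordered = comb(N_INGREDIENTS, k)  # order ignored
    return ordered, unordered


for k in (1, 2, 3):
    ordered, unordered = count_flavors(k)
    # Cross-check the formulas by listing every selection explicitly.
    assert ordered == len(list(permutations(ingredients, k)))
    assert unordered == len(list(combinations(ingredients, k)))
    print(f"{k} ingredient(s): {ordered} permutations, {unordered} combinations")
```

Run as written, this reports 8, 56 and 336 permutations for 1, 2 and 3 ingredients, matching the numbers in the paragraph. The combination counts are smaller (8, 28 and 56) because each unordered set of ingredients is counted only once.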

Now, obviously with these 8 ingredients, Ben and Jerry's isn't conducting a statistical exercise, so they aren't looking to create every possible permutation of 8 ingredients. But perhaps we can conclude the optimal number of flavor permutations is 72 (the number of ice creams shown on the right), as very few ice cream companies have been more successful or creative than Ben & Jerry's.