Just HOW Perfect Does Practice Make?

Recently, Vox ran an article titled “I was really bad at sports in high school. This new study helps me understand why.” The author, Brian Resnick, cites a meta-analysis of 33 studies to explain the marginal effect of practice on athletic achievement. That’s a pretty astounding proclamation: that “practice doesn’t explain why the best athletes are so good,” as the article states.

So how did the meta-analysis reach the conclusion that practice only accounts for 18% of success? Well, let’s take a look at some of the potentially limiting factors of this type of study:

  • Self-reported results. When athletes report their own practice times, that data may not be 100% reliable.
  • Other variables that account for performance. The authors of the meta-analysis were unable to account for other factors that may correlate with practice, such as the fact that less naturally gifted people may practice more to compensate, or possibly even less because they just play a sport on the side. Remember – correlation does not necessarily equal causation.
  • Differences among sports. Depending on the sport, practice may play a larger (or smaller) role. For example, genetics may have a significant impact on a sprinter’s natural speed, but only a small impact on a pitcher’s curveball accuracy.

Finally, consider where the 18% figure came from. From what we see, it belongs to a pie chart titled “overall.” But when you look at the chart titled “Mixed” (denoting a mix between sub-elite and elite samples), it jumps to a much higher 29%, implying that the difference between elite and sub-elite athletes is much more attributable to practice.

So, does practice make perfect? It’s hard to say. But if you want to be an all-star data expert, it certainly helps to take a closer look at the numbers. 

Not so Fats! How a Dieting Study Can Lead You Astray

Earlier this year, mensfitness.com ran an article titled “New Study Could be your Excuse to Eat that Epic Burger this Memorial Day”. According to the article, the “new study” means you can take days off from your low-fat diet to eat high-fat foods and be no worse off, so long as you eventually go back to the diet. But let’s take a closer look at the facts of this study:

  • Mice on the high-fat diet actually DID gain more weight, but the scientists decided to call it a comparable amount. Normal-diet mice gained 5g, and alternating-diet mice gained 8g – 60 percent more. Is that truly a “similar” amount?
  • All mice got the SAME total amount of calories. The alternating-diet mice just got fewer calories on their “normal” days.
  • The mice on the high-fat diet ended up gaining body fat and losing lean muscle. This means even if the mice stay the same weight with this diet method, they’ll become pudgier, since a pound of fat takes up more space than a pound of muscle.
  • The subjects were mice! Until something is tested with humans, we can’t know for sure that it extends to us.

Several of these things are mentioned as disclaimers at the bottom of the article, but the author still goes on to tell you to eat a (high-fat) burger.

As we talk about in our book, when you see these types of studies reported in the media, make sure you’re asking the right questions, and waiting to act until you have the answers.

In other words, don’t bite on every “new study” you see.

Quantifying Data Consumed at Thomas Jefferson High School

Source: sunwarrior.com

One of the true highlights of writing this book, for me, has been the opportunity to speak to different audiences about statistical literacy. During a recent talk at Thomas Jefferson High School, the students asked me some great (and tough) questions. TJ is one of the premier high schools in the country, with a significant focus on math and science, so the students were, in some respects, an ideal audience for my message.

One of the numbers we often quote in our talk is that the average American consumes 34 GB of data each day. One of the students asked whether that figure is simply a calculation of the volume of data a given individual is exposed to, or whether it reflects how much data the person's brain actually cognitively consumes. Great question, and I promised the class I would follow up.

It turns out, when we wrote the book, my co-author Mike actually spoke to Derek Daniels, PhD (an associate professor at the University at Buffalo) about this very issue of quantifying data.  Here is their exchange:

That's a good question. It's certainly quantifiable. You would need to know the number of sensory receptors engaged (this would depend on how much you move the toothbrush around, and the density of receptive fields). Each receptor would be activated in a way that could be quantified based on the neural activity. Ultimately, this could be quantified by measuring the amount of transmitter released by the cells into the brain. Not easily quantifiable (practically impossible in humans) but could be estimated. I'm sure there are oral biologists and sensory physiologists who have the numbers you'd need to get a reasonably good estimate. I can dig around a bit and see what I can find. If you don't hear from me in a while, send a reminding nudge.

Then a follow-up:

What are your units? Is a photon hitting a photoreceptor in the retina one unit? If that single photon generates 100 spikes in the sensory neuron, is that 100, or still one because it's one photon?

When I showed him our source, and talked about vibrating toothbrushes:

By that measure, you might consider a vibration (one pool of bristles back and forth) a byte, so the measure would have less to do with the nervous system, and more to do with the stimulus. That would be consistent with the letter=byte idea, which doesn't take into account anything on the nervous system side of things -- otherwise the size of the letter, relative to the distance from the eye, would play a role in determining how many photoreceptors were activated. Fun problem.

If you have other questions, we'd love to hear from you!  

Nice work from the Washington Post

Source: SurgeonGeneral.gov

It's easy to find misleading articles in the news. Today, we're going to focus on an article that does a nice job of explaining a complex, data-driven issue.

The article, "One of America’s healthiest trends has had a pretty unexpected side effect," was published in the Washington Post. The basic premise is that, as smoking rates have declined, obesity has been on the rise.

So, what does author Roberto Ferdman do well? 

  • He uses (and links to) established sources, including the Surgeon General, the Cleveland Clinic and the National Institutes of Health.
  • He explains the methodology behind the research - including the fact that researchers had to approximate certain effects.
  • Perhaps most importantly, he draws a distinction between correlation and causation. He quotes an obesity expert (Yoni Freedhoff) who said, "Obviously, it's hard to establish any causal relationship here, but I would definitely say it's plausible that the fall in smoking contributed to the rise in obesity."
  • Finally, he offers a takeaway that cautions people against reading too much into the study: "What exactly are we supposed to glean from the suggestion that the fall in smoking might have contributed to the rise in obesity? The answer is not that anyone should look back upon the days when more than half of the population smoked regularly with nostalgia. Rather, according to Baum, it's better to view the study's finding as more of a point of interest, a takeaway that allows us to look at how societal changes move like waves that ripple, touching other shifts, even if only slightly."

Read the full Washington Post article here

EVERYDATA meets Sherbit

The average person consumes 34 gigabytes worth of data each day. Some of that data comes from the apps and devices that you use, which are constantly collecting data about your everyday activities.

Recently, we discovered Sherbit - a quantified self app for managing and understanding your personal data. Sherbit claims to let you easily understand and analyze your information, as you can see in the sample chart above. It's an interesting and fun way to look at some of the data in your life, and use it to learn more about yourself.

As a smart consumer of data, here are a few things to consider as you look at data from apps like Sherbit: 

1. Where is your data coming from? If you're analyzing how many steps you take in a day (which is what the chart below shows), ask yourself - are the devices that track your steps accurate? Are they really tracking steps, or are they using a motion tracker or other technology to approximate the number of steps you take each day?

Driving time vs. steps

2. Are you looking at correlation or causation? This chart shows something interesting—the amount of driving time decreased as the number of steps increased. But does that mean that stopping the car caused you to take more steps? Does it mean taking more steps caused you to stop driving? Or is it just a correlation, rather than a causal relationship?  

3. Understand that predictions may only be as good as your prediction model (see the sketch below). Let's say that as your weight goes down, so does your heart rate. You might assume that your heart rate will keep going down as you lose weight. But will that relationship hold true forever, or will your heart rate stop going down at some point?

4. Make sure you're asking the right question to solve the problem. Imagine you want to lose weight. You might ask: "How many steps do I have to take to lose 1 pound each week?" Or perhaps, "What exercise is the most effective in losing weight?" Maybe even, "What food should I cut out of my diet in order to lose weight?" All of these are valid questions - but each one may lead to a different answer.
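
To make the prediction-model point concrete, here is a minimal sketch in Python with made-up weight and heart-rate numbers (not Sherbit data). A straight line fits the observed range nicely, but extrapolating it far outside that range produces numbers that can't be right:

    # Hypothetical data: resting heart rate recorded while losing weight (illustration only)
    weights = [200, 195, 190, 185, 180]      # pounds
    heart_rates = [80, 78, 76, 74, 72]       # beats per minute

    # Fit a simple straight line: heart_rate = slope * weight + intercept
    n = len(weights)
    mean_w = sum(weights) / n
    mean_h = sum(heart_rates) / n
    slope = sum((w - mean_w) * (h - mean_h) for w, h in zip(weights, heart_rates)) / \
            sum((w - mean_w) ** 2 for w in weights)
    intercept = mean_h - slope * mean_w

    print(slope * 190 + intercept)   # 76 bpm: matches the observed data
    print(slope * 100 + intercept)   # 40 bpm at 100 pounds: the linear trend can't continue forever

The model is only "good" inside the range of data it was built on; whether the relationship holds beyond that range is exactly the question to ask.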

To learn more about Sherbit, visit their website.

How to save money when you buy a house – the Everydata way

Source: Wikipedia

You’ve decided it’s time to stop renting and own your home. But do you know the 3 questions that can save you money when you buy a house?

Which “comps” are being used? Oftentimes, real estate agents will run “comps,” or comparable home sales in the area, to show you how much a house should cost. But an agent who wants to justify a higher price for a house may cherry-pick certain comps to show you. Make sure you’re looking at all the comparable houses in your neighborhood.

How accurate is the Zillow data? Many people use Zillow to negotiate the price of a home. As we explain in our book, Zillow gives a “Zestimate” that estimates the price of a house. But only half of these Zestimates in a neighborhood are within what they call their median error rate. For example, if you’re trying to buy a home with a Zestimate of $500,000, even if it’s one of the houses within a median error rate of 5 percent, that means it could be $25,000 more or less than the estimate. That’s a $50,000 range – which gives you a lot of negotiating room. And remember, only half of the homes are even within the median error rate – the rest are outside of this $50,000 range.
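
Here's a minimal sketch of that arithmetic, using the $500,000 example from above:

    zestimate = 500_000
    median_error_rate = 0.05   # the 5 percent median error rate in this example

    low = zestimate * (1 - median_error_rate)    # 475,000
    high = zestimate * (1 + median_error_rate)   # 525,000

    print(f"A 'typical' home could sell anywhere from ${low:,.0f} to ${high:,.0f}")
    print(f"That's a ${high - low:,.0f} range, and half of all homes fall outside it")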

Is this a good time to buy a house? You tell your parents buying a home is a great investment since prices are up 180 percent. Your dad says you’re making a huge mistake because housing prices are actually down nearly 20 percent. Who’s right? You both could be – it just depends how many months (or years) of data you’re looking at. Anytime you’re looking at historical data – as you are with home prices – consider the date range carefully.
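
As a minimal sketch, here's how both claims can be true at once. The prices below are hypothetical, chosen only to illustrate the date-range effect:

    # Hypothetical price history for the same house (illustration only)
    price_long_ago = 100_000   # the starting point you picked
    price_at_peak = 350_000    # the peak your dad remembers
    price_today = 280_000

    since_long_ago = (price_today - price_long_ago) / price_long_ago * 100
    since_peak = (price_today - price_at_peak) / price_at_peak * 100

    print(f"Change since the earlier date: {since_long_ago:+.0f}%")   # +180%: "prices are up!"
    print(f"Change since the peak: {since_peak:+.0f}%")               # -20%: "prices are down!"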

Want more ways to improve your life using data? Sign up for our newsletter today!

"Required reading" says Arkansas Business

In a column that referenced John Oliver and Tyler Vigen (whom we quote in our book), Arkansas Business editor Gwen Moritz argued that EVERYDATA "should be required reading in high school and for every journalist and journalism student in the universe."

Here are a few of our other favorite quotes:

"Using sentences and examples that even I can understand, Johnson and co-author Mike Gluck explain the way averages can be used and misused."

"They write about the problem with self-reported data... [and] warn us to consider whether important data might be missing."

"We can either be smart, skeptical consumers of data or suckers. Take your pick."

You can read Gwen's full article here.

Buffalo - Everyone's Favorite Big City?

Source: www.city-data.com

In case you missed it, Buffalo, NY was just ranked #1 in Travel + Leisure's America's Favorite Places survey.

We love Buffalo. John went to high school there, which is also where he met Mrs. Everydata. The city has come a long way in recent years, thanks to Canalside, the Medical Campus and other developments. 

But is Buffalo really #1? It depends how you look at the data.

  • The rankings were based on a publicly available online survey, which meant that anyone with Internet access could cast a vote. Presumably, this also means that all votes are equal. A vote from the top travel writer in the country would count just as much as a vote from your Uncle Joe.
  • Cities were defined as "governed bodies with a population over 100,000" according to Travel+Leisure. So if your favorite city was smaller than that, it wasn't eligible. 
  • Voters could rank cities based on more than 65 categories, and then "each entry was ranked according to an average score." As you know, an average can hide variation in the data. For example, if everyone ranked Buffalo highly on 60 out of 65 categories, and poorly on the other 5, Buffalo would still likely have a very high average score, a score that effectively hides the low scores in those 5 categories (see the sketch after this list).
  • Another question to ask: were all categories treated the same in the average, or were some categories weighted more heavily than others?  
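
Here's a minimal sketch of how an average can hide those low scores, using made-up ratings on a 1-to-10 scale rather than the survey's actual data:

    # Made-up category ratings for one city (not the actual survey data)
    scores = [9] * 60 + [2] * 5   # great in 60 categories, poor in 5

    average = sum(scores) / len(scores)
    print(f"Average score: {average:.1f}")         # about 8.5, still looks excellent
    print(f"Lowest scores: {sorted(scores)[:5]}")  # the five 2s the average hides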

This all comes back to something we stress in our book: Are you asking the right question to get the answer you want? In this case, are you asking which city is ranked the highest by travel experts? Or are you asking which city is the most popular with people who answer online surveys?

Check Your Typosquatting Numbers...

Source: http://www.theslicedpan.com/nonsense/16-deeply-unfortunate-but-funny-typos-in-newspapers/299459

Source: http://www.theslicedpan.com/nonsense/16-deeply-unfortunate-but-funny-typos-in-newspapers/299459

Last week, EVERYDATA highlighted some costly typos thanks to Mental Floss. We also teased this article, which discusses research estimating that mistyped websites generate over $497 million per year in Google ad revenue. What caught our eye was the phrase "with some back of the envelope math" before the cost estimates.

We took a look at the original research article, which, as best we can tell, has yet to be published. The article actually focuses far more on the technical aspects of "typosquatting," running a series of statistical analyses on the factors that make it more likely someone mistypes a web domain.

The estimate of $497 million per year, though, is not in the working paper. It actually appears in a separate web appendix, found here. There, we can see the assumptions underlying the calculation:

First, the calculation assumes a constant 3.5 cents of revenue per Google search.  If that is not accurate, or varies widely by search, that would influence the estimate.

Second, there is no publicly available information on Google domain parking fees, so the authors assume for the calculation that they are comparable to Google search prices. Again, if that base assumption is wrong, the estimate is likely off.

Third, the calculation assumes that the results generated from the top 3,264 sites can be extrapolated to the top 100,000 sites. Ask yourself: does it make sense that typosquatting occurs at the same rate on the most popular websites as it does across the rest of the top 100,000?
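
For a sense of how a back-of-the-envelope estimate like this is typically built, here is a minimal sketch. It mirrors the structure of the assumptions above, but the traffic figure is purely hypothetical and is not taken from the working paper or its appendix:

    # Back-of-the-envelope structure (hypothetical inputs, NOT the paper's actual numbers)
    typo_visits_per_year = 10_000_000_000   # assumed volume of mistyped visits landing on ad pages
    revenue_per_visit = 0.035               # mirrors assumption 1: a constant 3.5 cents each
                                            # assumption 2: parking fees comparable to search prices

    estimated_revenue = typo_visits_per_year * revenue_per_visit
    print(f"Estimated revenue: ${estimated_revenue:,.0f} per year")

    # Assumption 3: results from the top 3,264 sites are extrapolated to the top 100,000.
    # Change any one of these inputs and the headline number moves with it.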

All of this interest was triggered by the phrase "back of the envelope math."

The Everydata of Tyops (Whoops!)

Source: techslides.com

In a fun article at Mental Floss, Jennifer M. Wood describes the cost of some very expensive typos (10, to be exact). You can see the list here, which includes everything from leaving out the "not" in one of the Ten Commandments (think adultery) to misprinted rate-increase maps for the NY subway.

In our book, we talk about thinking of data more broadly than just statistics and numbers: data is the whole plethora of information around us. One of our favorite examples is about a mistake involving the stock ticker NEST. You'll have to check out the book to learn more – it's in the chapter on misrepresentation of data.

TEASER ALERT: Something else caught our EVERYDATA eye in the Mental Floss story: a reference to an article about typos potentially generating Google ad revenue. An eye-popping $497 million figure and a back-of-the-envelope calculation led to an EVERYDATA investigation... More on that in part II later this week...

Where Does Polling Data Come From?

"Hillary Clinton holds a 12-point lead over Bernie Sanders nationally, but in a hypothetical match-up against Donald Trump, Sanders does much better than the current Democratic front-runner."

That was the lead paragraph from an NBC News article about the 2016 presidential election. And yes, that's what the polling data said. 

But - there's almost always a but - did you see where the data comes from? At the bottom of the article, they disclose that the poll was conducted online among adults who say they're registered to vote, and who take surveys on the SurveyMonkey platform.

Do you see the concerns? 

  • The poll was conducted online - which means that if you don't go online, you weren't part of the sample set
  • It relied on self-reported data - that is, adults who claimed that they're registered to vote
  • The only people surveyed were those who use SurveyMonkey's platform

To their credit, according to the full methodology, data was "weighted for age, race, sex, education, region, and voter registration status" using Census Bureau and other data "to reflect the demographic composition of the United States." But that doesn't erase the fact that, as they state, "the sample is based on those who initially self-selected for participation," which in this case means "no estimates of sampling error can be calculated."
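
To see what that kind of weighting does in principle, here is a minimal, one-variable sketch with made-up numbers; the actual NBC/SurveyMonkey methodology weights on several variables at once:

    # Made-up example: the online sample skews female relative to the population
    sample_share = {"women": 0.70, "men": 0.30}       # who actually took the survey
    population_share = {"women": 0.51, "men": 0.49}   # Census-style benchmark

    # Each respondent gets a weight so the sample matches the population mix
    weights = {group: population_share[group] / sample_share[group] for group in sample_share}
    print(weights)   # women weighted ~0.73, men ~1.63

Weighting can fix the demographic mix, but it can't fix who chose to take online surveys in the first place; that self-selection is exactly the limitation the pollsters acknowledge.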

Are people who take online surveys - often for a chance to win a prize - representative of the general population? Even if they're weighted after the fact, we'd want to take a closer look at the data. Maybe it's perfectly valid. But, as I explain in my book, these are exactly the types of questions you should be asking - as an educated consumer of data - when you look at polls in the news.

Can Cheating Be Good?

Reproduced from the article; attributed to Lisi Niesner / Reuters

Between the headline ("The Glory of the Cheat Day") and the lead photo (above), I really wanted this article in The Atlantic to be beyond reproach. The article asserts that a "recent study" found that people who allowed themselves to occasionally cheat on their goal (like a diet) had a greater likelihood of success. The sub-headline is very official: "A new study suggests that planned lapses in self-control can help you stick with your goals over time." They even cite the flashy term the study gives to cheating and italicize it for emphasis: "planned hedonic deviations." That is so official, it has to be true!

As always, digging a little deeper shows that the conclusion might be too good to be true. The article does state that the study relied on three experiments:

  • The first experiment was a thought exercise for the participants. As we all know after reading Everydata, self-reported data can be biased. The article doesn't tell us the sample size.
  • The second experiment put 36 people through the diets envisioned during the thought exercise in the first experiment. That's a small number of people, and they were only on the diets for two weeks.
  • The third experiment was a questionnaire asking people how they might react. Again, self-reported data.

Relying on the link in the article to a slightly more detailed summary (with an equally gluttonous image) on Research Digest, the blog of the British Psychological Society, we learn that the study used only 159 people across the three experiments: 59 in the first, 36 as stated in the second, and 64 in the third. All very small groups. We also learn that in the second study, the participants were given one cheat day per week, so the effects were measured over just two events.

While the results may be accurate, we might just have to struggle on with our diets until deeper research can be done. 

Talking to Myself

This recent article in Higher Perspectives highlights a study suggesting that people who talk to themselves can perform better at a certain manual task, in this case looking for a given product in a supermarket. The study is based on an experiment with 20 participants. Sometimes participants were asked to find a grocery item while remaining silent; other times, they were allowed to speak to themselves as they searched for the product.

The original research is based on a very small sample, and I don't see anything in the research that justifies the claim made in the graphic above that accompanied the Higher Perspectives article. It's also not clear from the HP article how the objects were chosen, or what controls were in place to prevent other factors from contaminating the results.

Barking Up The Wrong Tree

“The Data Says ‘Don't Hug the Dog!’ ” 

That was the headline on a Psychology Today article that's getting a ton of coverage in the press.

Should you believe it? Not exactly. Because the "data" was simply based on observations of 250 pictures of dogs—a fact that raises a lot of questions. For example: 

  • What breeds of dogs were studied?
  • Were they representative of all dogs?
  • Where did the photos come from?
  • Are people who post pictures of their dogs different than people who don't?
  • Is 250 dogs a large enough sample size?

Fortunately, the Washington Post and other media outlets have pointed out some of the flaws with the study. And, to be fair, much of the hype around this study appears to have come from the media outlets themselves; the author of the piece never claimed that it was a peer-reviewed study - only that he was collecting some data.

If there's one thing you'll learn from our book, it's to question where the data comes from. Should you hug your dog? Maybe, maybe not - but you should probably look for more scientifically grounded data if you want to make the right decision.

An Election Year Lesson in Anecdotes

Bernie Sanders at a rally in New York before Primary Day.  Source: Justin Lane/EPA.

28,000. Stories of enthusiastic crowds all over New York State were a familiar sight in the news leading up to the primary. Pundits proclaimed that the enthusiastic crowds were a precursor to a potentially shocking upset victory in New York, one that would defy the polls.

Hillary Clinton won by a little over 300,000 votes, taking 58% of the vote.  As described in this article from Vice News:

His defeat in New York was bleak.  Despite record-setting rally turnouts of 28,000 and a grassroots effort manned by some of his most vocal and vehement supporters both on and offline in the Empire State, Sanders' momentum after eight straight primary victories failed to translate to the polls in New York on Tuesday.

This political narrative reminded me of one of my favorite quotes in the book: "The plural of anecdote is not data." Large, boisterous crowds are anecdotes. But having a large crowd doesn't necessarily mean it will translate to broader support on election day. The same narrative was reflected in the 2012 presidential election. Here is a description of Mitt Romney's shock on Election Day from CBS News:

"He was shellshocked," one adviser said of Romney.

Romney and his campaign had gone into the evening confident they had a good path to victory, for emotional and intellectual reasons. The huge and enthusiastic crowds in swing state after swing state in recent weeks - not only for Romney but also for Paul Ryan - bolstered what they believed intellectually: that Obama would not get the kind of turnout he had in 2008.

Large crowds may be energizing for candidates, and may well be a sign of a deeply loyal following.  But, when you are dealing with millions of potential voters, tens of thousands of supporters is merely an anecdote.

Inside Self Storage World Expo

John will be keynoting the Inside Self Storage World Expo this week in Las Vegas, Nevada on Tuesday morning.

OPENING SESSION: Self-Storage and ‘Everydata’: Recognizing the Misinformation Hidden in the ‘Little Data’ You Use Every Day.

We are constantly bombarded with data. How do you know if it’s reliable? The constant barrage of information can make it difficult to analyze the pieces—the good, the bad and the ugly—that are critical to your business decision-making process.

In this insightful opening session, our presenter will show you how to recognize the misinformation that’s buried inside the “little data” we consume all day, every day. You’ll not only get an engaging and easily understandable overview of basic statistical-analysis techniques to assist you in making decisions around your self-storage investment or operation, you’ll hear examples that are specifically relevant to facility ownership and management. Join us for this powerful presentation and get the tools and confidence to be smarter and more discerning in your approach to data!