Quantifying Data Consumed at Thomas Jefferson High School

Source: sunwarrior.com

One of the true highlights of writing a book has been the opportunity to speak to different audiences about statistical literacy. During a recent talk at Thomas Jefferson High School, the students asked me some great (and tough) questions. TJ is one of the premier high schools in the country, with a significant focus on math and science, so the students were, in some respects, an ideal audience for my message.

One of the numbers we often quote in our talk is that the average American consumes 34 GB of data each day. One of the students asked whether that figure simply adds up the volume of data a given individual is exposed to, or whether it measures how much data the person's brain actually consumes cognitively. Great question, and I promised the class I would follow up.

It turns out that when we wrote the book, my co-author Mike spoke to Derek Daniels, PhD (an associate professor at the University at Buffalo), about this very issue of quantifying data. Here is their exchange:

That's a good question. It's certainly quantifiable. You would need to know the number of sensory receptors engaged (this would depend on how much you move the toothbrush around, and the density of receptive fields). Each receptor would be activated in a way that could be quantified based on the neural activity. Ultimately, this could be quantified by measuring the amount of transmitter released by the cells into the brain. Not easily quantifiable (practically impossible in humans) but could be estimated. I'm sure there are oral biologists and sensory physiologists who have the numbers you'd need to get a reasonably good estimate. I can dig around a bit and see what I can find. If you don't hear from me in a while, send a reminding nudge.

Then a follow-up:

What are your units? Is a photon hitting a photoreceptor in the retina one unit? If that single photon generates 100 spikes in the sensory neuron, is that 100, or still one because it's one photon?

When I showed him our source, and talked about vibrating toothbrushes:

By that measure, you might consider a vibration (one pool of bristles back and forth) a byte, so the measure would have less to do with the nervous system, and more to do with the stimulus. That would be consistent with the letter=byte idea, which doesn't take into account anything on the nervous system side of things -- otherwise the size of the letter, relative to the distance from the eye, would play a role in determining how many photoreceptors were activated. Fun problem.

If you have other questions, we'd love to hear from you!  

"Required reading" says Arkansas Business


In a column that referenced John Oliver and Tyler Vigen (whom we quote in our book), Arkansas Business editor Gwen Moritz argued that EVERYDATA "should be required reading in high school and for every journalist and journalism student in the universe."

Here are a few of our other favorite quotes:

"Using sentences and examples that even I can understand, Johnson and co-author Mike Gluck explain the way averages can be used and misused."

"They write about the problem with self-reported data... [and] warn us to consider whether important data might be missing."

"We can either be smart, skeptical consumers of data or suckers. Take your pick."

You can read Gwen's full article here.

How Bad Is Your Data?

An interesting new project by Alexandra Meliou at UMass Amherst is focusing on all the ways collected data can go wrong, or what is called "Bad DATA." As explained in this article from the Daily Hampshire Gazette, her five-year project will focus on "how data is accumulated and shared to gain insight into how such information is weakened by bad curation or being taken out of context."


Tablet vs Paperback?

In our forthcoming book, Everydata, we very briefly address an interesting study from the University of Oregon that finds "People actually recall more information when they read a printed newspaper versus reading it online." Our purpose in raising the study was not to closely examine the underlying statistical methodology (though we might have something to say about the sample of 45 people) but to introduce the concept that how you receive your data can also affect how you interpret or retain it.


Gender, Teacher Evaluations, and Key Words 2: Understanding Omitted Variables?

An example of a performance evaluation form used by students to rate their courses and instructors.

In my prior post, Gender, Professor Evaluations, and Key Words Part 1, I described an interesting website that can be used to show how male and female professors differ in their professor evaluations by any key word across two dozen disciplines. That post ended with this cliffhanger question: Can we interpret all this information to tell us that male professors are more boring, but female professors are more frequently incompetent and beautiful?

When we attempt to understand differences along a key dimension (such as whether a teacher is rated as "boring," by gender), we need to consider whether the relationship we are observing captures the "whole story," or whether some other confounding factor could explain what we are observing. In this example, to determine whether students rate professors differently depending on whether they are male or female, we need to make sure we are comparing professors who are identical along all other dimensions. Among economists and statisticians, drawing a conclusion about a relationship between two factors while ignoring a key alternative factor is known as omitted variable bias.
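
For readers who like to see the algebra, here is the standard textbook way of writing omitted variable bias. The symbols are illustrative assumptions rather than anything estimated from the evaluation data: y stands for a professor's rating, x for the professor's gender, and z for an omitted factor such as course difficulty.

```latex
% Suppose ratings truly depend on both gender (x) and the omitted factor (z):
y = \beta_0 + \beta_1 x + \beta_2 z + u

% If we compare ratings by gender alone and leave z out,
% the estimated gender coefficient converges to:
\tilde{\beta}_1 \;\to\; \beta_1 + \beta_2 \,\delta_1,
\qquad \delta_1 = \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The extra term disappears only if the omitted factor has no effect on ratings (β₂ = 0) or is unrelated to gender (δ₁ = 0). Otherwise, the simple comparison blends the gender effect with the effect of the omitted factor.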

It won't surprise you that a range of factors could go into a student's subjective professor evaluations.  Just off the top of my head, I came up with a few examples:

  • How difficult was the class and subject material?
  • Did the student get an A in the class?
  • How much homework did the student receive?
  • Was the class at a time that college students might not like (such as an 8 AM Friday class)?
  • How accessible was the professor?
  • Was the class an introductory course, a course required for a major, or an elective class?

Now, the fact that a number of factors could ultimately determine student ratings is still not enough to draw a conclusion. The important question is whether any of these "other factors" vary systematically by gender. Take an extreme example: Assume that in the math department, female professors were always assigned to teach the much harder required calculus class, whereas male professors were assigned to teach the very popular statistics elective. If we observed significantly more negative ratings for female professors, it could simply be driven by the fact that female professors were disproportionately teaching the difficult, less popular class that students were more prone to rate negatively in end-of-course evaluations.

This simple example does not rule out the possibility that students really do rate professors differently depending on gender. It only shows that, from a statistical perspective, we can't draw that conclusion from these simple observed differences alone.
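
To make the calculus-versus-statistics example concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: the 90/10 course assignment split, the average ratings of 3.0 and 4.0, and the function names. Ratings are generated so that gender has no direct effect at all; only the course matters.

```python
import random
from statistics import mean


def simulate_ratings(n_per_group=2000, seed=0):
    """Simulate ratings in which gender has NO direct effect on the rating."""
    rng = random.Random(seed)
    rows = []
    for gender in ("female", "male"):
        # Assumed assignment rule: women mostly teach the hard required course.
        p_hard = 0.9 if gender == "female" else 0.1
        for _ in range(n_per_group):
            course = "calculus" if rng.random() < p_hard else "statistics"
            # Assumed average ratings: the hard course is rated lower.
            base = 3.0 if course == "calculus" else 4.0
            rating = base + rng.gauss(0, 0.5)
            rows.append((gender, course, rating))
    return rows


def mean_rating(rows, gender, course=None):
    """Average rating for one gender, optionally within a single course."""
    return mean(
        r for g, c, r in rows
        if g == gender and (course is None or c == course)
    )


rows = simulate_ratings()

# Naive comparison: ignores which course each professor taught.
naive_gap = mean_rating(rows, "male") - mean_rating(rows, "female")
print(f"Male minus female, all courses: {naive_gap:.2f}")

# Like-for-like comparison: hold the course fixed.
for course in ("calculus", "statistics"):
    gap = mean_rating(rows, "male", course) - mean_rating(rows, "female", course)
    print(f"Male minus female, {course} only: {gap:.2f}")
```

Under these assumptions, the naive gap comes out close to 0.8 points (an expected 3.9 for men versus 3.1 for women), while the gap within each course is close to zero. The apparent "gender effect" is produced entirely by the omitted variable, course assignment.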

And, as an aside, for an interesting view on how effective student evaluations are in assessing professor performance, I point you to this blog post from Berkeley statistics professor Philip Stark.