The NCAA Tournament, better known as March Madness, is a tradition that captures the imagination of even non-basketball fans. From a data perspective, a flood of interesting concepts comes to the surface at this time every year. For example:
ORDINAL RANKINGS: Teams are selected and seeded from 1 to 16 in each of four regions, plus 4 play-in teams. Selection Sunday sets off endless rounds of debate, haggling, and discussion about the subjectivity of these rankings. For example, see this article from CBS Sportsline.
PROBABILITY: As office mates around the country dutifully fill out their brackets, many may harken back to the old "ball in urn" problems from their high school days. Here are some articles highlighting the probability of:
Getting a perfect bracket: Nate Silver's 538.com blog (the answer is 1 in 1,610,543,269).
How to win without picking Kentucky: Neil Greenberg at Washington Post.com.
Betting on a 16-seed to win: Heavy.com feature on the profitability of such bets.
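To put the perfect-bracket figure in perspective, here is a rough back-of-the-envelope sketch in Python. It is not the method used in the 538 article; it only compares the quoted odds with a naive coin-flip baseline over the 63 games of the 64-team bracket (play-in games excluded), and then backs out the per-game accuracy that would produce the quoted odds if every game were picked with the same, independent chance of being right. That equal-accuracy assumption is mine and is purely illustrative, as are the variable names.

```python
GAMES = 63  # games in the 64-team main bracket (play-in games excluded)

# Naive baseline: treat every game as a fair coin flip.
coin_flip_brackets = 2 ** GAMES

# Odds quoted in the 538 article cited above.
quoted_odds = 1_610_543_269

# Illustrative assumption: the same accuracy p applies to every game,
# independently, so that p ** GAMES == 1 / quoted_odds.
implied_accuracy = quoted_odds ** (-1 / GAMES)

print(f"Coin-flip odds: 1 in {coin_flip_brackets:,}")
print(f"Quoted odds:    1 in {quoted_odds:,}")
print(f"Implied per-game accuracy: {implied_accuracy:.1%}")
```

The gap between the two numbers shows how much knowing something about the teams improves on blind guessing, even though a perfect bracket remains wildly unlikely either way.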
MEASUREMENT: And, finally, what would the tournament be without a plethora of studies trying to quantify the productivity lost at this time of year? Here is one of several articles putting the estimated losses at over $1 billion. But, as a consumer of data, make sure to ask yourself the important questions: what are the underlying assumptions, how is the data being extrapolated, and what are the limits of such studies?