Tuesday, February 10, 2009

Crunching Numbers

We begin by examining the daily life of a statistician. One of the most common tasks a statistician must undertake is the calculation of a sample mean. Let's work through an example to illustrate this.

We begin with a set of numbers, which statisticians often call "the data". (A common question ordinary laymen have is how to pronounce the word "data". It's an apt question, as I have personally heard the word pronounced in no fewer than two ways! Fortunately, there's a simple rule to help you remember the correct pronunciation of the word: Data rhymes with "strata".)

Suppose these are the numbers: 4, 8, 15, 16, 23, 41, 17, 26.923, 1. (An immediate question is, where have these numbers come from? Lengthy tomes could be written on this subject, but it suffices to say that they are usually on a piece of paper handed to you by the boss.) In initially perusing the numbers you may believe you see a familiar pattern; ignore it! -- it can only lead to disappointment when the pattern goes askew. Professional statisticians are best equipped to handle this, having been trained from an early age to scan numbers dispassionately.

Let's return to the task of the calculation. The first step is to sum the numbers. This is most easily done with electronic calculators, which are commonly found, e.g., on the faces of today's fashionable wristwatches. (In this manner, we can achieve the sum without any carrying of ones, twos, or threes: Statisticians avoid manual labor whenever possible.) In this example, the sum of all of the numbers is easily seen to be 151.923. We're nearly halfway done.

The next step is to count the number of data values. Go ahead, count them. (To minimize the chance of error, you should actually place your finger on top of each number on the computer screen as you count. Don't worry about smudging the screen; modern computer monitors are easily cleaned with a scouring pad or something.) Done? I hope you agree that there are nine values in the data set. A neophyte would finish the calculation by dividing the previously attained sum of 151.923 by 9. While this technically works, it's kind of like riding a tricycle. Any thrill-seeking statistician would scorn that approach, instead being much more likely to MULTIPLY the sum by 1/9. Either way, the result is identical: approximately 16.8803. At long last, we have our sample mean! Some folks colloquially call this "the average", but such people are Philistines who, as the avian biologists like to say, "couldn't tell a Sphyrapicus thyroideus thyroideus from a Sphyrapicus thyroideus nataliae".
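For readers whose wristwatch lacks a calculator, the same steps can be sketched in a few lines of Python. This is only an illustration, not part of any official procedure: the variable names (`data`, `total`, `n`, and so on) are my own inventions, and the cross-check with `statistics.mean` is an extra that the walkthrough above does not require.

```python
from statistics import mean

# The data, exactly as handed over by the boss.
data = [4, 8, 15, 16, 23, 41, 17, 26.923, 1]

# Step 1: sum the numbers (no carrying of ones, twos, or threes).
total = sum(data)

# Step 2: count the data values (no finger on the screen required).
n = len(data)

# Step 3a: the tricycle approach, dividing the sum by the count.
by_division = total / n

# Step 3b: the thrill-seeker's approach, multiplying the sum by 1/n.
by_multiplication = total * (1 / n)

# Cross-check against the standard library (an added sanity check).
library_mean = mean(data)

print(f"sum   = {total:.3f}")
print(f"count = {n}")
print(f"mean  = {by_division:.4f} (by division)")
print(f"mean  = {by_multiplication:.4f} (by multiplication)")
print(f"mean  = {library_mean:.4f} (statistics.mean)")
```

One caveat for the thrill-seekers: in floating-point arithmetic, dividing by 9 and multiplying by 1/9 can disagree in the last digit or so, since 1/9 cannot be stored exactly. To four decimal places the two approaches agree, which is why the sketch rounds before printing.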

With the sample mean safely calculated, the statistician can head home, content in the afterglow of a hard day's work done. In our next posting in this series, we'll examine another common statistician's task: the development of asymptotically unbiased (order root-n) confidence bands for the hazard function of a doubly-censored random vector. Until then, good night!
