Another way of stating this question is "Are all digits equally likely?"

It turns out the answer is no.  In large sets of numbers that come from measurements of nearly anything, the lower digits are more common.  In fact, the digit frequencies tend to follow a power law (see below).

But saying so doesn’t make it so. How about some examples?

To get some quick results, I wrote a Python script to count digits.  The core counting routine is shown below (download .py, PyX required for making plots).

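import re

# Open the input file named on the command line
inf = open(options.filename, 'r')

# Match every individual decimal digit
nre = re.compile('[0-9]')

# One counter per digit, all starting at zero
hist = {0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0}

# Tally every digit found on every line of the file
for line in inf:
    nlist = nre.findall(line)
    for n in nlist:
        hist[int(n)] += 1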

…
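For readers who want to try the counting step without the rest of the script, here is a minimal self-contained sketch in Python 3. It is not the original script: the function names `count_digits` and `digit_fractions` and the sample lines are made up for illustration, and it takes any iterable of strings rather than a filename from the command line.

```python
import re
from collections import Counter

# ASCII digits only, the same pattern the script above uses
DIGIT_RE = re.compile(r"[0-9]")

def count_digits(lines):
    """Return a dict mapping each digit 0-9 to how often it appears in lines."""
    counts = Counter()
    for line in lines:
        counts.update(int(d) for d in DIGIT_RE.findall(line))
    # Include digits that never appeared so the histogram always has 10 bins
    return {d: counts.get(d, 0) for d in range(10)}

def digit_fractions(hist):
    """Convert raw counts to fractions of the total (all zeros if no digits)."""
    total = sum(hist.values())
    if total == 0:
        return {d: 0.0 for d in hist}
    return {d: n / total for d, n in hist.items()}

# Hypothetical sample input, purely for illustration
sample = ["Rainfall: 13.2 mm", "Population: 1,203,310"]
hist = count_digits(sample)
fractions = digit_fractions(hist)
```

Converting counts to fractions makes histograms from data sets of different sizes directly comparable.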

Next, find some data.  I started close to home with a monthly report of online and financial performance data from my employer.  The histogram of one month’s data looks like Figure 1 below.

Figure 1. Distribution of digits 0-9 in monthly performance data for AdPay, Inc.

For a quick comparison, let’s find some data on the Web for rainfall and population statistics.

Figure 2.  Distribution of digits 0-9 in rainfall data. Is the digit ‘3’ unusually common in rainfall data?

Figure 3.  Distribution of digits 0-9 in population data from a combination of several countries.