Is the distribution of the digits 0-9 uniform?

May 8, 2007

Another way of stating this question is "Are all digits equally likely?"

It turns out, no. For large sets of numbers resulting from measurements of nearly anything, the lower digits are more common, and the effect is strongest for the leading digit. In fact, leading digits tend to follow a logarithmic distribution known as Benford's law (see below).
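The leading-digit pattern described above is known as Benford's law, which predicts that a number's first significant digit d occurs with probability P(d) = log10(1 + 1/d). A minimal Python sketch that tabulates these expected shares (the function name is our own, for illustration):

```python
import math

def benford_probabilities():
    """Expected share of each leading digit 1-9 under Benford's law."""
    # P(d) = log10(1 + 1/d); 0 is excluded because it is never a leading digit.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in benford_probabilities().items():
        print(f"{d}: {p:.3f}")  # 1: 0.301, 2: 0.176, ..., 9: 0.046
```

The shares telescope to log10(10) = 1, so about 30% of numbers are expected to start with 1 and fewer than 5% with 9.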

But saying so doesn’t make it so. How about some examples?

To get some quick results, I wrote a Python script to count digits.  The core counting routine is shown below (download .py, PyX required for making plots).

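import re

# options holds the parsed command-line arguments (argument parsing not shown)
with open(options.filename, 'r') as inf:
    buf = inf.readlines()

nre = re.compile('[0-9]')  # matches a single digit character

hist = {d: 0 for d in range(10)}  # one counter per digit 0-9

for line in buf:
    nlist = nre.findall(line)  # every digit character on this line
    for n in nlist:
        hist[int(n)] += 1

…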
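The routine above counts every digit character, so 1024 contributes one each of 1, 0, 2 and 4. Benford's law, by contrast, concerns the first significant digit of each number, and its bias is much weaker for later digits, which dilutes the effect in an all-digit count. A sketch of a leading-digit variant, assuming numbers appear as plain decimals with optional comma thousands separators; NUMBER_RE and leading_digit_histogram are illustrative names, not part of the original script:

```python
import re

# Hypothetical pattern: a decimal number, optionally with comma thousands
# separators, e.g. 310, 1,024.50 or 0.072.
NUMBER_RE = re.compile(r'\d+(?:,\d{3})*(?:\.\d+)?')

def leading_digit_histogram(lines):
    """Count the first significant (non-zero) digit of each number in lines."""
    hist = {d: 0 for d in range(1, 10)}
    for line in lines:
        for token in NUMBER_RE.findall(line):
            # Drop separators, then leading zeros and the decimal point: 0.072 -> 72
            significant = token.replace(',', '').lstrip('0.')
            if significant:  # skip numbers that are all zeros, such as 0 or 0.00
                hist[int(significant[0])] += 1
    return hist

if __name__ == "__main__":
    sample = ["Revenue: 1,024.50", "Rate 0.072", "Clicks 310"]
    print(leading_digit_histogram(sample))
    # {1: 1, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 1, 8: 0, 9: 0}
```

Like the original routine, this also picks up dates, IDs and other numbers that are not measurements, so some cleanup of the input may be needed.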

Next, find some data. I started close to home with a monthly report of online performance and financial performance data for my employer. For one month of this data, the histogram looks like Figure 1 below.

[Figure: April performance report histogram]

Figure 1. Distribution of digits 0-9 in monthly performance data for AdPay, Inc.
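Raw counts grow with the size of each data set, so comparing Figure 1 with other sources is easier with proportions. A plain-text sketch that converts a hist dictionary like the one built by the counting routine into percentages and shows how far each digit sits from an even split. This is a stand-in for the PyX plots, not the original plotting code, and print_digit_histogram is a hypothetical name:

```python
def print_digit_histogram(hist, width=50):
    """Print each digit's share of the total as a text bar and return the shares.

    The deviation column compares each share with 1/len(hist), the share every
    digit would have if all were equally likely (10% for digits 0-9).
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    expected = 1 / len(hist)
    shares = {d: hist[d] / total for d in sorted(hist)}
    scale = width / max(shares.values())  # the longest bar is `width` characters
    for d, p in shares.items():
        bar = '#' * round(p * scale)
        print(f"{d}  {p:6.1%}  ({p - expected:+.1%})  {bar}")
    return shares

if __name__ == "__main__":
    # Toy counts for illustration only, not data from the report
    print_digit_histogram({0: 90, 1: 130, 2: 115, 3: 105, 4: 100,
                           5: 98, 6: 95, 7: 92, 8: 90, 9: 85})
```

Because the deviation is measured against 1/len(hist), the same function also works for a leading-digit histogram over 1-9, where an even split would be about 11.1% per digit.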

 

For a quick comparison, let’s find some data on the Web for rainfall and population statistics.

[Figure: Rainfall histogram]

Figure 2. Distribution of digits 0-9 in rainfall data. Is the digit '3' unusually common in rainfall data?

 

[Figure: Population distribution histogram]

Figure 3. Distribution of digits 0-9 in population data combined from several countries.
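Figure 2 asks whether '3' is unusually common. A standard way to check whether a digit histogram departs from an even split is Pearson's chi-square goodness-of-fit test. A standard-library sketch, assuming the counts can be treated as independent draws (only approximately true when many digits come from the same numbers); chi_square_uniform is an illustrative name:

```python
def chi_square_uniform(hist):
    """Pearson chi-square statistic against 'every digit equally likely'.

    Returns (statistic, degrees_of_freedom, residuals), where residuals[d] is
    (observed - expected) / sqrt(expected) for digit d.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    k = len(hist)
    expected = total / k
    residuals = {d: (obs - expected) / expected ** 0.5 for d, obs in hist.items()}
    statistic = sum(r * r for r in residuals.values())  # sum of (O - E)^2 / E
    return statistic, k - 1, residuals

# Chi-square critical value for 9 degrees of freedom at the 5% level
CRITICAL_5PCT_DF9 = 16.919

if __name__ == "__main__":
    # Toy counts for illustration only: digit 3 is deliberately over-represented
    counts = {d: 100 for d in range(10)}
    counts[3] = 160
    stat, df, res = chi_square_uniform(counts)
    print(f"chi2 = {stat:.2f} with {df} df; reject uniform at 5%: {stat > CRITICAL_5PCT_DF9}")
    print(f"residual for digit 3: {res[3]:+.2f}")
```

With ten digits there are 9 degrees of freedom, so a statistic above about 16.92 rejects uniformity at the 5% level, and a residual well beyond roughly ±2 flags an individual digit worth a second look. With very large data sets even small deviations become significant, so the size of the effect matters as much as the test result. If SciPy is installed, scipy.stats.chisquare(list(hist.values())) computes the same statistic along with a p-value.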

 

For more information:
