Skip to content

Dp-means: Optimizing to get the number of clusters

July 19, 2012

In my last post I compared dp-means and k-means error functions and run times.  John Myles White pointed to some opportunities that come from \lambda being a continuous variable.

Evolving the test code I posted on github, I developed a quick-and-dirty proof of concept.

First, below is the parameter vs. error graph in its latest incarnation.  There are two important changes from the analogous graph from last post:

  • Instead of using the k-means cost function to make the timing, error comparisons as I did before, I am now plotting the traditional k-means cost function for k-means and the cost function for dp-means,

\text{Cost(K-means)} + \lambda k

  • I am not plotting against \text{data range}/\lambda for comparison
  • I am plotting errors for a data set not used in training (called cross-validation in the code).

The cost function for dp-means shows a clear minimum. This graph is slightly confusing because the parameter for k-means, k, the number of clusters, increases left-to-right, while the number of clusters in dp-means goes down with increasing parameter \lambda.

I wrote a small script that leverages SciPy to optimize the dp-means cost function in order to determine the optimal value of \lambda, and therefore the number of clusters.

Here is an example on one of the data sets included as an example “input” directory.  This code runs slowly, but converges to a minimum at,

lambda: 5.488
with error: 14.2624

Here is a sample training at the optimal value with only the data as input (the code determines everything needed from the data.)

Figure shows training iterations for centers, training
data membership, cross-validation data membership.

The code is rough and inefficient, but the method seems robust enough to proceed to work on smoothing things out and run more tests. Neat.


Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: