At some point in 1997 I sat down to think about graph clustering. It occurred to me that random walks on graphs possessing cluster structure should exhibit characteristics reflecting that cluster structure. Much later I found that this thought had occurred to other people as well. They had chosen a different path than the one that occurred to me, though: they incorporated true randomization and sampling into their processes. From the very beginning I was interested in the pure classical distribution of random walk probabilities on a graph - which is in itself a fully deterministic concept.

I then wrote a simple perl script to compute random walks on graphs, i.e. a simple application that implemented matrix multiplication and worked with stochastic matrices. Indeed, inspection of the transition matrices revealed that intra-cluster edges were more likely to be visited than inter-cluster edges. Of course, it could not have been otherwise.
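A minimal sketch of that first experiment - in Python rather than Perl, and on a hypothetical toy graph of two triangles joined by a single bridge edge (the graph and all names here are my own illustration, not the original script):

```python
import numpy as np

# Illustrative toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by
# the single bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Column-stochastic transition matrix: column j holds the probabilities
# of stepping from node j to each of its neighbours.
T = A / A.sum(axis=0)

# Two-step walk probabilities via matrix multiplication.
T2 = T @ T

# An intra-cluster transition (node 0 to node 1) carries more probability
# than a transition crossing the bridge (node 2 to node 4).
print(T2[1, 0], T2[4, 2])
```

On this graph the intra-cluster entry works out to 1/6 and the bridge-crossing entry to 1/9, matching the observation that intra-cluster edges attract more probability.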

Obviously, this behaviour washes away after a few iterations. I was testing on a small undirected graph. For such a graph, the equilibrium state is simply proportional to the degrees of the nodes. This reflects some structural information, but not a whole lot, and certainly not enough to perform clustering. At that point I realized something else was needed.
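This degree effect is easy to see numerically. Iterating the walk on a small toy graph (two triangles joined by a bridge edge - an illustration of mine, not the original test graph) converges to a distribution proportional to the node degrees:

```python
import numpy as np

# Illustrative toy graph: two triangles joined by the bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
T = A / A.sum(axis=0)

# Run the walk for many steps from the uniform distribution.
p = np.full(n, 1.0 / n)
for _ in range(500):
    p = T @ p

# The equilibrium is proportional to the node degrees; any trace of the
# two clusters has washed away.
deg = A.sum(axis=0)
print(np.allclose(p, deg / deg.sum()))
```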

Without giving it much thought I decided to square all the entries of the 2-step transition matrix and rescale the columns to be stochastic again. That would certainly boost higher probabilities over lower probabilities, and do so in a natural, localized manner. It was clear that doing this only once would not suffice, as one simply ends up with the equilibrium state of a transformed graph. Again without giving it much thought I stuck the new operator into the loop computing the expanded walks, such that the modified matrix was simply squared. This seemed to make more sense than combining the input matrix or one of its powers with the modified matrix. Several people have since tried to define other processes by mixing different matrices, but as far as I know to little avail.
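The squaring-and-rescaling step can be sketched as follows; the function name and the sample column are my own illustration:

```python
import numpy as np

def inflate(M, r=2.0):
    # Raise every entry to the power r, then rescale each column so it
    # sums to one again; larger entries gain at the expense of smaller ones.
    P = np.power(M, r)
    return P / P.sum(axis=0)

# A single stochastic column: after squaring and rescaling, the gap
# between the two probabilities widens.
col = np.array([[0.6], [0.4]])
print(inflate(col).ravel())  # 0.36/0.52 and 0.16/0.52
```

The rescaling makes the boost local: each column competes only with itself, so the operation never mixes probabilities across different source nodes.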

I was testing on a very small undirected graph with a lot of symmetry in it. The first experiment did not work. A little inspection revealed that return probabilities were causing problems. I added loops to the input graphs and voila!
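Put together, the process alternates expansion (squaring the matrix) and the squaring-and-rescaling of entries, with self-loops added up front. The sketch below is my own minimal illustration of this idea, not the actual MCL implementation; the graph is again the hypothetical pair of bridged triangles:

```python
import numpy as np

def mcl(A, r=2.0, iters=50):
    # Add self-loops: return probabilities no longer cause trouble.
    A = A + np.eye(len(A))
    M = A / A.sum(axis=0)        # column-stochastic transition matrix
    for _ in range(iters):       # fixed round count, ample for convergence here
        M = M @ M                # expansion: two-step walk probabilities
        M = np.power(M, r)       # inflation: boost larger entries ...
        M = M / M.sum(axis=0)    # ... and make columns stochastic again
    return M

# Illustrative graph: two triangles joined by the bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

M = mcl(A)
# In the limit, columns belonging to one cluster become identical;
# group columns by their (rounded) values to read off the clustering.
clusters = {}
for j in range(n):
    clusters.setdefault(tuple(np.round(M[:, j], 6)), []).append(j)
print(sorted(clusters.values()))
```

On this input the iteration settles on the two triangles as clusters, which is the behaviour the added self-loops made possible.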

All subsequent experiments immediately suggested that this approach worked very well.

What followed was painstaking work to create some mathematical description of what was going on. This succeeded to a large extent; the main result shows that MCL iterands are nice in a way that can be precisely described. It implies that the iterands possess a DAG structure which is a direct generalization of the structure found in the MCL limits. Both structures induce overlapping clusterings in a natural way - but in the limit the clusterings are almost always strict partitionings. Although some mathematical results have been derived, I believe that there are many interesting and hard questions still waiting to be answered - you will find a few in my PhD thesis.

The time-span in which the MCL process and MCL algorithm came to me was no more than five minutes. It was something that happened to me, not something wrought by me. It feels like a nice discovery, a phenomenon of nature.