This introduction was copied from the technical report A stochastic uncoupling process for graphs. There is also a less technical introduction to the MCL algorithm.

An MCL process is characterized by an
infinite row of pairs (*e _{i}, r_{i}*), where the

*e*are integers greater than one, and the

_{i}*r*are real numbers greater than zero. An input matrix

_{i}*M*yields an infinite row of matrices

*M*by setting

_{(i)}*M*=

_{1}*M*, defining the even-labeled iterands by setting

*M*to

_{2i}*M*raised to the power

_{2i-1}*e*, and the odd-labeled iterands by

_{i}*M*= Γ

_{2i+1}_{r_i}(

*M*). The operator Γ

_{2i}_{r_i}transforms a column-stochastic matrix into another column-stochast matrix by raising each entry to the power

*r*and rescaling the result to be stochastic again. For stochastic matrices diagonally similar to a symmetric matrix, the type of limit invariably found is that of a doubly idempotent matrix; idempotent under both matrix squaring and matrix inflation.

_{i}
A nonnegative idempotent matrix *L* without zero columns
induces an overlapping clustering on the column indices with the property
that each cluster has at least one element not contained within any of the
other clusters. The number of clusters, say *k*, is equal
to the multiplicity of the eigenvalue 1 in the spectrum of *L*. The sets
of unique elements in the clusters form the strongly connected components in
the associated graph of *L*, the number of which is also *k*. Experiments
yield that initiating an MCL process with an input matrix, the associated graph of
which has only one strongly connected component, may in general give an
idempotent limit with a larger number of connected components.
Interpreting the connected components as clustering yields a clustering
whose distribution is invariably strongly related to the density
characteristics of the input matrix. In this sense, the MCL process appears to be
useful.

The MCL process converges quadratically in the neighbourhood of doubly idempotent stochastic matrices for which all columns have precisely one nonzero entry. It converges quadratically on a macroscopic scale (in terms of block structure) for doubly idempotent stochastic matrices in general. The clustering associated with such a matrix is stable under perturbations of the MCL process (that is, it is essentially defined by the block structure), except for the phenomenon of overlap. This is covered extensively in this technical report.

The limit resulting from an MCL process is in general extremely sparse. Current evidence suggests that limit matrices which have columns with more than one nonzero entry imply the existence of a nontrivial automorphism for the underlying graph of the input matrix. The sparseness of the limit, and the sparseness in a weighted sense of intermediate iterands have nice and important repercussions for the scalability of the cluster algorithm based on the MCL process.

The MCL process is interesting from a mathematical point of view,
since it apparently has the power to inflate the spectrum of a stochastic matrix, by
pressing large eigenvalues towards 1. This effect is strong enough to
overcome the effect of matrix exponentiation, which has the property of
exponentiating the associated eigenvalues.
The fundamental property established is
that Γ_{r} maps two nested classes of stochastic matrices with
real spectra onto themselves. The largest
class is that of diagonally symmetrizable stochastic matrices, i.e. matrices
which are diagonally similar to a symmetric matrix without further
constraints. This class is mapped onto itself by Γ_{r} for
arbitrary *r* in **R**. Defining *diagonally positive semi-definite*
(*dpsd*) as the property of being diagonally similar to a positive
semi-definite matrix, the second class is that of stochastic *dpsd*
matrices. This class is mapped onto itself by Γ_{r} for *r* in *N*.

Using the property that minors of a *dpsd* matrix *A* are nonnegative, it is
shown that the relation ~~> defined on the nodes of its associated
graph by *q*~~> *p* iff |*A _{pq}*| >= |

*A*|, for

_{qq}*p*and

*q*different, is a directed acyclic graph (

*DAG*) if indices of identical (modulo multiplication by a scalar on the complex unit circle) columns, resp. rows are lumped together. This generalizes the mapping from nonnegative idempotent column allowable matrices onto overlapping clusterings, and it sheds some light on the tendency of the MCL process limits to have a larger number of strongly connected components than the input graph. It is then shown that applying Γ

_{inf}to a stochastic

*dpsd*matrix

*M*yields a matrix that has spectrum of the form {0

^{n-k}, 1

^{k}}, where

*k*, the multiplicity of the eigenvalue 1, equals the number of endclasses of the ordering of the columns of

*M*provided by the associated

*DAG*. It is not necessarily true that Γ

_{inf}(

*M*) is idempotent. However, the observation is confirmed that Γ

_{r}tends to inflate the spectrum of

*M*for

*r*>1, as Γ

_{r}(

*M*) may be regarded as a function of varying

*r*for fixed

*M*, and as such is continuous.