OPAW: Independent Component Analysis

Independent Component Analysis (ICA) fascinates me [DEMO]. It’s a neat method for isolating signals from a signal mix, as long as multiple recordings of the signal mix from different perspectives are available. One application is isolating speakers in a conference room with many people talking (over each other), while a set of microphones at different positions records the chatter at the same time. ICA has been around since the 80s, but was popularised by Pierre Comon, who based it on a formal mathematical framework. I personally find that both exciting (I get to read math I understand!) and sad, because ICA is actually accessible through intuition.

The paper starts like this:

Assume the following linear statistical model: y = M x + v
where x, y and v are random vectors with values in ℝ or ℂ and with zero mean and finite covariance, M is a rectangular matrix with at most as many columns as rows, and vector x has statistically independent components.

This is as dry as it gets, but there’s a simple translation. In this context, “random” doesn’t mean “toss the dice”, but more like “real” or “not computed”. y is the recorded signal mix, one entry per microphone. x is the original signal, one entry per speaker; that’s what we’re after, but we don’t know it yet. M is the mixing matrix, e.g. the acoustic model of the conference room, where the distance between speakers and microphones attenuates the signals and the room mixes them into chatter. v is noise. We know y and want to figure out x.
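To make the model concrete, here is a minimal sketch of y = M x + v in plain Python. Everything specific in it is an assumption for illustration: the helper name mix, the two uniform "speakers", the 2x2 matrix M and the noise level are made up, not taken from the paper.

```python
import random

def mix(M, x, noise_std=0.0, rng=None):
    """Return y = M x + v, where x is a list of equally long source signals."""
    rng = rng or random.Random(0)
    n = len(x[0])
    return [
        # One microphone per row of M: weighted sum of all sources plus noise.
        [sum(m * s[t] for m, s in zip(row, x)) + rng.gauss(0.0, noise_std)
         for t in range(n)]
        for row in M
    ]

rng = random.Random(42)
n = 1000
# Two hypothetical "speakers"; uniform samples stand in for real audio.
x = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(2)]
# Hypothetical room: each microphone (row) hears both speakers (columns).
M = [[1.0, 0.5],
     [0.3, 1.0]]
y = mix(M, x, noise_std=0.01, rng=rng)
print(len(y), len(y[0]))  # 2 1000
```

ICA only ever gets to see y. Both M and x stay hidden, which is why the independence of the components of x is the one assumption everything else hangs on.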

The action plan:

1. normalise y
2. compute pair-wise scores for all y_i, y_j
3. extract an angle θ from the scores of y_i, y_j
4. rotate the signal pairs by that angle
5. if the angle is below a threshold for all pairs then we are done and y ≈ x
6. goto 2

ICA is iterative: the solution improves with each iteration until it is good enough.
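The first step of the plan is easy to gloss over. One common reading of "normalise" (an assumption here, the plan itself does not spell it out) is whitening: remove the mean, decorrelate the signals and scale them to unit variance. Once that is done, whatever mixing is left can only be a rotation, which is why the later steps get away with searching for angles. A minimal sketch for two signals, using the closed-form diagonalisation of a 2x2 covariance matrix; the name whiten_pair and the toy data are hypothetical, and the signals must not be constant or perfectly correlated:

```python
import math
import random

def whiten_pair(a, b):
    """Return two zero-mean, uncorrelated, unit-variance signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    a = [v - mean_a for v in a]
    b = [v - mean_b for v in b]
    # Entries of the 2x2 sample covariance matrix.
    c00 = sum(v * v for v in a) / n
    c11 = sum(v * v for v in b) / n
    c01 = sum(u * v for u, v in zip(a, b)) / n
    # Rotating by phi makes the covariance matrix diagonal (decorrelation).
    phi = 0.5 * math.atan2(2.0 * c01, c00 - c11)
    c, s = math.cos(phi), math.sin(phi)
    p = [c * u + s * v for u, v in zip(a, b)]
    q = [-s * u + c * v for u, v in zip(a, b)]
    # Scale each decorrelated signal to unit variance.
    std_p = math.sqrt(sum(v * v for v in p) / n)
    std_q = math.sqrt(sum(v * v for v in q) / n)
    return [v / std_p for v in p], [v / std_q for v in q]

rng = random.Random(1)
n = 1000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Hypothetical recordings: two different blends of the same two sources.
y1 = [u + 0.5 * v for u, v in zip(s1, s2)]
y2 = [0.3 * u + v for u, v in zip(s1, s2)]
z1, z2 = whiten_pair(y1, y2)
```

For more than two signals the same idea needs a full eigendecomposition of the covariance matrix, which is left out here to keep the sketch dependency-free.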

Of noise and signals

The more independent signals you mix together, the more the resulting mix sounds like noise. The probability distribution of Gaussian noise is the bell curve we all know and dread.

On that curve, the x-axis is the loudness of the signal and the y-axis is the chance of that loudness being observed. ICA’s key observation is that the less a signal looks like noise, the more likely it is to be a useful signal. The mix of recorded signals should look more like noise than the component signals. Thus, ICA tries to decompose the signal mix into components that don’t look like noise. So the first task is: assign a “not-noise-like” score to a signal.

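ICA uses the 4th order cumulant [CUML]. In statistics, the 1st cumulant is the statistical mean, the 2nd is the variance, the 3rd measures skewness and the 4th measures kurtosis. For a zero-mean signal they simplify to κ₃ = E[y³] and κ₄ = E[y⁴] − 3(E[y²])². It so happens that κ₄ = 0 for a perfect Gaussian curve. So now we have a way of rating a signal: the further κ₄ is from zero, the less the signal looks like noise.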
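A quick way to check both claims, that κ₄ vanishes for Gaussian noise and that a mix looks more like noise than its components, is to estimate κ₄ from samples. This is a sketch under assumptions: the helper names are made up, the sources are uniform random numbers rather than real recordings, and every signal is standardised first so the scores are comparable.

```python
import random

def kappa4(y):
    """Sample estimate of E[y^4] - 3 (E[y^2])^2 after removing the mean."""
    n = len(y)
    mean = sum(y) / n
    c = [v - mean for v in y]
    m2 = sum(v * v for v in c) / n
    m4 = sum(v ** 4 for v in c) / n
    return m4 - 3.0 * m2 * m2

def standardise(y):
    """Shift to zero mean and scale to unit variance."""
    n = len(y)
    mean = sum(y) / n
    std = (sum((v - mean) ** 2 for v in y) / n) ** 0.5
    return [(v - mean) / std for v in y]

rng = random.Random(0)
n = 20000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Eight independent, decidedly non-Gaussian sources and their plain sum.
sources = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(8)]
mixture = [sum(column) for column in zip(*sources)]

k_gauss = kappa4(standardise(gaussian))
k_source = kappa4(standardise(sources[0]))
k_mix = kappa4(standardise(mixture))
print(k_gauss, k_source, k_mix)
```

In theory the three values are 0 for the Gaussian, −1.2 for a single uniform source and −1.2/8 = −0.15 for the sum of eight of them; the sample estimates should land close to those, with the mix much nearer to zero than any single source.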

The next step is to decompose the signal mixes into components and, you guessed it, we’ll try to push the cumulant of each component as far away from zero as possible. ICA does this by looking at pairs of y signals and computing the 5 pair-wise 4th order cumulants for them: the kurtosis (“spikiness”) of y_i (with itself), of y_j (again with itself), and three cross-cumulants that combine the two signals: y_i three times with y_j once, y_j three times with y_i once, and each of them twice (a 4th order cousin of shared variance). The scoring function combines those pair-wise cumulants with an angle θ.
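The five numbers can be estimated with the standard formula for the 4th order cumulant of zero-mean variables: the joint 4th moment minus the three ways of splitting the four signals into two pairs. The sketch below is an illustration only; the function names and the dictionary keys ("iiij" means y_i three times and y_j once) are my own notation, not the paper's.

```python
import math
import random

def centre(y):
    """Remove the sample mean."""
    m = sum(y) / len(y)
    return [v - m for v in y]

def cum4(a, b, c, d):
    """Sample 4th order cross-cumulant of four zero-mean signals."""
    n = len(a)

    def E(*signals):
        # Sample mean of the element-wise product of the given signals.
        return sum(math.prod(t) for t in zip(*signals)) / n

    return (E(a, b, c, d)
            - E(a, b) * E(c, d)
            - E(a, c) * E(b, d)
            - E(a, d) * E(b, c))

def pair_cumulants(yi, yj):
    """The five distinct 4th order cumulants of a pair of signals."""
    i, j = centre(yi), centre(yj)
    return {
        "iiii": cum4(i, i, i, i),  # kurtosis-like score of y_i
        "iiij": cum4(i, i, i, j),
        "iijj": cum4(i, i, j, j),
        "ijjj": cum4(i, j, j, j),
        "jjjj": cum4(j, j, j, j),  # kurtosis-like score of y_j
    }

rng = random.Random(3)
n = 5000
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
independent = pair_cumulants(a, b)
# A hypothetical mixed pair built from the same two sources.
mixed = pair_cumulants([u + 0.5 * v for u, v in zip(a, b)],
                       [0.5 * u + v for u, v in zip(a, b)])
print(independent)
print(mixed)
```

For the independent pair the three cross-cumulants should come out close to zero, while the mixed pair shows clearly non-zero ones. That difference is what the scoring function has to work with.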

Combines? θ? Ok, here’s the hardest part of the paper. A Givens rotation matrix [GIRM] mixes two 1D arrays (the recorded signals) under an angle θ into a new 2D array, and the new mixed signals can be read from that 2D array. Why on Earth would we want to mix two already mixed signals? The twist: θ is chosen such that it maximises the scoring function. When y_i and y_j are rotated by that θ, the resulting signals are “less” entangled. This process is applied iteratively over the rotated signals, and with each iteration the θ angles get smaller and smaller, which gives the algorithm an obvious exit condition: stop once all of them are below a threshold.
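The rotate-and-repeat loop can be sketched in a few lines. Two loud caveats: the score used here, the sum of the squared κ₄ of both rotated signals, is an assumed stand-in for the scoring function, and θ is found by brute force over a grid of angles instead of being extracted from the pair-wise cumulants the way the paper does it. The input is assumed to be zero-mean and already whitened, and all names are hypothetical.

```python
import math
import random

def kappa4(y):
    """E[y^4] - 3 (E[y^2])^2 for a signal that is assumed to be zero-mean."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 - 3.0 * m2 * m2

def rotate(a, b, theta):
    """Apply a 2x2 Givens rotation to the signal pair (a, b)."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * u + s * v for u, v in zip(a, b)],
            [-s * u + c * v for u, v in zip(a, b)])

def score(a, b):
    """Assumed contrast: far from zero means far from Gaussian."""
    return kappa4(a) ** 2 + kappa4(b) ** 2

def best_angle(a, b, steps=90):
    """Grid search over [-pi/4, pi/4]; the score repeats every pi/2."""
    step = (math.pi / 2.0) / steps
    grid = [(k - steps // 2) * step for k in range(steps + 1)]
    return max(grid, key=lambda t: score(*rotate(a, b, t)))

def unmix(signals, threshold=0.02, max_sweeps=20):
    """Rotate every pair until all angles of a sweep fall below threshold."""
    z = [list(s) for s in signals]
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(len(z)):
            for j in range(i + 1, len(z)):
                theta = best_angle(z[i], z[j])
                z[i], z[j] = rotate(z[i], z[j], theta)
                largest = max(largest, abs(theta))
        if largest < threshold:  # the exit condition
            break
    return z

rng = random.Random(7)
n = 2000
w = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
x = [[rng.uniform(-w, w) for _ in range(n)] for _ in range(2)]
alpha = 0.5  # hypothetical mixing angle in radians
y = list(rotate(x[0], x[1], alpha))
theta = best_angle(y[0], y[1])
z = unmix(y)
print(theta)
```

Because the sources were entangled by a rotation of 0.5 rad, the angle found for the pair should land near −0.5, limited by the one-degree grid and by sampling error. With only two signals the second sweep finds nothing left to rotate and the loop exits; with more signals, fixing one pair slightly disturbs the others, which is where the repeated sweeps earn their keep.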

Noise

The math largely ignores noise and factors it into M and x. Later there is a section in the paper which deals with ICA’s practical robustness in the presence of noise; unsurprisingly, more data leads to higher noise tolerance.