Why did Chris Wallace use a scale-free prior for hidden factor analysis?


We may never know, of course. But for other quantities where a fat-tailed prior might have been used (e.g. the number of classes) he tended to use an exponential prior.

To be precise (though omitting a normalization constant and a scaling factor),

h(b) = (1+|b|^2)^(-(K+1)/2)

where b is a hidden factor and K is the dimension of the data. Since b is a K-vector, the prior on |b| picks up a factor of |b|^(K-1) from the area of the shell at radius |b|. Ignoring the annoying 1 in 1+|b|^2, which is only important for small |b|, the prior on |b| is about

h(|b|) = |b|^(K-1) |b|^(-(K+1)) = |b|^(-2)

which may or may not be important. (The exponent K+1 in h(b) is just a little higher than the K that is necessary to keep the prior normalizable. Whatever the dimension, the prior on |b| keeps the same fat |b|^(-2) tail.)
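
A rough numerical check of the tail of the prior on |b| can be sketched in Python. This assumes the density on |b| is h(b) multiplied by the shell-area factor |b|^(K-1); the function names are made up for illustration and are not from any of Wallace's code.

```python
import math

def log_radial_density(r, K):
    """Log of the unnormalized density of r = |b| under
    h(b) = (1 + |b|^2)^(-(K+1)/2).

    Assumption: the density on |b| is h(b) times the shell-area
    factor r^(K-1).
    """
    return (K - 1) * math.log(r) - 0.5 * (K + 1) * math.log1p(r * r)

def tail_slope(K, r1=1e3, r2=1e4):
    """Log-log slope of the radial density between two large radii."""
    rise = log_radial_density(r2, K) - log_radial_density(r1, K)
    run = math.log(r2) - math.log(r1)
    return rise / run

# Print the estimated tail exponent for a few data dimensions.
for K in (2, 5, 20):
    print(K, round(tail_slope(K), 3))
```

The slope is measured far out in the tail, where the 1 in 1+|b|^2 no longer matters, so it estimates the power-law exponent of the prior on |b| directly.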

Anyway. Consider a multiple hidden factor problem. Chris Wallace was expressing an expectation that such problems would commonly conform to this prior. Even in a single problem, there might be a mix of short factors and long factors conforming roughly to this prior.

So a problem will tend to have this mix of big hidden factors and small hidden factors. Or it might have several clusters, each with its own set of factors.
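
To get a feel for the mix of short and long factors such a prior implies, one can draw factor lengths from it. The sketch below assumes the prior is exactly the K-dimensional Cauchy form h(b) above (no extra scaling factor) and uses the standard construction of such a draw as a vector of K standard normals divided by the absolute value of an independent standard normal. The function name, the seed and the choice K = 5 are arbitrary, for illustration only.

```python
import math
import random

def sample_factor_length(K, rng):
    """Draw |b| for a K-vector b with density proportional to
    (1 + |b|^2)^(-(K+1)/2).

    Assumption: b = z / |g|, with z a vector of K standard normals
    and g an independent standard normal (a multivariate Cauchy draw).
    """
    z_norm = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(K)))
    g = rng.gauss(0.0, 1.0)
    return z_norm / abs(g)

rng = random.Random(0)  # fixed seed so runs are repeatable
K = 5                   # illustrative data dimension
lengths = sorted(sample_factor_length(K, rng) for _ in range(10001))

median = lengths[len(lengths) // 2]
long_fraction = sum(l > 10 * median for l in lengths) / len(lengths)

print("median |b|:", round(median, 2))
print("fraction of factors longer than 10x the median:", long_fraction)
```

Most draws sit within a modest multiple of the median, but a noticeable fraction are far longer, which is the mix of small and big factors described above.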


Anyway, it would be interesting to know why Chris chose this prior. Presumably mathematical elegance played some part.