My first post was about how to slice-sample the variance component of a linear regression when it is assigned a half-Cauchy prior.


Here is what I found interesting while I was reading this paper [1]. In the earlier post, the variance component of the linear model could be sampled from its full conditional because the distribution used for slice sampling was not degenerate. But if we want to assign a horseshoe prior to each regression coefficient, which essentially shrinks the estimates toward the origin, we can’t use slice sampling.

(A short review of what the horseshoe distribution is. The horseshoe distribution has no closed-form PDF that we can write down, but it can be written as a scale mixture of normals,

\begin{array}{rcl}X\,|\,\sigma &\sim& \mathcal{N}\left(0,\sigma^{2}\right)\\ \sigma&\sim& \mathrm{C}^{+}\left(0,1\right) \end{array}

where \mathrm{C}^{+}\left(0,1\right) denotes the standard half-Cauchy distribution, i.e. a Cauchy distribution truncated to the positive half-line. The marginal distribution of X is then the horseshoe, whose density unfortunately has no closed form in elementary functions.)
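Even without a closed-form density, the scale-mixture form makes the horseshoe easy to simulate: draw the scale first, then the normal. The snippet below is a minimal sketch of that idea using only the Python standard library. The function names are mine, and the two summary fractions are just one way to see the spike at zero and the heavy tails.

```python
import math
import random


def sample_half_cauchy(rng, scale=1.0):
    """Draw from C+(0, scale) by inverting the CDF F(x) = (2/pi) * atan(x / scale)."""
    u = rng.random()
    return scale * math.tan(0.5 * math.pi * u)


def sample_horseshoe(rng):
    """Draw X from the scale mixture: sigma ~ C+(0, 1), then X | sigma ~ N(0, sigma^2)."""
    sigma = sample_half_cauchy(rng)
    return rng.gauss(0.0, sigma)


rng = random.Random(0)
n = 100_000
draws = [sample_horseshoe(rng) for _ in range(n)]

# Share of draws very close to zero and share far out in the tails.
frac_near_zero = sum(abs(x) < 0.1 for x in draws) / n
frac_in_tail = sum(abs(x) > 3.0 for x in draws) / n
print(frac_near_zero, frac_in_tail)
```

For comparison, a standard normal puts roughly 0.08 of its mass in |x| < 0.1 and roughly 0.003 in |x| > 3, so both fractions should come out noticeably larger for the horseshoe.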

So what the authors did was find a hierarchical representation of the half-Cauchy distribution in terms of inverse-gamma distributions:

\begin{array}{rcl}X^{2}\,|\,a&\sim& \mathcal{IG}\left(\dfrac{1}{2},\dfrac{1}{a}\right)\\ a &\sim& \mathcal{IG}\left(\dfrac{1}{2},\dfrac{1}{A^{2}}\right) \end{array}

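Here \mathcal{IG}\left(\alpha,\beta\right) is the inverse-gamma distribution with shape \alpha and rate \beta, whose density is proportional to x^{-\alpha-1}\exp\left(-\beta/x\right). Then, marginally, X\sim \mathrm{C}^{+}\left(0,A\right). Let’s prove this.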
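Before working through the proof, the claim can be sanity-checked by simulation. The sketch below draws from the two-level inverse-gamma hierarchy and compares the empirical CDF of X with the half-Cauchy CDF (2/\pi)\arctan(x/A) at a few points. The helper names, the choice A = 2 and the checkpoints are arbitrary choices of mine, not anything from the paper.

```python
import math
import random


def sample_inv_gamma(rng, shape, rate):
    """Draw from IG(shape, rate), density proportional to x^(-shape-1) * exp(-rate / x)."""
    # If G ~ Gamma(shape, scale=1), then rate / G ~ IG(shape, rate).
    return rate / rng.gammavariate(shape, 1.0)


def sample_x_hierarchical(rng, A):
    """a ~ IG(1/2, 1/A^2), then X^2 | a ~ IG(1/2, 1/a); return X = sqrt(X^2)."""
    a = sample_inv_gamma(rng, 0.5, 1.0 / A**2)
    x_squared = sample_inv_gamma(rng, 0.5, 1.0 / a)
    return math.sqrt(x_squared)


def half_cauchy_cdf(x, A):
    """CDF of C+(0, A)."""
    return (2.0 / math.pi) * math.atan(x / A)


rng = random.Random(0)
A = 2.0
n = 200_000
samples = [sample_x_hierarchical(rng, A) for _ in range(n)]

# Largest gap between empirical and theoretical CDF over a few checkpoints.
max_gap = 0.0
for q in (0.5 * A, A, 2.0 * A, 5.0 * A):
    empirical = sum(s <= q for s in samples) / n
    max_gap = max(max_gap, abs(empirical - half_cauchy_cdf(q, A)))
print(max_gap)
```

If the representation is right, the gap should be on the order of Monte Carlo noise.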

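The proof is just integration: multiply the conditional density of X^{2} given a by the density of a, then integrate a out.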

\begin{array}{rcl} p_{X^{2}}\left(x^{2}\right) &=& \displaystyle\int_{0}^{\infty} \dfrac{1}{\Gamma(1/2) \sqrt{a}} \left(x^{2}\right)^{-3/2}\exp\left(-\dfrac{1}{ax^{2}}\right) \cdot \dfrac{1}{\Gamma(1/2)A}a^{-3/2}\exp\left(-\dfrac{1}{aA^{2}}\right)\,da\\&=& \dfrac{1}{x^{3}A\pi}\displaystyle\int_{0}^{\infty} a^{-2}\exp\left(-\dfrac{1}{a}\left(\dfrac{1}{x^{2}}+\dfrac{1}{A^{2}}\right)\right)\,da\\ &=& \dfrac{1}{\pi Ax^{3}}\left(\dfrac{1}{x^{2}}+\dfrac{1}{A^{2}}\right)^{-1}\\ &=& \dfrac{1}{\pi A}\left(x+\dfrac{x^{3}}{A^{2}}\right)^{-1} \end{array}
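The only step that needs any work is the integral over a in the second line. Writing c for the constant in the exponent (shorthand of mine, not notation from the paper) and substituting t = 1/a, so that dt = -a^{-2}\,da, it collapses to an exponential integral:

\begin{array}{rcl} \displaystyle\int_{0}^{\infty} a^{-2}\exp\left(-\dfrac{c}{a}\right)\,da &=& \displaystyle\int_{0}^{\infty} \exp\left(-ct\right)\,dt\\ &=& \dfrac{1}{c},\qquad c=\dfrac{1}{x^{2}}+\dfrac{1}{A^{2}} \end{array}

Incidentally, the integrand is the kernel of an \mathcal{IG}\left(1,c\right) density in a, which is why the conditional distribution of a given X^{2} is again inverse-gamma.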

But this is the density of X^{2}, not of X, so we need a change of variables. Let U=\sqrt{X^{2}}=X (recall that X>0). Then U^{2}=X^{2}\implies d\left(x^{2}\right)=2u\,du, so p_{X}(u)=p_{X^{2}}\left(u^{2}\right)\cdot 2u.

\begin{array}{rcl} p_{X}(u) &=& \dfrac{1}{\pi A}\left(u+\dfrac{u^{3}}{A^{2}}\right)^{-1}\cdot 2u\\ &=& \dfrac{2}{\pi A}\left(1+\dfrac{u^{2}}{A^{2}}\right)^{-1} \end{array}

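That is exactly the density function of a half-Cauchy \mathrm{C}^{+}\left(0,A\right) distribution. Such a hierarchical representation simplifies the derivation of a Gibbs sampler, because both levels are inverse-gamma and the full conditionals stay in that family, and thus facilitates inference.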
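To make the last point concrete, here is a minimal Gibbs sketch for a toy model that I am assuming purely for illustration (it is not taken from the paper): y_i ~ N(0, sigma^2) with sigma ~ C+(0, A), written as sigma^2 | a ~ IG(1/2, 1/a) and a ~ IG(1/2, 1/A^2). Multiplying out the densities gives, by my own derivation for this toy model, sigma^2 | a, y ~ IG((n+1)/2, 1/a + sum(y_i^2)/2) and a | sigma^2 ~ IG(1, 1/sigma^2 + 1/A^2), so both updates are plain inverse-gamma draws.

```python
import math
import random


def sample_inv_gamma(rng, shape, rate):
    """Draw from IG(shape, rate), density proportional to x^(-shape-1) * exp(-rate / x)."""
    return rate / rng.gammavariate(shape, 1.0)


def gibbs_sigma(y, A=1.0, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for sigma in the assumed toy model y_i ~ N(0, sigma^2), sigma ~ C+(0, A)."""
    rng = random.Random(seed)
    n = len(y)
    half_sum_sq = 0.5 * sum(v * v for v in y)
    a = 1.0  # arbitrary starting value for the auxiliary variable
    draws = []
    for it in range(n_iter):
        # sigma^2 | a, y ~ IG((n + 1) / 2, 1 / a + sum(y^2) / 2)
        sigma2 = sample_inv_gamma(rng, 0.5 * (n + 1), 1.0 / a + half_sum_sq)
        # a | sigma^2 ~ IG(1, 1 / sigma^2 + 1 / A^2)
        a = sample_inv_gamma(rng, 1.0, 1.0 / sigma2 + 1.0 / A**2)
        if it >= burn_in:
            draws.append(math.sqrt(sigma2))
    return draws


# Simulated data with a known scale, just to see that the sampler recovers it.
data_rng = random.Random(1)
true_sigma = 2.0
y = [data_rng.gauss(0.0, true_sigma) for _ in range(500)]

draws = gibbs_sigma(y)
posterior_mean_sigma = sum(draws) / len(draws)
print(posterior_mean_sigma)
```

With 500 observations the posterior mean of sigma should land close to the value used to simulate the data. No slice sampling or tuning is needed, which is the practical appeal of the representation.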

 


[1] Wand, M. P., Ormerod, J. T., Padoan, S. A., & Frühwirth, R. (2011). Mean field variational Bayes for elaborate distributions. Bayesian Analysis, 6(4), 847-900.
