There have been a large number of papers on why we should refrain from using the inverse-gamma distribution as the prior for the variance term in linear regression. However, although half-Cauchy is proposed as the default prior, many people seem to rely on the Metropolis-Hastings (MH) algorithm since the full conditional is not a recognizable distribution that we know of. I wanted to show how we can get around this problem and get a much better sampler for the half-Cauchy variance.

The model is very simple.’

y_{i}=\mathbf{x}_{i}'\boldsymbol{\beta}+e_{i},\qquad e_{i}\sim\mathcal{N}(0,\sigma^{2})

We will assign a Gaussian prior for \boldsymbol{\beta}, \boldsymbol{\beta}\sim\mathcal{N}(\mu_{0},\mathbf{I}) and a standard half-Cauchy prior for \sigma^{2}, \sigma^{2}\sim\mathrm{C}^{+}(0,1). Half-Cauchy has a density function of the form f(x)=2/(\pi(1+x^{2})). The joint-posterior looks like

L(\boldsymbol{\beta},\sigma^{2}\,|\,\mathbf{y})\pi(\boldsymbol{\beta},\sigma^{2}) \propto (\sigma^{2})^{-n/2}\exp\left(-\dfrac{(\mathbf{y}-X\boldsymbol{\beta})'(\mathbf{y}-X\boldsymbol{\beta})}{2\sigma^{2}}\right)\exp\left(-\dfrac{\boldsymbol{\beta}'\boldsymbol{\beta}}{2}\right)\dfrac{1}{1+(\sigma^{2})^{2}}

Since \pi(\boldsymbol{\beta}\,|\,\text{rest}) is not the distribution of interest, I will just skip it. What I want to do here is sample \sigma^{2} efficiently without relying on MH. I suggest you use the ‘slice sampler’. I don’t want to go deep into what slice sampling is doing theoretically but there is a hand-waving way of determining whether a slice sampler would be available or not. That is, look at the full conditional and get rid of some term. If it is of some standard distribution, then you’re set to go.

Now here is how it works. Let’s pick out terms and form the full conditional of \sigma^{2}.

\pi(\sigma^{2}\,|\,\text{rest}) \propto (\sigma^{2})^{-n/2}\exp\left(-\dfrac{(\mathbf{y}-X\boldsymbol{\beta})'(\mathbf{y}-X\boldsymbol{\beta})}{2}\middle/\sigma^{2}\right)\dfrac{1}{1+(\sigma^{2})^{2})}

If (1+(\sigma^{2})^{2})^{-1} term disappears, it is the kernel of an inverse-gamma distribution. This is where the uniform random variable comes to rescue. Take the nuisance term and set it as the maximum value of the uniform variable. U\sim \mathcal{U}(0,(1+(\sigma^{2})^{2})^{-1}). Of course, this is when \sigma^{2} is given. Then, the joint full conditional would be

p(U,\sigma^{2}\,|\,\text{rest}) \propto  (\sigma^{2})^{-n/2}\exp\left(-\dfrac{(\mathbf{y}-X\boldsymbol{\beta})'(\mathbf{y}-X\boldsymbol{\beta})}{2}\middle/\sigma^{2}\right)\mathbb{I}\left(0<U<\dfrac{1}{1+(\sigma^{2})^{2}}\right)

If you integrate out the uniform variable, it will recover the original full conditional. Let’s think reversely this time. If we, say, have the uniform variable U given, what will the distribution of the variance \sigma^{2} be? We should first reverse the constraint so that it is arranged in terms of \sigma^{2}, which becomes

\sigma^{2} < \left(\dfrac{1}{U}-1\right)^{1/2}

Therefore, if we know the value of U, \sigma^{2} should be drawn from the truncated inverse gamma distribution!

\sigma^{2}\sim \mathcal{IG}\left(\dfrac{n}{2}-1,\dfrac{(\mathbf{y}-X\boldsymbol{\beta})'(\mathbf{y}-X\boldsymbol{\beta})}{2}\right)\mathbb{I}\left(\sigma^{2}<\left(\dfrac{1}{u}-1\right)^{1/2}\right) It is usually more comfortable to sample from the truncated gamma distribution than the inverse gamma distribution so it is better to say \dfrac{1}{\sigma^{2}} \sim \mathcal{G}a\left(\dfrac{n}{2}-1,\dfrac{(\mathbf{y}-X\boldsymbol{\beta})'(\mathbf{y}-X\boldsymbol{\beta})}{2}\right)\mathbb{I}\left(\dfrac{1}{\sigma^{2}}>\left(\dfrac{1}{u}-1\right)^{-1/2}\right)

A truncated gamma distribution has a closed-form inverse-cdf so sampling from it is not hard. I would like to attach my matlab code for the sampler but wordpress doesn’t seem to have such a functionality so if anybody needs one, please leave a comment with your email address and I will send you the file.