The logistic distribution is one of the most widely used distributions in statistics. Even if we are not conscious of it, we use the logistic distribution whenever we do logistic regression. The frequentist way of fitting a logistic model is quite straightforward. The Bayesian logistic model, however, is not as easy due to its non-conjugacy. In such cases it is very useful to use a hierarchical representation of the distribution. This post reviews two ways of modelling logistic regression in a Bayesian fashion.
The first is to use the logistic distribution directly. Choi and Hobert (2013) propose a Gibbs sampler using the Pólya-Gamma distribution, which is not easy to sample from. Those who are interested in random variate generation algorithms are referred to Devroye (1986); in fact, Devroye has released the book free of charge online (here). Anyway, let's review logistic regression briefly. The probability density function (PDF) of a logistic distribution is as follows:

$$f(x \mid \mu, s) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}$$
for location parameter $\mu$ and scale parameter $s$. Logistic regression models a linear combination of regressors mapped to the probability parameter of a Bernoulli distribution:

$$y_i \sim \mathrm{Bernoulli}(p_i), \qquad p_i = F(\mathbf{x}_i^\top \boldsymbol{\beta}).$$
Such a function $F$ can be any cumulative distribution function (CDF). If it is chosen as the CDF of a normal distribution, the model is called 'probit regression', and if it is the CDF of a logistic distribution, it is called 'logistic regression', as we all know. This is why the likelihood function of $\boldsymbol{\beta}$ looks like

$$L(\boldsymbol{\beta} \mid \mathbf{y}) = \prod_{i=1}^n \frac{\left(e^{\mathbf{x}_i^\top\boldsymbol{\beta}}\right)^{y_i}}{1 + e^{\mathbf{x}_i^\top\boldsymbol{\beta}}}.$$
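As a quick sanity check of the likelihood above, here is a minimal NumPy sketch (the author's code is in Matlab; this Python version is purely illustrative, and the function names are my own):

```python
import numpy as np

def logistic_cdf(t):
    """CDF of the standard logistic distribution, F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood with success probability F(x_i^T beta)."""
    p = logistic_cdf(X @ beta)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```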
For simplicity, let's assign a standard multivariate normal prior to $\boldsymbol{\beta}$. The posterior would be

$$\pi(\boldsymbol{\beta} \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\boldsymbol{\beta}^\top\boldsymbol{\beta}\right) \prod_{i=1}^n \frac{\left(e^{\mathbf{x}_i^\top\boldsymbol{\beta}}\right)^{y_i}}{1 + e^{\mathbf{x}_i^\top\boldsymbol{\beta}}}.$$
The trick here is to introduce another random variable $\omega$ that is conditionally irrelevant given $\boldsymbol{\beta}$ and can be integrated out so as to restore the original posterior. We do it because it makes sampling $\boldsymbol{\beta}$ easier. We all know that the denominator $1 + e^{\mathbf{x}_i^\top\boldsymbol{\beta}}$ is the hindrance here, so let's make it go away.
Let’s not care about that ugly function and if we use the identity
the joint posterior is expressed as
Surprisingly, $e^{-\omega c^2/2}\, p(\omega)$, once normalised, is the density function of a Pólya-Gamma distribution with parameters $(1, c)$, i.e. $\omega_i \mid \boldsymbol{\beta} \sim \mathrm{PG}(1, \mathbf{x}_i^\top\boldsymbol{\beta})$. Since we have introduced a random variable that is irrelevant to the original model, we call this the 'data augmentation' technique. This implies that if we can generate Pólya-Gamma random variates, it gets much easier to sample from $\pi(\boldsymbol{\beta} \mid \boldsymbol{\omega}, \mathbf{y})$ rather than $\pi(\boldsymbol{\beta} \mid \mathbf{y})$, since the Pólya-Gamma density cancels out the cumbersome denominator of the logistic CDF. For completeness,

$$\pi(\boldsymbol{\beta} \mid \boldsymbol{\omega}, \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\boldsymbol{\beta}^\top\left(\mathbf{X}^\top\boldsymbol{\Omega}\mathbf{X} + \mathbf{I}\right)\boldsymbol{\beta} + \boldsymbol{\kappa}^\top\mathbf{X}\boldsymbol{\beta}\right)$$
where $\boldsymbol{\kappa} = (y_1 - 1/2, \ldots, y_n - 1/2)^\top$, $\boldsymbol{\Omega} = \mathrm{diag}(\omega_1, \ldots, \omega_n)$, and $\mathbf{X}$ is the design matrix. So the resulting expression is a multivariate normal distribution whose mean vector and covariance matrix are

$$\mathbf{V}_\omega = \left(\mathbf{X}^\top\boldsymbol{\Omega}\mathbf{X} + \mathbf{I}\right)^{-1}, \qquad \mathbf{m}_\omega = \mathbf{V}_\omega \mathbf{X}^\top\boldsymbol{\kappa}.$$
Therefore, by introducing Pólya-Gamma random variables, the Gibbs sampler proceeds by repeatedly sampling from the following two distributions:

$$\omega_i \mid \boldsymbol{\beta} \sim \mathrm{PG}(1, \mathbf{x}_i^\top\boldsymbol{\beta}), \qquad \boldsymbol{\beta} \mid \boldsymbol{\omega}, \mathbf{y} \sim \mathcal{N}(\mathbf{m}_\omega, \mathbf{V}_\omega).$$
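The two-block sampler can be sketched in Python as follows. Exact Pólya-Gamma generation is nontrivial (see Devroye), so this sketch approximates a $\mathrm{PG}(1, c)$ draw by truncating its infinite-sum (weighted gamma) representation; the truncation level and prior $\mathcal{N}(\mathbf{0}, \mathbf{I})$ match the text, everything else is my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg1(c, n_terms=200):
    # Approximate PG(1, c) draw by truncating the weighted-sum-of-gammas
    # representation PG(1, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2
    # + c^2 / (4 pi^2)), g_k ~ Gamma(1, 1).  A rough sketch only; exact
    # (Devroye-style) samplers should be preferred in practice.
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(1.0, 1.0, size=n_terms)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=500):
    # Polya-Gamma Gibbs sampler for Bayesian logistic regression, N(0, I) prior.
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # omega_i | beta ~ PG(1, x_i' beta)
        omega = np.array([sample_pg1(c) for c in X @ beta])
        # beta | omega, y ~ N(m, V), V = (X' Omega X + I)^{-1}, m = V X' kappa
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + np.eye(p))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws
```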
The second method uses the hierarchical representation of the logistic distribution as a scale mixture of normals, as proposed in Stefanski (1991). That is, the standard logistic PDF can be recast as

$$f_L(\varepsilon) = \int_0^\infty \frac{1}{2\sigma}\, \phi\!\left(\frac{\varepsilon}{2\sigma}\right) f_{KS}(\sigma)\, d\sigma$$
where $\sigma$ follows the Kolmogorov-Smirnov (KS) distribution, whose density is

$$f_{KS}(x) = 8 \sum_{k=1}^{\infty} (-1)^{k+1} k^2\, x\, e^{-2k^2 x^2}, \qquad x > 0.$$
This time we don’t model the regression model directly but we will rather resort to the latent variable representation. That is, we will assume there is a latent variable that decides which value would take on between .
We merely replace the logistic error with its hierarchical representation. Let $\phi$ be the PDF of the standard normal distribution. Then the joint posterior becomes

$$\pi(\boldsymbol{\beta}, \mathbf{z}, \boldsymbol{\sigma} \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\boldsymbol{\beta}^\top\boldsymbol{\beta}\right) \prod_{i=1}^n \left\{\mathbb{1}(z_i > 0)\mathbb{1}(y_i = 1) + \mathbb{1}(z_i \le 0)\mathbb{1}(y_i = 0)\right\} \frac{1}{2\sigma_i}\, \phi\!\left(\frac{z_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{2\sigma_i}\right) f_{KS}(\sigma_i).$$
The key point is to change our perspective on the order in which $z_i$ and $y_i$ are determined. The model assumes that $y_i$ is determined once $z_i$ is (even though $z_i$ is not observed): we then get to see what $y_i$ is. During estimation, however, we think the other way around: since $y_i$ has already been decided, $z_i$ must take a value consistent with it. For example, if $y_i = 0$, then $z_i \le 0$, because we have already observed $y_i = 0$, which means $z_i$ must have been non-positive. So the indicator functions in front of the normal density just tell us that the support (informally, the range of values a random variable can take on) of $z_i$ should be 'truncated':

$$z_i \mid \boldsymbol{\beta}, \sigma_i, y_i \sim \begin{cases} \mathcal{N}\!\left(\mathbf{x}_i^\top\boldsymbol{\beta},\, 4\sigma_i^2\right) \text{ truncated to } (0, \infty) & \text{if } y_i = 1, \\ \mathcal{N}\!\left(\mathbf{x}_i^\top\boldsymbol{\beta},\, 4\sigma_i^2\right) \text{ truncated to } (-\infty, 0] & \text{if } y_i = 0, \end{cases}$$
i.e. the full conditionals of the $z_i$ are left-truncated and right-truncated normal distributions at zero, respectively. The full conditional of $\boldsymbol{\beta}$ isn't affected by the indicator functions, so we proceed as usual:

$$\boldsymbol{\beta} \mid \mathbf{z}, \boldsymbol{\sigma} \sim \mathcal{N}(\mathbf{m}, \mathbf{V}), \qquad \mathbf{V} = \left(\mathbf{X}^\top\boldsymbol{\Lambda}^{-1}\mathbf{X} + \mathbf{I}\right)^{-1}, \qquad \mathbf{m} = \mathbf{V}\mathbf{X}^\top\boldsymbol{\Lambda}^{-1}\mathbf{z},$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(4\sigma_1^2, \ldots, 4\sigma_n^2)$.
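These two updates can be sketched in Python. The truncated-normal draws use naive rejection sampling for brevity (a specialised truncated-normal sampler is preferable in practice); the function names are my own:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_z(y, mean, sd):
    # z_i ~ N(mean_i, sd_i^2) truncated to (0, inf) if y_i = 1,
    # and to (-inf, 0] if y_i = 0.  Naive rejection sketch: fine when the
    # mean is moderate, slow for extreme means.
    z = np.empty(len(y))
    for i in range(len(y)):
        while True:
            draw = rng.normal(mean[i], sd[i])
            if (y[i] == 1 and draw > 0) or (y[i] == 0 and draw <= 0):
                z[i] = draw
                break
    return z

def update_beta(X, z, sigma):
    # beta | z, sigma ~ N(m, V), with V = (X' Lam^{-1} X + I)^{-1},
    # m = V X' Lam^{-1} z, and Lam = diag(4 sigma_i^2).
    lam_inv = 1.0 / (4.0 * sigma**2)
    p = X.shape[1]
    V = np.linalg.inv(X.T @ (lam_inv[:, None] * X) + np.eye(p))
    m = V @ (X.T @ (lam_inv * z))
    return rng.multivariate_normal(m, V)
```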
Next is $\sigma_i$. Since we do not know what distribution the full conditional of $\sigma_i$ is, we shall resort to a Metropolis-Hastings scheme. That is, we propose $\sigma_i'$ from the prior $f_{KS}$, and accept it with probability

$$\alpha = \min\left\{1,\; \frac{\dfrac{1}{2\sigma_i'}\,\phi\!\left(\dfrac{z_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{2\sigma_i'}\right)}{\dfrac{1}{2\sigma_i}\,\phi\!\left(\dfrac{z_i - \mathbf{x}_i^\top\boldsymbol{\beta}}{2\sigma_i}\right)}\right\},$$

since the prior densities cancel out in the Metropolis-Hastings ratio.
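As a sketch of this step (generating the KS proposal itself is nontrivial, so this only computes the acceptance probability given current and proposed values; function name is my own):

```python
import numpy as np

def mh_accept_prob(z, m, sigma_cur, sigma_prop):
    # When sigma' is proposed from the KS prior, the prior terms cancel in
    # the Metropolis-Hastings ratio, leaving a ratio of normal densities
    # with standard deviations 2*sigma' and 2*sigma (done in log space).
    def log_normal_pdf(x, mu, s):
        return -0.5 * np.log(2.0 * np.pi * s**2) - (x - mu) ** 2 / (2.0 * s**2)
    log_r = log_normal_pdf(z, m, 2.0 * sigma_prop) - log_normal_pdf(z, m, 2.0 * sigma_cur)
    return min(1.0, float(np.exp(log_r)))
```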
So I have reviewed two modelling methods for Bayesian logistic regression, and both have their advantages. However, I would say the Kolmogorov-Smirnov modelling is more general, in the sense that it can be used with more complex models such as nonparametric regression.
For the samplers, I have Matlab code here.
- Choi, H. M., & Hobert, J. P. (2013). The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electronic Journal of Statistics, 7, 2054-2064.
- Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag. ISBN 0387963057.
- Stefanski, L. A. (1991). A normal scale mixture representation of the logistic distribution. Statistics & Probability Letters, 11(1), 69-70.