The paper I’m about to review is “Quintana, F. A., Johnson, W. O., Waetjen, L. E., & Gold, E. B. (2016). Bayesian nonparametric longitudinal data analysis. Journal of the American Statistical Association, 111(515), 1168-1181.”

I found it fascinating, because I have always been curious about how to make the covariance structure more flexible than the familiar forms, such as the AR structure, compound symmetry, and the exponential covariance structure. I’ve read somewhere that an AR structure is physically very reasonable: its correlation dies out as the time gap grows, and pretty much everything in life loses its connection once it is far enough apart in time.
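To make the contrast between these standard structures concrete, here is a small sketch (assuming NumPy; the helper names are hypothetical) that builds AR(1) and compound-symmetry correlation matrices for five equally spaced occasions. In the AR(1) row the correlation decays geometrically with the lag (1, 0.6, 0.36, ...), whereas compound symmetry keeps every off-diagonal correlation at 0.6 no matter how far apart the occasions are.

```python
import numpy as np


def ar1_corr(n, rho):
    """AR(1) correlation: corr(occasion i, occasion j) = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def compound_symmetry_corr(n, rho):
    """Compound symmetry: every pair of distinct occasions has correlation rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


n, rho = 5, 0.6
ar = ar1_corr(n, rho)
cs = compound_symmetry_corr(n, rho)

# First row: correlation of occasion 0 with occasions 0, 1, ..., n - 1.
print("AR(1):", np.round(ar[0], 3))
print("CS:   ", np.round(cs[0], 3))
```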

Quintana et al. model the serial correlation as a Dirichlet process mixture (DPM) of Gaussian processes (GPs) with an exponential covariance function; such a GP is what a large portion of the statistics literature calls the Ornstein-Uhlenbeck (OU) process. An AR(1) covariance structure with positive correlation is equivalent to observing such a process at equally spaced time points: if consecutive observations are Δ apart, the exponential correlation exp(-|t - s|/ℓ) equals ρ^|i - j| with ρ = exp(-Δ/ℓ). A DPM of these GPs should therefore capture the correlation structure of the data with considerable flexibility.
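The AR(1) equivalence and the idea of a DPM of OU processes can be sketched as follows. This is a minimal simulation under stated assumptions, not the paper's model or sampler: it uses NumPy, approximates the Dirichlet process by truncated stick-breaking with K components, and uses a hypothetical base measure (inverse-gamma variances, gamma length scales) chosen only for illustration. Each subject is assigned to one mixture component and gets a path from the OU process with that component's parameters, at its own, possibly irregular, observation times.

```python
import numpy as np


def exp_cov(times, sigma2, ell):
    """Exponential (OU) covariance: sigma2 * exp(-|t - s| / ell)."""
    d = np.abs(times[:, None] - times[None, :])
    return sigma2 * np.exp(-d / ell)


def ar1_corr(n, rho):
    """AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the stick so the K weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_dpm_ou_paths(subject_times, alpha=1.0, K=20, seed=0):
    """Draw one serial-correlation path per subject from a truncated DPM of OU processes."""
    rng = np.random.default_rng(seed)
    w = stick_breaking(alpha, K, rng)
    # Hypothetical base measure for (sigma2, ell); not the prior used in the paper.
    sigma2 = 1.0 / rng.gamma(3.0, 1.0, size=K)  # inverse-gamma(3, 1) variances
    ell = rng.gamma(2.0, 1.0, size=K)           # gamma(2, 1) length scales
    labels = rng.choice(K, size=len(subject_times), p=w)
    paths = []
    for t, k in zip(subject_times, labels):
        cov = exp_cov(t, sigma2[k], ell[k]) + 1e-9 * np.eye(len(t))  # small jitter for stability
        paths.append(rng.multivariate_normal(np.zeros(len(t)), cov))
    return w, labels, paths


# At equally spaced times the exponential correlation is exactly AR(1).
dt, ell0 = 0.5, 2.0
grid = dt * np.arange(6)
is_ar1 = np.allclose(exp_cov(grid, 1.0, ell0), ar1_corr(6, np.exp(-dt / ell0)))

# Irregular, subject-specific observation times need no special handling.
subject_times = [np.sort(np.random.default_rng(i).uniform(0.0, 5.0, size=4)) for i in range(3)]
w, labels, paths = sample_dpm_ou_paths(subject_times)
print(is_ar1, labels)
```

Working in continuous time is what lets the exponential covariance handle unequally spaced visits, which a plain AR(1) matrix indexed by visit number cannot.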

However, I have three issues with the paper:

  • The paper conveniently omits the details of the Gibbs sampler, as well as the likelihood function and the prior densities.
  • Too much flexibility in the model also invites overfitting, and the fitted curves plotted in the paper point to it quite obviously. I also tested the model myself on simulated data and on real data (the Sitka spruce data), and the results hardly generalize.
  • According to our simulation results, the overall covariance is estimated very accurately, but the individual components are not identifiable. A cautious conjecture is that overlapping components in the covariance structure undermine model identification.
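Since the likelihood is not written out, here is a hedged sketch of the kind of term involved, assuming NumPy and assuming that one subject's residuals y, observed at times t, are zero-mean Gaussian with OU covariance σ² exp(-|t - s|/ℓ) plus a measurement-error variance τ². This is a generic formulation, not the likelihood from the paper, and the names `ou_cov` and `gaussian_loglik` are hypothetical. The log-density is evaluated through a Cholesky factor. The last lines relate to the identifiability concern: they score the same simulated subject under two nearby parameter settings, and when such log-likelihoods are close, a handful of time points gives the sampler little information to separate two overlapping components.

```python
import numpy as np


def ou_cov(times, sigma2, ell, tau2=0.0):
    """OU covariance sigma2 * exp(-|t - s| / ell) plus tau2 on the diagonal."""
    d = np.abs(times[:, None] - times[None, :])
    return sigma2 * np.exp(-d / ell) + tau2 * np.eye(len(times))


def gaussian_loglik(y, cov):
    """Log density of y ~ N(0, cov), computed through a Cholesky factor."""
    L = np.linalg.cholesky(cov)            # cov = L @ L.T
    z = np.linalg.solve(L, y)              # solves L z = y, so z @ z = y' cov^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    n = len(y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)


rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0.0, 10.0, size=8))
y = rng.multivariate_normal(np.zeros(8), ou_cov(times, 1.0, 2.0, 0.1))

# Same subject scored under two nearby (sigma2, ell) settings.
ll_a = gaussian_loglik(y, ou_cov(times, 1.0, 2.0, 0.1))
ll_b = gaussian_loglik(y, ou_cov(times, 1.1, 2.3, 0.1))
print(ll_a, ll_b, ll_a - ll_b)
```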

Contrary to my prior expectation, the model frustratingly suffers from many problems. Perhaps solving any one of the issues above could become a new paper?

(If anyone would like my MCMC implementation of the paper’s model, let me know and I’ll email it to you.)


[References]

[1] Quintana, F. A., Johnson, W. O., Waetjen, L. E., & Gold, E. B. (2016). Bayesian nonparametric longitudinal data analysis. Journal of the American Statistical Association, 111(515), 1168-1181.

 
