Bayesian 2-LC Fixed Effects Model

Authors

Ian Schiller

Nandini Dendukuri

Published

November 14, 2023

Introduction

This article is intended to give the reader basic instructions on how to run an rjags script to perform a Bayesian analysis of diagnostic test accuracy and disease prevalence in the absence of a perfect reference test with a 2-latent class fixed effects model (Dendukuri and Joseph (2001)). The script is implemented in R using the rjags package, which interfaces with the JAGS (Just Another Gibbs Sampler) software for Bayesian analysis.

The term “2-latent class” refers to the presence of two hidden or latent classes in the data - often referred to in diagnostic test accuracy research as target condition positive and target condition negative.

Conditional dependence among observed diagnostic tests is modeled using the covariance between tests within the target condition positive and target condition negative populations.
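
As a sketch of what this means for two tests, the joint probabilities of the test results among target condition positive subjects can be written as below. The symbols $T_1$, $T_2$ (results of tests 1 and 2) and $D^+$ (target condition positive) are notation assumed here for illustration; $se_1$, $se_2$ and $covs_{12}$ correspond to se[1], se[2] and covs12 in the model code later in this article.

```latex
\begin{aligned}
P(T_1^+, T_2^+ \mid D^+) &= se_1 \, se_2 + covs_{12} \\
P(T_1^+, T_2^- \mid D^+) &= se_1 (1 - se_2) - covs_{12} \\
P(T_1^-, T_2^+ \mid D^+) &= (1 - se_1) \, se_2 - covs_{12} \\
P(T_1^-, T_2^- \mid D^+) &= (1 - se_1)(1 - se_2) + covs_{12}
\end{aligned}
```

Setting $covs_{12} = 0$ recovers conditional independence between the two tests. The analogous expressions for target condition negative subjects use the specificities and covc12.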

An example dataset is provided for users to familiarize themselves with the script. It is from a study conducted to estimate the prevalence of Strongyloides infection among a group of Cambodian refugees to Canada (Joseph and Coupal (1995)).

Download `rjags` Script

The full script can be downloaded here.

Script Instructions

Suggested R Packages

Below is a list of packages we recommend installing. Only rjags is mandatory for LC analysis; the other packages are optional, but we recommend them because the script uses them. Be aware that some parts of the script may not work unless every package listed below is installed.

require(rjags)     # PACKAGE TO RUN THE jags MODEL. MANDATORY
require(MCMCvis)   # THIS PACKAGE CONTAINS THE MCMCsummary FUNCTION USED IN THIS SCRIPT
require(mcmcplots) # USED FOR THE CREATION OF THE CONVERGENCE PLOTS
require(DT)        # THIS LIBRARY ALLOWS A NICE DATA DISPLAY WITH THE SEARCH BAR OPTION.

Strongyloides Dataset

The Strongyloides dataset is taken from a study conducted to estimate the prevalence of Strongyloides infection among a group of Cambodian refugees to Canada (Joseph and Coupal (1995)). It includes participants with results on 2 diagnostic tests. For notation, we treat the Serology test as the index test (test 1) and the Stool examination as the reference test (test 2).

n11 cell = Number of patients positive on both tests
n10 cell = Number of patients positive on first test (index test) and negative on second test (reference test)
n01 cell = Number of patients negative on first test (index test) and positive on second test (reference test)
n00 cell = Number of patients negative on both tests

We recommend saving the Strongyloides dataset as a text file named Strongyloides.txt in the same folder as the script. The data can then be loaded with the read.table function. The data comprise a single row and 4 columns whose entries are the numbers of patients falling in each of the 4 categories defined above (n11, n10, n01, n00).

DATA <- read.table("Strongyloides.txt", header=TRUE)
datatable(DATA, extensions = 'AutoFill')#, options = list(autoFill = TRUE))

	n11	n10	n01	n00
1	38	87	2	35
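
If you do not have the file at hand, a minimal sketch for creating it from the counts shown above is given below. The object names strongy, data_file and DATA_check are hypothetical, and the file is written to a temporary folder here only to avoid overwriting anything; in practice you would write it next to the script.

```r
# Illustrative sketch: build the one-row dataset from the counts displayed above
strongy <- data.frame(n11 = 38, n10 = 87, n01 = 2, n00 = 35)

# Hypothetical location; replace with "Strongyloides.txt" in your script folder
data_file <- file.path(tempdir(), "Strongyloides.txt")

# Write a header line followed by a single row of counts
write.table(strongy, file = data_file, row.names = FALSE, quote = FALSE)

# Read the file back the same way the script does
DATA_check <- read.table(data_file, header = TRUE)
print(DATA_check)
```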

The data need to be stored in a list object, which we will call dataList. N denotes the total sample size and y the cross-classification of the diagnostic test results given above.

# Cross-classification results of the serology test and stool examination
y <- c(DATA$n11, DATA$n10, DATA$n01, DATA$n00)
# Number of patients
N = sum(y)

dataList <- list(y=y, N=N)

Bayesian Latent Class Fixed Effects Model

Implementing the Bayesian 2-latent class fixed effects model in rjags involves specifying the priors, likelihood, and the structure of the latent classes. Markov Chain Monte Carlo (MCMC) methods, such as Gibbs sampling, are then employed to estimate the posterior distribution of the model parameters.

The rjags model is saved in the current directory (ideally where your script and data are already saved) as model.txt. Below is the model in JAGS syntax.

modelString =

"model {

  #============
  # LIKELIHOOD 
  #============
  
  y[1:4]~dmulti(p12[1:4],N)
  
  # probabilities of observing different cross-classifications of two dichotomous diagnostic tests
  p12[1]<- prev*(se[1]*se[2]+covs12)+(1-prev)*((1-sp[1])*(1-sp[2])+covc12)
  p12[2]<- prev*(se[1]*(1-se[2])-covs12)+(1-prev)*((1-sp[1])*sp[2]-covc12)
  p12[3]<- prev*((1-se[1])*se[2]-covs12)+(1-prev)*(sp[1]*(1-sp[2])-covc12)
  p12[4]<- prev*((1-se[1])*(1-se[2])+covs12)+(1-prev)*(sp[1]*sp[2]+covc12)
    
  #=======================================
  # upper limits of covariance parameters
  #=======================================
  us<- min(se[1],se[2])-(se[1]*se[2])
  uc<- min(sp[1],sp[2])-(sp[1]*sp[2])
  
  #==============================================
  # adjustment of range of covariance parameters
  #==============================================
  covs12<- u.covs12*us
  covc12<- u.covc12*uc

  #==================================================
  # Prior distributions of prevalence, sensitivities
  # and specificities
  #==================================================
  prev~dbeta(1,1)
  se[1]~dbeta(21.96,5.49)
  sp[1]~dbeta(4.1,1.76) 
  se[2]~dbeta(4.44,13.31)
  sp[2]~dbeta(71.25,3.75)

  #==============================================================
  # prior distribution of transformed covariances on (0,1) range
  #==============================================================
  u.covs12~ dbeta(1,1)
  u.covc12~ dbeta(1,1)


}"

writeLines(modelString,con="model.txt")
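
To see how the likelihood part of the model behaves, the four cell probabilities can be reproduced in plain R. This is a sketch only: the function name cell_probs and the parameter values passed to it are illustrative assumptions (roughly the prior means, not estimates), while the formulas mirror p12[1:4] in the model above.

```r
# Cell probabilities for (n11, n10, n01, n00), mirroring p12[1:4] in the JAGS model
cell_probs <- function(prev, se, sp, covs12 = 0, covc12 = 0) {
  # Target condition positive: sensitivities plus covariance covs12
  pos <- c(se[1] * se[2] + covs12,
           se[1] * (1 - se[2]) - covs12,
           (1 - se[1]) * se[2] - covs12,
           (1 - se[1]) * (1 - se[2]) + covs12)
  # Target condition negative: specificities plus covariance covc12
  neg <- c((1 - sp[1]) * (1 - sp[2]) + covc12,
           (1 - sp[1]) * sp[2] - covc12,
           sp[1] * (1 - sp[2]) - covc12,
           sp[1] * sp[2] + covc12)
  prev * pos + (1 - prev) * neg
}

# Arbitrary illustrative values, chosen inside the allowed covariance ranges
p <- cell_probs(prev = 0.5, se = c(0.80, 0.25), sp = c(0.70, 0.95),
                covs12 = 0.02, covc12 = 0.01)
print(sum(p))  # the covariance terms cancel, so the probabilities sum to 1

# Multinomial log-likelihood of the observed Strongyloides counts at these values
y <- c(38, 87, 2, 35)
loglik <- dmultinom(y, prob = p, log = TRUE)
print(loglik)
```

The covariance terms enter with a plus sign in the concordant cells and a minus sign in the discordant cells, which is why they leave the total probability unchanged.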

Prior Distributions

The prior distributions used in the script are based on those provided in Dendukuri and Joseph (2001).

A Beta(1,1) prior distribution is used for the prevalence parameter, which is equivalent to a vague uniform(0,1) prior.

Beta prior distributions for the sensitivity and specificity are as follows:

For test 1 (index test = Serology test)

$se[1] \sim \text{Beta}(21.96, 5.49)$
$sp[1] \sim \text{Beta}(4.1, 1.76)$

For test 2 (reference test = Stool examination)

$se[2] \sim \text{Beta}(4.44, 13.31)$
$sp[2] \sim \text{Beta}(71.25, 3.75)$
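
To get a feel for what these Beta priors imply, their means and central 95% intervals can be computed in base R. This is a small sketch; the data frame name priors and its column names are illustrative.

```r
# Shape parameters of the Beta priors used in the model
priors <- data.frame(
  parameter = c("se[1]", "sp[1]", "se[2]", "sp[2]"),
  a = c(21.96, 4.1, 4.44, 71.25),
  b = c(5.49, 1.76, 13.31, 3.75)
)

# Mean of a Beta(a, b) distribution is a / (a + b)
priors$mean <- priors$a / (priors$a + priors$b)

# Central 95% prior interval from the Beta quantile function
priors$lower <- qbeta(0.025, priors$a, priors$b)
priors$upper <- qbeta(0.975, priors$a, priors$b)

print(priors, digits = 3)
```

For example, the prior for se[1] has mean 21.96 / (21.96 + 5.49) = 0.80, and the prior for sp[2] has mean 71.25 / (71.25 + 3.75) = 0.95.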

The covariance parameters $covs12$ and $covc12$ follow a generalized beta distribution with lower and upper limits determined by the sensitivities and specificities as follows:

$(se[1] - 1) \cdot (1 - se[2]) \leq covs12 \leq \min(se[1], se[2]) - se[1] \cdot se[2]$
$(sp[1] - 1) \cdot (1 - sp[2]) \leq covc12 \leq \min(sp[1], sp[2]) - sp[1] \cdot sp[2]$

This is implemented in our program by creating variables ($u.covs12$ and $u.covc12$) that follow a Beta(1,1) distribution and then transforming them to lie within these limits.

Both covariance lower bounds are truncated at 0 because the authors were only interested in the situation where the two tests are positively correlated.
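
The rescaling used in the model (covs12 <- u.covs12*us) can be illustrated with a few numbers. The sensitivities below are arbitrary illustrative values, and u_covs12 stands in for the model variable u.covs12.

```r
# Illustrative sensitivities for tests 1 and 2 (not estimates)
se <- c(0.80, 0.25)

# Upper limit of the covariance among target condition positive subjects
us <- min(se) - prod(se)   # 0.25 - 0.20 = 0.05

# Values on the (0,1) scale are stretched onto the range (0, us)
u_covs12 <- c(0, 0.5, 1)
covs12 <- u_covs12 * us

print(us)
print(covs12)
```

Because the lower limit is fixed at 0, a Beta(1,1) prior on the (0,1) scale amounts to a uniform prior on the covariance between 0 and its upper limit, given the sensitivities.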

Initial Values

Initial values are needed as the starting point for estimating and updating the parameters of the model in rjags. We strongly encourage users to provide their own method of generating initial values rather than relying on rjags to generate them. Initial values can be provided in different ways in rjags. We propose one method below, based on a custom function that randomly generates initial values from the prior distributions. For more options on how to provide initial values, please see A guide on how to provide initial values in rjags.

# Initial values
GenInits  = function(){
  
    se1 <- rbeta(1,21.96,5.49)
    sp1 <- rbeta(1,4.1,1.76) 
    se2 <- rbeta(1,4.44,13.31)
    sp2 <- rbeta(1,71.25,3.75)
    u.covs12 <- rbeta(1,1,1)
    u.covc12 <- rbeta(1,1,1)
    prev <- rbeta(1,1,1)
    
    se <- c(se1, se2)
    sp <- c(sp1, sp2)
   
    list(
      se = se,
      sp = sp,
      prev = prev,
      u.covs12=u.covs12,
      u.covc12=u.covc12,
      .RNG.name="base::Wichmann-Hill",
      .RNG.seed=321
    )

}

Below we use our GenInits function to initialize 3 chains. We provide a seed value for reproducibility:

# Initial values
set.seed(123)
initsList = vector('list',3)
for(i in 1:3){
    initsList[[i]] = GenInits()
}
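
Note that GenInits assigns the same .RNG.seed to every chain, so the chains differ only through their starting values. A possible variation, which is not part of the original script, is to pass a chain-specific seed to the JAGS random number generator. The helper name GenInitsSeeded, the list name initsListSeeded and the seeds 321 to 323 are hypothetical, and the results reported in this article were obtained with the original GenInits.

```r
# Hypothetical variant of GenInits that takes the JAGS RNG seed as an argument
GenInitsSeeded <- function(chain_seed) {
  list(
    # Starting values drawn from the prior distributions, as in GenInits
    se = c(rbeta(1, 21.96, 5.49), rbeta(1, 4.44, 13.31)),
    sp = c(rbeta(1, 4.1, 1.76), rbeta(1, 71.25, 3.75)),
    prev = rbeta(1, 1, 1),
    u.covs12 = rbeta(1, 1, 1),
    u.covc12 = rbeta(1, 1, 1),
    # Same generator for every chain, but a different seed per chain
    .RNG.name = "base::Wichmann-Hill",
    .RNG.seed = chain_seed
  )
}

set.seed(123)  # R-side seed, so the starting values are reproducible
initsListSeeded <- lapply(c(321, 322, 323), GenInitsSeeded)
str(initsListSeeded[[1]])
```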

Compiling the model with `rjags`

We compile the model with the jags.model function.

# Compile the model
jagsModel = jags.model("model.txt",data=dataList,n.chains=3,n.adapt=0, inits=initsList)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 1
   Unobserved stochastic nodes: 7
   Total graph size: 58

Initializing model

Posterior Sampling

The posterior samples for the parameters of the model are obtained by running several independent chains, each with its own starting values, so that convergence of the MCMC algorithm can be assessed. In the script, we elected to run 3 separate chains.

The posterior sampling step is in fact a 2-part step.

First, we discard a certain number of iterations with the update function. This step is often referred to as the burn-in step and is needed so that the posterior sample does not include draws obtained before the algorithm had converged. Here, we elected to discard 5,000 iterations.
Then, we use the coda.samples function to sample another 5,000 iterations from the posterior distribution. The resulting posterior sample is stored in the output object.

Generally, the number of burn-in and sampling iterations needed will depend on the complexity of the model, the prior distribution as well as the quality of the initial values.

#jagsModel$state(internal=FALSE)

# Burn-in iterations 
update(jagsModel,n.iter=5000)
# Parameters to be monitored
parameters = c( "se","sp", "covs12", "covc12", "prev")

# Posterior samples
posterior_results = coda.samples(jagsModel,variable.names=parameters,n.iter=5000)
output = posterior_results

Posterior Results

The MCMCsummary function will provide the following posterior statistics.

The mean,
The standard deviation (sd),
The median (50%),
The 95% credible interval (2.5% and 97.5%).

Convergence statistics are also provided.

Rhat is the Gelman-Rubin statistic (Gelman and Rubin (1992), Brooks and Gelman (1998)). It is enabled when 2 or more chains are generated. It evaluates MCMC convergence by comparing within- and between-chain variability for each model parameter. Rhat tends to 1 as convergence is approached.
n.eff is the effective sample size (Gelman et al. (2013)). Because the MCMC process produces correlated posterior draws, the effective sample size estimates the number of independent draws that would give the same level of precision as the correlated sample. When draws are correlated, the effective sample size is generally lower than the actual number of draws, and a low value can indicate imprecise posterior estimates.
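
Both statistics can also be computed directly with the coda package, which is installed alongside rjags. The sketch below does not use the Strongyloides model: it simulates 3 autocorrelated chains for a made-up parameter called theta (the helper sim_chain and the autocorrelation 0.9 are assumptions for illustration) to show that correlated draws yield an effective sample size well below the total number of draws.

```r
library(coda)

set.seed(1)

# Simulate one autocorrelated (AR(1)) chain with stationary variance 1
sim_chain <- function(n, rho) {
  x <- numeric(n)
  x[1] <- rnorm(1)
  for (t in 2:n) {
    x[t] <- rho * x[t - 1] + rnorm(1, sd = sqrt(1 - rho^2))
  }
  mcmc(matrix(x, ncol = 1, dimnames = list(NULL, "theta")))
}

# Three independent chains of 5,000 draws each, mimicking the script's setup
chains <- mcmc.list(lapply(1:3, function(i) sim_chain(5000, rho = 0.9)))

# Gelman-Rubin statistic (point estimate) and effective sample size
rhat <- gelman.diag(chains, multivariate = FALSE)$psrf[, "Point est."]
neff <- effectiveSize(chains)  # summed across the 3 chains

print(rhat)
print(neff)
```

The same two functions can be applied to the output object returned by coda.samples.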

res = MCMCsummary(output, digits=4)
datatable(res, extensions = 'AutoFill')

	mean	sd	2.5%	50%	97.5%	Rhat	n.eff
se[1]	0.838345	0.0485985	0.744608	0.838054	0.929415	1	2736
se[2]	0.286218	0.0498875	0.204984	0.280705	0.403121	1	2219
sp[1]	0.646426	0.18469	0.284364	0.658249	0.950355	1	2713
sp[2]	0.949255	0.0254577	0.890332	0.953281	0.986471	1	6165
covs12	0.0283864	0.0142935	0.00309211	0.0280147	0.0574662	1	3833
covc12	0.0163242	0.014316	0.000553171	0.0124731	0.0533023	1	6144
prev	0.822728	0.116468	0.53762	0.841761	0.988028	1	1517

Convergence Diagnostic Plots

Convergence for key parameters can be inspected visually using different tools. We opted to write our own code as it gives us more flexibility and control over what we want to display. For a given parameter, panel (a) shows the posterior density plot; (b) the running posterior mean value; and (c) the history plot. Each chain is identified by a different color. Similar behavior from all 3 chains would suggest the algorithm has converged. For example, here is the 3-panel plot for the prevalence parameter.

Index Test SEROLOGY TEST

Sensitivity and Specificity

for(k in 1) {
  for(i in 1:2) {
    #  tiff(paste(parameters[i],"[",k,"].tiff",sep=""),width = 23, height = 23, units = "cm", res=200)
    par(oma=c(0,0,3,0))
    layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
    denplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(a)", xlab=paste(parameters[i],"",sep=""), ylab="Density")
    rmeanplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(b)")
    title(xlab="Iteration", ylab="Running mean")
    traplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(c)")
    title(xlab="Iteration", ylab=paste(parameters[i],"[",k,"]",sep=""))
    mtext(paste("Diagnostics for ", parameters[i],"[",k,"]","",sep=""), side=3, line=1, outer=TRUE, cex=2)
    #  dev.off()
  }
}

Reference Test STOOL EXAMINATION

Sensitivity and Specificity

for(k in 2) {
  for(i in 1:2) {
    #  tiff(paste(parameters[i],"[",k,"].tiff",sep=""),width = 23, height = 23, units = "cm", res=200)
    par(oma=c(0,0,3,0))
    layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
    denplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(a)", xlab=paste(parameters[i],"",sep=""), ylab="Density")
    rmeanplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(b)")
    title(xlab="Iteration", ylab="Running mean")
    traplot(output, parms=c(paste(parameters[i],"[",k,"]",sep="")), auto.layout=FALSE, main="(c)")
    title(xlab="Iteration", ylab=paste(parameters[i],"[",k,"]",sep=""))
    mtext(paste("Diagnostics for ", parameters[i],"[",k,"]","",sep=""), side=3, line=1, outer=TRUE, cex=2)
    #  dev.off()
  }
}

Covariance Parameters (target condition positive)

# Plots to check convergence for the covariance parameter covs12
for(i in 3) {
  # Uncomment jpeg() and dev.off() to save the figure to a file.
  # result_folder is not defined in this script and must first be set to an existing folder.
  # jpeg(paste(result_folder,"/",parameters[i],".jpeg",sep=""),width = 23, height = 23, units = "cm", res=200)
  par(oma=c(0,0,3,0))
  layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
  denplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(a)", xlab=paste(parameters[i],"",sep=""), ylab="Density")
  rmeanplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(b)")
  title(xlab="Iteration", ylab="Running mean")
  traplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(c)")
  title(xlab="Iteration", ylab=paste(parameters[i], sep=""))
  mtext(paste("Diagnostics for ", parameters[i], sep=""), side=3, line=1, outer=TRUE, cex=2)
  # dev.off()
}

Covariance Parameters (target condition negative)

# Plots to check convergence for the covariance parameter covc12
for(i in 4) {
  # Uncomment jpeg() and dev.off() to save the figure to a file.
  # result_folder is not defined in this script and must first be set to an existing folder.
  # jpeg(paste(result_folder,"/",parameters[i],".jpeg",sep=""),width = 23, height = 23, units = "cm", res=200)
  par(oma=c(0,0,3,0))
  layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
  denplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(a)", xlab=paste(parameters[i],"",sep=""), ylab="Density")
  rmeanplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(b)")
  title(xlab="Iteration", ylab="Running mean")
  traplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(c)")
  title(xlab="Iteration", ylab=paste(parameters[i], sep=""))
  mtext(paste("Diagnostics for ", parameters[i], sep=""), side=3, line=1, outer=TRUE, cex=2)
  # dev.off()
}

Prevalence

for(i in 5) {
  # tiff(paste(parameters[i],".tiff",sep=""),width = 23, height = 23, units = "cm", res=200)
  par(oma=c(0,0,3,0))
  layout(matrix(c(1,2,3,3), 2, 2, byrow = TRUE))
  denplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(a)", xlab=paste(parameters[i],"",sep=""), ylab="Density")
  rmeanplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(b)")
  title(xlab="Iteration", ylab="Running mean")
  traplot(output, parms=c(paste(parameters[i], sep="")), auto.layout=FALSE, main="(c)")
  title(xlab="Iteration", ylab=paste(parameters[i], sep=""))
  mtext(paste("Diagnostics for ", parameters[i], sep=""), side=3, line=1, outer=TRUE, cex=2)
  # dev.off()
}
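
Since the same three-panel layout is repeated for every parameter, the plotting code could be wrapped in a small helper. The sketch below is one possible refactoring, not the original script: the function name plot_diagnostics and the simulated object demo_samples are hypothetical, and the simulated draws only stand in for the real output object so that the example runs on its own.

```r
library(coda)
library(mcmcplots)

# Hypothetical helper: density (a), running mean (b) and trace (c) for one parameter
plot_diagnostics <- function(samples, parm) {
  old_par <- par(oma = c(0, 0, 3, 0))
  on.exit({ layout(1); par(old_par) })  # restore the graphics state afterwards
  layout(matrix(c(1, 2, 3, 3), 2, 2, byrow = TRUE))
  denplot(samples, parms = parm, auto.layout = FALSE, main = "(a)",
          xlab = parm, ylab = "Density")
  rmeanplot(samples, parms = parm, auto.layout = FALSE, main = "(b)")
  title(xlab = "Iteration", ylab = "Running mean")
  traplot(samples, parms = parm, auto.layout = FALSE, main = "(c)")
  title(xlab = "Iteration", ylab = parm)
  mtext(paste("Diagnostics for", parm), side = 3, line = 1, outer = TRUE, cex = 2)
  invisible(parm)
}

# Simulated stand-in for the posterior sample (3 chains, 2 parameters); not real results
set.seed(1)
demo_samples <- mcmc.list(lapply(1:3, function(i) {
  mcmc(cbind(prev = rbeta(500, 8, 2), covs12 = rbeta(500, 2, 50)))
}))

plot_diagnostics(demo_samples, "prev")

# With the real posterior sample, one call per monitored parameter would be, e.g.:
# for (p in c("se[1]", "sp[1]", "se[2]", "sp[2]", "covs12", "covc12", "prev")) {
#   plot_diagnostics(output, p)
# }
```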

References

Brooks, S. P., and A. Gelman. 1998. “General methods for monitoring convergence of iterative simulations.” Journal of Computational and Graphical Statistics 7 (4): 434–55.

Dendukuri, Nandini, and Lawrence Joseph. 2001. “Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests.” Biometrics 57 (1): 158–67. http://www.jstor.org/stable/2676854.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian Data Analysis. Third Edition. London: Chapman & Hall/CRC Press.

Gelman, A., and D. B. Rubin. 1992. “Inference from iterative simulation using multiple sequences.” Statistical Science 7 (4): 457–72.

Joseph, L., T. W. Gyorkos, and L. Coupal. 1995. “Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard.” American Journal of Epidemiology 141 (3): 263–72.

Citation

BibTeX citation:

@online{schiller2023,
  author = {Ian Schiller and Nandini Dendukuri},
  title = {Bayesian {2-LC} {Fixed} {Effects} {Model}},
  date = {2023-11-14},
  url = {https://www.nandinidendukuri.com/LCA/Bayesian_2-LC_Fixed_Effects_Models.html/},
  langid = {en}
}

For attribution, please cite this work as:

Ian Schiller, and Nandini Dendukuri. 2023. “Bayesian 2-LC Fixed Effects Model.” November 14, 2023. https://www.nandinidendukuri.com/LCA/Bayesian_2-LC_Fixed_Effects_Models.html/.
