Bayesian variable selection and modelling for metastatic breast cancer data

Sarini Sarini, James McGree, Kerrie Mengersen

Abstract


A Bayesian model selection procedure is applied to data on 90 women with metastatic breast cancer. Protein covariates are measured in the nucleus, cytoplasm, membrane, and stroma of primary breast carcinoma and lymph node metastasis tissue. Multiple imputation is used to deal with missing data. Zellner's g-prior is used in the Bayesian variable selection procedure. The model space is reduced using posterior variable inclusion probabilities, and posterior model probabilities are then used to derive a candidate set of models. Bayesian model averaging is employed to robustly estimate survival time, and the goodness of fit of the derived model is assessed by the correlation between estimated and observed survival times. The results show evidence of proteins having different roles in different parts of the tissue cell with respect to patient survival. Therefore, a recommendation is given on which part of the cell to observe certain proteins for prognosis. The models obtained are robust to censoring and show correlations of 0.7 to 0.84 between the observed and predicted data.
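
As an illustration only, the selection and averaging steps described above can be sketched in a few lines. This is not the authors' implementation: it assumes a Gaussian linear model for the response (for example, log survival time), complete and uncensored data, a uniform prior over models, exhaustive enumeration of a small model space, and the unit-information choice g = n. The function names `fit_gprior_model` and `bayesian_model_average` are hypothetical, and the data are simulated.

```python
import itertools
import numpy as np


def fit_gprior_model(y, X, subset, g):
    """Log Bayes factor (versus the intercept-only model) and posterior-mean
    fitted values for one Gaussian linear model under Zellner's g-prior."""
    n = len(y)
    p = len(subset)
    y_bar = y.mean()
    if p == 0:
        # Null model: Bayes factor against itself is 1, fit is the mean.
        return 0.0, np.full(n, y_bar)
    yc = y - y_bar
    Xs = X[:, list(subset)]
    Xc = Xs - Xs.mean(axis=0)
    beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    ols_fit = Xc @ beta_ols
    resid = yc - ols_fit
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    # Closed-form g-prior Bayes factor in terms of R^2, n, p and g.
    log_bf = (0.5 * (n - 1 - p) * np.log1p(g)
              - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
    # Posterior mean of the coefficients shrinks OLS by g / (1 + g).
    fitted = y_bar + (g / (1.0 + g)) * ols_fit
    return log_bf, fitted


def bayesian_model_average(y, X, g=None):
    """Enumerate all subsets of columns of X and return the models, posterior
    model probabilities, inclusion probabilities and averaged fitted values."""
    n, k = X.shape
    g = float(n) if g is None else float(g)  # assumption: g = n by default
    models = [list(s) for size in range(k + 1)
              for s in itertools.combinations(range(k), size)]
    results = [fit_gprior_model(y, X, m, g) for m in models]
    log_bf = np.array([res[0] for res in results])
    weights = np.exp(log_bf - log_bf.max())  # subtract max for stability
    post = weights / weights.sum()           # uniform prior over models
    incl = np.zeros(k)
    for m, pm in zip(models, post):
        for j in m:
            incl[j] += pm
    fitted = sum(pm * res[1] for pm, res in zip(post, results))
    return models, post, incl, fitted


# Simulated example: 90 observations, 5 candidate covariates, 2 of them active.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
y = 1.0 + 2.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=90)

models, post, incl, fitted = bayesian_model_average(y, X)
print("posterior inclusion probabilities:", np.round(incl, 3))
print("correlation of fitted and observed:", np.corrcoef(fitted, y)[0, 1])
```

Enumeration is only feasible for a handful of covariates; with a larger candidate set the model space would have to be explored by sampling, and censored survival times would call for a survival likelihood rather than the Gaussian one assumed here.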

Keywords


Bayesian model averaging; Bayesian variable selection; Gibbs sampler; metastatic breast cancer; Weibull regression; Zellner's g-prior

DOI: http://dx.doi.org/10.21914/anziamj.v55i0.7812



ANZIAM Journal, ISSN 1446-8735, copyright Australian Mathematical Society.