Information recovery from near infrared data


  • Robert Scott Anderssen
  • Frank Robert de Hoog
  • Ian J. Wesley



derivative spectroscopy, calibration and prediction, information recovery, NIR (Near Infrared), partial least squares, SVD, PCR


In many practical situations, classical modelling protocols are either inappropriate or infeasible for recovering an estimate of some specific property of an object, such as the protein content of wheat, from available indirect (spectroscopic) measurements. In such situations, some form of calibration-and-prediction (machine learning) is a popular alternative. For a representative set of samples, inexpensive and rapidly obtained indirect (spectroscopic) encapsulations of the property are recorded for each sample, along with an independent laboratory measurement of the value of the property. Because many more indirect measurement values are recorded for each sample than there are samples, the system to be solved in the calibration step is highly under-determined. The calibration step is the identification of a predictor that can be applied to the indirect measurements of a new sample to predict its value of the property. Various dimension reduction methodologies have been proposed for performing the calibration step, including principal component regression, partial least squares, independent component analysis and neural network analysis. Independently, because near infrared spectra are recorded with high accuracy by computer controlled instrumentation, derivative spectroscopy techniques can be used to explore differences in the molecular structure of cereal grains. With a view to optimising the recovery of estimates of the property from the indirect measurements of new samples, this article explores two questions, one practical and the other theoretical: (i) To what extent should preprocessing (such as fourth differentiation) be applied to the indirect measurements before the calibration step is performed? (ii) Is there an algorithmic advantage in viewing partial least squares as an implementation of simultaneous minimisation?
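Question (i) concerns differentiating spectra before calibration. As a minimal sketch, assuming uniformly sampled spectra, the plain central fourth difference with stencil (1, -4, 6, -4, 1)/h^4 is shown below; the function name `fourth_difference` and the step `h` are illustrative assumptions, and no smoothing or stabilisation is applied, so this is not presented as the article's own procedure. The second part of the demonstration shows why measurement accuracy matters: for independent noise of standard deviation sigma, the unscaled fourth difference has standard deviation sqrt(1 + 16 + 36 + 16 + 1) sigma = sqrt(70) sigma, before the further division by h^4.

```python
import numpy as np


def fourth_difference(spectrum, h=1.0):
    """Central fourth difference of uniformly sampled data (illustrative helper).

    Entry j of the result approximates the fourth derivative at sample j + 2:
        (y[j] - 4*y[j+1] + 6*y[j+2] - 4*y[j+3] + y[j+4]) / h**4
    so the output is 4 points shorter than the input along the last axis.
    """
    y = np.asarray(spectrum, dtype=float)
    if y.shape[-1] < 5:
        raise ValueError("a fourth difference needs at least 5 samples")
    # np.diff with n=4 applies the forward difference four times; its
    # coefficients (1, -4, 6, -4, 1) are symmetric, so it is the central
    # fourth difference centred two samples to the right of index j.
    return np.diff(y, n=4, axis=-1) / h**4


if __name__ == "__main__":
    h = 0.05
    x = np.arange(0.0, 1.0 + h / 2, h)
    # Exact (up to rounding) for a quartic, whose fourth derivative is 24.
    print(fourth_difference(x**4, h)[:3])

    # Noise amplification: white noise of std sigma gives an unscaled
    # fourth difference with std close to sqrt(70) * sigma.
    rng = np.random.default_rng(0)
    sigma = 1e-3
    noise = sigma * rng.standard_normal(100_000)
    print(np.std(fourth_difference(noise)) / sigma, np.sqrt(70.0))
```

The printed ratio should be close to sqrt(70), roughly 8.4, and it grows further by 1/h^4 once the differences are scaled to approximate a derivative; this is why fourth differentiation is only informative for data recorded very accurately, or after some form of smoothing.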
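The calibration step can be illustrated on synthetic data with far more wavelengths (200) than samples (20), the under-determined setting described in the abstract. The sketch below fits principal component regression (least squares on a truncated SVD) and PLS1 (NIPALS-style); the band positions, noise level and the names `simulate`, `pcr_fit`, `pls1_fit` and `krylov_ls` are hypothetical, and nothing here is the authors' data or code. It also checks one well-known minimisation view of PLS relevant to question (ii): with centred data, the k-component PLS1 regression vector minimises ||y - Xb|| over the Krylov subspace spanned by X^T y, (X^T X) X^T y, ..., (X^T X)^(k-1) X^T y. Whether this coincides with the article's "simultaneous minimisation" is not assumed.

```python
import numpy as np

# Hypothetical NIR wavelength grid (nm); not data from the article.
WAVELENGTHS = np.linspace(1100.0, 2500.0, 200)


def gaussian_band(centre, width):
    """Idealised absorption band on the hypothetical wavelength grid."""
    return np.exp(-0.5 * ((WAVELENGTHS - centre) / width) ** 2)


def simulate(n, rng):
    """Hypothetical spectra: two well-separated bands plus white noise."""
    protein = rng.uniform(8.0, 16.0, n)       # the property to be predicted
    moisture = rng.uniform(10.0, 14.0, n)     # an interfering constituent
    X = (np.outer(protein, gaussian_band(2180.0, 40.0))
         + np.outer(moisture, gaussian_band(1940.0, 60.0))
         + 0.01 * rng.standard_normal((n, WAVELENGTHS.size)))
    return X, protein


def _centre(X, y):
    """Column-centre X and centre y; also return the means for the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    return X - x_mean, y - y_mean, x_mean, y_mean


def pcr_fit(X, y, k):
    """Principal component regression: least squares on the k leading SVD terms."""
    Xc, yc, x_mean, y_mean = _centre(X, y)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # b = V_k diag(1 / s_k) U_k^T y, the truncated-SVD solution
    b = Vt[:k].T @ ((U[:, :k].T @ yc) / s[:k])
    return b, y_mean - x_mean @ b


def pls1_fit(X, y, k):
    """PLS1 (single response) with k components, NIPALS-style deflation."""
    E, f, x_mean, y_mean = _centre(X, y)
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f                  # weight: direction of maximal covariance with f
        w /= np.linalg.norm(w)
        t = E @ w                    # score vector
        tt = t @ t
        p = E.T @ t / tt             # X-loading
        qa = f @ t / tt              # y-loading
        E -= np.outer(t, p)          # deflate X and y by the new component
        f -= qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # coefficients for the centred data
    return b, y_mean - x_mean @ b


def krylov_ls(X, y, k):
    """Minimise ||yc - Xc b|| over b in span{v, A v, ..., A^(k-1) v},
    where A = Xc^T Xc and v = Xc^T yc (modified Gram-Schmidt basis)."""
    Xc, yc, _, _ = _centre(X, y)
    A = Xc.T @ Xc
    Q = np.zeros((X.shape[1], k))
    v = Xc.T @ yc
    for j in range(k):
        for i in range(j):
            v -= (Q[:, i] @ v) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
        v = A @ Q[:, j]
    z, *_ = np.linalg.lstsq(Xc @ Q, yc, rcond=None)
    return Q @ z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = simulate(20, rng)   # 20 samples x 200 wavelengths: p >> n
    X_test, y_test = simulate(10, rng)
    for name, fit in (("PCR", pcr_fit), ("PLS", pls1_fit)):
        for k in (1, 2, 3):
            b, b0 = fit(X_train, y_train, k)
            rmsep = np.sqrt(np.mean((X_test @ b + b0 - y_test) ** 2))
            print(f"{name}, k={k}: RMSEP = {rmsep:.4f}")
    b_pls, _ = pls1_fit(X_train, y_train, 3)
    print("PLS1 == Krylov LS:", np.allclose(b_pls, krylov_ls(X_train, y_train, 3)))
```

Because the synthetic spectra contain two constituents, one would expect both methods to need about two components before the interfering band stops contaminating the prediction; the number of components is exactly the dimension reduction choice that the calibration step has to make.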





Proceedings Computational Techniques and Applications Conference