Model relationships can still be non-linear

To use NIR spectroscopy for chemical analysis, you first need to develop a calibration model that relates the NIR signal to the property of interest, usually the concentration of a specific molecule. Now, this is easy enough if the relationship between the NIR signal and the property is linear, in which case it can be done using standard linear multivariate techniques such as principal component regression (PCR) or partial least squares (PLS) regression.

In many cases, though, the relationship between the NIR signal and the property isn’t linear, either because the relationship is inherently non-linear or because background effects, such as light scattering, introduce unavoidable complexity. Producing calibration models for this kind of non-linear relationship requires more advanced statistical techniques and, although several are available, each has its own strengths and weaknesses. So a team of researchers from Denmark, led by Wangdong Ni from the University of Copenhagen, set about comparing a whole suite of these techniques to find out which is best at modelling non-linear NIR data.

They tested seven techniques, namely Kernel PLS (KPLS), Support Vector Machines (SVM), Least-Squares SVM (LS-SVM), Relevance Vector Machines (RVM), Gaussian Process Regression (GPR), Artificial Neural Networks (ANN) and Bayesian ANN (BANN), on three different sets of non-linear NIR data. The first set covered the concentration of the active substance in drug tablets; the second covered the concentration of moisture, fat and protein in pork meat; and the third covered the protein content of wheat kernels. Using each of the techniques to produce calibration models for each data set, Ni and his team assessed the techniques in terms of computational time, model interpretability, potential over-fitting when non-linear models are applied to linear problems, robustness to small or medium sample sets, and robustness to pre-processing.

As they report in Analytica Chimica Acta, GPR and BANN turned out to be the most effective techniques for non-linear NIR data. Both techniques produced calibration models that accurately predicted the properties of interest with and without pre-processing, and were able to handle data sets with both strong and weak non-linearity. Following close behind was LS-SVM, which produced models that gave accurate predictions without needing a lot of computational time.
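
To give a feel for why a non-linear technique such as GPR can help, here is a minimal Python sketch using only NumPy. It is an illustration under stated assumptions, not the researchers' method or data: the single-variable "signal", the saturating curve in `true_property`, the noise level and the fixed kernel settings are all made up for the example. It fits a straight line and a bare-bones Gaussian process regression (RBF kernel, hyperparameters not tuned) to the same synthetic calibration data and compares their prediction errors.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise=1e-3):
    """Posterior mean of a GP with fixed, untuned hyperparameters (assumed values)."""
    y_mean = y_train.mean()  # centre the targets so a zero-mean prior is reasonable
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train - y_mean)
    return rbf_kernel(x_test, x_train) @ alpha + y_mean

def linear_predict(x_train, y_train, x_test):
    """Ordinary least-squares straight line, standing in for a linear calibration."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    return slope * x_test + intercept

def rmse(y_true, y_pred):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def true_property(signal):
    """Hypothetical saturating relationship between signal and property."""
    return 1.0 - np.exp(-4.0 * signal)

rng = np.random.default_rng(0)

# Synthetic calibration set: 30 samples with a little measurement noise.
x_train = rng.uniform(0.0, 1.0, 30)
y_train = true_property(x_train) + rng.normal(0.0, 0.02, 30)

# Noise-free test set spanning the same signal range.
x_test = np.linspace(0.0, 1.0, 50)
y_test = true_property(x_test)

rmse_linear = rmse(y_test, linear_predict(x_train, y_train, x_test))
rmse_gpr = rmse(y_test, gpr_predict(x_train, y_train, x_test))

print(f"linear RMSE: {rmse_linear:.3f}")
print(f"GPR RMSE:    {rmse_gpr:.3f}")
```

On this toy curve the straight line cannot follow the saturation, while the GPR fit can. Real NIR spectra have hundreds of wavelengths rather than one variable, and in practice the kernel hyperparameters would be estimated from the data rather than fixed by hand as they are here.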
