Some considerations about NIR spectroscopy: Closing speech at NIR-2009
Pierre Dardenne, ICNIRS Chairman
Walloon Agricultural Research Centre (CRA-W), Quality of Agricultural Products Department, Chaussée de Namur n^{o}24, 5030 Gembloux, Belgium. E-mail: [email protected]
It is the tradition that the ICNIRS Chairman closes the conference (NIR-2009) and, besides acknowledgments to the participants and the organising team, I have tried to develop some ideas which seemed to me quite important.
Limit of detection
During the last summer there was a big debate on the NIR discussion forum about melamine and its limit of detection; at least two papers published on melamine detection were the starting point of the internet exchanges. In JNIRS,^{1} Karl Norris wrote a letter to the editor and I report his words “...it is necessary to prove that the separation is specific and is not caused by one or more of the noise sources. Those responsible for developing calibrations for constituents at ppm levels must demonstrate the impact of the possible noise sources on their results before suggesting possible limits of detection. Of course, before putting the results into practice the technology must be tested with real food process samples”.
The data presented by Vincent Baeten and the application note distributed by Foss during the conference reached similar conclusions: the limit of quantification (LOQ) for melamine is between 500 ppm and 1000 ppm depending on the matrix. Baeten reported a SECV equal to 0.3% (3000 ppm!) on a set of 750 samples of soyameal contaminated with 0–5% melamine. The contamination was done with melamine from six origins and no soyameal sample was used more than once. Mixing the same matrix with the same melamine sample at different levels does not create information; it leads to a binary mixture which can be solved with one wavelength, and any wavelength used will likely give a perfect discrimination. Any mixture design must incorporate the normal variation of both the contaminant and the matrix.
Validation
When developing a feasibility study, cross-validation is a practical method to demonstrate that NIR can predict something, but the actual accuracy must be estimated with an appropriate test set. A good test set is independent of the calibration set. What does independent mean? The validation samples have to come from different experiments, harvest times or batches, with all spectra taken at a time different from the calibration spectra. In agriculture, it is easy to validate correctly by removing one year at a time and cross-validating each harvest separately. With this cross-validation, the parameters (data pre-treatments, number of terms, variable selection, regression method, etc.) can be optimised by keeping those that give the lowest SEPs, and the final model can then be calculated using all the information. The accuracy for the prediction of future samples will generally be better when the maximum of information [calibration + validation set(s)] is used in the final model.
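The "remove one year at a time" scheme described above can be sketched as follows. This is only an illustration: the function name `leave_one_year_out` and the example years are hypothetical, and the model fitting itself is left out.

```python
from collections import defaultdict

def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct harvest year."""
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        # Every sample from the other years goes into the calibration segment.
        train_idx = sorted(
            i for year, idx in by_year.items() if year != held_out for i in idx
        )
        yield held_out, train_idx, test_idx

# Hypothetical harvest years for six samples (not data from the article).
years = [2006, 2006, 2007, 2007, 2008, 2008]
splits = list(leave_one_year_out(years))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Each split would be used to fit a candidate model on `train_idx` and compute an SEP on `test_idx`; once the parameters are chosen, the final model is refitted on all samples.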
At CRA-W, the NIR lab is accredited under ISO17025 for feed analysis and the auditors accepted our validation method based on a test set. We did not even mention how the calibration was done and which samples were used for the modelling. Based on historical data sets gathered for more than 25 years, the traceability of most of the samples is gone but the spectral information is still valid to build robust models. A proper validation and continuous monitoring processes are essential parts of the NIR technology.
Calibration and validation tables
During NIR-2009, the abstract book contained papers in which it was difficult to find information about the data sets, and sometimes I could not even find the units of the parameters. This gave me the idea of presenting the kind of table shown here, which NIR users can use to report their results internally as well as for publication. It is probably presumptuous, but I hope that these tables will become a “standard” for reporting results. Using them will make the reading of articles and the comparison between results much easier. Table 1 is an example of a calibration report.
Table 1. Statistics of calibration results.
Parameters | Fat |
Units | %DM or as is |
SEL—Reproducibility | 0.25 |
N | 65 |
Outliers | 0 |
Min | 6.39 |
Mean | 10.49 |
Max | 15.08 |
SD | 1.59 |
SEC | 0.33 |
R^{2}C | 0.93 |
SECV | 0.38 |
R^{2}CV | 0.94 |
NIR repeatability | 0.19 |
Number of terms | 3 |
RPD_{C} | 4.82 |
RPD_{CV} | 4.18 |
Segments (LOO) | 4 |
WL range/step | 100–2498/2 |
Pretreatment(s) | SNV-DE-D1,4,4 |
Reg. method | PLS |
The first two lines are obvious: the name of the parameter and the units; it is also good to state whether the values are expressed on a dry matter basis or as is. The SEL line refers to the error of the reference data. Duplicate determinations (not necessarily for all the samples) must be performed. If the analyses are done outside your own lab, incorporating several blind duplicates (same batch and different batches) is probably the best way to estimate the error of the reference method. It can be the intermediate reproducibility if working in the same lab, or the reproducibility if working within a network.
The number of samples (not the number of scans) is reported together with the number of outliers removed. Outliers cause a big debate! The influence of outliers is greater when the number of samples is quite small: the statistics can change drastically just by removing one or two large prediction errors. If the outlier shows an unusual spectrum, it is easy to detect it and to remove it legitimately. If an outlier is an atypical but genuine sample, the question is more complicated and depends on the goal of the model: to predict the average samples well or to predict the extreme samples as well as possible.
The stats of the Y data include the minimum, the mean, the maximum and the standard deviation of the reference values.
At this stage, if SEL and SD are known (no NIR involved yet), it is possible to calculate the maximum R^{2} of any calibration. If
R^{2} = 1 – (SEC^{2} / SD^{2})
and if SEC is replaced by SEL, the result is the maximum R^{2} we could get with no error in the spectra or the model. Sometimes these two values are sufficient grounds to give up NIR model development: a low maximum R^{2} means a range that is too narrow and/or a reference method that is not sufficiently precise.
Then the model statistics are reported: standard error of calibration, calibration R^{2}, standard error of cross-validation and cross-validation R^{2}. A large gap between SEC and SECV is an indication of a sample set that is too small.
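As a small worked example, assuming the usual relation R^{2} = 1 – SEC^{2}/SD^{2} with SEC replaced by SEL, the ceiling on R^{2} can be computed from the SEL and SD values of Table 1. The function name `max_r2` is only illustrative.

```python
def max_r2(sel, sd):
    """Upper bound on R^2 when the only error is the reference error (SEL)."""
    return 1.0 - (sel / sd) ** 2

# SEL and SD taken from Table 1 (fat).
r2_max = max_r2(sel=0.25, sd=1.59)
print(round(r2_max, 3))  # 0.975

# A narrow range or an imprecise reference method lowers the ceiling quickly.
r2_max_narrow = max_r2(sel=0.25, sd=0.50)
print(round(r2_max_narrow, 3))  # 0.75
```

With the Table 1 values the ceiling is about 0.975, so the reference method and the range are not the limiting factors here; in the second, hypothetical case no model could exceed an R^{2} of 0.75.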
The next parameter, NIR repeatability, is omitted most of the time in publications. It is not reported because, with homogeneous samples, NIR is generally very repeatable and the sampling error (sampling from the container) is small relative to the accuracy. When dealing with whole grain or wet forage, the repeatability declines as samples become more heterogeneous, so to ensure accuracy and repeatability it is sometimes necessary to use more than one sub-sample and to repack the sample cups. A rule of thumb is to get SE_{NIR} < 0.5 SECV. As the errors are squared, the repeatability then accounts for “only” 25% of the squared prediction error.
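The arithmetic behind the rule of thumb can be sketched as below, assuming that the repeatability error and the remaining errors add as variances. The function name `repeatability_share` is illustrative.

```python
def repeatability_share(se_nir, secv):
    """Fraction of the squared cross-validation error due to NIR repeatability."""
    return (se_nir / secv) ** 2

# NIR repeatability and SECV taken from Table 1.
share = repeatability_share(se_nir=0.19, secv=0.38)
print(round(share, 2))  # 0.25

# Rule of thumb from the text: SE_NIR should stay below 0.5 * SECV.
meets_rule = 0.19 < 0.5 * 0.38
print(meets_rule)
```

In Table 1 the repeatability is exactly half of SECV, so the example sits right at the limit of the rule rather than strictly below it.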
The next statistics are the RPD for calibration and cross-validation. Notice that RPD is directly linked to R^{2} [RPD = 1 / √(1 – R^{2})]. It is nevertheless more discriminating than R^{2}, especially when R^{2} is close to 1.
The next line refers to the number of cross-validation segments: the value can vary from two to N, with N segments corresponding to the leave-one-out method. SECV decreases as the number of segments increases, so this parameter is also important. My rule of thumb here is to work with only two cross-validation segments; a difference between SEC and SECV of less than 5% then indicates that the model is robust enough for routine work.
The last three lines report the wavelength range, the pre-treatment(s) and the regression method. If the latter is not PLS, additional information would be needed, such as the architecture for ANN or the regularisation parameters for SVM.
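A short sketch of the link between RPD and R^{2}, using the formula quoted above together with the standard definition RPD = SD / standard error. The function names are illustrative.

```python
import math

def rpd_from_r2(r2):
    """RPD implied by a given R^2: RPD = 1 / sqrt(1 - R^2)."""
    return 1.0 / math.sqrt(1.0 - r2)

def r2_from_rpd(rpd):
    """Inverse relation: R^2 = 1 - 1 / RPD^2."""
    return 1.0 - 1.0 / rpd ** 2

# RPD spreads out values that look almost identical on the R^2 scale.
for r2 in (0.90, 0.95, 0.98, 0.99):
    print(r2, round(rpd_from_r2(r2), 2))

# RPD computed directly as SD / SE with the Table 1 values.
rpd_c = 1.59 / 0.33    # calibration
rpd_cv = 1.59 / 0.38   # cross-validation
print(round(rpd_c, 2), round(rpd_cv, 2))  # 4.82 4.18
```

Going from R^{2} = 0.98 to 0.99 looks like a marginal gain, yet the RPD rises from about 7 to 10, which is why RPD separates good models more clearly.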
The validation table (Table 2) reports the same elements down to the reference SD, followed by the R^{2} of prediction, the root mean square error of prediction (RMSEP) and the standard error of prediction (SEP = RMSEP corrected for bias). Notice that one degree of freedom is lost in SEP, so when the bias is very small or zero, SEP can be higher than RMSEP. RSD is the residual standard deviation, meaning the errors after bias and slope correction, or the errors along the fitted simple regression line (loss of two degrees of freedom). The NIR repeatability is again reported if the prediction set has been measured with replicates. The next three lines report the bias, the intercept and the slope of the regression Y_{ref} = a + b × Y_{NIR}. Do use the regression in this form. The reverse, Y_{NIR} = a′ + b′ × Y_{ref}, as in Unscrambler, gives a slope different from 1 even for the calibration data, which seems strange. Note that bias and intercept are two different statistics: the bias only equals the intercept when the slope is equal to 1.
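The statistics listed above can be computed as in the following sketch. It assumes residuals taken as reference minus NIR (the sign convention is an assumption) and uses made-up values, not the data behind Table 2; the function name `validation_stats` is illustrative.

```python
import math

def validation_stats(y_ref, y_nir):
    """Bias, RMSEP, SEP, slope, intercept and RSD for paired reference/NIR values."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    mean_nir = sum(y_nir) / n
    # Residuals as reference minus NIR (assumed sign convention).
    resid = [r - p for r, p in zip(y_ref, y_nir)]
    bias = sum(resid) / n
    rmsep = math.sqrt(sum(e * e for e in resid) / n)
    # SEP: bias removed, one degree of freedom lost.
    sep = math.sqrt(sum((e - bias) ** 2 for e in resid) / (n - 1))
    # Least squares regression Y_ref = intercept + slope * Y_NIR.
    sxx = sum((p - mean_nir) ** 2 for p in y_nir)
    sxy = sum((p - mean_nir) * (r - mean_ref) for r, p in zip(y_ref, y_nir))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_nir
    # RSD: errors around the regression line, two degrees of freedom lost.
    ss_res = sum((r - intercept - slope * p) ** 2 for r, p in zip(y_ref, y_nir))
    rsd = math.sqrt(ss_res / (n - 2))
    return {"n": n, "mean_nir": mean_nir, "bias": bias, "rmsep": rmsep,
            "sep": sep, "slope": slope, "intercept": intercept, "rsd": rsd}

# Made-up example values.
y_ref = [5.0, 6.5, 8.0, 9.5, 11.0]
y_nir = [5.4, 6.2, 8.3, 9.1, 11.5]
stats = validation_stats(y_ref, y_nir)
print({k: round(v, 3) for k, v in stats.items()})

# With (almost) zero bias, SEP exceeds RMSEP because of the lost degree of freedom.
unbiased = validation_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9])
print(unbiased["sep"] > unbiased["rmsep"])
```

Two identities hold in this formulation: RMSEP^{2} = bias^{2} + SEP^{2} × (n – 1)/n, and bias = intercept + (slope – 1) × mean(Y_{NIR}), which shows why bias and intercept coincide only when the slope is 1.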
Table 2. Statistics of validation or monitoring results.
Parameters | Fat |
Units | %DM or as is |
N | 20 |
Outliers | 0 |
Min | 5 |
Mean | 8 |
Max | 11 |
SD | 1.21 |
R^{2}P | 0.65 |
RMSEP | 2.12 |
SEP | 1.22 |
RSD | 0.73 |
NIR repeatability | 0.19 |
Bias | –1.74 |
Intercept | 3.13 |
Slope | 0.49 |
Av. GH | 5.6 |
Av. NH | 2.5 |
The last two parameters are probably the most important ones. They are called here global H and neighbourhood H, as in Foss’ WinISI package. Each package has its own terminology, but Mahalanobis distances (PLS score space is better than PCA score space), or any statistic related to these distances, are needed. They tell whether the model is suitable for the samples analysed. Very high distances mean that the samples are not yet represented in the calibration data set, and it is well known that extrapolation is very dangerous in NIR analysis.
The validation or monitoring processes are generally based on far fewer samples than the calibration, and a scatter plot of the predicted vs reference values is advised. In an Excel spreadsheet, it is easy to get this kind of graph, in which all the statistics of the table are represented.
Figure 1 is the scatter plot of predictions vs reference. The graph has a 45° line, the 45° line corrected for bias, the least rectangles line (applying this slope makes the standard deviations of the reference and predicted values equal: slope = SD_{ref} / SD_{NIR}) and the classical least squares line (slope = cov(Y_{NIR}, Y_{ref}) / SD^{2}_{NIR}).
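A minimal sketch of distance statistics of this kind is given below. It assumes that the calibration scores are uncorrelated between components (as PCA scores are), so that the Mahalanobis distance reduces to a sum of squared, variance-scaled scores, and it divides the squared distance by the number of components. Whether this scaling matches WinISI's GH and NH exactly is an assumption; the function names and the score values are hypothetical.

```python
def score_stats(cal_scores):
    """Column means and variances of the calibration scores."""
    n = len(cal_scores)
    k = len(cal_scores[0])
    means = [sum(row[j] for row in cal_scores) / n for j in range(k)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in cal_scores) / (n - 1)
        for j in range(k)
    ]
    return means, variances

def scaled_d2(a, b, variances):
    """Squared Mahalanobis distance between a and b (diagonal covariance), per component."""
    k = len(a)
    return sum((a[j] - b[j]) ** 2 / variances[j] for j in range(k)) / k

def global_h(sample, means, variances):
    """Distance of a sample to the centre of the calibration set."""
    return scaled_d2(sample, means, variances)

def neighbourhood_h(sample, cal_scores, variances):
    """Distance of a sample to its closest calibration neighbour."""
    return min(scaled_d2(sample, row, variances) for row in cal_scores)

# Hypothetical two-component scores for four calibration samples.
cal_scores = [[-1.0, 0.5], [0.0, -1.0], [1.0, 0.5], [0.0, 0.0]]
means, variances = score_stats(cal_scores)

inside = [0.2, 0.1]    # close to the calibration population
outside = [4.0, 3.0]   # far from it: prediction would be an extrapolation
print(global_h(inside, means, variances), neighbourhood_h(inside, cal_scores, variances))
print(global_h(outside, means, variances), neighbourhood_h(outside, cal_scores, variances))
```

A sample with a large global distance lies outside the calibration population, and a large neighbourhood distance indicates that no similar sample exists in the calibration set even if the sample is not far from the centre.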
Figure 1. Scatter plot of predicted vs reference values.
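The four lines of such a plot can be computed as in the sketch below, with NIR predictions on the x-axis and reference values on the y-axis. It assumes a positive correlation between the two, makes the fitted lines pass through the point of means, and uses made-up values; the function name `plot_lines` is illustrative.

```python
import statistics

def plot_lines(y_ref, y_nir):
    """Return (intercept, slope) of each line, in the form Y_ref = a + b * Y_NIR."""
    mean_ref = statistics.mean(y_ref)
    mean_nir = statistics.mean(y_nir)
    sd_ref = statistics.stdev(y_ref)
    sd_nir = statistics.stdev(y_nir)
    n = len(y_ref)
    cov = sum((p - mean_nir) * (r - mean_ref) for r, p in zip(y_ref, y_nir)) / (n - 1)
    bias = mean_ref - mean_nir            # reference minus NIR (assumed sign convention)
    lr_slope = sd_ref / sd_nir            # least rectangles slope
    ls_slope = cov / sd_nir ** 2          # classical least squares slope
    return {
        "identity": (0.0, 1.0),
        "bias_corrected": (bias, 1.0),
        "least_rectangles": (mean_ref - lr_slope * mean_nir, lr_slope),
        "least_squares": (mean_ref - ls_slope * mean_nir, ls_slope),
    }

# Made-up example values.
y_ref = [5.0, 6.5, 8.0, 9.5, 11.0]
y_nir = [5.4, 6.2, 8.3, 9.1, 11.5]
lines = plot_lines(y_ref, y_nir)
for name, (a, b) in lines.items():
    print(f"{name}: Y_ref = {a:.3f} + {b:.3f} * Y_NIR")
```

Since the least squares slope equals the correlation coefficient times the least rectangles slope, it can never be the steeper of the two; the weaker the correlation, the larger the gap between these two lines.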
This is the essential part of my message of the closing ceremony at NIR-2009. I hope it will be helpful. I would appreciate your feedback and I am open to receive comments about it from anyone.
Reference
1. K. Norris, J. Near Infrared Spectrosc. 17, 165–166 (2009). doi: 10.1255/jnirs.844