RMSEC or RMSECV: which is better?

ForestBiotech's picture

Hi all,
I am a new NIR user. I am trying to calibrate honey samples by PLS on a Thermo microPHAZIR GP, and I noticed that the model reports two errors, RMSEC and RMSECV, and an optimal number of factors based on each (in the photo, the lowest RMSEC uses 16 factors and the lowest RMSECV uses 1 factor).
Which should I choose?
Thanks in advance
Jorge G. 

Uploaded Images: 
gabiruth's picture

Dear Jorge,
Please let us know first what the format of the spectra is: absorbance, or first or second derivative?
Second, could you tell us what you are trying to calibrate for? What are the units? %?
Once we know this it will be easier to understand whether 16 PCs are too many. Do you have a function in your chemometric package called loading weights for each PC?
Gabi Levin

shileyda's picture

Hello Jorge,
Yes, Gabi has a good suggestion to examine the loading weights. If the loading weights appear to be just noise rather than real features, that would be an indication of overfitting. I would also recommend that you check your RMSE plots.
Regarding which is better: RMSECV. Cross-validation is a much better estimator of true error than RMSEC. But neither of these is a measure of true error. That requires an independent test set that was not used in creating the model, from which RMSEP is calculated. You should always reserve a portion of the total available samples to use as an independent test set. In a properly fitted model the RMSECV should be very similar to RMSEP.
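For reference, the three errors mentioned here can be written with the same root-mean-square form; they differ only in where the prediction for each sample comes from. This is a sketch under the simplest convention (dividing by n). Some software divides RMSEC by the remaining degrees of freedom instead, so the notation below is an assumption and not necessarily what the instrument software reports.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
% RMSEC:  \hat{y}_i comes from the model fitted on all calibration samples, including sample i
% RMSECV: \hat{y}_i comes from a model fitted with sample i (or its fold) left out
% RMSEP:  \hat{y}_i is predicted for samples in an independent test set never used in modelling
```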
Best regards,

ForestBiotech's picture

Hi Gabi and Shineyda
Your help is greatly appreciated!
About your questions:
My spectral data are in log(1/R), and I'm trying to measure moisture (%) in 60 honey samples, of which I use 48 (80%) for calibration.
The software gives me a loading plot, but I have not taken it into account (photo).
Using the validation samples (25%), I obtained the following values:

RMSE= .8722175

Average Error= .1801532

R-squared= .240706948376612

Slope= .3439107

Offset= 11.17503
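Statistics like the ones listed above can be reproduced from the reference and predicted values alone. The following is a minimal sketch, assuming that "Average Error" means the mean of (predicted - reference), that slope and offset come from regressing predicted on reference, and that R-squared is the squared correlation; the instrument software may define them differently. The function name `validation_stats` and the moisture values are hypothetical.

```python
import math


def validation_stats(reference, predicted):
    """Summary statistics for predicted vs. reference values (assumed definitions)."""
    n = len(reference)
    errors = [p - r for p, r in zip(predicted, reference)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # "average error": mean of (predicted - reference)

    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))

    slope = sxy / sxx                   # predicted regressed on reference
    offset = mean_p - slope * mean_r    # intercept of that regression line
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return {"rmse": rmse, "bias": bias, "r_squared": r_squared,
            "slope": slope, "offset": offset}


# Hypothetical moisture values (%), for illustration only
reference = [16.2, 17.0, 17.8, 18.5, 19.1, 20.0]
predicted = [16.9, 17.1, 17.6, 18.0, 18.4, 18.9]
stats = validation_stats(reference, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A slope well below 1 together with a large offset means the predictions are compressed toward the mean of the calibration set, which is the pattern in the values reported above.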


Thanks in advance

Uploaded Images: 

Jorge M. González Campos
Instituto Forestal (INFOR). Chile

dwhopkins's picture

Hi Jorge,
Your validation stats show that the 16-factor model is not acceptable. You should see a slope of at least 0.8 (0.9 would be better), with an RMSEP (RMSE) closer to your RMSEC.
You need to look at the RMSECV value using the same number of factors as the model you would like to evaluate. Clearly NF = 1 is not good. To my eye the loading at NF = 10 may be acceptable, as it still has signal at 1940 nm, where we expect to see information for water. You need to look at a plot of SEC vs NF to see where model overfitting becomes a problem, which apparently happens somewhere below NF = 16. Also look at SECV vs NF.
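The comparison of SEC and SECV against NF can be sketched in a few lines. This is only an illustration on simulated spectra, assuming a plain NIPALS PLS1 written with NumPy and 6-fold interleaved cross-validation; the helper names (`pls1_fit`, `pls1_predict`, `error_curves`) and all data are hypothetical, and the instrument software may use a different algorithm or cross-validation scheme.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk                        # weight: covariance of X with y residual
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores for this factor
        tt = t @ t
        p = Xk.T @ t / tt                    # X loading
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def error_curves(X, y, max_factors, n_folds=6):
    """RMSEC and RMSECV for 1..max_factors, using interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    rmsec, rmsecv = [], []
    for nf in range(1, max_factors + 1):
        # Calibration error: predict the same samples the model was fitted on
        rmsec.append(rmse(y, pls1_predict(pls1_fit(X, y, nf), X)))
        # Cross-validation error: each sample is predicted by a model that never saw it
        cv_pred = np.empty_like(y)
        for k in range(n_folds):
            test = folds == k
            model = pls1_fit(X[~test], y[~test], nf)
            cv_pred[test] = pls1_predict(model, X[test])
        rmsecv.append(rmse(y, cv_pred))
    return rmsec, rmsecv


# Simulated data, for illustration only: one "water" band plus a varying baseline and noise
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 48, 100
y = rng.uniform(15.0, 21.0, n_samples)                    # moisture, %
axis = np.linspace(0.0, 1.0, n_wavelengths)
water_band = np.exp(-((axis - 0.6) / 0.05) ** 2)
baseline = rng.normal(1.0, 0.1, (n_samples, 1)) * (0.5 + axis)
noise = rng.normal(0.0, 0.01, (n_samples, n_wavelengths))
X = 0.02 * np.outer(y, water_band) + baseline + noise

rmsec, rmsecv = error_curves(X, y, max_factors=16)
for nf, (c, cv) in enumerate(zip(rmsec, rmsecv), start=1):
    print(f"NF={nf:2d}  RMSEC={c:.3f}  RMSECV={cv:.3f}")
```

In a table or plot like this, RMSEC keeps falling as factors are added, while RMSECV typically flattens or rises once the extra factors start fitting noise. The number of factors is usually chosen near that point, and the two errors are compared at that same NF.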
What is the accuracy of your moisture reference method?  A good method is to calculate the SDD (Std Dev of Differences) for 10 to 20 samples measured in duplicate by the ref method.  It is a good idea to measure the NIR spectra in duplicates too, so that you can calculate the SDD for the NIR method too, and compare to the lab method.
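The duplicate-based check can be sketched as follows, assuming SDD is taken as the sample standard deviation of the paired differences between the two measurements of each sample (other conventions exist). The function name `sdd` and the moisture values are hypothetical.

```python
import statistics


def sdd(first, second):
    """Sample standard deviation of the paired differences between duplicates."""
    differences = [a - b for a, b in zip(first, second)]
    return statistics.stdev(differences)


# Hypothetical duplicate moisture results (%) from the reference method, for illustration only
lab_run_1 = [17.1, 18.4, 16.9, 19.2, 17.8, 18.0, 16.5, 19.0, 17.4, 18.7]
lab_run_2 = [17.3, 18.2, 17.0, 19.5, 17.6, 18.1, 16.8, 18.8, 17.5, 18.6]
print(f"SDD (reference method): {sdd(lab_run_1, lab_run_2):.3f}")
```

The same function can be applied to duplicate NIR predictions, so the two SDD values can be compared directly.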
How did you measure the NIR spectra?  Did you use vials or round test tubes, or cuvettes with flat windows?  What was the optical path?  Were the samples shielded from room light?  Can you show us the spectra (log(1/R) ) for the calibration set?  You did not mention whether you used any spectral pretreatments, such as derivatives and MSC or SNV.  Honey samples will probably require spectral pretreatments for optimum PLS results.
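As an illustration of the pretreatments mentioned, here is a minimal sketch of SNV (each spectrum is centred and scaled by its own standard deviation) and a simple gap first difference. The first difference is a stand-in assumption for a proper derivative; Savitzky-Golay derivatives are more commonly used in practice. The spectrum values are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(value - mean) / sd for value in spectrum]


def first_difference(spectrum, gap=1):
    """Crude first derivative: difference between points `gap` positions apart."""
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]


# Hypothetical log(1/R) values, for illustration only
raw = [0.42, 0.45, 0.51, 0.63, 0.58, 0.49, 0.44]
# Same spectrum with an additive offset and a multiplicative scatter effect
shifted = [1.3 * value + 0.2 for value in raw]

print([round(v, 3) for v in snv(raw)])
print([round(v, 3) for v in snv(shifted)])  # matches the line above after SNV
print([round(v, 3) for v in first_difference(raw)])
```

SNV removes a constant offset and a multiplicative scaling from each spectrum, which is why the raw and shifted spectra give the same result.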
Best wishes,
Dave Hopkins

td's picture

Hello Jorge G.,


Welcome to the Forum.


(This reply was posted on 9 Feb but did not get sent [my mistake] so you have answered most of my questions and you have received good advice)


The quick reply is RMSECV is better.

RMSEC is measuring how well the calibration worked. It does NOT tell you how well it will predict future samples.


RMSECV should be the result obtained by measuring an independent set of samples and does indicate the possible performance of your calibration.


However, the experiments must be done correctly. Beware: some software will give you an RMSECV value even when you have not done the correct experiment.


Would you like to provide some additional details of your experiment so that I can give you a more confident answer? I would also be interested to know what you are measuring in your honey samples.


I first worked on honey analysis in 1965!!


Best wishes,







ianm's picture

Neither one is any good. For RMSEC, 16 factors is too many, and for RMSECV, 1 factor is too few: it is impossible for a single PLS factor to account for all of the variance in both the spectral and reference data. You have not stated which constituents you are trying to calibrate for. If it is sugar content and your reference method is a reducing sugar method, you should know that honey can contain 1–3% sucrose (possibly more). Sucrose is non-reducing and will introduce a source of error into your reference testing. I suggest that you start over, with at least 80–100 samples of honey.


(posted by Ian Michael on behalf of Phil Williams)