RMSEP vs RMSEC

NIR Discussion Forum » Bruce Campbell's List » I need help » RMSEP vs RMSEC


Tony Davies (td)
Moderator
Username: td

Post Number: 190
Registered: 1-2001
Posted on Tuesday, May 19, 2009 - 3:55 am:   

Howard,

Sorry, I was a little sloppy. I should have said that in the first case (200 samples), when we combine the first two sets (calibration and testing) to determine the "final" calibration, we obtain an SECV. There is always a slight suspicion that this is over-optimistic. When you have a large number of samples you do not need to combine the calibration and testing sets, and so you have a true (independent) SEC.

Best wishes,

Tony

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 229
Registered: 9-2001
Posted on Friday, May 15, 2009 - 10:25 am:   

Tony - I don't see the distinction you're drawing between sample sets with 200 or fewer samples, and "large" sets, with 1,000 or more. In both cases you're dividing the samples into three subsets: factor selection, calibration, and validation.

I don't even see that the allocation of samples among those three sets would be greatly impacted. In both cases you'd want to follow the same principles for sample allocation, regardless of the details. Am I missing something?

\o/
/_\

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 84
Registered: 3-2004
Posted on Thursday, May 14, 2009 - 11:41 pm:   

Thanks to Davies for the write-up about RMSEP and RMSEC. I regularly read your articles in Spectroscopy. Some of them are nice.
Calibration, validation and prediction may be explained in a simple way, like this.
Usually we (Indians) go to a tailor to have a new shirt stitched. We go wearing a well-fitting shirt and give the tailor a proper explanation of what we want in the new one. The tailor takes measurements and offers a few suggestions based on the structure of the body. Finally the user and the tailor come to a conclusion, and the tailor fixes an appointment with the user for a fitting test. The user, along with well-wishers (friends), visits on the day of the fitting and suggests corrections, if any. Once that is accepted, the tailor does the final stitching and makes the shirt ready. The tailor gives the user a code number. Once the stitching has been well appreciated by one and all, the next time the user sends only the material and the code to the tailor.
In the NIR case, the first part is calibration: the user and the model maker (the tailor in the example) sit together and make it.
Validation is like the fitting test: the user goes to the model maker and suggests the few corrections that he needs.
In prediction, the user sends the code and the model maker (like the tailor) does the rest.
People (NIR working groups) are still not clear in distinguishing between SEP and SEV. We have to work on it.

Tony Davies (td)
Moderator
Username: td

Post Number: 188
Registered: 1-2001
Posted on Thursday, May 14, 2009 - 6:08 pm:   

Hello Dan,

"My" advice nearly always comes from Tom Fearn, which makes it much more valuable!
We are very opposed to selection by PCA (or any other technique) of the calibration set, if you then use what is left over for validation.

I think most people sort the data according to the analyte value and then take samples in turn into two or three sets. Use sets 1 and 2 to determine the pre-processing and number of factors for the best calibration. Then combine the two sets to determine a calibration using the chosen combination of pre-processing and number of factors via cross-validation. Then test this calibration with the third (validation) set.
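The "sort, then take samples in turn" allocation described above can be sketched in a few lines of Python. This is only an illustration: the function name `interleaved_split` and the nine moisture values are hypothetical and do not come from the thread.

```python
def interleaved_split(y_values, n_sets=3):
    """Sort sample indices by analyte value, then deal them in turn into n_sets groups."""
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    # Sample k of the sorted list goes to set k modulo n_sets.
    return [order[k::n_sets] for k in range(n_sets)]


# Hypothetical analyte values (% w/w) for nine samples.
y = [12.1, 10.0, 18.2, 15.4, 13.3, 11.0, 16.8, 14.2, 17.5]
set1, set2, set3 = interleaved_split(y)

for name, idx in (("set 1", set1), ("set 2", set2), ("set 3", set3)):
    print(name, sorted(y[i] for i in idx))
```

Because neighbouring samples in the sorted list land in different sets, each set covers roughly the same analyte range, and every sample is used exactly once.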

Assuming that you have a reasonable number of samples (about 200) then we work with this 2:1 ratio but if you have really large data sets (1,000s) then you can probably do no better than have three completely separate sets.

Remember that the rule with outliers is that it is a good idea to remove them from the calibration but you need really good (independent) evidence against a sample before you remove it from the validation set.

The decision on numbers of samples in the final calibration and validation sets is a compromise between getting as much variation into the calibration set while having confidence in the SEP from the validation set. If the calibration is "good" then the SEP will be similar to the SE of the reference chemistry. If the calibration is "not too good" then you might want to have more samples in the validation set, especially if the SEP is surprisingly good.
While the goal is to use samples only once, you may well have to repeat the calibration several times, but you need to try to be unbiased in how you form the sets.
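For readers who want to compare an SEP with the SE of the reference method, here is a minimal sketch of one common way to compute bias, SEP and RMSEP from a validation set. The definitions are an assumption (SEP as the bias-corrected standard deviation of the residuals with an n - 1 denominator, RMSEP as the root mean square of the raw residuals); software packages differ in detail. The data are hypothetical.

```python
import math


def validation_errors(reference, predicted):
    """Return (bias, SEP, RMSEP) for a validation set."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    n = len(residuals)
    bias = sum(residuals) / n
    # SEP: standard deviation of the residuals about the bias (n - 1 denominator).
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    # RMSEP: root mean square of the raw residuals, so it includes the bias.
    rmsep = math.sqrt(sum(e ** 2 for e in residuals) / n)
    return bias, sep, rmsep


# Hypothetical reference and predicted values, % H2O w/w.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.1, 12.2, 13.9, 16.3, 18.0]

bias, sep, rmsep = validation_errors(reference, predicted)
print(f"bias={bias:.3f}  SEP={sep:.3f}  RMSEP={rmsep:.3f}")
```

With these definitions, RMSEP^2 = bias^2 + (n - 1)/n * SEP^2, so the two figures agree closely when the bias is small and the validation set is reasonably large.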

Best wishes,

Tony

Bruce H. Campbell (campclan)
Moderator
Username: campclan

Post Number: 116
Registered: 4-2001
Posted on Thursday, May 14, 2009 - 3:40 pm:   

Here is the response from Dan Miller.

Thanks to all of you for your responses. I can't think of any other field in which one has such access to recognized experts.

First, the calibration and validation data sets are balanced in terms of range and the distribution of the data. Here are some statistics:

            Calibration   Validation
RMSE        0.15          0.10
R^2         0.996         0.997
slope       0.996         0.992
intercept   0.06          0.15
range       10.0-18.2     10.0-18.2
mean        13.6          13.4
SD          2.2           2.2

Units for RMSEC, RMSEP, intercept, range, mean, and SD are % H2O w/w. Note that the intercept is greater for the validation. These samples have a high water content, so calculation of the intercept involves a considerable extrapolation of the data.
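As an illustration of how statistics like those in the table might be computed, the sketch below regresses predicted values on reference values. That convention is an assumption; the post does not say which way the regression was run or which software produced the figures. The data are synthetic, built so that the fitted slope and intercept are known in advance.

```python
import math


def fit_statistics(reference, predicted):
    """RMSE, R^2, slope and intercept for predicted-versus-reference values."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    slope = sxy / sxx
    # The intercept is the fitted line evaluated at a reference value of zero.
    intercept = mean_p - slope * mean_r
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    rmse = math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)
    return {"rmse": rmse, "r2": r_squared, "slope": slope, "intercept": intercept}


# Synthetic data: predictions lie exactly on the line 0.992 * reference + 0.15.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [0.992 * r + 0.15 for r in reference]

stats = fit_statistics(reference, predicted)
print(stats)
```

Since the intercept is evaluated at 0 % water while the samples span roughly 10 to 18 %, a small change in slope moves the intercept noticeably, which is the extrapolation effect mentioned above.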

Dr. Kramer - your comments on validating a data set are certainly thought provoking. I believe that Tony Davies also shares your viewpoint (i.e., that selecting a "winning" calibration using a given data set no longer makes the data set useful for independent validation). Your argument makes sense, and I will follow your suggestion and see what I learn. Before I do that, I have a few questions:

Let's assume that I have 200 spectra (from 200 independent samples), and that I split the data set into:
80 samples for calibration
60 samples for selecting the spectral range, preprocessing technique(s), and number of factors (i.e., for selecting a "winning" model)
60 samples for external validation of the model.

If I then arrive at a final model, I would then use the third data set to validate the model. Another approach in which I "lumped" the second and third data sets into one group (after selecting the "winning" model) would bring me back to the original problem (i.e., using some of the same data twice) and would not be recommended. Would you agree?
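The three-set workflow described here (fit on the first set, pick the winner on the second, touch the third only once) can be sketched as follows. Everything in the example is a stand-in: the two toy candidate models take the place of real combinations of spectral range, preprocessing and number of factors, the data are synthetic, and the function names are hypothetical.

```python
import math
import random


def rmse(model, X, y):
    """Root mean square error of a fitted model on (X, y)."""
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


def fit_mean(X, y):
    """Toy candidate 1: always predict the calibration mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_line(X, y):
    """Toy candidate 2: univariate least-squares line."""
    n = len(y)
    mx, my = sum(X) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(X, y)) / sum((a - mx) ** 2 for a in X)
    return lambda x: my + slope * (x - mx)


def select_then_validate(candidates, cal, sel, val):
    # Fit every candidate on the calibration set only.
    fitted = {name: fit(*cal) for name, fit in candidates.items()}
    # Choose the winner using the selection set only.
    winner = min(fitted, key=lambda name: rmse(fitted[name], *sel))
    # Use the validation set exactly once, for the winning model.
    return winner, rmse(fitted[winner], *val)


# Synthetic data: 200 samples with a single predictor and Gaussian noise.
rng = random.Random(0)
x_all = [rng.uniform(0, 200) for _ in range(200)]
y_all = [10.0 + 0.04 * x + rng.gauss(0, 0.15) for x in x_all]

cal = (x_all[:80], y_all[:80])          # 80 samples for calibration
sel = (x_all[80:140], y_all[80:140])    # 60 samples for model selection
val = (x_all[140:], y_all[140:])        # 60 samples for external validation

winner, rmsep = select_then_validate({"mean": fit_mean, "line": fit_line}, cal, sel, val)
print(winner, round(rmsep, 3))
```

The point of the structure is that the validation set plays no part in choosing the winner, so the final error estimate is not biased by the selection step.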

A comment that surprised me was that you recommended using more validation samples than calibration samples. I was always under the impression that one wanted to capture the greatest amount of variance in the calibration data set and should therefore err on the side of a larger calibration data set. Do you have any suggestions for how you would split the above 200-point data set? What would you think of doing a PCA of all 200 samples, selecting the calibration samples so that they span the principal component space, and then splitting the remaining samples into data sets "two and three" (all the while making sure that the distributions of the y-values among the three data sets are similar)?

I recognize that prescribing an exact number of samples, factors, and how to split a given data set cannot usually be done without knowing more about the problem. Though the answer to such questions is often "it depends", it still would be helpful to use your suggestions as a starting point for what I should try next.

Thanks again to all of you for your time and consideration. I am learning a lot from you...

Tony Davies (td)
Moderator
Username: td

Post Number: 198
Registered: 1-2001
Posted on Tuesday, August 11, 2009 - 9:04 am:   

Hello Salman,

Welcome to the NIR community!

There have been some good studies of protein structure by NIR spectroscopy.

I think it is unlikely that a UV/vis/NIR spectrometer will be optimised for NIR measurements.
In order to do a useful protein study you need an excellent spectrometer which was designed for the NIR region.

Are you located in Japan? If you are I suggest you contact Prof. Yuki Ozaki (ksc.kwansei.ac.jp).

Best wishes,

Tony
