Judging PLS models for NIRS prediction

NIR Discussion Forum » Bruce Campbell's List » I need help » Judging PLS models for NIRS prediction


David Cameron (david_cameron)
New member
Username: david_cameron

Post Number: 5
Registered: 3-2011
Posted on Sunday, March 06, 2011 - 11:17 pm:   

I suggest you look at ASTM E1655, and at the other ASTM documents you will find if you search on the word MULTIVARIATE.
In brief, the document details how to do a t-test on a validation set.
R and R-squared are of little value in this exercise, as are cross-validation SEP and a number of other modelling parameters; in fact, they are not mentioned in the validation procedure at all. If you look closely at the validation procedure in this document, you will see that it is a standard laboratory analyser comparison: you have two sets of numbers, one from NIR and the other from a lab method, and you want to know whether the two sets of results are "the same". This, of course, does not require any of the multivariate package outputs.

It is, in my experience, exactly the kind of comparison most lab managers are entirely familiar with.
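A minimal sketch of that analyser comparison is a paired t-test on the differences between the two methods. All the numbers below are hypothetical, and this is only an illustration in the spirit of the ASTM E1655 approach, not the standard's exact procedure:

```python
# Paired comparison of NIR predictions against lab reference values:
# test whether the mean difference (bias) between the two methods is zero.
# All numbers below are hypothetical.
import math

nir = [5.2, 12.1, 18.4, 25.0, 31.7, 38.2, 44.9, 51.3, 55.8, 59.1]  # NIR predictions
lab = [5.0, 12.5, 18.0, 25.6, 31.0, 38.9, 44.1, 52.0, 55.2, 59.8]  # lab reference

diffs = [a - b for a, b in zip(nir, lab)]
n = len(diffs)
mean_d = sum(diffs) / n                                   # bias between methods
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t_stat = mean_d / (sd_d / math.sqrt(n))                   # paired t statistic

print(f"bias = {mean_d:.3f}, t = {t_stat:.3f}")
```

For 9 degrees of freedom the two-tailed 95% critical value is about 2.262, so a |t| smaller than that means the two methods show no significant bias at that level.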

Fernando Morgado (fmorgado)
Junior Member
Username: fmorgado

Post Number: 10
Registered: 12-2005
Posted on Monday, December 13, 2010 - 7:49 am:   

Hello,

The SEP is calculated over the whole range, so by itself it does not tell you whether the error is larger in the low, middle, or high part of the range. Sometimes a model works well in one range and badly in another; that depends on the distribution of your samples, the parameter, etc.
For me, the phenolics model is not fine. You have a SEP of 7.65 and the range starts at 5.0, which means that at the bottom of the range the relative error is more than 100%, while at the top it is about 13% (theoretically). For example, if your predicted result is 20, the true value could be anywhere from 20 - 7.65 to 20 + 7.65. That is not good, but it depends on the error you can accept in your work. The same goes for the other parameter.
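The arithmetic behind this point can be sketched in a couple of lines, using the SEP and range quoted in this thread:

```python
# Relative error implied by a single SEP at the two ends of the calibrated
# range, using the phenolics figures quoted in this thread.
sep = 7.65
low, high = 5.0, 60.0

rel_low = sep / low    # more than 100% of the value at the bottom of the range
rel_high = sep / high  # roughly 13% at the top of the range

print(f"relative error at {low}: {rel_low:.0%}")
print(f"relative error at {high}: {rel_high:.0%}")
# A predicted value of 20 should be read as anywhere from 20 - 7.65 to 20 + 7.65.
```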

Some people start eliminating samples from the model only to decrease the SEP, but if you have no reason to take a sample out of the model, you should not do it. If a sample is an outlier, take it out; if a sample has a low or high value with no other samples near it (high leverage), you can eliminate it, but remember that you will shorten the analytical range of the model.

It is not easy to decide, from SEP, R, etc. alone, whether you can improve the performance of your model.

As for your question about an acceptable SEP: you decide that. If an error of 7.65 is OK for your purposes, then it is OK.

You don't mention the error of the reference method. To evaluate a SEP it is always good to know the error of the chemical method. If you tell me the chemical error is 0.5, a SEP of 7.65 is very bad; if you tell me the chemical method error is 8.0, you have obtained a nice SEP.
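A rough sketch of that comparison follows; the 1.5 cutoff is an arbitrary illustration, not a standard. The point is only that a SEP many times the lab error means the model itself contributes most of the uncertainty:

```python
# Compare the SEP against the reference-method error for the two scenarios
# described above. The 1.5 cutoff is illustrative only.
sep = 7.65

for lab_error in (0.5, 8.0):
    ratio = sep / lab_error
    verdict = "looks reasonable" if ratio <= 1.5 else "too bad"
    print(f"lab error {lab_error}: SEP/lab = {ratio:.2f} -> {verdict}")
```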

I hope you can understand me; my English is not good.

A last question: which software are you using?

Fernando

Huiying Wu (huiying_wu)
New member
Username: huiying_wu

Post Number: 2
Registered: 9-2010
Posted on Monday, December 13, 2010 - 7:23 am:   

Fernando and Gustavo:
Thank you so much for your help!
I just tested the models with a 10-sample validation set. The regression gives R-squared values of 0.80 for phenolics and 0.71 for nitrogen. The SEP is 7.65 for phenolics over the range 5 to 60, and 0.09 for nitrogen over the range 1.1 to 2.0.
What is the maximum acceptable SEP?
Thank you!

Gustavo Figueira de Paula (gustavo)
Intermediate Member
Username: gustavo

Post Number: 17
Registered: 6-2008
Posted on Thursday, December 09, 2010 - 8:12 am:   

Wu,

A good model is a model that works.

By (my) definition, a model that predicts the expected variable within the expected calibrated range, with deviation smaller than the maximum accepted deviation from the reference method, is a good model.

The calibration parameters are very useful for comparing models, in the quest to find the most parsimonious one that does the job.

I agree with Fernando: only actual predictions will let you check whether your model is good (it works) or not (it doesn't).

But to decide whether Model A is better than Model B, several criteria can be applied. It is more art than science, since the general advice of parsimony is too vague.

Fernando Morgado (fmorgado)
Junior Member
Username: fmorgado

Post Number: 8
Registered: 12-2005
Posted on Thursday, December 09, 2010 - 7:49 am:   

Hello,

First, the number of factors looks very high to me considering the number of samples. If your software chose 6 or 7 factors, that may well be correct; if you chose the number of factors yourself, you will need to check it.

The errors you mention don't make sense without knowing the normal range of the parameter you want to measure.

Remember that samples used to build the model normally give good predictions (lab vs. NIR), but this does not mean the model is correct.

R-squared, for me, is not useful for deciding. For example, if you have two sets of data and one of them has a large constant bias compared with the other, R-squared will still be near 1, but the real error will be much bigger.
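A quick demonstration of this blind spot, with made-up numbers: shift a perfectly correlated prediction by a constant and the correlation does not move at all.

```python
# Shift a perfectly correlated prediction by a constant bias of 5 units:
# R-squared stays at 1.0 even though every prediction is off by 5.
import math

lab = [1.0, 2.0, 3.0, 4.0, 5.0]
nir = [x + 5.0 for x in lab]  # constant bias of 5

mean_lab = sum(lab) / len(lab)
mean_nir = sum(nir) / len(nir)
cov = sum((a - mean_lab) * (b - mean_nir) for a, b in zip(lab, nir))
r = cov / math.sqrt(sum((a - mean_lab) ** 2 for a in lab)
                    * sum((b - mean_nir) ** 2 for b in nir))

bias = sum(b - a for a, b in zip(lab, nir)) / len(lab)
print(f"R-squared = {r ** 2:.3f}, bias = {bias:.1f}")
```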


To check a model it is necessary to predict some samples (a minimum of 10) that were not included in the model, and to analyse those samples in the laboratory. The samples must be inside the range of the model, must cover low, middle, and high values, and must not be outliers.

Once you have this, the best approach is to apply some statistical analysis, such as Student's t-test, an F-test, etc. Only statistical analysis can tell you whether samples predicted with your model will give results similar to the chemical analysis.
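The basic validation statistics involved can be sketched like this, on a hypothetical 10-sample validation set (bias, bias-corrected SEP, and RMSEP):

```python
# Bias, bias-corrected SEP, and RMSEP computed on a hypothetical
# 10-sample validation set (lab values vs. NIR predictions).
import math

lab = [1.12, 1.25, 1.38, 1.46, 1.55, 1.63, 1.74, 1.82, 1.91, 1.98]
nir = [1.10, 1.30, 1.35, 1.50, 1.52, 1.66, 1.70, 1.85, 1.88, 2.01]

errors = [p - r for p, r in zip(nir, lab)]
n = len(errors)
bias = sum(errors) / n                                           # systematic offset
sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))  # bias-corrected
rmsep = math.sqrt(sum(e ** 2 for e in errors) / n)               # raw prediction error

print(f"bias = {bias:+.4f}, SEP = {sep:.4f}, RMSEP = {rmsep:.4f}")
```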

I asked the same question some days ago: which is the best tool for deciding whether a model is working or not?

Regards
Fernando Morgado

Huiying Wu (huiying_wu)
New member
Username: huiying_wu

Post Number: 1
Registered: 9-2010
Posted on Thursday, December 09, 2010 - 7:15 am:   

Hi there,
How should I judge my PLS models?
I am using NIRS for leaf nitrogen and phenolics analysis and have built PLS models from 40 leaf samples. For the nitrogen model (6 factors): R-squared is 0.93 in calibration and 0.60 in full cross-validation; RMSE (range 0.9) is 0.055 in calibration and 0.134 in full cross-validation. For the phenolics model (7 factors): R-squared is 0.97 in calibration and 0.56 in full cross-validation; RMSE (range 55) is 2.04 in calibration and 8.49 in full cross-validation.
How can I determine whether it is acceptable to use these models to predict new, unknown samples?
Thank you!
