Which model is more robust?

NIR Discussion Forum » Bruce Campbell's List » Chemometrics » Which model is more robust?


Nebojsa Todorovic (nebojsa)
Junior Member
Username: nebojsa

Post Number: 6
Registered: 10-2010
Posted on Tuesday, March 15, 2011 - 2:24 am:   

Hi Tony!

Many thanks to you and Suresh for your suggestions. Can you tell me, in short, how to use the F-test to find outliers?

Best regards,
Nebojsa

Tony Davies (td)
Moderator
Username: td

Post Number: 249
Registered: 1-2001
Posted on Monday, March 14, 2011 - 3:46 pm:   

Hi Nebojsa!

I have a reply for you from my friend Suresh, who is a chemometrician at Camo.

"Hotelling's T2 statistic, shown as an ellipse in 2D or as a critical limit after a certain number of PCs, is the square of the Mahalanobis distance.

In Unscrambler the score space is the basis for Hotelling's T2, not the original variables.

When the term 'Mahalanobis' is used in other software, it is not always stated what the basis is.

We use Q-residuals and F-test statistics to find the outliers in the residual space.

Regards,

Suresh"

Thanks to Suresh,
Best wishes to you both,

Tony
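Suresh's point about the score space can be illustrated with a minimal Python sketch. It assumes two PCs, made-up scores, and one commonly quoted form of the critical limit, A(n-1)/(n-A) times the F quantile; the function names are hypothetical and none of this is taken from Unscrambler, which may compute its limits differently. For two numerator degrees of freedom the F quantile has a closed form, so no statistics library is needed.

```python
import random

def f_quantile_2(p, d2):
    """Exact quantile of the F distribution with (2, d2) degrees of freedom."""
    return (d2 / 2.0) * ((1.0 - p) ** (-2.0 / d2) - 1.0)

def mahalanobis_sq_2d(scores):
    """Squared Mahalanobis distance of each sample to the mean, in a 2-PC score space."""
    n = len(scores)
    mean_1 = sum(s[0] for s in scores) / n
    mean_2 = sum(s[1] for s in scores) / n
    centred = [(s[0] - mean_1, s[1] - mean_2) for s in scores]
    # Sample covariance of the scores (n - 1 denominator).
    s11 = sum(a * a for a, _ in centred) / (n - 1)
    s22 = sum(b * b for _, b in centred) / (n - 1)
    s12 = sum(a * b for a, b in centred) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Quadratic form with the analytically inverted 2x2 covariance matrix.
    return [(s22 * a * a - 2.0 * s12 * a * b + s11 * b * b) / det for a, b in centred]

def t2_limit_2pcs(n, p=0.95):
    """Assumed critical limit for A = 2 components: A(n-1)/(n-A) * F(p; A, n-A)."""
    a = 2
    return a * (n - 1) / (n - a) * f_quantile_2(p, n - a)

# Made-up scores on PC1 and PC2, for illustration only.
random.seed(0)
scores = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
scores.append((8.0, -8.0))  # one deliberately extreme, hypothetical sample

t2 = mahalanobis_sq_2d(scores)
limit = t2_limit_2pcs(len(scores))
flagged = [i for i, value in enumerate(t2) if value > limit]
print("95% limit:", round(limit, 2), "flagged samples:", flagged)
```

The same distances computed on the original spectral variables rather than on the scores would generally differ, which is why the basis matters.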

Nebojsa Todorovic (nebojsa)
New member
Username: nebojsa

Post Number: 5
Registered: 10-2010
Posted on Friday, March 11, 2011 - 4:14 am:   

Dear Jose,

thank you for your helpful advice.

Best wishes,
Nebojsa

Jose Miguel Hernandez Hierro (jmhhierro)
Member
Username: jmhhierro

Post Number: 15
Registered: 4-2008
Posted on Friday, March 11, 2011 - 3:23 am:   

Hi Nebojsa,

A Hotelling's T2 95% confidence ellipse can be included in score plots and reveals potential outliers. This feature is similar to using a Mahalanobis distance (e.g. GH = 3) to the mean value.

Best regards

Jose

Nebojsa Todorovic (nebojsa)
New member
Username: nebojsa

Post Number: 4
Registered: 10-2010
Posted on Friday, March 11, 2011 - 3:00 am:   

Hello everyone,

Can someone tell me whether I can use the Hotelling's T2 ellipse like the Mahalanobis distance in the Unscrambler software?

Thank you,
Nebojsa

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 183
Registered: 10-2002
Posted on Saturday, February 19, 2011 - 11:33 am:   

Hi Fade,

It is impossible for me to advise you what to do next to make progress with your application. You need to apply your knowledge of the samples and their chemistry. As I said before, you need to try to understand the B-vectors that you obtain when you perform various calibrations, and observe the factors to be sure that they represent information rather than noise.

You really have a large database. I wonder if you can subdivide the soils into classes and identify a class or two that allow good predictions? Does PCA show any groups in your data set? It may be that you can identify outliers using tools of cross-validation or leverage plots, and thereby improve the calibrations on the remaining samples.

It may be that some spectral transformations or pretreatments will help; you have not mentioned any information about that. It may be that other analytical methods beyond PCR or PLS will be useful. I know that the field of soils analysis is growing, and recent publications may give you clues on your next steps.

Best wishes,
Dave

deng fan (fade)
New member
Username: fade

Post Number: 2
Registered: 12-2010
Posted on Saturday, February 19, 2011 - 10:53 am:   

Dear Dave and Gustavo,
I have transformed the SD and RMSE back to the original C data. SD increased from 0.49 to 0.83, and RMSEP increased from 0.23 to 1.4, so the RPD is of course now lower than 1.4, which suggests a very bad prediction. That means neither model is working. So what should I do now? Do you have any suggestions?
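For reference, the arithmetic behind these figures can be sketched as follows, assuming the usual definition of RPD as the SD of the reference values divided by the RMSEP. The function name is hypothetical; the numbers are the ones quoted in the post.

```python
def rpd(sd_reference, rmsep):
    """Ratio of performance to deviation, assumed here to be SD / RMSEP."""
    return sd_reference / rmsep

rpd_sqrt_scale = rpd(0.49, 0.23)  # values reported on the SQRT-transformed scale
rpd_native = rpd(0.83, 1.4)       # values reported after back-transformation

print(round(rpd_sqrt_scale, 2), round(rpd_native, 2))  # prints 2.13 0.59
```

An RPD below 1 means that the RMSEP is larger than the SD of the reference values.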

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 180
Registered: 10-2002
Posted on Thursday, February 17, 2011 - 3:09 pm:   

Hi Fade,

You are working on a difficult application, and I see you have done a lot of work already. Welcome to the Discussion Group.

I have a feeling that your Method I may give you a more robust calibration, in the sense that the calibration has more sources of variability built into the model. In Method II, you actually have 2 Validation Sets. Are the RMSEP and RPD values essentially equivalent for the 2 sets? And, how much smaller are these values than the RMSEP and RPD for the Set of 1500 samples in Method I?

I recommend you look at the B Vectors for the 2 approaches, and judge whether you can understand the peaks and troughs, or whether one method seems to have more noise peaks than the other, and select the less noisy one, with lower coefficients, as the "most robust".

The RMSEP values are meant as a guide to the agreement of the estimated levels of C to the reference method. Therefore, you should convert the SQRT of the reference values to the native concentration values.
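A minimal sketch of that conversion, assuming the model predicts SQRT(C) and that RMSEP is the root mean squared difference with an n denominator (some definitions of SEP correct for bias and use n - 1). The three reference and predicted values are made up for illustration, and the function name is hypothetical.

```python
import math

def rmsep(reference, predicted):
    """Root mean squared error of prediction (n denominator)."""
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values, for illustration only.
reference_c = [1.0, 4.0, 9.0]        # reference C in native concentration units
predicted_sqrt_c = [1.1, 1.9, 3.2]   # model output on the SQRT scale

# Back-transform each prediction first, then compare with the native references.
predicted_c = [value ** 2 for value in predicted_sqrt_c]
rmsep_native = rmsep(reference_c, predicted_c)

# For comparison: the error computed on the SQRT scale.
rmsep_sqrt = rmsep([math.sqrt(r) for r in reference_c], predicted_sqrt_c)

print(round(rmsep_sqrt, 3), round(rmsep_native, 3))
```

Squaring the SQRT-scale RMSEP does not give the native-scale RMSEP, so the back-transformation has to be applied to the individual predictions before the error is computed.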

What is the range of concentration values that you are working with? I am concerned that the use of the SQRT compresses the range too much. It would be useful if you would share with us the plots of the 3 validation sets by the 2 methods, to judge the performance and the effects of any outliers, as Gustavo has mentioned. I think your plots should also be with the concentrations, rather than the SQRT.

Best wishes,
Dave

Gustavo Figueira de Paula (gustavo)
Advanced Member
Username: gustavo

Post Number: 21
Registered: 6-2008
Posted on Thursday, February 17, 2011 - 7:23 am:   

Fade,

When you predicted the carbon content with your second approach (after PCA scoring), have you tried it with the 2573 samples left out? I think that running a prediction with this dataset will result in several outliers.

If not, I guess it will be a better model, since the RMSEP is lower. But in my opinion, you must also evaluate the samples left out to ensure that your universe is well represented.

Gustavo de Paula.

deng fan (fade)
New member
Username: fade

Post Number: 1
Registered: 12-2010
Posted on Thursday, February 17, 2011 - 6:29 am:   

I work on soil carbon using NIRS. After scanning 3000 soil samples, I have tried two different ways to build models. The first is to divide the dataset equally into calibration, validation and prediction sets. The second is to use PCA scores to select 427 of the 3000 samples; of these 427, 300 are used for calibration and 127 for validation, and the remaining 2573, with C reference values, are used for prediction. The model performance (R2, RMSE, RPD) is better for the second way, and it also gives the better prediction. But what I am wondering is: is this model robust?
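The post does not say how the 427 samples were picked from the PCA scores. As an illustration only, one common score-based approach is the Kennard-Stone max-min rule, sketched below on made-up 2D scores; the function name and data are hypothetical, and this is not necessarily the procedure used here.

```python
import math

def kennard_stone(points, k):
    """Return the indices of k points chosen by the Kennard-Stone max-min rule."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    # min_d[i] is the distance from point i to its nearest selected point.
    min_d = [min(dist(i, first), dist(i, second)) for i in range(n)]
    while len(selected) < k:
        # Add the unselected point that is farthest from everything selected so far.
        nxt = max((i for i in range(n) if i not in selected), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(min_d[i], dist(i, nxt)) for i in range(n)]
    return selected

# Made-up PCA scores (PC1, PC2), for illustration only.
toy_scores = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.5, 0.0), (5.0, 0.0)]
print(kennard_stone(toy_scores, 3))  # prints [0, 2, 4]
```

A selection of this kind spreads the chosen samples evenly over the score space. The initial search compares every pair of points, so for 3000 samples it is slow but still feasible.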

The second question is: when I build the model, I use a SQRT transformation for the C data, so do I need to transform back to the original data for RMSE and standard deviation? Thanks a lot for your help!
