Selection of spectral variables

NIR Discussion Forum » Bruce Campbell's List » I need help » Selection of spectral variables


Tony Davies (td)
Moderator
Username: td

Post Number: 276
Registered: 1-2001
Posted on Tuesday, January 24, 2012 - 3:00 pm:   

Hi Barry,

Thanks for answering for me - been rather occupied for the last few days!

That was exactly what I was referring to. Derivatives can give rise to negative correlations - first derivatives can be very confusing!
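The sign flip Tony mentions is easy to see numerically. A minimal sketch (synthetic data and plain NumPy, my illustration rather than anything from this thread): a single absorbance band whose height scales with concentration gives, after a first derivative, wavelengths that correlate negatively with that same concentration.

```python
import numpy as np

# Synthetic illustration: one Gaussian band whose height scales with
# concentration; nothing here is real NIR data.
wavelengths = np.linspace(0.0, 100.0, 201)
concentrations = np.array([0.2, 0.4, 0.6, 0.8, 1.0])

# One spectrum per row: a band centred at 50 with height = concentration.
spectra = np.array([c * np.exp(-((wavelengths - 50.0) ** 2) / 50.0)
                    for c in concentrations])

# First derivative along the wavelength axis.
d1 = np.gradient(spectra, axis=1)

def corr_with_conc(j):
    """Correlation of derivative point j with concentration."""
    return np.corrcoef(d1[:, j], concentrations)[0, 1]

j_up = np.argmin(np.abs(wavelengths - 45.0))    # rising flank of the band
j_down = np.argmin(np.abs(wavelengths - 55.0))  # falling flank

print(round(corr_with_conc(j_up), 3))   # 1.0: derivative grows with concentration
print(round(corr_with_conc(j_down), 3)) # -1.0: same band, opposite sign
```

The same physical band produces perfectly positive correlation on one flank and perfectly negative correlation on the other, which is exactly why first-derivative spectra can be confusing to interpret.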
Tony

Barry M. Wise (bmw)
Member
Username: bmw

Post Number: 11
Registered: 2-2011
Posted on Tuesday, January 24, 2012 - 1:03 pm:   

Hi Venkynir:

I think what Tony Davies was trying to get across was that you don't want your calibration for component A to be based on the absence of component B. In systems with closure constraints, where all the components have to add up to 100%, models often pick up on the fact that you can correlate increasing A with decreasing spectral features from B. This is precarious because any change in the system that alters the closure constraint, or even a substitution of A with something new, will lead to erroneous results.

However, this is not to be confused with the issue of whether spectral features with negative regression coefficients should be included in the model. In many cases these are absolutely required because of the overlap of interferent signals with those of the target analyte. In fact, we have a short section in our course on Variable Selection that demonstrates this.
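The need for negative coefficients can be shown with a toy least-squares model (synthetic Gaussian bands and NumPy only; this sketch is mine, not material from the Eigenvector course). Even with independently drawn concentrations, i.e. no closure constraint at all, the regression vector for analyte A goes negative under the interferent's band, because that is how the model subtracts B's contribution in the overlap region.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.linspace(0.0, 100.0, 101)

# Hypothetical pure-component bands for analyte A and interferent B,
# placed close enough to overlap substantially.
band_a = np.exp(-((w - 40.0) ** 2) / 60.0)
band_b = np.exp(-((w - 55.0) ** 2) / 60.0)

# Independently drawn concentrations: no closure constraint here, so any
# negative coefficients come purely from spectral overlap.
ca = rng.uniform(0.0, 1.0, 30)
cb = rng.uniform(0.0, 1.0, 30)
X = np.outer(ca, band_a) + np.outer(cb, band_b)

# Minimum-norm least-squares regression vector predicting A from the mixtures.
beta, *_ = np.linalg.lstsq(X, ca, rcond=None)

j_a = np.argmin(np.abs(w - 40.0))  # centre of A's band
j_b = np.argmin(np.abs(w - 55.0))  # centre of B's band
print(beta[j_a] > 0, beta[j_b] < 0)  # True True: negative weight under B's band
```

Dropping the negative-coefficient region here would leave B's overlap with A's band uncorrected, degrading the prediction of A.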

And speaking of which, this year's Eigenvector University is May 13-18 in Seattle. Our course on Variable Selection will be on Wednesday, May 16. You can get complete information about EigenU 2012 at:

http://www.eigenvector.com/courses/EigenU.html

Best regards,

BMW

Barry M. Wise, Ph.D.
President
Eigenvector Research, Inc.
3905 West Eaglerock Drive
Wenatchee, WA 98801

Phone: (509)662-9213
Fax: (509)662-9214
Email: [email protected]
Web: eigenvector.com
Blog: eigenvector.com/evriblog/

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 145
Registered: 3-2004
Posted on Friday, January 20, 2012 - 9:11 am:   

Hi Davies;
Could you expand on your statement, "You may have reason to believe that the use of the inverse correlation could be unsafe for use with future samples"? Why is that?

Tony Davies (td)
Moderator
Username: td

Post Number: 275
Registered: 1-2001
Posted on Friday, January 20, 2012 - 9:00 am:   

Hi José:

Quote: "it does not hurt to be wise using this procedure in selecting spectral variables".

I'm sure Barry would agree!

Best wishes,

Tony

José Antonio Cayuela Sánchez (joseacayu)
Junior Member
Username: joseacayu

Post Number: 6
Registered: 11-2009
Posted on Friday, January 20, 2012 - 6:53 am:   

Hi Tony,
Thank you very much. Thus, it does not hurt to be wise using this procedure in selecting spectral variables.

José Antonio

Tony Davies (td)
Moderator
Username: td

Post Number: 274
Registered: 1-2001
Posted on Friday, January 20, 2012 - 5:51 am:   

Hi José:

I fully agree with Barry's comments. Just to give you added confidence: Harald Martens (the originator of Unscrambler) told me that he always intended there to be a selection of significant variables. However, it was some time before the uncertainty test was included in Unscrambler, and by then most people had got used to the idea of using the model as selected.

As far as reasons for variable selection go, I could offer the case where Unscrambler has found an inverse correlation between a constituent of interest and some other constituent. You may have reason to believe that the use of the inverse correlation could be unsafe for use with future samples.

Best wishes,

Tony

José Antonio Cayuela Sánchez (joseacayu)
New member
Username: joseacayu

Post Number: 5
Registered: 11-2009
Posted on Friday, January 20, 2012 - 1:07 am:   

Hi Barry,

Many thanks. The main purpose is 1), and also 3). I am relieved, as this confirms what seemed logical to me. Thank you for confirming what I meant.

José A. Cayuela

Barry M. Wise (bmw)
Junior Member
Username: bmw

Post Number: 10
Registered: 2-2011
Posted on Thursday, January 19, 2012 - 6:34 pm:   

Hi José:

As far as methods for selecting spectral variables go, eliminating those that aren't significantly different from zero (as determined by Martens' Uncertainty Test, I assume, since that is what is in Unscrambler) isn't too likely to overfit much. Generally, the chance of over-fitting increases as the flexibility of the selection method increases and the rigour of the testing decreases. As variable selection methods go, I don't think you have much of an issue on either account, assuming you've done a good many splits of the data.

But I have to ask, what is the purpose for the variable selection? Generally, variable selection is done for one of three reasons (that I can think of, I suppose there may be more):

1) Improvement of the prediction error of the model
2) To identify a small number of variables to use in a less expensive instrument (e.g. filter spectrometer or diode laser sources)
3) To learn about the system under study.

I'd expect Martens' Uncertainty Test to help with 1), but not to be useful for 2), and maybe to be moderately useful for 3). We really like the interval-PLS method, iPLS, as it is especially good at 3) and often quite good at 1). Genetic Algorithms (GAs) have the highest risk of over-fitting: depending on how they are set up, they can be overly flexible and insufficiently tested. They can be good at 1) and can be set up to work for 2), but are less good for 3).

OK, just my two cents worth!

BMW

José Antonio Cayuela Sánchez (joseacayu)
New member
Username: joseacayu

Post Number: 4
Registered: 11-2009
Posted on Thursday, January 19, 2012 - 5:58 am:   

I would like to raise a query, please.

One way I know of to select the spectral variables used in a predictive model in Unscrambler is to eliminate spectral variables whose regression coefficients are close to zero, in successive cycles of elimination, using the 'Regression Coefficients' plot and 'mark with rectangle'. Someone told me that this operation may lead to overfitting. I would greatly appreciate it if someone could clarify whether this can indeed happen, because I do not understand why that would be the result.
