Wavelength Selection in NIR Chemometric Calibrations

Submitted by ianm on 13 November 2017 - 12:20pm


This topic drew many replies and variations on the original question(s). The next few paragraphs outline the topic; the replies follow.

There seem to be two approaches to deciding how much of the spectrum to use in a principal components regression calibration. One approach is to use all of the spectrum, or at least all but the parts that may for one reason or another be noisy. The other approach is to use only the parts that have maximum information, such as regions that are not close to or at zero intensity. These two approaches may be said to be inclusionary and exclusionary.

Some of the advantages of the exclusionary approach are avoiding possible noise, faster operation (fewer data points to use in calculations), and focussing on maximum-information regions. Advantages of the inclusionary approach are not having to decide which regions are important and carry useful information; a higher probability of detecting a new band (perhaps from an impurity) if one arises, thereby avoiding an incorrect result if one is using statistical measures of the goodness of calculation; and, with faster computers, calculation times that are small anyway, so decreasing the number of data points is not so important.

Do you know of any other reasons for either approach? Is one approach much better than the other? How much advantage of one approach over the other is related to equipment being used versus "theory"?

Replies

I've always looked at the question of "significance" as the area where the practitioner can "make a difference".

Having originally been trained on LT Industries software, I liked to compare a plot of simple correlation to that of a typical spectrum to see what matched up.

Then I picked up a tip from Harold Martens to look at residual variance as a function of wavelength and number of factors in Unscrambler.

But mostly I look at the loading vectors generated during PCR or PLS calibration to see if they are extracting information in areas of the spectrum that I'd agree are relevant.

And then as a final step, I like to show the regression coefficient vector plotted on the same scale as an average spectrum as a means to visually validate the model.

Is there software to do this? Perhaps, but I'm not aware of it.

Dave Russell
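(Editorial sketch) The first check Dave describes, comparing a plot of simple correlation against a typical spectrum, can be approximated in a few lines. The block below is only an illustration under stated assumptions: the data are synthetic, the single Gaussian band at index 60 is invented for the example, and `correlation_spectrum` is a hypothetical helper name, not a routine from LT Industries software or The Unscrambler.

```python
import numpy as np

def correlation_spectrum(X, y):
    """Pearson correlation between each wavelength column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

# Hypothetical synthetic data: 40 samples, 100 wavelengths,
# one analyte band centred at index 60 plus a little random noise.
rng = np.random.default_rng(0)
wl = np.arange(100)
band = np.exp(-0.5 * ((wl - 60) / 5.0) ** 2)
conc = rng.uniform(0.0, 1.0, size=40)
X = np.outer(conc, band) + 0.01 * rng.standard_normal((40, 100))

r = correlation_spectrum(X, conc)     # one correlation value per wavelength
mean_spectrum = X.mean(axis=0)        # the "typical spectrum" to overlay
peak_r = int(np.argmax(np.abs(r)))    # where correlation is strongest
peak_A = int(np.argmax(mean_spectrum))
print("strongest correlation near index", peak_r, "; band maximum at index", peak_A)
```

Plotting `r` and `mean_spectrum` against `wl` on the same axis gives the kind of visual comparison described in the post.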

From: [email protected]

Just a couple of things....

To Dave Russell - The Unscrambler will do all of that.

Also, you can plot the leverage of the x-variables. This gives a very clear picture of which wavelengths are important.
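(Editorial sketch) One common way to define the leverage of an x-variable is the sum of its squared loadings over the retained principal components. Whether The Unscrambler computes exactly this quantity is an assumption here; the function name `variable_leverage` and the synthetic data are hypothetical.

```python
import numpy as np

def variable_leverage(X, n_components):
    """Sum of squared PCA loadings per wavelength, for mean-centred X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal loading vectors, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components]
    return (P ** 2).sum(axis=0)

# Hypothetical data: 30 samples, one analyte band centred at index 60.
rng = np.random.default_rng(1)
wl = np.arange(100)
band = np.exp(-0.5 * ((wl - 60) / 5.0) ** 2)
conc = rng.uniform(0.0, 1.0, size=30)
X = np.outer(conc, band) + 0.001 * rng.standard_normal((30, 100))

h = variable_leverage(X, n_components=1)
print("highest-leverage wavelength index:", int(np.argmax(h)))
```

With orthonormal loadings the leverages sum to the number of components retained, so each value can be read as a share of the modelled variation.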

From: Lois G. Weyer

Yes, there is such a procedure. It is called "stepwise regression"! Just picking the one best spectral "region" (or wavelength, which actually defines a region) doesn't solve the more difficult calibrations. The primary wavelength (or region) may require help from other wavelengths (regions) to correct for nonlinearities or interferences. So, restricting a PLS calibration to a narrow range may be only the first step.

Both stepwise regression and PLS should be tried for any data set.

Let's not generate new misconceptions here.

From: Howard Mark

In the context of MLR calibrations there is such a statistic: it is the t statistic for the regression coefficients (or, what is equivalent, the one called the "F for deletion" statistic, which is numerically the square of the t statistic). These are both standard statistics, which can be used and compared with standard statistical tables. That is the plus. The minus side of these statistics is that they are both highly affected by the intercorrelations we find in normal NIR data. This is the reason why a good statistician will not use any one single statistic as an absolute guide to selecting a wavelength, but rather try to understand the underlying structure of the dataset using all the statistical (and non-statistical) information available.

When you go to the full-spectrum techniques, then even these criteria for the wavelengths disappear, although some workers are looking at ways to limit the wavelength ranges to use, as has been discussed in the recent past. Keep an eye open for the next few Spectroscopy columns; some of these topics are discussed there, although somewhat obliquely.

On the other hand, as spectroscopists, we know something about the underlying structure, because that reflects the spectra of the components of the samples. This leads to the argument of using the spectral peaks, etc. as the wavelengths to use. There are two problems with this:

1) Some of the corrections that need to be made do not have spectral structure (the prototype example is particle size)

2) Partly (but not entirely) because of problem #1, using only spectral structure as a guide to wavelength selection generally does not provide calibration models that give the best numbers (whether you use SEE, SEP, SECV or any other criterion), and this is unacceptable to most people: they want the best numbers they can get.

I suspect that a lot of the problems people have with things like calibration transfer would go away if they got over this hangup with the numbers.

Because of all this, I tend to suspect that the original question is wrong: that there is no single "best" set of wavelengths, even in principle. The "best" set will depend on your goals and criteria for deciding when you've reached it: one set is "best" for the numbers, a different set is "best" for robustness, etc.

Howard
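(Editorial sketch) The t statistic for each MLR regression coefficient, and the F-for-deletion statistic that Howard notes is its square, can be computed directly from an ordinary least-squares fit. The example below uses invented data with three hypothetical "wavelengths"; the function names are illustrative and do not come from any particular package.

```python
import numpy as np

def mlr_t_statistics(X, y):
    """OLS with intercept: returns coefficients, t values and residual sum of squares."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)
    dof = n - A.shape[1]
    cov = rss / dof * np.linalg.inv(A.T @ A)   # covariance of the coefficients
    t = beta / np.sqrt(np.diag(cov))
    return beta, t, rss

def f_for_deletion(X, y, j):
    """F statistic for dropping wavelength column j from the full model."""
    n, p = X.shape
    _, _, rss_full = mlr_t_statistics(X, y)
    _, _, rss_reduced = mlr_t_statistics(np.delete(X, j, axis=1), y)
    return (rss_reduced - rss_full) / (rss_full / (n - p - 1))

# Hypothetical data: the third wavelength carries no information about y.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(30)

beta, t, _ = mlr_t_statistics(X, y)
F = np.array([f_for_deletion(X, y, j) for j in range(3)])
print("t values (wavelengths):", t[1:])
print("F for deletion:        ", F)
```

The two printed rows agree once the t values are squared, which is the equivalence stated in the post; either can be compared with standard tables.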

From: Lois G. Weyer

If a new band from an interfering substance arises, why would you want to see it? A robust calibration would ignore irrelevant spectral features.

Cutting down the region would make PCR or PLS a little closer to the stepwise regression/genetic algorithm approach, which makes for more robust calibrations in the face of irrelevant changes in the samples.

An identification loop can always be put in if one wants to tag unusual samples.

From: Ronald Rubinovitz

Here's my slant on this issue...

You've done a good job summarizing the relative strengths and weaknesses of each PC/PLS method, but I would like to add the following points:

When omitting spectral regions, as you say, the goal is to avoid forcing your PC analysis to model irrelevant information. This can include regions that 1) are noisy [as you point out], 2) are non-linear [typically, "flat" or nearly flat regions of extremely high absorbance], or 3) contain irrelevant information. Please note that I have listed these in order of ease of identification.

That is, it's pretty cut and dried to find the noisy regions. It takes a little bit of expertise to identify non-linear regions of a spectrum. And finally, it's not at all simple to look at the spectrum of a complex mixture and determine what remaining points are "irrelevant".

People tend to gravitate towards the "exclusionary" method because these calibrations will tend to give better results (if they didn't, calibration development would be one step simpler!).

Your concern that exclusionary calibrations will not flag some spectra that contain new features can be addressed as follows.

First, I'm not convinced that "inclusionary" calibrations will always allow new spectral types to be flagged. Some calibrations like these give seemingly meaningful predictions on samples that are not at all represented in the calibration set (such as containing an impurity). One would get higher assurance of effective "screening" of spectra if there was a dedicated qualitative analysis before the prediction step.

In summary, I question the capability of "inclusionary" calibrations to be an effective screen against "bad" spectra, and "exclusionary" calibrations tend to be more effective calibrations since they don't have to attempt to model noise and non-linearity.

'Hope some of this is useful.

Best Regards

Bruce - another round of comments here: first of all it was certainly interesting to see the varied points of view.

In response to Lois' question there are two answers. The direct one is that it is a rare case where the extraneous material will not affect one or more of the wavelengths that you are using; even when dealing with pure materials with narrow bands, these usually have multiple absorbances, so it becomes fairly likely for one or more of them to overlap the one(s) you're using, and cause an error. While the overlap may not be obvious, the full extraneous band may be much easier to identify.

A more generic, and fundamental, answer is more in line with what Ron R. is saying - not everybody has the same goals, or approach to the goal. While the main purpose of "standard" calibration methodology is accurate quantitative analysis, some analysts may want to prescreen the samples, as one suggests, others may want to flag the presence of a contaminant whether or not it causes an error, etc. Thus, what you do during your calibration exercise will depend on what you want to get out of it.

Howard

From: Gerard Downey

There are a number of other considerations which arise in deciding which spectral regions to use in PCA/PCR.

One is to try to use only those regions which may be expected to contain absorptions due to specific chemical moieties relevant to the problem under investigation. Another is to deliberately exclude spectral regions known to arise from moieties irrelevant to the problem - in the mid infrared, for example, it is common to exclude the major water absorption area in favour of the "fingerprint" region (5000 - 12500 nm).

I am interested in evaluating approaches to determine the optimum wavelengths for such regressions in the absence of any knowledge of important chemical moieties - how advantageous for example would it be to simply select those wavelengths with the "largest" variances? How could such a decision be made objectively?

From: Howard Mark

Bruce - I should probably keep my mouth shut about this because my own (unpopular) minority view is that once you consider restricting the wavelengths, why not go to the limit and eliminate all the wavelengths except those few that have all the necessary information content? That line of thought leads you to using MLR with wavelength selection instead of PCA/PLS or any other full spectrum method. This is where NIR started out and, except for the hype, I have yet to see any global or generic improvement in results from the full spectrum methods, or any other benefit except for elimination of the need to select wavelengths, as you mentioned. (I think we'd better both duck for cover!)

Howard

From: Emil W. Ciurczak

I am of two minds on this, myself. Numerous attempts at PLS equations have shown me that there is no one good way to generate an equation. Just as the absolute number of factors is up for debate, I have seen the exclusion approach give both better and poorer statistics for an equation.

I do like the idea of a region where egregious differences might easily be noticed, failing the sample. I tend to think that there will never be a consensus on this or almost any part of how to choose the 'right' equation.

Emil Ciurczak

From: Howard Mark

Emil has a good point: For a long time now I've come to believe that even IN PRINCIPLE there is no single "best" solution to the calibration problem. In the vast majority of cases almost any approach, whether you consider data pretreatments, calibration algorithms, wavelength sets, factor sets, etc., can provide equivalently good results. In the few cases where it might make a difference, the "best" way will depend on the detailed nature of the characteristics of the data; hence in one case one approach will be "best", in another case, another one, and so forth.

Howard

Several comments relating to the above. (Editorial addition?)

1. A few months ago Richard Kramer said the three most important actions one should take are to validate, validate, and validate. Thus, in deciding if some or all of a spectrum is used in a calibration, one needs to validate. But as Howard says, what does one validate for? Robustness? Precision? I guess another way to say that is one should have set goals for acceptance before validation.

2. The t statistic for regression coefficients is a very useful one. Not only do the magnitudes give an indication of the goodness of calibration, but the magnitudes for individual parts of the spectrum indicate relative importance. Further, in my experience, if the t values for the regression coefficients resemble a noisy spectrum, there is a good chance large amounts of noise are being incorporated in the calibration.

From: Howard Mark

Subject: Re: Some recent comments

I don't think anybody would disagree with Richard about the need, desire and utility of validating. The point, and I'm pleased that you picked up on it, is that validation, as it has been defined and practiced, only tests how well a given calibration model can be expected to perform on "real" samples. This is a critical piece of information for the practical use of any spectroscopic application, but falls into the category of "necessary but not sufficient".

The reason it is not sufficient is just that it sheds no light whatsoever on questions such as the one you originally posed: how to improve a given model, or more to the point, HOW TO TELL if you've made a real improvement to a model. A critical point that has been missed is that even if you validate two models on a completely separate set of data, the one with the smaller SECV is not necessarily the better model.

To illustrate this, imagine two models that fundamentally have identical characteristics, but use different wavelengths. One of them will give a smaller SECV due solely to the different noise structure of the wavelength set used. This is the meaning of the statistical term "non-significant". Thus, even if one is slightly worse, it could give a better-seeming number, due solely to the noise structure.

The purpose of the science of Statistics is just to determine when differences are meaningful, and that is the reason the t-statistic is so useful (when it can be applied). If the t value shows that a given wavelength is significant (in the statistical sense), then the associated wavelength is making a real contribution to the model.

The limitation of t is that it can be masked by intercorrelations. Two wavelengths may be individually very important to a calibration model, but if they are highly intercorrelated with each other (two different bands of the same chemical component, for example) then that fact will make each one look unimportant. Another way to interpret the situation is to recognize that each one individually is NOT important, because the other one can do the same job, if necessary, so that when both are present you can get this misleading result.

If you have so many wavelengths that you can actually have the t-values "resemble a noisy spectrum", as you propose, then you almost certainly have so many wavelengths that the true effects of the important wavelengths are masked by this phenomenon, so to that extent I have to disagree with your statement in that regard, even though you have the right idea, fundamentally. The problem is not necessarily that the data are noisy, but that the intercorrelations are emphasizing what noise is there, which is a related, but separate effect of intercorrelation.
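(Editorial sketch) The masking effect Howard describes can be reproduced with two almost identical "wavelengths" that both track the same component. The data below are invented for illustration, and `t_values` is a hypothetical helper; the point is only that each variable looks highly significant alone and unimportant when both are in the model.

```python
import numpy as np

def t_values(X, y):
    """t statistics of the slope coefficients from OLS with an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return (beta / se)[1:]   # drop the intercept

# Hypothetical data: x1 and x2 are two bands of the same component,
# so they are almost perfectly intercorrelated.
rng = np.random.default_rng(2)
c = rng.standard_normal(50)
x1 = c + 0.01 * rng.standard_normal(50)
x2 = c + 0.01 * rng.standard_normal(50)
y = c + 0.1 * rng.standard_normal(50)

t_alone = t_values(x1[:, None], y)[0]
t_joint = t_values(np.column_stack([x1, x2]), y)
print("t for x1 alone:      ", t_alone)
print("t for x1, x2 jointly:", t_joint)
```

In this contrived case the joint t values collapse because either variable can do the other's job, which is the second interpretation offered in the post.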

(Another part of the question, sent to the group)

There is another aspect to choosing which part(s) of the spectrum to use in a calibration. This is the effect of noise. Granted, we agree that a noisy part of a spectrum should not be used, but this only addresses the visible part of the noise. What about the noise that is not visible? That noise also has an influence. If one includes a major part of the spectrum, instead of a small part, does the noise become less of a factor? It may, because the noise, say at one wavelength, being random, would tend to be cancelled by noise in other parts of the spectrum. However, I don't know of any equation or relationship that would indicate how many points would reduce the effects of the noise by, say, 90%. I could do the calculation for one point, but when the relationship is through the regression values, is such a calculation possible?

Comments?

Howard
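(Editorial sketch) Under one simplifying assumption, the question about noise "through the regression values" does have a closed form: if the noise at each wavelength is independent with the same standard deviation sigma, the noise in a prediction made with regression vector b has standard deviation sigma times the Euclidean norm of b. For an equal-weight average of N points this reduces to sigma divided by the square root of N, so a 90% reduction needs N = 100. Real NIR noise is often correlated across wavelengths, so this is a lower-complexity illustration rather than a general answer; the names below are hypothetical.

```python
import numpy as np

def prediction_noise_sd(b, sigma):
    """SD of x @ b when each element of x carries independent noise of SD sigma."""
    return sigma * np.sqrt(np.sum(np.asarray(b) ** 2))

sigma = 1.0
b_avg = np.full(100, 1.0 / 100)            # equal-weight average of 100 points
sd_avg = prediction_noise_sd(b_avg, sigma)  # 0.1 * sigma, i.e. a 90% reduction

# Monte Carlo check of the formula with simulated independent noise.
rng = np.random.default_rng(3)
noise = sigma * rng.standard_normal((20000, 100))
sd_sim = (noise @ b_avg).std()
print("formula:", sd_avg, " simulated:", sd_sim)
```

The same function applies to any regression vector, so the norm of the coefficients from a full-spectrum model can be compared with that of a restricted model as a rough indicator of how much random noise each one passes through.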

From Richard Kramer.

The point, and I'm pleased that you picked up on it, is that validation, as it has been defined and practiced, only tests how well a given calibration model can be expected to perform on "real" samples. This is a critical piece of information for the practical use of any spectroscopic application, but falls into the category of "necessary but not sufficient".

The issue Howard raises is an important one. However, I disagree with his characterization of validation and with the resulting conclusion. It all depends upon what one means by the concept of validation.

If validation means the ongoing validation of a plurality of alternative models (my preferred meaning), it DOES become the means of selecting one model over others. And importantly, it permits selection of models which exhibit the best performance with respect to time-related properties such as robustness. It is not uncommon to observe that the model which initially appears to be optimum is the one whose performance degrades most rapidly as time passes.

Validation over time also provides a means of gaining insight into which portions of the data might contain more confusion than information and would be best discarded. In particular, it can be interesting to look at the data residuals over time. It is not uncommon to find that the residuals in some parts of the data space increase more rapidly, over time, than the residuals in other parts of the data space. Generally excluding (or de-weighting) the former from the model can improve the model's performance, short term and long term.

Richard Kramer

My comments - I think Rich & I agree more than we disagree. If you use his definition of validation then what he says follows. However, that definition is not the one in common use - the MUCH more common definition is simply the one that tells you to separate your calibration samples & keep some out of the calibration calculations, then use those to validate.

Once you've gone to the trouble to collect data over time then your options expand greatly. Not only can you use that data for ongoing validation, you can also include those new readings in the calibration calculations. There are at least two ways to do this:

1) As Richard implies, one way is to gradually replace the older data with the new as it becomes available. This has been standard practice for a long time, for example in the agricultural industry, where old samples will never be seen again. A grain elevator, e.g., will never again have to measure another sample from the 1989 crop year.

2) The other obvious extension, which is more useful for the case where you may still have to measure samples with the same characteristics as the old ones, is to simply keep adding to and expanding the calibration set as new samples become available. The new samples then not only allow you to test for robustness, but inclusion of such samples will actually make the calibration more robust. I think we all know this intuitively, but I have also been able to prove this mathematically.

Howard

Response from Don Dahm ([email protected])

I have done a complete turnaround on the issue of restricting wavelength ranges in the calibration. Clearly, many practitioners have done successful calibrations using the full data. In some senses, I agree completely with Emil and Howard that there is no one best way to do a calibration. However, I think one of the values of a group like this is that some principles might emerge among those of us who like to think about such things that are worth sharing with the rest of the world, particularly if we can collect experimental data which clearly shows the correctness of our conclusions and the limitations of them.

Bruce is able to state very clearly the starting point of my thinking. He has shared with me the following: "I learned from an expert in linear algebra (read multivariate regression as a subset of that). He said PCRA, meaning both PCR and PLS, are linear in the coefficients, not necessarily so in the variables. His explanation was one even I could understand. It goes like this: The variables can be transformed into linear responses such as Beer's Law if one wants to, but the regression applications do so inherently. Only the coefficients, the B terms, in the matrices are linear."

Statements like this invite us to conclude that the sophisticated regression techniques can handle non-linear data just fine. If that is so, then why not use all the data just "in case there is something you need in that range". That is a quote from Harvey Gold at a meeting where several of us were being paid to share our wisdom(?) with NIRSystems. I agreed with him wholeheartedly and our position was politely (Harvey is far more polite than I am) disagreeing with others including Karl Norris when we were wrestling with this issue.

Since then, Karl and others have shown me data that convinced me that restricting the data range for PLS improves the calibration in some specific cases. I have shared some of my thoughts on this with Howard Mark and he agrees with the conclusion that data cropping is good, but he does not seem to share my reasoning as to why. It is my opinion that it is not so much that we are "using linear regression techniques to fit non-linear data" (quote from my inflammatory statement below) but that there are several (many) different non-linearities in the response of an individual component in the spectra.

As far as PCR goes, I'm afraid that I have been a frustrated person when it comes to PC regression. (People with egos like mine hate it when things like this happen). I had stood in front of classes and said that "PCR is ideal when you have several pure major components all of which add up to 100% (like a Pharma product). PLS, on the other hand, is better for small components, because the weighting emphasizes the wavelengths at which there is correlation with the analyte".

Unfortunately, when challenged to test this assertion, I found that PLS did better for a simple test case mixture of 5 compounds. (Not all compounds were in all samples.) The test was with full spectra in both cases.

Since a consultant makes a living dispensing knowledge, I try to codify mine. Consequently, I have since adopted a new rule of thumb, which testing might find equally wrong. "PLS is better than PCR when you have significant non-linearities in the wavelength range under consideration."

I think that this is an important area for people like us to spend some of our professional "spare time", (and we do have to make a living). I am not as expert in the development of calibrations as many of you. Personally, I spend all of my spare time trying to better address the root cause of this problem. It is my (non-expert) belief that if we address the non-linearity in the data, the answers to some of these issues will resolve themselves.

We do use data, in reflectance (and in transmission on samples with significant scatter), that is inherently non-linear, and the degree of non-linearity does change within the spectra (here I speak as an expert). It is not surface reflection or stray light or instrument noise that causes it. It is inherent in the data we use. It is the form of our Absorbance function. The Kubelka-Munk function is no better.

My position: It is INSANE (Here I speak as a lay person, but I do have a sister who decides which wackos to lock up in Missouri. I stay away from there as much as possible.) to use linear regression techniques on non-linear data, if you have a choice.

So I would say that one should limit the wavelength range to exclude the most non-linear portions unless there is information that you just have to have there. I'm new to the group and perhaps haven't been privy to all the comments, but I would raise the question: "Does anyone have a reason why this is not a good rule of thumb?"

As far as wanting a quantitative analysis to be sensitive to the presence of impurities goes, all those who spend time trying to make an analysis insensitive to such problems may be justified in hitting anyone who espouses such a philosophy. There are qualitative tools to do this. And here you tightwads who don't want to waste any data can use the whole data range.

The noise issue is another deal. It seems obvious that using a region of high noise which does not contain important information is not constructive. While Howard is far more knowledgeable than me on this (O.K., on almost everything else as well), it seems to me that the need to work in regions of considerable noise is a reason to go to these multi-multi wavelength techniques.

Thanks for indulging the new kid, and I look forward to sharing the results of my "spare time work" with some of you at Chambersburg. It was good to see some of you at EAS, where I first found out about the group from Bruce.

From: Howard Mark

Subject: Re: [Fwd: OK to share with the group - but I've been wrong before]

Bruce, Don and the group - I think Don put his finger on the main problem with full spectrum methods. It can be illustrated with the following thought experiment:

Imagine an ideal system: working in transmittance, noise-free spectrometer, in a clear, non-absorbing solvent, and a completely homogeneous solution of a single analyte. Suppose this single analyte has two absorbance bands, one is perfectly linear with concentration, the other one is proportional to the square of the concentration (you can come up with reasons for this yourself, they're not pertinent to the thought experiment).

Theoretically, a single analyte should be describable using a single degree of freedom, or a single PLS/PCA factor. In this case, however, it is clear that when there is non-linearity in the data, even though it is permitted, you can no longer use a single, all-encompassing factor to represent the entire spectral variation. The first question would be: what would that factor look like? The next one is, which band should it model? From the formulation of the thought experiment, it is clear that if it modelled the linear band, it could not accurately model the quadratic band, and vice versa. The actual factor you computed for this data would probably split the difference, and model both bands equally well (or poorly, to be more precise), with an attendant error in the analytical results.

In this contrived case, of course, a single wavelength anywhere within the range of the linear absorbance band would give a perfect model.

Howard
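(Editorial sketch) Howard's thought experiment is easy to simulate. The block below builds noise-free spectra with one band linear in concentration and one band proportional to its square, then compares a one-factor principal component model of the full spectrum with a single wavelength taken at the centre of the linear band. The band positions, widths and concentration range are arbitrary choices made for this illustration.

```python
import numpy as np

wl = np.arange(100)
g_lin = np.exp(-0.5 * ((wl - 30) / 5.0) ** 2)    # band linear in concentration
g_quad = np.exp(-0.5 * ((wl - 70) / 5.0) ** 2)   # band quadratic in concentration
conc = np.linspace(0.1, 1.0, 10)
X = np.outer(conc, g_lin) + np.outer(conc ** 2, g_quad)   # noise-free spectra

def rmse_of_line_fit(x, y):
    """Fit y = slope * x + intercept and return the root-mean-square residual."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(np.sqrt(np.mean((y - (slope * x + intercept)) ** 2)))

# One-factor PCR on the full spectrum: regress concentration on the first score.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
score1 = Xc @ Vt[0]
rmse_one_factor = rmse_of_line_fit(score1, conc)

# Single wavelength at the centre of the linear band.
rmse_single_wl = rmse_of_line_fit(X[:, 30], conc)

print("one-factor full-spectrum RMSE:", rmse_one_factor)
print("single linear-band wavelength RMSE:", rmse_single_wl)
```

In this contrived setting the first factor mixes both bands, so its score is a blend of concentration and concentration squared and cannot reproduce concentration exactly, while the single wavelength in the linear band fits to within rounding error.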

An added part to the recent exchanges dealing with which parts of the spectrum should be used in a calibration is as follows. According to statisticians, the importance of the bands can be ranked from high to low, and one can also determine whether each band has a positive or a negative influence.

The easiest procedure to use to accomplish this is to overlay the regression coefficients on the average spectrum, after doing a full spectrum calibration (not including noise, of course). One can mentally multiply the regression coefficients by the intensities of the corresponding bands at each wavelength. If the product at the wavelength of the band being tested for its influence is positive, the band has a positive influence on the calibration. If the sign is negative, the band has a negative influence. The larger the product, the larger the influence.

Thus, one could rank the relative importance of the bands having a positive influence and those with a negative influence. This doesn't answer which parts of the spectrum should be included or not, but it does give a roadmap of what parts of the spectrum should be used, or at least which ones should be eliminated first (the ones with the smaller product, either negative or positive).

Instead of using an overlay, one could have some way to multiply the regression coefficients by the band intensity at each wavelength and list the products as a function of wavelength. This would make the task easier. Does anyone have such a program?
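(Editorial sketch) The tabulation asked for here, regression coefficient times band intensity at each wavelength, sorted by size, takes only a few lines. The wavelengths, coefficients and intensities below are made-up numbers for illustration, and `influence_table` is a hypothetical name.

```python
import numpy as np

def influence_table(wavelengths, coefficients, mean_spectrum):
    """Return (wavelength, product) pairs sorted by descending absolute product."""
    products = np.asarray(coefficients) * np.asarray(mean_spectrum)
    order = np.argsort(-np.abs(products))
    return [(float(wavelengths[i]), float(products[i])) for i in order]

# Hypothetical values: four wavelengths (nm), their regression coefficients,
# and the average-spectrum intensity at each wavelength.
wavelengths = np.array([1100.0, 1200.0, 1300.0, 1400.0])
coefficients = np.array([0.5, -2.0, 0.1, 3.0])
mean_spectrum = np.array([0.2, 0.4, 0.3, 0.1])

table = influence_table(wavelengths, coefficients, mean_spectrum)
for wavelength, product in table:
    sign = "positive" if product > 0 else "negative"
    print(f"{wavelength:7.1f} nm  product = {product:+.3f}  ({sign} influence)")
```

The sign of each product gives the direction of influence and the ordering gives the ranking described in the preceding paragraphs; the wavelengths at the bottom of the list are the candidates to eliminate first.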

Here is a second round of discussion on the same general topic.

The next three paragraphs were distributed recently.

Another facet of how much of the spectrum to use for a calibration is to determine how much importance each part of the spectrum has. I know there is software to show relative importance for wavelength regions in a calibration. There is a question in my mind, though, which is given below.

Assume one ranks the spectral regions or bands starting with the most important. One could then keep adding regions to the most important, starting with the next most important. A test on a validation set may indicate if the calibration was improved. If improved, the third most important region could be added. A point should be reached when addition of regions doesn't aid the calibration. (Conversely, one could start with the complete spectrum and delete the least important region, etc.)

The question is: Is there a theoretical procedure or equation to indicate the absolute importance of wavelength regions instead of going through the exercise above? If there isn't such a procedure, is it beyond our present capabilities and understanding to develop one?
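(Editorial sketch) The exercise described above, adding ranked regions one at a time until a validation set stops improving, can be written as a short loop. Everything specific in the block is an assumption made for illustration: each region is reduced to its mean absorbance, the model is ordinary least squares, the ranking is supplied by the user, and the data are synthetic with the third region carrying no information.

```python
import numpy as np

def region_means(X, regions, chosen):
    """One feature per chosen region: the mean absorbance over its columns."""
    return np.column_stack([X[:, regions[k]].mean(axis=1) for k in chosen])

def validation_rmse(F_cal, y_cal, F_val, y_val):
    """Fit OLS with intercept on calibration features, score on validation."""
    A = np.column_stack([np.ones(len(y_cal)), F_cal])
    beta, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
    pred = np.column_stack([np.ones(len(y_val)), F_val]) @ beta
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))

def forward_region_selection(X_cal, y_cal, X_val, y_val, regions, ranked):
    """Add regions in ranked order; stop at the first one that fails to help."""
    chosen, history, best = [], [], np.inf
    for k in ranked:
        trial = chosen + [k]
        rmse = validation_rmse(region_means(X_cal, regions, trial), y_cal,
                               region_means(X_val, regions, trial), y_val)
        if rmse >= best:
            break
        chosen, best = trial, rmse
        history.append(rmse)
    return chosen, history

# Hypothetical data: three regions of ten wavelengths each.
rng = np.random.default_rng(4)
latent = rng.standard_normal((80, 3))
X = np.repeat(latent, 10, axis=1) + 0.01 * rng.standard_normal((80, 30))
y = 2.0 * latent[:, 0] + latent[:, 1] + 0.05 * rng.standard_normal(80)
regions = [slice(0, 10), slice(10, 20), slice(20, 30)]

chosen, history = forward_region_selection(
    X[:40], y[:40], X[40:], y[40:], regions, ranked=[0, 1, 2])
print("regions kept:", chosen)
print("validation RMSE after each addition:", history)
```

The backward variant mentioned in the text would start from all regions and drop the least important one while the validation error does not get worse. Neither loop answers the question of an absolute measure of importance; it only automates the trial-and-error route.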

