
NIR Discussion Forum » Bruce Campbell's List » Chemometrics » PLS for identifications « Previous Next »


Doubleblind
Posted on Wednesday, October 12, 2005 - 6:21 am:   

To all the kind contributors,

I am rather busy and I'd like to thank you for the support.

I hope I'll be back soon with more details.

All the best

DB

Tony Davies (Td)
Posted on Monday, October 10, 2005 - 9:15 am:   

Hello again Db,

Sorry it has taken so long to get back to you - other pressing matters!

If 1.0 is good and 0.0 is (pure?) bad then an answer of, say, 0.3 means that the spectrum of the sample is more similar to the bad spectrum than the good one but we would guess (without knowing anything about the spectra!) that such a sample would be a very long way out of spec.

Many years ago I made several (well, at least two) suggestions that the food industry should use NIR spectroscopy for general surveillance operations by comparing spectra of incoming samples to spectra of known "good" samples and using PCA to plot the variations. My belief was (and still is) that important variations would be easily spotted. However, I did not attract any interest and we never did a real study. Perhaps you can do it now!

Unless you really have a process with only one known contaminant, I do not think you should set up a quantitative calibration to check a maximum level of that contaminant. You will not have checked for other contaminants, and they might have no effect on your calibration, so they could pass undetected.

Best wishes,

Tony

David W. Hopkins (Dhopkins)
Posted on Friday, October 07, 2005 - 8:34 am:   

Doubleblind,

I think all 4 commenters agree that you now need to go back and design a new experiment with new samples. Then you can compare the two approaches: discrimination vs. quantitation of the known contaminant.

In any case, you need to answer the question: what is the limit of purity (impurity) that is acceptable, so that you can make samples that cover the range from unacceptable to acceptable.

How are you taking the baseline for each of the cuvettes? I don't think using an empty cuvette is going to be a good enough procedure, for the sensitivity you probably want. The optics of an empty cuvette and one filled with a liquid may not be close enough to give you useful baselines. Are you using such nasty samples that you have to use disposable cuvettes? Why not use quartz cuvettes?

Best wishes,
Dave

DJDahm
Posted on Friday, October 07, 2005 - 7:22 am:   

Hello, Doubleblind:

You said: "this would actually be a quantitative calibration". Right on!!

As a crotchety old man, getting meaner by the day, it seems to me that you are being stubborn. (Not that I know anything about that trait.) You have a clean material available; you have the likely contaminant; you have a liquid system in which you have a good chance of actually being able to synthesize samples that would be meaningful for calibration. I think you should first try the classical quantitative calibration, and resort to classification methods in more complex cases. If you have such a complex case at hand, then you will certainly need to consider more samples in the model.

There are some principles of a "sampling strategy for classification" that have been hinted at in this discussion. In a past discussion, Howard Mark said:
"A material can be 'out of spec' in many different ways, presumably most of those would affect the material's spectrum differently. If you don't know beforehand ALL the ways that out-of-spec can occur, then you'll likely miss some. Then, when one of those happens during routine use, it will erroneously register as being 'in-spec' - not good! By creating a model based only on 'good' samples, you're essentially telling the algorithm, 'this is what good material looks like, let me know if you see anything different'.

"Measuring out-of-spec samples during the validation step is sometimes called 'challenging' the qualitative model, to verify that it can, at least, flag samples that are known to be bad. The more thoroughly you can challenge your model, the more confidence you can have that it will perform correctly in routine use."
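[Editor's note: a minimal numerical sketch of the one-class idea in the quotation, with invented spectra and thresholds; real work would use proper PCA/SIMCA tooling, but the flagging logic is the same.]

```python
# Model only the "good" spectra, then flag anything unusually far from
# them.  All data here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
good = rng.normal(1.0, 0.01, size=(30, 50))   # 30 known-good spectra

mean = good.mean(axis=0)
scale = good.std(axis=0, ddof=1)              # good-sample scatter

def distance(spectrum):
    """RMS z-score of a spectrum against the good-sample model."""
    z = (spectrum - mean) / scale
    return float(np.sqrt(np.mean(z ** 2)))

# Set the alarm level from the good set itself (its 95th percentile)
threshold = np.percentile([distance(s) for s in good], 95)

suspect = rng.normal(1.05, 0.01, size=50)     # spectrum shifted off-spec
print(distance(suspect) > threshold)          # prints True: flagged
```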

Now, you are one step closer to a good model than the fellow Howard was writing to: you have included a contaminant spectrum. In my experience, which is certainly not as extensive as that of the fine gentlemen who have been writing, building "GOOD - BAD" classification models is easy in the extreme for coarse separations, and very hard if you want fine discrimination. If you need such fine discrimination, then the most important samples to have in the model are those that are "just out of spec", labeled as "BAD". A way to get such samples is to develop a model and predict the "Quality" of samples. Then, when you encounter material that you "PASS" and the reference method "FAILS", that sample becomes one of those important new entries in your database.

I understand that you are motivated, at least in part, by the desire to learn something from this problem (maybe have some fun as well), and hopefully even solve your employer's problem. If I were having such fun, I would consider doing principal component analysis (using only the weakly absorbing spectral regions to avoid non-linearity) and seeing how much of each actual component showed up in the principal components. (Bosses are usually impressed with this, if it works.)
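[Editor's note: a sketch of this PCA suggestion on synthetic two-component mixtures. For a noise-free linear system, the first principal component's scores track the contaminant fraction; all names and data below are illustrative, not from the thread.]

```python
import numpy as np

rng = np.random.default_rng(2)
pure = rng.random(40)                         # stand-in pure spectrum
contaminant = rng.random(40)                  # stand-in contaminant
fractions = np.linspace(0.0, 0.05, 11)
X = np.array([(1 - f) * pure + f * contaminant for f in fractions])

Xc = X - X.mean(axis=0)                       # mean-centre
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = U[:, 0] * S[0]

# Noise-free linear mixtures have rank one after centring, so PC1
# correlates (up to sign) essentially perfectly with the fraction.
corr = np.corrcoef(pc1_scores, fractions)[0, 1]
print(abs(corr))
```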

Now to answer the question that you asked (very bluntly):
" The question was and is : does this mean that an unknown sample predicted to be 1.08 or 0.92 has only a very small chance to be a good sample ? "
In my opinion, we don't have a clue as to what the standard error values mean because you have not set up a meaningful sample set. You have, in essence, picked two points on a curve and assumed that you are working with a straight line. If the function is truly linear, then you may have a semi-meaningful estimate of random errors in your data, and you can use statistics. If it is curved, you are asking a meaningless question.

See, now I've taken the pressure off you, and someone else can argue with my points, and we will all learn something. Are you planning on coming to EAS? We'd love to break the Doubleblind code.

Doubleblind
Posted on Friday, October 07, 2005 - 1:51 am:   

Hi David,

Samples are transparent liquids.
I understand that PMMA absorbs in the NIR, but a blank of each cuvette is taken before the scan, and that signal is subtracted from the spectrum.
I can follow your suggestion to use samples that cover the spec area and a bit beyond (this would actually be a quantitative calibration).
Let me go through the full story:
in the beginning I was looking for a cluster calibration, but then I heard about semi-quantitative calibrations and decided to try that way.
The result of this last attempt is the topic of this thread.
A typical cluster analysis gives an answer like "in" or "out of" the cluster, and I think I understand the meaning of that, but when it comes to semi-quantitative calibrations I am a bit concerned about the concept of a sample being good or bad.
For example, if I had a SEE of 0.02, then I would expect 68% of my good samples to fall within an interval of 1 +/- 0.02. When it comes to measuring unknown samples, the SEP would be used to calculate the same interval.
The question was and is: does this mean that an unknown sample predicted to be 1.08 or 0.92 has only a very small chance of being a good sample?
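[Editor's note: taking the normal-error assumption at face value, with the SD set to the SEP of 0.02, the question above has a quick numerical answer; the sketch below is only as meaningful as that assumption, which later posts in the thread question.]

```python
import math

def two_sided_tail(pred, target=1.0, sep=0.02):
    """Chance that a truly good sample (centred on `target` with
    spread `sep`) would be predicted this far out or farther,
    assuming normally distributed prediction errors."""
    z = abs(pred - target) / sep
    return math.erfc(z / math.sqrt(2.0))      # two-sided normal tail

print(two_sided_tail(0.92))   # z = 4: a chance of roughly 6e-5
print(two_sided_tail(1.08))   # same distance from 1.0, same chance
```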
Rgds

David W. Hopkins (Dhopkins)
Posted on Thursday, October 06, 2005 - 8:26 am:   

Hi Doubleblind,

I agree with what Tony says, but I think you may have some other procedural problems. What kind of samples are the pure and contaminant materials? Liquids? Powders?

Why are you using PMMA cuvettes? As PMMA absorbs in the NIR, small variations in the cuvettes may mask the differences you are looking for. You may need to rethink your sampling strategy.

What are the tolerance limits you need to set on the purity? You need samples that span that range so you can determine whether you can make a useful discrimination.

Best wishes,
Dave

Doubleblind
Posted on Thursday, October 06, 2005 - 5:25 am:   

Thanks Tony,

Could you elaborate on that "more similar to" concept?

Best regards

Tony Davies (Td)
Posted on Thursday, October 06, 2005 - 5:18 am:   

Hello Db!

With regard to the "normal" use of PLS, you are in danger of over-fitting. However, your use is not "normal", and I am not convinced that you have set it up correctly. I think that you want to detect whether your test sample is more contaminated than a certain specification, but the calibration you have described will tell you whether your test sample is more similar to your contaminant or to the "good" material.
As you have the main contaminant available, I think you should make mixtures of your material (the purest sample you can get) and the contaminant, from 0 to 10% contaminant. Then you can run your PLS on these more typical samples (you would assign 1.0 to satisfactory samples and 0.0 to all samples that are more contaminated than your specification).
If you have or can get a really pure sample (99.9%?) then you could just use spectroscopy: look at the differences between the pure and test samples.

Best wishes,

Tony

Doubleblind
Posted on Thursday, October 06, 2005 - 2:02 am:   

Thanks David,

OK, I have a total of 41 spectra collected from 8 chemically different materials.
The software automatically assigns 2/3 of the spectra to calibration and 1/3 to validation.
To be more precise, there are 4 batches of the pure contaminant and 4 batches of the good material.
Only three factors were needed for the discrimination.
Each spectrum is the average of three scans.
Samples are read in a 0.5 cm pathlength cuvette (PMMA).
The contaminant has a very different spectrum from the good material.
The good material is more than 99.5% pure and the main contaminant is less than 0.5% (curiously, the pure material and the contaminant must sum to 100%!).
Does all this mean that my calibration is overfitted? I don't have many batches of this material, as we purchase it only seldom.
What do you think?
Thanks

David W. Hopkins (Dhopkins)
Posted on Wednesday, October 05, 2005 - 10:36 am:   

Hi Doubleblind,

It appears that you have a very good discrimination of the good and bad batches of material. The SEP of 0.02 means that 95% of your observations lie within 0.04 units ("2 standard deviations") of the appropriate reference reading. You can very safely use a cut-off of 0.5 for the discrimination of good vs bad.

A sample with a predicted value of 0.95 can confidently be accepted as a good batch, as it is extremely far from 0.0 and less than 3 SDs from the target value of 1.0. There is nothing you can say about the percent composition.

What you have not told us is how many samples were in the training set, whether you have a test set, and how many PLS factors you selected for the calibration. The usefulness of your calibration will depend on your answers to these questions. You have to guard against overfitting, and from the statistics I'd guess you may be overfitting, as that low a standard error sounds too good to be true.

Best wishes,
Dave

Doubleblind
Posted on Wednesday, October 05, 2005 - 10:01 am:   

I have loaded many different batches of the material I want to identify, and many pure samples of the main contaminant of this material.
Good samples were given a qualitative score of 1 and contaminant samples 0.
A PLS calibration was run, giving a SEP of 0.02.
The question is what am I actually predicting ?
What is the meaning of having a sample predicted as being 0.95 ?
Has this something to do with percent composition ?
Thanks
