
NIR Discussion Forum » Bruce Campbell's List » Calibration transfer » Available software?


Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 31
Registered: 9-2003
Posted on Thursday, March 13, 2008 - 3:34 am:   

I forgot to mention that this is one of the topics of the next three-day course on uncertainty estimation in multivariate and multiway calibration (October 13-15, Nijmegen, The Netherlands). A flyer, as well as some example slides, can be downloaded from

http://www.chemometry.com/Training.html

The example slides are from a presentation given at the meeting of the Danish Chemometrics Society.

Kind regards,

Klaas Faber

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 30
Registered: 9-2003
Posted on Thursday, March 13, 2008 - 3:31 am:   

I apologize for dropping into what seems to be a closed discussion, but it has been some time since I last looked at this list.

Current validation amounts to trial and error. In equation form:

MSEP = (1/n) sum((Yprd-Yref)^2)

The use of that formula is essentially correct ONLY if Yref is sufficiently precise; otherwise it is problematic (an understatement, really). With imprecise reference values, the formula overestimates the true average predictive uncertainty. A straightforward correction is to subtract out the error variance of the reference method. That correction is essentially the same as the one that improves an ordinary least-squares straight-line fit when the x-values are noisy. [Random noise adds to the x-variance, as a result of which the x-values get too much weight in the regression; since the functional form of that effect is known, it can be corrected for.]

The correction introduces an uncertainty of its own: unless the variance of the reference values is exactly known, it is just another trade-off of variance against bias. Bruce Kowalski and I recognized that this correction should be used with caution, and we therefore proposed a "soft" correction that enables one to state that one is, say, 95% confident of not over-correcting. All of that assumes that one has a well-characterized estimate of the error variance. [N.B. A "conservative" estimate automatically leads to over-correction.] It is the same with correcting a straight line whose slope estimate is biased towards zero because of noisy x-values: the correction has a 50% probability of over-correcting, but that is no reason to stick with a biased line.

The crux is that these problems are caused by using a formula for something it was not intended for: one ends up with a single equation in two unknowns instead of one. They are not caused by the correction itself, which is a general solution scheme.
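In plain Python, the MSEP formula and the variance subtraction above might be sketched like this (the function names are mine, not from any package, and the reference-method error variance is assumed known):

```python
def msep(y_pred, y_ref):
    """Apparent mean squared error of prediction: (1/n) * sum((Yprd - Yref)^2)."""
    n = len(y_pred)
    return sum((p - r) ** 2 for p, r in zip(y_pred, y_ref)) / n

def msep_corrected(y_pred, y_ref, var_ref):
    """Subtract the (assumed known) reference-method error variance from the
    apparent MSEP.  If var_ref is over-estimated the result can go negative,
    which is exactly the over-correction risk discussed above, so the caller
    must check the sign before taking a square root."""
    return msep(y_pred, y_ref) - var_ref
```

Note the trade-off mentioned above: when `var_ref` is itself only an estimate, the subtraction trades bias for extra variance.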

Regards,

Klaas Faber

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 88
Registered: 9-2001
Posted on Saturday, March 24, 2007 - 2:57 pm:   

Pierre - obviously the n must be there in the denominator, otherwise the computation would give continually increasing values as differences from more and more samples were included (BTW, I mention this not for you, I'm sure you already know that; I mention it for the benefit of any novices in NIR and statistics who might be following the thread).

The other point, though, is more subtle, as I said. Very often the laboratory error is not small; in many cases it can be the dominant error source giving rise to the observed differences between the reference and the NIR readings. This is confirmed in a paper by Rocco DiFoggio that I consider a classic (Appl. Spectrosc. 49(1), 67-75 (1995)).

This is also confirmed by many NIR papers appearing in the literature, where the SEL is comparable to the SEPobserved; at the very least, it is not a rare occurrence.

Here's where the subtleties come in. The variances corresponding to the SEL and to the SEPobserved are both chi-square distributed. The 95% confidence interval of chi-square/d.f., for a reasonable number of samples (let's say 50, to make the example concrete) goes from 0.798 to 1.40. That means that the measured variance on sets of 50 samples could randomly vary from 0.798 to 1.40 times the true variance; this corresponds to the standard deviation having random variation that can go from 0.893 to 1.18 times the true standard deviation.

Again for the sake of a concrete example, let's say that the SEPobserved for some particular calibration is 0.12, and the SEL is 0.10; these are not unreasonable values to come across.

The 95% confidence limits for the two standard errors in this case would be:

0.0893 to 0.118 (for the SEL)

0.107 to 0.142 (for the SEPobserved)

For those values, the calculated value of the SEPnir could range between the following limits:

Best case:
sqrt(0.142^2 - 0.0893^2) = sqrt(0.0201 - 0.0079) = sqrt(0.0122) = 0.110

Worst case:
sqrt(0.107^2 - 0.118^2) = sqrt(0.0114 - 0.0139) = sqrt(-0.0025)!

In the cases where you wind up with a negative value for the argument of the square root, you are at least warned that something is amiss.

But, in fact, in any particular case, you could randomly get any value in between what I show as the best and worst cases, so even if you get something that appears reasonable, such as the 0.11 we found for the best case, it is still only a value computed essentially at random and therefore is not a reliable indicator of the "true" NIR error.

The only time you can get a reliable value for SEPnir this way, is if the SEPobserved (as we're calling it) is MUCH larger than the SEL. How much larger it needs to be requires more comprehensive discussion than we can deal with here; it is covered in books on basic Statistics, however.
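The arithmetic of the worked example can be sketched as follows (the numbers are the confidence limits quoted above; the helper name is mine):

```python
import math

def sep_nir(sep_observed, sel):
    """SEPnir = sqrt(SEPobserved^2 - SEL^2); returns None when the
    argument of the square root goes negative."""
    arg = sep_observed ** 2 - sel ** 2
    return math.sqrt(arg) if arg >= 0 else None

# 95% limits from the example: SEL = 0.10 -> (0.0893, 0.118),
# SEPobserved = 0.12 -> (0.107, 0.142)
best = sep_nir(0.142, 0.0893)   # about 0.110
worst = sep_nir(0.107, 0.118)   # negative argument -> None
```

The `None` branch corresponds to the "square root of negative number" warning discussed in this thread; any value between the two limits can occur at random.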

Klaas Faber and I have been discussing this issue since he wrote a paper describing the calculation of SEPnir via the differences of other SEPs. Klaas agrees with me that the discussion in that paper is incomplete, in that it doesn't warn about this type of situation arising. We've been discussing what should be done about what appears to be a growing misperception about this, but I think that when we come across a case of it, we need to at least warn the people involved about this pitfall for the unwary.

Howard

\o/
/_\

Pierre Dardenne (dardenne)
Advanced Member
Username: dardenne

Post Number: 23
Registered: 3-2002
Posted on Saturday, March 24, 2007 - 12:46 pm:   

Hi Howard,

Thanks for correcting me. You are right; the division is by 2n:
SEL=SQRT(SUM((Rep1-Rep2)^2)/(2*n))

"This is especially pernicious if the NIR SEP is small". I do not entirely understand your subtle considerations, but most of the time SELab is small compared with SEPobserved.

The idea, then, is that the NIR predictions are not much affected by the error in the reference values. In practice it means that I prefer 100 samples analyzed singly to 50 in duplicate for the calibration step. In validation, we can use duplicates or more if needed.

Pierre

Natalia R. Sorol (naty)
Junior Member
Username: naty

Post Number: 6
Registered: 9-2006
Posted on Friday, March 23, 2007 - 11:25 am:   

Thanks to both of you. I'll follow your advice and then I will tell you how it went.
Naty

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 87
Registered: 9-2001
Posted on Wednesday, March 21, 2007 - 7:39 pm:   

Pierre - whoops! - you left out an important term from your expression for computing SEL from duplicates. The expression should be:

SEL=SQRT(SUM((Rep1-Rep2)^2)/(2*n))

where n is the number of different samples from which you take the duplicate readings.
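As a sketch in plain Python (the function name is mine):

```python
import math

def sel_from_duplicates(rep1, rep2):
    """SEL = sqrt( sum((Rep1 - Rep2)^2) / (2*n) ), where n is the number of
    samples measured in duplicate (not the total number of readings)."""
    n = len(rep1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rep1, rep2)) / (2 * n))
```

The factor 2 in the denominator accounts for each squared difference carrying the error of two readings, which is the term that was missing above.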

But I have also another point I've been wanting to mention, from an earlier posting on this thread.

When you gave the equation for computing the error of the NIR readings from the difference in variances between the total difference, sum((NIR-lab)^2), and the measured SEP of the lab, using the equation:

SEPnir=SQRT(SEPobserved^2 - SELab^2)

there are potential problems in using it. The problem areas are subtle, but potentially serious and can give rise to large errors in the calculation of the NIR SEP.

This is especially pernicious if the NIR SEP is small compared to the other two quantities. In that case you not only have the general problem of computing a small difference between two large values, but also the additional issue that the two large quantities are themselves randomly and independently varying.

Under these circumstances, it's entirely possible for the variations in the large SEPs involved to actually be greater than the actual SEP of the NIR readings, and in this case the computed SEP of the NIR readings is, essentially, a random number having nothing to do with the actual SEP of the NIR measurements.

In the most extreme case, the SELab^2 can actually be greater than the SEPobserved^2, and in that case the difference will result in a negative value for SEPnir^2. In one sense, that's a preferred outcome, since the computer will invariably warn you of the situation by flagging a "square root of negative number" error.

But in the less extreme cases, you'll have no warning that anything is going wrong; nevertheless, the computed SEPnir will still be little more than a random number.

Howard

\o/
/_\

Pierre Dardenne (dardenne)
Advanced Member
Username: dardenne

Post Number: 22
Registered: 3-2002
Posted on Wednesday, March 21, 2007 - 6:15 pm:   

Natalia,

Bruce already answered.

From the 3700 samples I used only 2 cross-validation segments. Since I already have the information (the reference values), removing redundant information does not help the future predictions.

Using SELECT to remove the redundant information puts your validation set entirely inside the calibration set. That is the reason your SEP values are lower than the SEC. A true validation would be to test the samples of the next year(s); then you will have independent set(s).

The SEL can be estimated as you said. But instead of analyzing one sample 10 times, you can use the classical duplicates on many samples, and the SEL is simply

SEL=SQRT(SUM((Rep1-Rep2)^2)/2)

As Bruce said, SEL can be different if the duplicates are done the same day (hour) by the same operator on the same equipment than if they are done on different days by different operators. The latter will give a higher SEL, and then if you subtract it from the total SEP, you will obtain a better SEP_NIR. Anyway, PLS models with high R2 and thousands of samples are rather independent of the wet-chemistry error.


Pierre

Bruce H. Campbell (campclan)
Moderator
Username: campclan

Post Number: 96
Registered: 4-2001
Posted on Wednesday, March 21, 2007 - 4:44 pm:   

Natalia,
The lab error is often given in two pieces. One is the error associated with one analyst performing the operation a number of times within one day, using one piece of equipment. The error value is often underestimated in this fashion. A more complete error value comes from more than one analyst performing the operation on samples spread out over more than one day, using more than one piece of equipment.
A personal observation is that the first error type I gave above is often about half the size of the second one. Also, with the large number of samples you ran, I think it would not be possible for one analyst to do them in one day, so the lab error you have could very well be somewhere between the two types of error I mentioned.
Bruce.

Natalia R. Sorol (naty)
New member
Username: naty

Post Number: 5
Registered: 9-2006
Posted on Wednesday, March 21, 2007 - 7:20 am:   

I have another question. How do you calculate SELab?
I think it's the standard deviation of a number of measurements (for example, 10) of the same sample. Am I correct?
Thanks
NAty

Natalia R. Sorol (naty)
New member
Username: naty

Post Number: 4
Registered: 9-2006
Posted on Wednesday, March 21, 2007 - 6:41 am:   

Thank you, Pierre! That's very important for me; it gives me confidence that I'm working correctly. I have another question for you. First I had 2049 spectra. I have the WinISI software, and I've used the "sample selection from spectra file" option to identify the redundant spectra, so I made the equations with only the selected 990 spectra. Is this process reliable? Did you do the same thing, or did you use all 3700 spectra to make the equations?
Best regards...
Naty

Pierre Dardenne (dardenne)
Advanced Member
Username: dardenne

Post Number: 21
Registered: 3-2002
Posted on Tuesday, March 20, 2007 - 3:04 pm:   

Natalia,

I recently got SECV values of 0.25 for Brix and 0.15 for POL on a set of 3700 spectra. Your results are OK.
If you knew the error of the reference methods, you could estimate the final
SEPnir=SQRT(SEPobserved^2 - SELab^2)

Pierre

Natalia R. Sorol (naty)
New member
Username: naty

Post Number: 3
Registered: 9-2006
Posted on Tuesday, March 20, 2007 - 7:34 am:   

Sorry, in the validation values it isn't SECV but SEP. My mistake.
Naty

Natalia R. Sorol (naty)
New member
Username: naty

Post Number: 2
Registered: 9-2006
Posted on Tuesday, March 20, 2007 - 6:27 am:   

Thank you, David, for those useful comments. I would like to know if there is anyone in this forum who works with sugar cane juices and shredded cane. It would be interesting for me to share some info about this, to know if I'm on the right track and working correctly.
For example, I'm studying Pol%juice and Brix%juice, and the values I have so far are:

Calibration   N     R^2      SECV
Pol%juice     908   0.9842   0.3079
Brix%juice    882   0.9934   0.1882

Validation    N     R^2      SECV
Pol%juice     1059  0.976    0.295
Brix%juice    1059  0.989    0.178

Are these values acceptable to you? What do you think?
Thank you very much for your help. Any thoughts appreciated...
Natalia

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 105
Registered: 10-2002
Posted on Monday, March 19, 2007 - 11:49 am:   

Hi Natalia

Yes, you can take the SEP as an error, but it is a "1-sigma" value, so to obtain 95% confidence you should use 2*SEP. Sorry, it makes your results look worse, but that's the way the statistics work. Of course, that is the interval between the reference value and the NIR result, and the reference value has its own associated error. You should be sure to measure the reproducibility of repeat measurements by NIR, so you can see the error of the NIR method by itself.
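As a trivial sketch of quoting a result this way (the numbers are made up for illustration):

```python
# A predicted Brix value quoted with an approximate 95% interval,
# treating SEP as a 1-sigma error.
brix_pred, sep = 18.40, 0.18
lower, upper = brix_pred - 2 * sep, brix_pred + 2 * sep
print(f"Brix = {brix_pred:.2f} +/- {2 * sep:.2f}  (approx. 95% confidence)")
```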

Good luck on your project!

Best wishes,
Dave

Natalia R. Sorol (naty)
New member
Username: naty

Post Number: 1
Registered: 9-2006
Posted on Monday, March 19, 2007 - 10:36 am:   

Hi, all. I'm working on my first NIR assignment, on cane juice and shredded cane in Argentina. This forum is very useful for me because I don't have any experience in the field of NIR spectroscopy.
I'm using a FOSS NIR System 6500 for the determination of parameters like Brix and Pol.
My question is: can I take SEP as an error? For example: Brix +/- SEP?
Is SEP the standard deviation of the residuals with a probability of 95%?
Thanks for your help. It's very important for me.

Natalia

Lois Weyer
Posted on Wednesday, February 25, 2004 - 10:02 am:   

Is there a commercially available calibration transfer software package? Perhaps something compatible with GRAMS, Unscrambler, or Pirouette?

hlmark
Posted on Wednesday, February 25, 2004 - 10:56 am:   

Lois - as far as I know there is no commercially available general-purpose "calibration transfer" package. One reason, I'd say, is that no two people create their "transferable calibrations" the same way. But at the heart of the algorithm there is usually a slope-and-bias correction, so if you've got software that will do that for you, then you should be most of the way there.
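A minimal sketch of the slope-and-bias correction mentioned above (one common form: least-squares regression of reference or master-instrument values against the predictions to be corrected; all names and data here are illustrative, not from any commercial package):

```python
def slope_bias_fit(y_new, y_master):
    """Least-squares fit of y_master ~= slope * y_new + bias."""
    n = len(y_new)
    mx = sum(y_new) / n
    my = sum(y_master) / n
    sxx = sum((x - mx) ** 2 for x in y_new)
    sxy = sum((x - mx) * (y - my) for x, y in zip(y_new, y_master))
    slope = sxy / sxx
    bias = my - slope * mx
    return slope, bias

def apply_correction(predictions, slope, bias):
    """Apply the fitted slope-and-bias correction to new predictions."""
    return [slope * v + bias for v in predictions]
```

Any regression tool (GRAMS, Unscrambler, a spreadsheet) that fits a straight line can supply the same slope and bias.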

Howard

\o/
/_\
