A reference for SEP vs SEL?

NIR Discussion Forum » Bruce Campbell's List » Chemometrics » A reference for SEP vs SEL?


hlmark
Posted on Friday, September 16, 2005 - 8:20 am:   

Bastianelli - your analysis is correct as far as it goes, but there is one other point that you need to consider. As you said, 95% of the data from a Normal distribution should be within +/-2 standard deviations. The corollary to that is that 5% should be outside those limits - even without any mistakes.

When you have 20 samples, 5% corresponds to 1 sample. Therefore, one sample SHOULD be outside the +/-2 SD limits. The problem you encounter here is that this alone doesn't tell us how far outside the limits it may be.

Can it be as much as +/-3 SD? Possibly. The probabilistic statements are based on the behavior of single samples. When you have multiple samples, you have to take into account the multiple possibilities for a "rare event" to happen.

Let's say that the probability for being outside +/-3 SD is 0.01: that's the value for a single sample, taken at random. With 20 samples, there are twenty chances that one of them might be outside the limits, and the probability that any of them might be at or outside the limit becomes:

P = 1 - 0.99^20 = 1 - .8179 = 0.1821

Surprise!

But you have to be careful in the interpretation of this. What that means is that if you take many sets of 20 samples, 18% of the time a set will have one (or more) samples at or beyond the +/-3 SD limit. So it's not going to happen all the time, but it's not going to be as rare as the raw figure of 0.01 would make you think.
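A minimal sketch of that calculation, assuming the samples are independent and using the illustrative single-sample probability of 0.01 from above (the function name is just for illustration):

```python
def prob_at_least_one(p_single: float, n: int) -> float:
    """Probability that at least one of n independent samples shows an
    event whose single-sample probability is p_single."""
    return 1.0 - (1.0 - p_single) ** n


# 20 samples, each with a 0.01 chance of landing at or beyond the limit
p = prob_at_least_one(0.01, 20)
print(f"{p:.4f}")  # 0.1821
```

The same function can be called with any other single-sample probability or set size to see how quickly "rare" events stop being rare.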

So when you have sets of 20 samples, most of them (about 64%, since 1 - 0.95^20 = 0.64) will have at least one reading beyond +/-2 SDs and 18% of them will have a reading beyond +/-3 SDs.

And that's all part of the normal (and Normal) behavior of data. If it leads to a high estimate of the SD that's unfortunate, but the distribution of SDs does have a fairly long tail on the high side. The only sure way to improve things is to take more readings, to get a better estimate.

Howard

\o/
/_\

Bastianelli (Dbastia)
Posted on Friday, September 16, 2005 - 2:46 am:   

Dear all,

This discussion on SEP-SEL is really great and indeed corresponds to the questions everybody is faced with in everyday life.

I would like to come back on SEL calculation.

SEL can indeed be calculated from SD of a large series of measurements on a reference sample or from duplicates.

In both cases there is a question mark over the handling of analytical outliers.

In the lab, there are 2 types of inaccuracy:
- what I will call "error" is the analytical error linked to the method and to the way it is performed in the lab (equipment, skill, know-how ...). This should generally more or less follow a normal law.
- what I will call "mistake" is the result of a problem like poor management of samples (labelling, sampling ...) or an occasional failure in equipment, etc.

The interesting thing when comparing NIR predictions to ref data is the "error".
Thus, when calculating the SEL, it is useful to use lab data without "mistakes" (which are not always easy to detect and separate from "errors" !).
From a practical point of view it is possible to remove the lab data outside the normal law before calculating SEL. (There are many other ways to detect outliers - it is not the topic here). Generally we can be severe here (p=0.01) because what we want to remove is the outliers which have a high effect on SEL.
For example if you have 20 values on a standard lab sample, leading to a normal law with mean=0 and SD=1 (95% of values between -2 and +2) then a single point of value=3 leads to a SD of 1.11 and 2 points of value = 3 to a SD of 1.32 ! (i.e. a SEL of 0.79 or 0.93 instead of 0.71)
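A rough sketch of how this kind of inflation can be checked. The 20 base values here are evenly spaced and then standardized to mean 0 and SD 1 (an assumption made only to keep the example deterministic, not real lab data), and the outliers replace the values closest to the mean. The exact inflated SD depends on which values are displaced and on the shape of the data, so the figures need not match the ones quoted above.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and sample SD 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def sd_with_outliers(values, outlier, count):
    """Sample SD after replacing the `count` values closest to zero
    (the mean of standardized data) with `outlier`."""
    ranked = sorted(values, key=abs)
    return statistics.stdev([outlier] * count + ranked[count:])


# Hypothetical stand-in for 20 readings on a standard lab sample
base = standardize([float(i) for i in range(20)])

sd0 = statistics.stdev(base)           # baseline SD (1 by construction)
sd1 = sd_with_outliers(base, 3.0, 1)   # one point at +3
sd2 = sd_with_outliers(base, 3.0, 2)   # two points at +3

for label, sd in (("no outlier", sd0), ("one outlier", sd1), ("two outliers", sd2)):
    # SEL taken as SD / sqrt(2), following the conversion used in the post
    print(f"{label}: SD = {sd:.2f}, SEL = {sd / 2 ** 0.5:.2f}")
```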

This can be important because an overestimated SEL can lead to biased (and over-optimistic) comparisons with SEP of NIR data.

hlmark
Posted on Thursday, September 15, 2005 - 9:41 pm:   

Peter - well, I'm certainly not going to make a firm pronouncement on what the ROT should be. The whole point of rules of thumb is that they are generally NOT specific hard-and-fast rules based on firm underlying theory or other justification for some exact value.

Mostly, as in the case under discussion here, they are based on experience. If different people have different experiences, then guess what: they will come up with different ROT's (or should that be R'sOT?).

The value for the ROT that's been found to be documented is 2. If you check my original posting that started this thread, you'll see that I expected a value for the ROT of 1.5.

The value 1.4 for the ROT is sort of justifiable based on theoretical grounds, since that's what theory says you should get if the error of the NIR equals the error of the reference lab (and both errors are random), so making that your spec limit implies that the NIR will be no worse than the reference lab. The 1.5 figure is a rounding of that.

\o/
/_\

Bruce H. Campbell (Campclan)
Posted on Thursday, September 15, 2005 - 7:49 pm:   

Pierre,
I'm a little puzzled. If there is a large bias in a particular situation, wouldn't the statistics based on randomness have a large error? Also, the reference you cited shows illustrations of both lines with intercepts of zero, as close as I can see. Am I missing something here?
Bruce

Peter Flinn (Peterf)
Posted on Thursday, September 15, 2005 - 7:04 pm:   

Howard,

I was not questioning Tom's formula. Would I dare?? I am just pleased to see a definitive answer to this for NIR users. It seems there is a large gap in my education through not having read your book!

The only outstanding point is the ROT "factor". Should it be 2? Should it be sqrt(2)? Should people use one at all?

Thanks, PF

Pierre Dardenne (Dardenne)
Posted on Thursday, September 15, 2005 - 1:33 pm:   

Bruce,

I think that intercepts are not involved here. When we compare SEP and SEL, SEP is the STD of Yref-Ypred or SEP corrected for bias. It is not the RMSEP.
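To make the distinction concrete, here is a small sketch with made-up reference and predicted values (the numbers and the function name are hypothetical). SEP is taken as the standard deviation of the residuals, so the bias is removed, while RMSEP keeps it:

```python
import math
import statistics


def prediction_errors(y_ref, y_pred):
    """Return (bias, SEP, RMSEP) for paired reference/predicted values."""
    residuals = [r - p for r, p in zip(y_ref, y_pred)]
    n = len(residuals)
    bias = statistics.mean(residuals)
    sep = statistics.stdev(residuals)                      # bias-corrected, n-1 denominator
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)   # bias included
    return bias, sep, rmsep


# Hypothetical data with a constant offset of about 0.5
y_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [9.4, 11.7, 13.3, 15.6, 17.5]

bias, sep, rmsep = prediction_errors(y_ref, y_pred)
print(f"bias = {bias:.3f}, SEP = {sep:.3f}, RMSEP = {rmsep:.3f}")
```

With these definitions RMSEP^2 = bias^2 + ((n-1)/n) * SEP^2, so a large bias inflates RMSEP but leaves SEP untouched.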

Pierre

Bruce H. Campbell (Campclan)
Posted on Thursday, September 15, 2005 - 12:01 pm:   

A note of caution that I haven't seen is that when comparing SEP with SEL the intercepts must be statistically indistinguishable from each other. I have seen a formula that one can use when there are significant differences in least squares for monovariant data. I don't recall what it is, other than much more complicated than the one used for intercepts that are essentially the same.

hlmark
Posted on Thursday, September 15, 2005 - 8:26 am:   

Peter - to a statistician, or to anyone who understands the statistics, there's no question: Tom's formula is correct. As it should be, since Tom himself is a trained statistician.

I don't think anyone would question the use of the "ordinary" standard deviation formula, i.e.,

SD = sqrt( sum ((x-xbar)^2) / (n-1))

and to get from there to Tom's formula is a matter of some relatively straightforward algebra, you don't need to be a great math wiz to follow it. In fact, Tom includes some of the algebra in his article, although he leaves out some of the details. If you have a copy of my book I do the same thing, and include all the details. Turns out it's not hard, it's just a matter of thinking about it the right way. Then the math just falls right out.

And if you do it that way, there's no confusion, because there's only one answer, no matter which calculation you use. There are some practical advantages to doing it one way or another, in different circumstances, which is why the different formulas were developed: to suit the different circumstances. And when they give the same answer, there's no problem: you just use the formula that applies to what you want to do.

As for past confusions, there's no help for that, but I'd say we're better off not promulgating it, even if it takes a little education to help people make the transition.

The only remaining question is where the various ROTs came from. The answer goes back to the dim history of NIR, and basically is a matter of the empirical observations made by the early workers, that they could usually meet that spec in a variety of applications, so that's what they quoted when asked "How good is NIR?" And that became the ROT.

Howard

\o/
/_\

Peter Flinn (Peterf)
Posted on Thursday, September 15, 2005 - 7:42 am:   

Thanks Pierre and Howard (and Tom),

That will teach me for not reading the last NIR News thoroughly! My excuse is ICNIRS politics (make sure you read the next issue).

I will make sure I use "SEL(2)" from now on, but some may be surprised at the level of disagreement and confusion which has existed on this point.

The issue also has implications for Howard's original point. Take an example where (using my arbitrary notation) SEL(1) = 1.5, hence SEL(2) = 1.06. Depending on the SEL definition and the "factor" you use (2 or 1.414), the rule-of-thumb maximum SEC or SEP could vary from 1.5 to 3.0!
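The arithmetic behind that range can be laid out as a quick sketch, using the example value SEL(1) = 1.5 and the two candidate factors from the paragraph above:

```python
import math

sel_1 = 1.5                     # SEL(1): SED of duplicate differences
sel_2 = sel_1 / math.sqrt(2)    # SEL(2): SE of a single measurement, about 1.06

limits = []
for label, sel in (("SEL(1)", sel_1), ("SEL(2)", sel_2)):
    for factor in (math.sqrt(2), 2.0):
        limit = sel * factor
        limits.append(limit)
        print(f"{label} = {sel:.2f} x {factor:.3f} -> max SEP = {limit:.2f}")

print(f"range: {min(limits):.2f} to {max(limits):.2f}")  # range: 1.50 to 3.00
```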

Who is going to be brave and state the optimum procedure here? As one whose statistical knowledge could comfortably fit on the back of a matchbox, I think it would be great to clear up the confusion on this topic, which in some quarters at least, has lasted for years.

Cheers, PF

hlmark
Posted on Thursday, September 15, 2005 - 5:57 am:   

Wow! We certainly seem to have generated a lot of discussion all of a sudden! And I thank everyone who has found the exact citations and contributed them - this is what I needed.

As far as the formulas for the various calculations: I derived a long time ago, and published it in "Statistics in Spectroscopy" (Academic Press, now Elsevier, on pages 59-63) that by starting with the regular formula for calculating standard deviation (i.e., to measure SEL as the SD of multiple aliquots from one well-mixed sample - and this is essentially equivalent to Peter's SEL(2) - which you can verify by letting n become arbitrarily large) then you can derive the following formula for calculating the SD (SEL) from blind duplicates:

SEL = sqrt(sum (D^2) / (2n))

This is the same formula that Tom presents in his NIR News article that Pierre mentioned.

The lack of the 2 in the denominator is what leads to the formula that gives a value that is sqrt(2) larger, that Peter calls SEL(1). I have to comment that, while it can be used as the basis for a legal standard, SEL(1) is scientifically "wrong" in the following sense: it is entirely possible to take aliquots from one well-mixed sample and compute the standard deviation (SEL(2)) of those, as described above. You can then take the same data, randomly select pairs of samples and compute an SEL(1) from those pairs. As Peter said, SEL(1) computed according to this SED will be sqrt(2) larger than SEL(2), because of the missing 2 in the denominator.

But since the data didn't change, there should be only one value for the SEL, therefore we have to conclude that there is something wrong with that calculation. If you use Tom's formula, you will get an answer that is essentially the same (except for what is loosely called "statistical variability") as from the calculation using the regular standard deviation formula.
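The thought experiment above can be run as a small simulation. This is only a sketch: the aliquots are simulated Normal values with an assumed true SD of 1.0 and an arbitrary mean of 50, and consecutive draws are paired up as the "duplicates".

```python
import math
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable
true_sd = 1.0               # assumed lab error for the simulation
n_pairs = 5000

# Simulated aliquots from one well-mixed sample
aliquots = [rng.gauss(50.0, true_sd) for _ in range(2 * n_pairs)]

# Ordinary standard deviation over all aliquots
sd_all = statistics.stdev(aliquots)

# Treat consecutive aliquots as duplicate pairs
diffs = [aliquots[2 * i] - aliquots[2 * i + 1] for i in range(n_pairs)]
sum_d2 = sum(d * d for d in diffs)

sel_2 = math.sqrt(sum_d2 / (2 * n_pairs))   # formula with the 2 in the denominator
sel_1 = math.sqrt(sum_d2 / n_pairs)         # same data, 2 left out

print(f"SD of all aliquots: {sd_all:.3f}")
print(f"SEL(2) from pairs:  {sel_2:.3f}")
print(f"SEL(1) from pairs:  {sel_1:.3f}  (ratio {sel_1 / sel_2:.3f})")
```

SEL(2) should land close to the ordinary SD, differing only by sampling variability, while SEL(1) comes out larger by exactly sqrt(2) on the same data.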

Howard

\o/
/_\

Pierre Dardenne (Dardenne)
Posted on Thursday, September 15, 2005 - 5:02 am:   

Hi

I have to make a correction to my statement.

If SEL is large enough, SEPactual can be smaller than SEL itself.

Pierre

Pierre Dardenne (Dardenne)
Posted on Thursday, September 15, 2005 - 2:09 am:   

Hi,

Tom Fearn in the last NIR news (NIR news 16/5 - July/August 2005) explains the method to calculate SEL. This is well done, as usual for Tom, and it seems right: it has been my way for years!!

Look at
http://www.spectroscopyeurope.com/chemo_16_1.pdf
In this article Faber details a method to calculate uncertainty.
s(yest-yref)=SQRT((1+h).SEC^2 - SEL^2)
There are more references at the end of the article.

If SEL is large enough, SEPactual can be smaller than SEPobserved.

Hope it helps

Pierre

Dennis Karl (Dennisk)
Posted on Wednesday, September 14, 2005 - 9:03 pm:   

This is another great discussion. Uncertainty of Measurement is now a requirement of ISO17025 accreditation. It seems that there is a great deal of 'uncertainty' as to how to describe the errors in NIR so Peter's last sentence is going to be all important. When we report results, and when we are forced to also report the 'uncertainty' of the measurement, we will also have to state how it was derived. This discussion is already up and running in NZ in some quarters, as there can be a perceived advantage in reporting results with a smaller uncertainty than your competitor (if you are in a commercial laboratory). I am not sure what prompted Howard to request the literature source, but again under ISO you are usually required to be able to reference your basis of calculation.
Fortunately all this 'uncertainty' as to the source will help us fudge our way through our next audit!!!
Keep up the search
Dennis

Peter Flinn (Peterf)
Posted on Wednesday, September 14, 2005 - 8:25 pm:   

G'day Howard,

Heinrich is correct - in the forage world at least, the ROT has been that SEC/SECV/SEP should be no more than twice the SEL (p.39, USDA Handbook 643).

Another useful reference is Bill Hruschka's chapter (3) in the Williams and Norris book (I only have the 1st edition). On p.39 he states that an acceptable SEC "usually means from one to two times the LE" (laboratory error).

Sounds like the ROT depends on the size of your thumb!

A related issue is how you measure SEL. I agonised over this years ago when writing my MSc thesis. I know of 2 definitions - (1) SED of differences between (blind) lab duplicates, and
(2) SE of a single measurement as per the formula in USDA Handbook 643, p.96 (Windham et al).

The point is, SEL (1) is (sqrt 2) times SEL (2) so which is right??

Again, refer to p.41 in Hruschka's chapter. He seemed to prefer SEL(1) and that is what I ended up using, but he also said that either definition is OK provided you state what it is.

However, I wonder how many people do that?

Cheers, Peter

hlmark
Posted on Wednesday, September 14, 2005 - 11:13 am:   

Heinrich - well, that's great! As you said, it's not exactly what I'm looking for, but it's the best I've seen so far.

In fact, I do have a copy of that handbook, and I'd even scanned through it, but without success. Without a page number it will be difficult to find such a needle in the haystack, but as long as I know I'm looking in the right haystack, I've got a lot more incentive to look deeper.

\o/
/_\

W. Fred McClure (Mcclure)
Posted on Wednesday, September 14, 2005 - 10:52 am:   

There are numerous Rules of Thumb (ROT). I developed my own years ago that has been very helpful. For any calibration - if the ratio of the standard error of calibration (SEC) to the mean of the lab values (constituent or property) is less than 10, you have a good chance that the calibration will work. If the ratio is greater than 10, then you need to be careful.

Heinrich Prüfer (H_Pruefer)
Posted on Wednesday, September 14, 2005 - 10:32 am:   

Howard,
This is not exactly what you have been looking for, but at least it will contribute to the overall topic of recommended SEL/SEP ratio.
I remember that such rules of thumb already had been used in the eighties, but in my files I only could find this indirect citation:

"A commonly accepted criterion for useful NIRS equations is that the SEP (or with new equation development methods, SECV) should be less than two times the SEL (Marten et al., 1989)."
(J.L. Halgerson, C.C. Sheaffer, N.P. Martin, P.R. Peterson and S.J. Weston, Agron. J. 96:344-351 (2004)).

According to the references in this article, the primary source should be found in:

Marten, G.C., J.S. Shenk, and F.E. Barton, II. (ed.). 1989. Near infrared reflectance spectroscopy (NIRS): Analysis of forage quality. Agric. Handb. 643. USDA-ARS, Washington, DC.

I could not check this citation because for the moment I do not have access to this handbook. The factor of two seems to be a bit too large compared to the common sense, but it could perhaps reflect the improvement of NIR technology over the last two decades (or be a transcription error if the factor was originally related to the squared terms).

Personally I have applied the sqrt(2)-factor up to now because as long as my "secondary" spectroscopic method does not become the dominant source of prediction variance I can sleep better.

Best regards,
Heinrich

hlmark
Posted on Friday, September 09, 2005 - 11:23 am:   

Thanks, Dave. I'll alert her in case she's not following this thread.

\o/
/_\

David Russell (Russell)
Posted on Friday, September 09, 2005 - 10:27 am:   

I recall some speakers from South America who I believe were doing Octane Number work. Perhaps Susan Foulk would know if she's connected to this board.

hlmark
Posted on Friday, September 09, 2005 - 10:07 am:   

Bruce - yeah, this one's a toughie. I found a couple of references for the formula that

SEP^2 = SEL^2 + SE(NIR)^2

and from that the sqrt(2) factor is "intuitively obvious". That may be the best we can do: by analogy, if you know how to add, then you don't need to separately document that 1+1=2, 2+1=3, etc.
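Spelled out, the step from the variance formula to the sqrt(2) factor looks like this. It is a short worked derivation, assuming the lab and NIR errors are random and independent so that their variances add; the symbol SE_NIR is just shorthand for SE(NIR) above.

```latex
\begin{align*}
\mathrm{SEP}^2 &= \mathrm{SEL}^2 + \mathrm{SE}_{\mathrm{NIR}}^2 \\
\text{if } \mathrm{SE}_{\mathrm{NIR}} = \mathrm{SEL}: \quad
\mathrm{SEP}^2 &= 2\,\mathrm{SEL}^2
\;\Rightarrow\;
\mathrm{SEP} = \sqrt{2}\,\mathrm{SEL} \approx 1.414\,\mathrm{SEL} \\
\text{conversely: } \quad
\mathrm{SE}_{\mathrm{NIR}} &= \sqrt{\mathrm{SEP}^2 - \mathrm{SEL}^2}
\end{align*}
```

Read the other way, a limit of SEP = 2 SEL would correspond to an NIR error of sqrt(3) SEL, about 1.73 times the lab error, under the same assumptions.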

But I'm going to keep my eyes open, and if someone can come up with an actual reference for the detail, I'd be appreciative.

\o/
/_\

Bruce H. Campbell (Campclan)
Posted on Friday, September 09, 2005 - 9:42 am:   

Howard,
I looked some more and still can't find the reference. I seem to remember it was in a paper or presentation from someone in South America, probably at a university or college.
Bruce

hlmark
Posted on Thursday, September 08, 2005 - 1:58 pm:   

Bruce - if you could find the reference I'd appreciate it. There's no problem finding places that give the formulas for the two statistics; the ASTM E1655 practice, for example, has the equations for both of them. The problem is to find a written recommendation for the relationship between them, that could be referenced.

Howard

\o/
/_\

Bruce H. Campbell (Campclan)
Posted on Thursday, September 08, 2005 - 1:29 pm:   

Howard,
I have seen a treatment of SEP and SEL written but I can't remember where. A quick search of my literature didn't find it.
Bruce

hlmark
Posted on Thursday, September 08, 2005 - 11:41 am:   

"Everybody knows" the rule-of-thumb that the SEP (Standard Error of Prediction) from an NIR calibration should be less than (or equal to) 1.5 times the SEL (Standard Error of the Laboratory). There's even theoretical justification for this, because if the "true" (although unknown) error of the NIR analysis equals the SEL then the SEP should equal (approximately, at least) the SEL times the square root of two. This much is even stated in the EMEA guidelines for NIR analysis.

The question posed here, however, is: does anybody know where any of these statements is explicitly written as such in the open literature? This can be in the general NIR, analytical or chemometric literature, or in an industry-oriented journal.

Howard

\o/
/_\
