The M-Distance
NIR Discussion Forum » Bruce Campbell's List » Chemometrics

Jens Rademacher (Rademacher)
Posted on Tuesday, July 02, 2002 - 5:56 am:   

Hello
Usually one says that, when measuring with NIR (protein in grain, for example), an M-distance of up to 3 is acceptable. But what if the M-distance reaches higher values: are the results really wrong? If not, at which M-distance would one say that the results are not acceptable in any way? Is there any literature on this point?
Thanks
J Rademacher

Peter Tillmann (Tillmann)
Posted on Tuesday, July 02, 2002 - 6:27 am:   

Jens,

As far as I can judge, it's Howard Mark who taught us H values (or M-distances):

Howard Mark (1986), "Normalized Distances for Qualitative Near-Infrared Reflectance Analysis", Anal. Chem. 58, 379-384.

Howard Mark and David Tunnell (1985), "Qualitative Near-Infrared Reflectance Analysis Using Mahalanobis Distances", Anal. Chem. 57, 1449-1456.

And any textbook on statistics with keyword Mahalanobis.

The point behind H values is the key question of NIRS: do the calibration samples fit the sample of interest (or vice versa)? By statistical reasoning, an H value of 3 is the limit for this question at an error level of 1%.

But whether the predicted value of a sample can be trusted cannot be decided this way. That question can only be answered by experience with a specific material/calibration, and it will seldom be the case that samples below an H value of 3.0 give good results in validation while those above 3.0 fail.

So using any H value will be a question of taking a chance. (And using 3 as a limit will be easily accepted by third parties, because they can read textbooks as well.)
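
(As a minimal illustration only, not from any particular package: a sketch in Python/numpy of screening unknowns against the H = 3 rule of thumb. The data here are made up; a real library would be spectra or scores.)

    # Sketch: flag unknowns whose Mahalanobis (H) distance from the
    # calibration library exceeds 3. Illustrative data only.
    import numpy as np

    def h_distances(library, unknowns):
        mean = library.mean(axis=0)
        M = np.linalg.pinv(np.cov(library, rowvar=False))  # inverse covariance
        diff = unknowns - mean
        return np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))

    rng = np.random.default_rng(0)
    library = rng.normal(size=(50, 5))    # 50 calibration samples, 5 variables
    unknowns = rng.normal(size=(10, 5))   # 10 new samples
    h = h_distances(library, unknowns)
    print(np.where(h > 3.0, 'check', 'accept'))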


Yours

Peter Tillmann

Lois Weyer (Lois_Weyer)
Posted on Tuesday, July 02, 2002 - 7:13 am:   

I have used an M-distance of 25 in one situation. I arrived at the number by trial-and-error, checking back over results, keeping those spectra which gave good values while eliminating those that didn't. Some of the bad spectra had M-distances in the thousands. This was a Raman analysis, but the concept was the same as if it were near IR.
As Peter Tillmann said, your question can only be answered by experience with the specific material/calibration.
Lois Weyer

Kathryn Lee
Posted on Tuesday, July 02, 2002 - 7:28 am:   

I have found that samples with H values above (or even sometimes below) 3 can have very poor results for one component, yet good results for another component. This indicates that a high H value can be based on a spectral region that does not contribute much to the calibration of one component but contributes a lot to the calibration of another component.
Kathryn Lee

Stephen Medlin (Medlin)
Posted on Tuesday, July 02, 2002 - 7:41 am:   

Just to add my 2 cents' worth: there are times when I've had to accept large H values and the results were good. Typically, this occurred when a specific software package could only handle one calibration model for multiple components (rather than an individual model for each component). In this case, one component may be high (or perhaps a small negative concentration may be predicted due to noise in the baseline), and the total H value will be high even though each component is predicted well.

Richard Kramer
Posted on Tuesday, July 02, 2002 - 8:04 am:   

This abuse of the M dist < 3 guideline has been bothering me for quite a while. It is frequently inappropriate.

Also, I think that the customary practice of calculating the inverse covariance matrix used for M dist based on the samples in the training set is wrong! More on that in a moment.

Also, M dist alone does not tell much of the story. Spectral residuals should also be considered. Indeed, I think spectral residuals are the more reliable primary indicator. M dist can sometimes provide insight, but relying on it as a primary, or mandatory, validity indicator is sub-optimal and can cause the unnecessary rejection of a perfectly valid estimate, particularly if the < 3 rule of thumb is applied. It can also lead to improperly applying a calibration to an unknown sample which has significant deviation in spectral dimensions that are not modelled by the M dist. This can yield an apparently acceptable M distance even though the spectral residuals are unacceptably high.
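
(To make the two diagnostics concrete, here is a minimal sketch, my own illustration in Python/numpy assuming a PCA-style factor model; it computes both a score-space M distance and a spectral residual for one unknown.)

    # Sketch: score-space M distance vs. spectral (Q) residual
    # for a PCA-style factor model. Illustrative only.
    import numpy as np

    def diagnostics(X_train, x_new, n_factors):
        mean = X_train.mean(axis=0)
        Xc = X_train - mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        V = Vt[:n_factors].T                  # spectral loadings
        scores = Xc @ V                       # training scores
        t_new = (x_new - mean) @ V            # unknown's scores
        # Scores are uncorrelated by construction, so the M distance
        # reduces to a sum of squared standardized scores.
        md = np.sqrt(np.sum((t_new / scores.std(axis=0, ddof=1)) ** 2))
        # Spectral residual: the part of the spectrum the model can't reproduce.
        resid = (x_new - mean) - t_new @ V.T
        return md, float(resid @ resid)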

Here is a quick illustration of why M dist < [some number often greater than 3 and sometimes much greater than 3] is usually a more appropriate rule of thumb when using a "blind" rule of this type.

Let's consider two different small training sets of 15 data points each. These happen to be some typical calibration data I've worked with, so this is simply an anecdotal illustration.

I calculate the M distances for each point in each training set and show the result in two columns: first column - first set, second column - second set:

2.7443 3.2050
3.3273 3.5980
3.0456 3.4148
2.8252 3.4866
3.4260 3.3483
3.4335 3.2477
2.8480 3.0907
3.1381 2.8941
2.7597 2.8405
3.0485 3.0577
3.2006 2.3374
3.5167 1.8739
2.2797 3.2226
2.8495 3.0302
3.1333 2.7051

Note that there are a number of data points in these training sets with M distances greater than 3. The problem with the < 3 rule of thumb is easily understood if we consider, for example, a calibration generated using the second data set (column 2). If we were to use this calibration to estimate an unknown similar to the second sample in the training set, it would have an M distance of around 3.6 (row 2, col 2), leading us to exclude this sample from eligibility for prediction with this calibration if we were to use the M dist < 3 guideline. Since this sample is, itself, a member of the training set, refusing to accept the validity of this calibration for this sample is clearly wrong! A check of spectral residuals, on the other hand, shows this sample to be well within the bounds of acceptability. For this particular calibration, an M dist boundary of, perhaps, < 4 would be more appropriate than the < 3 rule of thumb.
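
(The calculation behind these columns is easy to reproduce; here is a minimal sketch in Python/numpy, my own construction with synthetic data. Incidentally, for the self-distances of a training set the squared distances average p(n-1)/n, so with p around 10 variables, values near 3 are exactly what one should expect.)

    # Sketch: M distance of each training sample from its own set.
    # For n samples and p variables, the squared self-distances
    # average p*(n-1)/n; p = 10, n = 15 gives distances near 3.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(15, 10))             # 15 samples, 10 variables
    diff = X - X.mean(axis=0)
    M = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))
    print(d.round(2))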

Now for my second point about these eligibility tests-

Whether we are concerned with M distance, spectral residuals, or any other type of test for the validity of using a calibration to estimate a particular unknown ....

The base population from which the threshold value is derived should be the samples used for validating a calibration, NOT the samples used to create the calibration. The reason is simple:

It is irrelevant how a calibration is derived. It can be (and usually is) based on modelling a carefully chosen set of training samples, but a calibration can just as validly be generated in any other fashion. The calibration coefficients could be mapped or transformed from some other related calibration, or they could even be selected at random! Where the coefficients come from is absolutely irrelevant!

Accordingly, it is not of the slightest interest whether or not an unknown sample is represented in a set of training samples. Indeed, there need not be any training samples at all with which the unknown sample may be compared!

What IS relevant is that the calibration, however it may have been derived, must be properly validated using a suitably representative set of validation samples. Accordingly, the RELEVANT question is whether or not an unknown sample is properly represented in the set of samples which was used to VALIDATE the calibration in question. Thus, whenever we apply an M distance, spectral residual, or other screening test to ascertain whether or not a particular sample is eligible for analysis by a calibration, we should be comparing the unknown sample to the validation set used to validate the calibration, NOT to the training set.
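
(One simple way to act on this principle, sketched below in Python/numpy as my own illustration: take the eligibility cutoff from the distribution of M distances in the validation set, e.g. a high percentile, rather than hard-coding 3.)

    # Sketch: derive the screening cutoff from the VALIDATION set.
    # 'mean' and 'M' define the reference population, however obtained.
    import numpy as np

    def validation_cutoff(validation, mean, M, pct=99.0):
        diff = validation - mean
        d = np.sqrt(np.einsum('ij,jk,ik->i', diff, M, diff))
        return np.percentile(d, pct)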

Note some of the consequences of this important principle.

A. A calibration originally derived over a narrow range of analyte concentrations can be properly extended for use over a wider range of concentrations by properly validating it over the wider range.

B. A calibration originally derived over a wider range of analyte concentrations can NOT be properly used over the full wider range of concentrations if it was only validated with validation samples over a narrower range.

Richard Kramer

hlmark
Posted on Tuesday, July 02, 2002 - 10:42 am:   

Wow, what an animated discussion!

So I thought I'd stick my nose in, as well. First let me thank Peter for the compliment. But in all fairness I have to point out that we are all taught about Mahalanobis distances by Sir P.C. Mahalanobis. Plus, I might add, some unknown statisticians who used it to write the original (mainframe) SAS program, where I first came across the concept. As for the references to my work on that, Peter missed one (1), although I'm certainly flattered that he took the time and trouble to look up those two that he mentioned.

However, another point I must make is that, with all the discussion about whether three or some other number of Mahalanobis distances is the proper value to use for evaluating the validity of samples, everybody missed one very key point: the original papers mentioned were intended to describe the use of Mahalanobis Distances for QUALITATIVE analysis (identification), and the specification of three Mahalanobis Distances was intended for use in conjunction with making a decision as to whether a given sample belonged to one group or another. In passing, the possibility was mentioned that it might be applied to single groups to help decide group membership, but that was still intended to be used in the context of a qualitative assessment.

Even so, the value of three Mahalanobis Distances published there was a "rule of thumb" figure. Whitfield(2) subsequently published tables of the critical values of Mahalanobis distances to use for several confidence levels in various situations, including the case of small numbers of samples. From Whitfield's tables you can see that thresholds much greater than three must sometimes be used, even for the qualitative case.

Furthermore, the original papers describe its use in conjunction with data that exists in wavelength space, i.e., Mahalanobis distance calculations based on the use of data at a small number of individually selected wavelengths.

That is not to say that using Mahalanobis distances in a quantitative calibration situation is invalid or incorrect, nor that the concept cannot be extended to full-spectral situations (i.e., to be used in conjunction with PLS and PCR calibrations). The calculation is the same, of course, and the meaning of the corresponding confidence levels and confidence intervals is the same. But as most of the respondents pointed out, the meaning of these results has to be put into context, and there are ramifications that need to be considered. Some of these ramifications have been alluded to by the various respondents.

Probably the biggest difference from the qualitative application is simply the existence of constituent information. The calculations for Mahalanobis Distance are based solely on the spectral data and take no account of the constituent information, whereas in quantitative analysis the presence of that information is indispensable, and it also has ramifications. The existence of the constituent information means that, whereas in qualitative analysis all distances are equivalent, in a quantitative analysis situation distances in different directions are not equivalent. Some distances lie in "good" directions: directions in which large values mean an extension of the range of the calibration. Other directions are "bad" directions, since they represent interferences or extraneous effects deleterious to the accuracy of the model. So the question as to whether a value greater than three (or any number) is "good" depends on the direction in which that distance lies. In a good direction, large values are not only OK, they are actively beneficial; in a bad direction they are actively deleterious. But whenever you condense the information from an entire data set into a single number (a "statistic", and that's what Mahalanobis Distance is), some information is lost; different statistics inherently lose different information, so single statistics should not be considered in isolation, they have to be considered together.

This is really just another way of saying what others have said on this forum, but it puts into perspective the fact that Mahalanobis Distances, like any other single statistic, cannot be considered in isolation but have to be put into the context of all the information about the nature and behavior of the data. So to this extent I have to disagree with Richard: spectral residuals are not necessarily "better" or more reliable; they are simply a different way to obtain information about the relationship of a sample to the whole of a data set, and I see them as complementary to Mahalanobis Distances.


1. Mark, H.; Analytical Chemistry; 59 (5), p.790-795 (1987)

2. Whitfield, R. G., Gerger, M. E., Sharp, R. L.; Applied Spectroscopy; 41 (7), p.1204-1213 (1987)

Christopher D. Brown
Posted on Tuesday, July 02, 2002 - 11:28 am:   

Well ... silly me. I had no idea that a Mahalanobis distance > 3 was an active heuristic in outlier diagnosis.

As Howard and others have alluded to, the original formulation of P.C. Mahalanobis was for class-membership questions, in which the class population was assumed to follow multivariate normality (and it is interesting to note that this criterion is seldom addressed in qualitative applications in chemistry).

The Mahalanobis distance is one tool for outlier diagnosis, although in the context of a generalized linear model the Mahalanobis distance is not the least bit theoretically indicative of an "outlier" (because the plane/hyperplane of the model extends infinitely). On the calibration side of things, it IS indicative of influence, leverage or "potential", as Cook and Weisberg prefer to call it. That is, a sample with a large Mahalanobis distance has the _potential_ to highly influence the parameter estimates (regression vector). It won't _necessarily_ have ANY influence on the parameter estimates, but it has the POTENTIAL to do so. In fact, throwing away a sample with a very large Mahalanobis distance has the potential to really HURT your figures of merit.

The other side of the coin is the validation phase, where Mahalanobis distance is available as an outlier diagnostic. In this case you could use it to say that the sample in question is spectrally distant from the center of your model (which can be risky because of model bias), but if you're pretty confident that your model is unbiased and valid, then the M-distance isn't a big consideration. So in short, one just has to keep in mind that in the real world, when models aren't perfectly valid, the M-distance is a measure of the probability of an acceptable prediction error, and not the absolute value of a prediction residual.

Again on the calibration side of things (personal preference), I find Cook's distance is a much more quantitative measure of "weird" samples, because it combines the influence of the sample (M-distance, leverage, what-have-you) with the lack of fit in the property of interest. If a sample has an elevated Cook's distance you can definitely toss it, because you know its inclusion will markedly affect your regression parameters.
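
(For anyone who wants to try it: a minimal sketch of Cook's distance for an ordinary least-squares fit, using the standard textbook formula in Python/numpy; factor-based models would need the leverages from the score space instead.)

    # Sketch: Cook's distance for a least-squares fit y ~ X.
    # Combines leverage h_ii with the fit residual e_i, as described above.
    # X should include an intercept column if the model has one.
    import numpy as np

    def cooks_distance(X, y):
        n, p = X.shape
        H = X @ np.linalg.pinv(X.T @ X) @ X.T     # hat matrix
        e = y - H @ y                             # residuals
        h = np.diag(H)                            # leverages
        s2 = (e @ e) / (n - p)                    # residual variance
        return (e ** 2) * h / (p * s2 * (1 - h) ** 2)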

Another topic that has gone unmentioned in this discussion is the dependence of the Mahalanobis distance on the rank of the model. If you're working with factor-based methods, the M-distance for a sample in a p-factor model will be less than (or equal to) the M-distance for the same sample in a (p+1)-factor model. In this case, how can an MD < 3 be a fully transportable heuristic??
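
(This is easy to demonstrate; a minimal sketch in Python/numpy, my own illustration: each added factor contributes a non-negative term to the squared score-space distance, so the M-distance can only grow with rank.)

    # Sketch: score-space M-distance is non-decreasing in the
    # number of PCA factors retained.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(40, 20))
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    x = Xc[0]                                     # one (centered) sample
    for k in (2, 3, 4, 5):
        t = x @ Vt[:k].T                          # scores on k factors
        std = (Xc @ Vt[:k].T).std(axis=0, ddof=1)
        print(k, round(float(np.sqrt(np.sum((t / std) ** 2))), 3))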

Regards,
Christopher

Christopher D. Brown
Posted on Tuesday, July 02, 2002 - 11:33 am:   

Oh ... I should also have added a comment based on Richard's remarks. I'm presuming that outlier picking in the calibration phase is done using cross-validation? If you just looked at the raw Mahalanobis distances or spectral residuals of the _fit_, you'd be looking through rose-coloured glasses and missing the whole boat on measures of influence.
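
(A minimal sketch of the idea as I read it, in Python/numpy: judge each calibration sample with that sample held out, instead of from the fit that already includes it.)

    # Sketch: leave-one-out M distances. The fitted (resubstitution)
    # distances flatter the very sample being judged; these don't.
    import numpy as np

    def loo_m_distances(X):
        n = X.shape[0]
        d = np.empty(n)
        for i in range(n):
            rest = np.delete(X, i, axis=0)
            diff = X[i] - rest.mean(axis=0)
            M = np.linalg.pinv(np.cov(rest, rowvar=False))
            d[i] = np.sqrt(diff @ M @ diff)
        return d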

Regards,
Christopher

yonll
Posted on Wednesday, August 14, 2002 - 11:29 am:   

I have had a similar experience to Richard's.

When conducting 2-class classification of samples based on NIR spectra, I found that (1) if samples were included in the calibration sets, M-distances were generally much smaller than 3, and then both the M-distance and the spectral residual could be used for the classification; (2) if samples were measured on different dates and excluded from the calibration sets, the M-distance would be much greater than 3. Therefore, how much weight to give the M-distance probably depends on the needs.

Best wishes,
Yong

linda
Posted on Tuesday, April 27, 2004 - 7:32 am:   

I would like to know the formula for calculating the Mahalanobis distance.
I'm working with WinISI on spectral data, and I want to know how this software calculates the H distance.
Thank you,
linda

hlmark
Posted on Tuesday, April 27, 2004 - 8:33 am:   

Linda - you can find the formula in the 1985 Mark and Tunnell article in Anal. Chem. that Peter Tillmann gave the reference for in the second message in this thread. I'll try to condense it for you here, in pseudo computer code:

D = sqrt((X-Xbar) M (X-Xbar)')

Where:

X represents the vector of one sample's data (let's call it the absorbance at each wavelength, although it could be PC scores or any other multivariate data)

Xbar represents the vector of mean values of that data for each variable (i.e., the mean absorbance at each wavelength)

M represents the matrix inverse of the variance-covariance matrix of the data. M has to be computed from a data set containing many readings, although once it is computed, D can be calculated for individual samples.

The prime on the second (X-Xbar) term represents the transpose of (X-Xbar). Before Richard gets up in arms, the question of whether the first or second one of these should carry the prime depends on whether the data vector is organized row-wise or column-wise.
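
In code, that pseudocode is only a few lines; here is a minimal sketch in Python/numpy (the textbook formula only - WinISI's H may apply its own scaling, so don't expect the numbers to match that software exactly):

    # Sketch of D = sqrt((X - Xbar) M (X - Xbar)').
    # M is the inverse variance-covariance matrix of the library data.
    import numpy as np

    def mahalanobis(x, library):
        xbar = library.mean(axis=0)
        M = np.linalg.pinv(np.cov(library, rowvar=False))
        diff = x - xbar
        return np.sqrt(diff @ M @ diff)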

BTW, on reviewing the comments in this thread, I noticed the values that Richard had in his list. There were actually very few of them with small values: none were less than 1, and only one was less than 2. For such a small set you can't be sure, but it's likely that these values were NOT from a Multivariate Normal Distribution (MND). Even the guideline value of three is based on the assumption that the underlying data are MND, and the more precise values that Whitfield computed depend on that assumption even more strongly.

A good data set for calibration (i.e., one where you have good representation of all the variability of the underlying samples), especially one following an experimental design, can be expected to behave this way. So while that's good for calibration, it's not necessarily good for using Mahal. Dist. for setting the limits on samples that should be selected (or rejected).

Howard

\o/
/_\
