
NIR Discussion Forum » Bruce Campbell's List » I need help » Discriminant analysis & H score distributions


Martin Henery
Posted on Monday, November 15, 2004 - 12:31 am:   

Hi there
I have been using NIR discriminant functions (WinISI) to try to separate two populations. The separation is not dramatic, so I plotted the H scores for all the sample spectra in a histogram to look at the shape of the distribution. This leads me to my question: what is the shape of the distribution of H scores in a discriminant analysis when the two groups of samples can be distinguished reasonably effectively? Is it bimodal around a value of 1.5 (i.e. no separation)? What is the shape when separation is poor?

I haven't seen any examples of the results of this type of analysis in the literature or how to statistically verify whether the separation of groups is "good" or not. Can anyone suggest where to look or offer experience?

Martin

Mark Westerhaus (Mowisi)
Posted on Tuesday, November 16, 2004 - 5:21 am:   

Hi,

The distribution of H values for two populations depends on the relative number of samples as well as on the separation of the populations. If the two populations are different, but one group is represented by 2000 samples and the other by 2, the distribution of H values will look similar to that of the 2000 alone, with 2 outliers. If the numbers of samples are equal, then the grand average spectrum should lie between the two groups, and there should be no small H values (since H is the distance from a sample spectrum to the average spectrum in score space, scaled so that the average H is 1.0). So as the separation of the two populations increases, the H distribution becomes more concentrated at 1.0.

Mark Westerhaus
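A minimal sketch of Mark's point, assuming H is the squared Mahalanobis distance from the mean in principal-component score space, rescaled so that its average is exactly 1.0 (the exact WinISI scaling is not given in this thread). The helper names `h_values` and `two_groups` and the synthetic one-dimensional scores are hypothetical. With two equal-sized groups, the smallest H rises from 0 toward 1 as the groups move apart, so the distribution tightens around 1.0.

```python
from statistics import fmean


def h_values(scores):
    """Scaled distance of each sample from the mean spectrum, in score space.

    scores: list of samples, each a list of k principal-component scores.
    Assumes the score columns are uncorrelated (as PCA scores are for the set
    they were computed from), so the Mahalanobis distance reduces to a sum of
    squared, variance-scaled deviations. The result is rescaled so that the
    average H is exactly 1.0 (an assumed scaling, not WinISI's own code).
    """
    k = len(scores[0])
    means = [fmean(s[j] for s in scores) for j in range(k)]
    variances = [fmean((s[j] - means[j]) ** 2 for s in scores) for j in range(k)]
    d2 = [sum((s[j] - means[j]) ** 2 / variances[j] for j in range(k)) for s in scores]
    avg = fmean(d2)
    return [d / avg for d in d2]


def two_groups(separation, noise=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Hypothetical 1-D scores: two equal groups centred at -separation and +separation."""
    group_a = [[-separation + e] for e in noise]
    group_b = [[separation + e] for e in noise]
    return group_a + group_b


for sep in (0.0, 2.0, 10.0):
    h = h_values(two_groups(sep))
    print(f"separation={sep:5.1f}  min H={min(h):.3f}  max H={max(h):.3f}")
```

With these noise offsets the smallest H is 0 at zero separation and about 0.81 at a separation of 10. Passing an unbalanced set instead (many samples around one centre plus two distant ones) gives the largest H values to the two minority samples, which matches the 2000-versus-2 case described above.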

Martin Henery
Posted on Tuesday, November 30, 2004 - 7:52 pm:   

Mark,
Thanks for your reply. My sample groups are of similar size (roughly: 241 versus 158 or 334), but, as I alluded to in my opening query, in WinISI the values assigned to each sample in a discriminant analysis (which I assumed are H scores; is this correct?) range from 0 to 3. Under this software's scheme, as the separation of the populations increases, the scores cluster more around 2 for one of the groups (and, conversely, around 1 for the other), and presumably a value of 1.5 rather than 0 is where the average spectrum sits. Is it therefore correct to assume that small H values (distance from the mean spectrum), instead of being close to zero, are scaled to be close to 1.5?

Thanks for your help

Martin

Dennis Karl (Dennisk)
Posted on Sunday, December 12, 2004 - 7:57 pm:   

I too am getting into discriminant analysis. Being a mere chemist with limited ability to 'visualise' statistical pictures and mathematical concepts, I struggle to 'see' things in formulas!! Or do I think there is more to 'see' than there really is?
I am using WinISI V1.50.
I need help understanding the meaning of the figures "Misses", "Uncertains" and "Hits" after I have done a discriminant exercise.
I have 312 samples in a series of nine *.cal files. For one of the series we suspect that the samples are different, or have a different balance of components in them, leading to poor growth in livestock. However, when we do a normal prediction of moisture and protein from our calibration equation, no problems are indicated by high "H" values.
When I do the calculation under Discriminant Analysis, 8 of the series show a very high percentage of "hits", but the suspect series shows 100% "misses". The "uncertains" wobble about a bit.
The results are reasonably consistent using 1st and 2nd derivatives, and with and without scatter correction using MSC.
Am I correct in assuming the result is saying the suspect series is indeed 'different' from the others? Or is it a bit more complicated than that?

I hope to understand more about this after Tom Fearn's workshop in Auckland next year, but I cannot wait that long to understand this part.

Dennis Karl

Tony Davies (Td)
Posted on Monday, December 13, 2004 - 2:51 pm:   

Hi Dennis,

I am not a WinISI user, so we need Mark Westerhaus to tell us what sort of discrimination is being used.
Over to you Mark!
Best wishes,

Tony

Mark Westerhaus (Mowisi)
Posted on Monday, December 13, 2004 - 7:54 pm:   

To Martin Henery,

The WinISI discrim program uses two-block PLS to separate groups: it creates one indicator variable for each group and assigns a 2 for the correct group and a 1 otherwise. The predicted values are just PLS2 predictions based on these 1's and 2's as the "lab" data. In a perfect prediction, you would see one 2 and the rest would be 1's. In the real world, we just pick the highest number to indicate the group assignment.
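The scheme Mark describes can be sketched with a plain NIPALS PLS2 regression on 1/2 indicator variables. This is a generic textbook PLS2, not WinISI's own code; the function names, the synthetic three-group "spectra" and the choice of two components are assumptions for illustration. The group assignment is simply the column with the highest predicted value.

```python
import numpy as np


def indicator_matrix(groups, n_groups):
    """One column per group: 2 for the sample's own group, 1 otherwise."""
    Y = np.ones((len(groups), n_groups))
    Y[np.arange(len(groups)), groups] = 2.0
    return Y


def pls2_fit(X, Y, n_comp, max_iter=500, tol=1e-10):
    """Plain NIPALS PLS2 on mean-centred X and Y; returns (x_mean, y_mean, B)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = F[:, np.argmax(F.var(axis=0))]  # start from the most variable Y column
        t_old = None
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)           # X weights
            t = E @ w                        # X scores
            q = F.T @ t / (t @ t)            # Y loadings
            u = F @ q / (q @ q)              # Y scores
            if t_old is not None and np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break
            t_old = t
        p = E.T @ t / (t @ t)                # X loadings
        E = E - np.outer(t, p)               # deflate X and Y
        F = F - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.inv(P.T @ W) @ Q.T     # regression coefficients
    return x_mean, y_mean, B


def pls2_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


# Hypothetical data: 3 groups of 20 "spectra", each group with its own band.
rng = np.random.default_rng(0)
n_per_group, n_vars, n_groups = 20, 10, 3
groups = np.repeat(np.arange(n_groups), n_per_group)
X = rng.normal(0.0, 0.1, size=(groups.size, n_vars))
for g in range(n_groups):
    X[groups == g, 3 * g:3 * g + 3] += 3.0

model = pls2_fit(X, indicator_matrix(groups, n_groups), n_comp=2)
pred = pls2_predict(model, X)
assigned = pred.argmax(axis=1)               # pick the highest predicted value
print(pred[:3].round(2))
print("correct assignments:", (assigned == groups).mean())
```

Because X and Y are mean-centred, a sample lying exactly at the mean spectrum is predicted as the mean of each indicator column. That equals 1.5 only when the groups are the same size: with 241 and 158 samples in two groups, the indicator means are 1 + 241/399 ≈ 1.604 and 1 + 158/399 ≈ 1.396.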

To Dennis Karl,

Hits are obtained when we evaluate the training set through cross-validation and the highest predicted value does occur in the correct group. If the highest and second-highest predicted values are too close (relative to the PLS2 calibration SECV), the result is called uncertain. If the highest number is associated with the wrong group, the result is a miss. In your example, since the suspect group has 100% misses, it cannot be discriminated from the other groups.
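The hit/uncertain/miss bookkeeping can be sketched as follows. Mark does not give WinISI's exact cut-off for "too close", so the rule used here (the gap between the two highest predictions is smaller than `factor * secv`), the order of the checks, and the names `score_sample` and `summarise` are hypothetical. Each prediction row is assumed to come from cross-validation, with one value per group.

```python
from collections import Counter


def score_sample(predicted, true_group, secv, factor=1.0):
    """Classify one cross-validated prediction as 'hit', 'uncertain' or 'miss'.

    predicted: predicted indicator values, one per group.
    secv: PLS2 cross-validation error. 'factor' sets how close the top two
    values may be before the call is 'uncertain' (a hypothetical rule).
    """
    order = sorted(range(len(predicted)), key=lambda g: predicted[g], reverse=True)
    best, second = order[0], order[1]
    if predicted[best] - predicted[second] < factor * secv:
        return "uncertain"
    return "hit" if best == true_group else "miss"


def summarise(rows, secv):
    """Percentage of hits, uncertains and misses for each true group."""
    tallies = {}
    for predicted, true_group in rows:
        outcome = score_sample(predicted, true_group, secv)
        tallies.setdefault(true_group, Counter())[outcome] += 1
    return {
        g: {k: 100.0 * c[k] / sum(c.values()) for k in ("hit", "uncertain", "miss")}
        for g, c in tallies.items()
    }


rows = [
    ([2.0, 1.0, 1.1], 0),   # clear top value in the correct group
    ([1.4, 1.35, 1.0], 0),  # top two values too close to call
    ([1.0, 1.9, 1.1], 2),   # clear top value, but in the wrong group
]
print(summarise(rows, secv=0.3))
```

In this toy input, group 2 scores 100% misses because its clear top prediction always lands in another group's column, which is the pattern Dennis reports for his suspect series.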
