10,000 spectra through discrimination?

NIR Discussion Forum » Bruce Campbell's List » Chemometrics


RJWilkie
Posted on Monday, September 24, 2001 - 11:16 pm:   

Dear Colleagues,

I have become keen to use discrimination to "group" several products. So far, 1,500 spectra have not been adequate to obtain optimal differentiation, and I feel that to include the spectral variation I am likely to encounter, I will need to run around 10,000 spectra of each product.

Does anyone know of a software package that can do this? Any help would be great!

Regards,

Russell Wilkie

David Hopkins (Hopkins)
Posted on Tuesday, September 25, 2001 - 6:46 am:   

Greetings Russell,

It is nice to hear from you, after our meeting at the conference in Korea. I will look forward to hearing other responses about appropriate software packages. I have never had to use so many spectra, so I cannot speak to that issue. I would expect that Pirouette and Unscrambler can handle the load.

The issue I would raise is that it may be more important to consider how long it will take you to obtain those 10,000 additional scans, and how you will know that you have enough. Is it possible to obtain fewer scans through a designed experimental series that samples all the variables you can identify? Then you might be able to obtain the spectra in a reasonable time and include that variation in your models, so that you would know you have done the best job possible on your discrimination. I recognize that some variability is difficult to plan for or obtain, but variables like temperature and humidity might be the easiest to start with, and they are often the major ones affecting applications. Hope this helps.
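A designed series of the kind Dave describes can be enumerated mechanically before any spectra are collected. The Python sketch below builds a full factorial plan; the factor names, levels and replicate count are hypothetical placeholders rather than anything proposed in this thread.

```python
from itertools import product


def full_factorial(factors, replicates=1):
    """Every combination of factor levels, each repeated `replicates` times."""
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        for rep in range(1, replicates + 1):
            run = dict(zip(names, levels))
            run["replicate"] = rep
            runs.append(run)
    return runs


# Hypothetical factors and levels, for illustration only.
factors = {
    "temperature_C": [20, 25, 30],
    "humidity_pct": [40, 60, 80],
}
plan = full_factorial(factors, replicates=3)
print(len(plan))  # 27 = 3 temperatures x 3 humidities x 3 replicates
```

A full factorial grows multiplicatively with each added factor, so with more variables a fractional design may be the more practical choice.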

Regards,
Dave Hopkins

hlmark
Posted on Tuesday, September 25, 2001 - 7:32 am:   

Russell - I have some other thoughts, in addition to what Dave has recommended:

If you're not getting enough discrimination now, just collecting more spectra of the same type may not solve the problem. You might want to go into a "troubleshooting" mode and try to determine why there is not enough discrimination. For example, examine the materials that are being misidentified, and determine whether they are being (erroneously) identified as a similar material. If that is the case, you might want to examine the spectra and see whether the two materials have such similar spectra that they cannot be distinguished.

Sometimes subclassification is necessary. For example, if A and B cannot be distinguished from each other using the primary classification model, you may be able to reliably decide that an unknown is (A or B). Having done that, you may be able to use a separate model to distinguish A from B, as long as that model doesn't have to identify anything else (which it won't, since you'd already know that the unknown is either A or B).
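The two-stage scheme Howard describes can be sketched as follows. This is a minimal illustration, assuming a simple nearest-centroid classifier as a stand-in for whatever discriminant model is actually used; the function names, the class labels A, B and C, and the 3-point "spectra" are all hypothetical.

```python
import numpy as np


def fit_centroids(X, labels):
    """Stand-in classifier: one mean spectrum (centroid) per label."""
    labels = np.asarray(labels)
    return {lab: X[labels == lab].mean(axis=0) for lab in np.unique(labels)}


def predict_centroid(centroids, x):
    """Label of the centroid nearest (Euclidean) to spectrum x."""
    return min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))


def fit_two_stage(X, labels, confusable=("A", "B"), merged="A|B"):
    """Primary model treats the confusable classes as one merged class;
    the secondary model is trained only on those classes."""
    in_pair = np.array([lab in confusable for lab in labels])
    primary_labels = [merged if pair else lab for lab, pair in zip(labels, in_pair)]
    primary = fit_centroids(X, primary_labels)
    secondary = fit_centroids(X[in_pair], np.asarray(labels)[in_pair])
    return primary, secondary, merged


def predict_two_stage(model, x):
    """Decide (A or B) vs. everything else first, then A vs. B if needed."""
    primary, secondary, merged = model
    label = predict_centroid(primary, x)
    return predict_centroid(secondary, x) if label == merged else label


# Hypothetical 3-point "spectra": A and B are close together, C is far away.
rng = np.random.default_rng(0)
centres = {"A": [1.0, 1.0, 0.0], "B": [1.0, 1.2, 0.0], "C": [5.0, 0.0, 0.0]}
labels = [lab for lab in centres for _ in range(20)]
X = np.array([centres[lab] for lab in labels]) + rng.normal(scale=0.02, size=(60, 3))
model = fit_two_stage(X, labels)
print(predict_two_stage(model, np.array([1.0, 1.19, 0.0])))  # B
```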

So some scientific detective work is in order.

As far as software: there are some packages that don't try to keep all the data in memory at once, but read it from the disk, and therefore have no limitations on the number of samples. In fact, I sell one such, although it has other limitations. But before you go that route, I recommend you follow the advice above, and see if you can't tell WHY you are not getting satisfactory discrimination.
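One way a package can avoid holding every spectrum in memory is to accumulate running sums chunk by chunk and build the covariance matrix from them. This is a sketch under that assumption, not a description of Howard's (or any other) package; `chunked_pca` and `read_in_chunks` are hypothetical names, and in practice the chunks might come from a file opened with `np.load(..., mmap_mode="r")`.

```python
import numpy as np


def chunked_pca(chunks, n_components):
    """PCA from running sums, so only one chunk of spectra is in memory at a time.

    Memory scales with the number of wavelengths (a p x p matrix),
    not with the number of spectra.
    """
    n = 0
    total = None
    cross = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if total is None:
            total = np.zeros(chunk.shape[1])
            cross = np.zeros((chunk.shape[1], chunk.shape[1]))
        n += chunk.shape[0]
        total += chunk.sum(axis=0)
        cross += chunk.T @ chunk
    mean = total / n
    cov = (cross - n * np.outer(mean, mean)) / (n - 1)
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return mean, evals[order], evecs[:, order]


def read_in_chunks(X, size):
    """Stand-in for reading successive blocks of spectra from disk."""
    for start in range(0, len(X), size):
        yield X[start:start + size]


rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 50))
mean, evals, loadings = chunked_pca(read_in_chunks(X, 1_000), n_components=4)
```

The one-pass formula can lose precision when the mean spectrum is large compared with the sample-to-sample spread, so subtracting a rough reference spectrum from each chunk first (which leaves the covariance unchanged) is a sensible precaution.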

Howard

David Russell (Russell)
Posted on Tuesday, September 25, 2001 - 8:00 am:   

Historically, Pirouette has had more features for pattern recognition than Unscrambler. However, I can't comment on its ability to handle large sample sets. The thought of using thousands of samples is very foreign to those of us in industry who have trouble getting a hundred. Expanding on Howard's suggestion, using PCA as a pre-screening tool to weed out "identical" samples would probably be a valuable step.
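A PCA pre-screen of the kind Dave Russell suggests might look like the sketch below: project the spectra onto the first few factors and greedily drop any spectrum whose scores lie within a tolerance of one already kept. The function name, the number of factors and the tolerance are hypothetical choices that would have to be set from the data.

```python
import numpy as np


def screen_near_duplicates(X, n_pcs, tol):
    """Greedy PCA-based screen: keep a spectrum only if its scores on the first
    n_pcs factors are farther than tol from every spectrum already kept."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_pcs].T
    kept = []
    for i, t in enumerate(scores):
        if not kept or np.min(np.linalg.norm(scores[kept] - t, axis=1)) > tol:
            kept.append(i)
    return np.array(kept)


rng = np.random.default_rng(5)
X = rng.normal(size=(10, 6))
X = np.vstack([X, X[3]])  # row 10 repeats row 3
kept = screen_near_duplicates(X, n_pcs=3, tol=1e-8)
print(kept)  # [0 1 2 3 4 5 6 7 8 9]
```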

hlmark
Posted on Tuesday, September 25, 2001 - 8:48 am:   

This is a bit off topic, but Dave Russell's comments bring to mind the fact that, while what he says about sample limitations for NIR calibrations in industry is generally true, as far as I know the record for the number of samples in a calibration (83,000) is still held by Rocco diFoggio, who did this work at CORE Laboratories. Of course, that was for quantitative analysis. I don't know whether anybody knows the record for qualitative work.

Howard

RJWilkie
Posted on Tuesday, September 25, 2001 - 4:27 pm:   

Greetings Colleagues,

DAVID
Thanks for the advice; I shall take it all on board. We do, however, already have 10,000 spectra of each variety available for analysis, so we have already done the "hard yards" on this over the past few months. We need to be able to utilise this data, and currently we cannot do so in Grams32. As you know, I am relatively new to this field, so any advice is great. Regarding temperature, I feel we have that relatively under control using air conditioning and long equilibration times; however, we would definitely need to do something about our humidity control. Since we are attempting to differentiate many products within the same genus, we need all the help we can get in eliminating environmental fluctuations!
Regarding variability, we have attempted to incorporate regional, ripening, seasonal and even fruit-size differences into our data sets. To do this we have had to sample every available farm growing, for example, a particular variety of apple. This is the reason for such a huge set! Applying a 5-point SGII (Savitzky-Golay) pretreatment to the data only brings the Mahalanobis distances "closer", and differentiation is immediately lost. It seems that we need to use raw data, with no pretreatment, to obtain the best identification.
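For readers unfamiliar with the pretreatment: Savitzky-Golay filters fit a low-order polynomial across a moving window and replace each point with the fitted value or one of its derivatives. The post does not say whether "SGII" in Grams32 denotes smoothing or a second derivative, so this sketch, assuming a quadratic fit over 5 equally spaced points, computes either set of weights; the function names are hypothetical.

```python
import math

import numpy as np


def savgol_coeffs(window, polyorder, deriv=0):
    """Savitzky-Golay weights for an odd window and unit point spacing."""
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)  # columns 1, x, x^2, ...
    # Row `deriv` of the pseudo-inverse gives the least-squares polynomial
    # coefficient a_deriv; multiplying by deriv! gives the derivative at x = 0.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)


def savgol_apply(spectrum, coeffs):
    """Filter a spectrum; the half-window of points at each end is dropped."""
    return np.correlate(spectrum, coeffs, mode="valid")


smooth5 = savgol_coeffs(5, 2)           # * 35 -> [-3, 12, 17, 12, -3]
second5 = savgol_coeffs(5, 2, deriv=2)  # * 7  -> [2, -1, -2, -1, 2]

y = np.arange(10.0) ** 2     # an exact quadratic, so both filters are exact
d2 = savgol_apply(y, second5)  # the second derivative of x**2 is 2 everywhere
```

A second derivative removes constant and linear baseline offsets along with other broad features, so if part of the varietal difference is carried by such broad changes, that is one possible reason a derivative pretreatment could pull the class distances together.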

HOWARD
After hearing Gerard Downey's presentation in Korea on subclassification of meats, we thought that it might just work for our fruit work too. Apple spectra are very, very similar among varieties, but they also tend to fluctuate considerably, with baseline shifts and sometimes large changes in the shape of dips/peaks, primarily in the visible and chlorophyll regions. When I set out to group varieties in a primary classification model, I asked myself how to do it. Obviously, visually grouping green apples together and red apples together is a "no no", as I found later that some green apple varieties were spectrally more similar to Pink Lady apples (which are extremely pink!). In any case, I then had trouble separating the primary groups of apples from each other, as each primary group contained both red and green apples. Thanks for the advice, Howard; I am interested in finding out more about your software and its limitations.

Thanks, guys. Any thoughts on that software? I am currently in email discussions with CAMO about the limitations of the Unscrambler package, but I have yet to find a package that will handle that many spectra.


Regards,

Russell Wilkie

RJWilkie
Posted on Tuesday, September 25, 2001 - 4:39 pm:   

Dear Colleagues,

I had another thought regarding our attempt to differentiate these beasties using fewer samples, as proposed by David Hopkins. If I analyse the factor information provided by Grams32, I find that Factor #1 (accounting for around 85-90% of the spectral variation) is very similar between two red apple varieties. Factors #2 and #3 are even more similar between the two varieties than Factor #1 is, but Factor #4 is totally different. Is there any reason why I couldn't use only Factor #4 to differentiate these two apple varieties, rather than a combination of Factors #1 to #4? I am wondering, though: if I could do this, would I really want to? This is a relatively new field for me, so I may be barking up the wrong tree with this one. Any thoughts?


Regards,

Russell Wilkie

LIU Yang
Posted on Wednesday, September 26, 2001 - 5:15 am:   

Greetings

I am working on discriminating tobaccos. I don't have nearly as many spectra to deal with, only 100 for a test, but I have found exactly the same thing as Russell:
Factors 1 and 4 show good discriminating ability. Again it is factor 4; is that a coincidence? As for factor 1, in my experience of tobacco analysis it is closely related to water content, which I think is due to the strong absorbance of water in the NIR. So I usually find that, even in quantitative calibration, factor 1 of PCA is not as informative as its variance would suggest (unless you are calibrating for water). PLS behaves quite differently.

As for pattern recognition, I'd like to ask a question: can PLS be a good discrimination method? I use PLS to discriminate tobaccos, setting 0 and 1 to represent the two classes. I find that it can in fact provide good results if the rank is appropriate, such as 4. But when I use the factor-number selection method that I use for quantitative calibration (an F test based on REV), the number of factors selected is too small (sometimes only factor 1 is selected, and the variance it accounts for is very low, about 12%).

Is there something wrong with doing this? Should I choose another way to select factors, or did I make some mistakes in applying PLS to this problem?
I look forward to your help and comments.

LIU Yang

Ola Berntsson
Posted on Wednesday, September 26, 2001 - 5:41 am:   

Liu,

You use PCA to compress and sort the information in your data set. You can then use the scores of any or many factors to discriminate. If factor 4 happens to describe exactly the difference you look for - fine. Use it and don't bother about the rest. However, if you have three or more classes to discriminate you'll probably end up using two or more factors.
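Ola's point that a single factor can carry the discrimination can be illustrated with synthetic data in which three large, class-independent sources of variation occupy factors 1 to 3 and the class difference sits on factor 4. This is a minimal sketch: the pooled-PCA setup, the midpoint threshold, the variety labels and all function names are assumptions for illustration, not the method used in Grams32 or in the posts above.

```python
import numpy as np


def pca_fit(X):
    """Mean spectrum and loadings (rows of Vt) of a pooled PCA."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt


def project(X, mean, Vt, n_pcs):
    """Scores of (new or training) spectra on the first n_pcs factors."""
    return (X - mean) @ Vt[:n_pcs].T


def single_factor_rule(scores, labels, factor):
    """Two-class rule on one factor (1-based): threshold halfway between class means."""
    labels = np.asarray(labels)
    t = scores[:, factor - 1]
    c0, c1 = np.unique(labels)
    m0, m1 = t[labels == c0].mean(), t[labels == c1].mean()
    threshold = (m0 + m1) / 2
    high, low = (c1, c0) if m1 > m0 else (c0, c1)
    return lambda s: np.where(s[:, factor - 1] > threshold, high, low)


# Synthetic data: large class-independent variation on the first three
# variables; the class difference lies only along a smaller fourth direction.
rng = np.random.default_rng(2)
n = 200
labels = np.repeat(["variety1", "variety2"], n // 2)
X = rng.normal(scale=0.01, size=(n, 8))
X[:, 0] += rng.normal(scale=10.0, size=n)
X[:, 1] += rng.normal(scale=5.0, size=n)
X[:, 2] += rng.normal(scale=3.0, size=n)
X[:, 3] += np.where(labels == "variety1", 1.0, -1.0) + rng.normal(scale=0.1, size=n)

mean, Vt = pca_fit(X)
scores = project(X, mean, Vt, n_pcs=4)
rule = single_factor_rule(scores, labels, factor=4)
accuracy = np.mean(rule(scores) == labels)
```

New spectra would be scored with the same `mean` and `Vt` before applying the rule, so the factor definitions stay fixed.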

PLS can be used for discrimination as you suggest; PLS will hopefully guide the construction of factors so that they are more useful than those from PCA. But if the information you are looking for is really in your data, you'll find it with PCA too.
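The 0/1 coding Liu describes (often called PLS-DA) can be sketched as a NIPALS PLS1 regression whose predictions are thresholded at 0.5. This is an illustrative implementation under those assumptions, with hypothetical function names and synthetic data. It does not settle Liu's question about factor selection; one option there is to compare cross-validated misclassification rates across ranks rather than relying on the F test.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1. Returns (x_mean, y_mean, b) with y_hat = (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        qa = f @ t / tt
        E = E - np.outer(t, p)  # deflate X
        f = f - qa * t          # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, b


def pls_da_predict(model, X, cutoff=0.5):
    """Class 1 if the predicted 0/1 response exceeds the cutoff, else class 0."""
    x_mean, y_mean, b = model
    return ((X - x_mean) @ b + y_mean > cutoff).astype(int)


rng = np.random.default_rng(3)
X = rng.normal(scale=0.1, size=(40, 20))
y = np.repeat([1.0, 0.0], 20)  # 0/1 class coding
X[:20, 5] += 1.0               # class 1 differs at one "wavelength"
model = pls1_fit(X, y, n_components=2)
accuracy = np.mean(pls_da_predict(model, X) == y)
```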

hlmark
Posted on Wednesday, September 26, 2001 - 7:18 am:   

Russell - as was mentioned subsequently, the whole modelling process is very empirical, and the trick is to find what works and then use it. So if you think that you need only some of the higher factors and can leave out the lower ones, that's a perfectly valid approach.

As for my software, I'd be more than happy to discuss it with you, but since it's a commercial topic, it's not appropriate for this forum. If you send me your e-mail address we can discuss it privately, off the list server. My e-address is:

[email protected]

Howard

Michel Coene (Michel)
Posted on Thursday, February 28, 2002 - 2:29 am:   

Most software packages (certainly Unscrambler and Pirouette) will not allow you to do a prediction on PC4 without throwing in 1-3 for free. Any reason for this?

hlmark
Posted on Sunday, March 03, 2002 - 6:50 am:   

Michel - I'd say shortsightedness on the part of the chemometrician, and laziness on the part of the programmer! When I wrote a PCA package while I was at Technicon, my software did have the capability to use only a specified set of PCs, so that proves that it is certainly possible to do it, and there is very little reason not to. 20 years ago you might have been able to claim memory limitations, but not any more. So it's simply a lack of understanding of the value of doing it. I've often wondered myself why none of the software packages available today provide that fairly useful capability.
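Howard's point is easy to check in code: nothing in the distance calculation requires the lower-numbered factors. The sketch below assumes a per-class PCA model and a Mahalanobis distance computed from score variances (one common formulation, not necessarily the one Unscrambler or Pirouette uses), and accepts an arbitrary list of PC numbers; the function names are hypothetical.

```python
import numpy as np


def fit_class_pca(X):
    """PCA model of one class: mean, loadings (rows of Vt) and score variances."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    score_var = s**2 / (len(X) - 1)
    return mean, Vt, score_var


def mahalanobis_on_pcs(x, model, pcs):
    """Mahalanobis distance of x from the class using only the listed PCs (1-based).

    Training-set scores on different PCs are uncorrelated, so the distance is a
    sum of squared scores, each scaled by its own variance.
    """
    mean, Vt, score_var = model
    idx = np.asarray(pcs) - 1
    t = (x - mean) @ Vt[idx].T
    return np.sqrt(np.sum(t**2 / score_var[idx], axis=-1))


rng = np.random.default_rng(4)
X = rng.normal(size=(50, 10)) * np.arange(10, 0, -1)  # decreasing spread per variable
model = fit_class_pca(X)
mean, Vt, score_var = model
x = mean + 3.0 * Vt[3]  # a sample displaced only along PC4
d_pc4 = mahalanobis_on_pcs(x, model, [4])
d_pc123 = mahalanobis_on_pcs(x, model, [1, 2, 3])
```

Here the displaced sample is far from the class on PC4 alone, while PCs 1 to 3 see no difference at all.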

Howard

hlmark
Posted on Sunday, March 03, 2002 - 6:54 am:   

Come to think of it, regarding Russell's original posting to this thread, have you tried transforming the data so as to try to enhance spectral differences, before applying the discrimination algorithm?
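Howard does not name a specific transform. As one common example, swapped in here for illustration, standard normal variate (SNV) scaling targets the kind of baseline shifts and intensity changes Russell reported; whether it enhances or erodes the varietal differences in this case would have to be tested. The function name and the example spectrum are hypothetical.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


spectrum = np.array([0.20, 0.50, 0.90, 0.40, 0.30])
shifted = 1.7 * spectrum + 0.25  # same shape, with a multiplicative and an additive change
same = np.allclose(snv(spectrum), snv(shifted))  # True: SNV removes both changes
```

Because SNV discards each spectrum's overall level and scale, any class information carried by those quantities is removed along with the baseline effects.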

Howard

Charles (Carmont)
Posted on Thursday, March 03, 2005 - 5:34 am:   

I am using discriminant analysis in the WinISI software and would like to ask a question: what do high or low uncertainty factor values mean?
Thanks
