Reducing Variable

NIR Discussion Forum » Bruce Campbell's List » Chemometrics

Rudi Heryanto
Posted on Thursday, April 29, 2004 - 9:44 pm:   

Dear Sir,
I am studying the classification and clustering of microbes based on FTIR/NIR spectra of intact microbial cells. I tried to use the transmittance/absorbance values at every wavenumber of the spectra for that purpose, but I cannot do it with my software (SAS & Minitab) because it reports that there are too many variables compared with the number of observations. So I would like to ask: how can I reduce the number of variables? I want the reduced data to be representative of the original spectra. I have tried using my own macro in SAS (something like stepwise regression), but the variables it selects are not consistent.

Thank you
Sincerely yours
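
A likely cause of the error Rudi describes can be shown numerically. The sketch below uses Python with NumPy on random simulated "spectra" (nothing here comes from the thread; `covariance_rank` is a hypothetical helper). With n samples, the p × p sample covariance matrix has rank at most n - 1, so when p > n it is singular, and any method that needs its inverse, such as classical discriminant analysis, cannot be computed.

```python
import numpy as np


def covariance_rank(n_samples: int, n_wavelengths: int, seed: int = 0) -> int:
    """Rank of the sample covariance matrix of simulated random 'spectra'.

    Rows of X are samples (spectra), columns are wavelengths.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_wavelengths))
    # rowvar=False: columns are the variables, giving a p x p matrix
    C = np.cov(X, rowvar=False)
    return int(np.linalg.matrix_rank(C))


if __name__ == "__main__":
    # 30 spectra x 500 wavelengths: C is 500 x 500 but has rank at most 29
    print(covariance_rank(30, 500))
    # 100 spectra x 20 wavelengths: C is 20 x 20 and has full rank
    print(covariance_rank(100, 20))
```

Reducing the variables to fewer than the number of samples, or using methods that never invert the full covariance matrix (such as PCA or PLS), avoids the problem.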

Michel Coene (Michel)
Posted on Friday, April 30, 2004 - 2:40 am:   

I would suggest you start by taking a good look at the spectra. Some regions might show a lot of noise, and some might look identical for every kind of microbe; you could start by throwing these uninteresting regions away. Another simple method is to replace each pair of adjacent data points by their average, although you will lose some detail. There are methods for selecting wavelengths, and there are methods for doing discrimination with more variables than observations, but you might have to invest in extra software. You could split your spectrum into X slices and run the test on each slice independently. If your method has a way of showing "interesting" variables, you can then combine the most interesting variables from all X runs into a single new test. This is not perfect either, as some wavelengths only become predictive in combination with others. I don't know SAS or Minitab, but maybe you could write a macro that picks random combinations of wavelengths and grades them on performance?
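
Two of Michel's suggestions, averaging adjacent data points and grading random wavelength combinations, can be sketched in Python with NumPy. The grading rule (leave-one-out accuracy of a nearest-class-mean classifier), the function names and the simulated data are assumptions for illustration, not anything from the thread; any classifier available in SAS or Minitab could be substituted.

```python
import numpy as np


def bin_adjacent(X: np.ndarray, width: int = 2) -> np.ndarray:
    """Average each run of `width` adjacent data points in every spectrum.

    Points left over at the end that do not fill a whole bin are dropped.
    """
    n_samples, n_points = X.shape
    n_bins = n_points // width
    trimmed = X[:, : n_bins * width]
    return trimmed.reshape(n_samples, n_bins, width).mean(axis=2)


def loo_nearest_centroid_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-class-mean classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i            # leave sample i out
        X_train, y_train = X[keep], y[keep]
        classes = np.unique(y_train)
        centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
        nearest = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(nearest == y[i])
    return correct / n


def random_wavelength_search(X, y, n_vars=5, n_trials=200, seed=0):
    """Grade random subsets of wavelengths; return (best indices, best score)."""
    rng = np.random.default_rng(seed)
    best_idx, best_score = None, -1.0
    for _ in range(n_trials):
        idx = np.sort(rng.choice(X.shape[1], size=n_vars, replace=False))
        score = loo_nearest_centroid_accuracy(X[:, idx], y)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


if __name__ == "__main__":
    # Simulated data: 2 groups x 20 spectra, 200 points; group 1 differs at points 0-99
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 200))
    X[y == 1, :100] += 3.0
    X_binned = bin_adjacent(X, width=2)     # 200 -> 100 variables
    idx, score = random_wavelength_search(X_binned, y)
    print(X_binned.shape, idx, score)
```

Because the best of many random subsets is chosen on the same samples that score it, its accuracy is optimistic; it should be checked on spectra that were not used in the search.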

hlmark
Posted on Friday, April 30, 2004 - 9:21 am:   

Rudi - this limit on the number of variables in general-purpose statistical software has existed since the beginning. The manufacturers of those packages generally provided many advanced capabilities, but did not anticipate that people would want to analyze several thousand variables (wavelengths). There are, however, several possible solutions (besides the one Michel suggested, which is good advice and can be applied in any case).

First, all instrument manufacturers provide software to analyze the data from their instruments, and so it can deal with as many wavelengths as they measure. It seems that in your case the software only provides quantitative analysis. Such software can also be used for a limited amount of cluster (qualitative) analysis; the limitation is that there can be only two clusters (groups) of samples. If you can meaningfully divide your data into two clusters, then all you need to do is create a new "constituent" and enter a value of 1 (unity) for all the samples in one group and -1 for all the samples in the other group. The quantitative analysis will then "predict" which cluster each sample belongs to.
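
Howard's two-group trick can be sketched in Python with NumPy. The thread does not say which regression the instrument software uses, so this assumes PLS1 (partial least squares with a single response) computed by the NIPALS algorithm; the function names, the number of components and the simulated spectra are all hypothetical. Samples are coded +1 or -1, and the sign of the predicted value assigns the group.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int = 2):
    """Minimal PLS1 regression (NIPALS). Returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centred copies, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                         # weight: covariance of X with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                            # scores
        tt = t @ t
        p = Xk.T @ t / tt                     # X loadings
        q = (yk @ t) / tt                     # y loading
        Xk = Xk - np.outer(t, p)              # deflate X and y
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)    # regression vector in original X space
    return x_mean, y_mean, coef


def pls1_predict(model, X: np.ndarray) -> np.ndarray:
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def assign_group(model, X: np.ndarray) -> np.ndarray:
    """+1 / -1 group assignment from the sign of the predicted 'constituent'."""
    return np.where(pls1_predict(model, X) >= 0.0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(2)

    def simulate(n_per_group):
        # Hypothetical spectra: 200 points, group +1 has an extra band at 80-120
        y = np.repeat([1.0, -1.0], n_per_group)
        X = rng.normal(scale=0.5, size=(2 * n_per_group, 200))
        X[y == 1, 80:121] += 2.0
        return X, y

    X_train, y_train = simulate(15)           # 30 samples, 200 variables
    X_new, y_new = simulate(15)
    model = pls1_fit(X_train, y_train, n_components=2)
    print("fraction correct:", np.mean(assign_group(model, X_new) == y_new))
```

With ±1 coding the decision threshold sits at zero, midway between the two codes, so samples predicted near zero are the ambiguous ones.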

Second, there are several third-party software vendors (besides the instrument manufacturers) that provide software designed for analyzing spectral data. All of these can handle essentially unlimited numbers of variables (wavelengths). You can try the following:

CAMO (http://www.camo.com)
Infometrix (http://www.infometrix.com)
Thermo/Galactic (http://www.galactic.com)
The Near Infrared Research Corp. (http://www.nearinfrared.com)
Axiom Corp. (http://www.goaxiom.com)


Third, the latest issue of JNIRS carried an article by Jim Reeves about using SAS to analyze spectral data. Jim was using the latest version (V8.02). I asked Jim about the limitation on the number of variables, and he told me that this version could handle about 30,000 variables and that future versions should remove the limitation entirely. So if you want to use SAS, you may need to upgrade your program.

Howard

\o/
/_\

Christopher Brown
Posted on Friday, April 30, 2004 - 9:47 am:   

Rudi,

Minitab is not well suited to spectral data analysis; it is geared much more to the design and analysis of experiments. SAS, however, should handle your problem easily. If you have more variables than observations, you'll need to delve into the SAS manuals (which are extremely well written). They describe procedures for principal components analysis, partial least squares, and variable downselection by a plethora of techniques.

~ Chris
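
Principal components analysis, the first technique Chris mentions, is one way to get a small set of variables that still represents the original spectra. The sketch below is in Python with NumPy rather than SAS; `pca_scores`, the choice of 10 components and the random data are hypothetical. The mean-centred spectra are decomposed by a singular value decomposition, and each spectrum is replaced by its scores on the first few components.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 10):
    """PCA of mean-centred spectra via SVD.

    Returns (scores, loadings, explained_variance_ratio) for the first
    n_components components (fewer if min(X.shape) is smaller).
    """
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; rows of Vt are the loadings
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # equals Xc @ Vt[:k].T
    loadings = Vt[:n_components]
    explained = (S**2 / np.sum(S**2))[:n_components]
    return scores, loadings, explained


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(25, 400))            # 25 hypothetical spectra, 400 points
    scores, loadings, explained = pca_scores(X, n_components=10)
    print(scores.shape, explained.round(3))
```

The score columns (after centring, at most one fewer than the number of samples carry any information) could then be exported to SAS or Minitab for discriminant or cluster analysis, since there are now fewer variables than observations.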

Rudi Heryanto
Posted on Saturday, May 01, 2004 - 12:17 am:   

Thank you all for the valuable suggestions. I use Minitab v11 and SAS v6. I don't know whether my institution wants to upgrade to newer versions.


Regards

Rudi
