Cluster analysis

NIR Discussion Forum » Bruce Campbell's List » Chemometrics » Cluster analysis



David W. Hopkins (dhopkins)
Advanced Member
Username: dhopkins

Post Number: 88
Registered: 10-2002
Posted on Tuesday, June 27, 2006 - 4:53 pm:   

Hi Nirmaniac11,

Okay, it took a bit of study, but I think I understand better now. I made a powerpoint of your plots, so I could flip through them and easily compare the plots and study the loadings.

Of course, the loadings are the same for the 2 validation data sets, as they provide the bases for comparing the spectra using the scores determined for each sample scan versus those loadings. The scores plots are showing that Set 1 and Set 2 are very similar to each other, but somewhat different from the calibration set. From the plots of the original scans for the 3 data sets, we can also see what the scores plots suggest, that there is a good deal of baseline variation and some slope differences suggesting light scattering between the samples. Note the similarity of the loading 1 and the general shape of the spectra.
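A minimal Python/NumPy sketch of why the loadings are shared: the PCA is fitted once on the calibration spectra, and every other set is projected onto those same loadings to get its scores. The function names and the random stand-in data are illustrative assumptions, not taken from the posts; real use would substitute the preprocessed spectra.

```python
import numpy as np

def fit_pca(X_cal, n_components=2):
    """PCA by SVD of the mean-centred calibration spectra (rows = samples)."""
    mean = X_cal.mean(axis=0)
    # Rows of Vt are the loadings (directions in wavelength space)
    _, _, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, loadings):
    """Scores of any set of spectra on the calibration loadings."""
    return (X - mean) @ loadings.T

# Hypothetical stand-in data: 30 calibration and 10 validation "spectra"
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(30, 50))
X_val = rng.normal(size=(10, 50)) + 0.5   # a validation set with an offset
mean, P = fit_pca(X_cal, n_components=2)
T_cal = project(X_cal, mean, P)
T_val = project(X_val, mean, P)
```

Because the validation spectra are centred with the calibration mean, a systematic difference between the sets shows up as a displaced cluster of scores rather than being removed.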

I think my suggestion that the second loading is showing you differences in the chemical properties is good, but I don't think it is a water effect. Although we see water in the scans, you have removed the major water band region from the calibrations and loadings. I think the 2nd loading is showing band shape changes associated with the chemical nature of the differences in the samples. The derivatives are apparently second derivatives, and the loadings suggest subtle band shape changes in the higher wavelengths, and perhaps not so subtle changes in the lower wavelength region, where there appears to be a low intensity band in the spectra. It may be that the large differences in the scores on PC2 of sets 1 & 2 are due to an effect of the water band tails into your measurement region. I find it hard to believe the validation sets have a larger range in the expected chemical properties than the calibration set. Do you have moisture determinations on the samples?

I wonder if you would not get better results if you continue the scans to 4000 cm-1? I expect another band beyond the 4400 cm-1 band you show. Because of the large amount of scattering seen in the spectra, I think MSC might be useful in your calibrations and in the scores plots. Also, I wonder if the lower wavelength region is adding all that much to your analysis. Can you obtain as good calibrations omitting the lower region as including it? You might find that the calibrations would be less sensitive to particle size variations that way too.
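MSC can be sketched in a few lines of NumPy. This assumes the common formulation in which each spectrum is regressed on a reference (by default the calibration mean) and corrected as (x - a)/b; the function name and toy data are illustrative. For validation spectra, the calibration reference should be reused rather than recomputed.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative scatter correction (rows of X are spectra).

    Fits x ~ a + b * ref for each spectrum and returns (x - a) / b,
    together with the reference used, so that it can be reused on new data.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)   # slope b, intercept a
        corrected[i] = (x - a) / b
    return corrected, ref

# Toy example: one band with different offsets and multiplicative scaling
wl = np.linspace(0.0, 1.0, 100)
band = np.exp(-((wl - 0.5) / 0.1) ** 2)
X = np.vstack([0.2 + 1.5 * band, -0.1 + 0.8 * band, band])
X_msc, ref = msc(X)
X_val_msc, _ = msc(X[:1] * 1.1, reference=ref)   # reuse the calibration reference
```

Because the toy spectra differ only by an offset and a scale factor, all of them collapse onto the reference after correction; real scatter is rarely that clean.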

I hope this helps. It is not easy to interpret loadings and scores plots, and this is my best attempt.

Regards,
Dave

ksharghi (nirmaniac11)
New member
Username: nirmaniac11

Post Number: 4
Registered: 6-2006
Posted on Monday, June 26, 2006 - 7:36 pm:   

The PC scores plots represent quantitative calibration models with calibration points (circles) and validation points (or samples, + signs). The PC 1 component is directly correlated to sample concentration for both calibration and validation samples. The concentration range is from 100% to 0% from left to right on the PC 1 axis. All of the validation samples for both sets fall within a narrow concentration range (around 60%), which is why they seem to be clustering with respect to the PC 1 axis, and there is less variability observed for this component.

The calibration points used for both sample sets have given accurate prediction performance for more than 500 samples. The SEC and SEP values for the calibration model are 2.3 and 2.5 respectively. Until recently, I haven't seen many entire validation data sets displaced from the calibration samples along the PC 2 axis; usually they are mixed in randomly with the calibration samples. I am unsure what the PC 2 variability is and how I might go about identifying it. If it were due to water or moisture, what would I look for in the spectra or loadings?
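For readers comparing SEC and SEP figures like these, one common way of computing them is sketched below. Conventions differ between software packages (degrees of freedom, and whether the SEP is bias-corrected), so treat these formulas as an assumption rather than as the definition used by the poster's software; the function names and data are illustrative.

```python
import numpy as np

def sec(y_ref, y_fit, n_terms):
    """Standard error of calibration: RMS residual with n - n_terms - 1 degrees of freedom."""
    e = np.asarray(y_fit, float) - np.asarray(y_ref, float)
    return np.sqrt(np.sum(e ** 2) / (e.size - n_terms - 1))

def sep(y_ref, y_pred):
    """Bias-corrected standard error of prediction; returns (SEP, bias)."""
    e = np.asarray(y_pred, float) - np.asarray(y_ref, float)
    bias = e.mean()
    return np.sqrt(np.sum((e - bias) ** 2) / (e.size - 1)), bias

# A hypothetical validation set predicted consistently 2 units high
y_ref = [60.0, 61.0, 59.0, 62.0, 58.0]
y_pred = [62.0, 63.0, 61.0, 64.0, 60.0]
sep_value, bias = sep(y_ref, y_pred)
```

The example shows why the bias is worth reporting alongside the SEP: a set that is offset as a whole can still have a small bias-corrected SEP.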

As Mark pointed out in his comments, one of the validation samples is far removed from the rest of the validation samples along the PC 2 axis in the PC scores plot. This sample is in fact a "true" outlier, with a concentration difference of +10% between the NIR predicted value and the actual value verified by the primary method. Sometimes, however, I observe that individual validation samples with large variability in the PC 2 component on a PC scores plot can still have accurate NIR predicted values, making them "false" outliers. I am curious as to why this is.

Included attachments in this posting are in order:

1) PC 1 and PC 2 loadings plot, derivatized for sample set 1
2) PC 1 and PC 2 loadings plot, unprocessed for sample set 1
3) Original scans of sample set 1 spectra

4) PC 1 and PC 2 loadings plot, derivatized for sample set 2
5) PC 1 and PC 2 loadings plot, unprocessed for sample set 2
6) Original scans of sample set 2 spectra

7) Original scans of calibration set spectra (same calibration spectra used for both PC scores plots)

In my original posting, the PC scores plot on top is from sample set 1 data, and the bottom plot is from sample set 2 data.

My calibration regions of interest are 6200-5800 and 4800-4300 cm-1.

[Attachments: loadings spectra set 1 - derivatized; loadings spectra set 1 - unprocessed; set 1 validation samples - unprocessed; loadings spectra set 2 - derivatized; loadings spectra set 2 - unprocessed; set 2 validation samples - unprocessed; calibration samples - unprocessed]

David W. Hopkins (dhopkins)
Intermediate Member
Username: dhopkins

Post Number: 87
Registered: 10-2002
Posted on Sunday, June 25, 2006 - 2:34 pm:   

Hi Nirmaniac11,

Some other thoughts about your question. It seems to me that the most important question is, do the statistics for the validation set agree with the SEC and SECV of the calibration set? If so, then the differences in the scores plots need not be very disturbing.

If, however, the scores plots are a symptom that the calibration does not apply well to the validation set, then you need to combine the calibration and validation data and split them into a new calibration set and a new validation set that represent the total variation. A new calibration should then be tested on the validation set and, if possible, on a new set of data that you collect.
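One way to do such a re-split, not named in the post, is the Kennard-Stone algorithm, which picks calibration samples that span the combined data evenly. A minimal NumPy sketch, assuming the rows of X are samples (often PCA scores rather than raw spectra); the function name and data are illustrative.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection on the rows of X.

    Starts with the two most distant samples, then repeatedly adds the sample
    whose distance to its nearest already-selected sample is largest.
    Returns (selected_indices, remaining_indices).
    """
    X = np.asarray(X, dtype=float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # distance from each remaining sample to its closest selected sample
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected), np.array(remaining)

# Hypothetical pooled data: 10 samples described by one score
X_all = np.arange(10, dtype=float).reshape(-1, 1)
cal_idx, val_idx = kennard_stone(X_all, n_select=3)
```

The selected samples would form the new calibration set and the rest the new validation set. Because the algorithm favors extreme samples for calibration, the validation set produced this way will not cover the extremes; a random split stratified by reference value is a common alternative.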

Hope this helps.

Regards,
Dave

David W. Hopkins (dhopkins)
Member
Username: dhopkins

Post Number: 86
Registered: 10-2002
Posted on Saturday, June 24, 2006 - 2:23 am:   

Hi Nirmaniac11,

I agree with Howard that you need to look at the scans and determine what is causing the differences reflected in the scores plots. The loadings can help you in this.

And don't miss another feature, the Validation Set is much less variable in the PC1 direction than the Calibration Set.

If you have the usual case that the Loading for PC1 looks similar to the average spectrum for the Calibration samples, the variation would suggest that there is more variation in the scattering properties of the Calibration samples than in the Validation samples. If the second PC is more concerned with chemical differences (the big one will be water!), the scores may be telling you that there is more variation in this quality in the Validation Samples as compared with the Calibration Set.

Can you show us the original scans of the 2 sets, either in 2 plots at the same scale or in 2 different colors on the same plot? Plots of the loadings for PC1 and PC2 would also be useful in providing more clues for interpreting your observations.

Best wishes,
Dave

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 25
Registered: 9-2001
Posted on Friday, June 23, 2006 - 8:28 pm:   

Nirmaniac11 - It's not clear what the two plots represent, but in any case I'd be more concerned about the fact that in both of them, the cluster representing the validation samples is displaced from the calibration samples along the PC2 axis. Also the fact that in the lower plot, one of the validation samples is far removed from the rest of the validation samples along the PC2 axis. I'd say that would qualify as an "outlier" by anyone's definition.

You should look for the spectral differences between the validation and calibration samples, to see why there is this consistent offset; also how the spectrum of the outlier sample differs from the rest.
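A simple numerical check on samples that look displaced along PC2 is their Mahalanobis distance from the calibration scores. Because it scales each direction by the calibration variance, a shift along a low-variance PC such as PC2 counts for more than the same shift along PC1. The sketch uses NumPy only; the function name, the data and the 3.0 cut-off in the comment are illustrative assumptions, not a recommended limit.

```python
import numpy as np

def score_mahalanobis(T_cal, T_new):
    """Mahalanobis distance of each row of T_new from the calibration score cloud."""
    mu = T_cal.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(T_cal, rowvar=False))
    diff = T_new - mu
    # d_i^2 = diff_i^T  C^-1  diff_i for each sample i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Hypothetical calibration scores: PC1 spread much wider than PC2
rng = np.random.default_rng(1)
T_cal = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])
T_new = np.array([[8.0, 0.0],    # far out along PC1
                  [0.0, 8.0]])   # same shift, but along PC2
d = score_mahalanobis(T_cal, T_new)
# e.g. flag samples with d > 3.0 (an illustrative cut-off) for a closer look
```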

Howard

\o/
/_\

ksharghi (nirmaniac11)
New member
Username: nirmaniac11

Post Number: 3
Registered: 6-2006
Posted on Friday, June 23, 2006 - 6:49 pm:   

In PC scores plots generated from calibration and validation samples, I am observing that the validation samples are grouped together along PC 2, whereas the PC 2 scores of the calibration samples are spread more randomly. I am wondering if this observed "clustering" of the validation samples in PC 2 is telling me something about the calibration model, i.e. that it is no longer representative of the PC 2 variation in my new samples. I have included some PC scores plots below that exemplify the observed "clustering."

Note: circles are calibration points, + signs are validation points.

[Attachment: PC scores plots]

ChangKyoo Yoo (Ckyoo)
Posted on Wednesday, February 07, 2001 - 9:32 pm:   

Dear sir
I have a feature selection or clustering problem.
My data consist of 18 observations and 5300 variables, and all observations are classified into two classes.
I want to classify the two classes and analyze the variables and the relationships between them.

I have tried several methods, such as PCA and clustering.
But I cannot find a good method because the number of variables (features) is too large ("the curse of dimensionality").

Following are my questions.
First, I would like to find suitable classification techniques.

Second, because of "the curse of dimensionality", I need a way to handle the correlation between variables, such as a variable (feature) selection method or variable clustering.

I'm searching for a specific method, paper, or reference.

Please give me an idea or suggestion.

Bye

hlmark
Posted on Sunday, February 11, 2001 - 9:31 am:   

This may not be of too much help, but it seems that what you need to do first of all is to find out whether or not the data in fact contains any information that will provide a handle to discrimination capability. If the data represent spectral wavelengths, for example, are there any places in the spectrum where there are consistent differences between the spectra of the two classes of interest? If so it should be relatively easy to perform the classification; just ignore all the other wavelengths and just use the data at those wavelengths where the discrimination capability exists. You could use Mahalanobis distances to do this, for example. Since I don't want to get commercial on the discussion group, if you contact me directly at: [email protected] I could recommend software that will do that.
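The approach described above (find the wavelengths where the two classes differ consistently, then classify on those wavelengths with Mahalanobis distances) might be sketched as follows. The t-like ranking statistic, the pooled covariance and all function names are assumptions made for illustration, not the software mentioned. With only a handful of samples, choosing wavelengths on the same data used to judge the classifier is optimistic, so the selection should be checked on samples not used to make it.

```python
import numpy as np

def rank_wavelengths(X, y):
    """Order variables by |difference of class means| / pooled SD, largest first."""
    a, b = X[y == 0], X[y == 1]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2.0)
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd
    return np.argsort(score)[::-1]

def fit_mahalanobis(X, y):
    """Class means and the inverse pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    cov = resid.T @ resid / (len(X) - len(classes))
    return classes, means, np.linalg.inv(cov)

def predict(X, classes, means, cov_inv):
    """Assign each row of X to the class with the smallest Mahalanobis distance."""
    d2 = np.array([np.einsum("ij,jk,ik->i", X - m, cov_inv, X - m) for m in means])
    return classes[np.argmin(d2, axis=0)]

# Hypothetical data shaped like the question: 18 samples x 5300 variables
rng = np.random.default_rng(2)
X = rng.normal(size=(18, 5300))
y = np.array([0] * 9 + [1] * 9)
X[y == 1, 100] += 8.0     # two informative variables
X[y == 1, 2000] += 8.0
keep = rank_wavelengths(X, y)[:2]
model = fit_mahalanobis(X[:, keep], y)
pred = predict(X[:, keep], *model)
```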

If you can't find any places in the spectrum with that property, it may be that your data simply isn't sufficiently sensitive to the class distinctions to allow using it to do your classification. It's hard to comment further without knowing what the data represent.

Howard Mark

Tony Davies (Td)
Posted on Sunday, February 11, 2001 - 4:11 pm:   

PCA is not really a clustering method! It finds directions of maximum variation in the data. These may or may not be seen as clusters when you do scatter plots.
Put the PC scores into canonical variates analysis (CVA).
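PCA first, then canonical variates analysis on the scores, can be sketched in NumPy as below. CVA is computed here as the generalized eigenproblem Sb w = lambda Sw w for the between- and within-class scatter matrices, solved through a Cholesky factor of Sw; the function names and toy data are illustrative assumptions. The toy data puts a large nuisance variation on one variable and the class difference on another, so the first PC reflects the nuisance while the first canonical variate reflects the classes.

```python
import numpy as np

def pca_scores(X, n_components):
    """PC scores of mean-centred X via SVD."""
    U, s, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def cva(T, y):
    """Canonical variates of scores T: solves Sb w = lambda Sw w.

    Returns eigenvalues (descending) and the matching directions as columns.
    """
    grand = T.mean(axis=0)
    Sw = np.zeros((T.shape[1], T.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Tc = T[y == c]
        mc = Tc.mean(axis=0)
        Sw += (Tc - mc).T @ (Tc - mc)                      # within-class scatter
        Sb += len(Tc) * np.outer(mc - grand, mc - grand)   # between-class scatter
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))           # Sw = L L^T
    evals, U = np.linalg.eigh(Linv @ Sb @ Linv.T)          # symmetric form of the problem
    order = np.argsort(evals)[::-1]
    return evals[order], Linv.T @ U[:, order]

# Hypothetical data: variable 0 carries large nuisance variation,
# variable 1 carries the (smaller) class difference
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 5))
X[:, 0] *= 10.0
y = np.repeat([0, 1], 40)
X[y == 1, 1] += 4.0
T = pca_scores(X, 3)
evals, W = cva(T, y)
z = T @ W[:, 0]        # first canonical variate
```

With two classes there is only one non-zero canonical variate; in general there are at most (number of classes - 1).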

gaoshouguo (Gaoshouguo)
Posted on Thursday, June 21, 2001 - 3:20 am:   

I agree with Tony Davies: PCA and NLM are not really clustering methods. I have used an ANN to extract features for the identification of artificial bezoar and natural bezoar, and the result is satisfactory.
P.S.
We have built an NIR modeling center on the internet with CGI and Java (including a lot of chemometrics methods). If you are interested, we can work together.

magali Laasonen (Laasonen)
Posted on Tuesday, December 11, 2001 - 7:38 am:   

Dear all,
I have 2 questions to ask concerning hierarchical trees.
- Could somebody give me information (references, definition, applications) on the similarity measurement method called the "Pearson metric" in the SYSTAT statistical software?
- Can this method be used for the same applications as the Euclidean distance method?
Thanks a lot for your advice!
Magali Laasonen

hlmark
Posted on Tuesday, December 11, 2001 - 8:38 am:   

Magali - you might also want to post these questions to the chemometrics discussion group. That discussion group is at:

[email protected]

If you send a message there you'll probably get an error message as a reply stating that you're not a registered member of the discussion group, but I think you'll also get instructions as to how to join that group.

If you have a problem, contact me and I'll check out exactly who you need to contact to join that discussion group.

Howard

TD1
Posted on Tuesday, December 11, 2001 - 9:06 am:   

Hello Magali,

The Pearson option is defined in my (DOS!!) version of SYSTAT as 1 - P(i,j), where P(i,j) is the Pearson product-moment correlation between objects i and j. This standardises the variables and can be used in place of Euclidean distance. It is especially useful when there are large scale differences between the different variables.
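A small NumPy illustration of the 1 - r distance next to the ordinary Euclidean distance. The three toy objects are assumptions chosen to show the effect of scale: the second is the first multiplied by 10, the third is the first reversed. The function names are illustrative.

```python
import numpy as np

def pearson_distance(X):
    """1 - Pearson correlation between every pair of objects (rows of X)."""
    return 1.0 - np.corrcoef(X)

def euclidean_distance(X):
    """Ordinary Euclidean distance between every pair of rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0],   # same profile, 10 times larger
              [4.0, 3.0, 2.0, 1.0]])      # reversed profile
D_pearson = pearson_distance(X)
D_euclid = euclidean_distance(X)
```

With the Pearson distance, object 1 has the same shape as object 0 (distance 0) and the reversed profile is as far away as possible (distance 2); the Euclidean distance ranks them the other way round because it is dominated by the difference in scale. Either matrix can then be handed to a hierarchical clustering routine that accepts precomputed distances.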

Best wishes,

Tony Davies

venkat
Posted on Wednesday, February 04, 2004 - 8:20 am:   

I am working on a project on sorting plastics (PP, PET, PVC, PS, PA, PE) by NIR spectral analysis. I have planned the following method:
1. Baseline correction
2. SNV of the data
3. Covariance and PCA
4. Peak method and second derivative method for identification
Is this correct? Is there any FFT method for identification? I would welcome help from everyone.

David W. Hopkins (Dhopkins)
Posted on Wednesday, February 04, 2004 - 8:54 am:   

Venkat,

That sounds like a good plan. It is very important to have a good baseline correction before the SNV. In fact, the second derivative is perhaps the best baseline correction possible, so I would recommend trying the 2Der before the SNV.
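The order recommended here (second derivative first, then SNV) can be sketched with NumPy alone. The Savitzky-Golay coefficients are computed by least squares, which is one standard way to get smoothed derivatives but is an assumption here, as are the window and polynomial settings, which would need to be optimized; scipy.signal.savgol_filter provides the same filter if SciPy is available. Function names and toy spectra are illustrative.

```python
import numpy as np
from math import factorial

def savgol_coeffs(window, polyorder, deriv):
    """Savitzky-Golay weights for the deriv-th derivative at the window centre
    (unit point spacing; divide the result by spacing**deriv otherwise)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)   # columns 1, t, t^2, ...
    return factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(X, window=11, polyorder=2, deriv=2):
    """Apply the filter to each spectrum; mode='valid' drops half a window at each end."""
    c = savgol_coeffs(window, polyorder, deriv)
    return np.array([np.convolve(x, c[::-1], mode="valid") for x in np.atleast_2d(X)])

def snv(X):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# Toy spectra: the same band, the second with a scale factor, offset and sloping baseline
t = np.arange(200, dtype=float)
band = np.exp(-((t - 100.0) / 10.0) ** 2)
X = np.vstack([band, 2.0 * band + 0.5 + 0.01 * t])
Z = snv(savgol_derivative(X))
```

The derivative removes the offset and the linear slope, and SNV then removes the factor of 2, so both rows of Z come out the same. Doing SNV first would let the sloping baseline distort the mean and SD used for scaling.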

There are many convolution functions you can use for the derivatives, and you will want to optimize the method. FFT techniques can also produce derivatives and reduce noise, but I have never seen the methods compared. Perhaps you should evaluate it. I have no idea how the methods would compare in speed, if this is an issue. Present computers are much faster than when FFT was introduced for speed.
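For comparison, a Fourier-domain derivative might look like the sketch below: transform, multiply by (i 2 pi f)^order, discard frequencies above a cut-off (which is where the noise reduction comes from), and transform back. It assumes uniform spacing and treats the spectrum as periodic, so a real spectrum would usually need its ends detrended or padded to avoid ringing; the function name and the keep_fraction parameter are illustrative assumptions.

```python
import numpy as np

def fft_derivative(x, order=1, spacing=1.0, keep_fraction=0.25):
    """Derivative of a uniformly sampled signal via the FFT.

    Multiplies the transform by (i*2*pi*f)**order and zeroes frequencies above
    keep_fraction of the Nyquist frequency, which also suppresses
    high-frequency noise. Assumes the signal is periodic over the window.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    f = np.fft.rfftfreq(n, d=spacing)
    F = np.fft.rfft(x) * (2j * np.pi * f) ** order
    F[f > keep_fraction * f.max()] = 0.0      # low-pass cut-off
    return np.fft.irfft(F, n=n)

# Periodic test signal: 3 cycles over 256 points
n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 3 * t / n)
w = 2 * np.pi * 3 / n                         # angular frequency per point
d1 = fft_derivative(x, order=1)
d2 = fft_derivative(x, order=2)
```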

Good luck in your project,
Dave Hopkins

hlmark
Posted on Wednesday, February 04, 2004 - 8:57 am:   

Venkat - you may want to read "Qualitative Analysis of Thin-film Polymer Samples Using Near-IR Spectroscopy for Recycling Applications"; Spectroscopy; 9(1), p.27-32 (Jan., 1994). This paper presents spectra of several different types of plastics, including, I believe, the ones on your list. The spectra of the different types of plastics are so different (except for PE and PP) that it probably won't be necessary to do much, if any, spectral preprocessing in order to distinguish them.

PE and PP spectra are similar, but careful measurement of the wavelengths of one or two key bands should allow you to tell the difference. A second derivative (or perhaps a first derivative, looking for the wavelength where the derivative crosses zero) should help with this. Otherwise, I don't think any of those transformations will be necessary.
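The zero-crossing idea can be sketched as: take the first derivative and interpolate the point where it changes from positive to negative inside a window around the band. NumPy's gradient is used for the derivative here for simplicity (a smoothed derivative would be used on noisy data), and the function name, window limits and toy band are illustrative assumptions.

```python
import numpy as np

def band_position(wavelengths, spectrum, lo, hi):
    """Wavelength of a band maximum in [lo, hi], found as the +/- zero crossing
    of the first derivative, linearly interpolated between sample points."""
    wl = np.asarray(wavelengths, dtype=float)
    d1 = np.gradient(np.asarray(spectrum, dtype=float), wl)
    idx = np.where((wl >= lo) & (wl <= hi))[0]
    for i, j in zip(idx[:-1], idx[1:]):
        if d1[i] > 0 >= d1[j]:
            # interpolate where the derivative passes through zero
            return wl[i] + (wl[j] - wl[i]) * d1[i] / (d1[i] - d1[j])
    return None

# Toy band centred between sample points
wl = np.arange(1100.0, 1300.0, 2.0)
spectrum = np.exp(-((wl - 1211.3) / 8.0) ** 2)
pos = band_position(wl, spectrum, 1150.0, 1250.0)
```

On a second-derivative spectrum an absorption band appears as a minimum, so passing the negated second-derivative spectrum to the same function locates the band.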

Also, you should reread Tony Davies' message above in this thread regarding the relation between PCA and discrimination.

Howard

\o/
/_\

NIRman
Posted on Wednesday, February 04, 2004 - 9:27 am:   

Hello Venkat,

If you send me your e-mail address, I'll contact you on the issue of sorting plastics. Also, chapter 27 in the Handbook of Near-Infrared Analysis deals with plastics analysis at two national laboratories. Part A uses neural networks and part B uses 1st and 2nd derivatives obtained with a standard NIR spectrophotometer equipped with a remote reflectance probe.

Don Burns

Gabi Levin
Posted on Wednesday, February 04, 2004 - 12:21 pm:   

Hello Venkat,

The issue of sorting plastics is a very old one. I have done a very tough job of differentiating nylon 6 from nylon 6,6, almost identical compounds with very small differences in the spectrum. The differentiation is 100% successful on carpet residue, with all sorts of color additions and all sorts of grades; you name it, it all shows up in carpets. This differentiation requires a different approach, and none of the tools you mentioned will successfully accomplish it routinely with all the variabilities involved in carpets. If your plastic sources do not vary so much, and you do not need to distinguish the two nylon types, you don't need such heavy tools. The differentiation between the polymers you mentioned is very easy: you can do it with Mahalanobis distances using the first derivative, which is more than sufficient. We were also able to do a quantitative calibration for carpet materials that had mixed polymers, and provide the people who use it with a way to determine the percent of PP mixed with nylon, or PE, etc.

If you have transparent polymers in films, collecting the spectrum takes a slightly different approach, but the differentiation does not.

If you send me an e-mail with more details on the application, I will be able to send you data on the differentiation of the nylons. The application is being used routinely with our handheld Luminar 5030: the operator just goes to a carpet, touches it with the "nose" of the spectrometer and presses a button, and in three seconds the identification is displayed on the LCD. This spectrometer has no fibers that may break with time; it uses diffuse reflectance, and it can operate on a battery.
The same procedure is being used for identification of the backing material, which is often PVC and often other polymers.

Thanks, Gabi Levin
my e-mail: [email protected]

venkat
Posted on Tuesday, February 24, 2004 - 1:55 am:   

Why is the cosine of the angle used in SAM (Spectral Angle Mapping)? I feel that the sine would give not only the direction but also the character of the unknown spectrum relative to the standards. Is that so?
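In SAM the cosine appears because it is the normalized dot product of the two spectra, which gives the angle directly; the angle itself is the quantity compared. A minimal NumPy sketch (function name and data illustrative) also shows that, under this usual definition, the angle ignores overall intensity. For non-negative spectra the angle lies between 0 and 90 degrees, where the sine and cosine are both monotonic, so ranking unknowns by sin(theta) or cos(theta) gives the same order.

```python
import numpy as np

def spectral_angle(x, r):
    """Angle (radians) between spectrum x and reference r, as used in SAM."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    cos_theta = np.dot(x, r) / (np.linalg.norm(x) * np.linalg.norm(r))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))   # clip guards rounding past +/-1

ref = np.array([0.1, 0.5, 0.9, 0.4, 0.2])
unknown = np.array([0.12, 0.48, 0.95, 0.38, 0.22])
theta = spectral_angle(unknown, ref)
theta_scaled = spectral_angle(3.0 * unknown, ref)   # same shape, 3x intensity
```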

venkat
Posted on Tuesday, February 24, 2004 - 2:21 am:   

My dear NIR friends, I am in search of NIR spectral data (1000 to 1800 nm), particularly for PP, PVC, PE, PET, PAM and PS. If anybody has a source and data, kindly send them to me. I want to test new algorithms/approaches (based on the Davis publication). I will acknowledge you.

Gabi Levin
Posted on Tuesday, February 24, 2004 - 3:29 am:   

Venkat,

Please send me your e-mail, and I will see what we have in our drawers from all the work we have done over some years. I know we have PP, PE, PVC, nylon 6, nylon 6,6 and PET (diffuse reflectance for sure). I am not sure about PAM, because I am not familiar with the acronym; please enlighten me on PAM.

My e-mail is [email protected]

hlmark
Posted on Tuesday, February 24, 2004 - 8:16 am:   

Venkat - If Gabi's data does not include all the plastics of interest to you, then you may also wish to check out the Spectroscopy article mentioned in an earlier message in this thread: "Qualitative Analysis of Thin-film Polymer Samples Using Near-IR Spectroscopy for Recycling Applications"; Spectroscopy; 9(1), p.27-32 (Jan., 1994). It contains spectra of all the materials you mention, with the exception of PAM. Please inform all of us what PAM stands for.

Howard

\o/
/_\

NIRman
Posted on Tuesday, February 24, 2004 - 9:40 am:   

Venkat,

Send me your e-mail address and I'll be happy to share the many spectra I have of the plastics in which you're interested. They are included in a chapter called "Plastics Analysis at Two National Laboratories. Part B: Characterization of Plastic and Rubber Waste in a Hot Glove Box."

Don Burns

NIRman
Posted on Tuesday, February 24, 2004 - 9:45 am:   

Venkat,

Sorry, my e-mail address didn't seem to get printed. It's [email protected]

Don Burns

venkatarman (Venkynir)
Posted on Wednesday, March 17, 2004 - 9:22 am:   

Why are people using the cosine of the angle in spectral mapping? Why not the sine? I have worked with the sine, and it gives quality and quantity information.
