Pretreatment vs. raw data



Kamdzhilov Yavor (nir_greenhorn)
New member
Username: nir_greenhorn

Post Number: 2
Registered: 5-2007
Posted on Tuesday, May 15, 2007 - 3:27 am:   

Dear Venkynir, Don, Tony, Dongsheng,

Thanks a lot for your answers!

I was a bit skeptical about receiving any responses after I had seen how long my questions really were. Thanks for your time!

I will try to follow your valuable advice, fool around with the data and keep the forum posted.

I must join Liam Kelley by saying that the forum is a wonderful place for receiving support and discussing issues around NIR. You really have to be proud of the contribution to the NIR community you are making here.

I will be coming to NIR2007 and hope to meet some of you in person!

With best wishes,

Yavor

Tony Davies (td)
Moderator
Username: td

Post Number: 152
Registered: 1-2001
Posted on Thursday, May 10, 2007 - 10:43 am:   

Dear Kamdzhilov,

For a newcomer you are doing very well!

My answers to your questions:

1) In general most of the pre-treatments were developed to reduce the influence of physical effects but none will be totally effective. Combinations do help but you have to be careful because small changes in your tablets may result in large changes in the treated data.

2) Yes, MSC and SNV are designed to suppress physical effects and are often more effective than general methods. But they are often combined with derivatives.

3) You are correct but you need to be careful. You would not want a method that gave a result for chemical identification based only on physical data.

4) You are not comparing the same things! When you take the raw data, a very large proportion of the variance is due to the physical effects, and much of this can be modelled by the first two components. When you pre-treat the data, most of the physical variance has been removed and you need more PCs to achieve the same percent explained. PCA can be used as a pre-treatment; you just leave out the first one or two PCs. You need to use far more of the PCs in your discrimination. I would suggest at least 10, but perhaps 15 or even 20 may be OK. Have a look at these later PCs by plotting their weights (loadings); you may be surprised to see that even the tenth PC has structure; it is not just noise. If these higher PCs do not have any useful information then they will be ignored by the discrimination method.
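As a rough illustration of point 4 (not part of Tony's original reply), here is a minimal NumPy sketch of PCA via the SVD that reports the fraction of variance explained by each PC and rebuilds the spectra with the first one or two PCs left out. The function names `pca_svd` and `drop_leading_pcs` are made up for this example, and the spectra are assumed to be stored as rows of a samples-by-wavelengths array.

```python
import numpy as np

def pca_svd(X):
    """PCA of mean-centred spectra (rows = samples) via the SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U * s                      # sample scores on each PC
    explained = s**2 / np.sum(s**2)     # fraction of variance per PC
    return mean, scores, Vt, explained  # rows of Vt are the loadings

def drop_leading_pcs(X, n_drop):
    """Rebuild the spectra without the first n_drop PCs (PCA as a pre-treatment)."""
    mean, scores, loadings, _ = pca_svd(X)
    return mean + scores[:, n_drop:] @ loadings[n_drop:]
```

Plotting the rows of the loadings array against wavelength is one way to check whether a later PC still shows spectral structure or only noise; `np.cumsum(explained)` gives the cumulative percent explained.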

Some important points
Try to understand the spectroscopy, not just use the method that gives the "best" result.

Make sure that you have different batches of tablets with ingredients from different suppliers.

YOU MUST validate the method by testing with data from tablets not used in the calibration.

Sources of information:
Try searching previous answers in this forum. There is a lot of good advice in the archive.

Could you go to the ICNIRS conference in Sweden (Umea, Sweden from Sunday 17 June 2007 to Thursday 21 June 2007)? You will be able to listen and talk with many experts and you will make friends in the international NIR community.
www.nir2007.com

You might find some of my articles useful (many written with Tom Fearn) in the Tony Davies column of Spectroscopy Europe:
www.spectroscopyeurope.com/td_col.html

Best wishes,

Tony
PS Others have got in while I was writing my reply! It may be repetitive; there is an unusually high level of agreement!

Donald J Dahm (djdahm)
Junior Member
Username: djdahm

Post Number: 9
Registered: 2-2007
Posted on Thursday, May 10, 2007 - 10:00 am:   

Dear NIR "greenhorn":
Rest assured, you are not alone in being "confused". I have worked around this field for 15 years now, and I feel that I only understand a little slice of it completely. Other folks who follow the forum have far more experience with the kind of problem that you are working on, and they can give better specific advice than I can. I will give my perspective, which may be at odds with that of others.

First, the idea that a good pre-treatment gets rid of physical effects and leaves the chemical effects is overly optimistic for the general case. There are interactions between the two. That being said, we can approximate understanding by saying that the ideal set of data would be linear in absorbing power (the absorbance of the material making up the sample in the absence of physical effects) for "chemical effects" with deviations from linearity being due to "physical effects".

Second, I take the position that you have pre-treated the data in every instance you described. Using any metric such as { log(1/R) } is a pre-treatment. I regard the raw data as the fraction of incident light that has been remitted from your sample at each wavelength.
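To make the distinction concrete, here is a minimal NumPy sketch (added for illustration, not from the original post) of the conversion from remitted fraction R to log(1/R). It assumes base-10 logarithms and R values strictly between 0 and 1; the function name is hypothetical.

```python
import numpy as np

def reflectance_to_log_inverse_r(R):
    """Convert remitted fraction R (0 < R <= 1) to log10(1/R)."""
    R = np.asarray(R, dtype=float)
    return np.log10(1.0 / R)

# A sample remitting 10 % of the incident light gives log(1/R) = 1.
values = reflectance_to_log_inverse_r([1.0, 0.1, 0.01])
```

Because the transform is nonlinear, equal steps in R do not give equal steps in log(1/R), which is one reason to regard it as a pre-treatment in its own right.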

Third, there is no general solution to the "inverse problem". We have some reasonably good models that allow us to predict what the remission from a sample should be, but we can only predict composition from remission data for a few special cases, which are largely too simple to be useful in the real world. These facts make it hard to "know" how a specific pre-treatment will affect the data.

Given all that (as a disclaimer?), you cannot expect to have captured all chemical effects in as many principal components as you have chemical components. Consequently, I suspect that with only 15 samples you are not over-determined, and are just highlighting different aspects of variability with the different pre-treatments. Furthermore, the largest source of variability may very well be the physical effects, the most famous of which is particle size. I believe that when you do a pre-treatment that throws out a large source of variability, it takes more components to capture the same fraction of the remaining variability.
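A small arithmetic sketch of that last point, using made-up component variances (the numbers are purely illustrative and do not come from any tablet data): one dominant "physical" source plus four much smaller ones. The helper name `n_components_for` is hypothetical.

```python
import numpy as np

def n_components_for(variances, threshold=0.99):
    """Smallest number of components whose cumulative variance share reaches threshold."""
    v = np.sort(np.asarray(variances, dtype=float))[::-1]
    cumulative = np.cumsum(v) / v.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical variances: one large physical effect, four small remaining effects.
with_physical = [1000.0, 4.0, 3.0, 2.0, 1.0]
without_physical = with_physical[1:]   # the large source has been removed

n_raw = n_components_for(with_physical)         # 1 component reaches 99 %
n_treated = n_components_for(without_physical)  # 4 components are needed
```

The total amount of information has not grown; the 99 % target is simply measured against a much smaller total once the dominant source is gone.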

So my recommendation would be to get more samples, making sure that they are very well characterized with respect to the parameters that are of most interest to you, and try to make sure the levels of these are quantized, rather than a continuum. You will want to make sure that the samples represent the full concentration ranges of ingredients and that there is no correlation between chemical composition and the measured physical parameter.

Then, examine the PCs of the log(1/R) data to see where the different levels of the important parameters are in PC space. Make your next "pre-treatment" selection of wavelength ranges. Frequently, you can find a wavelength range that is sensitive to the parameter you're interested in, and not so sensitive to the others. After that you can see if you can reduce the effect of one physical parameter on another by pre-treatments, but don't start there.

People with more experience than you and I may be able to take a more direct route to a classification scheme, but I find I have to approach these problems in small steps, and understand what is going on at each step before continuing.

Probably not what you wanted to hear.

Don Dahm

Dongsheng Bu (dbu)
Junior Member
Username: dbu

Post Number: 9
Registered: 6-2006
Posted on Thursday, May 10, 2007 - 9:16 am:   

Dear NIR_greenhorn,

Did you try PCA-based classification in the Unscrambler? Please search for "SIMCA Classification" in its Help. We have a customer who is able to discriminate between 18 pharmaceutical materials by NIR.
MSC on spectra from several tablets may not be helpful, since each tablet has its own baseline shape. I think MSC will be good for correcting the scattering effect within an individual tablet type.
The Norris Gap derivative in the Unscrambler has no smoothing, as suggested by Dr. Norris; if smoothing is needed, please try the Savitzky-Golay or Gap-Segment derivatives.
I think I understand your observation of the K-means clustering results with/without pre-treatment. I would again suggest the PCA-based SIMCA approach.

Though your questions are quite long, they are clear and interesting. I would like to share my thoughts here.
1. I agree with the reason you described. If the active levels in the tablets are large enough, I think you can count only on chemical differences with appropriate pre-treatments.
2. SNV and vector normalization in the Unscrambler have a similar effect on spectra, in that both correct for pathlength variations. SNV is usually used after a derivative in NIR applications. For MSC and derivatives, please see my comments at the beginning.
3. I think that discrimination by both physical and chemical effects would be better. You may try multi-stage classification: first discriminate a few tablets based on their physical differences without pre-treatment, then the other tablets based on chemical differences with pre-treatments.
4. I have observed similar things in several cases. You may check the PCA loading plots to confirm your thinking. I also think pre-treatments eliminate some of the physical effects that are dominant in the spectra, and allow the chemical effects to become significant in the variance captured by PCA.
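For readers who want to see what these pre-treatments do numerically, here is a minimal NumPy sketch (added for illustration; it is not the Unscrambler's implementation, and the function names are made up). It assumes spectra are stored as rows, takes "vector normalization" to mean scaling each spectrum to unit length, and uses the mean spectrum as the default MSC reference.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def vector_normalize(spectra):
    """Scale each spectrum (row) to unit Euclidean length."""
    X = np.asarray(spectra, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (x = intercept + slope * ref)
    and then corrected as (x - intercept) / slope.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected
```

Note that MSC depends on the reference spectrum: computing the mean over spectra from several different tablet types gives a different correction than computing it within one tablet type, which is the concern raised above.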

Regards,
Dongsheng

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 40
Registered: 3-2004
Posted on Thursday, May 10, 2007 - 12:00 am:   

1. For particle size, use MSC (read about the Kubelka-Munk relation); do not do any other pre-processing.
2. On-line, SNV always works well; for any pattern recognition, SNV is also good.
3. No; pre-processed data is preferred, and use QDA.
4. Please post the data. If you increase the number of PCs, we can move closer to the true components.
Why have you not talked about Mahalanobis distance?
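For reference, a minimal NumPy sketch of the Mahalanobis distance from one sample to a group of samples, for example in PC score space (added for illustration; the function name is hypothetical). It assumes the group has more samples than variables, so that its covariance matrix can be inverted.

```python
import numpy as np

def mahalanobis(x, group):
    """Mahalanobis distance from point x to the centre of a group (rows = samples)."""
    G = np.asarray(group, dtype=float)
    diff = np.asarray(x, dtype=float) - G.mean(axis=0)
    cov = np.cov(G, rowvar=False)   # sample covariance of the group
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Four samples on the corners of a square, centred on (1, 1).
group = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
d = mahalanobis([3.0, 1.0], group)   # sqrt(3) for this group
```

Unlike the Euclidean distance, this measure scales each direction by the spread of the group along it, so a sample is judged relative to the group's own variability.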

Kamdzhilov Yavor (nir_greenhorn)
New member
Username: nir_greenhorn

Post Number: 1
Registered: 5-2007
Posted on Wednesday, May 09, 2007 - 8:58 am:   

Dear NIR-Experts,

I need help!

I am new to the field of NIR (only 2 months) and although I have a solid background in spectroscopy I am pretty confused about quite a few issues with NIR. I have recently signed up for the NIR forum since this seemed to be the only place where I can get reasonable advice.


Now, I am trying to discriminate between about 15 solid tablets (mainly CaHPO4, starch, actives, etc.), measuring in diffuse reflectance. For that I am running both PCA and K-means clustering (both in Unscrambler 9.6). The pretreatment of the data is one of the following: MSC, SNV, normalization, Norris Gap 1st or 2nd derivative, and the results differ accordingly.

The K-means clustering gives to a large extent the same results independent of pretreatment. Only the type of distance (Euclidean, Manhattan) matters.
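As a side note on why the distance type can matter, here is a minimal NumPy sketch (added for illustration, with made-up two-dimensional points rather than real spectra) in which the nearest cluster centre changes when switching from Euclidean to Manhattan distance.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two points."""
    return float(np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)))

def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return float(np.sum(np.abs(np.asarray(a, float) - np.asarray(b, float))))

# Hypothetical sample and two cluster centres.
point = [0.0, 0.0]
centres = [[3.0, 3.0], [0.0, 5.0]]

nearest_euclidean = int(np.argmin([euclidean(point, c) for c in centres]))  # centre 0
nearest_manhattan = int(np.argmin([manhattan(point, c) for c in centres]))  # centre 1
```

The first centre is about 4.24 away in Euclidean terms but 6 away in Manhattan terms, while the second is 5 away under both, so the assignment flips.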


The PCA (2-4 factors), though, results in different grouping of the spectra when I use pretreatment of any kind. MSC, SNV and normalization yield very much the same clusters, whereas derivatives (Norris Gap, 1st and 2nd) give no reasonable groups.

Here are the questions:

1. I am separating physical from chemical effects by pre-treating the data. Is that the reason for different clustering with and without pretreatment? If so, to what extent? Can I count only on chemical differences when, say, MSC + 1st derivative are applied?

2. Are MSC and SNV also suppressing physical effects (packing density, particle size distribution etc.) and are they more efficient in doing so, compared to regular vector normalization or derivatives?

3. If physical differences in the tablets (say, hardness or compression force) are what I am looking for in the discriminant analysis, should I use ONLY non-pretreated data? After all, suppressing them would diminish the scores of those components.

4. When pre-treating the data I need 3-4 PCs to explain nearly 99 % of the variability, whereas the raw data need only 2 components. If I eliminate some physical effects among the tablets, why do I need more PCs?


I hope my questions are not too confusing. I hope now you understand my nickname and would definitely appreciate any advice.


Best,

NIR_greenhorn
