PCR, PLS, SVM comparison

NIR Discussion Forum » Bruce Campbell's List » I need help » PCR, PLS, SVM comparison


Bilal Ahmad Malik (elp09bm)
Junior Member
Username: elp09bm

Post Number: 10
Registered: 7-2011
Posted on Monday, August 22, 2011 - 8:51 am:   

Hi Howard and Barry,
I was expecting some comments from you.
Thanks,
Bilal

Desai N.M. (nishithdesai1969)
New member
Username: nishithdesai1969

Post Number: 3
Registered: 12-2010
Posted on Monday, August 22, 2011 - 2:58 am:   

To process the data, powerful software is also needed to see the NIR spectral changes and non-changes in the raw as well as the processed spectra, so that the parameter's % can be correlated with them. With a manually assembled dataset, some other variable can also end up correlated.

Please comment

Gabi Levin (gabiruth)
Senior Member
Username: gabiruth

Post Number: 63
Registered: 5-2009
Posted on Tuesday, August 16, 2011 - 12:20 pm:   

Hi guys,

I hope this will be of some contribution. I have not had a chance to fool around with SVM - we are very much practitioners who work with what we know will work in the long run for our customers - which is PLS1. The package we use is Unscrambler - and one of the nice things about it is that you can quickly detect or avoid overfitting. Therefore, if a good PLS1 package from anyone that makes one shows a certain RMSECV with 5 vectors or PCs - whatever you call them - I tend to believe it. I see that the low RMSECV for the SVM used 37!! support vectors - whatever this may mean, and as I said I don't know much about it - but if they are anything like PCs in PLS1, then I would say it is way overfitted.

Gabi Levin
Brimrose Corp

Tony Davies (td)
Moderator
Username: td

Post Number: 267
Registered: 1-2001
Posted on Tuesday, August 16, 2011 - 11:04 am:   

Hi Bilal,

I thought you would be swamped with comments; most people must be on holiday!

I would say that you only have 30 samples; a set of 10 samples in a validation set is quite small. The other thing that surprised me was that you had limited your scan to the 4,000 - 5,000 cm^-1 range. I'm not sure which is the best region but after all the preparation work I would always make a full scan.
The difference of the RMSEC and RMSECV in the SVM calibration suggests to me that this is over-fitted.

Overall I think it is a very good experiment.

Best wishes,

Tony

Bilal Ahmad Malik (elp09bm)
Junior Member
Username: elp09bm

Post Number: 9
Registered: 7-2011
Posted on Tuesday, August 16, 2011 - 10:06 am:   

Hi Barry,
Thanks for responding to my query. I have tried to give more information about the experiment, and I have attached the PLS, PCR and SVR reports as well.

A total of 90 NIR spectra were collected for 30 mixtures of a simulated biological matrix prepared by dissolving glucose, urea and triacetin in a phosphate buffer solution. Urea and triacetin are used to model the urea and triglycerides in blood, respectively. The spectra were collected with a Fourier transform spectrophotometer (Cary 5000, version 1.09). The buffer solution was prepared by dissolving 3.4023 g of potassium dihydrogen phosphate and 3.0495 g of sodium monohydrogen phosphate in distilled water. 5-Fluorouracil was added to the buffer as a preservative. The aqueous glucose and urea solutions were prepared by dissolving the dry solutes in the buffer solution. The triacetin solution was prepared by diluting it with the buffer solution.
The glucose concentration of the prepared samples ranged from 20 to 500 mg/dL, the triacetin concentration from 10 to 190 mg/dL, and the urea concentration from 0 to 50 mg/dL. The concentrations of the three components were selected to be higher than their physiological range in blood.
All experiments were carried out in a non-controlled environment to evaluate the ability of the methods proposed in this work to deal with uncompensated variations. Many previous studies in this area carried out their experiments in a controlled environment to compensate for the effect of baseline variation.
The samples were placed in an infrared quartz cuvette with a fixed pathlength of 1 nm. Three spectra were collected for each sample in double-beam mode, without removing the sample from the spectrometer. The collected data spanned the spectral region from 2000 nm to 2500 nm (4000-5000 cm^-1) with a spectral resolution of 1 nm. At the beginning of the experiments the absorbance spectra of the buffer solution were collected and used as reference spectra.
The collected spectra were divided randomly into two sets, each spanning the whole concentration range. The first set contained the three replicate spectra of 20 samples and was used to build the calibration model. The second set contained the triplicate spectra of the remaining 10 samples and was used in the prediction phase to test the calibration model.
Preprocessing: 1st derivative followed by autoscale.
Cross-validation: Venetian blinds.
I cannot see any samples with high residuals or high leverage.
AnalysisReport_PLS.docx (12.0 k)
AnalysisReport_PCR_FIRstderivative.dot (25.6 k)
AnalysisReport_SVM.dot (25.1 k)
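
Since each sample contributes three replicate spectra, one thing worth checking is that the Venetian blinds folds keep the replicates of a sample together; otherwise the RMSECV can look better than it should. The following is only a minimal sketch of that idea, not the procedure of any particular package. The function name is hypothetical, and the number of splits (5) is an assumption, since the post does not state it.

```python
def venetian_blind_folds(n_samples, n_splits, replicates=1):
    """Return one fold index per spectrum.

    Samples are dealt out to folds in turn (0, 1, ..., n_splits-1, 0, 1, ...),
    and all replicate spectra of a sample get the same fold index.
    Assumes the spectra are ordered sample by sample.
    """
    folds = []
    for sample in range(n_samples):
        folds.extend([sample % n_splits] * replicates)
    return folds


# 20 calibration samples x 3 replicates = 60 spectra, as in the post.
# n_splits=5 is an assumed value for illustration only.
folds = venetian_blind_folds(n_samples=20, n_splits=5, replicates=3)

for fold in range(5):
    held_out = [i for i, f in enumerate(folds) if f == fold]
    print(f"fold {fold}: {len(held_out)} spectra held out")
```

With this assignment every fold leaves out whole samples, so a model is never validated on a replicate of a sample it was trained on.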

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 136
Registered: 3-2004
Posted on Monday, August 15, 2011 - 1:54 pm:   

Barry
There are too many things that we don't know. How many variables are there? What is the wavelength range? What sort of data is this? Transmission? Reflectance? What kind of samples are these?
Why do you need all of the above, once the data is there and the pre-processing is set?

Barry M. Wise (bmw)
Junior Member
Username: bmw

Post Number: 8
Registered: 2-2011
Posted on Monday, August 15, 2011 - 11:32 am:   

Hi Bilal:

The RMSEC/CV/P looks fine but that really isn't enough to judge a model by. There are too many things that we don't know. How many variables are there? What is the wavelength range? What sort of data is this? Transmission? Reflectance? What kind of samples are these? What is the expected error in the reference values? What does the variance captured table look like? How about the whole RMSEC/CV curves? How was the cross validation performed? What, besides 1st derivative, was used for preprocessing? Do you have any samples with particularly high leverage (T^2)? How about samples with high residuals (Q)? Is the data set from a designed experiment, or is it random? Did you select particular regions of the spectra? If so, why? How was the test set chosen?

All of these are things to consider when developing a model.

BMW
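
For readers unfamiliar with the leverage (T^2) and residual (Q) diagnostics mentioned above, here is a rough sketch of how they can be computed from a PCA decomposition with NumPy. It runs on synthetic random data, not on the data from this thread; the matrix size and the choice of 5 components are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic stand-in for a preprocessed calibration matrix
# (60 spectra x 50 variables); NOT the data discussed in this thread.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))

k = 5                                   # assumed number of components
Xc = X - X.mean(axis=0)                 # mean-centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

T = U[:, :k] * s[:k]                    # scores
P = Vt[:k].T                            # loadings
score_var = s[:k] ** 2 / (X.shape[0] - 1)

# Hotelling's T^2: scores scaled by their variance, summed per sample.
t2 = np.sum(T ** 2 / score_var, axis=1)

# Q: sum of squared residuals left after the k-component reconstruction.
E = Xc - T @ P.T
q = np.sum(E ** 2, axis=1)

# Samples that stand out on either statistic deserve a closer look.
print("highest T^2 sample index:", int(np.argmax(t2)))
print("highest Q sample index:  ", int(np.argmax(q)))
```

A sample with a large T^2 is extreme within the model space, while a large Q means the model does not describe that spectrum well; either can distort a calibration built on few samples.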

Bilal Ahmad Malik (elp09bm)
Junior Member
Username: elp09bm

Post Number: 8
Registered: 7-2011
Posted on Monday, August 15, 2011 - 11:06 am:   

Hi All,
I am trying to compare the results of PCR, PLS and SVM on a dataset with 90 samples. I am using 60 for calibration (training) and the other 30 as test data.
In all cases I am using the first derivative for preprocessing.
Could someone kindly look at my results and see if they are OK?
Model   RMSEC     RMSECV    RMSEP     No.
PCR     39.7552   43.345    46.9664   6 PCs
PLS     29.1376   33.1028   37.3466   5 LVs
SVM     13.1585   30.0618   23.7791   37 SVs
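
One quick way to read the table above is to compare RMSECV with RMSEC for each model: a cross-validation error much larger than the calibration error is a common warning sign of overfitting. The short sketch below only does that arithmetic on the reported figures; the ratio is a rule of thumb and not a formal test.

```python
# Figures copied from the table above (units as reported in the post).
results = {
    "PCR": {"RMSEC": 39.7552, "RMSECV": 43.345, "RMSEP": 46.9664},
    "PLS": {"RMSEC": 29.1376, "RMSECV": 33.1028, "RMSEP": 37.3466},
    "SVM": {"RMSEC": 13.1585, "RMSECV": 30.0618, "RMSEP": 23.7791},
}


def cv_to_cal_ratio(row):
    """Ratio of cross-validation error to calibration error."""
    return row["RMSECV"] / row["RMSEC"]


ratios = {name: cv_to_cal_ratio(row) for name, row in results.items()}
for name, ratio in ratios.items():
    print(f"{name}: RMSECV / RMSEC = {ratio:.2f}")
```

For PCR and PLS the two errors are within about 15% of each other, whereas for the SVM the cross-validation error is more than twice the calibration error.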
