NIR Discussion Forum: Oil in seeds quantification



Gabi Levin (gabiruth)
Senior Member
Username: gabiruth

Post Number: 30
Registered: 5-2009

Posted on Wednesday, February 03, 2010 - 12:46 am:

Hi Limor,

Please call me. I cannot post information that would be considered commercial, but I have extensive experience in this area.

Gabi Levin +972 050-7733911

Limor Baruch (baruch)
New member
Username: baruch

Post Number: 4
Registered: 5-2009

Posted on Tuesday, February 02, 2010 - 10:47 am:

Hi,
Thank you all for your notes. I have found them very useful, and I'm trying to improve my methods accordingly.
As for your questions:
I use 18-22 calibration samples (5 spectra for each sample), and I try to choose samples that cover the whole range of oil concentration in the seeds.
I have a set of about 40 samples that were scanned and whose oil content was then determined by solvent extraction. The samples that were not included in the method were used for its evaluation.
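
One way to pick calibration samples that cover the whole oil range, as described above, is to sort the samples by their reference value and take evenly spaced ones. This is only a minimal sketch: the function name pick_calibration_samples and the oil values are made up for illustration, and it is not a TQ Analyst feature.

```python
def pick_calibration_samples(oil_percent, n_cal):
    """Return indices of n_cal samples spread evenly over the sorted oil range."""
    if not 2 <= n_cal <= len(oil_percent):
        raise ValueError("n_cal must be between 2 and the number of samples")
    # Sample indices ordered from lowest to highest reference oil content.
    order = sorted(range(len(oil_percent)), key=lambda i: oil_percent[i])
    # Evenly spaced positions along that ordering, always including both ends.
    step = (len(order) - 1) / (n_cal - 1)
    positions = [int(k * step + 0.5) for k in range(n_cal)]
    return [order[p] for p in positions]


# Hypothetical solvent-extraction results (% oil), one value per sample.
oil_percent = [30.0, 10.0, 50.0, 20.0, 40.0]
calibration = pick_calibration_samples(oil_percent, n_cal=3)
# Everything not used for calibration is kept aside for evaluation.
held_out = [i for i in range(len(oil_percent)) if i not in calibration]
```

The split is made per sample, so all 5 spectra of a given sample would go to the same side.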

dafna barkan (dafna)
Member
Username: dafna

Post Number: 13
Registered: 4-2003

Posted on Tuesday, February 02, 2010 - 3:31 am:

Hi, Limor,
In addition to the points already raised, you should also pay attention to the size of your calibration set, as well as to the oil content range of the samples you are using. Too few samples or a very narrow range of the constituent you are trying to calibrate can impair the accuracy of the prediction model.

Marion Cuny (marion)
New member
Username: marion

Post Number: 5
Registered: 6-2009

Posted on Tuesday, February 02, 2010 - 2:28 am:

Hi

To complement what Jose suggested, there is a test, called the inlier test, that tells you whether you can trust a prediction.
It flags samples that are accepted by the Hotelling T2 limit but are still too far from any calibration sample.
The limit for this test is the maximum distance between two neighbouring samples in the calibration set.
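
The inlier test described above could be sketched roughly as follows. This assumes plain Euclidean distance on whatever coordinates you pass in (for example PCA scores); commercial packages may use a different distance. The function names and the toy points are hypothetical.

```python
import math


def nearest_distance(point, others):
    """Distance from point to the closest of the other points."""
    return min(math.dist(point, o) for o in others)


def inlier_limit(calibration):
    """Largest nearest-neighbour distance found inside the calibration set."""
    return max(
        nearest_distance(p, calibration[:i] + calibration[i + 1:])
        for i, p in enumerate(calibration)
    )


def passes_inlier_test(new_point, calibration):
    """True if the new sample is no further from its nearest calibration
    sample than the most isolated calibration sample is from its neighbour."""
    return nearest_distance(new_point, calibration) <= inlier_limit(calibration)


# Toy 2-D "scores" for three calibration samples.
calibration = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
in_gap = passes_inlier_test((2.0, 0.0), calibration)
far_away = passes_inlier_test((0.0, 5.0), calibration)
```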

And to add to Nils's comment, you should use some indicators to determine the appropriate number of latent variables to include. Every time you add a component, look at how these parameters behave:

- Explained variance in Y: you want it to get as close as possible to 100%; it is better to use the validation curve for this (I assume you do a cross-validation when calibrating)

- The RMSECV: you want it as low as possible, so if it starts to increase when you add one more component, you should stop (I assume you use a cross-validation when calibrating)

- The shape of the component you add: if it shows a signal, for example a peak, you are still using information to model Y; if it looks noisy, you may be starting to over-fit.
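
To make the RMSECV criterion concrete, here is a rough sketch that computes RMSECV for 1 to 6 components with a small NumPy-only PLS1 (NIPALS) routine and 5-fold cross-validation. It assumes NumPy is available, uses synthetic "spectra" generated in the script, and all function names are hypothetical; it is not what TQ Analyst does internally.

```python
import numpy as np


def pls1_coefficients(X, y, max_components):
    """NIPALS PLS1 on centred data; one coefficient vector per model size."""
    Xr, yr = X.copy(), y.copy()
    W, P, q, coefs = [], [], [], []
    for _ in range(max_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # weight vector
        t = Xr @ w                       # scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        qa = yr @ t / tt                 # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y
        yr = yr - qa * t
        W.append(w); P.append(p); q.append(qa)
        Wm, Pm = np.array(W).T, np.array(P).T
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return coefs


def rmsecv_curve(X, y, max_components, n_folds=5, seed=0):
    """RMSECV for models with 1..max_components latent variables."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = np.zeros(max_components)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Centre with the training fold only, so the test fold stays unseen.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        coefs = pls1_coefficients(X[train] - x_mean, y[train] - y_mean,
                                  max_components)
        for a, b in enumerate(coefs):
            pred = (X[test] - x_mean) @ b + y_mean
            sq_err[a] += np.sum((pred - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))


# Synthetic data: 40 samples, 50 "wavelengths", two underlying factors.
rng = np.random.default_rng(1)
T = rng.normal(size=(40, 2))
X = T @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(40, 50))
y = 20 + 3 * T[:, 0] + T[:, 1] + 0.1 * rng.normal(size=40)

rmsecv = rmsecv_curve(X, y, max_components=6)
best = int(np.argmin(rmsecv)) + 1        # component count with lowest RMSECV
```

With several replicate spectra per sample, the folds should be built per sample rather than per spectrum; this sketch treats each row as an independent sample.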

Marion

Jose Miguel Hernadez Hierro (jmhhierro)
Junior Member
Username: jmhhierro

Post Number: 7
Registered: 4-2008

Posted on Tuesday, February 02, 2010 - 1:54 am:

Hello Limor,

I agree with Nils. Moreover, I think a useful way to check that your new samples belong to the spectral space of the calibration is a PCA. You can run a PCA with your calibration samples for this purpose. If a new sample falls within that spectral space, you can go ahead and predict it.
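
As an illustration of the PCA check, the sketch below fits a PCA to the calibration spectra and compares the Hotelling T2 of new spectra with the largest T2 seen in the calibration set. That empirical limit is an assumption made here for simplicity; software packages usually derive the limit from an F distribution. NumPy is assumed, the data are synthetic and the function names are hypothetical.

```python
import numpy as np


def fit_pca(X_cal, n_components):
    """PCA by SVD of the mean-centred calibration spectra."""
    mean = X_cal.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = Vt[:n_components].T                      # (wavelengths, PCs)
    score_var = s[:n_components] ** 2 / (len(X_cal) - 1)
    return mean, loadings, score_var


def hotelling_t2(X, mean, loadings, score_var):
    """Hotelling T2 of each row of X in the calibration PCA space."""
    scores = (np.atleast_2d(X) - mean) @ loadings
    return np.sum(scores ** 2 / score_var, axis=1)


# Synthetic calibration set: 30 spectra, 100 wavelengths, two factors.
rng = np.random.default_rng(0)
X_cal = (rng.normal(size=(30, 2)) @ rng.normal(size=(2, 100))
         + 0.01 * rng.normal(size=(30, 100)))

mean, loadings, score_var = fit_pca(X_cal, n_components=2)
# Simplifying assumption: use the largest calibration T2 as the limit.
t2_limit = hotelling_t2(X_cal, mean, loadings, score_var).max()

typical = mean                                          # centre of the space
unusual = mean + 100 * np.sqrt(score_var[0]) * loadings[:, 0]
t2_new = hotelling_t2(np.vstack([typical, unusual]), mean, loadings, score_var)
inside = t2_new <= t2_limit                             # predict only if True
```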

Best regards,

Jose Miguel

Nils Lastein (nilslastein)
New member
Username: nilslastein

Post Number: 1
Registered: 7-2009

Posted on Tuesday, February 02, 2010 - 1:26 am:

Hello Limor

What kind of validation do you use? It seems that you are using too many components (overfitting). The best way to determine the right number of components is to use a test set or, as second best, split-half validation.
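
For test-set validation, the prediction error on samples kept out of the calibration can be summarised with the usual statistics: bias, RMSEP and SEP (the bias-corrected error). A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import math


def validation_statistics(reference, predicted):
    """Bias, RMSEP and SEP of predictions against reference values."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    # SEP: standard deviation of the errors after removing the bias.
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return {"bias": bias, "rmsep": rmsep, "sep": sep}


# Made-up test-set values (% oil): solvent extraction vs. NIR prediction.
reference = [18.2, 25.0, 31.4, 40.1]
predicted = [18.9, 24.1, 32.0, 41.3]
stats = validation_statistics(reference, predicted)
```

If RMSEP on the test set is much larger than the calibration error reported by the software, that points towards too many components.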

How many calibration samples are you using?

\Nils

Limor Baruch (baruch)
New member
Username: baruch

Post Number: 3
Registered: 5-2009

Posted on Tuesday, February 02, 2010 - 1:19 am:

Hello,
I'm trying to develop a method for quantifying the oil content of seeds. I work with an Antaris II instrument and TQ Analyst software. Every time, I arrive at a model that appears very good according to the software tools. However, when I use it to predict new samples, it fails to predict the actual oil content. Sometimes the prediction seems OK, but after performing some more tests I see the error in the prediction.
I would be grateful for any new ideas or advice.