NIR Discussion Forum: Need advice/suggestion:Thanks

suranjan panigrahi (Panispisani)

Posted on Thursday, September 26, 2002 - 6:23 pm:

I am a university faculty member, and I would greatly appreciate your feedback.

We have done some research on using NIR techniques to predict the protein content of agricultural products (wheat, etc.). We developed a different chemometric algorithm and then used PCR and PLS to predict protein content (PC). We took around 500 samples: 420 for calibration and 80 for validation. Protein content was measured on an as-is basis and expressed in the conventional form as a percentage (e.g. 12% or 15%). We did not express protein content in decimal form.

We defined error = actual PC - predicted PC
(for example, if the actual PC was 12% and the predicted PC was 12.6%, then the error was 12 - 12.6 = -0.6 (%))

We defined accuracy = {1 - abs(error)/actual} x 100
(e.g. for this case, prediction accuracy = {1 - 0.6/12} x 100 = 95%)

We computed the average accuracy as the mean of the accuracies obtained for all calibration or validation samples. Our average accuracy was in the high 90s (%).
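
The error and accuracy definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the post; the function names are hypothetical and the sample values are the ones from the worked example, not real data.

```python
def prediction_accuracy(actual, predicted):
    """Per-sample accuracy in percent: (1 - |actual - predicted| / actual) * 100."""
    error = actual - predicted
    return (1.0 - abs(error) / actual) * 100.0


def average_accuracy(actuals, predictions):
    """Mean of the per-sample accuracies over a calibration or validation set."""
    accuracies = [prediction_accuracy(a, p) for a, p in zip(actuals, predictions)]
    return sum(accuracies) / len(accuracies)


# Worked example from the post: actual 12%, predicted 12.6%.
acc = prediction_accuracy(12.0, 12.6)  # (1 - 0.6/12) * 100, i.e. about 95
```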

We determined SEP, SEC, MSEP and MSEC according to the formulas given in Richard Kramer's chemometrics book.

Our SEP (on validation) = 0.48; MSEP = 0.24
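
For readers who want to reproduce such statistics, here is a minimal sketch using commonly used definitions: MSEP as the mean squared residual, RMSEP as its square root, and SEP as the bias-corrected standard deviation of the residuals. The exact formulas in Kramer's book are not reproduced in this thread and may differ (for example in the divisor), so treat these definitions, the function name and the toy data as assumptions.

```python
import math


def validation_errors(actuals, predictions):
    """Return bias, MSEP, RMSEP and SEP under commonly used definitions (assumed here)."""
    residuals = [a - p for a, p in zip(actuals, predictions)]
    n = len(residuals)
    bias = sum(residuals) / n                      # mean residual
    msep = sum(r * r for r in residuals) / n       # mean squared error of prediction
    rmsep = math.sqrt(msep)                        # root mean squared error
    # SEP: standard deviation of the residuals around the bias (n - 1 divisor)
    sep = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    return {"bias": bias, "MSEP": msep, "RMSEP": rmsep, "SEP": sep}


# Hypothetical toy data, not taken from the study.
stats = validation_errors([12.0, 13.0, 14.0, 15.0], [12.5, 12.5, 14.5, 14.5])
```

Under these definitions RMSEP is simply the square root of MSEP, and sqrt(0.24) is about 0.49.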

The linear regression between actual and predicted protein content (on the validation data) showed an r (correlation coefficient) of 0.96. The slope of the line was 1.05 and the offset was very close to 0. Thus, the regression line was very close to a line through the origin.
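
The slope, offset and correlation coefficient of such a regression can be obtained with ordinary least squares, as in the sketch below. The post does not say which variable was regressed on which, so the choice of x and y, the function name and the toy data are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + offset, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, offset, r


# Hypothetical toy data lying exactly on y = 1.05 * x.
slope, offset, r = fit_line([10.0, 12.0, 14.0, 16.0], [10.5, 12.6, 14.7, 16.8])
```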

The standard deviation (std) of the actual protein content was 1.6 and the std of the predicted protein content was 1.5. We also determined 7-8 other parameters, but I am not listing them here.

Based on these findings on our validation data, we feel that our algorithm and data collection method have high potential for success. Is this an acceptable basis? Do you agree that this result is good and shows potential?
P.S. We have very good results for the training data set.

Please suggest. Thanks for your patience reading my long question.

Peter Tillmann (Tillmann)

Posted on Friday, September 27, 2002 - 6:13 am:

> Protein content was measured on as is basis and
> was expressed in conventional form as % (e.g.
> 12% or 15% etc). We did not express Protein
> content in decimal form.

[I understand this to mean that for every sample you enter 12%, 13%, 14% or 15% for protein in the calibration and validation files, and that you never enter 12.4% even if you determined 12.4% by Kjeldahl.]

> Our SEP (on validation) = 0.48; MSEP =0.24

Pierre wrote in a parallel answer that an SEP of 0.3% would be his upper limit for protein in wheat (and mine as well).

With the above procedure (data without tenths of a percent) you added an "artificial SEP" of approx. 0.3% through rounding.

Assume the correct value is 12.3% but you enter only 12% in the validation file. The software output will be e.g. 12.4%. The correct difference is 0.1%, but because only full percentage points were entered, a difference of 0.4% is shown.

Since uncorrelated errors add as squares, your upper SEP limit will be sqrt(0.3^2 + 0.3^2) = 0.42. So you are close to your maximum SEP.
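
The root-sum-of-squares combination used here can be checked with a short sketch. The helper name is hypothetical; the two 0.3 values are the acceptable SEP and the approximate rounding contribution quoted in this reply.

```python
import math


def combine_uncorrelated(*errors):
    """Combine uncorrelated error contributions as the root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))


# 0.3 (acceptable SEP) combined with 0.3 (approximate rounding contribution)
combined = combine_uncorrelated(0.3, 0.3)  # about 0.42
```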

Of course you had better enter all data with the necessary precision. This will give "better results" with the least input. It is mandatory at least for the validation samples.

Yours

Peter

hlmark

Posted on Friday, September 27, 2002 - 8:00 am:

Suranjan - Peter has the right idea, but he oversimplified the calculation a little. Dropping a digit from the answer gives an error, as he said, but this error has a uniform distribution rather than the (usually assumed) Normal distribution. The standard deviation of a uniform distribution is equal to the range divided by sqrt(12). In this case the error range is 1 (one), so the error introduced is 1/sqrt(12) = 0.289.

We can estimate the error you would expect to get by subtracting the square of that error from the square of the total error, then taking the square root:

0.48^2 = 0.230
0.289^2 = 0.083

difference = 0.147

error after correction = sqrt(0.147) = 0.383

So we would expect that including the first decimal place will reduce your SEP from 0.48 to 0.38 (roughly - I carried more places in the calculations than I'm showing here)
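
The correction described in this reply can be reproduced as a short sketch: take the standard deviation of a uniform distribution of width 1 and remove it in quadrature from the total SEP. The function names are hypothetical, and the calculation assumes the rounding error is uncorrelated with the rest of the prediction error.

```python
import math


def uniform_sd(width):
    """Standard deviation of a uniform distribution spanning the given width."""
    return width / math.sqrt(12.0)


def remove_error_component(total, component):
    """Subtract one uncorrelated error component from a total, in quadrature."""
    return math.sqrt(total ** 2 - component ** 2)


rounding_sd = uniform_sd(1.0)                           # about 0.289
corrected = remove_error_component(0.48, rounding_sd)   # about 0.383
```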

Howard

\o/
/_\

suranjan panigrahi (Panispisani)

Posted on Friday, September 27, 2002 - 11:14 am:

Thanks for all your time and input.

There is one important thing that got miscommunicated in my first message. I did not round off any value. For example, if the predicted protein was 12.82%, we used it as 12.82. If the actual protein was 12.7%, we used it as 12.7. So there was no rounding at all.

But my question is: does my SEP of 0.48, determined using the equation from Richard Kramer's book (p. 170), along with an average accuracy of 97.25%, MSEP = 0.24 and RMSEP = 0.49, show the potential of my method?

suranjan

Jens Rademacher (Rademacher)

Posted on Monday, November 18, 2002 - 12:55 am:

Suranjan - could you tell us something about the sample set? Do you have 500 samples of each grain, or are all samples from different grains, with the calibration built on different grain types? My next question is: do all samples originate from the same season, or do your validation and calibration sets contain samples from different years?

jens