Precision problem


Xuxin Lai (Laixuxin)
Posted on Tuesday, June 22, 2004 - 6:29 am:   

Dear all,

When we make a good calibration model, we have to include as much variance as possible in order to improve the predictive ability for unknown samples, right? So the accuracy of the predicted results should be improved, but what about the precision of the replicates? Can it be improved as well?

Best regards,
Xuxin

Michel Coene (Michel)
Posted on Tuesday, June 22, 2004 - 6:46 am:   

You can start blending tomato sauce into your gasoline model just in case, but this will have detrimental effects on your ability to predict octane number. You should put in the variance you expect to meet while predicting, but certainly not more. You might even ponder whether accurately predicting the 1-in-1000 oddity is worth an SEP that is two points worse. As for the precision of replicates, what exactly do you mean? The repeatability of an unknown sample measured twice?

hlmark
Posted on Tuesday, June 22, 2004 - 8:15 am:   

Xuxin - there's a fairly extensive discussion of the tradeoffs between accuracy and precision (including the math) on pages 38-53 of "Principles and Practice of Spectroscopic Calibration" (2nd ed). The bottom line is that in order to accommodate and reduce the variance due to one source (by increasing that source's contribution to the total error variance), everything else being the same, the contributions due to the other sources of variance must increase.

Howard

\o/
/_\

David W. Hopkins (Dhopkins)
Posted on Tuesday, June 22, 2004 - 9:35 am:   

Xuxin,

One of the parameters that Howard discusses in his book for evaluating the rejection of random noise in (repeat) measurements is the square root of the sum of the squares of the calibration coefficients (RSSB, where B denotes the coefficients; he calls it the IRE, index of random error). It was used widely for MLR calibrations from filter instruments that used a limited number of wavelengths, but it applies equally to calibrations from PLSR or PCR. This parameter is unfortunately ignored by most "modern" packages concerned with PLS. But if you export the B vector to a spreadsheet, it is readily calculated, and it can help you choose among nearly equivalent models. One thing that will help the precision is to choose models employing a lower number of factors.
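
As an illustration of the calculation Dave describes (a minimal sketch only -- the export file name and one-column layout are assumptions, not the format of any particular package):

# Sketch: compute the IRE (RSSB) from an exported B vector.
# Assumes a hypothetical one-value-per-wavelength text file of calibration
# coefficients (intercept/constant term excluded).
import numpy as np

b = np.loadtxt("b_coefficients.csv", delimiter=",")  # hypothetical export file
ire = np.sqrt(np.sum(b ** 2))                        # RSSB = sqrt(sum of Bi^2)
print(f"IRE (RSSB) = {ire:.2f}")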

Best regards,
Dave

hlmark
Posted on Tuesday, June 22, 2004 - 11:23 am:   

Dave makes a good point. The computation of the IRE (or RSSB) is a relatively easy way to compare the noise sensitivity of different models, and it doesn't require specifically collecting precision data in order to make the comparison. However, the ability to export the values of the coefficients of the calibration model (as opposed to the values of the PCR or PLS loadings) will depend on the capabilities of the software you are using.

Howard

\o/
/_\

Xuxin Lai (Laixuxin)
Posted on Wednesday, June 23, 2004 - 9:14 am:   

Dear all,

Thanks a lot for your replies.
We don't have the book at the moment, but I will try to find it in another library here.
I think I need to make my project clearer. I am trying to make a calibration model for a suspension sample at different concentrations by NIR measurement. There is only one major component in the sample. But I use different cuvettes for the measurements, so I can also see the cuvette variance. My preliminary results (I use The Unscrambler for data analysis) show that the first PC (principal component) describes the concentration variance and the second PC describes the cuvette variance. My plan is to include as much cuvette variance as possible in the model; I hope the predictive ability for future unknown samples can be strengthened. For a good prediction result, we need not only better accuracy but also better precision (or reproducibility), right? My question about precision actually comes from an argument with a traditional statistician: he thinks the reproducibility of the replicates (several measurements of the same sample at the same concentration) will not change whether the calibration model is good or not. But I don't think so. Could you tell me your opinions?

What about the RMSEP (root mean square error of prediction)? Does it also relate to the precision? Could you tell me the difference between the IRE and the RMSEP?

Best regards,
Xuxin

hlmark
Posted on Wednesday, June 23, 2004 - 9:47 am:   

Xuxin - first: the IRE is used to estimate precision only, and has nothing to do with accuracy. The RMSEP is a measure of accuracy.

If the "cuvette variance" is due to pathlength differences between cuvettes, then it is theoretically correct to use one of the forms of "normalization" that have been developed to reduce or remove the effect of pathlength from the data, before performing the calibration (or prediction) calculations. If the situation is favorable, you may well be able to improve both precision and accuracy by not putting the burden on the calibration to accommodate the pathlength difference.

\o/
/_\

hlmark
Posted on Wednesday, June 23, 2004 - 9:56 am:   

As for the question of whether the precision depends on the accuracy, the answer is that it doesn't depend on it directly, but both of them depend on the model you use. A model with large coefficients will have poor precision, whether it is accurate or not. If you can find a model with small coefficients that is also accurate, then it will have better precision than the one with large coefficients.

In the reductio ad absurdum limit, if all the coefficients are zero then the precision will be perfect, but the model will have no predictability at all! So the coefficients must have some minimum value in order to be analytically useful, which means that there will inevitably be some amount of imprecision, too. But other things (especially accuracy) being equal, the model with smaller coefficients is preferable.
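
A small numerical illustration of this point (a sketch with made-up numbers, not data from this thread): pushing the same spectral noise through two models whose coefficients differ only in scale shows the prediction spread growing in proportion to the size of the coefficients.

# Sketch: identical white absorbance noise propagated through two models;
# the model with larger coefficients yields predictions with a larger SD.
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths, n_repeats = 100, 500

b_small = rng.normal(scale=0.05, size=n_wavelengths)   # hypothetical coefficients
b_large = 4.0 * b_small                                 # same shape, 4x larger

noise = rng.normal(scale=0.001, size=(n_repeats, n_wavelengths))  # absorbance noise

print((noise @ b_small).std())   # smaller spread
print((noise @ b_large).std())   # 4x the spread of the small-coefficient model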

\o/
/_\

Christopher D. Brown
Posted on Wednesday, June 23, 2004 - 10:30 am:   

Xuxin,

The 'traditional statistician' is almost certainly making some assumptions that will be invalid in your situation, so it sounds to me like you two need to get joint agreement on the properties of the data you're working with.

It sounds to me like you need to do a variance components analysis to figure out what effects are dominating your imprecision. Most decent stats books now cover variance components to some degree. The gold standard is a book by Searle and Casella ("Variance Components", Wiley), but your statistician can also guide you to the proper experimental design and analysis.

The discussions above regarding the length of the regression vector (or "IRE") only apply to a precision term originating from white noise (detector noise in IR, for example). If you're dealing with variable-pathlength cuvettes, I suspect that your imprecision is actually dominated not by white noise but by the cuvette variable. A good representation of that cuvette variance in your calibration data will certainly help both precision and accuracy, although I'd agree with Howard that pathlength is not an effect that is trivially accommodated by linear calibration models. Is there any way to estimate the pathlength independently? If you're working in the NIR, for instance, you can empirically estimate the pathlength from water bands, and use that to normalize your spectra to a standard pathlength.
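
A rough sketch of that idea (the band limits, variable names and the use of the band area are only illustrative assumptions; the appropriate water band depends on the spectral range and pathlength): estimate an effective pathlength for each spectrum from the integrated absorbance of a water band, then rescale every spectrum to a common value.

# Sketch: normalize spectra to a standard pathlength via a water band.
# 'wavelengths' (nm) and 'spectra' (rows = samples) are assumed inputs;
# the 1400-1500 nm region is only an example band.
import numpy as np

def normalize_to_water_band(wavelengths, spectra, band=(1400.0, 1500.0)):
    mask = (wavelengths >= band[0]) & (wavelengths <= band[1])
    # Integrated absorbance in the band, taken as proportional to pathlength
    band_area = np.trapz(spectra[:, mask], wavelengths[mask], axis=1)
    reference = np.median(band_area)  # adopt the median as the "standard" path
    return spectra * (reference / band_area)[:, None]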

RMSEP is a joint measure of accuracy and precision, although if your 'Predicted vs. Reference' plot has a slope very close to 1, then RMSEP is mostly reflecting precision.

~ C.

Xuxin Lai (Laixuxin)
Posted on Wednesday, June 23, 2004 - 11:08 am:   

Dear Howard,

Thanks for your answer. It makes things much clearer than before.

I tried to calculate the IRE for my PLS model. Now I have more questions:
There are two kinds of B coefficients, weighted coefficients and raw coefficients; which one should I use?
If the calibration model suggests that I should use 2 PCs, then which PC's coefficients should I use for calculating the IRE?
What do you think of the precision when the IRE = 0.128? Is that precision good enough for analysis?
I have two models. The IRE for Model 1 is 0.128; for Model 2 it is 0.115. Can I say the precision of Model 2 is better than that of Model 1?

Best regards,
Xuxin

Xuxin Lai (Laixuxin)
Posted on Wednesday, June 23, 2004 - 11:41 am:   

Dear Christopher,

Thanks for your reply.

For my data, the major variance comes from the cuvettes. Actually I have no idea where this cuvette variance comes from. Probably the pathlength is the main reason, as Howard and you said. But my case is quite special: my sample is a suspension, and it is actually the light-scattering effect that lets me distinguish the different concentrations. I tried 'maximum normalization' on my data before, but the result is worse than just using the raw data, and the cuvette variance is still there. But maybe I didn't do it in the correct way; I just push a button in The Unscrambler. Maybe there is a special rule for performing it.

Could you tell me how to 'estimate the pathlength from water bands, and use that to normalize your spectra to a standard pathlength'? I am not sure I understand it very well.

The slope in the 'Predicted vs. measured' plot is 0.990, so do you think the RMSEP mostly reflects precision?

Best regards,
Xuxin

hlmark
Posted on Wednesday, June 23, 2004 - 12:53 pm:   

Hmmm... complicateder and complicateder. Xuxin, when you said you were using cuvettes, I assumed that you were measuring transmission; that was what made me think to apply a normalization correction.

But now that you say your measurement is dependent on scattering, to me that implies that you are measuring by reflection. Can you tell us more about your measurement setup? If you are measuring by reflection and using the amount of reflected light to indicate the amount of analyte, then in that case normalization is exactly the wrong thing to do because it will remove the part of the signal containing the information about the analyte concentration.

Any other information you can give us will also be helpful. For example, do you know the nature of the differences between the cuvettes that gives you a different signal from different cuvettes? Is there anything else about your setup that you can tell us, especially if you think it might relate to the differences you are seeing?

As for computing the precision, the formula is:

SD(result) = sqrt (SD^2(A) * sum(Bi^2))

where:

SD^2(A) represents the variance of the absorbance readings

sum(Bi^2) is the square of the IRE, which is the sum of the squares of the calibration coefficients.

For one thing, this formula assumes that the variances (or standard deviations) of the absorbances due to noise at the different wavelengths are essentially all the same. It also means that you have to know what the noise level is. If you don't know the noise level, you can still estimate the relative precisions of different models. So in your example, you would expect more noise if you use the model with IRE = 0.128 than if you use the model with IRE = 0.115. But again, this assumes that the spectral noise is the same in both cases.
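
To make the arithmetic concrete (the absorbance noise level below is an assumed, illustrative value, not something measured in this thread), the formula reduces to SD(result) = SD(A) * IRE:

# Sketch: expected prediction SD from the formula SD(result) = SD(A) * IRE.
sd_absorbance = 0.001           # assumed noise SD of the absorbance readings (AU)
for ire in (0.128, 0.115):      # the two IRE values Xuxin reported
    sd_result = sd_absorbance * ire     # = sqrt(SD^2(A) * sum(Bi^2))
    print(f"IRE = {ire}: expected SD of predictions = {sd_result:.6f}")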

I think that when you ask which PC's coefficients you should use, it indicates that you are missing the point. I explicitly made the point in a previous message that you need to use the calibration coefficients for this computation, not the coefficients of any individual PC loading. The calibration coefficients are computed from the PC loadings when those are used to create the model. It may be that your software does not allow you to access those values, not even by plotting them on a graph.

\o/
/_\

Bruce H. Campbell (Campclan)
Posted on Wednesday, June 23, 2004 - 2:17 pm:   

Howard,
I believe you have omitted a term in the overall calculation of precision. There is also a precision term from the reference values that should be considered. That is, there are at least three terms to use to describe the precision: one is from the reference procedure, another from the inherent noise, and the third from the calibration usage. I have seen texts that lump the noise and calibration-usage sources into one, leaving two terms in the equation.
Bruce

hlmark
Posted on Wednesday, June 23, 2004 - 3:21 pm:   

Bruce - No, precision is a measure of the degree to which a reading can be repeated (or not, actually). The official definition of "precision" (from the FDA, for example) includes a statement that the sample must be homogeneous, specifically to remove any consideration of possible effects due to the sample. True, reference readings have problems of their own, but those are separate from an analysis of instrument readings. Indeed, if you want to characterize the performance of an instrument, you want to remove any effect of the reference lab that you can, since it tells you nothing about how the instrument behaves.

It can be shown algebraically that the total error of a calibration is equal to the sum of the individual contributions, as long as the contributions are uncorrelated. In fact, this is what my book spends a fair amount of space doing.

That being the case, you can say that the total error ("error" being defined as the variance due to differences between the instrument and reference readings) equals the sum of the contributions:

1) Reference lab error
2) Instrument Noise
3) Repack variation (in powdered solids)
4) Orientation variation (in powdered solids when one pack of a sample can be measured in different orientations of the sample cup)
5) Lack of fit of the model (especially interesting, discussed further below)

This list is not exhaustive, but suffices to make the point.

The key point about statistics is that it provides a way to tell you about the contributions of random error. Systematic errors can be characterized more easily and more meaningfully by directly saying what they are: noting a 1% bias, for example, is more meaningful than calculating the contribution of that to the total variance of an analysis.

So except for lack of fit, all the error contributors in the above list are random, i.e., they will be unpredictably different from one reading to the next.

Lack of fit, however, is a systematic error, and itself can contain different contributions:

1) Non-linearity in the relationship between instrument and reference lab.
2) Insufficient or incorrect selection of factors (or wavelengths, in MLR)

(This list is also not exhaustive)

The key difference here is that the error contribution from a systematic error will always be the same for the same condition: a sample-specific bias, if you will.

It turns out, however, that when you go through the math to separate the contributions, these systematic errors contribute to the variances exactly the same way the random errors do. The difficulty that arises is that the variance of random errors can be measured; it's just a matter of repeating the measurement several times, under conditions that cause a given source of random error to express itself.

A systematic error, on the other hand, can't be measured that way, because when you repeat the condition you always get the same value for that error - specifically because it's not random, despite the fact that it contributes to the total error variance. There is a trap here, in that sometimes people think that because it can't be measured, it doesn't exist. It does, but it can't be evaluated easily.

But going back to your comments: obviously you can combine the errors due to any set of error sources that concern you, and for the total error of an analytical procedure you do in fact need to include the error from the reference lab as well as the error from the instrument. But that is not to say that the reference-lab error is an instrumental error; instrumental errors are due solely to, and must be characterized as coming from, the instrument, if you want to make sense out of the different effects.

\o/
/_\

David W. Hopkins (Dhopkins)
Posted on Wednesday, June 23, 2004 - 3:49 pm:   

Xuxin,

Good discussions! Another thing: one has to remember that the IRE is just an estimate of the expected precision, as Howard has described. I think it is wrong to give 3 or more significant figures in the calculation; 2 suffice. When you do that, you sense that 0.13 and 0.12 are probably not statistically different, and so there is no compelling reason to select one model over the other on this basis. Factors of 1.5 or 2 or more would certainly influence your selection. The same argument goes for comparing SEP values and particularly SECV values. Although it seems like these values are solid, leaving out a single point can influence them substantially, so we should agree that we are talking about statistics with variability, and only give 2 significant figures. When the values are in the neighborhood of 0.12, this may seem like throwing away good information, but in general it is not.

By the way, the B vector is readily available from Unscrambler and many other chemometrics programs. If you look, for most models B and Bw are identical, so it does not matter which you choose. I believe that with spectral data, we should not be using weighting.

Best regards,
Dave

NIRguy
Posted on Wednesday, June 23, 2004 - 5:10 pm:   

Guys, one comment, hope it's not off target:
From the context, it seems the precision we are discussing here includes not only the instrument variation and the repacking variation, which are random, but also the variation from different sample cells. Since in this experiment the difference between cuvettes is a systematic variation that can't be neglected, I would think that including that cuvette variation in the model will improve the model performance - both accuracy and precision. And I think that, when setting up an analysis method, you can limit the cuvettes to certain ones to avoid the cuvette effect, unless the samples you are measuring come with their own cuvettes, such as a product inside an ampoule.

L

Christopher D. Brown
Posted on Wednesday, June 23, 2004 - 6:23 pm:   

Agree wholeheartedly NIRguy.

As I said above, I suspect that the white-noise precision term (from the length of the regression vector, or "IRE") contributes little to the imprecision that Xuxin is seeing. For that reason I doubt that the two norms he reported (0.128 and 0.115) have much relevance to the real problem.

If you can, run a small experiment to approximate the variance components of your error. Nested designs are the easiest to analyze without specialized statistical software. Here is a quick nesting protocol:

5*sample>3*cuvette>2*re-insert>2*integrate

which reads: measure 5 distinct samples, each in 3 different cuvettes, 2 insertions of each cuvette and let the instrument take two measurements for each insertion.

This assumes you can get your hands on 15 different cuvettes for a random effects analysis (what you want). Otherwise you need to model it as a fixed effect (chat with your statistician).
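
For what it's worth, here is a rough sketch of how such a balanced nested design could be analyzed by hand, via the classical expected-mean-squares route rather than the general machinery in Searle and Casella (the DataFrame columns "sample", "cuvette", "insert" and "y" are hypothetical names, and the code assumes the design was completed exactly as written above):

# Sketch: variance components for the balanced nested design
#   5*sample > 3*cuvette > 2*re-insert > 2*integration
# estimated from the classical nested-ANOVA mean squares.
import pandas as pd

def nested_variance_components(df, y="y"):
    grand = df[y].mean()

    # Cell means at each level of nesting, broadcast back onto every row
    m_samp = df.groupby("sample")[y].transform("mean")
    m_cuv = df.groupby(["sample", "cuvette"])[y].transform("mean")
    m_ins = df.groupby(["sample", "cuvette", "insert"])[y].transform("mean")

    # Sums of squares for each stratum
    ss_samp = ((m_samp - grand) ** 2).sum()
    ss_cuv = ((m_cuv - m_samp) ** 2).sum()
    ss_ins = ((m_ins - m_cuv) ** 2).sum()
    ss_err = ((df[y] - m_ins) ** 2).sum()

    # Numbers of units at each level (assumes a balanced design)
    n_samp = df["sample"].nunique()
    n_cuv = df.groupby("sample")["cuvette"].nunique().iloc[0]
    n_ins = df.groupby(["sample", "cuvette"])["insert"].nunique().iloc[0]
    n_rep = df.groupby(["sample", "cuvette", "insert"]).size().iloc[0]

    # Mean squares
    ms_samp = ss_samp / (n_samp - 1)
    ms_cuv = ss_cuv / (n_samp * (n_cuv - 1))
    ms_ins = ss_ins / (n_samp * n_cuv * (n_ins - 1))
    ms_err = ss_err / (n_samp * n_cuv * n_ins * (n_rep - 1))

    # Variance components from the expected mean squares (clipped at zero)
    return {
        "sample":    max((ms_samp - ms_cuv) / (n_rep * n_ins * n_cuv), 0.0),
        "cuvette":   max((ms_cuv - ms_ins) / (n_rep * n_ins), 0.0),
        "insertion": max((ms_ins - ms_err) / n_rep, 0.0),
        "residual":  ms_err,
    }

Whichever component dominates is where the remediation effort belongs.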

You need to understand the sources of imprecision in your measurement before you can remedy them.

~ C.

hlmark
Posted on Wednesday, June 23, 2004 - 6:37 pm:   

Statistics isn't necessarily the answer to all problems. Another way to eliminate the effect of different cuvettes is to not use different cuvettes. Xuxin, can you find a set of cuvettes that are matched, and then use only those? Or even just use one cuvette for all your measurements (but then you'd better not lose or break it!)

Don't know if NIRguy's repacking comment applies, since the samples are not solid powders. But Christopher's experiment will give you some information as to what the important contributors to the overall precision variations are, including the aliquot-to-aliquot variation, which would be the equivalent in this case. The magnitudes of the variability from the different sources will also depend somewhat on the nature of the model you use, but to a first approximation, any reasonably accurate model will be satisfactory for testing the precision at this point.

Howard

\o/
/_\

Bruce H. Campbell (Campclan)
Posted on Wednesday, June 23, 2004 - 7:42 pm:   

Re: Dave Hopkins comment on significant figures in expressing precision.

A statistician told me the following rule of thumb for limiting the number of significant figures in a precision measurement.
If there are 10 values used to calculate the precision, then one and only one significant figure should be used.
If one has 100 values, then two significant figures can be safely used.
Note that the dependence between the number of values and the number of significant figures is a squared relationship.
Bruce

hlmark
Posted on Wednesday, June 23, 2004 - 8:33 pm:   

I hadn't heard that rule of thumb before, but it sounds about right. I think the relationship might be logarithmic, though, rather than polynomial.

\o/
/_\

JMueller
Posted on Wednesday, June 23, 2004 - 11:28 pm:   

Re: "Statistics isn't necessarily the answer to all problems"

I do not know what hlmark means by this statement, but I don't think I have ever heard a statistician claim that statistics answers _any_ problem! It does, though, provide analytical routes to identifying the source of and quantifying the problem, which is what Christopher seems to be recommending.

I too would prefer to do a quick variance-components experiment rather than speculate about the problem.

Good luck!

Juergen

hlmark
Posted on Thursday, June 24, 2004 - 4:55 am:   

Just what I said immediately following the statement: sometimes, the appropriate solution to a problem is a "quick-fix" that avoids complicated experiments (however small) and their interpretation. It may not be as intellectually satisfying, and the alternate solution may have problems of its own, but sometimes it's appropriate to "think outside the box", as they say.

\o/
/_\

Xuxin Lai (Laixuxin)
Posted on Thursday, June 24, 2004 - 9:01 am:   

Dear all,

Thanks a lot for all the discussions.

First I think I need to make my project clearer. Yes, I do measure in transmission mode, but I think both reflection and transmission happen at the same time. When the concentration goes up, there is more reflection in my sample; when it goes down, the transmission is dominant. So I think it depends on the light scattering.
The cuvettes I use are made of quartz.
I don't have a reference analysis here; I just dilute the sample in a buffer myself. So there could also be variance coming from the sample preparation.

I have made an experiment to investigate the cuvette variance and the sampling variance. The design is:
1. Prepare 8 samples at the same concentration.
2. Use 8 cuvettes for the measurements.
3. Each sample is measured in 5 different cuvettes.
4. Two cuvettes are used to measure all 8 samples. Each of the other 6 cuvettes measures 4 randomly chosen samples.
5. Collect a new background before each sample measurement. (I think this should compensate for the instrument variation, right?)
So 40 spectra were produced in this experiment. I then did a PCA on these data. From the score plot, I can only see 2 groupings corresponding to 2 individual cuvettes, but I can't see any groupings corresponding to the individual samples. So I conclude that the major variance in my case is the cuvette variance, and the cuvette-sample variance as well. I also have several measurements that used the same cuvette for the same-concentration sample (but I washed the cuvette between measurements); I can see there is some variance there, and it also depends on the cuvette. For some cuvettes this variance is small, for other cuvettes it is big. Maybe this is why I can only see 2 groupings according to cuvette in my PCA model, as I mentioned above. So I guess there is some physical interaction between the cuvette and the sample.

Christopher, do you think this experiment is similar to your design? What further data analysis can I perform to investigate the random error?

One thing I have to mention: my sample is a suspension, and it will precipitate. So when I put the sample into the cuvette, the precipitation starts; it is slow, but the sample changes a little during the measurement. I think this is what makes the cuvette variance worse than in other applications (e.g. transparent solutions). And for the same reason, I could not make two measurements for each insertion of a cuvette.

Then I have several questions about your discussions:

Howard, I don't know how to calculate the 'SD^2(A)'. Do you mean to export the absorbance values at all the wavelengths in a spectrum and calculate the square of their standard deviation? I mean, for different concentrations the absorbance values of the spectrum are not the same; which concentration should I then use for the calculation? Well, I think I must have misunderstood the point - could you explain it some more? In The Unscrambler, I can export the B-coefficients for every PC, but I think your calibration coefficients mean the B-coefficients corresponding to the optimum number of PCs suggested by the model. Is that right? For Model 1, mentioned above, the suggested optimum number of PCs is 1, so the IRE is 0.13. For Model 2, the suggested number of PCs is 2; when I recalculate, the IRE is 0.35. The difference between Models 1 and 2 is: in Model 1 there are 21 spectra for 7 concentrations, each concentration measured in 3 different cuvettes; in Model 2 I added the 40 spectra for the single concentration (see above) to the model. To my understanding, Model 2 should have better predictive ability, but its expected precision is obviously worse than Model 1's. How can this happen? Now I am confused again. Maybe this is not a good example to discuss, because only one concentration was measured in 8 cuvettes.
Do you have any suggestions for correcting the cuvette variance?

NIRguy and Christopher, I plan to include the cuvette variance in the model. I don't know how much variance should be included in the model for a good prediction result; do you have any ideas about it? I also don't know if it's a good idea to limit the cuvettes - I mean, if some day the cuvettes are broken, then I would need to make a new model to fit the new cuvettes...

Thanks for your help!

Best regards,
Xuxin

hlmark
Posted on Thursday, June 24, 2004 - 10:03 am:   

Xuxin - your message covers a wide range of topics and I think several of us will try to help you with more than one of them.

First, if you're measuring in transmission mode, that's the important factor for us to understand what's going on - your measurement is of the LOSS of energy of the light passing through the cuvette. Indeed, it is likely that reflection and/or scattering causes part of this loss, but there is probably also considerable loss of energy due to absorption of the energy by the molecules in the sample. The point is that you are not MEASURING the light that is reflected/scattered.

The value of SD^2(A) is not simply the standard deviation of the spectra. A crude estimate of it could be calculated from the data from your experiment, although it would not carry many degrees of freedom.

Also, I think you still misunderstand the difference between the loading coefficients and the calibration coefficients. It's not easily explained in a text discussion like this, I'm afraid, since it involves understanding how the PCA calibration algorithm works.

Howard

\o/
/_\

David W. Hopkins (Dhopkins)
Posted on Thursday, June 24, 2004 - 11:20 am:   

Xuxin,

It is hard for me to understand why, with 8 cuvettes, your PCA scores plot would show only 2 groups. Do your quartz cuvettes have 2 plane windows with a fixed physical optical path? Or are you using a vertical light path, with cuvettes like beakers, where the path is determined by how much sample you measure? Is it possible that the 2 groups are due to two different measuring temperatures, say because you measured on 2 different days?

Regards,
Dave

NIRguy
Posted on Thursday, June 24, 2004 - 1:23 pm:   

Xuxin, it's good to prepare for the future in advance, but it's not realistic to make an ideal calibration that will be suitable for every kind of situation. For example, what if your lamp fails someday? What I mean is that you can use several sample cells to build the model in case you break one or two, but there is no need to put too much effort into including as many sample cells as possible. A possible alternative is a signal preprocessing method that can remove irrelevant drift or information. There are many chemometrics methods that focus on that -- I think this leads to the issue of robust calibration and, in some extreme cases, calibration transfer. Actually, Howard mentioned normalization and Chris suggested a pathlength-correction method; there are other candidate methods such as derivatives to remove baseline shift, MSC to correct the scatter, and more advanced chemometric methods. They may not all work for your case, but they are worth considering.
Just my two cents, and I agree with Dave -- maybe there are other effects, like temperature variation, that you need to think about besides the cuvettes.
Good luck.
LIU Yang

Xuxin Lai (Laixuxin)
Posted on Friday, June 25, 2004 - 6:53 am:   

Dear all,

Thanks for your replies.

Howard, oh, you are right, I should consider the loss of energy. The loss of energy should correspond to the concentration of the samples, right? But within the same concentration there is still variation. Would it be possible to explain it like this: as I told you above, the sample precipitates very slowly during the measurement, so maybe some of the light is reflected between the particles within the sample. For that light, the pathlength is not just the cuvette pathlength; it is longer, and it also varies from ray to ray, which finally leads to the variation within the same concentration. What do you think of this explanation?
More energy will be lost when the light travels a longer path within the sample, right?
For the B-coefficients, I exported them from a PLS model. Maybe that is wrong.

Dave, you said it's possible to export the B-coefficients from The Unscrambler. Could you tell me more about how to do it? What I do is: I have a PLS model; in The Unscrambler I go to File/Import/Unscrambler Results, choose the PLS model, and then import B from the list. So I get a B value for all the wavelengths, copy this datasheet to Excel and calculate the IRE. Am I right?
For the PCA score plot, maybe I didn't explain it clearly. The data are just like a cloud; some of the data can be grouped by their own cuvette, and the rest are just everywhere. So I think there is still variance within the same cuvette.
Yes, my quartz cuvettes have 2 plane windows with a fixed physical optical path. I am using a horizontal light path: the light passes through the sample and then goes to the detector.
I measured those 40 spectra in one day. Actually, I measured the ambient temperature around the cuvette during the measurements. From the beginning to the end, the temperature increased by 1.5 C. I don't know if this would affect the result, but I measured the samples in random order, so those two groups are not because of the temperature.

Liu Yang, yes, I totally agree that it would be much better if I could find a signal preprocessing that corrects the cuvette variance. Actually, I have tried normalization, MSC and second derivatives; all of them are worse than the raw data. Especially with MSC and second derivatives, I am no longer able to distinguish the different concentrations - I think they remove the main information in the samples. This is also the reason I thought at the beginning that I was measuring the light scattering. I really hope one of you can suggest a preprocessing that would help in my case.

best regards,
Xuxin

David W. Hopkins (Dhopkins)
Posted on Friday, June 25, 2004 - 8:55 am:   

Xuxin,

Yes, that is the best way to export the B-vector in full precision from Unscrambler. It is also possible to export directly from the view of the graph of the B-vector. Select View/Numerical and copy/paste from the table there. You lose a little precision from the lower B-values, but that may not be a problem.

Thanks for explaining the segregation you see in the PCA plots. I think you are saying that there is only one cluster with the 1st 2 components, but some samples seem to be localized in that group, while others seem to be distributed rather uniformly. It is possible that you are seeing the segregation due to the analyte signal, which you would expect to see, if you can make a PLS regression with 1-2 factors.

You can make a category variable, name it 3levels or whatever, that divides the range of your analytical value into 3 ranges. Just copy the analysis column to another column, and convert the copy to a category variable with 3 categories based on an equal division of the range. Then redo the PCA with these categories active in the plots, and you can get a visual impression of whether the segregation you suspect is associated with the analyte category. This allows you to evaluate all the samples better.
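
The same check can be sketched outside The Unscrambler in a few lines (everything here is illustrative: the synthetic 'conc' and 'spectra' arrays stand in for the real analytical values and spectra):

# Sketch: color PCA scores by a 3-level category cut from the analytical value.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
conc = rng.uniform(1.0, 10.0, size=60)                                   # placeholder analyte values
spectra = np.outer(conc, np.ones(200)) + rng.normal(0, 0.2, (60, 200))   # placeholder spectra

levels = pd.cut(pd.Series(conc), bins=3, labels=["low", "mid", "high"])
scores = PCA(n_components=2).fit_transform(spectra)

for name in levels.cat.categories:
    idx = np.where(levels == name)[0]
    plt.scatter(scores[idx, 0], scores[idx, 1], label=str(name))
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend(); plt.show()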

Regards,
Dave

Xuxin Lai (Laixuxin)
Posted on Friday, June 25, 2004 - 9:44 am:   

Dear Dave,

Thanks a lot for your reply.

I have redone the PCA as you suggested. You are right: in the score plot, I can see 3 clear groups according to the 3 levels, along the 1st PC direction. I have tried 5 levels too, and then I get 5 groups. So it means the variation comes from the samples, right? But it's very difficult to keep the sample homogeneous for each measurement. Do you have any idea of a preprocessing suitable for this case?

About the B-vector: so you mean I have done it properly? Can you explain why the IRE goes up when I include more cuvettes in Model 2 compared to Model 1?

Best regards,
Xuxin

David W. Hopkins (Dhopkins)
Posted on Friday, June 25, 2004 - 10:44 am:   

Xuxin,

Great! Yes, the variation is coming from the analyte in the samples. You have told us that the sample is essentially only your analyte, and that the analyte is rather insoluble and forms a precipitate. Therefore, your method should only require a few factors to describe the variation, and perhaps it makes sense that MSC and second derivatives remove the signal, as the scatter amplitude is the signal. I think it is all coming together, and the method will depend on obtaining uniform particle sizes from batch to batch. The magnitude of the B-vector always increases as you include more factors in your model, to include more sources of variation. Therefore, the IRE goes up with more factors.

It is not clear to me that there should be any cuvette effect. You need to do the ANOVA as suggested earlier to evaluate the contributions from the sources you can isolate. Qualitatively, you can use the category variables on these sources of variability to look at their contributions to the PCA.

I think we have all learned a lot from your project. If you are going to the IDRC meeting at Chambersburg, I think you should present a poster.

Best regards,
Dave

Kathryn Lee
Posted on Saturday, June 26, 2004 - 6:34 am:   

Dear Xuxin,

You wrote that you were trying to make a calibration model for a suspension sample at different concentrations by NIR measurement, and that there was only one major component in the sample. I think it is important to realize that there are at least two components: what is suspended, and what it is suspended in. When doing quantitative analysis, it is easy to overlook extra components. In your case this may not be important, but changes in that second component could be causing some problems. You might also want to consider possible impurities in those two components.
You mentioned that you don't have a reference analysis but dilute the sample in a buffer yourself. You are the best judge of your lab technique, but it is important to know the extent of the error from making the samples - the error in your reference technique. This can be analyzed and estimated from the precision of your glassware, and by checking whether some of that glassware is not properly calibrated or whether there are errors in calculations or in transferring or weighing samples. For example, I once tried a simple calibration with weighing as my reference method, but did not consider that the sample was hygroscopic and gained or lost water weight from day to day. This is an example of not considering all components (moisture). I was confident that I could use a balance correctly, so I knew it had to be something else. Since your sample is a suspension that settles, I suspect that most of your variation is coming from that.
The variation from the cuvettes should be minimal, unless they were not designed to have the same pathlength. Instead of using your actual sample for testing the cuvettes, you could use a clear liquid at constant temperature. This would be a quick way to test them, and would give you a good idea of the amount of error from the cuvettes.
I suspect that the greatest cause of error is that your suspension is settling, possibly at different rates due to different particle sizes. Consistent sample presentation would be very difficult, and even more difficult for multiple users. Perhaps someone has experience with this, but using a fiber-optic probe in the suspension, mechanically stirred in a metal beaker, might be a better way to present the sample to the instrument. A device ensuring that the depth of penetration into the sample and the stirring rate are consistent may be needed.
Best Regards,
Kathryn Lee

David W. Hopkins (Dhopkins)
Posted on Monday, June 28, 2004 - 1:22 am:   

Xuxin,

I agree with Kathryn. And she got me thinking: I don't understand why you cannot get reasonable calibrations with 2nd derivatives.

What pathlength cuvette are you using? Even with a 1 mm path, you should not include the 1930 nm water-band region in your regressions, because the absorption is too strong and the data are noisy and limited by stray light. Nor should you use the high-wavelength regions, say above 2300 nm, for the same reason.

Have you tried the EMSC technique of Harald Martens on your data?

Best regards,
Dave

Xuxin Lai (Laixuxin)
Posted on Monday, June 28, 2004 - 7:52 am:   

Dear all,

Many thanks for your replies.

Kathryn, after all these discussions, I also agree that most of the variance should come from the samples. But I still think there is some cuvette variance; otherwise I could not explain the phenomenon that the same sample measured several times in some of the cuvettes is very different from when it is measured in the other cuvettes. But I will try some of the suggestions above to figure out what causes this variation. My sample is a finished product from a company, so I don't need to prepare it from the beginning; what I do is just a simple dilution. Within the same batch, the variation from sample preparation should be small, and I have built up a procedure to minimize this variation. (Of course, until now I am the only user; I don't know what will happen when other people follow that procedure.) But it will vary from batch to batch, which could be due to the impurities and the particle-size differences, as Dave and you suggested. I don't have a fiber-optic probe at the moment, but I think it's worth trying if the transmission mode does not work in the end. Thanks for your advice.

Dave, since the IRE goes up with more factors, I can't simply use the IRE to compare the precisions of those two models, right?
The pathlength of my cuvette is 1 cm, so I can only use the wavelengths from 700-1300 nm for my regression model.
Yes, I also tried to use EMSC, but it didn't help in this case either.

Best regards,
Xuxin

hlmark
Posted on Monday, June 28, 2004 - 8:37 am:   

Xuxin - you can still use IRE to compare (relative) precision between two models. If the model with more PC factors has a higher IRE, then you can expect that it will have worse precision, since the standard deviation of the readings will be higher. This is in accord with both the concept of IRE, and the fact that we would expect a model with more factors to be more sensitive to noise.

But first you should do the experiment that Kathryn recommended: measure water many times in a single cuvette, and compute the standard deviation of the readings predicted by the model. You must be careful not to allow any temperature changes of the water sample, or anything else that would cause systematic changes, to occur. (If you try both models, you can also verify that the one with the higher IRE gives a larger SD due to the noise.)

Then do more measurements of water the same way, but this time in as many different cuvettes as you can, and do the same calculations. This will tell you if the difference between cuvettes is contributing to the total error.

By the time you're done, you'll have some estimates of the contributions of the noise level of the instrument and the additional variation due to the cuvettes, and can compare them to the variations you see with the sample. If the first two are much smaller than what you get with sample, then you can attribute most of the error to the sample variations.
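
A sketch of the bookkeeping for that comparison (the DataFrame layout is an assumption: one row per scan of water, with a "cuvette" identifier and the model's "predicted" value):

# Sketch: compare within-cuvette repeatability with the spread across cuvettes.
import pandas as pd

def cuvette_summary(water: pd.DataFrame) -> None:
    within = water.groupby("cuvette")["predicted"].std()          # noise, per cuvette
    between = water.groupby("cuvette")["predicted"].mean().std()  # cuvette-to-cuvette spread
    print("Within-cuvette SDs (instrument noise):")
    print(within.round(4))
    print(f"SD of the cuvette means (cuvette effect plus noise): {between:.4f}")

If both of these are much smaller than the spread you see with the real sample, the sample variation dominates, as Howard says.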

Howard

\o/
/_\
