Chemometrics software for NIR spectroscopy
NIR Discussion Forum » Bruce Campbell's List » Chemometrics

Julie Williams (jwilliamscc)
New member
Username: jwilliamscc

Post Number: 1
Registered: 9-2008
Posted on Tuesday, October 28, 2008 - 12:22 pm:   

Hi Charlotte - You may want to consider speaking to one of our engineers at TopNIR (www.Topnir.com); if nothing else, they can help to provide you with some information and direction. TopNIR was developed in a refinery over many years and has concentrated on this type of technology.

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 44
Registered: 9-2003
Posted on Thursday, April 03, 2008 - 2:49 am:   

Richard,

A separate note with respect to numerical stability of PLS. When I was a post-doc with Bruce Kowalski (1994-96), I derived some intricate formulas that required input from PLS. For those calculations, rounding errors affected the second decimal for a small data set. [I solved that problem using some sort of preconditioning.] The work of Aloko Phatak (unpublished thesis) also reports numerical problems with PLS. There is a lot of work being done in the field of numerical analysis but it has been overlooked in chemometrics. Conversely, the numerical people seem to be unaware of PLS.

The fact that you have never encountered numerical problems agrees perfectly with those observations. You just haven't looked closely enough! If you want to substantiate your claim that all these algos are numerically stable, just provide a mathematical proof. Looking at data sets without being very critical is not a proof. You just limit yourself to basic quantities like scores, loadings, etc. for all time.

Klaas

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 43
Registered: 9-2003
Posted on Thursday, April 03, 2008 - 2:02 am:   

Richard,

I looked at your chemometrics toolbox and found that you have a "two-way F-test for reduced eigenvalues". Don't you know that that's not an F-test? Not even remotely? Those sorts of "issues" cast some doubt on the objective "This allows you to concentrate on the chemistry while maintaining confidence in the math." Do you mean "confidence" in a statistical sense?

If statisticians complain about, for example, using the terms "standard error" and "mean squared error" interchangeably, then they are quite right I would say. These are very different things. Calculating an overall "bias" for a set of objects, instead of calculating THE bias for individual objects, is symptomatic of current black-box modeling. Why refer to PLS (say) as a biased regression method and calculate a "bias" which is not related to the bias associated with the regression method?
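To make the terminology point concrete, here is a minimal sketch of why a standard-error-type quantity and a root-mean-squared-error-type quantity are not interchangeable. The definitions used (set-level bias, SEP with an n - 1 denominator, RMSEP with an n denominator) are common chemometrics conventions assumed for illustration, not taken from any particular package, and the numbers are made up. Note that the "bias" here is exactly the overall set-level quantity criticized above, not the bias of the regression method for an individual object.

```python
import math

def prediction_statistics(y_ref, y_pred):
    """Return (bias, SEP, RMSEP) for paired reference and predicted values."""
    n = len(y_ref)
    errors = [p - r for p, r in zip(y_pred, y_ref)]   # signed prediction errors
    bias = sum(errors) / n                             # mean error over the whole set
    # SEP: standard deviation of the errors around the bias (n - 1 denominator)
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    # RMSEP: root of the mean squared error, which still contains the bias
    rmsep = math.sqrt(sum(e ** 2 for e in errors) / n)
    return bias, sep, rmsep

# Made-up numbers, for illustration only
y_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [10.6, 12.3, 14.7, 16.4, 18.5]

bias, sep, rmsep = prediction_statistics(y_ref, y_pred)
n = len(y_ref)
print(f"bias={bias:.3f}  SEP={sep:.3f}  RMSEP={rmsep:.3f}")
# With these definitions: RMSEP^2 = bias^2 + (n - 1)/n * SEP^2
print(math.isclose(rmsep ** 2, bias ** 2 + (n - 1) / n * sep ** 2))
```

With a nonzero bias the two numbers can differ a lot, so reporting one under the name of the other changes what the reader thinks was measured.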

If we want chemometrics to be recognized as a science, we should do our best to harmonize with fields like statistics.

Klaas

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 42
Registered: 9-2003
Posted on Thursday, April 03, 2008 - 1:36 am:   

Richard,

I also give courses to newcomers and most of them leave as "happy users" of software, so why should that make me feel comfortable? They just go out to build black boxes with trial-and-error validation. Sometimes it works, sometimes it doesn't - different property, same spectra.

The chemometrics software that I know is about 20 years behind when it comes to properly handling noise in the input data and properly estimating the uncertainty in the output results. Some packages have a jack-knife to estimate uncertainties in regression coefficients. However, I have never seen a decent proof that that's a good approach under any reasonable model assumption.
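For readers who have not met the jack-knife, the following is a minimal sketch of the classical delete-one version applied to a simple regression slope. It is an illustration under assumptions only: the data are made up, the estimator is ordinary least squares rather than PLS, and it does not reproduce the (modified) jack-knife of any commercial package. Whether the resulting number is a valid uncertainty estimate under a given model is precisely the open question raised above.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def jackknife_se(x, y, estimator=ols_slope):
    """Delete-one jack-knife standard error of estimator(x, y)."""
    n = len(x)
    # Re-estimate the quantity with each object left out in turn
    loo = [estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]   # made-up data
print(ols_slope(x, y), jackknife_se(x, y))
```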

A couple years ago, I notified you about the incorrect uncertainty estimation that you have in your software for rank annihilation factor analysis. Did you correct that?

Klaas

Richard Kramer (kramer)
Junior Member
Username: kramer

Post Number: 9
Registered: 1-2001
Posted on Wednesday, April 02, 2008 - 5:07 pm:   

Klaas, Mike, et al.

I've returned from a stretch of traveling and now have an opportunity to jump into this thread.

Regarding the issue of what Klaas has called the inconsistency in PLS routines: "In short, this inconsistency may have serious consequences for outlier detection. As far as I know, Pirouette (Infometrix) appears to be the only package where this inconsistency has been fixed."

I think it is misleading to call this an inconsistency rather than simply an issue. I'd also like to note here that the Chemometrics Toolbox (Applied Chemometrics, Inc.) has addressed this issue from its first release in 1987 through the present day. Also, a recent comparison of the calculation results provided by Pirouette and the Chemometrics Toolbox confirmed that their respective calculation results are identical within 10e-06 to 10e-07. (In the interests of disclosure, for those who may not know, I am the main author of the Chemometrics Toolbox.)

The issue (not the inconsistency) is that various PLS algorithms can deliver factors which either are or are not orthonormal. Some algorithms, such as the one described in -Multivariate Calibration- by Martens and Naes, deliver both sets of factors. The non-orthonormal factors are usually called, simply, the loadings. The orthonormal factors are called, by some, the loading weights or simply, the weights.
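The two sets of factors can be seen side by side in a minimal sketch of textbook orthogonal-scores PLS1 (NIPALS). This is an illustration on synthetic data, assuming NumPy; the function name `nipals_pls1` is hypothetical and the code is not that of the Chemometrics Toolbox, Pirouette, or any other package.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """Textbook orthogonal-scores PLS1 (NIPALS) on mean-centered X and y.

    Returns loading weights W, loadings P and scores T, one column per factor.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, T = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # loading weight, scaled to unit length
        t = X @ w                       # score vector
        p = X.T @ t / (t @ t)           # loading: regression of X on t
        q = (y @ t) / (t @ t)           # y-loading
        X = X - np.outer(t, p)          # deflate X
        y = y - q * t                   # deflate y
        W.append(w); P.append(p); T.append(t)
    return np.column_stack(W), np.column_stack(P), np.column_stack(T)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))            # synthetic stand-in for spectra
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=30)

W, P, T = nipals_pls1(X, y, n_components=3)
print(np.round(W.T @ W, 6))   # identity up to rounding: weights are orthonormal
print(np.round(P.T @ P, 6))   # generally not the identity: loadings are not
```

In this algorithm the scores are mutually orthogonal and the weights are orthonormal, while the loadings are neither unit length nor mutually orthogonal.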

Since PLS is applied empirically, there is no universally applicable rule which determines whether it is more appropriate to use the loadings or the loading weights for outlier detection for a particular application. Also, while it might seem like common sense that it would be best to use the same factors for outlier detection which were used for the calibration, that is not always the case. Indeed, when outlier detection is being performed for the purpose of qualifying an unknown sample as eligible for prediction by a calibration, the origin of the calibration, including which factors were used to develop it, is truly, in the most rigorous sense, irrelevant. For validation, the important question is whether or not the candidate measurement of the unknown is statistically significantly different from the measurements of the samples which were used to VALIDATE the calibration. In some cases, this qualification test may be more reliable when using the loadings, in other cases this qualification test may be more reliable when using the loading weights. In many, perhaps most, cases, this qualification test is equally reliable using either set of factors. I believe that it is important that chemometrics software provide the user both the PLS loadings and the PLS loading weights. The Chemometrics Toolbox has always provided these. I applaud Infometrix for supporting this in their Pirouette package.

Mike wondered: "My comments on the reviewer's statements caused me a mild concern as to whether this book, as a first-time exposure to chemometrics for a "newbie" might be daunting if the math got in the way. It would be interesting to see Richard's reaction."

My book is an attempt to bring neophytes and beginners to the point where they can successfully develop chemometric calibrations using off the shelf software while operating within a demanding, industrial context. I try to do it without using math in the explanations. (There are some basic equations in the book, but they can be ignored by those who wish to do so). In other words, I think it is possible to keep the math out of the way. I've been accused by some of perpetrating a great heresy on the community by teaching the topic improperly. If that's true, I must be an ineffective instructor judging by the large number of people who have survived my classes and/or my book and who have, despite my misguidance, gone on to successful careers in multivariate calibration.

Klaas said: "As far as I know, there is currently not a single PLS algorithm (in chemometrics packages) that is numerically stable. While many efforts went into developing fast algorithms, this aspect has been largely ignored.... The suggested improvement leads to a calculation that is as stable as Matlab's svd."

I think this comment is very unfair. Certainly, the Chemometrics Toolbox provides numerically stable results. In fact, the Chemometrics Toolbox relies on the very Matlab svd algorithm which Klaas suggests as the standard for numerical stability. Pirouette, which produces results identical to the Chemometrics Toolbox, is also clearly numerically stable. I've not (yet) conducted direct numerical comparisons with any of the other popular packages, but we frequently develop calibrations for clients using these other packages. We have not had the occasion to question the numerical stability of the Unscrambler. We also have some experience with Thermo's PL:S-IQ and less with Umetrics' Simca. In no case have we noticed any numerical problems with any of these packages. There are issues among the various packages involving supported features, algorithmic choices, reporting of results, visualization, etc., but in my experience, they all seem to perform essentially as advertised.

As to the matter of the use, misuse, calculation, and miscalculation of statistics, I'll defer to another time. I will, however, offer one final thought: When working in an industrial context, if the discrepancies among the various ways statistics are employed have significant financial implications, this is probably an application which should be abandoned. If the margin between success and failure is so small that the issue of how to split the statistical hair becomes important, the chances of successful long-term deployment of such a marginal calibration are probably below the acceptable range. At such times it is often best to cut your losses early and move on to more promising, more practical applications.

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 188
Registered: 9-2001
Posted on Wednesday, April 02, 2008 - 8:50 am:   

Mike - I did forward your message to Richard, along with the link it contained. I think Richard is travelling heavily right now, though, so he'll have to respond whenever he can.

\o/
/_\

Michael C Mound (mike)
Senior Member
Username: mike

Post Number: 52
Registered: 7-2007
Posted on Wednesday, April 02, 2008 - 5:29 am:   

Howard,

I couldn't agree with you more. Considering your prolific article and book output, it doesn't surprise me that you have experienced differences of opinions on your writings, as we all have. I have personally contributed to numerous books, trade journals, treatises, etc., in the fields of automation, process analytics, collaborative production management, automated laboratories, analytical instrumentation, including XRF, XRD, neutron analyzers, etc., petroleum exploration, geological and mining, and paleontology. It would be less than truthful to say that there were no adverse reviews that I had to endure (though, fortunately, these were in the minority), and they still sting.

However, I also have felt that it is much easier to criticize a work than it is to create it (remember the fable of the Little Red Hen?). As you put it, if the critic is so sure that his way is better, he should please contribute to the store of knowledge and write his thoughts up for exposure to the bright light of scrutiny.

We can't all agree on everything, nor should we expect to. That's the spirit and the juice of a forum for me.

My comments on the reviewer's statements caused me a mild concern as to whether this book, as a first-time exposure to chemometrics for a "newbie" might be daunting if the math got in the way. It would be interesting to see Richard's reaction.

I also liked Klaas' comments.

Best,

Mike

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 41
Registered: 9-2003
Posted on Tuesday, April 01, 2008 - 2:43 am:   

Hello Chuck,

What I meant was that chemometrics packages should agree about the basic calculations. Now, unlike statistics packages, they often conflict. There is a good excuse, however, for missing the inconsistency in PLS calculations. Seldom does one see a surprise at that fundamental level. Usually, progress takes place on a technical level. For an example, see:

N.M. Faber and J. Ferré, On the numerical stability of two widely used PLS algorithms, Journal of Chemometrics, 22 (2008) 101-105

As far as I know, there is currently not a single PLS algorithm (in chemometrics packages) that is numerically stable. While many efforts went into developing fast algorithms, this aspect has been largely ignored.

Is the improvement "significant"? I would say that that's not a scientific question. [By the way, for certain data sets we looked at, SIMPLS would not work in single precision arithmetic. Therefore, some time ago the improvement would have been substantial for that particular algo.] It is highly desirable, also from a practical viewpoint, that various packages "simply" produce the same results for basic calculations. The suggested improvement leads to a calculation that is as stable as Matlab's svd. Surely, we expect that one to be numerically stable. Personally, I always try to go for the best solution, unless the increase of effort makes it counterproductive.
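One simple way to probe the effect of working precision is to check how well a quantity that should be exactly orthonormal stays orthonormal. The sketch below does this for the loading weights of textbook NIPALS PLS1 in single and double precision. It is an assumption-laden illustration only: it uses NIPALS rather than SIMPLS, synthetic collinear data, and hypothetical helper names, and it is not the analysis from the paper cited above.

```python
import numpy as np

def pls1_weights(X, y, n_components, dtype):
    """Loading weights from textbook NIPALS PLS1, computed in the given dtype."""
    X = (X - X.mean(axis=0)).astype(dtype)
    y = (y - y.mean()).astype(dtype)
    W = np.empty((X.shape[1], n_components), dtype=dtype)
    for a in range(n_components):
        w = X.T @ y                      # y needs no deflation for PLS1 weights
        w = w / np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        X = X - np.outer(t, p)           # deflation step, where rounding errors accumulate
        W[:, a] = w
    return W

def orthogonality_error(W):
    """Largest absolute deviation of W'W from the identity matrix."""
    W = W.astype(np.float64)
    return np.abs(W.T @ W - np.eye(W.shape[1])).max()

rng = np.random.default_rng(1)
# Synthetic, strongly collinear "spectra": 4 underlying components plus a little noise
X = rng.normal(size=(40, 4)) @ rng.normal(size=(4, 100)) + 0.01 * rng.normal(size=(40, 100))
y = X[:, :5].sum(axis=1) + 0.01 * rng.normal(size=40)

for dtype in (np.float32, np.float64):
    err = orthogonality_error(pls1_weights(X, y, 3, dtype))
    print(dtype.__name__, err)
```

A check like this says nothing about stability in the formal sense; it only shows how a user could look more closely than scores and loadings plots allow.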

Regards,

Klaas

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 40
Registered: 9-2003
Posted on Tuesday, April 01, 2008 - 2:01 am:   

Dongsheng,

I will just focus on the statement "Y-deviation attempts to give some uncertainty indications." This is not how it was originally described in:

M. Hoy, K. Steen, H. Martens, Review of partial least squares regression prediction error in Unscrambler, Chemom. Intell. Lab. Syst. 44 (1998) 123-133

In the conclusion section, one reads: "The present results indicate that the uncertainty estimator works as intended. It gives a representative, slightly conservative estimate of the uncertainty standard deviation of the estimated y-value ŷ_ij."

It was intended to provide intervals (see coverages in Table 3) and, consequently, the results of the calculation are presented as intervals by the Unscrambler. Only later did its function evolve to what it is now: an outlier detection tool. However, since it results from an incorrect derivation, it is not clear at all how a user should decide (between normal and abnormal). The "properties" are unknown so what could the critical value be?

Moreover, have a look at a statement in the following paper:

S. De Vries, C.J. Ter Braak, Prediction error in partial least squares regression: a critique on the deviation used in the Unscrambler, Chemom. Intell. Lab. Syst. 30 (1995) 239-245

"Alternative, mathematically rigorous estimators for the MSEP in PLS regression have already been suggested."

That message (from 1995!) is difficult to misunderstand.

Regards,

Klaas

David Russell (russell)
Senior Member
Username: russell

Post Number: 38
Registered: 2-2001
Posted on Monday, March 31, 2008 - 4:20 pm:   

Thanks to Chuck and Scott for their posts.

Chuck saved me the trouble of re-reading the Pell reference.

The bottom line in the process world is that results matter.

So if the predictions are calculated consistently, the correctness of the statistics is less of an issue.

But it would, of course, be nice if all of the major packages would generate equivalent performance statistics.

But I agree with Howard's initial suggestion that Charlotte should buy/borrow one of the recommended books to get a feel for the task. Then she can decide whether to charge on or seek experienced assistance.

Dongsheng Bu (dbu)
Advanced Member
Username: dbu

Post Number: 23
Registered: 6-2006
Posted on Monday, March 31, 2008 - 4:00 pm:   

Klaas,

Wow, this is an amazing discussion. I got your opinions in the middle of a long thread: 1) "y-deviation is incorrect" and your advice is "to replace it"; 2) "Jack-knife cannot be used for inference about the scores". Thanks for your advice and links. I have enjoyed your posts and informative links. I need to read them first, as usual, and see if I misunderstood something.

Interested people may find the related algorithm or references on p31 and p41 of
http://www.camo.com/downloads/U9.6%20pdf%20manual/The%20Unscrambler%20Method%20References.pdf
Or, find related usage guidance in p210 and p327 if you have our Introducer book.

I was surprised by your judgment in your first post regarding wrong PLS calculation. We know the jack-knife won't change the regression model (inference scores?) and y-deviation won't change the prediction results. I understand that some people like the jack-knife for building a better model, and some may not. Y-deviation attempts to give some uncertainty indications. Users can either rely on the available X-residual and Hotelling T2 in a statistical way or, as a plus, set up an approximate threshold for y-deviation in a practical way. Maybe there is a more reliable approximation to replace y-deviation for prediction uncertainty indication.

Thanks again,

Dongsheng

Scott Ramos (lsramos)
New member
Username: lsramos

Post Number: 4
Registered: 1-2007
Posted on Monday, March 31, 2008 - 1:35 pm:   

Chuck,

Your description of the premise of the paper is very nicely stated. Key is that predictions will not differ, but outlier diagnostics based on spectral residuals will. The magnitude of this difference will very much be data specific, and you are correct to point out that this difference should be contrasted with the magnitude of error deriving from other sources.

In the paper, we show that with the bidiagonalization algorithm, regression and data reconstruction derive from the same subspace. In the Martens and Naes book, Multivariate Calibration, the non-orthogonalized PLSR is described (p. 120)--it also derives from a single subspace and thus has the same benefits.

At this point, Charlotte is probably overwhelmed as the discussion has evolved substantially from her original questions. If you are still reading this thread, Charlotte, you should probably start at the beginning for practical suggestions on how to proceed with your study.

Scott

Charles E. Miller (millerce)
New member
Username: millerce

Post Number: 4
Registered: 10-2006
Posted on Monday, March 31, 2008 - 12:30 pm:   

Hello Klaas:

I wanted to make a few comments about your recent post regarding the very nice work of Pell et al - a technical one and a philosophical one.

First of all, if I understand correctly, the authors of the paper (Hello Scott!) dutifully point out that an internal inconsistency exists for several commonly-used PLS algorithms (NIPALS- I think- and SIMPLS): whereby the regression vector and (X-)data reconstruction model are based on slightly different subspaces. If this is an accurate assessment, then it does not necessarily mean that the "offending" algorithms systematically generate poorer predictive models than the non-offending ones- just that a direct "correspondence" of the reconstruction model residuals to those predictions cannot be assumed for these algorithms. Furthermore, I think that this discrepancy would affect outlier detection for both calibration and prediction in the same manner, as it is an inherent property of the algorithm. I think that your post might have given some folks some wrong impressions.

This is an important finding that challenged a previous misconception that I had, but I am still trying to understand the full nature and magnitude of its implications for chemometrics users. Clearly, it does not affect predictions, but only outlier detections. If one is using the identical PLS model to both generate property predictions and to generate a reconstructed spectrum for the purpose of outlier detection/sample screening in prediction, then there might be some cause for concern in a few cases. Even then, though, it's not clear to me (yet) that the implications of this algorithmic "deficiency" have a high enough magnitude to impact most practical applications, especially when one considers the magnitude of errors from other typical sources (data collection, model over/underfitting...).

Secondly, I won't argue that chemometricians have never been ignorant of the "rigors" of statistics (and vice versa!). However, I will stop short of suggesting to a forum of users that NIPALS and SIMPLS are "incorrect" algorithms that have led users astray over all these years. I don't think that this is what you had meant, but it just sounded that way.
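The point that predictions are unaffected while spectral residuals depend on the chosen subspace can be sketched numerically. The example below assumes textbook NIPALS PLS1 on synthetic, mean-centered data with NumPy; it compares the usual reconstruction from the loadings with a projection onto the span of the loading weights. It is my own illustration of the distinction as described in this thread, not the bidiagonalization algorithm from the paper and not any package's code.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """Textbook NIPALS PLS1 on centered data; returns W, P, T and y-loadings q."""
    X = X.copy(); y = y.copy()
    W, P, T, q = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p = X.T @ t / (t @ t)
        q.append((y @ t) / (t @ t))
        X -= np.outer(t, p)              # deflate X
        y -= q[-1] * t                   # deflate y
        W.append(w); P.append(p); T.append(t)
    return np.column_stack(W), np.column_stack(P), np.column_stack(T), np.array(q)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 10)); X -= X.mean(axis=0)
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=30); y -= y.mean()

W, P, T, q = nipals_pls1(X, y, n_components=3)

b = W @ np.linalg.solve(P.T @ W, q)      # regression vector, b = W (P'W)^-1 q
print(np.linalg.norm(b - W @ (W.T @ b))) # ~0: b lies in the span of the weights W

E_loadings = X - T @ P.T                 # usual residuals: reconstruction rows lie in span(P)
E_weights = X - X @ W @ W.T              # residuals after projecting onto span(W)

# Per-sample sums of squared residuals (Q statistics) under each choice
Q_loadings = (E_loadings ** 2).sum(axis=1)
Q_weights = (E_weights ** 2).sum(axis=1)
print(np.round(Q_loadings[:5], 3))
print(np.round(Q_weights[:5], 3))
print(np.allclose(X @ b, T @ q))         # predictions use only b, so they do not change
```

How large the difference between the two sets of Q values is, and whether it matters next to other error sources, depends on the data, which is the open practical question here.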


Best Regards,
Chuck

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 187
Registered: 9-2001
Posted on Monday, March 31, 2008 - 9:03 am:   

Mike - no, I wasn't aware of the review, and I believe that Richard isn't, either. However, I did peruse it from your link, and I don't necessarily agree with all of it. However, the best person to defend Richard's work is Richard himself, so I will forward your comments to him, along with the link to the discussion group, so he can have his own say.

I will make a couple of comments, though:
First, the book was intended for the newbie to chemometrics, and when I recommended it, it was also recommended as being an elementary text for the newcomer. Since there is no claim in the book that it is intended to teach advanced chemometric mathematics, and in fact it explicitly states that the plan is to do it without math, it is unfair for the reviewer to downgrade it for its lack of mathematical content.

Much of the rest of the discussion about the content, especially the part where he feels that some topics that should have been there were missing, is the reviewer's opinion about what he thinks belongs in a book about chemometrics. In one sense this is OK, since a review is partially about opinion. In another sense, though, just because the author of a book has a different opinion about what is important than the reviewer does, doesn't automatically make it a bad book. My books were also criticized for not including topics that various people felt were important, and my reaction then, as is my reaction to those parts of this review, is something on the order of "Well, then let him write his own book, and then he can put in whatever he thinks is needed." But a reviewer can't expect a book author to write the book he would have written if he wrote one, he's got to do it himself if he thinks it's important enough.

The review uses some rather arcane formatting to represent various equations, and I don't think that those were inventions of the reviewer, since he was talking about poor editing rather than substantive errors in the text. However, when I went to check my copy of the book (which is a pretty early (1998) edition) for those cases he mentioned, I found that in my copy of the book those equations were properly formatted. I don't know why or how the reviewer got hold of a copy with apparently uninterpreted equation formatting codes; could he have had a preproduction copy that had not been put into final form?

Klaas's comments are more substantive, and I will let Richard and Klaas argue it out between them; it should be interesting!!

\o/
/_\

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 39
Registered: 9-2003
Posted on Monday, March 31, 2008 - 8:44 am:   

Hello Mike,

Thanks for the support!

Note that there are plenty of statisticians doing good work in the area of data mining, for example, which is even much softer than chemometrics has ever been. [Think of multiple comparisons.]

I don't think there need be a conflict. Most issues have a historical origin mainly. In particular, "confusing" terminology and "sloppy" mathematics can both be solved. Sometimes the solution has always been around. The main problem that I see is that practical chemometrics is almost synonymous with applying software - "canned routines". Once something is implemented in software, it is hard to get it out. Soap companies have a habit of advertising improved formulas. Can't we learn something from that!?

Regards,

Klaas

Michael C Mound (mike)
Senior Member
Username: mike

Post Number: 51
Registered: 7-2007
Posted on Monday, March 31, 2008 - 7:06 am:   

Hello, Klaas,

Thanks for the input. I agree with your points (I have been following the lively interchanges with great interest), and am especially concerned with the areas of confusion. In the past, I have had some crossed swords with pure statisticians who have been tough on the pragmatics of chemometrics and who conveniently forget its origins.

Best,

Mike

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 38
Registered: 9-2003
Posted on Monday, March 31, 2008 - 6:51 am:   

Hello Mike,

That review is illustrative of what I meant in several discussions on this forum and elsewhere. I will just pick out two comments by the reviewer:
1. "Several questions are left unanswered by these comparisons. Are the applications on the different datasets significantly different? What effects are causing the different values for the four statistics? What did we learn about the application of the four chemometric methods?" These are all natural questions for a statistician. You need answers to be conclusive about an analysis. Unfortunately, chemometrics is often too "soft" to provide these answers, even if methodology has been developed (usually by statisticians). For example, significance can always be assessed using a randomization test (aka permutation test). I know that SIMCA has a randomization test to assess the significance of a model, once it is built, but one could also use similar testing schemes during model building, or to compare (competing) models, as has been proposed by Hilko van der Voet (and has been implemented in SAS).

2. "In general, examples and statistics are presented, but the book lacked explanations and information. Definitions of the comparison statistics are given in Appendix B. Statisticians will be surprised by some of the explanations, such as the following:
1. The variance of prediction, $S^2$, for a set of samples is defined as
$S^2 = \sum_{i=1}^{n} (\hat{y}_i - y_i - \mathrm{bias})^2 / (n - 1)$
2. SEP is simply the square root of the variance of prediction, $S^2$.
3. The standard error of estimate (SEE), the standard error of calibration (SEC), the root mean squared error of estimate (RMSEE), and the root mean squared error (RMSE) are used interchangeably.
These definitions should alert statisticians to take care that they understand the statistics and terminology being used in the chemometric literature." The terminology is absolutely wrong. A solution to that problem is far from obvious because it is so well entrenched. I mean a solution in practice. It should be fairly easy to correct the book.
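The randomization test mentioned under point 1 can be sketched in a few lines for the case of comparing two competing models on the same validation objects. This is a minimal sign-flip scheme on paired differences of squared prediction errors, written in the general spirit of the testing schemes referred to above; the function name, the number of random draws, and the data are assumptions for illustration, and the published procedure and its SAS implementation are not reproduced here.

```python
import random

def sign_flip_randomization_test(errors_a, errors_b, n_draws=1999, seed=0):
    """Two-sided randomization p-value for equal mean squared error of two models.

    errors_a and errors_b are prediction errors of two models on the same objects.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    observed = abs(sum(d) / n)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_draws):
        # Under the null hypothesis the sign of each paired difference is arbitrary
        stat = abs(sum(di if rng.random() < 0.5 else -di for di in d) / n)
        if stat >= observed - 1e-12:
            count += 1
    # Count the observed arrangement itself so the p-value is never zero
    return (count + 1) / (n_draws + 1)

# Made-up prediction errors for two competing models on the same 12 objects
errors_a = [0.9, -1.1, 0.8, -1.2, 1.0, -0.7, 1.3, -0.9, 1.1, -1.0, 0.8, -1.2]
errors_b = [0.3, -0.2, 0.4, -0.1, 0.2, -0.3, 0.1, -0.4, 0.2, -0.3, 0.1, -0.2]
print(sign_flip_randomization_test(errors_a, errors_b))
```

The appeal of such a test is that it needs no distributional assumption beyond exchangeability of the signs, which is why it can be applied during model building as well as afterwards.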

Regards,

Klaas

Michael C Mound (mike)
Senior Member
Username: mike

Post Number: 50
Registered: 7-2007
Posted on Monday, March 31, 2008 - 6:27 am:   

Hi, Howard,

I just happened to come across a rather critical evaluation of the book you recommended to Charlotte recently. I was wondering if you were familiar with the critique and would be interested in your reactions.

The review is as of the date of publication of the book by Kramer, i.e.,

R. Kramer; "Chemometric Techniques for Quantitative Analysis", Marcel Dekker (1998)

You can see this review, if you are not familiar with it at:

http://www.allbusiness.com/technology/396880-1.html

Best,

Mike

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 37
Registered: 9-2003
Posted on Monday, March 31, 2008 - 4:20 am:   

Dongsheng,

Forgot to mention where you can find the correct methodology. You can download an IUPAC report directly from my website:

http://www.chemometry.com/Index/Calibration.html

That report ("Part 3") is intended to be complete up to 2006. However, new contributions appear quite frequently. For an alternative to the y-deviation, see:

J.W.B. Braga and R.J. Poppi
Comparison of variance sources and confidence limits in two PLSR models for determination of the polymorphic purity of carbamazepine
Chemometrics and Intelligent Laboratory Systems, 80 (2006) 50-56

The abstract for that article, as well as contacting information, can be found on:
http://www.chemometry.com/Index/Links%20and%20downloads/Papers/SelectedReferencesMVC.html

Regards,

Klaas Faber

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 36
Registered: 9-2003
Posted on Monday, March 31, 2008 - 2:20 am:   

Dongsheng,

What you call the y-deviation is incorrect. This has been pointed out in several articles since 1996. Nevertheless, I still encounter it in applied work. For a recent example, see Figure 8 in:

A.M.C. Davies and T. Fearn, Back to basics: the "final" calibration, Spectroscopy Europe, 19(6) (2007) 15-18.

My advice is to replace it, which won't be simple because it is so well entrenched by now. Note that the Unscrambler also gives deviations for the loadings, scores and coefficients which are "plausible", but not good. In particular, what is calculated for the scores. Graphically nice indeed; however, statistically speaking it is nonsense. That's not my personal opinion but a solid fact: a jack-knife cannot be used for inference about the scores.

Regards,

Klaas

Dongsheng Bu (dbu)
Advanced Member
Username: dbu

Post Number: 22
Registered: 6-2006
Posted on Saturday, March 29, 2008 - 11:13 am:   

Dear Klaas,

We know that we need to make a distinction between the calibration and prediction phases. I was referring to outlier detection in both the calibration and prediction phases. I suggest getting to know a product in depth before making a judgment. The Unscrambler provides many tools, such as HT2 and HT2Lim, in the calibration phase in addition to what you mentioned. You may not have noticed this because they are available in numerical format, not fully in the graphics. Actually, there are 6 significance limits (including 95%) in a saved Unscrambler regression/classification model. We also provide several means in the prediction phase, such as residual/leverage/y-deviation in the Unscrambler, and HT2 and HT2Lim in real-time prediction.

I cannot agree more about combining visualization for interpretation with statistics to accompany those plots to guide users in their decisions. The Unscrambler's visualization features are usually very strong, but outlier detection in the prediction phase seems to be an exception. However, visualization and an incorrect formula are two different things. Please advise.


Best regards,
Dongsheng

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 35
Registered: 9-2003
Posted on Saturday, March 29, 2008 - 3:09 am:   

Dongsheng,

I didn't keep the messages, but the pertinent reference is:

Pell RJ, Ramos LS, Manne R. The model space in partial least squares regression. J. Chemometrics 2007; 21:165-172

I am seldom surprised, but this time I was.

With regard to outlier detection, one needs to make a distinction between the calibration and prediction phases. I referred to outlier detection in the prediction phase, whereas you are referring to outlier detection in the calibration phase. There's also a lot to say about that! In the calibration phase, the Unscrambler calculates residuals and leverages, but that's all there is. These numbers are displayed in plots. However, there is no statistical testing, so how to decide? Hotelling ellipses are calculated that cover 95% (say) of the calibration data, but that's not a 95% (say) region for prediction data. [Sampling error is ignored.] And so on, and so forth. Where's the multivariate statistics?
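
The remark about prediction regions can be made concrete with a small sketch. It is only an illustration under stated assumptions: a two-component score plot, multivariate normal scores, and mean and covariance estimated from n calibration samples. The function names are hypothetical and are not taken from any package mentioned in this thread. For two numerator degrees of freedom the F quantile has a closed form, so no statistics library is needed.

```python
import math

def f_quantile_2(p, d2):
    """Quantile of the F distribution with 2 and d2 degrees of freedom.

    For 2 numerator degrees of freedom the CDF is
    1 - (1 + 2x/d2) ** (-d2/2), which can be inverted directly.
    """
    return (d2 / 2.0) * ((1.0 - p) ** (-2.0 / d2) - 1.0)

def t2_limit_new_sample(n, p=0.95):
    """Hotelling T^2 limit for a NEW sample in a two-component score plot.

    n is the number of calibration samples. The factor in front of the
    F quantile accounts for the mean and covariance being estimated.
    """
    a = 2  # number of components (assumption of this sketch)
    return a * (n - 1) * (n + 1) / (n * (n - a)) * f_quantile_2(p, n - a)

def t2_limit_large_sample(p=0.95):
    """Chi-square limit with 2 degrees of freedom (estimation error ignored)."""
    return -2.0 * math.log(1.0 - p)

if __name__ == "__main__":
    for n in (10, 20, 50, 1000):
        print(n, t2_limit_new_sample(n), t2_limit_large_sample())
```

With few calibration samples the limit for new samples is noticeably wider than the large-sample value, and the two agree only as n grows. A region drawn to cover 95% of the calibration scores is therefore not automatically a 95% region for future samples.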

What would be beneficial, in my opinion, is to combine the best of two worlds: visualization for interpretation and statistics to accompany those plots to guide users in their decisions.

Klaas

Dongsheng Bu (dbu)
Advanced Member
Username: dbu

Post Number: 21
Registered: 6-2006
Posted on Friday, March 28, 2008 - 10:19 pm:   

Dear Klaas,

Could you tell me the month and year of the discussion on the ICS list about an inconsistency in PLS calculations? I have followed the ICS posts closely for the past 2 years, and I cannot remember seeing this topic there. There was one on multivariate outlier detection in Feb 2007, though.

The Unscrambler provides several methods for outlier detection, i.e., sample X-residual and residual limits, Hotelling's T-square (HT2) and HT2 limits, leverage, and y-deviation. Details of the algorithms have been public for many years and are also posted on the CAMO website. Could you tell me which formula results from an incorrect derivation and how it conflicts with well-established statistical rules?

I would love to see "real" statistics like residuals and Hotelling's T-square with limits. I would also like to see great practical methods, verified by many cases, that some day turn into "real" statistics.
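
As a point of reference for the quantities listed above, here is a minimal sketch of a textbook leverage calculation from score vectors. It assumes a mean-centred model with mutually orthogonal score columns, and it is not claimed to be the formula implemented in the Unscrambler or any other package. The function name and the example scores are hypothetical.

```python
def leverages(scores):
    """Leverage of each calibration sample from its score values.

    scores: list of rows, one row per sample, one column per component.
    Assumes a mean-centred model with mutually orthogonal score columns.
    """
    n = len(scores)
    n_comp = len(scores[0])
    # Sum of squares of each score column (t_a' t_a).
    col_ss = [sum(row[a] ** 2 for row in scores) for a in range(n_comp)]
    # h_i = 1/n + sum over components of t_ia^2 / (t_a' t_a)
    return [1.0 / n + sum(row[a] ** 2 / col_ss[a] for a in range(n_comp))
            for row in scores]

# Hypothetical scores: 5 samples, 2 centred and orthogonal components.
scores = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
h = leverages(scores)
print(h)
```

Under these assumptions the leverages sum to one plus the number of components, and Hotelling's T-square for a calibration sample is (n - 1) times (h - 1/n), so the two statistics carry the same information in the calibration phase.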

Best regards,
Dongsheng

Klaas Faber (faber)
Senior Member
Username: faber

Post Number: 34
Registered: 9-2003
Posted on Friday, March 28, 2008 - 5:21 am:   

Hello Tony,

I rephrase your assessment (chemometrics vs. statistics) a little bit: statistics packages tend to have fewer plots but chemometrics packages tend to be weak when it comes to statistics. As an illustration: there has been a discussion quite recently on the ICS list about an inconsistency in PLS calculations. In short, this inconsistency may have serious consequences for outlier detection. As far as I know, Pirouette (Infometrix) appears to be the only package where this inconsistency has been fixed. The Unscrambler (CAMO) on the other hand bases outlier detection on a formula that results from an incorrect derivation! You won't find this kind of conflict between statistics packages.

In conclusion, I would like to see some more "real" statistics (instead of re-inventions of statistics) in chemometrics packages. Too often people use terms like "significant" and it's not at all clear what they are trying to say.

Klaas Faber

Tony Davies (td)
Moderator
Username: td

Post Number: 171
Registered: 1-2001
Posted on Wednesday, March 26, 2008 - 1:06 pm:   

Hello Charlotte,

You have received several good replies on software but much of the rest of your question has been ignored.

I think that if you really need to make progress with this project in five months then you really need a consultant. There are several of us around!
Here are a few questions for you:
1) How do you specify "oil composition"?
2) Am I correct in thinking that you do not have an NIR spectrometer?
3) Is the long term aim to make measurements of the oil in situ? If "Yes", this presents another layer of difficulty!

An answer to one of your questions:
Differences between Chemometric and Statistical packages?
Good chemometric packages present results of computations in several different graphical presentations so that we can see (and understand) how the analysis is progressing. Statistical software tends to just do the calculations and leaves you to decide how to view them.

Rather than the book suggested by my friend Chuck Miller (Hi Chuck!) I would suggest the book by Harald and Magni Martens, "Multivariate Analysis of Quality: An Introduction", which gives a more readable introduction to what chemometrics is about: "Data analysis is a cognitive discipline Not a mathematical exercise!" [Available from NIR Publications as are many other books on NIR spectroscopy and Chemometrics]

We are all here to help you!

Best wishes,

Tony

Michael C Mound (mike)
Senior Member
Username: mike

Post Number: 49
Registered: 7-2007
Posted on Wednesday, March 26, 2008 - 4:36 am:   

Hello, Charlotte,

Wow! What a great response from a call for aid!!!

If I may add...one other and very readable introductory source, in addition to the excellent suggestions already made to you is

Chemometrics, Statistics and Computer Application in Analytical Chemistry, Matthias Otto, Wiley-VCH, 1999.

Also,

Chemometrics in Environmental Analysis, J. Einax, H. Zwanziger, and S. Geiss, 1997.

Good luck.

Mike

Charles E. Miller (millerce)
New member
Username: millerce

Post Number: 3
Registered: 10-2006
Posted on Tuesday, March 25, 2008 - 11:59 am:   

Hello Charlotte:

To add a few comments to those already posted..

With regard to software, I am hopelessly biased, as I work for a company that sells such software and related services. There are several good commercial options available, as well as some "shareware" ones that work on different platforms. Getting a better understanding of chemometrics, and how it relates to your problem, is the first step in getting the most effective software tool. (Hopefully, your Purchasing Department is not "pushing" you to make a quick purchase...)

The text "Multivariate Calibration" by Martens and Naes (Wiley, 1989) is a very effective chemometrics reference, and is surprisingly relevant almost 20 years after its initial publication! There is a "philosophy" behind chemometrics, the understanding of which, I think, is critical for its effective use. The authors do a very good job of explaining this, laying the foundations of the technology, and balancing tutorial material with advanced statistical details.

We have also found that many folks find short courses particularly useful for getting introduced to a new topic. Several companies (including ours) offer such courses at technical conferences, customer sites, and at special training events.

I hope that you find this useful, and please let me know if I can be of further assistance.


Best Regards,
Chuck

Scott Ramos (lsramos)
New member
Username: lsramos

Post Number: 3
Registered: 1-2007
Posted on Tuesday, March 25, 2008 - 10:50 am:   

Charlotte,

Your plan of attack is very appropriate--it would be wise to evaluate a small set of spectra to see if you can in fact correlate information embedded in the spectra to the properties you plan to monitor. Because this approach will require a multivariate analysis, a general statistical package may not be adequate. Of course, there are chemometric add-ons to R and other toolkits, and there are several commercial chemometrics packages such as those already mentioned (and, excuse the promotion, our software Pirouette).
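
A feasibility check of this kind can be sketched in a few lines. The following is a deliberately simplified illustration, not the method of any package named here: a one-component PLS1 model fitted to whole spectra (no peak resolution) and assessed by leave-one-out cross-validation. All names and the noise-free data are hypothetical; real spectra would need more components, pre-processing and proper software.

```python
def center(rows):
    """Mean-centre the columns of a list of rows; return centred rows and means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows], means

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a predict function."""
    Xc, x_mean = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each wavelength with y, normalised.
    w = [sum(xi[j] * yi for xi, yi in zip(Xc, yc)) for j in range(len(x_mean))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores, then regression of y on the scores.
    t = [sum(a * b for a, b in zip(xi, w)) for xi in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x):
        score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
        return y_mean + q * score

    return predict

def rmsecv_leave_one_out(X, y):
    """Root mean square error of leave-one-out cross-validation."""
    squared_error = 0.0
    for i in range(len(y)):
        predict = fit_pls1_one_component(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        squared_error += (predict(X[i]) - y[i]) ** 2
    return (squared_error / len(y)) ** 0.5

# Hypothetical data: one broad band whose height scales with the property.
profile = [0.2, 0.5, 0.9, 0.5, 0.2]
prop = [1.0, 2.0, 3.0, 4.0, 5.0]
spectra = [[c * s for s in profile] for c in prop]
print(rmsecv_leave_one_out(spectra, prop))
```

A cross-validated error that is small relative to the spread of the property values suggests the spectra do carry the information; a large one suggests they may not, at least for this simple model.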

There are several geochemists in Australia who are working on applications involving chemometric analysis. If you are interested, I can put you in touch with some of them; please contact me off-line (scott_ramos at infometrix dot com). In addition, there is a wealth of information on multivariate analysis of geochemical data in the literature, although a good bulk of it is for chromatographic data.

Scott

Christian Mora (cmora)
Intermediate Member
Username: cmora

Post Number: 20
Registered: 2-2007
Posted on Tuesday, March 25, 2008 - 9:36 am:   

This may help you:
http://www.jstatsoft.org/v18/i02
Christian

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 186
Registered: 9-2001
Posted on Tuesday, March 25, 2008 - 8:19 am:   

Venky - certainly there are many good articles published about chemometrics, too. But again, space limitations in a journal mean that single articles must of necessity be limited in scope. A book can cover many more topics, and do it all in one place, so that the reader doesn't have to spend much time hunting down all the information in different places.

Lectures are good, as you say, which is the reason I recommended a short course, and it's unfortunate that none are coming up in the US in the very near future.

Another resource would be consultants such as myself, but since promotion of commercial activities is discouraged on the discussion group, I didn't want to bring it up.

\o/
/_\

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 59
Registered: 3-2004
Posted on Tuesday, March 25, 2008 - 6:53 am:   

Dear Hlmark;
Thanks for your compliment.
I did indirectly suggest reading a book word by word to become a good chemometrician.
Recently I invited Prof. Richard to India to create awareness of chemometrics. We have a chemometrics tutorials section.
I feel that reading combined with hands-on work helps quick understanding and does better than merely reading.
I hope you will agree with me.
Kindly look at the article published by
[email protected]


Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 185
Registered: 9-2001
Posted on Tuesday, March 25, 2008 - 5:02 am:   

Charlotte - Venky's advice is correct, but of necessity too brief to be very useful. I'd say that before you run out to buy software, you should spend a few dollars (euros, yen, whatever) on a good book or two about chemometrics and multivariate analysis.

There are several books available that can give you a good overview of what chemometrics is and does and how to use it. There are also books about NIR that can more generally advise you on methods of sample presentation, pitfalls, etc. I'll just mention a couple here, out of the dozen or two on my bookshelf.

For the complete novice in chemometrics, the best book to start with is:

R. Kramer; "Chemometric Techniques for Quantitative Analysis", Marcel Dekker (1998)

At a somewhat more advanced level:

R. Brereton; "Chemometrics: Data Analysis for the Laboratory and Chemical Plant"; Wiley (2003)

K. Beebe, et al, "Chemometrics: a Practical Guide"; Wiley (1998)

NIR Books:

D. Burns, et al, (ed), "Handbook of Near-Infrared Analysis" (2nd ed); Marcel Dekker (2001) (Note that the third edition is either just out or soon to come out)

It's too bad Pittcon is just past; you could have taken a short course(s) in chemometrics and/or NIR analysis being offered in conjunction with the conference, where most of the key topics would have been discussed. I don't know of any being offered in the near future; the next one I know about will be offered at the IDRC conference this summer (see http://www.idrc-chambersburg.org)

Howard

\o/
/_\

venkatarman (venkynir)
Senior Member
Username: venkynir

Post Number: 57
Registered: 3-2004
Posted on Tuesday, March 25, 2008 - 12:04 am:   

Dear Farliec;
NIRS and chemometrics are inseparable for multivariate analysis.
NIR spectra do not consist of resolved peaks; they are composed of overtone bands such as O-H, N-H and C-H.
Your first approach is correct: collect NIR spectra of samples of known composition, then use an empirical method for the analysis.
To be a good chemometrician you should follow these six steps.
1. Pre-process the data
2. Define the information to be extracted from the spectra (objective)
3. Model building (PCA, SIMCA and PLS)
4. Prediction
5. Validation (of different types)
6. Freezing the model.
A good number of software packages are available, from the Unscrambler to PLS Toolbox.
In chemometrics we extract chemical and physical information through statistical and mathematical methods.
Statisticians speak of factors; here the term principal component is central.
Chemometrics can also be applied to fluorescence spectra, for example with N-PLS and PARAFAC.
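
To give the first of the six steps some shape, here is a minimal sketch of one common pre-processing choice for NIR spectra, the standard normal variate (SNV) transform. The post does not name a specific method, so the choice of SNV is an assumption made for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum
    by its own mean and standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]

# Two hypothetical spectra that differ only by an offset and a scale factor,
# as can happen with scatter effects; SNV maps them onto the same curve.
raw = [1.0, 2.0, 4.0, 3.0]
shifted = [2.0 * v + 5.0 for v in raw]
print(snv(raw))
print(snv(shifted))
```

Whether SNV, derivatives or another treatment is appropriate depends on the samples and the measurement setup, so this should be read as one option among several.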

Charlotte Stalvies (farliec)
New member
Username: farliec

Post Number: 1
Registered: 3-2008
Posted on Monday, March 24, 2008 - 11:32 pm:   

Hi All,

I wonder if anybody can help me please.

For the next five months I will be working on a project which aims to relate the bulk chemical composition of fluid inclusions in rocks to the NIR spectra obtained from those inclusions. The group that I have joined investigates petroleum inclusions as a means of determining the past presence of oil or gas zones, and also as a guide to work out reservoir filling and oil migration pathways.

Currently, the presence of oil inclusions is assessed using optical microscopy, followed by excitation of any oil inclusions present using UV. The fluorescence colour of an oil can be related to CIE chromaticity co-ordinates, which in turn correlate with the API gravity, viscosity and gross composition of the oil.
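
For readers unfamiliar with chromaticity co-ordinates, the sketch below shows the standard definition: the CIE x and y values are the tristimulus values X and Y divided by the sum X + Y + Z. It assumes the tristimulus values have already been obtained from the fluorescence measurement, which is not shown, and the function name is hypothetical.

```python
def chromaticity(X, Y, Z):
    """Return CIE (x, y) chromaticity co-ordinates from tristimulus values."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("tristimulus values sum to zero")
    return X / total, Y / total

# Equal tristimulus values give the equal-energy point (1/3, 1/3),
# regardless of overall intensity.
print(chromaticity(1.0, 1.0, 1.0))
print(chromaticity(2.0, 2.0, 2.0))
```

Because the co-ordinates are ratios, they describe colour independently of brightness, which is why they can be compared between inclusions of different size or fluorescence intensity.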

Initially I will have to devise a calibration set of samples to see if the procedure actually works. Where I work we have oils of known composition which can be used for this. I will need to obtain NIR spectra for these oils. These will comprise a group of oils of known composition. The spectra

Rather than diving straight into working on inclusions, the first step in my project will be to obtain NIR spectra on a set of oils of known composition. Once I have the spectra, I then need some way of relating these to the composition. The plan is to use chemometrics for this. The problem for me is that I am actually a geologist, so I am completely new to both spectroscopy and chemometrics, and there are several questions that I have. Firstly, would the peaks in the NIR spectra need to be resolved before applying chemometric techniques to the data? I was wondering if this introduces any bias. Secondly, if any of you experts out there were tasked with the same job, what kind of software would you use? Specifically, is there a difference between chemometric software and a general statistical package such as R which is capable of multivariate analysis?

Any information would be much appreciated.

Thanks
Charlotte
