Howard Mark (hlmark)
Post Number: 549
|Posted on Sunday, May 26, 2013 - 4:53 pm: |
Jason - I'm not often in a position to do this, but it's easy to know about something you've invented! The rest comes from simply being around a long time, and paying attention!!
Jason Antunovich (jjakhm)
Post Number: 2
|Posted on Friday, May 24, 2013 - 8:52 am: |
Thank you very much for the response and the literature reference; both are very much appreciated. It seems the applicable knowledge is always at your fingertips. If you don't mind, I may have more questions at a later time.
Howard Mark (hlmark)
Post Number: 548
|Posted on Thursday, May 23, 2013 - 2:41 pm: |
Jason - I believe the initial publication of the RMSG concept was Anal. Chem., 58(2), p.379-384 (1986). In that paper the computations used raw spectral values rather than Principal Components, which had not yet made their way into spectroscopic applications. Nevertheless I am pleased to see that use is being made of the concept, even if without proper attribution. That paper does give the purpose and rationale for computing RMSG: it is a way to determine which groups (or classes) individual samples should be assigned to when computing Mahalanobis distances from unequal groups.
Once the concept was published, of course, I cannot vouch for whether or not any particular implementation of it was done correctly. However, considering your concern over the MATLAB results, you should be aware that the default mode for MATLAB computations is double precision, which corresponds to about 16 decimal digits. Variations in the 7th digit are therefore far larger than double-precision round-off, which indicates that the MATLAB computations are not the limiting factor for precision.
In fact, 7 digits of precision just about corresponds to the precision of single-precision computations, so you should look for where, in the chain of data handling, single-precision computations are used. It's most likely in the instrument or in the transfer of data to the auxiliary computer.
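To make the single- versus double-precision point concrete, here is a minimal Python sketch (not MATLAB, and not anything taken from GRAMS). It pushes a double-precision number through a 32-bit float and back, which is roughly what happens when a value passes through a single-precision step somewhere in the data chain. The helper `to_single` and the choice of sqrt(2) as the test value are purely illustrative assumptions.

```python
import struct
import sys


def to_single(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Illustrative value only: sqrt(2), e.g. the expected RMSG of a 2-factor model.
value = 2.0 ** 0.5
as_single = to_single(value)

double_eps = sys.float_info.epsilon  # 2**-52, about 2.2e-16
single_eps = 2.0 ** -23              # about 1.2e-7

# Relative error introduced by one pass through single precision.
rel_err = abs(as_single - value) / value

print(f"double precision : {value:.17g}")
print(f"after float32    : {as_single:.17g}")
print(f"relative error   : {rel_err:.2e}")
print(f"single eps       : {single_eps:.2e}")
print(f"double eps       : {double_eps:.2e}")
```

The relative error after the round trip sits at the single-precision level, many orders of magnitude above double-precision round-off, which is the kind of discrepancy that shows up around the 7th significant digit.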
Jason Antunovich (jjakhm)
Post Number: 1
|Posted on Thursday, May 23, 2013 - 11:58 am: |
To the good people of this forum,
I have an inquiry, of perhaps no practical significance, about a certain Root Mean Square Group size (RMSG) normalization factor utilized in GRAMS/AI PLSplus IQ (ver. 7.02) when reporting the Mahalanobis distance.
The calculation of the factor is described in the user’s guide as written in the attached document. Additionally, the document contains a brief algebraic derivation of the RMSG-PC factor relation. I am not certain if the description in the user’s guide is given for the sake of comprehension, or if the software actually implements the defined procedure for the calculation. It seems that people may find the concept of an RMSG more intuitive than the arguably more abstract concept of the square root of the number of principal components. Thus, perhaps the procedure is so described to inform users (the manual does a good job describing software features with theory). But it also seems that the number of floating-point operations would increase via the route described, and consequently propagate more arithmetic uncertainty (albeit a small amount), relative to simply calculating the root of the number of factors utilized in the model. That is, unless there is some nuance in the relationship between the factors and M-distance within a PCA that I am missing, which would prompt calculation according to the path described in the manual.
Side note: M-distances (normalized and raw) and scores from “identical” PCA models created in different software packages (both using NIPALS etc.) were compared using MATLAB to verify in practice that RMSG is the square root of the number of factors. The RMSG was very close to theoretical values, with variation limited to about the 7th decimal place (Excel was used to import data into MATLAB).
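For readers without the attachment, the relation can be sketched as follows. This is only an outline under stated assumptions, not a quotation from the user's guide: the scores t_ij are taken to be mean-centered and mutually uncorrelated over the n training samples, k factors are used, and both the score variances and the RMSG are assumed to use the divisor n - 1.

```latex
% Assumptions: t_{ij} are mean-centered, mutually uncorrelated scores for
% n training samples and k factors; variances and RMSG use divisor n - 1.
\begin{align*}
s_j^2 &= \frac{1}{n-1}\sum_{i=1}^{n} t_{ij}^2, \qquad
D_i^2 = \sum_{j=1}^{k} \frac{t_{ij}^2}{s_j^2}, \\
\sum_{i=1}^{n} D_i^2 &= \sum_{j=1}^{k} \frac{1}{s_j^2}\sum_{i=1}^{n} t_{ij}^2
  = \sum_{j=1}^{k} (n-1) = k\,(n-1), \\
\mathrm{RMSG} &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} D_i^2} = \sqrt{k}.
\end{align*}
```

If the RMSG were instead computed with the divisor n, the same steps would give sqrt(k(n-1)/n), so the exact result depends on which convention the software uses.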
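The check described above can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: the score columns are small hand-built orthogonal contrasts rather than real NIPALS scores, the scale factors are arbitrary, and the RMSG is assumed to use the divisor n - 1. The sketch is not the GRAMS implementation; it only compares the two routes in double precision.

```python
import math

# Hypothetical PCA scores for n = 5 training samples and k = 3 factors.
# The columns are orthogonal-polynomial contrasts, so they are mean-centered
# and mutually uncorrelated, as scores from a mean-centered PCA are.
patterns = [
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    [2.0, -1.0, -2.0, -1.0, 2.0],
    [-1.0, 2.0, 0.0, -2.0, 1.0],
]
scales = [0.37, 0.11, 0.023]  # arbitrary score magnitudes (assumption)
columns = [[s * v for v in p] for p, s in zip(patterns, scales)]

n = len(columns[0])  # number of training samples
k = len(columns)     # number of factors

# Variance of each score column (divisor n - 1; the column means are zero).
variances = [sum(t * t for t in col) / (n - 1) for col in columns]

# Squared Mahalanobis distance of each sample from the group mean. Because
# the columns are uncorrelated, the covariance matrix is diagonal.
d2 = [
    sum(columns[j][i] ** 2 / variances[j] for j in range(k))
    for i in range(n)
]

rmsg_long = math.sqrt(sum(d2) / (n - 1))  # route via the distances
rmsg_short = math.sqrt(k)                 # route via the factor count

print("RMSG from distances :", repr(rmsg_long))
print("sqrt(k)             :", repr(rmsg_short))
print("absolute difference :", abs(rmsg_long - rmsg_short))
```

In double precision the two routes should differ, if at all, only at the round-off level, so a disagreement as large as the 7th decimal place would point to something other than the extra arithmetic of the longer route.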
I am wondering if anyone could kindly answer which path is more precise or appropriate and which path is implemented in GRAMS (particularly ver. 7.02). I’m sure there may be little sense in concerning oneself with slight numerical variations and different routes of arriving at the same answer (if those variations can even be detected). But I guess I’m just curious and maybe a smidge concerned about potential questions (no matter how low the probability) from a Validations department, QA department, or even the dreaded agency.
This is my first post in this forum, probably because it is such a great resource (populated by many great and knowledgeable people) and answers questions long before I’ve ever thought of them! I’m sorry if this question has been addressed before, but I could not find anything related (also, we don’t have a software maintenance contract with Thermo anymore, so they won’t entertain too many questions…especially since they are up to version 9 or so by now).