Why Yp -vs- Y with validation plots?

NIR Discussion Forum » Bruce Campbell's List » Chemometrics


hlmark (Unregistered Guest)
Unregistered guest
Posted on Sunday, May 14, 2006 - 5:29 am:   

Andrew - I'm not absolutely sure of the answer, but my best guess is that the reason is historical, and has the same basis as the naming of calibration methods: calibrations with the absorbance as the dependent variable and constituent concentration as the independent variable are called the "Beer's law" method, while MLR as we normally use it (i.e., concentration as the dependent variable and multiple wavelengths as the independent variables) is called the "Inverse Beer's law" method.

Back in the dim (well, not really so dim, I suppose) dawn of pre-chemometric pre-history of calibration methodology, when UV-Vis was the dominant technology for quantitative analysis, instruments were based on vacuum tubes, and calibration samples were clear solutions that were made up gravimetrically, it was the sample concentrations that were known with less error, while the instrument readings were considered to be the ones that had greater error and therefore should be treated as the Y variable (now that's probably the worst run-on sentence I've written in a long time!). In those BC days ("BC" meaning "Before Computers"), "calibrations" were done by actually drawing a graph on graph paper, which could be done only when the analytical absorbance band was well-isolated. In this case, it didn't matter a whole lot which way you drew the plot, but since theory (Beer's law) said that the absorbance was a function of concentration, the absorbance was made the Y-variable on the plots, and the concentration the X-variable.

With multivariate (i.e., multiwavelength) analysis, the concentration has to be treated as the Y-variable regardless of the errors. It's fortunate, therefore, that technology has advanced to the point where the absorbance data has the smaller errors, too.
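To make the two naming conventions concrete, here is a minimal single-wavelength sketch in Python. Everything in it is an assumption made up for illustration (the concentrations, the absorptivity of 0.8, the noise level, and all variable names); it only shows which variable sits on which side of the regression in the "Beer's law" and "Inverse Beer's law" set-ups.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical gravimetric standards and noisy instrument readings.
conc = np.linspace(0.1, 1.0, 10)
absorb = 0.8 * conc + rng.normal(scale=0.02, size=conc.size)

# "Beer's law" calibration: absorbance is the dependent (Y) variable.
# np.polyfit returns [slope, intercept] for a degree-1 fit.
k, a0 = np.polyfit(conc, absorb, 1)

# "Inverse Beer's law" calibration: concentration is the dependent variable.
b, b0 = np.polyfit(absorb, conc, 1)

# Predicting the concentration of an unknown from its absorbance.
new_abs = 0.4
classical_pred = (new_abs - a0) / k   # fitted line solved for concentration
inverse_pred = b * new_abs + b0       # read directly off the inverse fit

print(classical_pred, inverse_pred)
```

With a single well-isolated band and small errors the two predictions nearly coincide, which fits the remark that it didn't matter much which way the plot was drawn. With several wavelengths only the second form extends directly to MLR.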

Howard

\o/
/_\

Andrew McGlone (Unregistered Guest)
Unregistered guest
Posted on Sunday, May 14, 2006 - 4:26 am:   

Why are validation results often shown as plots of Yp -vs- Y? Probably an old question but it has troubled me recently and I'm damned if I can find an answer in my reference books.....

The problem (to my thinking anyway) is that these plots necessarily have a slope less than 1 and that doesn't 'sell' well: it makes the prediction performance look less than optimal. Indeed, if we were considering calibration data from a least-squares fit, then the slope of the regression line for Yp -vs- Y is precisely R^2, and that is always less than one for real data.

Now if we plot Y -vs- Yp for the validation data then the slope will be near 1 (cf. calibration data, where it would be exactly 1) and will thus look right. That way around also makes sense to me, as generally we make models on the presumption that the dominant error is in the Y values, not the wavelength variables (call them X). Thus the modelling exercise basically involves regression of Y -vs- X.
So why when we wish to 'validate' a model do we so often invert the 'error' sense and put Yp (a derived function of X and thus little error) against Y (dominant error)?
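The two slope statements for calibration data can be checked numerically. The following is a rough sketch on simulated data, assuming an ordinary least-squares calibration with an intercept; the sample size, the three "wavelength" variables, the coefficients and the noise level are all invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: 50 samples, 3 wavelength variables (X),
# reference values y carrying the dominant error.
n = 50
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=n)

# Ordinary least-squares calibration with an intercept term.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
yp = A @ coef  # calibration-set predictions (Yp)

def slope(v, u):
    """Slope of the least-squares line with v on the vertical axis, u on the horizontal."""
    uc = u - u.mean()
    vc = v - v.mean()
    return float(uc @ vc / (uc @ uc))

r2 = 1.0 - np.sum((y - yp) ** 2) / np.sum((y - y.mean()) ** 2)
slope_yp_vs_y = slope(yp, y)  # Yp -vs- Y: should match R^2
slope_y_vs_yp = slope(y, yp)  # Y -vs- Yp: should be 1

print(r2, slope_yp_vs_y, slope_y_vs_yp)
```

For validation data neither identity holds exactly, since those samples were not used in the fit, but the same `slope` helper can be applied to an independent set to see how close the two slopes come to R^2 and 1.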
