NIR Discussion Forum: The necessity of autoscale on Y ?



Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 492
Registered: 9-2001

Posted on Thursday, August 30, 2012 - 1:40 pm:

Zhenqi - it's not entirely clear what "autoscaling on Y" consists of. My guess is that, like "autoscaling on X", it consists of subtracting the mean Y value from each Y value, then dividing (or multiplying) all the Y values by some factor to make the SD of all the Y values be unity.

If this guess is correct, then it's easy to figure out what the mathematics of the calibration prescribes should theoretically happen:

1) The B0 term of the model should change by the amount that you subtracted from each Y value, i.e., the mean Y value of the original data.

2) The coefficients of the model should change by the same factor that you divided the original Y data by.
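The two changes listed above can be written out as a short derivation. This is only a sketch, and it assumes a linear least-squares type of calibration with an intercept term, fitted to the same X data both times. The symbols \bar{y} (mean of the original Y values), s (the factor the Y values were divided by), and the primed quantities (the model fitted to the autoscaled Y) are notation introduced here for illustration.

```latex
\begin{align*}
y' &= \frac{y - \bar{y}}{s} \\
\hat{y} &= b_0 + \sum_j b_j x_j && \text{(model fitted to the original } y\text{)} \\
\hat{y}' &= b_0' + \sum_j b_j' x_j && \text{(model fitted to } y'\text{)} \\
b_j' &= \frac{b_j}{s}, \qquad b_0' = \frac{b_0 - \bar{y}}{s} \\
s\,\hat{y}' + \bar{y} &= b_0 + \sum_j b_j x_j = \hat{y}
\end{align*}
```

Under these assumptions, subtracting the mean only shifts B0, and the division by s then rescales B0 together with the other coefficients.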

When doing predictions, to put the answers on the same scale as the original data, you would then have to perform the inverse operations on the raw results computed using the model. I.e., first multiply the computed predicted value by the factor you divided the Y data by, then add back the mean value you subtracted. You should be able to recover the same predicted values (assuming you treated the X data the same way, each time). Since you obtain the same predicted values, you should get the same performance of the model obtained from the two cases.
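The back-transformation described above can be checked numerically. The following is a minimal sketch, not the software used in the original question: it uses ordinary least squares in NumPy as a stand-in for the calibration algorithm, and the data, the helper `fit_ols`, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 30 samples, 4 X variables.
X = rng.normal(size=(30, 4))
y = 5.0 + X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=30)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (b0, b)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Model fitted to the original Y values.
b0_raw, b_raw = fit_ols(X, y)
pred_raw = b0_raw + X @ b_raw

# Model fitted to autoscaled Y (mean removed, SD scaled to unity).
y_mean, y_sd = y.mean(), y.std(ddof=1)
y_auto = (y - y_mean) / y_sd
b0_auto, b_auto = fit_ols(X, y_auto)

# Inverse operations on the raw predictions:
# first multiply by the SD, then add back the mean.
pred_back = (b0_auto + X @ b_auto) * y_sd + y_mean

# The two sets of predictions, and the rescaled coefficients, should agree.
print(np.allclose(pred_raw, pred_back))
print(np.allclose(b_auto * y_sd, b_raw))
print(np.allclose(b0_auto * y_sd + y_mean, b0_raw))
```

If the same comparison fails with a particular chemometrics package, the difference is more likely to come from how the software handles the preprocessing than from the autoscaling itself.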

If that doesn't all happen the way the mathematics prescribes, then you have to figure out why not.

On the subject of making changes to the Y values, however, there is another wrinkle that was discovered (relatively) recently, that you may want to check on. This was published in Applied Spectroscopy, 64(9), pp. 995-1006 (2010).

What that work showed is that the units that the Y values are expressed in matter, because different ways of expressing concentration are not linearly related, and there may not even be a one-to-one relationship between concentrations expressed in different units.

Those findings have both theoretical justification and experimental confirmation.
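As a rough illustration of the nonlinearity between units, the sketch below converts weight fraction to volume fraction for a binary mixture. It is not taken from the cited paper: the densities are hypothetical, the mixture is assumed to be ideal (volumes additive), and all names are chosen here for illustration.

```python
import numpy as np

# Hypothetical densities (g/mL) of components A and B; ideal mixing assumed.
rho_a, rho_b = 0.8, 1.3

# Weight fraction of A from 0 to 1.
w_a = np.linspace(0.0, 1.0, 11)

# Volume of each component per unit mass of mixture.
v_a = w_a / rho_a
v_b = (1.0 - w_a) / rho_b

# Volume fraction of A.
phi_a = v_a / (v_a + v_b)

# Both scales run from 0 to 1, so a linear relation would require phi_a == w_a.
max_dev = np.max(np.abs(phi_a - w_a))

for w, phi in zip(w_a, phi_a):
    print(f"weight fraction {w:.1f} -> volume fraction {phi:.3f}")
print(f"largest deviation from a straight line: {max_dev:.3f}")
```

Because autoscaling is a linear operation on Y, it would not by itself change which of these units is linearly related to the spectra.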

Howard

\o/
/_\

ZHENQI SHI (shizhenq)
New member
Username: shizhenq

Post Number: 3
Registered: 3-2010

Posted on Thursday, August 30, 2012 - 12:24 pm:

There are some publications addressing when mean-centering of the X data is or is not beneficial for building a quantitative model. Is there anything published about the necessity of autoscaling Y?

What prompted this question is the following scenario. On a dataset intended for building a quantitative model of content uniformity in pharmaceutical tablets, I tried, out of curiosity, three preprocessing routines: (1) mean-centering (mncn) on X and autoscaling (auto) on Y, (2) no mncn on X and no auto on Y, and (3) mncn on X only, with no auto on Y. I found that a workable model could be built in the first two scenarios, but not in the third.

Given that we have generally assumed mncn on X and auto on Y to be necessary for a quantitative model, has anyone come across such a situation before, and can anyone help us understand why it happens?

Thank you in advance.