
NIR Discussion Forum » Bruce Campbell's List » Chemometrics » Determination coefficient and non-centered data


Ciaccheri Leonardo (leonardo)
Junior Member
Username: leonardo

Post Number: 6
Registered: 5-2010
Posted on Monday, June 07, 2010 - 3:35 am:   

Jerry,

do you mean that the choice between centering and not centering has to be made only when applying regression or classification tools, while in exploratory analysis, such as PCA, centering is always the best option?

Best Regards.

Leonardo

Jerry Jin (jcg2000)
Senior Member
Username: jcg2000

Post Number: 29
Registered: 1-2009
Posted on Friday, June 04, 2010 - 7:38 pm:   

Leonardo,

The three ways you listed for evaluating R^2 are equivalent. They all express the same metric: the proportion of variation in y attributable to the prediction model.

R^2 is fixed for a model whether or not you mean-center your data. If you found that these three equations give you different R^2 values, I suspect you didn't calculate them properly.

Centering the data prior to PCA is the default because PCA stems from the eigenvalue decomposition of the variance-covariance matrix, where variance is defined by deviations from the mean value. Centering serves two roles here: it makes the mathematical manipulation easier, and it makes the geometric interpretation of the PCs more understandable. In a word, centering is simply for computational convenience.
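To make this point concrete, here is a minimal NumPy sketch (the random data and the function names `pca_eig` and `pca_svd` are hypothetical, for illustration only). It shows that eigendecomposing the covariance matrix, which is built from deviations about the mean, yields the same PCs as an SVD of the explicitly mean-centered data matrix.

```python
import numpy as np

def pca_eig(X):
    """PCA from the eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # covariance is built from deviations about the mean
    C = Xc.T @ Xc / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(C)         # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]          # reorder to descending variance
    return evals[order], evecs[:, order]

def pca_svd(X):
    """PCA from the SVD of the explicitly mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return s**2 / (X.shape[0] - 1), Vt.T     # singular values come out in descending order

# Hypothetical data: 50 correlated "spectra" at 4 variables, all offset by 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4)) + 10.0

ev1, P1 = pca_eig(X)
ev2, P2 = pca_svd(X)
print(np.allclose(ev1, ev2))                  # same variances
print(np.allclose(np.abs(P1), np.abs(P2)))    # same loadings, up to sign
```

In both routes the constant offset of 10 is subtracted before any decomposition, which is one way to see why centering is built into PCA as it is usually defined.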

Best,

Jerry Jin

Ciaccheri Leonardo (leonardo)
New member
Username: leonardo

Post Number: 5
Registered: 5-2010
Posted on Friday, June 04, 2010 - 2:59 am:   

I have purchased Tom's article and found it very interesting. My doubts about R-squared, however, were just one example of the doubts that arose when I worked with non-centered data.

A friend who also works in chemometrics told me that in spectroscopy it is often better not to center the data, because doing so gives more easily interpretable loadings.

My experiments, however, highlighted a series of problems with this approach. Another example is that PCA on non-centered data no longer maximizes the variance of the data but their sum of squares. In other words, if your spectra have a high peak with low variance and a low peak with high variance, it is the high peak that goes into PC1 (I have tried it). This means that the PCs are not necessarily ordered by decreasing information content.
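The experiment described above can be reproduced with a small simulation. This is a sketch with made-up data; the two "peaks", their means and their standard deviations are arbitrary assumptions, and `first_loading` is a hypothetical helper. Without centering, PC1 is the direction of largest sum of squares about zero, so the high, low-variance peak dominates it; after centering, PC1 follows the low, high-variance peak.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical two-"wavelength" data set:
# column 0 = high peak, low variance; column 1 = low peak, high variance
X = np.column_stack([
    100.0 + 1.0 * rng.standard_normal(n),
    1.0 + 5.0 * rng.standard_normal(n),
])

def first_loading(M):
    # First right-singular vector of M, i.e. the PC1 loading of whatever matrix is passed in
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return np.abs(Vt[0])

uncentered = first_loading(X)                 # maximizes sum of squares about zero
centered = first_loading(X - X.mean(axis=0))  # maximizes variance about the mean

print("uncentered PC1 loading:", uncentered.round(3))
print("centered   PC1 loading:", centered.round(3))
```

In the uncentered case most of the first singular value simply reflects the large mean of column 0, which is why that component need not carry the most between-sample information.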

I have tried to find something in the literature about when it is better to work with non-centered data and how to interpret the results in that case, but with little success. Most of the textbooks and tutorial articles I found assumed the data were centered.

Does anyone have any advice on this question?

Best Regards

Leonardo

Ian Michael (admin)
Board Administrator
Username: admin

Post Number: 27
Registered: 1-2006
Posted on Friday, May 28, 2010 - 2:53 am:   

I sincerely hope not!! It would be illegal.

All articles can be bought online with a credit card for immediate access. It costs only £12. Just click on the "Buy article on-line" link.

Ciaccheri Leonardo (leonardo)
New member
Username: leonardo

Post Number: 4
Registered: 5-2010
Posted on Friday, May 28, 2010 - 2:21 am:   

Thank you very much, Tony.

Unfortunately, my institute does not subscribe to NIR news. Do you know if this article can be found anywhere on the web?

Best Regards.

Leonardo

Tony Davies (td)
Moderator
Username: td

Post Number: 231
Registered: 1-2001
Posted on Thursday, May 27, 2010 - 4:56 am:   

Hello Leonardo,

I would go for your first method.

You might find it useful to read an NIR news article by Tom Fearn; the reference is: Fearn, T., NIR news 11/1, 14 (2000).

The most important message about R^2 is "Do not over-interpret." RMSEP is much more important!

Best wishes,

Tony

Ciaccheri Leonardo (leonardo)
New member
Username: leonardo

Post Number: 3
Registered: 5-2010
Posted on Wednesday, May 26, 2010 - 3:24 am:   

One of the parameters used to assess goodness of fit is the so-called determination coefficient, R^2. I have found three ways to evaluate it in the literature:

1) The squared correlation coefficient between predicted and reference y-values.
2) The ratio of predicted-y variance to reference-y variance.
3) 1 - (RMSEC^2 / reference-y variance); this last one holds exactly only if RMSEC and the variances are calculated by dividing simply by the number of samples, rather than by degrees of freedom.
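Written out explicitly, the three definitions in the list are as follows. The notation is ours, and the derivation is a standard least-squares result rather than anything stated in the posts. The short derivation after them shows why they coincide for a least-squares fit that includes an intercept, which is what fitting mean-centered data amounts to, and why they need not coincide otherwise.

```latex
% n samples, reference values y_i, predictions \hat y_i, means \bar y and \bar{\hat y};
% every sum runs over i = 1..n.
\begin{align*}
R^2_{(1)} &= \frac{\left[\sum_i (\hat y_i - \bar{\hat y})(y_i - \bar y)\right]^2}
                 {\sum_i (\hat y_i - \bar{\hat y})^2 \,\sum_i (y_i - \bar y)^2}, \\
R^2_{(2)} &= \frac{\sum_i (\hat y_i - \bar{\hat y})^2}{\sum_i (y_i - \bar y)^2}, \\
R^2_{(3)} &= 1 - \frac{\mathrm{RMSEC}^2}{s_y^2}
           = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2},
\quad \mathrm{RMSEC}^2 = \tfrac{1}{n}\sum_i (y_i - \hat y_i)^2,\;
      s_y^2 = \tfrac{1}{n}\sum_i (y_i - \bar y)^2 .
\end{align*}
% With an intercept (or on mean-centered data) the residuals e_i = y_i - \hat y_i
% satisfy \sum_i e_i = 0, hence \bar{\hat y} = \bar y, and \sum_i \hat y_i e_i = 0. Then
\begin{align*}
\sum_i (\hat y_i - \bar y)(y_i - \bar y) &= \sum_i (\hat y_i - \bar y)^2
  &&\Rightarrow\; R^2_{(1)} = R^2_{(2)}, \\
\sum_i (y_i - \bar y)^2 &= \sum_i (\hat y_i - \bar y)^2 + \sum_i e_i^2
  &&\Rightarrow\; R^2_{(2)} = R^2_{(3)} .
\end{align*}
% Without an intercept, \sum_i e_i \neq 0 in general, so neither identity need hold.
```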

As long as you work on mean-centered data, all three definitions are equivalent and give the same value. This is not true when you work on non-centered data, where you get three different values (I have tried).
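The following NumPy sketch, on made-up data, reproduces the effect described above. It assumes that "non-centered" means an ordinary least-squares fit to the raw data with no intercept term. This is an assumption, since the post does not say which regression method was used. All three definitions are computed with every variance and mean square divided by n, as definition 3 requires; `r2_three_ways` is a hypothetical helper name.

```python
import numpy as np

def r2_three_ways(y, yhat):
    """The three R^2 definitions from the list, all variances/MSEs divided by n."""
    r2_corr = np.corrcoef(yhat, y)[0, 1] ** 2      # 1) squared correlation
    r2_var = np.var(yhat) / np.var(y)              # 2) var(yhat) / var(y); np.var uses 1/n
    rmsec2 = np.mean((y - yhat) ** 2)              # RMSEC^2 with 1/n
    r2_rmsec = 1.0 - rmsec2 / np.var(y)            # 3) 1 - RMSEC^2 / var(y)
    return np.array([r2_corr, r2_var, r2_rmsec])

# Hypothetical calibration set: 30 samples, 3 predictors with a nonzero mean,
# and a response with a large offset.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 3)) + 2.0
y = X @ np.array([1.0, -0.5, 2.0]) + 20.0 + 0.3 * rng.normal(size=30)

# (a) Mean-centered model: center X and y, fit, then add the y mean back.
Xc, yc = X - X.mean(axis=0), y - y.mean()
b_c = np.linalg.lstsq(Xc, yc, rcond=None)[0]
yhat_centered = Xc @ b_c + y.mean()

# (b) Non-centered model (assumed): least squares on the raw data, no intercept.
b_raw = np.linalg.lstsq(X, y, rcond=None)[0]
yhat_raw = X @ b_raw

centered = r2_three_ways(y, yhat_centered)
raw = r2_three_ways(y, yhat_raw)
print("centered    :", centered.round(4))
print("non-centered:", raw.round(4))
```

With the centered fit the three numbers agree to rounding error. Without an intercept they generally differ: definition 3 can even be negative and definition 2 can exceed 1. This happens because the sum-of-squares identity behind their equivalence requires the residuals to average to zero, which a no-intercept fit does not guarantee.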

My questions are these: What is the true definition of R^2? Which definition is best suited to non-centered data?

Thank you for your kind assistance.

Leonardo Ciaccheri
