PCA and MANOVA

NIR Discussion Forum » Bruce Campbell's List » Chemometrics » PCA and MANOVA


hlmark
Posted on Friday, August 26, 2005 - 7:21 am:   

Taccagno - Sorry. I didn't know how to say that I couldn't answer the question as you stated it, because I sensed confusion between the fact that all the calibration methods under discussion (MLR, PCR, PLS) are "linear" methods, and the fact that data may be non-linear. Linearity or non-linearity in the data has nothing to do with the linearity of the algorithm; that's determined solely by the nature of the algorithm. This is a point you'll understand better when you learn about how the algorithms work.

Not that non-linearity of the data is a good thing. But that can be handled in any of several ways, one of which is to use a non-linear algorithm. Generally that's not recommended, though, except for research purposes; non-linear algorithms are much more complicated, more difficult to use properly and more fraught with potential problems than linear algorithms.

But there are other ways to deal with non-linearity in the data, some of which I mentioned earlier. These allow the data to be used with the simpler and more readily-available software that implements the linear algorithms.

There's a big difference between multivariate calibration and polynomial calibration. Here again I refer you to Draper and Smith, or to my own book: "Principles and Practice of Spectroscopic Calibration" (Wiley). Possibly some of the confusion is created by the fact that there is a similarity in the derivations of multivariate calibration and polynomial calibration. Nevertheless they are very different animals, both in the nature of the data they are applied to, and the nature of the results you obtain. You will never solve a multivariate problem by using polynomial calibration. You must use a multivariate algorithm, which, in practice, means MLR, PCR or PLS. BTW, polynomial calibration is also a linear (and univariate) algorithm, again despite the fact that the data may not be linear. But if you confuse polynomial calibration with multivariate regression (MLR), you'll never straighten yourself out.
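To make the distinction concrete, here is a minimal sketch in Python with NumPy (a tool choice made for illustration, not something used in this thread; all the numbers are made up). Both fits go through the same linear least-squares solver. What differs is the data: polynomial calibration has one measured variable and uses its powers as columns, while MLR has several measured variables, each entering to the first power.

```python
import numpy as np

# Polynomial calibration: ONE measured variable x, with powers of x as columns.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_poly = 1.0 + 2.0 * x + 0.5 * x**2           # curved, noise-free response (made up)
A_poly = np.column_stack([np.ones_like(x), x, x**2])
coef_poly, *_ = np.linalg.lstsq(A_poly, y_poly, rcond=None)

# Multivariate calibration (MLR): SEVERAL measured variables, each to the first power.
x1 = np.array([0.1, 0.4, 0.3, 0.9, 0.7, 0.2])
x2 = np.array([1.2, 0.8, 1.5, 0.3, 0.6, 1.1])
y_mlr = 0.5 + 3.0 * x1 - 1.0 * x2             # made-up response
A_mlr = np.column_stack([np.ones_like(x1), x1, x2])
coef_mlr, *_ = np.linalg.lstsq(A_mlr, y_mlr, rcond=None)

print("polynomial coefficients:", coef_poly)
print("MLR coefficients:       ", coef_mlr)
```

In both cases the model is linear in the coefficients, which is why polynomial calibration still counts as a linear algorithm even though the fitted curve bends.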

Howard

\o/
/_\

Taccagno
Posted on Friday, August 26, 2005 - 1:23 am:   

Howard ,

thanks, and I know I need to learn a lot; I am starting by just asking the experts.
Thanks for the reference given.
Training: unfortunately I don't have any budget for that (real life is not always that funny!).

Question one: answered, thanks.
Question two: I think I'll go for the manual once again.
Question three: data transformation I understand; PLS doesn't need (meaningless!! thanks! ;) )
linearity - well, I need to understand more about PLS, but for MLR and PCR I am a bit puzzled.
You said linearity is not needed but it is beneficial (doesn't this mean something like it is needed?!).
Please correct me when I am wrong (don't say meaningless, that is killing me!):
During PCR, PCs are extracted from the X matrix independently of the Y matrix/vector (this is called data reduction). The next step is the regression (standard MLR) of the components (latent variables) on the Y matrix. MLR is definitely based on a linear equation. Other regressions like polynomial and factorial do exist, but still the function needs to be monotonic (is this English?). Maybe I am speculating too much just to take my face out of the mud. Sorry.
As far as I know, problems related to separate-slopes regression should be solved by splitting the data into more than one model.
The question is: could you tell me where I am wrong? I don't have access to experts that often.

Thanks

hlmark
Posted on Thursday, August 25, 2005 - 7:50 am:   

Taccagno - Oy! - You've got a lot to learn about!!

Let's take your questions one at a time:

1) what makes PLS different from PCR when the Y matrix is monovariate ?

ans: The algorithm used. PCR computes factors from the spectra alone, PLS includes constituent information in the factor computation. They're otherwise very similar approaches. I recommend you read up on chemometric calibration. A good starting book that doesn't use any math is Richard Kramer's "Chemometric Techniques for Quantitative Analysis", Marcel Dekker. After that, there are several intermediate-level books. Now that we're getting into the fall, you might think about taking a short course; if you're planning to go to FACSS or EAS, there are courses offered at both of those. In fact, I'll be giving an NIR workshop at FACSS where I'll talk about just this topic (among others). If you're not planning to go to any conferences, then maybe you should.
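A minimal sketch of that difference, using Python with NumPy and synthetic data (the channel layout, sample count and noise level are all assumptions made up for illustration). It computes only the first factor of each method: the first PCR loading comes from the spectra alone, while the first PLS1 weight vector is proportional to X'y and so is steered by the constituent values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_channels = 200, 8
X = rng.normal(size=(n_samples, n_channels))
X[:, 0] *= 3.0                       # channel 0: largest variance, unrelated to y
y = 2.0 * X[:, 3] + 0.05 * rng.normal(size=n_samples)   # y depends on channel 3 only

Xc = X - X.mean(axis=0)              # mean-centre spectra
yc = y - y.mean()                    # mean-centre constituent values

# PCR: first loading is the direction of maximum variance in X; y is never consulted.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pcr_loading = Vt[0]

# PLS1: first weight vector is proportional to X'y, so y takes part in the factor.
pls_weight = Xc.T @ yc
pls_weight /= np.linalg.norm(pls_weight)

pcr_channel = int(np.argmax(np.abs(pcr_loading)))
pls_channel = int(np.argmax(np.abs(pls_weight)))
print("channel dominating first PCR factor:", pcr_channel)
print("channel dominating first PLS factor:", pls_channel)
```

In this made-up data set the first PCR factor follows the high-variance channel, while the first PLS factor follows the channel that actually carries the constituent information. Real spectra are rarely this clear-cut, and both methods go on to compute further factors.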


2) Can PCA scores be used for a logistic regression ?

ans: read your user's manual, since you seem to be stuck with one software package. I tend to suspect that you wouldn't want to, though.


3) Does PLS need linearity between the X components and Y components ?

ans: Oy! The question is almost meaningless, I'm afraid. Let me give a brief answer to a somewhat different question: all the standard quantitative algorithms (PCR, PLS, MLR) are "linear" algorithms, in the mathematical sense. That is, they assume the correct model is of the form:

a0 + a1X1 + a2X2 + ...

None of the algorithms require that the relations between the data be linear in order to use them. Obviously, however, (a) linear relation(s) in the data will give better (more accurate) results than an otherwise-equivalent nonlinear relation. Depending on the data, linearity can sometimes be achieved in one of a number of ways, including pretreating (transforming) the data. Sometimes one non-linear variable can be corrected by the non-linearity in a different variable, again depending on the data. You should read Draper & Smith's "Applied Regression Analysis", Wiley - they have a very good discussion on detecting and correcting non-linearity (and all sorts of other effects in calibration); while they only consider MLR, most of their discussion of testing calibrations can be applied to all algorithms.

But then you again need software to implement it. This all seems moot, though, since your main interest is in identification.
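As an illustration of the pretreatment point in the answer to question 3, here is a small sketch in Python with NumPy. The logarithmic response and the helper name fit_rms are assumptions invented for the example, not anything from this discussion. The same linear algorithm is run twice, once on the raw variable and once on the transformed one.

```python
import numpy as np

x = np.linspace(1.0, 50.0, 25)
y = 2.0 + 3.0 * np.log(x)            # made-up response that is curved in x

def fit_rms(predictor, response):
    """Least-squares fit of a0 + a1*predictor; returns coefficients and RMS residual."""
    A = np.column_stack([np.ones_like(predictor), predictor])
    coef, *_ = np.linalg.lstsq(A, response, rcond=None)
    rms = np.sqrt(np.mean((response - A @ coef) ** 2))
    return coef, rms

coef_raw, rms_raw = fit_rms(x, y)            # linear algorithm, untreated data
coef_log, rms_log = fit_rms(np.log(x), y)    # same algorithm, transformed data

print("RMS residual, raw x:     ", rms_raw)
print("RMS residual, log(x):    ", rms_log)
```

The algorithm runs without complaint in both cases; only the quality of the result changes. Which transformation is appropriate, if any, depends on the data.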

Howard

\o/
/_\

Taccagno
Posted on Thursday, August 25, 2005 - 2:05 am:   

Howard ,

what makes PLS different from PCR when the Y matrix is monovariate ?

Can PCA scores be used for a logistic regression ?

Does PLS need linearity between the X components and Y components ?

Thanks

Taccagno
Posted on Thursday, August 25, 2005 - 1:21 am:   

Thanks Howard ,

I have been spending the night thinking and I think I have been able to shoot dead my procedure.
One of the assumptions of ANOVA methods is that the response variables are normally distributed, and this is not the case with spectra. Don't know if this is true but it sounds like it.
I will follow your suggestion on PLS but I need to build up more knowledge.
Back soon
Thanks

hlmark
Posted on Wednesday, August 24, 2005 - 9:54 am:   

Taccagno - yes and no. It's these sorts of issues that make it better to use dedicated software, even when something else will "work". I'm sure you've heard the saying "when all you have is a hammer, every problem looks like a nail". If you're limited in the tools available then you make do with what you've got, and try to make it work. But even when it "works" the results are likely to be less than wonderful. There is fairly inexpensive software available for discriminant problems, if you want to contact me off the discussion group we can talk about it.

As for the technical part of your question: 0 and 1 are legitimate values from the continuum. By making the synthetic constituent values match the sample type, you're sort of tricking the program into relating the spectrum to the sample type by using the "constituent" as a surrogate for that. When it comes time to do the final classification, it's up to you to see which "constituent" is predicted as unity and select the matching sample type.
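A short sketch of that final step in Python with NumPy. The predicted values, the labels and the idea of reporting a distance from unity are all hypothetical, made up to show the mechanics:

```python
import numpy as np

sample_types = ["A", "B", "C"]        # hypothetical sample types

# Hypothetical predicted "constituent" values:
# one row per unknown sample, one column per sample type.
predicted = np.array([
    [0.93, 0.10, -0.04],
    [0.08, 0.15, 0.81],
    [0.45, 0.52, 0.02],
])

# Select the type whose "constituent" is predicted closest to unity.
nearest = np.argmin(np.abs(predicted - 1.0), axis=1)
assigned = [sample_types[i] for i in nearest]

# How far the winning value is from unity; a large distance flags a doubtful call.
margin = np.abs(predicted[np.arange(len(predicted)), nearest] - 1.0)

for name, m in zip(assigned, margin):
    print(f"assigned type {name}, distance from unity {m:.2f}")
```

The third row shows why the decision stays with you: neither 0.45 nor 0.52 is convincingly "unity", so the program's pick deserves a second look.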

Howard

\o/
/_\

Taccagno
Posted on Wednesday, August 24, 2005 - 9:36 am:   

Thanks Howard,

just a note: doesn't PLS call for continuous variables? Classifications are discrete (even if they can be associated with continuous matching criteria - in ANOVA that would be the result of the F statistic and the P-value).
What am I missing ?

RGDS

hlmark
Posted on Wednesday, August 24, 2005 - 7:53 am:   

Taccagno - To find out if Statistica V. 6 includes qualitative (identification) algorithms, your best bet is to contact the company - or check their web site. It's always better to use software that's designed to do the job you want than to try to "make do" with software that's designed for something else, just like with anything.

In the event that you need to use PLS, you can do it more straightforwardly. How to do it will depend on whether there are in fact only two types of samples, or more than two.

If there are only two types of samples, then create a variable as a "constituent" that you assign the value 0 (zero) to for one type of sample and 1 (unity) for the other type of sample. Then use that as the "constituent" for your PLS calibration.

If there are more than two types of samples then you have to create as many of these "constituent" variables as you have sample types, and assign the value 1 to each variable to correspond to the sample type. Then you'll have to use PLS-2 to calibrate the data set.

In each case, the "predicted" value from the calibration will be 1 for samples that are of the corresponding sample type and zero otherwise.
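The coding scheme can be sketched as follows in Python with NumPy. The labels and measurements are made up, and ordinary least squares stands in for the PLS-2 calibration purely to keep the example short; it is a substitute for illustration, not the method recommended above.

```python
import numpy as np

labels = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])
types = np.unique(labels)             # sorted unique sample types

# One "constituent" column per sample type: 1 where the sample is of that type, else 0.
Y = (labels[:, None] == types[None, :]).astype(float)

# Made-up measurements, two variables per sample.
X = np.array([
    [1.0, 0.1], [0.9, 0.0], [1.1, -0.1],      # type A
    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1],      # type B
    [-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9], # type C
])

# Stand-in for the PLS-2 calibration: least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
B, *_ = np.linalg.lstsq(A, Y, rcond=None)
predicted = A @ B

print(np.round(predicted, 2))
```

With well-separated types like these, each row of the predicted matrix comes out near 1 in the column of its own type and near 0 elsewhere. With only two types, a single 0/1 column is enough.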

Howard

\o/
/_\

Taccagno
Posted on Wednesday, August 24, 2005 - 7:10 am:   

Being a stat dummy, please don't shoot me.

I want to apply (if possible) a multivariate technique to non-spectral data. Actually I have a device that measures the growth profile of a reacting mixture of chemicals. There is a polymerization ongoing.
For each second there is a height being measured.
Not that far away from a spectrum (time for wavelength and height for absorbance).

I know there are many classification techniques for multivariate arrays of data (SIMCA, PLS-DA). I don't have any doubt about their power, but unfortunately my software doesn't have them.
I am working with Statistica 6 (does it?)

Here comes the question :

would it be correct to run a PCA over a number of spectra, collect the scores for a number of components that represent 100% of the total variation, and then run a MANOVA with the independent variable being the membership of each spectrum in group one or two, and so on?

In detail :

10 height scans ( just dropping figures on the table )
5 out of the ten coming from sample A
5 out of the ten from sample B
run PCA
extract scores on a 100 % representation
run MANOVA with dependent variables being the scores on each component and independent variable being the sample A or B
read Wilks' lambda
if the above is significant then the height scans belong to different populations (samples), otherwise not.
The magic would be to classify an unknown spectrum based on the above given procedure.
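For readers who want to try the steps above, here is a rough sketch in Python with NumPy on simulated height scans (the rise curves, noise level and number of retained components are all assumptions made up for illustration). One deliberate departure from the list: with ten scans in two groups, PCA gives at most nine non-zero components and the within-group matrix has rank at most eight, so keeping every component would force Wilks' lambda to zero whatever the data. The sketch therefore keeps only two components. It computes the statistic only, not its significance, and it does not classify an unknown scan.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_points = 5, 60
t = np.linspace(0.0, 1.0, n_points)

# Hypothetical rise profiles: sample B grows somewhat faster than sample A.
group_a = np.array([1 - np.exp(-3.0 * t) for _ in range(n_per_group)])
group_b = np.array([1 - np.exp(-4.0 * t) for _ in range(n_per_group)])
scans = np.vstack([group_a, group_b]) + 0.02 * rng.normal(size=(10, n_points))
groups = np.array([0] * n_per_group + [1] * n_per_group)

# PCA via SVD of the mean-centred scans.
centred = scans - scans.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
n_keep = 2                            # assumption: far fewer than (scans - groups)
scores = U[:, :n_keep] * s[:n_keep]

# Wilks' lambda = det(W) / det(T): within-group over total sums of squares and products.
T = scores.T @ scores                 # scores are already mean-centred
W = np.zeros((n_keep, n_keep))
for g in np.unique(groups):
    d = scores[groups == g] - scores[groups == g].mean(axis=0)
    W += d.T @ d
wilks_lambda = np.linalg.det(W) / np.linalg.det(T)

print("Wilks' lambda:", wilks_lambda)
```

A value near zero means most of the variation in the scores lies between the groups; a value near one means the groups overlap. Turning lambda into a P-value requires the usual F approximation, which a statistics package would supply.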

Please don't tell me to buy a different software.

Thanks
