Treatment of outliers in PCA to answer a specific question

scottanthonyparsons@gmail.com's picture

Hello there,
I have a series of NIR spectra and am trying to determine whether the broad composition of the material converged or diverged over the series (as simple as that). I do this using distances (e.g. Bray-Curtis) in the full ordination space (8 axes, from kernel PCA) and regressions. I have run this analysis both with and without the spectral outliers. How trustworthy are the models that include outliers for answering my broad question? Intuitively, the outliers may contribute quite a bit to the conclusion of divergence or convergence in this material. I understand that my PC axes will be driven by these outliers, but since I go on to treat each axis equally to cover the variance without overfitting in both cases, I'm not sure it is a problem. Is it?
Cheers!
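
One way to make the with/without-outlier comparison concrete is sketched below. It assumes the ordination scores are already computed and grouped by step in the series, and it reads "convergence" as within-step dispersion shrinking over the series. Euclidean distance is swapped in for Bray-Curtis here because ordination scores can be negative. The function names and the data are made up for illustration and are not taken from the original analysis.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_pairwise_distance(scores):
    """Average distance between all pairs of score vectors at one step."""
    pairs = list(combinations(scores, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def dispersion_trend(steps):
    """Slope of within-step dispersion against step index.

    A negative slope suggests convergence, a positive one divergence.
    """
    dispersion = [mean_pairwise_distance(s) for s in steps]
    return ols_slope(list(range(len(steps))), dispersion)


# Made-up 2-axis scores for three steps of a series (illustration only).
steps = [
    [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)],
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)],
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
]
# Same data with one hypothetical outlier added to the last step.
steps_with_outlier = steps[:2] + [steps[2] + [(30.0, 30.0)]]

slope_clean = dispersion_trend(steps)
slope_outlier = dispersion_trend(steps_with_outlier)
print(slope_clean, slope_outlier)
```

In this toy data the single outlier flips the sign of the slope, which is the kind of sensitivity the question is about. The sketch only covers the distance and regression step; it does not show how outliers change the PC axes themselves.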

td's picture

Hello Scott,
WELCOME to the NIR forum!
I'm not a chemometrics expert, but I do have an interest in measures of similarity. I had not heard of "Bray-Curtis" and had to look it up in Wikipedia!
It told me that Bray-Curtis (BC) should not be considered a distance measure.
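
The usual reason given is that Bray-Curtis dissimilarity can violate the triangle inequality, which a true distance must satisfy. A minimal check, using three small non-negative vectors chosen purely for illustration:

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity for non-negative vectors: sum|u-v| / sum(u+v)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Three illustrative vectors (not spectra).
a, b, c = (1, 0), (0, 1), (1, 1)

d_ab = bray_curtis(a, b)
d_ac = bray_curtis(a, c)
d_cb = bray_curtis(c, b)

# A true distance would require d_ab <= d_ac + d_cb.
violates_triangle = d_ab > d_ac + d_cb
print(d_ab, d_ac + d_cb, violates_triangle)
```

Here going from a to b directly is "longer" than going via c, so BC is better described as a dissimilarity than a distance.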
Before it is possible to give you any sort of answer, a few more details are needed. How many spectra are in your set? How many wavelengths (wavenumbers) did you scan? When you said "series", is this a time series or something else? 6 PCs is rather a small number for NIR data. We would often start with 20.
I have been working on something vaguely similar. This is a new method for quantitative analysis in which we use a large database of analysed samples and compare their spectra with the spectrum of the unknown. We do this by calculating similarity values between the spectrum of the unknown sample and the spectra of every member of the database. With some modification it might be possible to utilise it for your problem.
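
As a rough illustration of the general idea only, the sketch below ranks database spectra by their similarity to an unknown. The similarity measure used in the method described above is not stated in this post, so Pearson correlation is an assumption made purely for the example, as are the function names and the tiny made-up spectra.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length spectra."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def rank_by_similarity(unknown, database):
    """Return (sample_id, similarity) pairs, most similar first."""
    sims = [(sid, pearson(unknown, spec)) for sid, spec in database.items()]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)


# Hypothetical database of analysed samples (4 "wavelengths" each).
database = {
    "A": [0.10, 0.20, 0.40, 0.30],
    "B": [0.40, 0.30, 0.10, 0.20],
    "C": [0.12, 0.22, 0.38, 0.33],
}
unknown = [0.11, 0.21, 0.41, 0.31]

ranking = rank_by_similarity(unknown, database)
for sample_id, similarity in ranking:
    print(sample_id, round(similarity, 3))
```

Note that correlation ignores a constant baseline offset, which may or may not be what is wanted for a given data set.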
As a first guess at your question, I think you should be worried about the effect of outliers, which can cause serious problems with PC calculations.
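
A small two-variable sketch of why this matters: a single extreme point can rotate the first principal axis away from the direction that describes the bulk of the data. The data are made up, ordinary (linear) PCA is used rather than kernel PCA to keep the arithmetic transparent, and the closed-form angle only applies to this 2-D case.

```python
from math import atan2, degrees


def first_pc_angle(points):
    """Angle (degrees from the x-axis) of the first principal axis of 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed form for the leading eigenvector of a 2x2 covariance matrix.
    return degrees(0.5 * atan2(2 * sxy, sxx - syy))


# Made-up data lying almost along the x-axis.
clean = [(-2, 0.1), (-1, -0.1), (0, 0.0), (1, 0.1), (2, -0.1)]
# The same data plus one hypothetical outlier far off in y.
with_outlier = clean + [(0, 10)]

angle_clean = first_pc_angle(clean)
angle_outlier = first_pc_angle(with_outlier)
print(round(angle_clean, 1), round(angle_outlier, 1))
```

Without the outlier the first axis lies close to the x-axis; with it, the first axis swings to almost vertical, so scores on "PC1" no longer mean the same thing in the two analyses.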
Hope to hear from you soon.
 
Best wishes,
Tony