NIR Discussion Forum: Savitzky-Golay derivatives / Unscrambler 9

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 126
Registered: 10-2002

Posted on Sunday, September 30, 2007 - 10:42 pm:

Christian,

In the case where you had a data vector of 9 points, you should be able to obtain second derivatives with convolution functions of 9 or 7 points. Of course they won't be very interesting, and especially, you will not be able to obtain the full listing of the convolution factors. You need a Y vector of at least 17 points, so why don't you just use one with something more realistic, like 1001 points?
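The arithmetic behind the 17-point figure can be checked with a short R sketch. It assumes that output is only computed where the whole window fits inside the data (no edge padding or extrapolation); the function names are illustrative and not taken from any package.

```r
# Number of positions where a window of `window` points fits entirely
# inside a vector of `n` points (assumes no edge padding or extrapolation).
valid_positions <- function(n, window) max(n - window + 1, 0)

# Shortest impulse vector that shows every factor of a `window`-point
# filter: the impulse sits in the centre and the window slides fully past it.
min_impulse_length <- function(window) 2 * window - 1

print(valid_positions(9, 9))    # 1 position: only the centre point
print(valid_positions(9, 7))    # 3 positions
print(min_impulse_length(9))    # 17 points
```

With a 9-point vector, a 9-point filter yields a single value and a 7-point filter yields three, so the full set of convolution factors cannot appear in the output.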

Please be careful with your nomenclature. In working with derivatives, gaps are often used with segments in segment-gap derivatives. These are a special type of derivative with their own convolution functions. They use an algorithm that averages all the data points within a segment specified by a number of points or wavelength interval, and these intervals are separated by gaps, also specified by a number of points or wavelength interval. The first derivative is then the difference between the averages of 2 intervals separated by the gap. A second derivative is the result of taking the averages in 3 segments, each separated by a gap of n points, and summing these with coefficients of 1, -2, 1. Of course, the proper normalizations must be applied.

It sounds to me as if you are calling a gap (L-1)/2, where L is the length of the Savitzky-Golay convolution interval, and I don't think that is appropriate.
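The segment-gap second derivative described above can be sketched in R as follows. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the segment length is taken to be odd, and the result is returned without normalization, since the post does not spell out which normalization is meant.

```r
# Hypothetical helper: raw (unnormalized) segment-gap second derivative.
# segment = number of points averaged in each segment (assumed odd)
# gap     = number of points skipped between adjacent segments
gap_segment_deriv2 <- function(y, segment = 3, gap = 2) {
  half <- (segment - 1) / 2
  step <- segment + gap            # distance between segment centres
  n <- length(y)
  stopifnot(segment %% 2 == 1, n >= 2 * (step + half) + 1)
  out <- rep(NA_real_, n)          # edges where the segments do not fit stay NA
  seg_mean <- function(centre) mean(y[(centre - half):(centre + half)])
  for (i in (step + half + 1):(n - step - half)) {
    # averages of 3 segments combined with coefficients 1, -2, 1
    out[i] <- seg_mean(i - step) - 2 * seg_mean(i) + seg_mean(i + step)
  }
  out
}

y <- (1:30)^2                      # quadratic test signal, true 2nd derivative = 2
d <- gap_segment_deriv2(y, segment = 3, gap = 2)
print(d[7:10])                     # raw values
print(d[7:10] / (3 + 2)^2)         # divided by the squared centre spacing
```

For this quadratic test signal, dividing the raw value by the squared centre-to-centre spacing recovers the true second derivative of 2. That is one plausible normalization, and individual software packages may apply a different one.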

Best regards,
Dave

Christian Mora (cmora)
Intermediate Member
Username: cmora

Post Number: 17
Registered: 2-2007

Posted on Sunday, September 30, 2007 - 9:50 pm:

In my previous post I meant 7 points instead of 9 (left gap=3, right gap=3, + inner point)
CM

Christian Mora (cmora)
Intermediate Member
Username: cmora

Post Number: 16
Registered: 2-2007

Posted on Sunday, September 30, 2007 - 9:48 pm:

David / Howard

I have tested the SG derivative algorithm from PLS_toolbox (www.eigenvector.com), and the results are similar to those reported by Unscrambler v 9 and up, except that it explicitly requires the number of points used in the window to be < 1/2 the number of variables (columns).

For example, testing the vector y=[0,0,0,0,1,0,0,0,0]' with 2nd deriv and 9 points (i.e. gaps=3) results in a warning message indicating that the number of points was changed to 3 (instead of 9).

Christian

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 161
Registered: 9-2001

Posted on Sunday, September 30, 2007 - 5:30 pm:

Dave - I'm not sure which part of my comment you're taking exception to. The worst thing I said was that I didn't know if they included the analytic correction in their equations. I must have missed it. I'm pleased to find out that they did.

I think that at least some of the general statistical packages include the correction, even if the programs marketed to the NIR and spectroscopic communities don't.

But yes, indeed, let's hear from people using other programs, and find out who is making the correction and who not.

Incidentally, apparently when I put a "delta" between angle brackets it shows up correctly on the discussion page, but disappeared from the e-mail distributions, making it appear as though I wrote that the derivative was Y/X. But you can check what I actually did write below, in the discussion.

\o/
/_\

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 125
Registered: 10-2002

Posted on Sunday, September 30, 2007 - 5:05 pm:

Howard,

I have to take exception to one of your comments: Savitzky and Golay do account for the <delta>X term, in their equation VII. It adds a division by (dx)^s, where s is the order of the derivative. This correction is practically universally ignored; it is not employed by any of the software packages I have tested. My guess is that only your personally written routines include it. I have tested Grams, Pirouette, and The Unscrambler. In fact, I would like to hear from users of other software packages: are there any that include this term?

Best regards,
Dave

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 160
Registered: 9-2001

Posted on Sunday, September 30, 2007 - 10:49 am:

Christian - the normalization factor is included in the original Savitzky-Golay paper, as explicit numerical values in the tables; I'm not sure if that paper gives the origin of the normalization values, though.

However, there is another correction value that also needs to be included; some software includes it and some doesn't. As Dave pointed out, as long as you're using one software package that always computes the derivative in a consistent manner, it doesn't matter to the final result, but I'm in favor of doing things the "right" way - and it WILL matter if you want to compare results from programs that deal with these corrections in different ways.

This other correction arises from the fact that we don't compute true derivatives, we compute approximations, as finite differences. In this computation, a "derivative" should be computed as <delta>Y/<delta>X. Several software programs only compute the <delta>Y term, and forget to divide by <delta>X.

The failure to include the <delta>X divisor is also true of the Savitzky-Golay paper (and all the ones based on it). As I said above, however, it will be necessary to take that into account when you want to compare results from different programs.
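The effect of the <delta>X divisor can be illustrated with a small R sketch. It assumes evenly spaced data and a 5-point quadratic Savitzky-Golay second derivative built by ordinary least squares; `sg_weights` is an illustrative helper, not a function from any package mentioned in this thread.

```r
# Illustrative helper: Savitzky-Golay weights from a least-squares fit of a
# polynomial of order `poly` to a window of 2*half + 1 equally spaced points.
sg_weights <- function(half, poly, deriv) {
  X <- outer(-half:half, 0:poly, "^")
  A <- solve(crossprod(X), t(X))       # (X'X)^-1 X'
  factorial(deriv) * A[deriv + 1, ]    # derivative at the window centre
}

dx <- 0.5                              # spacing between data points
x <- seq(0, 10, by = dx)
y <- x^2                               # true second derivative is 2 everywhere

w <- sg_weights(half = 2, poly = 2, deriv = 2)
centre <- 11                           # index of x = 5
raw <- sum(w * y[(centre - 2):(centre + 2)])   # derivative per index step
scaled <- raw / dx^2                           # divide by (dx)^s with s = 2

print(raw)
print(scaled)
```

Here the raw convolution result is in units of "per data point squared" (0.5 for this spacing), and only after dividing by dx^2 does it equal the true second derivative of 2. Two programs that differ in whether they apply this divisor will disagree by exactly this factor.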

\o/
/_\

Christian Mora (cmora)
Member
Username: cmora

Post Number: 15
Registered: 2-2007

Posted on Saturday, September 29, 2007 - 11:02 pm:

Dear all;
By adding a "normalization" factor to the convolution function as suggested by Dongsheng (i.e. factorial(deriv)*convolve(....; in my code), I got the same results that Unscrambler currently reports for every derivative I have tested. Thanks. I would appreciate any further thoughts on why I have to normalize the resulting data, though.
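A minimal R sketch of where the factorial comes from, assuming the same least-squares construction as the sgfilter routine posted in this thread: row deriv+1 of the pseudo-inverse returns the fitted polynomial coefficient of order deriv, whereas the derivative of that order at the window centre is factorial(deriv) times the coefficient. The comparison vector holds the commonly tabulated 9-point quadratic second-derivative factors; the variable names are illustrative.

```r
half  <- 4      # 4 points left and 4 right: a 9-point window
poly  <- 2      # quadratic fit
deriv <- 2      # second derivative

X <- outer(-half:half, 0:poly, "^")    # design matrix of window offsets
A <- solve(crossprod(X), t(X))         # (X'X)^-1 X'

coef_weights  <- A[deriv + 1, ]                   # weights for coefficient a_2
deriv_weights <- factorial(deriv) * coef_weights  # weights for d2y/dx2 = 2 * a_2

# Tabulated 9-point quadratic second-derivative factors and their norm
tabulated <- c(28, 7, -8, -17, -20, -17, -8, 7, 28) / 462

print(all.equal(deriv_weights, tabulated))        # weights with the factorial
print(all.equal(2 * coef_weights, tabulated))     # same statement, written out
```

Without the factorial, the second-derivative output is half of the tabulated result, the third-derivative output is one sixth, and so on, which matches the factors reported in this thread.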
Thanks
CM

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 159
Registered: 9-2001

Posted on Friday, September 28, 2007 - 3:40 pm:

Dave is correct to point out that anyone reading the original Savitzky-Golay paper with a view toward implementing it must augment his reading with the Steinier paper. What Dave didn't mention, though, is that both the Steinier paper and the original Savitzky-Golay paper have inherent limitations.

Some of these limitations were addressed by Madden (Anal. Chem.; 50(9); p.1383-1386 (1978)), who provided formulas to extend the use of the S-G method to the inclusion of any arbitrary number of data points in the convolution. Unfortunately, as it turned out, the Madden paper also had an error.

I discussed this whole situation in Spectroscopy; 18(12); p.105-111 (2003), including a correction to the Madden paper, reprinted as chapter 56 in the new "Chemometrics In Spectroscopy", Elsevier (2007), just out. I guess this also serves as an announcement of the book.

As Murphy would have it, however, while checking my references, I found that in the Spectroscopy article, the page number in the reference to the Madden paper was incorrect (although in the book it was correct). It seems you can't publish on S-G without making some sort of mistake!

\o/
/_\

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 124
Registered: 10-2002

Posted on Friday, September 28, 2007 - 11:22 am:

Hi Christian,

It is not clear to me from your R routines where the problem might lie. I am not an R programmer. It occurs to me that the best way for you to test your routine is to see whether you obtain the proper results when you apply your second derivative routine to an impulse function. This is an artificial data vector with all data points = 0, except for a 1 at an internal point (for example the center, or another convenient location). Then the result should be a series of values that are the convolution factors, surrounded by zeros, and centered on the data point that had the 1 originally. For odd derivatives, the order of the coefficients is reversed. You can compare your results with the expected results, which are tabulated in the original Savitzky (1964) paper. Note that there are many errors in this paper, which were largely corrected in a paper by Steinier et al. (1972). For example, the quadratic 2Ders are correct in Savitzky, but the quartic 2Ders are incorrect.

It is easy to test any software package and see whether the proper results are obtained, if you can import spectra from an Excel spreadsheet. It is easy to make an artificial spectrum in Excel that is an impulse function. The convolution factors for smoothing and derivatives up to order 5 and 25 points long are tabulated in the Savitzky and Steinier papers, cited below.
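The impulse test described above can be sketched in R as follows. This is a minimal illustration, not the routine of any particular package: `sg_weights` and `apply_weights` are hypothetical helpers, the filter is a 5-point quadratic one, and the edge points where the window does not fit are simply left at zero.

```r
# Illustrative helper: least-squares Savitzky-Golay weights for a window of
# 2*half + 1 points, including the factorial(deriv) factor.
sg_weights <- function(half, poly, deriv) {
  X <- outer(-half:half, 0:poly, "^")
  A <- solve(crossprod(X), t(X))
  factorial(deriv) * A[deriv + 1, ]
}

# Slide the weights along y; positions where the window does not fit stay 0.
apply_weights <- function(y, w) {
  half <- (length(w) - 1) / 2
  n <- length(y)
  out <- numeric(n)
  for (k in (half + 1):(n - half)) {
    out[k] <- sum(w * y[(k - half):(k + half)])
  }
  out
}

impulse <- numeric(21)
impulse[11] <- 1                       # single 1 at the centre, zeros elsewhere

r2 <- apply_weights(impulse, sg_weights(2, 2, 2))   # second derivative
r1 <- apply_weights(impulse, sg_weights(2, 2, 1))   # first derivative

print(round(r2[9:13] * 7, 6))          # 2 -1 -2 -1 2
print(round(r1[9:13] * 10, 6))         # 2 1 0 -1 -2
```

The second-derivative output reproduces the symmetric factors (2, -1, -2, -1, 2)/7 around the impulse, while the first-derivative output shows the factors (-2, -1, 0, 1, 2)/10 in reversed order, as noted above for odd derivatives. Everything outside those five points is zero.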

Best regards,
Dave

Savitzky, Abraham, and Marcel J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 36 (8): 1627-1639 (1964).

Steinier, Jean, Yves Termonia and Jules Deltour, Comments on Smoothing and differentiation of Data by Simplified Least Square Procedure. Anal. Chem. 44 (11): 1906-1909 (1972)

Dongsheng Bu (dbu)
Intermediate Member
Username: dbu

Post Number: 16
Registered: 6-2006

Posted on Friday, September 28, 2007 - 9:06 am:

Dear Christian,

Could it be that the normalization by the factorial of the derivative order (e.g. 0! = 1 for smoothing only, 3! = 6 for the 3rd derivative) is left out of your Savitzky-Golay filter?

Best regards,
Dongsheng

Christian Mora (cmora)
Member
Username: cmora

Post Number: 14
Registered: 2-2007

Posted on Thursday, September 27, 2007 - 9:46 pm:

Dear David/Dongsheng;

BTW: The reference I used for programming the routine is given in "Signal Processing in Analytical Chemistry" by Peter Wentzell and Christopher Brown in "Encyclopedia of Analytical Chemistry" pp: 9764-9800, year 2000

Christian

Christian Mora (cmora)
Member
Username: cmora

Post Number: 13
Registered: 2-2007

Posted on Thursday, September 27, 2007 - 9:40 pm:

Dear David/Dongsheng;

I have tested my little routine with the "m5spec" dataset
that you can download from the "corn" dataset in
http://software.eigenvector.com/Data/Corn/index.html.

My routine is written in R (www.r-project.org) and I'm not
sure if you will be able to test it but here it is just in case:

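sgfilter <- function(data, gap = 2, poly = 2, deriv = 2) {
  # data: matrix with one spectrum per row; window length is 2*gap + 1
  data <- as.matrix(data)
  cnames <- colnames(data)
  # Design matrix: powers 0..poly of the window offsets -gap..gap
  X <- outer(-gap:gap, 0:poly, "^")
  # Least-squares pseudo-inverse (requires the MASS package); row k+1 holds
  # the weights for the polynomial coefficient of order k
  A <- MASS::ginv(crossprod(X, X)) %*% t(X)
  n <- nrow(data)
  # Row deriv+1 is used as is (no factorial(deriv) scaling is applied)
  j <- deriv + 1
  filtmat <- NULL
  for (i in 1:n) {
    # type = "f" (filter) slides the weights along the spectrum
    filtmat <- rbind(filtmat, convolve(data[i, ], A[j, ], type = "f"))
  }
  # Pad the gap points lost at each end with zeros
  filtmat <- cbind(matrix(0, n, gap), filtmat, matrix(0, n, gap))
  colnames(filtmat) <- cnames
  return(filtmat)
}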

I ran the following transformations in Unscrambler v 9.2

1) SG smoothing with left=3, right=3, poly=2
2) SG smoothing with left=4, right=4, poly=0
3) SG 1st derivative with left=3, right=3, poly=2
4) SG 2nd derivative with left=4, right=4, poly=2

The results of my routine showed that for 1) to 3) the results agreed
with Unscrambler; however for 4) my results are 1/2 of those reported
by Unscrambler (but they agree with version 8)

I would appreciate your comments

Christian

Dongsheng Bu (dbu)
Member
Username: dbu

Post Number: 15
Registered: 6-2006

Posted on Thursday, September 27, 2007 - 12:08 pm:

Dear Christian,

Dave gave the reason; I will just add a few things here. In The Unscrambler v9.2 and later, the numerical values of Savitzky-Golay derivatives are larger by a factor of 2 for the 2nd derivative, by a factor of 6 for the 3rd derivative, and by a factor of 24 for the 4th derivative. Also, "Number of smoothing points" is included as an input in addition to the old "Number of left points" and "Number of right points" in the Averaging panel.

Best regards,
Dongsheng

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 123
Registered: 10-2002

Posted on Thursday, September 27, 2007 - 11:29 am:

Hi Christian,

You are correct in noticing this factor of 2. It arises because somehow the Unscrambler algorithm for the second derivative gave a result exactly 1/2 of the values expected according to the formulas in the original Savitzky-Golay paper. All other derivative orders and smoothing were not affected. They are simply correcting a long-standing error, and this will not affect anyone using regression models and predictions using the same version of The Unscrambler for calibration derivation and use.

I am surprised that you were able to obtain the same result as in the early versions of The Unscrambler. Other software packages I have tested reported the proper values, including Grams and Pirouette. Where did you obtain the simple convolution algorithm you tested?

Best regards,
Dave

Christian Mora (cmora)
Member
Username: cmora

Post Number: 12
Registered: 2-2007

Posted on Thursday, September 27, 2007 - 11:00 am:

Dear all;
Does anyone know why the values obtained with the SG derivative algorithm in Unscrambler 9 and up are twice those obtained in previous versions of the same software?
Actually, by implementing a simple convolution algorithm I got the same values as those reported in earlier versions, so I don't understand the reason for the change in newer versions.
Thanks for any comment