NIR Discussion Forum: Savitzky-Golay Tails


Jim Burger (jburger)
Junior Member
Username: jburger

Post Number: 9
Registered: 11-2010

Posted on Wednesday, June 22, 2011 - 1:57 pm:

Ahhhh.... this forum is amazing. Thanks to all for the feedback, especially Scott and Howard for the references. What seems like such a trivial problem gets exponentially more complicated with time, but simpler with the right experts at hand! Thanks!

Back to coding...

Jim

Scott Ramos (lsramos)
Junior Member
Username: lsramos

Post Number: 9
Registered: 1-2007

Posted on Wednesday, June 22, 2011 - 12:38 pm:

Jim,

Here is one approach:

Gorry, P. A. (1990). "General least-squares smoothing and differentiation by the convolution (Savitzky-Golay) method." Analytical Chemistry 62(6): 570-573.
Smoothing and differentiation of large data sets by piecewise least-squares polynomial fitting are now widely used techniques. The calculation speed is very greatly enhanced if a convolution formalism is used to perform the calculations. Previously tables of convolution weights for the center-point least-squares evaluation of 2m+1 points have been presented. A major drawback of the technique is that the end points of the data sets are lost (2m points for a 2m+1 point filter). Convolution weights have also been presented in the special case of initial-point values. In this paper a simple general procedure for calculating the convolution weights at all positions, for all polynomial orders, all filter lengths, and any derivative is presented. The method, based on the recursive properties of Gram polynomials, enables the convolution technique to be extended to cover all points in the spectrum.
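
For anyone coding this up, here is a minimal Python sketch of the Gram-polynomial recursion as I understand it from Gorry's paper. The function names (gram_poly, gen_fact, gorry_weights) are made up for illustration; m is the half-width of the window, n the polynomial degree, t the position inside the window (-m..m) where the value is wanted, and s the derivative order. Check it against the paper before relying on it.

```python
def gram_poly(i, m, k, s=0):
    """s-th derivative of the order-k Gram polynomial over 2m+1 points, at position i."""
    if k < 0 or s < 0:
        return 0.0
    if k == 0:
        return 1.0 if s == 0 else 0.0
    denom = k * (2 * m - k + 1)
    term1 = (4 * k - 2) / denom * (
        i * gram_poly(i, m, k - 1, s) + s * gram_poly(i, m, k - 1, s - 1)
    )
    term2 = (k - 1) * (2 * m + k) / denom * gram_poly(i, m, k - 2, s)
    return term1 - term2


def gen_fact(a, b):
    """Generalised factorial a * (a - 1) * ... * (a - b + 1); equals 1 when b == 0."""
    result = 1
    for j in range(a - b + 1, a + 1):
        result *= j
    return result


def gorry_weights(m, n, t, s=0):
    """Convolution weights for the s-th derivative at position t of a degree-n fit."""
    weights = []
    for i in range(-m, m + 1):
        w = sum(
            (2 * k + 1) * gen_fact(2 * m, k) / gen_fact(2 * m + k + 1, k + 1)
            * gram_poly(i, m, k) * gram_poly(t, m, k, s)
            for k in range(n + 1)
        )
        weights.append(w)
    return weights


center = gorry_weights(m=2, n=2, t=0)    # usual centre-point weights
first = gorry_weights(m=2, n=2, t=-2)    # weights for the very first point
```

For a 5-point quadratic, t=0 reproduces the familiar tabulated weights (-3, 12, 17, 12, -3)/35, and t=-2 gives (31, 9, -3, -5, 3)/35 for the first point of the spectrum. Derivative weights (s > 0) are per unit point spacing, so divide by the spacing raised to the power s.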

Scott

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 445
Registered: 9-2001

Posted on Wednesday, June 22, 2011 - 12:30 pm:

Good point, Tony. There are ways other than polynomials to fit functions to data, Fourier components being one of them. Mathematicians have come up with dozens of different sets of orthogonal functions that can be used. Principal Components can also be considered one of those sets, albeit one where the orthogonal functions are determined by the data, rather than by a priori mathematical considerations, as Fourier and polynomial functions are.

BTW - have you heard from Fred lately? I tried sending him some e-mails and they bounced. Don't know if anybody from the community has been in touch with him lately.

\o/
/_\

Tony Davies (td)
Moderator
Username: td

Post Number: 262
Registered: 1-2001

Posted on Wednesday, June 22, 2011 - 11:40 am:

Jim,

I know that Fred McClure would want me to tell you that you can do it by transforming the spectrum to the Fourier domain and then multiplying by the derivative number. However, it may be prone to noise. It is discussed by Bill Hruschka in Chapter III of Williams and Norris, "NIR Technology" (page 53).
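
A rough sketch of the Fourier route, assuming "multiplying by the derivative number" means multiplying each Fourier coefficient by (i*omega) raised to the derivative order; that reading is an assumption, not taken from the Hruschka chapter. The names dft and fourier_derivative are invented for illustration, and a naive standard-library transform is used only to keep the example self-contained (an FFT library would be the usual choice).

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, fine for a short illustration."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * math.pi * k * j / n)
            for j, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def fourier_derivative(y, dx, order=1):
    """Differentiate equally spaced samples via multiplication by (i*omega)**order."""
    n = len(y)
    spectrum = dft(y)
    for k in range(n):
        # Signed frequency index: 0, 1, 2, ... followed by the negative frequencies.
        freq = k if k < (n + 1) // 2 else k - n
        if n % 2 == 0 and k == n // 2 and order % 2 == 1:
            factor = 0.0  # the Nyquist term has no well-defined odd derivative
        else:
            omega = 2 * math.pi * freq / (n * dx)
            factor = (1j * omega) ** order
        spectrum[k] *= factor
    return [z.real for z in dft(spectrum, inverse=True)]


# One full period of a sine, so the periodic assumption holds exactly.
n_points = 32
dx = 2 * math.pi / n_points
sine = [math.sin(j * dx) for j in range(n_points)]
slope = fourier_derivative(sine, dx)  # should track cos(x)
```

The discrete transform treats the spectrum as periodic, so a real spectrum whose two ends do not match will ring near the tails, which is exactly the region in question. The high-frequency terms are also the ones amplified most, hence the noise sensitivity.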

Best wishes,

Tony

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 444
Registered: 9-2001

Posted on Wednesday, June 22, 2011 - 11:33 am:

Jim - I had almost forgotten, but I had written up some of this myself, previously, although for specific cases. Those should be simple enough to expand to the general case. You might want to take a look at:

Spectroscopy; 18(12); p.106-111 (2003)

In that article I present the numerical matrices that you could (matrix-)multiply by the data to obtain the coefficients for the indicated derivative order/polynomial degree, but it is simple and obvious enough how to extend it to other cases. The article includes the numeric computations as an illustrative example.

Also, in:

Spectroscopy; 21(1); p.44-53 (2006)

I show the derivation of the equations for a quadratic polynomial, although there again, it should be straightforward to extend that to the general case.

\o/
/_\

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 443
Registered: 9-2001

Posted on Wednesday, June 22, 2011 - 9:32 am:

Jim - Some "dusting" might be in order. Alternatively, if you're familiar with the derivation of MLR, you can do it yourself for the polynomial case. Just replace the basic equation of MLR:

Y = b0 + b1X1 + b2X2 + b3X3 + ...

with the polynomial expression:

Y = b0 + b1X + b2X^2 + b3X^3 + ...

Express that as the errors:

e = b0 + b1X + b2X^2 + b3X^3 + ... - Y

Then repeat the Least Squares derivation for this polynomial. You'll wind up with a matrix where the elements are the various powers of the data, instead of the different variables such as you have in MLR. Formally, however, it follows the same steps (sum the squared errors over the data points, take the derivatives of that sum with respect to the various bi and set them to zero, then solve the resulting algebraic equations). You can use any degree polynomial you care to, and any size subset of your data to sum over. If you set your data up properly, you could even use an MLR program to do the computations.

Then apply the resulting equation to the data; the result will be the polynomial (based on the polynomial degree and number of data points you used) that best fits the data in the Least-Squares sense. Then you can calculate the value of the polynomial for any value of X, including the end points of the data. Strictly speaking, it's not even an extrapolation since the calculations never go beyond the range of the data, although as Hector suggested, you can even extrapolate beyond the range of the data (with the usual caveats!).
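
The recipe above can be sketched in plain Python as follows. This is only an illustration under my own assumptions: the function names are invented, X is taken as the integer offset within the window (-m..m), and the normal equations are solved by simple Gaussian elimination rather than by an MLR package.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients b0..b_degree from the normal equations."""
    n = degree + 1
    # Normal equations: sum_k b_k * sum(x^(j+k)) = sum(y * x^j) for each j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    b = [0.0] * n
    for r in reversed(range(n)):
        b[r] = (rhs[r] - sum(a[r][c] * b[c] for c in range(r + 1, n))) / a[r][r]
    return b


def eval_polynomial(b, x):
    """Value of b0 + b1*x + b2*x^2 + ... at x."""
    return sum(coef * x ** k for k, coef in enumerate(b))


def smooth_with_tails(y, half_width, degree):
    """Centred fit in the interior; the first/last window is reused for the tails."""
    n = len(y)
    if n < 2 * half_width + 1:
        raise ValueError("spectrum is shorter than the window")
    xs = list(range(-half_width, half_width + 1))
    out = []
    for j in range(n):
        # Clamp the window centre so the window never runs off the data.
        center = min(max(j, half_width), n - 1 - half_width)
        window = y[center - half_width:center + half_width + 1]
        b = fit_polynomial(xs, window, degree)
        # In the tails, j - center is non-zero: the fit is evaluated off-centre.
        out.append(eval_polynomial(b, j - center))
    return out
```

Refitting at every point is wasteful (the weights depend only on the position within the window, so they could be computed once), but it keeps the correspondence with the derivation easy to see. In the interior this gives the same values as the usual centre-point weights.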

\o/
/_\

David W. Hopkins (dhopkins)
Senior Member
Username: dhopkins

Post Number: 198
Registered: 10-2002

Posted on Wednesday, June 22, 2011 - 7:45 am:

Hi Jim,

I agree with Howard: you could do the calculations yourself, and it would be fairly straightforward in Matlab, I think. I know I have also seen papers where the authors developed the math for the end regions. I think there are closed-form solutions for the convolution factors for those points too. However, the smoothing is not as effective as you approach the ends of the tails. Sorry, like Howard, I cannot remember the names of the authors.

On the other hand, I have never needed to do those calculations, because those first and last few wavelengths have not been critical. I have just backed off the appropriate standoff to avoid the undefined regions in my models. Is it really critical to your application to utilize those end regions? Usually they are just garbage anyway, although I admit that sometimes I have desired to extend the useful wavelength range.

Best regards,
Dave

Hector Casal (casalh)
Junior Member
Username: casalh

Post Number: 8
Registered: 1-2007

Posted on Wednesday, June 22, 2011 - 7:44 am:

The application of SG to a spectrum can be summarized as: calculation of coefficients which form an array X and then the convolution of that array with the spectrum array Y.
One easy way to extend the operation to the very 'ends' of the spectrum is to make the indexed elements outside the ranges of X and Y equal to zero. This is the way that some convolution subroutines have been implemented. It works.
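
A small sketch of the zero-padding idea, assuming the tabulated 5-point quadratic/cubic smoothing weights (-3, 12, 17, 12, -3)/35; the function name convolve_zero_padded is invented for illustration. It does return a value for every point, but note what happens at the ends of a flat spectrum.

```python
from fractions import Fraction

# Tabulated 5-point quadratic/cubic smoothing weights.
WEIGHTS = [Fraction(w, 35) for w in (-3, 12, 17, 12, -3)]


def convolve_zero_padded(y, weights):
    """Same-length filtering, treating samples outside the spectrum as zero.

    The weights are symmetric here, so convolution and correlation coincide.
    """
    m = len(weights) // 2
    out = []
    for j in range(len(y)):
        total = 0
        for offset, w in enumerate(weights, start=-m):
            idx = j + offset
            if 0 <= idx < len(y):  # out-of-range samples contribute zero
                total += w * y[idx]
        out.append(total)
    return out


flat = [1] * 9
smoothed = convolve_zero_padded(flat, WEIGHTS)
```

For the flat spectrum of ones, the interior points come back as exactly 1, but the first two points come back as 26/35 and 38/35. So the tails are filled in, yet they are distorted by the padding value unless the spectrum really does go to zero at its ends.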

Jim Burger (jburger)
Junior Member
Username: jburger

Post Number: 8
Registered: 11-2010

Posted on Wednesday, June 22, 2011 - 5:25 am:

Thanks Howard. Yes, I understand the polynomial fitting. I was just hoping someone would have references to a 'non-centered' window (i.e. fitting the first point to a window extending only to the right, not centered on it). Guess I'll have to dust off my numerical analysis books, and a few brain cells!

-Jim

Howard Mark (hlmark)
Senior Member
Username: hlmark

Post Number: 442
Registered: 9-2001

Posted on Wednesday, June 22, 2011 - 4:54 am:

Jim - yes, that can be done. The basic process underlying S-G is fitting a polynomial to the data with a Least Squares calculation. S-G extended that by pre-calculating the resulting coefficients; the degree of the polynomial and the number of data points are parameters of the calculation (as well as the order of derivative, if you are using it to calculate derivatives).

But if you explicitly do the curve-fitting over some region of the data as a separate step, you can calculate any point on the function. By doing the fitting to the points at the end of the data, you can calculate the function all the way to the end of the data, although strictly speaking it's no longer "Savitzky-Golay". I've seen this written up but I can't recall where, offhand. You should be able to find it in books on numerical analysis.

There are limitations in that the end-points of the fitted function tend to be more subject to variability, and thus have larger error than the middle points.
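
To put a number on that for one case: assuming independent noise of equal variance at every point, the variance of a filtered value is the noise variance times the sum of the squared weights. The sketch below uses a 5-point quadratic fit, with the centre-point weights (-3, 12, 17, 12, -3)/35 and the first-point weights (31, 9, -3, -5, 3)/35; noise_gain is a name invented for illustration.

```python
from fractions import Fraction

center_weights = [Fraction(w, 35) for w in (-3, 12, 17, 12, -3)]
first_weights = [Fraction(w, 35) for w in (31, 9, -3, -5, 3)]


def noise_gain(weights):
    """Variance of the filtered value relative to the per-point noise variance."""
    return sum(w * w for w in weights)


center_gain = noise_gain(center_weights)  # Fraction(17, 35), about 0.49
first_gain = noise_gain(first_weights)    # Fraction(31, 35), about 0.89
```

In this case the centre point keeps about 49% of the original noise variance while the first point keeps about 89%, so the end value is smoothed far less than the middle of the window.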

\o/
/_\

Jim Burger (jburger)
Junior Member
Username: jburger

Post Number: 7
Registered: 11-2010

Posted on Wednesday, June 22, 2011 - 2:40 am:

This is more a numerical analysis question than NIR, but here goes:

SavGol and similar filters are based on a 'sliding window' of width 'W' over a spectrum of 'N' equally spaced points. Many papers discuss the original theory and minor corrections to the filter coefficients.

But the filter is used to compute replacement values for the middle point of the sliding window.

Can anyone suggest some theoretical basis for computing the 'half window' tails at each end of the spectrum? What is an optimal way of 'extending' the spectrum so that these tail regions can be transformed as well?

Thanks in advance,
Jim