We discuss signals \(S(t)=S(t_n)=S_n\) which are measured continuously over time \(t\) or at certain times \(t_n\) (most often with equally spaced time steps \(\Delta t\)). Such a signal contains the wanted information \(f(t)\) and unavoidably some noise \(N(t)\), i.e.
\[S(t) = f(t) + N(t)\]
Most often one is interested in extracting the pure information \(f(t)\), but quite often the noise \(N(t)\) contains relevant information about the measured system as well. So in general one wants to split the signal into its "pure" component and its noise component, which can be a very difficult task.
Let us in a first example assume that \(f(t)\) is a constant \(f_0\). In this case one would calculate the average of the signal \(\langle S\rangle\) either as an integral or as a sum, depending on whether the data has been measured continuously or at certain time intervals. We find
\[\langle S(t)\rangle = \langle f(t) + N(t)\rangle = \langle f(t)\rangle + \langle N(t)\rangle = f_0 + N_0\]
The second equal sign holds since averaging is a linear operation. If \(N_0 = 0\) we have found the pure information \(f_0\) easily. Noise with \(N_0 = 0\) is called unbiased noise, and most systems show such noise. If the noise is biased, it gets very difficult to extract \(f_0\); one needs sequences of measurements at different noise and/or signal conditions, but this is not the topic of this section.
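A minimal numerical sketch of this averaging strategy (the concrete values, e.g. \(f_0 = 2.5\) and the noise level, are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

f0 = 2.5                       # constant "pure" information (illustrative value)
n = 10_000                     # number of sampling points
N = rng.normal(0.0, 0.3, n)    # unbiased noise: N0 = <N> = 0
S = f0 + N                     # measured signal S_n = f0 + N_n

# <S> = <f> + <N> = f0 + N0 approaches f0 for unbiased noise
print(S.mean())                # ~2.5
```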
Here we will discuss strategies for separating signal and noise when \(f(t)\) shows an explicit time dependence. In general this is not possible; one needs additional information and/or has to make certain assumptions about \(f(t)\) and/or \(N(t)\). Most of these assumptions are related to differences in the power spectra of \(f(t)\) and \(N(t)\). If you do not understand the last statement, don't bother; the remainder of this section deals with exactly these concepts.
Most often one just assumes that \(f(t)\) changes slowly in time while \(N(t)\) changes quickly. Either the physics of the system tells us which model \(m(\vec{a},t)\) (with \(\vec{a}\) some typically unknown fitting parameters) to use for describing \(f(t)\), or one uses a general fitting model like a polynomial \(p(\vec{a},t)\). Both approaches have in common that they should (only) be able to describe slow changes in time (slow processes). For linear problems we learned in section 3.7 how to find the optimal fitting parameters \(\vec{a}_{opt}\). In general we have to solve
\begin{eqnarray*}
\mbox{min} \quad \chi^2(\vec{a}) & = & \langle S(t)-m(\vec{a},t)|S(t)-m(\vec{a},t)\rangle \\
& = & \langle f(t)-m(\vec{a},t)|f(t)-m(\vec{a},t)\rangle \\
& & + \langle f(t)-m(\vec{a},t)|N(t)\rangle + \langle N(t)|f(t)-m(\vec{a},t)\rangle \\
& & + \langle N(t)|N(t)\rangle
\end{eqnarray*}
The last equal sign used \(S(t)-m(\vec{a},t) = \left(f(t)-m(\vec{a},t)\right) + N(t)\) and the bilinearity of the scalar product. If the fit is good, \(f(t)-m(\vec{a},t)\) and especially \(\langle f(t)-m(\vec{a},t)|f(t)-m(\vec{a},t)\rangle\) are small. \(\langle f(t)-m(\vec{a},t)|N(t)\rangle\) will be small for two reasons:
1. \(\left(f(t)-m(\vec{a},t)\right)\) hopefully has very small values, i.e. it is close to the zero function.
2. \(\left(f(t)-m(\vec{a},t)\right)\) and \(N(t)\) vary on different time scales, i.e. they have non-overlapping power spectra (again this funny word). One calls such functions uncorrelated.
The second reason is much more important and fundamental than the first one! Before we quantify the correlation between two functions, we just state that under the conditions described above the minimal \(\chi^2(\vec{a}_{opt})\) is just the variance of the unbiased noise \(N(t)\). Effectively, by calculating \(S(t) - m(\vec{a}_{opt},t)\) we have "simulated" an experiment with a constant signal \(f(t)=f_0\) where \(f_0=N_0=0\), and of course the variance of this signal is the variance of its noise.
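A small sketch of this statement, assuming a slow (here: quadratic) pure signal and a general polynomial fitting model; all concrete numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

t = np.linspace(0.0, 10.0, 2000)
f = 1.0 + 0.5 * t - 0.02 * t**2           # slowly varying pure signal
N = rng.normal(0.0, 0.2, t.size)          # unbiased noise, variance 0.04
S = f + N

# general polynomial model p(a, t); least squares yields a_opt
a_opt = np.polyfit(t, S, deg=2)
residual = S - np.polyval(a_opt, t)

# the minimal chi^2 per point is (approximately) the noise variance
print(residual.var())                     # ~0.04
```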
For quantifying the correlation between two functions \(f(t)\) and \(g(t)\) we introduce the cross correlation function (CCF)
\[\mbox{CCF}(f,g)(\tau) := \frac{1}{T} \int_0^T f(t-\tau)g(t)\, dt\]
Up to a scaling factor and a time reversal in one argument we already know this function; it is the convolution of two functions as discussed e.g. in section 3.15.1. Again up to some scaling factors and some tedious problems with the definition of the limits of the integrals, the Fourier transform of CCF\((f,g)\) is just the product \(\overline{F}\,G\) of the Fourier transforms \(F = \mathcal{F}(f)\) and \(G = \mathcal{F}(g)\); the complex conjugate (for real \(f\): \(\overline{F}(\omega) = F(-\omega)\)) stems from the time reversal.
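A minimal numerical check of this relation, assuming periodic (circular) boundary conditions to sidestep the problems with the integration limits; the two test functions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n = 1024
t = np.arange(n)
f = np.sin(2 * np.pi * t / 128)                 # slow component
g = rng.normal(0.0, 1.0, n)                     # fast component

# direct evaluation: CCF(f,g)(tau) = (1/T) sum_t f(t - tau) g(t),
# with periodic boundary conditions (circular correlation)
ccf = np.array([np.mean(np.roll(f, tau) * g) for tau in range(n)])

# via Fourier space: FT(CCF) = conj(F) * G (up to the 1/T scaling)
F, G = np.fft.fft(f), np.fft.fft(g)
ccf_fft = np.fft.ifft(np.conj(F) * G).real / n

print(np.allclose(ccf, ccf_fft))                # True
```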
Applying the CCF operation to a single function \(f(t)\) yields the auto correlation function (ACF), i.e.
\[\mbox{ACF}(f)(\tau)=\mbox{CCF}(f,f)(\tau) := \frac{1}{T} \int_0^T f(t-\tau)f(t)\, dt\]
Obviously ACF\((f-f_0)(0)\) is just the variance of \(f\). More importantly, \(\mathcal{F}(\mbox{ACF}(f))\) is \(|F(\omega)|^2\), and this function is called the power spectrum of the function \(f(t)\).
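Again a small sketch (circular version, zero-mean Gaussian test data as an arbitrary choice) verifying both statements numerically:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

n = 1024
f = rng.normal(0.0, 1.0, n)                     # zero-mean test function, f0 = 0

# ACF(f)(tau) = CCF(f,f)(tau), circular version
acf = np.array([np.mean(np.roll(f, tau) * f) for tau in range(n)])
print(acf[0])                                   # ~1.0: ACF(f - f0)(0) = variance

# FT of the ACF is the power spectrum |F|^2 (up to the 1/T scaling)
F = np.fft.fft(f)
print(np.allclose(np.fft.fft(acf).real, np.abs(F)**2 / n))   # True
```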
Two functions \(f\) and \(g\) have non-overlapping power spectra if for all \(\omega\) either \(F(\omega)=0\) or \(G(\omega)=0\). Then \(\overline{F}(\omega)\,G(\omega)=0\) for all \(\omega\) and thus CCF\((f,g)(\tau)=0\) for all \(\tau\), i.e. \(f\) and \(g\) are completely uncorrelated. A slowly changing function \(f(t)\) has non-vanishing \(F(\omega)\) only for small \(\omega\), while noise typically has non-vanishing \(\mathcal{F}(N)(\omega)\) also for large \(\omega\). Of course, sampling rate and sampling time must be chosen appropriately to allow for a separation of signal and noise.
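The following sketch illustrates this: one random signal is split into a slow and a fast part with disjoint frequency bands (the cut-off 0.05 is an arbitrary illustrative choice), and the two parts are indeed completely uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

n = 1024
w = np.fft.fftfreq(n)                           # frequency grid

# split one random signal into a slow part (small |omega|) and a fast
# part (large |omega|); their power spectra do not overlap by construction
spec = np.fft.fft(rng.normal(0.0, 1.0, n))
f = np.fft.ifft(np.where(np.abs(w) < 0.05, spec, 0)).real    # slow
g = np.fft.ifft(np.where(np.abs(w) >= 0.05, spec, 0)).real   # fast

ccf = np.array([np.mean(np.roll(f, tau) * g) for tau in range(n)])
print(np.max(np.abs(ccf)))                      # ~1e-17: completely uncorrelated
```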
Let us assume that we have a friendly system for which all this works, i.e.
\[S(t) - m(\vec{a}_{opt},t) \approx N(t)\]
So performing an auto correlation analysis of the left hand side, resp. calculating its power spectrum, allows one to analyze the noise of the system. Even if \(f(t)\) is expected to be zero, often a fitted approximation function is subtracted; this approach is called "drift elimination" and allows one to eliminate e.g. temporal changes of control parameters like the temperature during the measurement, or to remove certain transient processes. One could call such phenomena "noise" as well, but they typically stem from other uncertainties of the measurement than the fundamental noise sources which one analyzes by the power spectrum.
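A minimal sketch of drift elimination, assuming a linear (temperature-like) drift on top of pure noise; drift slope and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

n = 8192
t = np.arange(n, dtype=float)
drift = 1e-4 * t                                # slow drift, e.g. temperature
N = rng.normal(0.0, 0.5, n)                     # fundamental noise, variance 0.25
S = drift + N                                   # here the "pure" f(t) is zero

# drift elimination: fit and subtract a low-order polynomial ...
a_opt = np.polyfit(t, S, deg=1)
residual = S - np.polyval(a_opt, t)

# ... then analyze the remaining noise, e.g. via its power spectrum
power = np.abs(np.fft.rfft(residual))**2 / n
print(residual.var())                           # ~0.25, the noise variance
```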
Whole branches of materials science deal intensively with such noise analysis concepts; this section was just meant as a very brief introduction to this topic. In the computer math lecture we will deal with one example of measured data from a diffusion experiment where the data acquisition was done by a computer program with a high sampling rate over a long time. Such data is most useful for a statistical analysis as discussed above.
The last aspect of this section is a more precise definition of "white noise" \(WN(t)\), a widely used term for which we now have, for the first time, the tools for a mathematically correct definition. One definition of white noise is that the time sequence is completely uncorrelated, i.e.
\[\mbox{ACF}(WN)(\tau) = 0 \quad \mbox{if}\quad \tau \neq 0 \]
For a (theoretically) infinite time series ACF\((WN)(0) = \infty\), so the auto correlation function resembles the delta function (resp. the Kronecker \(\delta_{0,i}\) in the discrete case). The Fourier transform of the delta function is a constant, i.e. all frequencies \(\omega\) show up with the same weight in the power spectrum. No frequency (regime) dominates, which explains the meaning of "white" in the characterization of the noise spectrum. Does white noise really exist? Do physical systems show white noise? Think about it!
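A finite sample can only approximate this ideal; a small sketch, using Gaussian pseudo-random numbers as a stand-in for white noise:

```python
import numpy as np

rng = np.random.default_rng(seed=6)

n = 4096
WN = rng.normal(0.0, 1.0, n)                    # finite sample of "white" noise

# the ACF is (approximately) zero for all tau != 0 ...
acf = np.array([np.mean(np.roll(WN, tau) * WN) for tau in range(n)])
print(acf[0])                                   # ~1.0, the variance
print(np.max(np.abs(acf[1:])))                  # ~0.06, -> 0 for large n

# ... so the power spectrum is flat: no frequency regime dominates
power = np.abs(np.fft.fft(WN))**2 / n
print(power[1:n // 2].mean())                   # ~1.0, same weight everywhere
```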