Tutorial Contents

Spectral analysis

Background

Fine-tuning

Averaging

Windowing

Overlapping

Samples and Time Bins

Power Spectrum

Power values

Shifting the viewport

Zero-padding data

Log or linear display

Spectrogram

Standard power spectrum

Analyse your own voice

Spectral analysis within events: fly song

Carrier frequency

Spectral analysis

This section starts with some background information on spectral analysis. You can skip this if you want and go straight to the Dataview Power spectrum or Spectrogram facilities.

Background

Spectral analysis is a technique which estimates the power of a signal at different frequencies. (There is a lot of inconsistency in the literature in how power is expressed in spectral analysis, but in Dataview power is energy per unit time, expressed as the mean square of the signal amplitude.) The signal is first digitized to produce a sequence of sample values, and these are then passed through the fast Fourier transform (FFT) algorithm. (Analog signals, which are continuous in both time and amplitude, are digitized by passing them through an analog-to-digital converter (ADC). This turns the signal into a series of numbers representing the amplitude of the signal at discrete time intervals set by the sample rate of the ADC.) The algorithm produces a set of equally-spaced frequency bands in a range from 0 (DC) to the Nyquist frequency (half the sampling frequency), and tells us how much power there is within each of these bands. The number of frequency bands is determined by the number of samples passed to the FFT routine – there are exactly half-as-many-plus-one bands as there are samples, so the greater the number of samples passed to the algorithm, the higher the frequency resolution. The FFT algorithm only works on data chunks which are a power-of-two samples in length (e.g. 32, 64, 128 etc.). Short signals can be zero-padded to fill them out to the length required for a particular resolution, but this of course reduces the overall power values that the FFT reports.
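
If you like to see the numbers, here is a minimal sketch of the sample/band relationship written in Python with numpy (illustrative only; not part of Dataview):

    import numpy as np

    fs = 1000.0                         # assumed sample rate, Hz
    n = 256                             # power-of-two chunk length
    x = np.random.randn(n)              # any digitized signal of n samples

    spectrum = np.fft.rfft(x)           # FFT of real-valued input
    freqs = np.fft.rfftfreq(n, d=1/fs)  # centre frequency of each band

    print(len(spectrum))                # n/2 + 1 = 129 bands, from 0 (DC)...
    print(freqs[0], freqs[-1])          # ...up to the Nyquist frequency fs/2 = 500 Hz
    print(freqs[1] - freqs[0])          # band spacing fs/n: more samples -> finer resolution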

Standard spectral analysis combines all the samples in the signal and produces a single 2D graph showing the average power in each frequency band plotted against the centre value of the frequency band. (It is often more informative to plot the logarithm of the power rather than its linear value, since this makes it easier to detect low but still significant power components. However, the display can be switched to linear if desired.) This tells us nothing about any changes in frequency within the signal; it just gives the overall power/frequency profile of the whole signal. It is available through the Power spectrum command on the Analysis menu. However, sometimes we are interested in how the frequency content changes with time within the signal. In this case the signal is divided into successive chunks which are analysed in series to produce a plot of frequency vs power vs time known as a spectrogram. (A spectrogram is sometimes called a sonogram, but that term is now mainly used for medical ultrasound images.) Spectrograms are usually displayed as a 2D graph of frequency vs time, with power (the Z-dimension) being colour-coded. They are also available from the Analysis menu in Dataview.

[Figure: a. voice recording; b. sound spectrum; c. sound spectrogram]
Spectral analysis of the file voice. a. The raw signal derived from a microphone recording of someone speaking. b. The power spectrum of the whole file. c. A spectrogram of the file with time on the x axis, frequency on the y axis, and power colour coded.

Fine-Tuning the Analysis

There are several parameters that fine-tune the analysis that are common to both the power spectrum and spectrogram.

Averaging

If the power spectrum is calculated from a single FFT episode, deep math or a book on Fourier analysis tells us that the uncertainty (standard deviation) of the estimate at each frequency is the same as the estimate itself! The uncertainty can be reduced by averaging successive FFT episodes, but of course this increases the amount of data needed for the analysis, and so reduces the time resolution for a given number of samples. It would seem, for instance, that averaging 4 FFT episodes would need 4 times as many samples as just processing one FFT episode. However, Welch's ingenious overlap method (see below) reduces the number of samples needed.
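
If you want to convince yourself of this, the following quick numerical check (a Python/numpy sketch with an assumed white-noise test signal, not part of Dataview) estimates the spectrum many times over: for a single FFT episode the spread of the power estimate in each frequency bin is about as large as the estimate itself, and averaging four episodes halves it.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_trials = 256, 2000
    x = rng.standard_normal((n_trials, n))             # one white-noise episode per row
    power = np.abs(np.fft.rfft(x, axis=1))**2 / n      # one power estimate per episode, per bin

    bins = power[:, 1:-1]                               # skip the DC and Nyquist bins
    print(bins.std(axis=0).mean() / bins.mean())        # ~1: the std equals the estimate itself
    print(bins.reshape(n_trials // 4, 4, -1).mean(axis=1).std(axis=0).mean()
          / bins.mean())                                # ~0.5: averaging 4 episodes halves it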

Windowing

In principle, Fourier analysis should be applied to an infinitely long section of periodic data, with the length of the FFT segment an exact integer multiple of the cycle period. In practice, of course, this is never the case. The finite length of non-repeating data used in real analyses causes “edge effects” to appear as artefacts in the form of spectral leakage of power into inappropriate frequencies. These artefacts can be reduced by windowing the data. A window, in this context, is a filter that smooths out the start and end edges of a chunk of data by gradually tapering them to low amplitude or zero. In the concept image below (available in file window) the raw signal in the upper trace is multiplied by the window in the middle trace, producing a signal in the lower trace that is full-strength in the middle, but tapers to zero at either edge.

[Figure: FFT window]

There are several different types of window available that can be chosen from the drop-down FFT window list in Dataview, but the default Hamming is often a good choice. You should consult a book on digital signal processing for information on the pros and cons of the different window types (note that the Rectangular window is actually no window at all – just the FFT applied to normal data with a sharp transition at the edges).
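
As a concrete illustration (a Python/numpy sketch of the general idea, not Dataview's own code), windowing is just an element-by-element multiplication of the data chunk by the window before the FFT:

    import numpy as np
    from scipy.signal import get_window

    n = 128
    chunk = np.random.randn(n)              # one segment of raw data
    window = get_window('hamming', n)       # Hamming: the default choice here

    tapered = chunk * window                # full strength in the middle,
                                            # tapering towards zero at both edges
    power = np.abs(np.fft.rfft(tapered))**2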

Overlapping

One problem with windowing is that you “throw away” data at the edges of each segment sent to the FFT routine as they taper away towards zero. However, you can to a certain extent have your cake and eat it, averaging for noise reduction while still maintaining time resolution: you can overlap successive FFT episodes to reduce the amount of data required to achieve a given number of averages. This partially recovers the data attenuated by windowing, and has the added benefit of allowing greater frequency resolution for a given length of data. Obviously, because there is redundant information in the overlapped data, the noise reduction is less than with non-overlapped analysis, but the drop in noise reduction is not as great as might be expected, and the increase in time resolution can be very helpful. Averaging K episodes with 50% overlap reduces the variance by a factor of about 9K/11, as opposed to a factor of K with no overlap (Press et al., 2007). By default Dataview uses an overlap of 50%, but you can experiment with different values selected from the Percentage overlap drop-down list.
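
The sketch below (Python with scipy, using an assumed 50 Hz test tone in noise) shows how 50% overlap almost doubles the number of episodes available for averaging from the same stretch of data:

    import numpy as np
    from scipy.signal import welch

    fs = 1000.0
    t = np.arange(0, 2.0, 1/fs)                       # 2000 samples of data
    x = np.sin(2*np.pi*50*t) + 0.5*np.random.randn(t.size)

    nperseg = 256                                     # power-of-two segment length
    step = nperseg // 2                               # 50% overlap
    print(len(x) // nperseg)                          # 7 episodes with no overlap
    print((len(x) - nperseg) // step + 1)             # 14 episodes with 50% overlap

    f, pxx = welch(x, fs, window='hamming',
                   nperseg=nperseg, noverlap=step)    # overlap-averaged spectrum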

Samples and Time Bins

A spectrogram is produced by combining a sequence of individual power spectrum estimates, each with a specified duration. This duration determines the time bins of the X axis of the graph. At first sight it would seem that the time bin duration would be the same as that for a simple Power spectrum, and that each time bin would be calculated using a set of samples separate from the others. However, this is not the case if FFT episodes are overlapped, since the boundaries between time bins are "smeared" by the overlapping data. The diagram below illustrates the sample distribution within a spectrogram where the Frequency resolution is 64, the Percentage overlap is 50%, the Number to average is 4, and the Number of time bins is 2.

[Figure: time bin overlap diagram]
Sample distribution within a spectrogram. The thick horizontal lines indicate the sections of data passed to the FFT routine, and each of these sections is 128 samples long because the frequency resolution is set at 64. Each power spectrum average is made up of 4 sections with 50% overlap, so each average contains data from 320 separate samples. The total analysis period contains 576 samples, and in the output display, these are divided into two horizontal sections, each 288 samples long. There is thus some bleed-through of information between adjacent time bins in the spectrogram display.
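
The arithmetic in the caption can be reproduced with a small calculation (a hypothetical helper, written here in Python to make the relationships explicit; it assumes the FFT sections simply run on continuously at the chosen overlap):

    def spectrogram_layout(freq_resolution, overlap_pct, n_average, n_time_bins):
        seg = 2 * freq_resolution                      # samples per FFT section
        step = seg * (100 - overlap_pct) // 100        # advance between sections
        per_average = seg + (n_average - 1) * step     # samples feeding one average
        total = seg + (n_average * n_time_bins - 1) * step
        bin_width = total // n_time_bins               # samples per displayed time bin
        return per_average, total, bin_width

    print(spectrogram_layout(64, 50, 4, 2))            # (320, 576, 288), as in the caption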

With this background information we can now look at the analyses in action.

Power Spectrum

[Figure: power spectrum file]
Various sine waves. Trace 1: A pure sine wave with 2.44 Hz frequency and amplitude 1. Trace 2: A pure sine wave of 24.4 Hz frequency and amplitude 3. Trace 3: A mixed sine wave consisting of the arithmetic sum of traces 1 + 2. Trace 4: The first half is the sine wave from trace 2, the second half is the sine wave from trace 1.

The data were constructed using the expression parser within Dataview, and consist of two sine waves in various combinations. You can see the formulae used to construct the data in the trace labels. (Note that these formulae come from an earlier version of Dataview; consult the help file for information about formula syntax for the current version.) The frequencies were chosen to match the centres of the frequency bands resulting from FFT analysis. These depend on the Nyquist frequency of 2 kHz, which in turn depends on the nominal ADC sample rate of 4 kHz.

The power spectrum analysis processes as much of the data displayed in the main view as will fit within the power-of-two sample number constraint. On start-up the default settings are adjusted to maximize the Frequency resolution available from the data with 50% overlap and an Average of at least 4. The user can then adjust the resolution and overlap if desired, but cannot directly set the average since this is constrained by the other values and will always be the maximum possible. In this case the main Viewport is 7500 ms, which allows a resolution of 4096 frequency bins while averaging 6 FFT episodes. The Time needed (used) for the analysis is 7168 ms.
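
The quoted figures can be checked with the same kind of arithmetic (assumed relationships, written out in Python for clarity; this is not Dataview's actual code):

    fs = 4000                             # nominal ADC sample rate, Hz
    seg = 2 * 4096                        # 4096 frequency bins -> 8192-sample episodes
    step = seg // 2                       # 50% overlap
    n_average = 6

    samples_needed = seg + (n_average - 1) * step     # 28672 samples
    print(samples_needed / fs * 1000)                 # 7168 ms, fits in the 7500 ms viewport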

Now let's look at the results:

You should now see a single large spike in power towards the left of the spectrum graph.

[Figure: power spectrum of 2.4 Hz sine wave]
Power spectrum of a 2.441 Hz sine wave.

The similarity of the trace 3 and trace 4 power profiles illustrates an important fact about Fourier analysis: within the analysis you completely lose all information about time. In techno-speak, you have moved from the time domain to the frequency domain, and when you gain one set of information you lose the other.

Power values

Note that the sine wave has an amplitude of 1, and the mean square value of a sine wave of amplitude 1 is 0.5, so the peak power for trace 1 should be 0.5. (Sine wave amplitude is defined as the distance from the mean (0) value to the peak value, so the peak-to-peak value of a sine wave with amplitude 1 is 2.) But in fact the peak power is about 0.37. Why?

As noted above, and as shown in the text output above the graph, some of the power in the 2.44 Hz sine wave is leaking into the adjacent frequency bins. If you sum the 3 power values from bins 4, 5 and 6 in the text output you should get a total power of almost exactly 0.5, which is what it should be. The leakage problem in this case is caused by the windowing, which reduces the accuracy of the power location. Normally, this is an acceptable trade-off because the alternative damage caused by spectral leakage is usually worse. But in this particular case, where the pure sine wave frequency has been tuned to fit the parameters of the analysis, it is better not to window the data.
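
The effect can be reproduced outside Dataview. The sketch below uses scipy's periodogram with a Hamming window; the density output is multiplied by the bin width to give power per bin, which is assumed here to be comparable to Dataview's mean-square convention:

    import numpy as np
    from scipy.signal import periodogram

    fs = 4000.0
    n = 8192                               # gives 4096 + 1 frequency bins
    t = np.arange(n) / fs
    f0 = 5 * fs / n                        # 2.441 Hz: exactly the centre of bin 5
    x = np.sin(2 * np.pi * f0 * t)         # unit amplitude, mean square = 0.5

    freqs, psd = periodogram(x, fs, window='hamming')
    power_per_bin = psd * (fs / n)         # power in each frequency bin

    k = np.argmax(power_per_bin)
    print(freqs[k], power_per_bin[k])      # ~2.44 Hz, ~0.37 in the peak bin
    print(power_per_bin[k-1:k+2].sum())    # bins 4 + 5 + 6 sum to ~0.5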

It must be emphasised that with normal biological data containing a wide and arbitrary range of frequencies, windowing is definitely the correct thing to do. Dataview provides a variety of window options, but to be honest, there is not much to choose between them when analysing biological data (as opposed to some physics or engineering problem where specific trade-offs may be important). The default Hamming option is usually perfectly adequate.

The bottom line is that quantitative interpretation of the individual power values is rarely attempted for biological signals. What is usually important is the relative power within the different frequency bands.

Shifting the Viewport

You can change the main viewport, and hence the data supplied for analysis, using the toolbar buttons within the dialog. (The navigation tools in the main view are inaccessible because the dialog is modal.)

Two things happen. First, the power returns to 4.5 because only the high-power sine wave is now within the viewport and available for analysis. Second, the Number to average drops from 6 to 2. This is because we have fewer available samples, but have kept the same frequency resolution. Note the non-linearity between sample length and number to average, which is due to the overlap algorithm.

We are now warned that we don't have enough data to analyse at this resolution, and the expansion request will be ignored. If we really want to expand the viewport timebase, we will have to manually drop the Frequency resolution.

Zero-padding data

Now we have run off the end of the data file, and the right half of the main viewport is zero-filled. However, this does not affect the location of the peaks in the power spectrum, it just halves their values, since half the signal now has no power.

In this case nothing is gained from the zero-filled data. However, if a recording is just a bit too short to achieve a particular Frequency resolution, then extending the data by zero-filling up to the next power-of-two constraint may double the frequency resolution, with just a little drop in the numerical values of the peak power. And this can sometimes be quite useful.

Log or Linear Display

It is common in spectral power estimation for the power values to be displayed on a logarithmic scale. This lets low-power components remain visible when on a linear scale they would be completely swamped by the high-power components. It also fits with Fechner's law, which states that the perceived intensity of a stimulus scales with the logarithm of its actual intensity, rather than linearly.
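
In other words, the only difference between the two displays is whether the bin powers are plotted directly or after conversion to a logarithmic measure such as decibels (a small Python illustration with made-up bin values):

    import numpy as np

    power = np.array([4.5, 0.03, 7e-4, 2e-6])    # example bin powers spanning many decades
    power_db = 10 * np.log10(power)              # ~[6.5, -15.2, -31.5, -57.0] dB
    # On a linear axis the last three values are invisible next to 4.5;
    # on a logarithmic (dB) axis all four are clearly separated.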

[Figure: a. logarithmic power spectrum; b. linear power spectrum]
a. Logarithmic display of the power spectrum. b. Linear display of the same data.

It is obvious that much more detail is visible in the logarithmic display. There is high power at the low-frequency (left) side of the display, reflecting the DC component of the membrane potential. There is then a section of medium power, reflecting the PSPs (and interference) in the recording. There is then a drop to very low power. This reflects the cut-off frequency of the amplifier used to make the recording. (This indicates that the data have been oversampled in terms of meeting the strict Nyquist criterion; there is no information at these higher frequencies. However, to achieve a useful join-the-dots display in the main view oversampling is essential, so this is not a mistake.) In contrast, in the linear display only the DC component and a small adjacent spike are visible.

We can look at the interesting part of the spectrum by zooming in.

[Figure: a. 50 Hz mains interference power spectrum; b. linear power spectrum, zoomed]
a. Logarithmic display of the power spectrum zoomed in to show the lower frequencies. Interference power spikes are visible at 50, 150, and 200 Hz. Interestingly, the second interference harmonic (100 Hz) has low power. b. Linear display of the same data. Only very low frequency power and the primary interference harmonic are visible.

The logarithmic display shows much more detail, but the linear display draws attention to the dominant features of the spectrum. You should choose your display according to what you wish to emphasize.

Spectrogram

Spectrograms are frequently used to analyse sounds such as speech or bird song. We will look at a different type of sound:

The dialog is quite complex and what follows is only a brief description. If you want more details on a particular option, press F1 on your keyboard to call up the context-sensitive help.

On the right of the dialog are two graphical displays. The large coloured display is the spectrogram itself – the result of the analysis. The horizontal axis is time, and you can see that it is quite “blocky”. In essence, each block represents the result of an FFT analysis on a sequential chunk of data. The number of blocks is set by the Number of time bins. The vertical axis is frequency, with high frequencies at the top and low frequencies at the bottom. Again, the display is “blocky”, and each block represents a frequency band. The number of bands is set by the Frequency resolution (and is constrained to be a power-of-two). The colour of each block in the display reflects the power of the signal within that time and frequency block. The accuracy of the power is largely set by the Number to average, with higher values giving more accurate results. These parameters all interact with each other, and increasing any of them requires more data for analysis. The details of these parameters were described earlier in the Fine-tuning the Analysis section.

[Figure: a. killer whale spectrogram; b. killer whale song power spectrum]
Spectrogram of the song of a killer whale. a. The spectrogram dialog. b. The power spectrum at the time of the scrubber bar.

If you move the mouse over the spectrogram display, the time, frequency and power values at the mouse location are shown just to the left of the display. These values change as you move the mouse. If you want to freeze the read-out (perhaps to copy the numbers to the clipboard), just click the mouse. Click again to un-freeze the display.

Below the spectrogram is a smaller display showing the section of data that is being analysed. The Start time (which is relative to the whole data file) can be adjusted here if desired, either by directly editing it, or by using the adjacent buttons. The end time is read only, since it is determined by other settings. If you had a multi-trace file (which this one isn't) you could set which trace is analysed by changing the Trace ID parameter at the top left of the dialog.

Standard power spectrum

You can produce a standard power spectrum (as described previously) for any region of interest in the spectrogram.

You should now see the display in part (b) of the figure. Note that the power spikes occur in the frequency bands of the horizontal reddish bars under the scrubber in the spectrogram.

Analyse your own voice

If you have a microphone input on your computer, it might be fun to have a look at a spectrogram of your own voice.

You will (probably) see that the spoken word "high" has power in higher frequency components (is further up the display) than "low". This is not a result of some deep AI intelligence about the meaning of the words(!) - it's that the vowel sound in "high" is more squeaky than that in "low". In the unlikely event that you want to hear me, the file is my voice.

[Figure: spectrogram of the words "low" and "high"]
Spectrogram of me saying "low, high".

Spectral analysis within events: fly song

What can you do if the activity that you want to analyse occurs in relatively short bouts, and these are interspersed with activity with rather different characteristics? One solution is to delimit the regions with one type of activity by events in one channel, and the regions with the other type by events in another channel, and then to analyse the two channels separately.

[Figure: fly song]
Clicks and buzzes in a fly song.

The fly can produce two main types of sound: bursts of clicks, which are delimited by events in channel a, and short buzzes, delimited by events in channel b. The individual clicks within the bursts are delimited by events in channel c. Silent periods between songs have been made inactive to reduce file size. The aim is to do a separate spectral analysis for the different components of the song.

The frequency resolution determines the length of the data segments processed by the FFT. The data within each event within the specified event range (from Start at for Count events) are read from file, and segment-length chunks of these data are windowed (i.e. passed through a filter that tapers values at the edges) and then analysed by FFT. Successive segments within each event are overlapped if desired. The power at each frequency is accumulated over the successive segments, and finally the average power for each frequency is reported.

Any data left-over at the end of a segment (including that from events whose total duration is shorter than the segment length) is ignored by default. However, it can be included by checking Pad fragments. In this case, the left-over data is windowed and then the segment is zero-filled to bring it up to the required power-of-two length. This means that genuine data are not wasted and thus the maximum resolution and lowest noise can be achieved. However, the actual value of the power becomes distorted, since the zero-valued data clearly contain no power and so reduce the overall power levels in the averaging process.
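
A sketch of the padding idea (Python/numpy, illustrative only and not Dataview's implementation; the fragment length here is made up):

    import numpy as np
    from scipy.signal import get_window

    seg_len = 2048                           # segment length set by the frequency resolution
    fragment = np.random.randn(200)          # left-over event data, shorter than one segment

    # Pad fragments: window the real data first, then zero-fill to the full
    # power-of-two segment length before the FFT.
    win = get_window('hamming', fragment.size)
    padded = np.zeros(seg_len)
    padded[:fragment.size] = fragment * win

    power = np.abs(np.fft.rfft(padded))**2   # genuine data are used, but the zeros
                                             # dilute the absolute power in the average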

With the resolution set to 1024, each FFT process requires 256 ms of data, and with 50% overlap the FFT segments done value shows that a total of 102 FFT segments have been analysed from this file.

The spectral density (below, a) reveals a broad power range which peaks at about 170 Hz, but which extends up to about 900 Hz before rapidly decreasing. The latter reflects the setting of the analogue filter on the recording device.

[Figure: a. fly clicks; b. fly song buzz spectrum]
Analysis of fly song. a. The power spectrum of click sections of song. b. The power spectrum of buzz sections of song.

Note that the “clicks” which are visually dominant in the recording and occur at about 30 Hz actually contribute very little to the total power. This is because they are swamped by the high-frequency ups-and-downs within each click. To analyse the clicks themselves:

This time the peak at 170 Hz is much more prominent (above, b), reflecting the more consistently dominant frequency in the “buzz” part of the song. There is also a broader harmonic at about twice the dominant frequency. However, there is less high-frequency power than in the “click” sections of the recording. There are very sharp peaks at regular intervals of 100 Hz; these are almost certainly caused by mains interference in the recording.

In this channel the events are tightly focussed around the individual clicks, and each event is only 25 ms long. The FFT requires 256 ms of data, so for this event channel it is essential to Include fragments since none of the events are long enough to provide a full segment of data for FFT analysis. But with 700 events there are enough data to provide a reasonable average. The frequency profile is very similar to that of channel a, but the overall power levels are lower because of all the zero-filling.

Carrier frequency

The frequency with the highest power level in a spectrum is known as the carrier frequency. You can find the carrier frequency for each event in a channel using the Scan events facility. The frequency is stored within the variable value associated with the event.

Note that each event in channel a in the main display now has a number associated with it. This is the carrier frequency of the data within that event. The carrier frequency of each event can then be retrieved through the various event parameter options (graph, histogram, list etc).
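
Conceptually, the carrier frequency of an event is just the peak of that event's spectrum, along the lines of this hypothetical Python sketch (the function name and windowing choice are illustrative assumptions, not Dataview's internal code):

    import numpy as np

    def carrier_frequency(event_samples, fs):
        # Window the event data, take its power spectrum, and return the
        # frequency of the band with the highest power.
        win = np.hamming(event_samples.size)
        power = np.abs(np.fft.rfft(event_samples * win))**2
        freqs = np.fft.rfftfreq(event_samples.size, d=1/fs)
        return freqs[np.argmax(power)]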