Tutorial Contents

Spectral analysis

Background

Fine-tuning

Averaging

Windowing

Overlapping

Samples and Time Bins

Power Spectrum

Power values

Shifting the viewport

Zero-padding data

Log or linear display

Spectrogram

Standard power spectrum

Analyse your own voice

Spectral analysis within events: fly song

Carrier frequency

Spectral analysis

This section starts with some background information on spectral analysis. You can skip this if you want and go straight to the Dataview Power spectrum or Spectrogram facilities.

Background

Spectral analysis is a technique which estimates the power of a signal at different frequencies. (There is a lot of inconsistency in the literature in how power is expressed in spectral analysis, but in Dataview power is energy per unit time, expressed as the mean square of the signal amplitude.) The signal is first digitized to produce a sequence of sample values, and these are then passed through the fast Fourier transform (FFT) algorithm. (Analog signals, which are continuous in both time and amplitude, are digitized by passing them through an analog-to-digital converter (ADC). This turns the signal into a series of numbers representing the amplitude of the signal at discrete time intervals set by the sample rate of the ADC.) The algorithm produces a set of equally-spaced frequency bands in a range from 0 (DC) to the Nyquist frequency (half the sampling frequency), and tells us how much power there is within each of these bands. The number of frequency bands is determined by the number of samples passed to the FFT routine – there are exactly half-as-many-plus-one bands as there are samples, so the greater the number of samples passed to the algorithm, the higher the frequency resolution. The FFT algorithm only works on data chunks which are a power-of-two samples in length (e.g. 32, 64, 128 etc.). Short signals can be zero-padded to fill them out to the length required for a particular resolution, but this of course reduces the overall power values that the FFT reports.
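
If you like to see the numbers, here is a minimal sketch of the sample/band relationship written in Python with numpy (illustrative only; not part of Dataview):

    import numpy as np

    fs = 1000.0                         # assumed sample rate, Hz
    n = 256                             # power-of-two chunk length
    x = np.random.randn(n)              # any digitized signal of n samples

    spectrum = np.fft.rfft(x)           # FFT of real-valued input
    freqs = np.fft.rfftfreq(n, d=1/fs)  # centre frequency of each band

    print(len(spectrum))                # n/2 + 1 = 129 bands, from 0 (DC)...
    print(freqs[0], freqs[-1])          # ...up to the Nyquist frequency fs/2 = 500 Hz
    print(freqs[1] - freqs[0])          # band spacing fs/n: more samples -> finer resolution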

Standard spectral analysis combines all the samples in the signal and produces a single 2D graph showing the average power in each frequency band plotted against the centre value of the frequency band. (It is often more informative to plot the logarithm of the power rather than its linear value, since this makes it easier to detect low but still significant power components. However, the display can be switched to linear if desired.) This tells us nothing about any changes in frequency within the signal; it just gives the overall power/frequency profile of the whole signal. It is available through the Power spectrum command on the Analysis menu. However, sometimes we are interested in how the frequency content changes with time within the signal. In this case the signal is divided into successive chunks which are analysed in series to produce a plot of frequency vs power vs time known as a spectrogram. (A spectrogram is sometimes called a sonogram, but that term is now mainly used for medical ultrasound images.) Spectrograms are usually displayed as a 2D graph of frequency vs time, with power (the Z-dimension) being colour-coded. They are also available from the Analysis menu in Dataview.

[Figure: a. voice recording; b. sound spectrum; c. sound spectrogram]
Spectral analysis of the file voice. a. The raw signal derived from a microphone recording of someone speaking. b. The power spectrum of the whole file. c. A spectrogram of the file with time on the x axis, frequency on the y axis, and power colour coded.

Fine-Tuning the Analysis

There are several parameters that fine-tune the analysis that are common to both the power spectrum and spectrogram.

Averaging

If the power spectrum is calculated from a single FFT episode, deep math or a book on Fourier analysis tells us that the uncertainty (standard deviation) of the estimate at each frequency is the same as the estimate itself! The uncertainty can be reduced by averaging successive FFT episodes, but of course this increases the amount of data needed for the analysis, and so reduces the time resolution for a given number of samples. It would seem, for instance, that averaging 4 FFT episodes would need 4 times as many samples as just processing one FFT episode. However, Welch's ingenious overlap method (see below) reduces the number of samples needed.
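
If you want to convince yourself of this, the following quick numerical check (a Python/numpy sketch with an assumed white-noise test signal, not part of Dataview) estimates the spectrum many times over: for a single FFT episode the spread of the power estimate in each frequency bin is about as large as the estimate itself, and averaging four episodes halves it.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_trials = 256, 2000
    x = rng.standard_normal((n_trials, n))             # one white-noise episode per row
    power = np.abs(np.fft.rfft(x, axis=1))**2 / n      # one power estimate per episode, per bin

    bins = power[:, 1:-1]                               # skip the DC and Nyquist bins
    print(bins.std(axis=0).mean() / bins.mean())        # ~1: the std equals the estimate itself
    print(bins.reshape(n_trials // 4, 4, -1).mean(axis=1).std(axis=0).mean()
          / bins.mean())                                # ~0.5: averaging 4 episodes halves it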

Windowing

In principle, Fourier analysis should be applied to an infinitely long section of periodic data, with the length of the FFT segment an exact integer multiple of the cycle period. In practice, of course, this is never the case. The finite length of non-repeating data used in real analyses causes “edge effects” to appear as artefacts in the form of spectral leakage of power into inappropriate frequencies. These artefacts can be reduced by windowing the data. A window, in this context, is a filter that smooths out the start and end edges of a chunk of data by gradually tapering them to low amplitude or zero. In the concept image below (available in file window) the raw signal in the upper trace is multiplied by the window in the middle trace, producing a signal in the lower trace that is full-strength in the middle, but tapers to zero at either edge.

[Figure: FFT window]

There are several different types of window available that can be chosen from the drop-down FFT window list in Dataview, but the default Hamming is often a good choice. You should consult a book on digital signal processing for information on the pros and cons of the different window types (note that the Rectangular window is actually no window at all – just the FFT applied to normal data with a sharp transition at the edges).
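
As a concrete illustration (a Python/numpy sketch of the general idea, not Dataview's own code), windowing is just an element-by-element multiplication of the data chunk by the window before the FFT:

    import numpy as np
    from scipy.signal import get_window

    n = 128
    chunk = np.random.randn(n)              # one segment of raw data
    window = get_window('hamming', n)       # Hamming: the default choice here

    tapered = chunk * window                # full strength in the middle,
                                            # tapering towards zero at both edges
    power = np.abs(np.fft.rfft(tapered))**2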

Overlapping

One problem with windowing is that you “throw away” data at the edges of each segment sent to the FFT routine as they taper away towards zero. However, you can to a certain extent have your cake and eat it, averaging for noise reduction while still maintaining time resolution: you can overlap successive FFT episodes to reduce the amount of data required to achieve a given number of averages. This partially recovers the data attenuated by windowing, and has the added benefit of allowing greater frequency resolution for a given length of data. Obviously, because there is redundant information in the overlapped data, the noise reduction is less than with non-overlapped analysis, but the drop in noise reduction is not as great as might be expected, and the increase in time resolution can be very helpful. Averaging K episodes with 50% overlap reduces the variance by a factor of about 9K/11, as opposed to a factor of K with no overlap (Press et al., 2007). By default Dataview uses an overlap of 50%, but you can experiment with different values selected from the Percentage overlap drop-down list.
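
The sketch below (Python with scipy, using an assumed 50 Hz test tone in noise) shows how 50% overlap almost doubles the number of episodes available for averaging from the same stretch of data:

    import numpy as np
    from scipy.signal import welch

    fs = 1000.0
    t = np.arange(0, 2.0, 1/fs)                       # 2000 samples of data
    x = np.sin(2*np.pi*50*t) + 0.5*np.random.randn(t.size)

    nperseg = 256                                     # power-of-two segment length
    step = nperseg // 2                               # 50% overlap
    print(len(x) // nperseg)                          # 7 episodes with no overlap
    print((len(x) - nperseg) // step + 1)             # 14 episodes with 50% overlap

    f, pxx = welch(x, fs, window='hamming',
                   nperseg=nperseg, noverlap=step)    # overlap-averaged spectrum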

Samples and Time Bins

A spectrogram is produced by combining a sequence of individual power spectrum estimates, each with a specified duration. This duration determines the time bins of the X axis of the graph. At first sight it would seem that the time bin duration would be the same as that for a simple Power spectrum, and that each time bin would be calculated using a set of samples separate from the others. However, this is not the case if FFT episodes are overlapped, since the boundaries between time bins are "smeared" by the overlapping data. The diagram below illustrates the sample distribution within a spectrogram where the Frequency resolution is 64, the Percentage overlap is 50%, the Number to average is 4, and the Number of time bins is 2.

[Figure: time bin overlap diagram]
Sample distribution within a spectrogram. The thick horizontal lines indicate the sections of data passed to the FFT routine, and each of these sections is 128 samples long because the frequency resolution is set at 64. Each power spectrum average is made up of 4 sections with 50% overlap, so each average contains data from 320 separate samples. The total analysis period contains 576 samples, and in the output display, these are divided into two horizontal sections, each 288 samples long. There is thus some bleed-through of information between adjacent time bins in the spectrogram display.
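
The arithmetic in the caption can be reproduced with a small calculation (a hypothetical helper, written here in Python to make the relationships explicit; it assumes the FFT sections simply run on continuously at the chosen overlap):

    def spectrogram_layout(freq_resolution, overlap_pct, n_average, n_time_bins):
        seg = 2 * freq_resolution                      # samples per FFT section
        step = seg * (100 - overlap_pct) // 100        # advance between sections
        per_average = seg + (n_average - 1) * step     # samples feeding one average
        total = seg + (n_average * n_time_bins - 1) * step
        bin_width = total // n_time_bins               # samples per displayed time bin
        return per_average, total, bin_width

    print(spectrogram_layout(64, 50, 4, 2))            # (320, 576, 288), as in the caption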

With this background information we can now look at the analyses in action.

Power Spectrum

[Figure: power spectrum file]
Various sine waves. Trace 1: A pure sine wave with 2.44 Hz frequency and amplitude 1. Trace 2: A pure sine wave of 24.4 Hz frequency and amplitude 3. Trace 3: A mixed sine wave consisting of the arithmetic sum of traces 1 + 2. Trace 4: The first half is the sine wave from trace 2, the second half is the sine wave from trace 1.

The data were constructed using the expression parser within Dataview, and consist of two sine waves in various combinations. You can see the formulae used to construct the data in the trace labels. (Note that these formulae come from an earlier version of Dataview; consult the help file for information about formula syntax for the current version.) The frequencies were chosen to match the centres of the frequency bands resulting from FFT analysis. These depend on the Nyquist frequency of 2 kHz, which in turn depends on the nominal ADC sample rate of 4 kHz.

The power spectrum analysis processes as much of the data displayed in the main view as will fit within the power-of-two sample number constraint. On start-up the default settings are adjusted to maximize the Frequency resolution available from the data with 50% overlap and an Average of at least 4. The user can then adjust the resolution and overlap if desired, but cannot directly set the average since this is constrained by the other values and will always be the maximum possible. In this case the main Viewport is 7500 ms, which allows a resolution of 4096 frequency bins while averaging 6 FFT episodes. The Time needed (used) for the analysis is 7168 ms.
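
The quoted figures can be checked with the same kind of arithmetic (assumed relationships, written out in Python for clarity; this is not Dataview's actual code):

    fs = 4000                             # nominal ADC sample rate, Hz
    seg = 2 * 4096                        # 4096 frequency bins -> 8192-sample episodes
    step = seg // 2                       # 50% overlap
    n_average = 6

    samples_needed = seg + (n_average - 1) * step     # 28672 samples
    print(samples_needed / fs * 1000)                 # 7168 ms, fits in the 7500 ms viewport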

Now let's look at the results:

You should now see a single large spike in power towards the left of the spectrum graph.

[Figure: power spectrum of 2.4 Hz sine wave]
Power spectrum of a 2.441 Hz sine wave.

The similarity of the trace 3 and trace 4 power profiles illustrates an important fact about Fourier analysis: within the analysis you completely lose all information about time. In techno-speak, you have moved from the time domain to the frequency domain, and when you gain one set of information you lose the other.

Power values

Note that the sine wave has an amplitude of 1, and the mean square value of a sine wave of amplitude 1 is 0.5, so the peak power for trace 1 should be 0.5. (Sine wave amplitude is defined as the distance from the mean (0) value to the peak value, so the peak-to-peak value of a sine wave with amplitude 1 is 2.) But in fact the peak power is about 0.37. Why?

As noted above, and as shown in the text output above the graph, some of the power in the 2.44 Hz sine wave is leaking into the adjacent frequency bins. If you sum the 3 power values from bins 4, 5 and 6 in the text output you should get a total power of almost exactly 0.5, which is what it should be. The leakage problem in this case is caused by the windowing, which reduces the accuracy of the power location. Normally, this is an acceptable trade-off because the alternative damage caused by spectral leakage is usually worse. But in this particular case, where the pure sine wave frequency has been tuned to fit the parameters of the analysis, it is better not to window the data.
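
The effect can be reproduced outside Dataview. The sketch below uses scipy's periodogram with a Hamming window; the density output is multiplied by the bin width to give power per bin, which is assumed here to be comparable to Dataview's mean-square convention:

    import numpy as np
    from scipy.signal import periodogram

    fs = 4000.0
    n = 8192                               # gives 4096 + 1 frequency bins
    t = np.arange(n) / fs
    f0 = 5 * fs / n                        # 2.441 Hz: exactly the centre of bin 5
    x = np.sin(2 * np.pi * f0 * t)         # unit amplitude, mean square = 0.5

    freqs, psd = periodogram(x, fs, window='hamming')
    power_per_bin = psd * (fs / n)         # power in each frequency bin

    k = np.argmax(power_per_bin)
    print(freqs[k], power_per_bin[k])      # ~2.44 Hz, ~0.37 in the peak bin
    print(power_per_bin[k-1:k+2].sum())    # bins 4 + 5 + 6 sum to ~0.5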

It must be emphasised that with normal biological data containing a wide and arbitrary range of frequencies, windowing is definitely the correct thing to do. Dataview provides a variety of window options, but to be honest, there is not much to choose between them when analysing biological data (as opposed to some physics or engineering problem where specific trade-offs may be important). The default Hamming option is usually perfectly adequate.

The bottom line is that quantitative interpretation of the individual power values is rarely attempted for biological signals. What is usually important is the relative power within the different frequency bands.

Shifting the Viewport

You can change the main viewport, and hence the data supplied for analysis, using the toolbar buttons within the dialog. (The navigation tools in the main view are inaccessible because the dialog is modal.)

Two things happen. First, the power returns to 4.5 because only the high-power sine wave is now within the viewport and available for analysis. Second, the Number to average drops from 6 to 2. This is because we have fewer available samples, but have kept the same frequency resolution. Note the non-linearity between sample length and number to average, which is due to the overlap algorithm.

We are now warned that we don't have enough data to analyse at this resolution, and the expansion request will be ignored. If we really want to expand the viewport timebase, we will have to manually drop the Frequency resolution.

Zero-padding data

Now we have run off the end of the data file, and the right half of the main viewport is zero-filled. However, this does not affect the location of the peaks in the power spectrum, it just halves their values, since half the signal now has no power.

In this case nothing is gained from the zero-filled data. However, if a recording is just a bit too short to achieve a particular Frequency resolution, then extending the data by zero-filling up to the next power-of-two constraint may double the frequency resolution, with just a little drop in the numerical values of the peak power. And this can sometimes be quite useful.

Log or Linear Display

It is common in spectral power estimation for the power values to be displayed on a logarithmic scale. This lets low-power components remain visible when on a linear scale they would be completely swamped by the high-power components. It also fits with Fechner's law, which states that the perceived intensity of a stimulus scales with the logarithm of its actual intensity, rather than linearly.
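
In other words, the only difference between the two displays is whether the bin powers are plotted directly or after conversion to a logarithmic measure such as decibels (a small Python illustration with made-up bin values):

    import numpy as np

    power = np.array([4.5, 0.03, 7e-4, 2e-6])    # example bin powers spanning many decades
    power_db = 10 * np.log10(power)              # ~[6.5, -15.2, -31.5, -57.0] dB
    # On a linear axis the last three values are invisible next to 4.5;
    # on a logarithmic (dB) axis all four are clearly separated.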

[Figure: a. logarithmic power spectrum; b. linear power spectrum]
a. Logarithmic display of the power spectrum. b. Linear display of the same data.

It is obvious that much more detail is visible in the logarithmic display. There is high power at the low-frequency (left) side of the display, reflecting the DC component of the membrane potential. There is then a section of medium power, reflecting the PSPs (and interference) in the recording. There is then a drop to very low power. This reflects the cut-off frequency of the amplifier used to make the recording. (This indicates that the data have been oversampled in terms of meeting the strict Nyquist criterion; there is no information at these higher frequencies. However, to achieve a useful join-the-dots display in the main view oversampling is essential, so this is not a mistake.) In contrast, in the linear display only the DC component and a small adjacent spike are visible.

We can look at the interesting part of the spectrum by zooming in.

[Figure: a. 50 Hz mains interference power spectrum; b. linear power spectrum, zoomed]
a. Logarithmic display of the power spectrum zoomed in to show the lower frequencies. Interference power spikes are visible at 50, 150, and 200 Hz. Interestingly, the second interference harmonic (100 Hz) has low power. b. Linear display of the same data. Only very low frequency power and the primary interference harmonic are visible.

The logarithmic display shows much more detail, but the linear display draws attention to the dominant features of the spectrum. You should choose your display according to what you wish to emphasize.

Spectrogram

Spectrograms are frequently used to analyse sounds such as speech or bird song. We will look at a different type of sound:

The dialog is quite complex and what follows is only a brief description. If you want more details on a particular option, press F1 on your keyboard to call up the context-sensitive help.

On the right of the dialog are two graphical displays. The large coloured display is the spectrogram itself – the result of the analysis. The horizontal axis is time, and you can see that it is quite “blocky”. In essence, each block represents the result of an FFT analysis on a sequential chunk of data. The number of blocks is set by the Number of time bins. The vertical axis is frequency, with high frequencies at the top and low frequencies at the bottom. Again, the display is “blocky”, and each block represents a frequency band. The number of bands is set by the Frequency resolution (and is constrained to be a power-of-two). The colour of each block in the display reflects the power of the signal within that time and frequency block. The accuracy of the power is largely set by the Number to average, with higher values giving more accurate results. These parameters all interact with each other, and increasing any of them requires more data for analysis. The details of these parameters were described earlier in the Fine-tuning the Analysis section.

[Figure: a. killer whale spectrogram; b. killer whale song power spectrum]
Spectrogram of the song of a killer whale. a. The spectrogram dialog. b. The power spectrum at the time of the scrubber bar.

If you move the mouse over the spectrogram display, the time, frequency and power values at the mouse location are shown just to the left of the display. These values change as you move the mouse. If you want to freeze the read-out (perhaps to copy the numbers to the clipboard), just click the mouse. Click again to un-freeze the display.

Below the spectrogram is a smaller display showing the section of data that is being analysed. The Start time (which is relative to the whole data file) can be adjusted here if desired, either by directly editing it, or by using the adjacent buttons. The end time is read only, since it is determined by other settings. If you had a multi-trace file (which this one isn't) you could set which trace is analysed by changing the Trace ID parameter at the top left of the dialog.

Standard power spectrum

You can produce a standard power spectrum (as described previously) for any region of interest in the spectrogram.

You should now see the display in part (b) of the figure. Note that the power spikes occur in the frequency bands of the horizontal reddish bars under the scrubber in the spectrogram.

Analyse your own voice

If you have a microphone input on your computer, it might be fun to have a look at a spectrogram of your own voice.

You will (probably) see that the spoken word "high" has power in higher frequency components (is further up the display) than "low". This is not a result of some deep AI intelligence about the meaning of the words(!) - it's that the vowel sound in "high" is more squeaky than that in "low". In the unlikely event that you want to hear me, the file is my voice.

[Figure: spectrogram of the words "low" and "high"]
Spectrogram of me saying "low, high".

Spectral analysis within events: fly song

What can you do if the activity that you want to analyse occurs in relatively short bouts, and these are interspersed with activity with rather different characteristics? One solution is to delimit the regions with one type of activity by events in one channel, and the regions with the other type by events in another channel, and then to analyse the two channels separately.

[Figure: fly song]
Clicks and buzzes in a fly song.

The fly can produce two main types of sound: bursts of clicks, which are delimited by events in channel a, and short buzzes, delimited by events in channel b. The individual clicks within the bursts are delimited by events in channel c. Silent periods between songs have been made inactive to reduce file size. The aim is to do a separate spectral analysis for the different components of the song.

The frequency resolution determines the length of the data segments processed by the FFT. The data within each event within the specified event range (from Start at for Count events) are read from file, and segment-length chunks of these data are windowed (i.e. passed through a filter that tapers values at the edges) and then analysed by FFT. Successive segments within each event are overlapped if desired. The power at each frequency is accumulated over the successive segments, and finally the average power for each frequency is reported.

Any data left-over at the end of a segment (including that from events whose total duration is shorter than the segment length) is ignored by default. However, it can be included by checking Pad fragments. In this case, the left-over data is windowed and then the segment is zero-filled to bring it up to the required power-of-two length. This means that genuine data are not wasted and thus the maximum resolution and lowest noise can be achieved. However, the actual value of the power becomes distorted, since the zero-valued data clearly contain no power and so reduce the overall power levels in the averaging process.
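
A sketch of the padding idea (Python/numpy, illustrative only and not Dataview's implementation; the fragment length here is made up):

    import numpy as np
    from scipy.signal import get_window

    seg_len = 2048                           # segment length set by the frequency resolution
    fragment = np.random.randn(200)          # left-over event data, shorter than one segment

    # Pad fragments: window the real data first, then zero-fill to the full
    # power-of-two segment length before the FFT.
    win = get_window('hamming', fragment.size)
    padded = np.zeros(seg_len)
    padded[:fragment.size] = fragment * win

    power = np.abs(np.fft.rfft(padded))**2   # genuine data are used, but the zeros
                                             # dilute the absolute power in the average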

With the resolution set to 1024, each FFT process requires 256 ms of data, and with 50% overlap the FFT segments done value shows that a total of 102 FFT segments have been analysed from this file.

The spectral density (below, a) reveals a broad power range which peaks at about 170 Hz, but which extends up to about 900 Hz before rapidly decreasing. The latter reflects the setting of the analogue filter on the recording device.

[Figure: a. fly clicks; b. fly song buzz spectrum]
Analysis of fly song. a. The power spectrum of click sections of song. b. The power spectrum of buzz sections of song.

Note that the “clicks” which are visually dominant in the recording and occur at about 30 Hz actually contribute very little to the total power. This is because they are swamped by the high-frequency ups-and-downs within each click. To analyse the clicks themselves:

This time the peak at 170 Hz is much more prominent (above, b), reflecting the more consistently dominant frequency in the “buzz” part of the song. There is also a broader harmonic at about twice the dominant frequency. However, there is less high-frequency power than in the “click” sections of the recording. There are very sharp peaks at regular intervals of 100 Hz; these are almost certainly caused by mains interference in the recording.

In this channel the events are tightly focussed around the individual clicks, and each event is only 25 ms long. The FFT requires 256 ms of data, so for this event channel it is essential to Include fragments since none of the events are long enough to provide a full segment of data for FFT analysis. But with 700 events there are enough data to provide a reasonable average. The frequency profile is very similar to that of channel a, but the overall power levels are lower because of all the zero-filling.

Carrier frequency

The frequency with the highest power level in a spectrum is known as the carrier frequency. You can find the carrier frequency for each event in a channel using the Scan events facility. The frequency is stored within the variable value associated with the event.

Note that each event in channel a in the main display now has a number associated with it. This is the carrier frequency of the data within that event. The carrier frequency of each event can then be retrieved through the various event parameter options (graph, histogram, list etc).
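
Conceptually, the carrier frequency of an event is just the peak of that event's spectrum, along the lines of this hypothetical Python sketch (the function name and windowing choice are illustrative assumptions, not Dataview's internal code):

    import numpy as np

    def carrier_frequency(event_samples, fs):
        # Window the event data, take its power spectrum, and return the
        # frequency of the band with the highest power.
        win = np.hamming(event_samples.size)
        power = np.abs(np.fft.rfft(event_samples * win))**2
        freqs = np.fft.rfftfreq(event_samples.size, d=1/fs)
        return freqs[np.argmax(power)]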