
Introduction to Spectrum Analysis

by David Courtney

This article originally appeared in the September 1992 (volume VIII #1) issue of “Experimental Musical Instruments”, published from Nicasio, CA, pages 18-22.

Introduction

The technique of designing musical instruments has not changed much in the last several thousand years.  A maker builds an instrument, listens to the tone, then repeats the entire process with a slight change in construction.  This is a tedious process, and one often thinks that it would be easier if there were a way to “see” the sound.  Spectrum analysis is a tool that gives us the ability to see the timbre.  In this article we will discuss its various aspects, including sampling theory, processing, and graphic output.

Background

The graphic re­pre­sentation of sound has been an area of interest for years.  The ear­liest experiments focused beams of light ag­ainst a mirror which was at­ta­ched to a vibrating object.  This technique was used ex­ten­sively un­til the twentieth cen­tury when the oscilloscope was in­ven­ted.  Both light beams and oscilloscopes give a graphic re­pre­sentation of the vibratory nature of sound.

Musical sounds are us­ually visualized as “waves” of air that vibrate with a par­ti­cu­lar frequency. This frequency is expressed in cycles per se­cond; how­ever, instead of saying “cycles per se­cond” we say “Hertz”.  The range of human hearing is said to extend from 20 Hertz to 20 Kilohertz (i.e., 20 cycles to 20,000 cycles-per-second).  This range is re­fer­red to as the “audio spectrum”.

However, day-to-day sounds and mus­ical sounds consist of a mix­ture of dif­fer­ent frequencies.  It is the nature of this mix which helps to determine timbre.  There­fore, by look­ing closely at these compo­nent frequencies we get insight into the timbre of any sound.  This is spectrum analysis.

The pioneer of spectrum analysis was undoubtedly Hermann von Helmholtz.  He developed a series of hollow glass spheres with care­fully calibrated resonance frequencies.  They would vibrate when excited by the appropriate frequency and one could hear this by placing them ag­ainst the ear.  It was a very tedious pro­cess, but with these simple devices he was a pioneer in the field.

The Helmholtz resonators had their problems.  They were awkward, and the lack of a graphic output meant that only a subjective evaluation of the component frequencies was possible.  By the latter part of this century they were replaced by totally electronic techniques.  Unfortunately, these electronic analyzers were very expensive.

However, today these once expensive spectrum analyzers are within the reach of the average instrument maker.  This is a consequence of the rapid drop in the price of digital electronics.  $200 and a personal computer are all that one requires to enter the world of spectrum analysis.  Table 1 is a small list of available packages. (NOTE: This article was published in 1992. Products and pricing are not current.)

TABLE 1

PRODUCT NAME | HARDWARE ENVIRONMENT | MANUFACTURER | STREET PRICE | COMMENTS
Digital Sound Studio | Amiga | Great Valley Products | $100 | Hardware / Software
Compuscope / GageCalc | IBM | Gage Applied Sciences | N/A | Hardware / Software
MacRecorder Sound System | Macintosh | Macromind | $175 | Hardware / Software
MacRecorder Pro | Macintosh | Macromind | $240 | Hardware / Software
Alchemy | Macintosh | Passport Designs | $695 | Software only

We have briefly reviewed what spectrum analysis is.  It is now appropriate to discuss the technical details.  One of the most fundamental is the process of taking the sound and putting it into the computer.  This is a subject known as sampling.

Sampling

If the computer is going to do our work, we have to find some way to get the music into the computer.  The hardware and software, with their myriad technical considerations, have been the topic of numerous books and dissertations.  However, the essentials are surprisingly simple.

The hardware in our sampling process revolves around a specialized peripheral called an analog-to-digital converter.  This device, usually called an A/D converter for short, is responsible for taking the analog signal and converting it into discrete numbers that the computer can process.  These discrete numbers are our samples.

The concept behind sampling is quite simple.  The waveform in figure 1-A can be sampled and expressed as figure 1-B.  This is similar to the operation of a motion picture camera.  Just as an event may be captured on film as a series of still frames, so too an audio signal may be captured as a series of discrete values.
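The idea of capturing a waveform as a series of discrete values can be sketched in a few lines of Python.  This is only an illustration: the function name, the 440 Hz tone, and the 8000 Hz sampling rate are assumptions made for the example, not details from the original article.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Return n_samples evenly spaced samples of a sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# Hypothetical values: a 440 Hz tone "photographed" 8000 times per second.
samples = sample_sine(freq_hz=440.0, sample_rate_hz=8000.0, n_samples=8)
for n, value in enumerate(samples):
    print(n, round(value, 4))
```

Each printed number plays the role of one still frame in the motion picture analogy.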

Digitizing the wave

The con­cept may be quite simple but the implementation may be quite complicated.  There are a num­ber of fac­tors which must be kept in mind.  The two most impor­tant are the sampling rate and the resolution.

The sampling rate is an op­tion on most computer sys­tems.  But how fast should it be?

We must turn to the Nyquist theorem to help us find the correct sampling rate.  It tells us that the sampling rate must be greater than twice the highest frequency to be encountered.  Any attempt to sample at a lower rate results in a phenomenon known as aliasing.

Aliasing occurs when frequencies above the Nyquist point (half the sampling rate) are reflected back down into the audio spectrum.  This is illustrated in figure 2.  It is very much like the movement of wheels in old films.  If the wheels are moving slowly, the camera has no trouble “sampling” the event.  However, as the wheels go faster the apparent motion tends to slow down.  At a certain point the wheel appears to stop; thereafter it appears to go backwards.  This apparent retrograde motion of the wheels is analogous to the aliasing which occurs in digitized audio signals.
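The reflection about the Nyquist point can be worked out numerically.  The following Python sketch computes where a pure tone would appear after sampling; the function name and the 8000 Hz sampling rate are assumptions chosen for illustration.

```python
def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling."""
    nyquist = sample_rate_hz / 2.0
    # Tones that differ by a whole multiple of the sampling rate look identical.
    folded = freq_hz % sample_rate_hz
    # Anything above the Nyquist point is reflected back down the spectrum.
    if folded > nyquist:
        folded = sample_rate_hz - folded
    return folded

RATE = 8000.0  # hypothetical sampling rate in Hz, so the Nyquist point is 4000 Hz
for tone in (3000.0, 5000.0, 8000.0, 9000.0):
    print(tone, "Hz appears as", apparent_frequency(tone, RATE), "Hz")
```

Under these assumptions a 5000 Hz tone is indistinguishable from a 3000 Hz tone, and an 8000 Hz tone appears at 0 Hz, which corresponds to the wheel that seems to stand still.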

Aliasing Noise

The resolution is another consideration.  Most low-cost systems default to eight bits.  An 8-bit code has 256 possible combinations.  Therefore the maximum resolution that one could expect from an 8-bit code is 256 steps.  There are systems which are capable of processing up to 16-bit codes.  This gives 65,536 possible steps!  However, these systems cost more than the average instrument maker would be willing to spend.  For the purposes of the average craftsman an 8-bit resolution is quite sufficient.
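The step counts quoted above follow directly from the number of bits.  As a rough sketch, the Python below maps a signal value to one of the available codes; the function name and the assumed input range of -1.0 to 1.0 are illustrative choices, not a description of any particular A/D converter.

```python
def quantize(value, bits=8):
    """Map a value in [-1.0, 1.0] to an integer code from 0 to 2**bits - 1."""
    levels = 2 ** bits                        # 256 for 8 bits, 65,536 for 16 bits
    clipped = max(-1.0, min(1.0, value))      # out-of-range input is clipped
    code = int((clipped + 1.0) / 2.0 * levels)
    return min(levels - 1, code)              # keep full scale inside the range

print(2 ** 8, 2 ** 16)                        # number of steps for each resolution
print(quantize(-1.0), quantize(0.0), quantize(1.0))
print(quantize(1.0, bits=16))
```

With eight bits every sample is rounded to one of 256 codes; with sixteen bits the same range is divided into 65,536 codes, so each step is 256 times smaller.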

This digitizing process, with all of its considerations, is the first step.  However, merely putting the information into the computer is insufficient to produce any useful result.  The data must be processed to yield the frequency information.

Processing

The key to spectrum analysis lies in the computer pro­cesses. These pro­cesses are variations upon an ext­remely complicated field of mathematics known as Fourier transforms.  The utility of the Fourier transform is underscored by the fail­ure of simpler meth­ods to yield clear information about mus­ical timbre.

The oscilloscope is a classic example of the inadequacy of a simpler technology.  Virtually any instrument maker can afford to purchase an oscilloscope.  Yet the images that appear fail to give much information about timbre.  They fail because the oscilloscope functions in what is called the “time domain”, while our perception of timbre depends upon something called the “frequency domain”.  These are referred to as “inverse domains” of each other.

The con­cept of the inverse domain may sound very intimidating but it is based upon a simple idea.  Let us begin by look­ing at figure 3.  Here is a simple ques­tion.  Which one is the quar­ter?  We know that both images re­pre­sent the same object even though they look abso­lutely nothing alike.  Once we accept the fact that to­tally dif­fer­ent images may re­pre­sent the same object, we have made the first con­ceptual breakthrough in the under­stand­ing of inverse domains.

Two sides of the same coin.

A further understanding of inverse domains is seen in common wall current.  Wall current (60 Hz, 120 V) is shown graphically by the two diagrams in figure 4.  Figure 4-A shows voltage as a function of time.  This is the standard sine wave which is familiar to most people.  Figure 4-B shows voltage with respect to frequency.  This shows a single spectral line at 60 Hz.  It does not require a strong technical or mathematical background to see that both of these diagrams represent the same phenomenon.

Inverse domains

The reason that these two representations are referred to as inverse domains is equally simple.  The time domain diagram (fig. 4-A) shows the period as being 0.01667 sec.  The frequency domain diagram (fig. 4-B) shows the frequency as being 60 Hz.  The relationship is simple:

Reciprocal re­la­tionships

We see that this is a simple reciprocal re­la­tionship.  It is be­cause of this simple re­la­tionship that they are called inverse domains.
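Written out with the values read from figure 4, the reciprocal relationship takes the following form.  The symbols T (period) and f (frequency) are introduced here for convenience and are not taken from the original figures.

```latex
f = \frac{1}{T} = \frac{1}{0.01667\ \text{s}} \approx 60\ \text{Hz},
\qquad
T = \frac{1}{f} = \frac{1}{60\ \text{Hz}} \approx 0.01667\ \text{s}
```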

Unfortunately, the real world con­di­tions do not allow us to take a simple reciprocal and ob­tain our spectra.  To derive spectra from comp­lex sounds we are forced to perform what is called a Fourier transform.

The Fourier transform may be visualized as a magic “black box” which is able to convert time domain to frequency domain.  There are numerous algorithms to accomplish this; however, the most common is an algorithm known as the “Fast Fourier Transform”.  This particular algorithm is usually abbreviated as FFT.  The FFT is the most commonly used algorithm for small computer systems.

The Fourier transform was developed by Jean Baptiste Joseph Fourier in the beginning of the 19th century.  The life of Fourier would make an interesting book in its own right.  He was successful at politics, sciences, and mathematics.  It is also curious that the mathematical process that made him immortal was not developed for acoustics.  It was instead developed during the course of his work on the conduction of heat.  However, to us it is his “black box” that converts time domain to frequency domain which is important.

Although the Fourier transform may be visualized as a “black box”, there are still some considerations which should be observed.  Primarily we need to keep in mind the effects of our sample.

The size of the sample is ext­remely impor­tant.  This is be­cause the amount of information which goes into the pro­cess is going to be the same as the information which comes out.  The Fourier transform merely changes the form of the information.  It does not generate nor destroy information.  There­fore a larger sample will give us a higher frequency resolution.  Let us say that we transform a sample which has 1024 points.  Our output will have 512 frequency bands.

At this point the attentive reader will be saying “Hey, that is only half the information which went into the transform.  Where did the other information go?”  This would be a convenient place to zoom into the stratosphere with an esoteric dis­cus­sion of imaginary num­bers, but we will not do that.  The simple fact is that the other half of the information is the phase re­la­tionship of the var­ious frequency bands. Therefore the 1024 point sample was transformed into 512 frequency bands and the corresponding 512 phase re­la­tionships.  How­ever, this phase information is gen­erally ignored.
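The bookkeeping described above can be checked with a small experiment.  The Python sketch below is a minimal radix-2 FFT, not the code of any package in Table 1; the 8192 Hz sampling rate, the 440 Hz test tone, and all names are assumptions chosen so that the arithmetic comes out evenly.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

SAMPLE_RATE = 8192.0   # hypothetical sampling rate in Hz
N = 1024               # sample size used in the text
TONE = 440.0           # hypothetical test tone in Hz

signal = [math.sin(2 * math.pi * TONE * i / SAMPLE_RATE) for i in range(N)]
spectrum = fft(signal)

# Keep the first N/2 bands; each one carries an amplitude and a phase.
bands = spectrum[:N // 2]
amplitudes = [abs(c) * 2 / N for c in bands]
phases = [cmath.phase(c) for c in bands]

band_width = SAMPLE_RATE / N   # width of each frequency band in Hz
peak_band = max(range(len(amplitudes)), key=amplitudes.__getitem__)
print(len(amplitudes), len(phases), band_width, peak_band * band_width)
```

With these assumed values the 1024 samples yield 512 amplitudes and 512 phases, each band is 8 Hz wide, and the test tone falls in band 55 (55 x 8 Hz = 440 Hz).  Doubling the sample size would halve the band width, which is the sense in which a larger sample gives a higher frequency resolution.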

There are si­tua­tions when a cha­rac­teristic of the sample produces a frequency which is not in the ori­gi­nal.  This is called an artifact.  Aliasing is one example of an artifact.

There is another artifact which is particularly troublesome for the Fourier transform.  This arises when the sample does not correspond to a whole number of periods.  We find that the Fourier transform presumes that it is dealing with a whole number of periods and generates the frequency information accordingly.  Therefore the presumed waveform from the sample in figure 5-A would be the waveform in figure 5-B.

Sampling Artifact

This artifact points to a fundamental weakness of the Fourier transform.  The process presumes that there is a repeating pattern and that the sample conforms to a whole number of periods.

Unfortunately, real world sounds tend to show an absence of such simple repeating patterns.  This absence usually arises from several mechanisms.  The first is a random component in the sound (i.e., white noise).  Another is the effect of the envelope (i.e., the attack and decay of the sound).  A third is that each component frequency may have a different envelope.  Although such fundamental inconsistencies exist between the presumptions of the Fourier transform and the real world, this does not weaken the value of the process.  It merely means that we must be conscious of the artifacts and how they may influence our final results.

Usually these artifacts are of such a low amplitude that we do not need to worry about them.  How­ever, if one suspects that an area of interest may be an artifact, the easiest thing to do is to resample with a dif­fer­ent sample size.  If the par­ti­cu­lar compo­nent shows wide variation, it is prob­ably an artifact.  If it shows a cer­tain consistency then it is prob­ably a legitimate compo­nent.
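The artifact caused by a sample that does not hold a whole number of periods can be reproduced with a short experiment.  The Python sketch below uses a slow, direct form of the Fourier transform for brevity; the 64-point sample size, the 0.05 threshold, and the function names are assumptions made for the illustration.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Amplitude of each frequency band below the Nyquist point (direct DFT)."""
    n = len(samples)
    amps = []
    for k in range(n // 2):
        total = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))
        amps.append(abs(total) * 2 / n)
    return amps

def tone(periods, n):
    """n samples of a sine wave that completes `periods` cycles in the sample."""
    return [math.sin(2 * math.pi * periods * i / n) for i in range(n)]

def count_significant(amps, threshold=0.05):
    """Number of bands whose amplitude exceeds the (arbitrary) threshold."""
    return sum(1 for a in amps if a > threshold)

N = 64
whole = dft_amplitudes(tone(8.0, N))     # exactly 8 periods fit in the sample
partial = dft_amplitudes(tone(8.5, N))   # 8.5 periods: the ends do not join up

print("bands above threshold, whole periods:  ", count_significant(whole))
print("bands above threshold, partial periods:", count_significant(partial))
```

When the sample holds exactly eight periods, only one band rises above the threshold.  When it holds eight and a half, the same pure tone is smeared across several neighbouring bands, none of which is present in the original sound.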

We have seen that the Fourier transform is the major tool by which we are able to obtain the frequency information from a sample.  We have also shown that there are certain considerations which should be observed if the transform is to be reliable.  However, we have not discussed one of the most important aspects of the process.  That is the graphic representation of the information.

Output

The output of the spectrum analyzer is of prime importance.  This is what is going to be interpreted by the instrument maker.  An unintelligible output renders the whole sys­tem worthless.

Undoubtedly a simple numeric table would be the most fundamental computer output.  After all, the Fourier transform is just a mathematical process which takes in numbers and spits out numbers.  Unfortunately, this is not an intuitive way to read the data.  It is for this reason that a numeric output is not common for spectrum analyzers.

The simple X/Y plot is the most com­mon form of output.  This sim­ply plots the data from the Fourier transform in standard Cartesian coordinates.  The X axis is conventionally fixed as frequency and the Y axis is conventionally fixed as the amplitude.  Fur­ther­more there is a tendency to “fill” the diagram to make it visually more ap­pealing. Figure 6 is a typical X/Y spectrum of a guitar with a black fill.

X-Y plot

The simple X/Y plot has one disadvantage.  It does not have the ability to show how the spectrum changes with respect to time.  It is a characteristic of acoustic instruments that the spectrum is not fixed but changes over the course of time.  If we take repetitive samples and plot them on the Z axis, then we can better illustrate the timbre of an instrument.

This is the principle behind the 3-D wireframe.  In figure 7 we see a 3-D representation of the sound of a mridangam.  There are several characteristics which may be seen that would not be apparent in a simple X/Y plot.  For instance there is a moderate component of white noise (random vibration) in the initial sounding.  This is indicated by the unusually broad peaks and the large degree of filling between them.  The initial spectrum very quickly dies away and is replaced by a relatively stable 2nd, 3rd, and 4th harmonic.  There is a peak in the second harmonic at an unusually long period after the drum was excited.  All of these are characteristics which are clear when viewed as a 3-D wireframe but would not be so evident in a simple X/Y plot.

3-D plot

There is another way to represent the same information in a 2-D format.  This is in the form of a “sonogram”.  This particular form of representation gained wide popularity in the pre-computer era because it lent itself well to analog techniques of spectrum analysis.  This technique uses the X axis to display time and the Y axis to portray frequency.  The amplitude is denoted by the darkness of the print.  This method is still in use today in voice-print analysis; however, for virtually all other applications it is on the decline.
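Both the 3-D wireframe and the sonogram rest on the same data: a series of spectra taken from successive samples.  The Python sketch below builds such a series and prints a crude text sonogram, with time running left to right, frequency running bottom to top, and darker characters for larger amplitudes.  The decaying test tone, the frame size, and the character set are all assumptions made for the illustration.

```python
import cmath
import math

def dft_amplitudes(frame):
    """Amplitude of each frequency band below the Nyquist point (direct DFT)."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))) * 2 / n
            for k in range(n // 2)]

def spectra_over_time(samples, frame_size):
    """Split the samples into consecutive frames and analyse each one."""
    return [dft_amplitudes(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, frame_size)]

SHADES = " .:*#"   # a darker character stands for a larger amplitude

def sonogram_rows(frames):
    """One text row per frequency band, highest band first."""
    peak = max(max(f) for f in frames)
    rows = []
    for band in reversed(range(len(frames[0]))):
        rows.append("".join(
            SHADES[min(len(SHADES) - 1, int(f[band] / peak * len(SHADES)))]
            for f in frames))
    return rows

# Hypothetical test signal: a tone in band 4 that decays like a struck drum.
FRAME = 32
signal = [math.exp(-i / 64.0) * math.sin(2 * math.pi * 4 * i / FRAME)
          for i in range(FRAME * 4)]
frames = spectra_over_time(signal, FRAME)

for row in sonogram_rows(frames):
    print("|" + row + "|")
```

Plotting the same list of spectra side by side along a third axis, instead of as shades of darkness, gives the data behind a wireframe display.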

Sonogram

All of the previous examples utilized a linear method of pres­enting the information.  That is to say that each unit of time or voltage corresponded to a sin­gle unit of vertical or horizontal displacement.  How­ever, this one-to-one re­la­tionship is inconsistent with human perception.

Haven’t you always wondered why, when you walk into a dark room and turn on a light, it gets bright, but when you turn on two lights it doesn’t get twice as bright?  This is because human perception is not linear.  Sometimes spectrum analyzers allow you to look at the spectrum in a non-linear fashion somewhat analogous to the way we hear.  This is referred to as a power spectrum while the normal linear graph is referred to as a normal spectrum.  Figure 9 (A & B) shows both the normal spectrum and the power spectrum of steel drums.

Normal - Power spectra

It is apparent that the power spectrum shows much more detail than the normal spectrum.  Unfortunately it takes some practice to properly interpret the relative values of the component frequencies.  The choice between displaying the power spectra or normal spectra is often a question of personal preference.
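The article does not say which non-linear scale the packages use.  One common choice, assumed here purely for illustration, is the logarithmic decibel scale.  The Python sketch below shows why such a scale brings out detail: weak components that are nearly invisible on a linear graph still occupy a useful part of a logarithmic one.  The function name, the reference level, and the -96 dB floor are all hypothetical.

```python
import math

def to_decibels(amplitude, reference=1.0, floor_db=-96.0):
    """Convert a linear amplitude to decibels relative to `reference`."""
    if amplitude <= 0.0:
        return floor_db            # the logarithm of zero is undefined
    return max(floor_db, 20.0 * math.log10(amplitude / reference))

# Hypothetical component amplitudes: one strong partial and two weak ones.
for amplitude in (1.0, 0.1, 0.001):
    print(amplitude, "->", round(to_decibels(amplitude), 1), "dB")
```

On a linear graph the weakest component would be one thousandth of the height of the strongest; on this assumed logarithmic scale it sits 60 dB below it and remains clearly visible.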

We may summarize the whole topic of output quite simply.  Although the output from the Fourier transform must be numeric, virtually every package gives a graphic output.  This may be a standard X/Y plot, the older sonogram, or the much more attractive 3-D wireframe.

Conclusion

Spectrum analyzers are not out of the reach of the common man.  Software/hardware packages are now in the range where almost anybody can afford one.  However, the complexity of the subject still means that there has to be a certain attention to detail.  If the nature of sampling and the quirks of the Fourier transform are understood, a spectrum analyzer can be a useful tool for virtually any serious instrument builder, especially with an appropriate graphic output.