----- Experience in your own room the magical nature of stereo sound -----

 

 

---  The Magic in 2-Channel Sound  ---  The Importance of Directivity  ---

 

Below is the paper that I wrote for the REPRODUCED SOUND 2015 Conference of the Institute of Acoustics in the UK. 
You can also download the paper as PDF.
PowerPoint Slides and a Sound Track (90 minutes with Q&A, 43.3 MB, 64kbps mp3, mono) allow you to see and hear what I presented. 

 

 

The Magic in 2-Channel Sound Reproduction
 
Why is it so rarely heard?

ABSTRACT 

Hearing, that is finding the direction, distance and significance of a source of sound in various acoustic environments, is a survival mechanism in the evolution of living organisms. Hearing two strongly correlated sources of sound, either from earphones or two loudspeakers, is an unnatural phenomenon, from which the ear-brain apparatus is asked to draw an illusion of reality. For the illusion to happen convincingly, misleading cues must be eliminated from the sound presentation. In the case of earphone presentation, which typically suffers from a high degree of distance distortion, i.e. distance foreshortening, the ear signals must change with head movement to externalize the illusion. In the case of loudspeaker presentation there is already the distance between listener and speakers, which typically is perceived as the minimum distance to the illusionary aural scene or phantom scene. But that scene is usually hard bounded by the speakers, which are recognized by the brain, and one or the other speaker is preferred as the source when the listener moves a short distance laterally away from the "sweet spot".

In a reverberant room, where the listener not only hears the direct sound but also the reflected sound, i.e. the off-axis radiated sound, the ear-brain perceptual apparatus must be allowed to withdraw attention from the room and speakers and focus attention upon the direct sound to create a convincing illusion of the reproduced acoustic event. For this to happen, misleading perceptual cues must be eliminated. The speakers must be placed so that reflections are delayed relative to the direct sound. The speakers must be free from spurious resonant radiation, and their off-axis radiation must follow their on-axis frequency response for the reverberant sound to be neutral. The polar radiation pattern must be essentially either omni-directional, cardioid or dipolar, aiming for constant directivity. The speakers must be acoustically small, yet capable of realistic volume levels at low non-linear distortion.

Two prototype speakers and the evolution of their radiation pattern design will be discussed: a full-range, acoustically small dipole and a hybrid, omni-cardioid-dipole design. Either speaker is capable of disappearing from perception and rendering an aural scene in a reverberant room that is like a magic act.

 

1   INTRODUCTION

I claim that few stereo listeners, audiophiles, or audio professionals have heard what stereo is truly capable of because loudspeakers for domestic use are generally not designed with a reverberant playback space in mind. Furthermore, phantom sources, which are key elements of stereo reproduction, are unnatural phenomena; we are programmed by evolution to detect direction and distance of real sources of sound in a multitude of reverberant environments [1]. Stereo sound reproduction in a reverberant room must therefore be treated like the creation of a magic trick for the ears - a trick in which the listener withdraws attention from the loudspeakers and the room. What remains to enjoy is the phantom auditory scene.

The room, loudspeakers, and recording participate in the creation of the illusion. Most important are the loudspeaker's polar pattern and the placement of the speakers in the room. And though the room is often blamed for poor stereo performance, the fault lies mostly with the loudspeaker, the source that illuminates the room with sound.

1.1   Hearing
Hearing has evolved as an effective survival mechanism for detecting direction, distance, and meaning of a source of sound in an environment where there may be multiple sources of sound and reflections. Hearing still operates like that today when we automatically turn our head towards a surprising or unfamiliar sound and figure out what is causing it and how far away it is. Having identified the source we respond either by running, watching out, ignoring it or being annoyed.

In our brain, we carry a vast catalog of familiar sounds. Voices of different individuals are recognized almost immediately even over a phone connection of poor quality. When we hear a voice coming from somewhere, we usually know whether it is from a loudspeaker or a real person. Why is that? Every loudspeaker has a sonic signature, a radiation pattern [2], and a power response which defines how it interacts with the space around it. A person's voice has a sonic signature and, unless a loudspeaker is built to mimic that signature, it is easy to distinguish a live person from a loudspeaker reproduction, especially in a reverberant environment. Musical instruments, of course, also have their sonic signature [3], be it a violin, cello, clarinet, flute, piano, etc. Since it is impossible to build a loudspeaker that can assume all those signatures, the optimum solution will be a loudspeaker which does not add its own signature to the reproduction of a recording, i.e. a loudspeaker with a neutral signature when radiating into 4 reverberant space.

1.2   Anechoic Stereo Reproduction
It is instructive to first look at stereo reproduction under anechoic conditions; hearing under those conditions is not a natural experience, but it shows us how the ear/brain apparatus copes with such situations. We will start by looking at both headphones and loudspeakers.
 

1.2.1   Headphone Stereo
Headphones render sound under almost anechoic conditions. The cavities between eardrum and headphone diaphragm, whether from circum-aural, supra-aural or intra-aural phones, often color the sound. Most importantly, perceived distances to the auditory scene are foreshortened, which is an unnatural phenomenon. It is easy to make binaural recordings where a person whispers into your ears, but all sources of sound are closer than they were in reality. Typically, sounds in the frontal direction are perceived as originating inside one's head. Figure 1.

Much engineering effort is spent these days on embedding Head Related Transfer Functions (HRTFs) into the signal transmission path in order to present sound sources in 3D-space. But HRTFs are highly individualistic. Thus the results of such efforts are very limited unless the HRTFs are made to change with head movement.

With head tracking, sound sources can be placed at any distance from and around the listener. I am convinced from personal experience that hearing a change in HRTF with head movement is much more important than the individualized correctness of the HRTF. A headphone system with head tracking is clearly capable of the magic I am talking about as long as one can be comfortable with transducers occluding the ears and placing the listener into a virtual reality space. 

 

Figure 1. Headphone stereo. Localization of the auditory scene inside the head (a), and outside the head when the HRTF of the ear signals tracks with head movement.

1.2.2   Loudspeaker Stereo
Anechoic, reflection-free spaces are essential for loudspeaker measurement and design; they are also essential to studying directional hearing effects. For instance, assume that two identical loudspeakers and a listener are set up in equilateral triangle fashion in an anechoic chamber. Figure 2. Identical signals are fed to both loudspeakers. If the listener's head is clamped, i.e. not allowed to move, and there are no visual cues as to the loudspeaker locations, then the listener hears a phantom source inside his head between the ears (a), just as with headphones.

 If the listener is allowed to turn his head, he then perceives the phantom source in front of him (b2) at approximately the distance to the invisible loudspeakers. For highly directive speakers the center phantom may actually appear in front of the speakers (b1). If the source signal is noise and the listener moves a small distance laterally, then a change in tonality is heard due to changing interference of left and right loudspeaker signals at each ear (c). For greater lateral shifts (d) the signal collapses into the nearest loudspeaker and head turning confirms the location of the physical source of sound. 

Figure 2. Loudspeaker stereo under anechoic conditions. Phantom inside clamped head (a). Externalized phantom with head turning (b).  Tonality changes (combing) with small lateral shifts (c). Large lateral head shifts and jumping of phantom to the nearest speaker (d).
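The tonality change in case (c) can be illustrated with a rough sketch. It treats the listener as a single point rather than two ears on a head, which is a strong simplification, and it assumes a speed of sound of 343 m/s and a hypothetical triangle side of 2.5 m. With identical signals from both speakers, cancellation occurs where the path length difference equals an odd number of half wavelengths.

```python
import math

C = 343.0  # speed of sound in m/s (assumed value)

def path_difference(side, shift):
    """Path length difference in m for a laterally shifted listener.

    Speakers sit at (-side/2, 0) and (+side/2, 0). The sweet spot is the
    apex of the equilateral triangle at (0, side * sqrt(3) / 2); the
    listener is moved sideways from there by `shift` metres.
    """
    y = side * math.sqrt(3.0) / 2.0
    d_left = math.hypot(shift + side / 2.0, y)
    d_right = math.hypot(shift - side / 2.0, y)
    return abs(d_left - d_right)

def notch_frequencies(side, shift, count=3):
    """First `count` comb filter notch frequencies in Hz (empty if centered)."""
    delta = path_difference(side, shift)
    if delta == 0.0:
        return []
    # Notches fall where delta equals an odd multiple of half a wavelength.
    return [(2 * k + 1) * C / (2.0 * delta) for k in range(count)]

if __name__ == "__main__":
    for shift in (0.0, 0.05, 0.1, 0.2):
        print(shift, [round(f) for f in notch_frequencies(2.5, shift)])
```

Under these assumptions a 10 cm shift gives a path difference of about 10 cm and a first notch at roughly 1.7 kHz, and the notches move down in frequency as the listener moves further off center.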

In the recording/mixing process [4,5,6], monaural sources are level panned to locations between the speakers, and amplitude and phase differences between the outputs from one or more directional microphone pairs are used to render an acoustic scene between loudspeakers. An off-center phantom source, such as example (d) for a centered listener at (a), can be created by a larger amplitude signal from the left than the right loudspeaker. Below 800 Hz the superimposed loudspeaker signals at left and right ears are converted into timing differences [7,8,9,10] between the ears (ITD) as if they were generated from a real source at location (d). Higher frequency content above 2 kHz with larger amplitude from the left than the right loudspeaker mimics the head shading (IID) effect and stabilizes the off-center phantom. But transient signals can quickly lead to identification of the left or right speaker as the real source. It is a task for the mixing engineer to distribute a phantom scene between the loudspeakers. An acoustically dead room is generally preferred as a work environment because it allows him to hear more clearly while making decisions. But it is an artificial environment.
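Level panning can be sketched with the classic stereophonic "law of sines", a low frequency approximation that is not taken from this paper and is used here only as an illustration. It ignores the head shading effect above 2 kHz. The function name and the 30 degree half angle (equilateral setup) are assumptions.

```python
import math

def phantom_angle_deg(gain_left, gain_right, half_angle_deg=30.0):
    """Estimated phantom source angle in degrees, positive toward the left.

    Uses the sine law sin(theta) = (gL - gR) / (gL + gR) * sin(theta0),
    where theta0 is the angle between the center line and each speaker.
    Gains are linear amplitudes of the same in-phase signal.
    """
    total = gain_left + gain_right
    if total <= 0.0:
        raise ValueError("at least one gain must be positive")
    ratio = (gain_left - gain_right) / total
    sine = ratio * math.sin(math.radians(half_angle_deg))
    return math.degrees(math.asin(sine))

if __name__ == "__main__":
    for level_db in (0.0, 3.0, 6.0, 12.0):
        g_right = 10.0 ** (-level_db / 20.0)  # right channel attenuated
        print(level_db, round(phantom_angle_deg(1.0, g_right), 1))
```

Equal gains place the phantom in the center, and muting one channel places it at the remaining speaker, which is the hard boundary of the scene described earlier.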

 

 

2   ROOM RESPONSE

2.1   Room Reflections
While anechoic conditions are useful for studying directional hearing, one must be careful when translating the findings directly to situations where multiple reflections of a signal occur. Again, this is a situation that is predominant in natural hearing and for which evolution has optimized the signal processing between the ears for survival purposes. For example, it is important not to be distracted by reflections when finding the direction from which a sound is coming. Psychoacoustic research has shown that a first reflection [11], which occurs shortly after the direct signal (within 25 ms), must be stronger than the direct signal before it shifts the perceived direction of the first arriving signal. A second reflection [12] from a different direction has to be even stronger than the first reflection to shift direction. But later reflections (>30 ms) draw increasingly more attention unless their amplitude decreases with longer delays. This makes sense because late reflections could actually be coming from a second source.

A loudspeaker in a room produces a large number of reflections [13], and perceptual issues become difficult to study in detail because of the large number of signal streams that arrive at each ear of a listener. Figure 3. Matters become even more complicated when the loudspeaker changes polar characteristics with emitted frequency; speaker L is typical for the vast majority of box loudspeakers. These speakers radiate omni-directionally at low frequencies and become increasingly more forward directional with higher frequencies while maintaining a flat on-axis response. Consequently such a box loudspeaker illuminates the room quite differently from a constant directivity dipole, for example speaker R. The dipole's reflections produce different superimposed sound streams at the ears of a listener even when they arrive from the same directions as those of a box loudspeaker in the dipole's location.

Figure 3. Direct signals and some of the reflections at the listener's ears for two types of loudspeakers: Dipole R with frequency independent radiation pattern and typical box loudspeaker L, which radiates omni-directional at low frequencies and becomes increasingly forward directional with increasing frequency.

Sound from a loudspeaker near the corner of floor and two walls produces at least 7 reflections [13]. Figure 4. Three of these are first order reflections, three are second order and one is of third order. In reality there would also be ceiling reflections and reflections from a speaker in the left room corner. The direct and reflected signals bounce around in the room, building up the sound pressure level (SPL) of the reverberant field and reaching a constant level nearly everywhere in the room if the source signal is sustained in SPL.

Figure 4. A dipole loudspeaker D near the corner of three intersecting surfaces and its images behind the perfectly reflecting surfaces. The images define the direction and the path length of the reflection. In combination with the polar diagram of the loudspeaker and the absorptive/diffusive characteristics of the surfaces the images also define the attenuation of the reflection at any point in front of the loudspeaker. First order reflections S, F, R, second order reflections S+F, R+F, R+S and third order reflection R+S+F.

Figure 5. Example of a 1.25 ms burst signal and its room reflections as they arrive at the listening position during the first 50 ms. The burst amplitudes are progressively attenuated vs. time as the signal has traveled greater distances and hit multiple surfaces.

Figure 6. As Figure 5 but on an expanded time scale to show more clearly the decay of the room reflections of the initial burst signal at the listening position. The 3200 Hz narrow band burst decays to 60 dB below the strongest initial reflection in 319 ms.
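The seven corner reflections of Figure 4 follow from mirroring the source in the three surfaces. The sketch below is a minimal illustration that assumes perfectly reflecting planes at x = 0 (side wall S), y = 0 (rear wall R) and z = 0 (floor F), a point source, and a speed of sound of 343 m/s. The source and listener coordinates are hypothetical, and the polar pattern and surface absorption are ignored.

```python
import itertools
import math

C = 343.0  # speed of sound in m/s (assumed value)

# Surface label and the coordinate axis that is mirrored by that surface.
SURFACES = (("R", 1), ("S", 0), ("F", 2))

def corner_images(source):
    """Return {label: position} for the 7 mirror images of `source`."""
    images = {}
    for order in range(1, 4):
        for combo in itertools.combinations(SURFACES, order):
            pos = list(source)
            for _, axis in combo:
                pos[axis] = -pos[axis]  # mirror across the plane at 0
            label = "+".join(name for name, _ in combo)
            images[label] = tuple(pos)
    return images

def reflection_delays_ms(source, listener):
    """Arrival delay of each reflection relative to the direct sound, in ms."""
    direct = math.dist(source, listener)
    return {
        label: 1000.0 * (math.dist(pos, listener) - direct) / C
        for label, pos in corner_images(source).items()
    }

if __name__ == "__main__":
    speaker = (1.0, 0.8, 1.0)   # hypothetical position in metres
    seat = (2.5, 3.5, 1.2)      # hypothetical listening position
    for label, delay in sorted(reflection_delays_ms(speaker, seat).items(),
                               key=lambda item: item[1]):
        print(f"{label:6s} {delay:5.2f} ms")
```

Moving the hypothetical speaker further from the walls increases all delays, which is the placement requirement stated in the abstract.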

Figure 5 and Figure 6 are real world examples giving an indication of the complexities with which the ear-brain hearing apparatus has to deal in order to find the direction and distance of the physical source. Obviously, fewer or weaker reflections make the task easier. With stereo and two loudspeakers we are not interested in the physical sources but the phantom sources and the auditory scene created by the direct loudspeaker signals. So the question becomes how to keep the reflections from becoming a distraction and how to move the room beyond the listener's acoustic horizon of attention.

2.2   Room Resonance Modes
Domestic listening rooms are acoustically small at low frequencies, where their largest dimensions are less than half of a wavelength. Sustained sounds will set up standing waves [14,15], causing uneven distribution of SPL in the room. Figure 7. The position of a loudspeaker in the room, its low frequency radiation pattern, and the sound absorptive characteristics of the room determine to what degree these resonant modes are excited. Whether the source radiates omni-directionally, like a dipole, or like a cardioid, the longitudinal mode in Figure 7 will be set up. Only if the whole rear wall is totally absorptive or behaves like an open window will there be no standing wave, because there is then no reflection back to the front wall. Standing waves are often problematic, particularly for a loudspeaker that radiates more energy into the room at low frequencies than at higher frequencies, like box speaker L in Figure 3. A loudspeaker which is directional even at low frequencies, like the dipole R in Figure 3 or a cardioid loudspeaker, allows the coupling to offending modes to be changed by turning the speaker.

Figure 7. Standing wave (room resonance, longitudinal mode) example for a room of 6.88 m length. For a continuous 50 Hz tone listener (a) sits at a SPL minimum, which occurs at 1/4-wavelength from the rear wall. Listener (b) sits at a SPL maximum, but at 25 Hz he would sit in a minimum. Listener (c) against the rear wall is in the maximum SPL region for all room modes.
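The listener positions in Figure 7 can be checked with the textbook model of axial modes between two rigid, parallel walls. This is a sketch under idealized assumptions: lossless walls, a speed of sound of 343 m/s, and position x measured from the rear wall. The function names are illustrative.

```python
import math

C = 343.0  # speed of sound in m/s (assumed value)

def axial_mode_frequency(length, n):
    """Frequency in Hz of the n-th axial mode of a room of given length."""
    return n * C / (2.0 * length)

def relative_pressure(length, n, x):
    """Sound pressure magnitude of mode n at distance x from a wall (0..1)."""
    return abs(math.cos(n * math.pi * x / length))

if __name__ == "__main__":
    room = 6.88  # room length in m, from Figure 7
    seats = {"a": room / 4.0, "b": room / 2.0, "c": 0.0}
    for n in (1, 2):
        f = axial_mode_frequency(room, n)
        levels = {k: round(relative_pressure(room, n, x), 2)
                  for k, x in seats.items()}
        print(f"mode {n}: {f:.1f} Hz", levels)
```

In this model the second mode falls near 50 Hz with a pressure minimum at one quarter of the room length, the first mode near 25 Hz has its minimum at the room center, and every mode has a maximum at the wall.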

Furthermore, if the loudspeaker maintains the same polar pattern over the whole frequency range, then energy distribution from low to high frequencies in the room is only a function of the room's absorptive characteristics. Thus, bass from dipole loudspeakers is reproduced with greater articulation [16] and more evenness at different listening room locations than from box speakers. By articulation I mean that the envelope modulation of a bass signal is better preserved for different locations in the room. Any ambiance from the recording venue will be heard more clearly because the listening room is illuminated neutrally.

2.3  Reverberated Sound Field
If domestic rooms were simple rectangular boxes with known sound absorption coefficients for their boundary surfaces and without furniture in them, then it would be possible to predict [14] the large number of modes that could be excited by a loudspeaker. Table 1. The number of modes 'N' increases with frequency as does the number of reflections, since it takes two or more boundaries to set up a mode. In the example a) of a room with proportions deemed to have an acceptable spatial distribution of mode maxima and minima b), there could be up to 55 modes excited by the loudspeaker below fm = 150 Hz, depending upon its placement in the room and its radiation pattern below 'fm', c). In general, as frequency increases the average frequency separation of modes 'df' decreases, being down to 1.6 Hz at 'fm'. Each mode has a 3 dB bandwidth 'bw' and corresponding reverberation time 'T60', which is determined by the wall absorption properties 'a' at the frequency of excitation, Table 1 d). A wall or surface absorption coefficient of 25% means that one quarter of the room's surface area 'S' acts like an open window for sound to escape. That is a large equivalent area. It would have to be increased to 45% if a reverberation time of 250 ms were targeted, which is only practically achieved for this size room by the addition of bulk absorbers and resonant absorbers. Such short reverberation times are useful for mixing studios with conventional box type monitors. But for domestic listening and the type of controlled directivity loudspeakers that I discuss later [18], they are not desirable. I have found a normally furnished room with diffusive and absorptive elements and a reverberation time around 450 ms to be optimal.

Assuming a 456 ms reverberation time, the mode bandwidth 'bw' becomes 4.8 Hz, Table 1 e). The bandwidth 'bw' is inversely proportional to 'T60' and is the same over the whole frequency range if the reverberation time is constant.
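The relation between absorption, reverberation time and mode bandwidth can be sketched as follows. The sketch assumes the Sabine equation T60 = 0.161 V / (a S), which the text does not name explicitly, and the standard relation bw = ln(10^6) / (2 pi T60), about 2.2 / T60, for the 3 dB bandwidth of an exponentially decaying resonance. The room dimensions are hypothetical and are not the values of Table 1 a).

```python
import math

def sabine_t60(volume, surface, absorption):
    """Reverberation time in s from the Sabine equation (SI units)."""
    return 0.161 * volume / (absorption * surface)

def required_absorption(volume, surface, t60):
    """Average absorption coefficient needed for a target T60."""
    return 0.161 * volume / (t60 * surface)

def mode_bandwidth(t60):
    """3 dB bandwidth in Hz of a mode that decays 60 dB in t60 seconds."""
    return math.log(1.0e6) / (2.0 * math.pi * t60)

if __name__ == "__main__":
    # Hypothetical rectangular room, dimensions in metres.
    length, width, height = 6.88, 4.5, 2.7
    volume = length * width * height
    surface = 2.0 * (length * width + length * height + width * height)
    t60 = sabine_t60(volume, surface, 0.25)
    print(f"T60 at a = 0.25: {1000 * t60:.0f} ms")
    print(f"a for 250 ms:    {required_absorption(volume, surface, 0.25):.2f}")
    print(f"bw at 456 ms:    {mode_bandwidth(0.456):.1f} Hz")
```

Because T60 is inversely proportional to the absorption in this model, going from 456 ms to 250 ms requires the absorption to rise by the factor 456/250, which is consistent with the step from 25% to about 45% in the text.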

Table 1. Acoustic properties c) and e) of an unfurnished rectangular room with dimensions a) and estimated wall surface absorption d).

With increasing frequency the mode separation 'df' decreases; at the Schroeder frequency 'fs' [14] more than two modes fall within 'bw'. The Schroeder frequency roughly defines the boundary between the modal region below and the reverberant sound field above. For domestic size listening rooms 'fs' lies between 100 Hz for dead rooms and 200 Hz for fairly live rooms. The sound field above 'fs' takes on a uniform SPL distribution in the room for distances from the source that are greater than the reverberation distance 'R', Table 1 e). Direct and reverberant SPL are equal at 'R'. The reverberation distance [18] is rather small, but it is significant because it indicates that a listener seated in the reverberant field would experience a higher direct-to-reverberant sound ratio (D/R) for the more directional source, i.e. 4.8 dB higher for the dipole with 'Rd' than for the monopole with 'Rm'. In other words, a dipole source in a room with 450 ms reverberation time is equivalent in terms of D/R to a monopole in a room with 150 ms reverberation time.
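The 4.8 dB figure and the 450 ms versus 150 ms equivalence can be reproduced with the usual statistical room acoustics formulas. The sketch assumes fs = 2000 sqrt(T60 / V) and R = 0.057 sqrt(Q V / T60) in SI units, with a directivity factor Q of 1 for a monopole and 3 for an ideal dipole on axis. The room volume is hypothetical.

```python
import math

def schroeder_frequency(t60, volume):
    """Approximate Schroeder frequency in Hz (T60 in s, volume in m^3)."""
    return 2000.0 * math.sqrt(t60 / volume)

def reverberation_distance(t60, volume, q=1.0):
    """Distance in m at which direct and reverberant SPL are equal."""
    return 0.057 * math.sqrt(q * volume / t60)

def dr_gain_db(q):
    """D/R improvement in dB over a monopole at the same distance."""
    return 10.0 * math.log10(q)

if __name__ == "__main__":
    volume = 6.88 * 4.5 * 2.7  # hypothetical room volume in m^3
    t60 = 0.456
    print(f"fs = {schroeder_frequency(t60, volume):.0f} Hz")
    print(f"Rm = {reverberation_distance(t60, volume, 1.0):.2f} m")
    print(f"Rd = {reverberation_distance(t60, volume, 3.0):.2f} m")
    print(f"dipole D/R gain = {dr_gain_db(3.0):.1f} dB")
```

In this model a factor of three in Q has the same effect on R as dividing T60 by three, which is why the dipole in a 450 ms room matches the monopole in a 150 ms room in terms of D/R.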

The modes below 'fs' and the reverberated sound field above 'fs' build up with Trise = 0.32 T60 when sustained acoustic energy is supplied to the room [15]. It should be noted that 'Trise' is large compared to the duration of high frequency transients, meaning that how well transients are heard at different locations in the room depends primarily on the dispersion of the direct sound from the loudspeaker and its reflections.

 

 

3   THE MAGIC IN STEREO

3.1   Typical Stereo Reproduction
Domestic listening rooms come in all shapes and sizes and rarely follow the simple model in Table 1. Except for a few of the lowest order modes and primary reflection areas, it becomes exceedingly difficult to make predictions about any room's potential acoustic behavior. Reverberation time is best measured and gives a general description for different frequency bands. How useful that is depends upon the loudspeakers that will be installed. I will even claim that how the room responds to sound is much less of a problem than how the loudspeaker illuminates the room.

How a loudspeaker will illuminate the room can in most cases be predicted by visual inspection of its shape, its physical dimensions, driver sizes and layout, which all help determine acoustic dimensions and diffraction effects. The vast majority [17] of loudspeakers are constructed as rectangular boxes of various sizes, with narrow and tall front baffles, vertically aligned drivers and a vent either in front or in back. The tweeter and the design axis are positioned at seated ear height. Figure 8. There are variations to the front baffle design with rounded edges, a narrower baffle for the tweeter or staggered baffles for "time alignment" of the drivers. The driv