Microphones: Stereo 3D sound

The original meaning of the word stereo is 'solid'. Our dual-spaced (left-right) sensory system gives us a three-dimensional perspective of the world. 3D perception depends on angular perspective combined with comparative detail: the greater the amount of comparative detail, the richer the 3D perception becomes. This is easily noticed with our visual sense.

In pitch blackness it is difficult to tell whether a single spot of light is a firefly sitting in a tree or a galaxy a million light years from Earth, or to judge how far away a tennis ball is when it is coming straight at you. Likewise, 3D sound cannot be experienced through small, low-fidelity sound systems, with music that has been excessively compressed, or in reverberant environments.

Techniques for stereo imaging have improved over time, and a stereo image can also be created artificially in post-production. Some classical music, whose listeners are aware of and value it, is skilfully recorded with a true stereo image.

Stereo photography, viewed with stereo viewers, was popular before 1970 and still has a small following today. The stereo 3D image requires each eye to view the left and right pictures separately. The eyes are very close together and forward facing, which is why the division is required. However, it is possible to see a limited 3D image without division if focus can be shifted in front of or behind the pictures. This requires skill and the correct viewing position.
www.stereoscopy.com

Stereo recording was researched early in recording history. Our ears face in opposite directions (180deg) and are about 150mm (6in) apart, enabling a wide, detailed 3D auditory experience. Two microphones spaced like our ears can be recorded onto separate tracks and played back through headphones. As with the visual 3D stereo experience, the recorded auditory 3D experience requires each ear to hear its own channel separately, as headphones allow. The result is a quasi-spherical 3D experience similar to reality. Many companies, including Neumann, offer dummy head stereo mics. There is also a low-cost DIY dummy head project from Elliott Sound Products (ESP): sound-au.com/project112.htm
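As a rough illustration of what that spacing means in time, the sketch below estimates how much later a sound reaches the far ear. It assumes a straight-line path between ears 150mm apart and a speed of sound of 343 m/s; the function and constant names are illustrative only. A real head makes the sound travel around it, so measured differences are somewhat larger than this simple model gives.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)
EAR_SPACING = 0.15      # m, the roughly 150mm / 6in spacing quoted above


def itd_ms(azimuth_deg: float, spacing_m: float = EAR_SPACING,
           c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in milliseconds for a distant source.

    Simple straight-line model: extra path = spacing * sin(azimuth),
    where 0 deg is straight ahead and 90 deg is directly to one side.
    Diffraction around the head is ignored.
    """
    extra_path_m = spacing_m * math.sin(math.radians(azimuth_deg))
    return extra_path_m / c * 1000.0


for angle in (0, 15, 30, 60, 90):
    print(f"{angle:>2} deg: {itd_ms(angle):.3f} ms")
```

In this model the largest difference, for a source directly to one side, is only about 0.44 ms.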

Sweet spot A limited 3D stereo effect from speakers can be heard from a sweet spot at the apex of an equilateral triangle, in the forward plane. The effect is limited to off-center positions where the level difference between the two speakers is no greater than 3dB. The larger the triangle, the larger the sweet spot, so more people can be included. This is acceptable because we naturally sit in front of a live musical performance. It is also why some people choose to sit in the center of a cinema. Sound that appears from other directions, such as echoes or people coughing, is distracting.
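To see why a larger triangle gives a larger sweet spot, the sketch below models the two speakers as free-field point sources (inverse distance law, no room reflections) and finds how far the listener can move sideways from the apex before the speakers differ by 3dB. The geometry and function names are assumptions for illustration; the model considers level only and ignores arrival-time differences.

```python
import math


def level_difference_db(offset_m: float, spacing_m: float) -> float:
    """Level difference (dB) between the two speakers at a listening point
    moved sideways by offset_m from the apex of an equilateral triangle.

    Assumes free-field point sources: level falls by 20*log10 of distance.
    """
    depth_m = spacing_m * math.sqrt(3) / 2  # apex distance from the speaker line
    r_left = math.hypot(offset_m + spacing_m / 2, depth_m)
    r_right = math.hypot(offset_m - spacing_m / 2, depth_m)
    return abs(20 * math.log10(r_left / r_right))


def sweet_spot_half_width_m(spacing_m: float, limit_db: float = 3.0) -> float:
    """Largest sideways offset whose level difference stays within limit_db.

    Searches between the apex and the point directly in front of one
    speaker; the difference rises steadily over that range, so bisection works.
    """
    lo, hi = 0.0, spacing_m / 2
    if level_difference_db(hi, spacing_m) <= limit_db:
        return hi  # limit not reached inside the search range
    for _ in range(60):
        mid = (lo + hi) / 2
        if level_difference_db(mid, spacing_m) < limit_db:
            lo = mid
        else:
            hi = mid
    return lo


for spacing in (1.5, 3.0, 6.0):
    print(f"speakers {spacing} m apart: about +/-{sweet_spot_half_width_m(spacing):.2f} m")
```

Because the geometry scales, the sweet spot in this model always extends about 0.38 times the speaker spacing either side of the apex, so doubling the triangle doubles its width.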

Listening to forward-facing stereo speakers (as in the picture above) lets both ears hear both speakers, which limits the ability to localise separate sounds, except at the extreme left and right positions. Another way to understand this, though it must not be taken literally, is to imagine that the left and right speakers mark the edges of a single, very large imaginary speaker.

Depth of field A symphony orchestra has a rich depth of field created by the immense comparative detail within the music, whereas a single note of a flute has no depth of field. Field depth is created by the richness of comparative and harmonic nuance within the music. Small, low-fidelity speakers can reproduce only a limited depth of field. Depth of field is also limited in reverberant environments and cannot be obtained from excessively compressed recordings.

The enjoyment of listening to a live (non-amplified) jazz band does not depend on sitting in the middle to hear spatial left-right orientation. Enjoyment comes from being close enough to hear a depth of field, which can be achieved from almost any angle. Depth of field is the primary influence on perceived realism, whereas left-right localisation has little effect on it. A single large, full-fidelity speaker system that gives a depth of field will sound more enjoyable and realistic than five small, low-fidelity home cinema speakers that provide localisation only.

A simple experiment is to listen at low level to a single speaker from within 100mm / 4in. Notice that a depth of field can be heard within the music, but disappears when the listening distance is increased. Then listen with two speakers and notice that, as the speakers are moved apart, the depth of field can be heard from a greater distance. The depth of field also remains semi-stable over a listening angle of approx 60deg.

Reducing the level of one speaker causes the depth of field to diminish, regardless of the listening angle, so both speakers must be at the same level. Some teenagers can be seen lying down listening to a small stereo at low level, very close to their ears. Others listen at a higher level, at a distance, with the speakers further apart.

Propagation is the ability of a sound system to deliver the full bandwidth of the music over distance. Most small domestic two-way speakers cannot achieve this. The speakers have to be very close together for the lower voice and bass energy of the pair to combine and propagate the lower frequencies. Therefore the majority of pop music is recorded as dual mono (pseudo-stereo) to increase the propagation of sound energy, and true stereo, which gives a realistic depth of field, is often ignored. For a true stereo field to be obtained, each of the left and right speakers should independently be capable of full-bandwidth propagation.

Mono comb filter Panning to center, so that both speakers reproduce the same sound to increase propagation, can be heard accurately only from the exact center between the speakers. Off center, the different path lengths from the two speakers create a comb filter effect, which causes cancellations, reduces high-frequency energy and decreases intelligibility. This effect can be used positively to decrease the sharpness of a voice or instrument, particularly drums, to obtain a softer splash. An instrument panned hard left or right will sound sharper and therefore closer.
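The comb filter can be estimated from the path difference alone. If the listener is Δd metres closer to one speaker, the far speaker's copy arrives τ = Δd / c later, and two equal copies sum to a magnitude of 2|cos(πfτ)|, cancelling at odd multiples of 1/(2τ). The sketch below assumes equal levels, free-field conditions and c = 343 m/s; the function names are illustrative. In a real room the notches are shallower, but their spacing follows the same rule.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def notch_frequencies_hz(path_diff_m: float, f_max_hz: float = 20000.0,
                         c: float = SPEED_OF_SOUND) -> list[float]:
    """Cancellation frequencies up to f_max_hz for two equal copies of a
    signal whose arrival times differ by path_diff_m / c seconds."""
    if path_diff_m <= 0:
        return []  # listener exactly on center: no comb filter
    tau = path_diff_m / c
    notches = []
    k = 0
    while (2 * k + 1) / (2 * tau) <= f_max_hz:
        notches.append((2 * k + 1) / (2 * tau))  # odd multiples of 1/(2*tau)
        k += 1
    return notches


def comb_gain_db(freq_hz: float, path_diff_m: float,
                 c: float = SPEED_OF_SOUND) -> float:
    """Level of the summed pair relative to one speaker alone."""
    tau = path_diff_m / c
    magnitude = abs(2 * math.cos(math.pi * freq_hz * tau))
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")


# Example: one speaker about 34 cm further away than the other (1 ms later).
notches = notch_frequencies_hz(0.343)
print(f"{len(notches)} notches below 20 kHz, first three: {notches[:3]}")
```

Moving further off center increases Δd, which lowers the first notch and packs the cancellations closer together.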

3D Spatial realism

3D spatial realism requires a minimum of two stereo fields. A single field from only two speakers provides just one part of the 3D experience. This point is perhaps fully understood by recording engineers with electro-acoustic engineering backgrounds, but not by pop recording engineers without science or engineering backgrounds. As in the picture below, three speakers can create three stereo fields, of which fields 1 and 2 create spatial localisation. This means that musical instruments can be positioned and held in a left - center - right correlation over a 60deg listening angle. The 5.1 protocol for home cinema puts the rear speakers at too great an angle to be correlated into stereo fields; the rear speakers are for novelty effect only.
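The angle argument can be checked by listing the gap between each pair of neighbouring speakers in a 5.1 layout. The sketch assumes nominal ITU-R BS.775 style azimuths (center at 0deg, front left and right 30deg either side, surrounds at 110deg, within the 100 to 120deg range that recommendation allows) and treats 60deg, the angle a pair subtends at the apex of an equilateral triangle, as the widest pair that still behaves as a stereo field. That threshold is an assumption drawn from the sweet-spot description, not a standard figure; the names are illustrative.

```python
# Nominal 5.1 speaker azimuths in degrees (negative = left). Front pair at
# +/-30 deg; surrounds at +/-110 deg, inside the 100-120 deg BS.775 range.
LAYOUT_5_1 = {"C": 0.0, "L": -30.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}

# Assumption: a pair wider than an equilateral-triangle pair (60 deg)
# no longer behaves as a stereo field.
MAX_FIELD_DEG = 60.0


def neighbour_gaps(layout: dict[str, float]) -> list[tuple[str, str, float]]:
    """(speaker, next speaker clockwise seen from above, angle between them)."""
    ordered = sorted(layout, key=lambda name: layout[name] % 360)
    gaps = []
    for i, name in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        gaps.append((name, nxt, (layout[nxt] - layout[name]) % 360))
    return gaps


for a, b, gap in neighbour_gaps(LAYOUT_5_1):
    verdict = "stereo field" if gap <= MAX_FIELD_DEG else "too wide"
    print(f"{a:>2} -> {b:<2} {gap:5.1f} deg  {verdict}")
```

Only the two front pairs fall within 60deg; the side gaps (80deg) and the rear gap (140deg) are well beyond it.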

Panoramic 3D spatial realism with five speakers (as in the picture below) gives 10 stereo fields, which can be correlated to replicate a full symphony orchestra, provided the speakers are full fidelity. Fields 1, 2, 3 and 4, formed by directly adjacent speakers, are the primary stereo fields: they give accurate localisation of instruments and should carry the high-frequency detail and transients. Fields 5, 6 and 7 can generalise sections of instruments or choirs. Fields 8, 9 and 10 are wide apart, and most listening positions will be so far off center that only the closest speaker is heard.
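The field counts follow from treating every pair of speakers as one stereo field, so n speakers give n(n-1)/2 fields: three for three speakers and ten for five. The sketch below enumerates the pairs and orders them from adjacent to widest. The numbering is a guess at the pictures' labelling, chosen because it reproduces the grouping described above (1 to 4 adjacent, 5 to 7 one speaker apart, 8 to 10 widest); the function name is illustrative.

```python
from itertools import combinations


def stereo_fields(n_speakers: int) -> list[tuple[int, int]]:
    """All speaker pairs (speakers numbered 1..n from left to right),
    ordered from directly adjacent pairs to the widest pair."""
    pairs = combinations(range(1, n_speakers + 1), 2)
    # Sort by how far apart the two speakers are, then from left to right.
    return sorted(pairs, key=lambda p: (p[1] - p[0], p[0]))


for number, (a, b) in enumerate(stereo_fields(5), start=1):
    print(f"field {number:>2}: speakers {a} and {b}")
```

With three speakers the same ordering gives fields 1 and 2 as the adjacent left-center and center-right pairs, and field 3 as the wide left-right pair.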

The procedure for obtaining 3D spatial realism requires most, if not all, sound sources to be separately miked in stereo: two microphones for each instrument, or section of instruments and voices. This can also include separate stereo miking of the reverberant field.

Ambisonics A good understanding of the physics behind 3D spatial realism is required to implement the objectives of immersive audio, such as Ambisonics (open source) and its many competing commercial variations.

wikipedia.org/Ambisonics

Reverberant ambience

The wider fields can be used for reverberant ambience, where no localisation, high-frequency or transient information is required. The high frequencies in the wider fields may need attenuation (-3dB / octave) or -6dB shelving above 250 Hz. These wider fields also have an advantage for deep bass and bass-effect localisation, because the greater speaker spacing favours longer wavelengths.
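For the -6dB shelving option, one common way to build the filter is a second-order (biquad) high shelf. The sketch below uses the high-shelf formulas from Robert Bristow-Johnson's Audio EQ Cookbook, with the shelf midpoint at 250 Hz and an assumed 48 kHz sample rate; the helper names are illustrative. It covers only the shelf, not the -3dB / octave alternative, and the corner and slope a particular mix needs remain a judgment call.

```python
import cmath
import math


def high_shelf_coeffs(f0_hz: float, gain_db: float, fs_hz: float,
                      slope: float = 1.0) -> tuple[list[float], list[float]]:
    """Biquad high-shelf coefficients (b, a), normalised so a[0] == 1.
    f0_hz is the shelf midpoint, where the gain is half of gain_db."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((A + 1 / A) * (1 / slope - 1) + 2)
    k = 2 * math.sqrt(A) * alpha
    b0 = A * ((A + 1) + (A - 1) * cos_w0 + k)
    b1 = -2 * A * ((A - 1) + (A + 1) * cos_w0)
    b2 = A * ((A + 1) + (A - 1) * cos_w0 - k)
    a0 = (A + 1) - (A - 1) * cos_w0 + k
    a1 = 2 * ((A - 1) - (A + 1) * cos_w0)
    a2 = (A + 1) - (A - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]


def response_db(b: list[float], a: list[float], f_hz: float, fs_hz: float) -> float:
    """Filter gain in dB at f_hz, evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def biquad(samples: list[float], b: list[float], a: list[float]) -> list[float]:
    """Filter a list of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


FS = 48000  # Hz (assumed)
b, a = high_shelf_coeffs(250.0, -6.0, FS)
for f in (10, 50, 250, 1000, 10000):
    print(f"{f:>5} Hz: {response_db(b, a, f, FS):6.2f} dB")
```

The reverberant signal for the wide or rear speakers would be passed through biquad() before it is sent to them; low frequencies pass unchanged, and the level settles to -6dB well above 250 Hz.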

Applying high-frequency shelving to the reverberant field (particularly in the rear surround speakers) increases the feeling of envelopment without the reverberant sound appearing to come directly from the speakers. However, to avoid phase shift biasing the perceived direction of the original sound, the reverberation should not come from the source speaker, but only from the other speakers or the surround speakers. There are other limitations too: reflections with longer path lengths (50ms to 150ms of delay) can create an excessive sense of distance at the cost of intelligibility, making the whole sound cluttered and distant.
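To put the 50ms to 150ms figures in physical terms, the delay of a reflection is its extra path length divided by the speed of sound. The sketch below assumes 343 m/s; the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)


def extra_path_m(delay_ms: float, c: float = SPEED_OF_SOUND) -> float:
    """How much further a reflection travels than the direct sound."""
    return c * delay_ms / 1000.0


def reflection_delay_ms(extra_path: float, c: float = SPEED_OF_SOUND) -> float:
    """Arrival delay of a reflection relative to the direct sound."""
    return extra_path / c * 1000.0


for delay in (50, 100, 150):
    print(f"{delay:>3} ms late -> about {extra_path_m(delay):.0f} m of extra path")
```

A reflection arriving 50ms late has travelled roughly 17 m further than the direct sound, and one at 150ms roughly 51 m further.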

A simple technique to retain intelligibility is to put a delay between the original sound and the start of the reverberation. Early stereo recordings experimented with the original sound from one speaker and the reflected reverberation from the other. This was often extended to a panned reversal to create a spatial moving effect. History shows that the greatest creativity is expressed when new technology first becomes available, and that it is later replaced by unimaginative conformity.
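A minimal sketch of that pre-delay, working on plain lists of samples: the reverb signal is shifted later by a chosen number of milliseconds before it is mixed with, or sent to a different channel from, the direct sound. The 30ms pre-delay, the 48 kHz sample rate and the toy decaying "reverb" tail are hypothetical values chosen only to make the example run; the helper names are illustrative.

```python
FS = 48000  # sample rate in Hz (assumed)


def delay(signal: list[float], delay_ms: float, fs_hz: int = FS) -> list[float]:
    """Shift a signal later by delay_ms by prepending silence."""
    n = round(delay_ms * fs_hz / 1000)
    return [0.0] * n + list(signal)


def mix(*signals: list[float]) -> list[float]:
    """Sum signals sample by sample; missing samples count as silence."""
    length = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(length)]


# Hypothetical test material: a click as the direct sound and a simple
# exponentially decaying tail standing in for real reverberation.
dry = [1.0] + [0.0] * 99
reverb = [0.3 * 0.999 ** n for n in range(4800)]

# One channel: the direct sound first, the reverb starting 30 ms later.
mono = mix(dry, [0.5 * x for x in delay(reverb, 30)])

# Early-stereo style: direct sound on the left, delayed reverb on the right.
left, right = dry, delay(reverb, 30)
```

Keeping the delay as a separate step means the same delayed reverb can be mixed into the source channel or routed only to other speakers, as the text recommends.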

Cinema sound 5.1 provides for only one subwoofer, so localisation below 250Hz is ignored. Commercial cinemas could provide the full bass register within each of the left - center - right speaker stacks to enable bass localisation for thunder, (perhaps) moving spaceships, and army tanks crossing the screen. This would require cinemas to be non-reverberant at low frequencies. The 1950s Cinerama experience clearly demonstrated this, but was considered not commercially viable in today's low-cost, profit-driven world. Cinematic experiences of this magnitude may be re-created in the future with digital audio management and digital projection.
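The wavelengths involved show why bass behaves so differently from the rest of the spectrum. Wavelength is the speed of sound divided by frequency; the sketch below assumes 343 m/s, and the function name is illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)


def wavelength_m(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Wavelength in metres of a sound at freq_hz."""
    return c / freq_hz


for freq in (20, 40, 80, 120, 250):
    print(f"{freq:>3} Hz: {wavelength_m(freq):5.2f} m")
```

At the 250Hz boundary the wavelength is already about 1.4 m, and at 20 Hz it is over 17 m, on the scale of a cinema auditorium itself.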

Compression

Compression of recorded music reduces left-right localisation and depth of field. Over-compressed music is similar to reverberation held at a constant level, which carries no spatial information. To repeat: left-right localisation and depth of field in recorded music depend on minimal compression and on listening in a non-reverberant environment. Also, when music is over-compressed, it doesn't matter what mic or mic technique is used; it will all end up sounding the same.
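One way to see how compression flattens level differences is a static compressor gain curve. The sketch below assumes a hard-knee downward compressor with hypothetical threshold and ratio settings and hypothetical passage levels; it ignores attack and release timing, and only shows how the spread between quiet and loud passages shrinks as compression increases.

```python
def compress_db(level_db: float, threshold_db: float = -20.0,
                ratio: float = 4.0) -> float:
    """Static hard-knee downward compressor: above the threshold, every
    `ratio` dB of input rise becomes 1 dB of output rise."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def level_spread_db(levels_db: list[float], **settings: float) -> float:
    """Difference between the loudest and quietest compressed levels."""
    out = [compress_db(level, **settings) for level in levels_db]
    return max(out) - min(out)


# Hypothetical passage levels from a quiet detail (-40 dB) to a peak (0 dB).
passage = [-40.0, -25.0, -10.0, 0.0]

print("uncompressed:", level_spread_db(passage, threshold_db=0.0, ratio=1.0), "dB")
print("moderate:    ", level_spread_db(passage), "dB")
print("heavy:       ", level_spread_db(passage, threshold_db=-45.0, ratio=20.0), "dB")
```

With these settings the 40 dB spread of the passage falls to 25 dB under moderate compression and to 2 dB under heavy compression, leaving the music at a near-constant level.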