The original Greek meaning of the word stereo is 'solid'. Our paired, spaced (left-right) senses give us a three-dimensional perspective of the world. 3D perception is dependent on angular perspective combined with comparative detail. The greater the amount of comparative detail, the richer the 3D perception becomes. This is most easily noticed with vision.
In pitch blackness it is difficult to tell whether a single spot of light is a firefly sitting in a tree or a galaxy a million light years from earth, or to judge how far away a tennis ball is when it is coming straight at your face. Likewise, 3D sound cannot be experienced through small low-fidelity sound systems, with music that has been excessively compressed, or in reverberant environments.
Techniques for stereo imaging have improved over time, and a stereo image can now be created artificially in post-production. Some classical music is skilfully recorded with a stereo image, but it is limited to a relatively small market of aware consumers.

Stereo photographs and stereo viewers were popular before 1970 and still have a small following today. The stereo 3D image requires each eye to view its own picture separately, the left eye the left picture and the right eye the right picture. The eyes are very close together and forward facing, which is why a division between the pictures is required. However, it is possible to see a limited 3D image without a division if focus can be shifted in front of or behind the pictures. This requires skill and a correct viewing position.
www.stereoscopy.com

Stereo recording was researched early in recording history. Our ears face in opposite directions (180deg) and are about 150mm (6in) apart, enabling a wide, detailed 3D auditory experience. Two microphones are spaced similarly to our ears, recorded onto separate tracks and played back through headphones (now referred to as 'binaural'). Similar to the visual 3D stereo experience, the recorded auditory 3D experience requires the ears to separately hear each sound source, as with headphones. The result is a 3D quasi-spherical experience similar to reality. Many companies, including Neumann, have dummy head stereo mics available. There is also a low cost DIY dummy head project available from ESP (Elliott Sound Products): sound-au.com/project112.htm
The sweet-spot. A limited 3D stereo effect from speakers can be heard from the sweet-spot at the apex of an equilateral triangle formed with the two speakers, in the forward plane. But this effect is limited to off-centre positions where the level difference between the speakers is no greater than 3dB. The larger the triangle, the larger the sweet-spot, which allows it to include more people. This is acceptable because we naturally sit in front of a live musical performance. This effect is also noticed by people who choose to sit in the centre of cinemas. Sound that arrives from other directions, such as echoes or people coughing, is distracting.
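The 3dB limit and the effect of triangle size can be roughly estimated with a short calculation. This is only a sketch under stated assumptions: both speakers are treated as equal point sources in a free field (inverse-square law, no room reflections, no speaker directivity), the listener moves sideways along the line through the apex of the triangle, and the function name is hypothetical.

```python
import math

def level_difference_db(offset_m, side_m):
    """Level difference in dB between the nearer and farther speaker.

    Assumptions (not from the article): point sources in a free field,
    speakers at (-side/2, 0) and (+side/2, 0), listener at the apex
    distance of an equilateral triangle, shifted sideways by offset_m.
    """
    listener_y = side_m * math.sqrt(3) / 2           # height of the triangle
    d_left = math.hypot(offset_m + side_m / 2, listener_y)
    d_right = math.hypot(offset_m - side_m / 2, listener_y)
    # Inverse-square law: level changes by 20*log10 of the distance ratio.
    return abs(20 * math.log10(d_left / d_right))

if __name__ == "__main__":
    for side in (2.0, 4.0):
        for offset in (0.0, 0.25, 0.5, 0.76, 1.0, 1.52):
            diff = level_difference_db(offset, side)
            print(f"side {side} m, offset {offset} m: {diff:.2f} dB")
```

Under these assumptions a 2m triangle reaches the 3dB limit at about 0.76m off-centre, and the result scales with the triangle: doubling the side doubles the usable offset, which agrees with the larger triangle giving a larger sweet-spot.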

Listening to forward facing stereo speakers (as in the above pic) lets both ears hear both speakers, and therefore limits the ability to localise separate sounds (except at the extreme left and right positions). Another way to understand this, which must not be taken literally, is to imagine that the left and right speakers mark the perimeter of a single, very large imaginary speaker.
Depth-of-field. A symphony orchestra has a rich depth-of-field created by the immense comparative detail within the music. A single note of a flute has zero depth-of-field. Field depth is created by the richness of comparative and harmonic nuance detail within the music. Small low-fidelity speakers can only reproduce a very limited depth-of-field. Depth-of-field is also limited in reverberant environments and is not obtainable from excessively compressed recordings.
The enjoyment of listening to a live (non-amplified) jazz band is not dependent on sitting in the middle to hear spatial left-right orientation. Enjoyment is achieved by being close enough to hear a depth-of-field, which can be achieved from almost any angle. Depth-of-field is the primary influence for hearing 'realism', whereas left-right localisation has little effect on 'realism'. A single large full-fidelity speaker system that gives a depth-of-field will sound more enjoyable and realistic than 5 small low-fidelity home cinema speakers that provide localisation only.
A simple experiment is to listen at low level to a single speaker within 100mm / 4in. Notice that a depth-of-field can be heard within the music but disappears when the listening distance is increased. Then, listen with 2 speakers and notice that depth-of-field can be heard from a greater distance as the speakers are moved apart. The depth-of-field also remains semi-stable over a listening angle of approx 60deg.
Reducing the level of one speaker causes the depth-of-field to diminish, regardless of the listening angle. Therefore, both speakers must be at the same level. Some teenagers will be seen lying down listening to a small stereo, at low level, very close to their ears. Others may listen at a higher level, at a distance, with speakers further apart.

Propagation is the ability of the sound system to deliver the full bandwidth of the music over distance. Most small domestic 2-way speakers cannot achieve this. The speakers have to be very close together for lower voice and bass energy to combine and achieve lower frequency propagation. Therefore, the majority of pop music is recorded as dual-mono (pseudo-stereo) to increase the propagation of sound energy. True stereo, which provides realism and depth-of-field, is often ignored. For a true stereo-field to be obtained, each of the left and right speakers should independently be capable of full bandwidth propagation.

Mono-comb filter. Panning to centre, so that both speakers reproduce the same sound to increase propagation, can only be heard accurately from the exact centre position between the speakers. Off-centre, the different path lengths from the speakers create a comb-filter effect, which causes cancellations, reduces high-frequency energy and decreases intelligibility. This effect can be used positively to decrease the sharpness of a voice or instrument, particularly drums, to obtain a softer splash. An instrument panned hard left or right will sound sharper and therefore closer.
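The comb-filter cancellations follow directly from the path-length difference, and a small calculation shows where they fall. This is an idealised sketch: it assumes a speed of sound of 343 m/s (air at about 20 degrees C), equal levels arriving from both speakers and no room reflections, so real notches will be shallower than the ones computed here. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def comb_notches(path_difference_m, max_hz=20000.0):
    """Frequencies (Hz) where two equal, identical signals cancel.

    A path difference delays one arrival by t = difference / c.
    Cancellation occurs where that delay is an odd number of half
    periods: f = (2k + 1) / (2t) for k = 0, 1, 2, ...
    """
    if path_difference_m <= 0:
        return []  # exact centre position: no delay, no notches
    delay_s = path_difference_m / SPEED_OF_SOUND
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches

def summed_gain_db(freq_hz, path_difference_m):
    """Level of the two-speaker sum relative to the in-phase sum at centre."""
    delay_s = path_difference_m / SPEED_OF_SOUND
    relative = abs(math.cos(math.pi * freq_hz * delay_s))
    return 20 * math.log10(max(relative, 1e-12))  # floor avoids log10(0)

if __name__ == "__main__":
    # A 0.343 m path difference is a delay of about 1 ms.
    print([round(f) for f in comb_notches(0.343, max_hz=5000.0)])
    print(round(summed_gain_db(1000.0, 0.343), 2))
```

The notches are spaced evenly in frequency, so they crowd together in the upper octaves, which is consistent with the loss of high-frequency energy and intelligibility heard off-centre.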
3D Spatial realism
3D Spatial realism requires a minimum of 2 stereo fields. A single field from only 2 speakers provides only one part of the 3D experience. This point can be fully understood by recording engineers who also have electro-acoustic engineering backgrounds, but is not likely to be understood by pop recording engineers who do not have science or engineering backgrounds. As in the below pic, three speakers can create three stereo fields, of which fields 1 and 2 create spatial localisation. This means that musical instruments are to be positioned and maintained in a left - centre - right correlation over a 60deg listening angle. The 5.1 protocol for home cinema puts the rear speakers at too great an angle to be correlated into stereo fields. The rear speakers are for novelty effect only.

Panoramic 3D spatial realism with 5 speakers (as in the pic below) enables 10 stereo fields, which can be correlated to replicate a full symphony orchestra, provided the speakers are full-fidelity. Directly adjacent speakers provide the primary stereo fields (1, 2, 3, 4), giving accurate localisation of instruments, which should contain high-frequency detail with transients. Fields 5, 6, 7 can generalise sections of instruments or choirs. Fields 8, 9, 10 are wide apart, and most listening positions will be off-centre, which will cause only the closest speaker to be heard.
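The field counts quoted above (3 fields from 3 speakers, 10 from 5) are the number of distinct speaker pairs, which can be checked with a few lines. This is a sketch only: the speakers are assumed to be numbered 1 to N from left to right, and the ordering of fields by speaker separation is an assumption about the pic, chosen because it reproduces the grouping 1-4, 5-7, 8-10.

```python
from itertools import combinations

def stereo_fields(num_speakers):
    """Return every speaker pair, ordered from adjacent to widest apart.

    Each pair of speakers is treated as one stereo field. Speakers are
    numbered 1..num_speakers from left to right (an assumed convention).
    """
    pairs = combinations(range(1, num_speakers + 1), 2)
    # Sort by separation first, then left to right within a separation.
    return sorted(pairs, key=lambda pair: (pair[1] - pair[0], pair[0]))

if __name__ == "__main__":
    for field_number, (a, b) in enumerate(stereo_fields(5), start=1):
        print(f"field {field_number}: speakers {a} and {b}")
```

With 5 speakers the first four fields are the adjacent pairs, the next three skip one speaker, and the last three are the widest pairs.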

The procedure for obtaining 3D spatial realism requires most, if not all, sound sources to be separately miked in stereo format. That is, 2 microphones for each instrument, or for each section of instruments and voices. This can also include separate stereo miking for the reverberant field.
Ambisonics. Having a good understanding of the physics behind 3D audio spatial realism is required to implement the objectives of immersive audio, such as Ambisonics (open source) and the many competing commercial variations of it such as Atmos.
wikipedia.org/Ambisonics
Reverberant ambience
The wider fields can be used for reverberant ambience, where no localisation, high-frequency or transient information is required. The high frequencies in the wider fields may require attenuation (-3dB / octave) or a -6dB shelf above 250 Hz. These wider fields have the advantage of providing deep bass and bass effect localisation, because the greater distances favour longer wavelengths.
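To give a feel for the two attenuation options, the gains can be tabulated per frequency. This is an idealised sketch, not a filter design: the -3dB / octave slope is assumed to start at the 250 Hz corner, the shelf is modelled as an abrupt step (a real shelving filter has a gradual transition), and the function names are hypothetical.

```python
import math

def tilt_gain_db(freq_hz, corner_hz=250.0, slope_db_per_octave=-3.0):
    """Gain of an idealised slope that starts at corner_hz."""
    if freq_hz <= corner_hz:
        return 0.0
    octaves_above = math.log2(freq_hz / corner_hz)
    return slope_db_per_octave * octaves_above

def shelf_gain_db(freq_hz, corner_hz=250.0, shelf_db=-6.0):
    """Gain of an idealised shelf, modelled as a step at corner_hz."""
    return shelf_db if freq_hz > corner_hz else 0.0

def db_to_amplitude(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

if __name__ == "__main__":
    for f in (125.0, 250.0, 500.0, 1000.0, 4000.0, 16000.0):
        print(f"{f:>7.0f} Hz  tilt {tilt_gain_db(f):6.1f} dB  "
              f"shelf {shelf_gain_db(f):5.1f} dB")
```

The two options coincide two octaves above the corner, at 1 kHz, where both give -6dB (about half amplitude). Above that the slope keeps falling while the shelf stays level.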

Applying high frequency shelving to the reverberant field (particularly the rear surround speakers) increases the feeling of envelopment, without the reverberant sound appearing to come directly from the speakers. However, to avoid phase shift biasing in the direction of the original sound, the reverberation should not come from the source, but only from the other speakers or the surround speakers. There are other limitations. Longer path-length reflections (50ms to 150ms) can create an excessive sense of distance at the cost of reduced intelligibility, causing the whole sound to be cluttered and distant.

A simple technique to retain intelligibility is to put a delay between the original sound and the start of the reverberation. Early stereo recordings experimented with the original sound from one speaker and the reflected reverberation from the other. This was often extended to a panned reversal to create a spatial moving effect. History shows that the greatest creativity is expressed when new technology first becomes available, and is later replaced by unimaginative conformity.
Cinema sound 5.1 only provides for one sub-woofer. Localisation below 250Hz is ignored. Commercial cinemas could provide the full bass register within each of the left - centre - right speaker stacks to enable bass localisation for thunder (movement of space ships?) and army tanks crossing the screen. This would require cinemas to be non-reverberant at low frequencies. The 1950s Cinerama experience clearly demonstrated this, but it was considered not commercially viable in our modern low-cost, high-profit driven world. There is a possibility that cinematic experiences of this magnitude will be re-created with digital audio management and digital projection in the future.
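The delay between the original sound and the start of the reverberation (often called pre-delay) can be sketched on plain lists of samples. This is a minimal illustration under assumptions: a 48 kHz sample rate, a speed of sound of 343 m/s, and hypothetical function names. The delay values in the example are placeholders, not recommendations from the article.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def predelay_samples(predelay_ms, sample_rate_hz=48000):
    """Number of whole samples that corresponds to a delay in milliseconds."""
    return round(predelay_ms * sample_rate_hz / 1000)

def apply_predelay(dry, wet, predelay_ms, sample_rate_hz=48000):
    """Mix the dry signal with the reverberant (wet) signal started later."""
    shift = predelay_samples(predelay_ms, sample_rate_hz)
    mixed = [0.0] * max(len(dry), len(wet) + shift)
    for i, sample in enumerate(dry):
        mixed[i] += sample
    for i, sample in enumerate(wet):
        mixed[i + shift] += sample  # the reverberation begins after the gap
    return mixed

def path_length_m(delay_ms):
    """Extra distance sound travels during a given delay."""
    return SPEED_OF_SOUND * delay_ms / 1000

if __name__ == "__main__":
    print(predelay_samples(20))                       # samples in 20 ms
    print(apply_predelay([1.0], [0.5], 1, sample_rate_hz=1000))
    print(path_length_m(50), path_length_m(150))      # metres
```

The last line relates delay to distance: at the assumed speed of sound, reflections arriving 50ms to 150ms late have travelled roughly 17m to 51m further than the direct sound.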
Compression
Compression of recorded music reduces left-right localisation and depth-of-field. Over-compressed music is similar to reverberation that remains at a constant level, which carries zero spatial information.
Repeat: Left-right localisation and depth-of-field from recorded music are dependent on minimal compression and on listening in a non-reverberant environment. When music is over-compressed it doesn't matter what mic or mic technique is used, it will all end up sounding the same.


