Cinema sound: Cinema sound formats

Cinema sound is now digital. But the battle for monopolistic corporate control has left behind a debris of unresolved sound formats and flawed plans for resolving them. Cinema sound was, and in principle still is, a mono center-channel format. Then left and right tracks were added, then surround speakers. But the majority of people at home have only a stereo system, with left and right speakers.

The belief is that AI will solve the problem by magically creating a center channel out of the left and right speakers and magically reflecting sound from the walls and ceiling, giving the illusion of full immersive surround sound. Science will not be required, because AI-generated marketing spin will make us believe we are collectively experiencing the same magical illusion. "Isn't Capitalism wonderful?"

By paying attention, we can hear that one movie sound track is magnificent, the dialogue rich and clear, while on another the sound level is chaotic and dialogue intelligibility almost non-existent. If we wish to understand why, and if we wish to be part of solving the problem, we must understand the background and history of how this mess came about.

35mm has been the standard film stock for the majority of movies throughout cinema history. Anamorphic lenses have long been used with 35mm film to achieve various wide-screen aspect ratios; the best known was CinemaScope. Over time 35mm film stock and lens technology greatly improved. Economic rationalising favoured 35mm, and as a result the superior but costly 70mm format, with its 5 screen sound channels, was used less and less.

Mono optical sound dominated the movie experience in the 35mm medium until 1976, when an international agreement allowed the Dolby two-track optical format, including Dolby A noise reduction, to become the new standard. A matrix technique derived 4 channels of sound (Left, Center, Right and Surround) from the 2 optical tracks, which are described as Left total (Lt) and Right total (Rt).
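A minimal sketch of the basic 4:2:4 sum/difference idea, assuming plain lists of samples and the conventional -3 dB (1/√2) gain for the centre and surround feeds. It is not Dolby's actual encoder: the real system also phase-shifts and band-limits the surround and applies noise reduction, and later decoders added active steering. The function names are hypothetical.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB gain used here for the centre and surround feeds

def encode(L, C, R, S):
    """Fold L, C, R, S sample lists into a two-track Lt/Rt pair.

    Simplified: the surround is added in phase to Lt and out of phase to Rt,
    with no 90-degree phase shift, band-limiting or noise reduction.
    """
    lt = [l + G * c + G * s for l, c, s in zip(L, C, S)]
    rt = [r + G * c - G * s for r, c, s in zip(R, C, S)]
    return lt, rt

def passive_decode(lt, rt):
    """Recover approximate L, C, R, S with a passive sum/difference matrix."""
    L = list(lt)
    R = list(rt)
    C = [G * (a + b) for a, b in zip(lt, rt)]  # sum of the tracks -> centre
    S = [G * (a - b) for a, b in zip(lt, rt)]  # difference of the tracks -> surround
    return L, C, R, S

# Centre-only dialogue: one sample of level 1.0 in C, silence elsewhere.
lt, rt = encode([0.0], [1.0], [0.0], [0.0])
L, C, R, S = passive_decode(lt, rt)
print(round(C[0], 3), round(L[0], 3), round(S[0], 3))  # 1.0 0.707 0.0
```

In this simplified version the centre-only dialogue also leaks into Left and Right at -3 dB, which hints at why later decoders steered actively. Summing the two tracks, as a mono projector effectively does, gives L + R + √2·C, and the surround cancels out.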

The matrix technique of obtaining four channels of sound from a two-track system had already been used in the communications industry and on earlier quad vinyl recordings. However, the existing matrix techniques could not provide backward compatibility: the new film prints also had to play on existing mono-only projectors.

A clever and more complex matrix system was developed that provided this backward compatibility. Over time IC (integrated circuit) and component technology improved, enabling higher fidelity with lower noise. Bass performance was also improved, referred to as OBE (Optical Bass Extension), and many cinemas added an independent bass extension speaker. By 1986 this final optical sound improvement became known as 'Dolby SR' (Spectral Recording), and it remained in place until digital technology took over.

Digital sound formats: Between 1990 and 1993, four different and competing digital cinema sound formats were developed, of which two remain in popular use. A (.1) sub-bass LFE (low-frequency effects) channel was added, which behaves as a separate channel limited to low frequencies (up to 120Hz in Dolby Digital). The surviving systems automatically fall back to the analogue optical track as a backup. Dolby Digital dominated because it was simpler to apply and more economical to manage.

Unfortunately, many cinemas have limited-fidelity 2-way passive speaker systems, hidden from view, that have remained basically as they were 50 years ago. It can often take a trained ear to hear whether the sound is coming from the old analogue optical track or from the new digital formats.

Dolby SR-D is the most common digital format. The digital information is stored between the sprocket holes. That small space can hold only a small amount of data, which is a major limitation. A common belief is that this area of the film stock was chosen because it suffers the least wear and is the most reliable. However, many reviewers and projectionists claim that the space between the sprocket holes suffers the most wear and is the least reliable place to put digital data.

DTS (Digital Theater Systems) is said to be the preference of audiophiles, although this view is also disputed. It uses a specially designed external CD player and requires 2 CDs per film. The CD player is sync-locked to a time code printed on the film stock. DTS is less preferred by cinema chains and film distributors because of the extra cost and effort, and the possibility of CDs being lost.

SDDS (Sony Dynamic Digital Sound) has the capacity for 8 channels, and it is strongly argued that SDDS is the best-performing format. The digital data is stored on the outer edges of the film stock. Some reviews claim the outer edges are vulnerable to wear and damage, while others state the opposite. SDDS can provide 5 screen channels plus independent surrounds and sub-bass. It appears SDDS is technically supported but not promoted.

Format      Total bit rate   Compression   Channels   Average per channel
Dolby       320Kb/s          10:1          5          64Kb/s
DTS         1.04Mb/s         4:1           5          approx 208Kb/s
SDDS        2.46Mb/s         5:1           8          approx 307Kb/s
SDDS carries backup tracks, so its effective average per channel may be similar to DTS.
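The per-channel figures are simply the total bit rate divided by the number of channels. A tiny Python check, using the totals quoted above as inputs (they are the text's figures, not independently verified); note that 1.04Mb/s over 5 channels works out to about 208Kb/s:

```python
def per_channel_kbps(total_kbps, channels):
    """Average share of the total bit rate available to each channel."""
    return total_kbps / channels

# (total Kb/s, channels) as quoted in the text
formats = {"Dolby": (320, 5), "DTS": (1040, 5), "SDDS": (2460, 8)}

for name, (total, channels) in formats.items():
    print(f"{name}: {per_channel_kbps(total, channels):.1f} Kb/s per channel")
```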

The Dolby method is said to take advantage of momentary spare capacity in any one channel to increase the capacity of the other channels. Instead of the audio quality being inversely proportional to the number of channels, it is then approximately inversely proportional to the square root (√) of the number of channels (√5 = 2.24). DTS is said to share the frequency domain between the subwoofer and the surrounds, hence no bass below approx 160Hz in the surrounds. The SDDS system is said to have a fixed bit allocation per channel to maintain channel independence.
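The claimed square-root rule can be written out as below, with B the total bit rate and N the number of channels. This only restates the claim as reported above; it is not derived from any codec specification:

```latex
Q_{\text{fixed}} \propto \frac{B}{N}, \qquad
Q_{\text{pooled}} \propto \frac{B}{\sqrt{N}}, \qquad
\frac{Q_{\text{pooled}}}{Q_{\text{fixed}}} = \sqrt{N} = \sqrt{5} \approx 2.24
```

For Dolby's 320Kb/s and 5 channels, a fixed split gives 64Kb/s per channel, while the claim would make pooled quality comparable to roughly 320/2.24 ≈ 143Kb/s per channel.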

Achieving 5 to 8 channels of sound from a limited bit rate was not a simple task; research and development exceeded $20 million. Regardless of their limitations, the ability to approximate the performance of the analogue magnetic format is an astounding achievement, to say the least.

Wiki sdds Sony Dynamic Digital Sound
www.dts.com
www.dolby.com

Fact or Fiction: The above statements on digital formats summarise differing views from projectionists, web sites, periodicals and competing experts at a local wine bar. The vested interests behind digital technology tend to be secretive and are said to willingly waste unlimited resources on lawsuits. These competing formats were served up as a fait accompli, each one claiming to be the best.

The cinema-going public had, and still has, no say whatsoever, and neither did the majority of the world's computer engineers and scientists. What should have happened was an open, collective approach to a single digital loss-less format (compressed or un-compressed) with the performance capacity and fidelity of the best analogue magnetic format.

What we got fell far short of that. Whether the SDDS system is capable of it, who knows, as the majority of the public have never heard sound in this format. I am sure most of us would happily pay a small percentage on the ticket price to fund an independent authority representing the best outcome for the cinematic experience.

Understanding Digital sound

An analogue audio signal on magnetic tape or vinyl records is infinite in detail, but it deteriorates and gains noise with repeated use and mass transfer. Analogue sound quality is measured by frequency response: the narrower the frequency response, the lower the quality. Digital audio does not lose frequency response, but at a low bit rate it suffers from quantization noise (random static) and smearing.

Digital reduces the infinite detail of an analogue audio signal to a representation made of a finite number of bits, 1's and 0's. Each 1 and 0 is absolute, and is therefore simple to produce and mass transfer, and it does not deteriorate or gain noise with repeated use. The maximum allowable number of bits per second (b/s) is the only limitation on digital technology providing full sound quality.
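To hear what "finite bits" costs, a quantizer can be simulated. A minimal Python sketch, assuming samples are floats in the range -1.0 to 1.0 and a simple rounding quantizer; the 997Hz tone, 48K rate and 0.99 amplitude are arbitrary choices. The measured signal-to-noise ratio should land close to the standard rule of thumb of about 6.02 dB per bit plus 1.76 dB for a full-scale sine:

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0) to the nearest of 2**bits evenly spaced levels."""
    step = 2.0 / (2 ** bits)              # size of one quantization step
    q = round(x / step) * step            # nearest level; the difference is the noise
    return max(-1.0, min(1.0 - step, q))  # clamp to the representable range

def snr_db(bits, n=48000, freq=997.0, rate=48000.0):
    """Quantize a near-full-scale sine wave and measure signal-to-noise ratio in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(2 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x       # quantization noise sample
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16):
    # measured SNR next to the rule of thumb 6.02 * bits + 1.76 dB
    print(bits, round(snr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```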

The bits are grouped into 'words': 16-bit words (1000110001101010) for domestic CD, and 18-bit words for pro audio (100011000110101001). A word can consist of any number of bits. The number of words per second is called the 'sampling rate'. The domestic CD sampling rate is 44.1K words per second; the pro-audio sampling rate is 48K words per second.

Bits per word defines dynamic range.
Sampling rate defines frequency response and must be greater than twice the highest audio frequency.
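The "greater than twice" rule exists because a tone above half the sampling rate does not simply disappear: it folds back to a false lower frequency (aliasing). A small Python helper, with a hypothetical name, showing where a pure tone ends up after sampling:

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a pure tone f appears after sampling at rate fs."""
    f = f % fs                             # sampling cannot tell f from f + k*fs
    return fs - f if f > fs / 2 else f     # anything above fs/2 folds back down

print(alias_frequency(1000, 44100))   # well below fs/2: unchanged
print(alias_frequency(25000, 44100))  # above fs/2: folds back into the audible band
```

For example, a 25kHz tone sampled at 44.1K reappears at 19.1kHz, inside the audible band, which is why such content must be filtered out before sampling.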

dBFS (decibels relative to Full Scale): 0 dBFS is the highest possible sample level.
Therefore lower audio levels are negative dBFS numbers.

1111 1111 1111 1111 = 0 dBFS
0000 0000 0000 0001 = -96 dBFS.

16 bit word = 96 dB dynamic range.
20 bit word = 120 dB dynamic range.
24 bit word = 144 dB dynamic range.
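The table follows from 20·log10(2^bits), roughly 6.02 dB per bit. A Python sketch, treating the word as a simple unsigned magnitude as the examples above do (real PCM words are signed, so against positive full scale the smallest 16-bit step sits nearer -90 dBFS):

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of an n-bit word: 20*log10(2**n), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

def dbfs(value, bits):
    """Level of an unsigned word value relative to full scale (all bits set to 1)."""
    full_scale = 2 ** bits - 1
    return 20 * math.log10(value / full_scale)

for b in (16, 20, 24):
    print(b, round(dynamic_range_db(b), 1))
# 16 96.3
# 20 120.4
# 24 144.5
print(round(dbfs(0b1111_1111_1111_1111, 16), 1))  # 0.0
print(round(dbfs(0b0000_0000_0000_0001, 16), 1))  # -96.3
```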

Note: only a group of exactly 8 bits is called a 'Byte', and Bytes refer to storage capacity. A CD can store 700 Megabytes (700MB). We must not confuse bits with Bytes: Bytes is upper case 'B', bits is lower case 'b'.

A greater total number of bits per second (b/s) allows a faster sampling rate, more bits per word, or both. For a given number of bits per second, there is a choice between a slow sampling rate with a large number of bits per word, or a fast sampling rate with fewer bits per word. The following explanations will simply refer to total bits per second (b/s) to represent audio quality.

Domestic CD is 1,411,200 bits per second (1.4112Mb/s) for 2 channels, so each channel is 705,600 bits per second (705.6Kb/s). For the majority of people this bit rate is high enough to make music fidelity indistinguishable from quality analogue formats (20Hz - 20kHz with 96dB dynamic range).
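The CD figure is simply sampling rate x bits per word x channels. A Python sketch, with a hypothetical helper name, that also shows the trade-off described above: the same bit budget spent at 48K words per second leaves fewer bits per word:

```python
def pcm_bit_rate(sample_rate, bits_per_word, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate * bits_per_word * channels

cd = pcm_bit_rate(44_100, 16, 2)
print(cd, cd // 2)         # 1411200 705600  (total, and per channel)

# Same total budget at 48K words per second, 2 channels:
print(cd / (48_000 * 2))   # 14.7 bits per word
```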

Before CD players were available, the only domestic digital recording medium was the video tape recorder. The odd-looking figure of 44.1K words per second was the highest sampling rate at which high-fidelity audio could be digitally recorded onto video tape formats, and 44.1K was retained when domestic CD arrived. Pro-audio Digital Audio Tape (DAT) uses a 48K sampling rate.
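The commonly cited origin of 44.1K (the subject of the first link below) is that PCM adaptors stored 3 samples on each usable video line. The arithmetic, assuming 490 usable lines per frame at 30 frames/s for NTSC and 588 usable lines per frame at 25 frames/s for PAL:

```python
# Assumed PCM-adaptor layout: 3 audio samples stored per usable video line.
NTSC = 3 * 490 * 30   # samples/line x usable lines/frame x frames/s
PAL = 3 * 588 * 25
print(NTSC, PAL)      # 44100 44100
```

Both television standards give the same figure, which made 44.1K a convenient common rate.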

Basic explanation https://www.cs.columbia.edu/~hgs/audio/44.1.html
Audio demonstration of various sampling rates https://www.cs.cf.ac.uk/Dave/Multimedia/node150.html
Quantization noise wikipedia.org/Quantization

Resolution: The primary reason to have the highest bit rate possible is to obtain the best small-signal resolution for fine harmonic detail and nuances within the music. The second reason for high bit rates is to allow the signal to be digitally EQ'd and processed. It is therefore understandable that digital recording naturally favours increased level over fidelity.

Digital recording: The trend is to increase the recorded sound to the maximum possible level. Digital recording is uniquely suited to this: compared with earlier analogue recording, it is easy to reach maximum level without incurring overload by over-modulation. Modern productions often appear to be made up of an infinite number of competing 'sound grabs'. The dominant objective is to increase the average level by adding more and more, including eliminating space within the music (easily achieved with modern software). Once taken to the highest level, the digital recording can then be compacted by excessive use of dynamic compression, enabling still more to be added.
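A toy Python sketch of the "maximum level plus compression" practice, assuming samples in the range -1.0 to 1.0. The compressor is deliberately crude (instantaneous, no attack or release, arbitrary threshold and ratio), so it is an illustration rather than how production tools work. Crest factor, the ratio of peak to RMS level, is used as the measure of lost dynamics:

```python
import math

def peak(x):
    return max(abs(s) for s in x)

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; heavy compression pushes this down."""
    return 20 * math.log10(peak(x) / rms(x))

def normalise(x, target_peak=1.0):
    """Scale the whole signal so its loudest sample just reaches full scale."""
    g = target_peak / peak(x)
    return [s * g for s in x]

def compress(x, threshold=0.25, ratio=8.0):
    """Crude static compressor: levels above threshold are reduced by ratio, sample by sample."""
    out = []
    for s in x:
        a = abs(s)
        if a > threshold:
            a = threshold + (a - threshold) / ratio
        out.append(math.copysign(a, s))
    return out

# Hypothetical signal: a quiet passage followed by a short loud transient.
signal = [0.05 * math.sin(i / 5) for i in range(1000)] + [0.8, -0.7, 0.6]
loud = normalise(compress(normalise(signal)))
print(round(crest_factor_db(signal), 1), round(crest_factor_db(loud), 1))
```

After normalising, squashing and normalising again, the peak is still at full scale but the quiet passage is much louder and the crest factor drops: more average level, less dynamic contrast.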

This childish misuse of digital recording has rendered harmonic detail and nuances inaudible, masking the fidelity of the music. Possibly the worst perpetrators are the proliferating questionable audio recording schools and software merchants who promote these irresponsible practices to impressionable young people desperate to enter the recording and pop industries.

Digital compression

Loss-less un-compressed format is used for pro-audio digital and domestic CD. Silence is recorded and played back at the full bit rate.
Loss-less compressed format is similar to a ZIP file: it reduces file size by storing redundancy, such as silence, more compactly. On playback the original is restored exactly, at full size.
Lossy compressed format is smoke and mirrors that evolved from psycho-acoustic research. Silence, and selected detail within the music, are discarded on the basis that the average person will not notice, giving a 30% to 90% reduction in file size. Discarded information cannot be retrieved.
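The loss-less idea can be demonstrated with Python's standard zlib module. zlib is a general-purpose compressor, not an audio codec such as FLAC, and the half-tone, half-silence signal is made up, but it shows the key property: the file gets smaller and decoding returns every bit unchanged:

```python
import math
import struct
import zlib

# One second of hypothetical 16-bit mono audio at 8K: half a tone, half silence.
rate = 8000
samples = [int(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate // 2)]
samples += [0] * (rate // 2)
pcm = struct.pack(f"<{len(samples)}h", *samples)  # little-endian signed 16-bit words

packed = zlib.compress(pcm, 9)     # general-purpose loss-less compression
restored = zlib.decompress(packed)

print(len(pcm), len(packed))       # original bytes vs compressed bytes
assert restored == pcm             # bit-for-bit identical after decoding
```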

MPEG (MP3): The Moving Picture Experts Group uses variations of techniques described as "perceptual noise shaping" or "perceptual sub-band transform coding". MP3 is used for internet music downloads, where only very small file sizes are practical, and it allows various compression rates to be chosen. But how much information can be thrown away without noticeably degrading the sound quality?

256Kb/s 5:1 compression Music quality almost indistinguishable from the original CD.
192Kb/s 7:1 compression Popular choice for reasonable quality.
128Kb/s 11:1 compression Popular for internet download music and iPods.
96Kb/s 14:1 compression Easily discernible loss of sound quality compared with the original CD recording.
64Kb/s 22:1 compression Mono speech only: don't attempt music.
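The ratios are measured against the CD rate of 1411.2Kb/s. A Python check; the table's figures appear to be these values rounded down:

```python
CD_KBPS = 44.1 * 16 * 2   # 1411.2 Kb/s: 44.1K words/s x 16 bits x 2 channels

def compression_ratio(mp3_kbps):
    """How many times smaller an MP3 stream is than the CD stream it came from."""
    return CD_KBPS / mp3_kbps

for kbps in (256, 192, 128, 96, 64):
    print(f"{kbps} Kb/s -> {compression_ratio(kbps):.2f}:1")
```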

The best material for comparing MP3 lossy compression with the original un-compressed CD sound is white or pink noise, audience applause, rain on a tin roof, a bundle of keys thrown up in the air and caught, and worst of all a harpsichord.

Achieving acceptable performance from a very limited bit rate is a technological miracle greater than the biblical loaves and fishes. However, when it is applied to multi-channel cinema sound, we must not lose sight of the marketing deception, like promoting a brand of 'white bread' as vitamin enriched: a product is stripped of its nutrients (or of the necessary bit rate) and sold as enriched with clever deletion algorithms or artificial vitamins that make it taste or sound acceptable. Hopefully when MP99 arrives it will discard clap-trap, boring cliché dialogue and TV commercials as well.
wikipedia.org/MPEG

Vinyl: Many who grew up in the 60s and 70s with a high-quality sound system and a large vinyl collection can clearly hear the degradation of music quality in most lossy-compressed digital formats. But the majority of the modern digital generation have grown up in an excessively noise-polluted world where hearing fine detail in nature and music is often not possible.

Compressed - lossy Digital cinema sound

It is argued that much of what is recorded on an un-compressed loss-less CD format can not be heard. What cannot be heard, cannot be heard, therefore silence and any sounds below the threshold of hearing, or below the general ambient noise level of 40dBA can be deleted. Loud sounds mask softer sounds and the softer sounds can be deleted when louder sounds are being played. Some frequencies that are close together can mask each other, therefore the masked sounds can be deleted.

The perceived sound quality depends on the limits of listener attention, on being in a highly reverberant environment with ambient noise, and on listening inattentively to a small, cheap, limited-fidelity home cinema system while being distracted by the picture. The final essential factor is that the listener's expectation has been influenced by marketing.

When all these external factors are combined they effectively mask the distortion anomalies created by the lossy compressed digital sound.

Ambient noise of cinema is approx 40dBA, any sound below this level can be deleted.
Dynamic range between background noise and maximum level is approx 40dB.
Hearing sensitivity of low and high frequencies is limited, only 20dB dynamic range required.
Sound outside of our ability to hear direction (chirping cricket) can be collapsed to mono.
High level sounds psycho-acoustically mask similar sounds of lower level, which can be deleted.
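Two of the deletion rules listed above, sketched in Python for one short frame of frequency-band levels. This is a toy, not a real psycho-acoustic model (real codecs model masking across critical bands far more carefully); the 40 dB floor comes from the list, while the 20 dB masking margin and the band levels are hypothetical:

```python
AMBIENT_DB = 40.0      # cinema ambient noise level quoted above (dBA)
MASK_MARGIN_DB = 20.0  # hypothetical: a band this far below a louder neighbour is dropped

def prune_bands(levels_db):
    """Return indices of the frequency bands kept after two toy deletion rules.

    levels_db: band levels (dB) for one short frame, lowest band first.
    """
    kept = []
    for i, level in enumerate(levels_db):
        if level < AMBIENT_DB:
            continue  # rule 1: below the ambient noise, treated as inaudible
        neighbours = levels_db[max(0, i - 1):i] + levels_db[i + 1:i + 2]
        if any(n - level >= MASK_MARGIN_DB for n in neighbours):
            continue  # rule 2: masked by a much louder adjacent band
        kept.append(i)
    return kept

frame = [35.0, 62.0, 90.0, 65.0, 70.0, 45.0]
print(prune_bands(frame))  # [2, 4]
```

Only bands 2 and 4 survive: band 0 is below the ambient floor, and bands 1, 3 and 5 sit 20 dB or more below a louder neighbour.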

Bit pool: To achieve these deletions, plus many more, a sample of the total information (covering a fraction of a second) is held in a bit pool for analysis, and instantaneous decisions are made about what can be deleted. But when all channels are heavily used at the same time, beyond the capacity of the bit pool, some essential information may have to be dumped.
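A sketch of the difference between a fixed per-channel budget (the approach SDDS is said to use) and a shared bit pool (the approach Dolby is said to use), including what happens when the pool overflows. The pool size and channel demands are hypothetical, and proportional cutting is only one possible policy:

```python
def fixed_share(demands, pool):
    """Fixed allocation: each channel gets at most pool // n bits, spare bits are wasted."""
    share = pool // len(demands)
    return [min(d, share) for d in demands]

def pooled(demands, pool):
    """Shared pool: channels take what they need from a common budget.

    If the frame asks for more than the pool holds, every channel is cut
    back in proportion, i.e. information has to be dropped somewhere.
    """
    total = sum(demands)
    if total <= pool:
        return list(demands)
    return [d * pool // total for d in demands]  # integer bits, never exceeds the pool

quiet = [150, 20, 20, 20, 10]     # one busy channel, four quiet ones
busy = [200, 200, 200, 100, 100]  # every channel busy at once
print(fixed_share(quiet, 320), pooled(quiet, 320))  # [64, 20, 20, 20, 10] [150, 20, 20, 20, 10]
print(pooled(busy, 320))                            # [80, 80, 80, 40, 40]
```

With one busy channel, the fixed split starves it at 64 bits while the pool gives it all 150. When every channel is busy, the pool can only cut everything back, which is where information gets dumped.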

Depending on the % of deletion, unpredictable outcomes may occur. A frequency band in one channel may be deflected to another. Frequency bands that are similar in different channels may be deleted leaving only the loudest heard. Similar bands from different channels may appear in the center channel. Ringing or pre-echo of percussive or transient sounds may occur etc. etc. etc.

Psycho-acoustic masking: These random artifacts occur within a fraction of a second and are averaged out by our hearing and expectation. Most non-discerning members of a cinema audience do not notice whether all channels are collapsed to mono, or whether the surrounds are on. Sight can influence where we believe a sound is coming from. Also, none of these lossy compression techniques reduces frequency response, and for most people hearing high frequencies of any kind is taken as high fidelity.

The compressed picture on the right above has exaggerated colours, and we clearly see it as distorted because the picture remains static in time. Because audio constantly changes in time, we can be fooled more easily. Stand back approx 3 meters (10ft) from the computer screen, slightly squint or de-focus your eyes, and notice how similar the 2 pictures become. This effect is similar to psycho-acoustic masking.

When audio is digitally compressed the outcome is similar, and some hi-frequencies become exaggerated giving a false perception of fidelity. However an attentive listener can easily hear poor music resolution, smearing, image loss, reduced depth of field and chaotic imbalance between channels.

For the anomalies of lossy compression to stay inaudible, the sounds in the different channels must be highly correlated. A simple test to hear the limitations of lossy-compressed 5.1 formats is to put different full-fidelity music on each sound track:

Left channel Mahler's 5th
Center channel Beethoven's 4th
Right channel Tchaikovsky's Nutcracker suite
Left-rear channel Loud Rock music
Right-rear channel Rap or Techno music
.1 Sub-bass channel African drums

This test is unrealistic, as this level of sound separation is not required for multi-channel sound in film production, but it will reveal the limitations and demonstrate what can be achieved. The simplest practical test is to record people simultaneously speaking different languages and play them back; this gives consistent separation from each channel. However, if the people are rotated as if on a merry-go-round, strange things can start to happen.
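One concrete reason correlation matters: stereo codecs such as MP3 can code a channel pair as mid (L+R) and side (L-R), and when the channels are similar the side signal is nearly silent and cheap to code. Whether a given cinema codec does exactly this is not claimed here. A Python sketch with made-up signals:

```python
import math
import random

def side_to_mid_ratio(left, right):
    """Energy of the side (L-R) signal relative to the mid (L+R) signal."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return sum(s * s for s in side) / sum(m * m for m in mid)

random.seed(1)
music = [math.sin(i / 7) for i in range(2000)]
left = [s + 0.01 * random.uniform(-1, 1) for s in music]   # nearly identical pair
right = [s + 0.01 * random.uniform(-1, 1) for s in music]
unrelated = [random.uniform(-1, 1) for _ in range(2000)]   # a different source

print(side_to_mid_ratio(left, right))       # tiny: the side signal is almost silent
print(side_to_mid_ratio(music, unrelated))  # near 1: the side is as busy as the mid
```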

Demonstration trailers promoting digital sound often consist of loud impressive animated computer sounds with minimal transients and harmonics, heard through mostly limited fidelity speaker systems, where the limitations and anomalies of lossy compression are not heard. Also the majority of companies behind digital technology are secretive, aggressively protect their interests and do not openly disclose problems and limitations.

Some basic mixing rules can be applied to achieve a consistent outcome and minimise lossy compression artifacts, although they do not need to be obeyed:

Dialogue to center channel only
Front left and right for low level background music
Surrounds used sparingly, during minimal use of front channels

The hope for digital cinema is that the audio will be available in a loss-less compressed format, replayed exactly as the original recording was made. Only then will the sound fidelity and channel separation match the best analogue magnetic format of the past, without its limitations.

One idea is that the film will be delivered on large format CDs and loaded into high speed hard drives from which the film will then be shown. However these decisions are not yet finalised.

www.mkpe.com has a good critical historical explanation of digital sound formats.
www.jpeg.org: follow the JPEG 2000 link for the new digital cinema standards.

Digital fidelity: "Multi-channel digital transmission (encoding and decoding) that utilizes low bandwidth optical recording techniques is a compromise that must be appreciated for what it represents as an alternative to previous linear formats. Data compression, bit errors and dropouts are inherent in all restricted bandwidth systems that favour dynamic range as opposed to bandwidth and linearity.

Work done in the field of digital telephony has resulted in improved speech intelligibility at the expense of fidelity. This is of little concern for a dialog or sound effects channel but is entirely another matter for music. As the majority of cinematic productions are a complex mix of all these ingredients it is little wonder that music fidelity is masked by competing sound components that can be 6 to 10dB higher in level.

The musical content of a modern production can thus be seen as a series of 'sound grabs' competing with every other component in the mix. This aesthetic masking has now become a method of increasing the average levels without incurring overload by over-modulation". written by Keith McPherson (audio and telecommunications engineer)

Links

Sound System Formats

wikipedia.org Surround sound
howstuffworks.com/movie-sound Movie-sound 1-6
howstuffworks.com movie-screen
www.soundonsound.com Surround Sound Explained
www.mkpe.com Multi channel Film
