Digital Sound

An analogue audio signal on magnetic tape or a vinyl record is infinite in detail, but suffers deterioration and increased noise from repeated use and mass transfer. Analogue sound quality is measured by frequency response: the narrower the frequency response, the lower the quality. Digital audio does not suffer from loss of frequency response, but at a low bit rate it suffers from quantising noise (random static) and smearing.

Digital reduces the infinite detail of an analogue audio signal to a finite representation made of bits (1's and 0's). Each 1 and 0 is absolute and is therefore simple to produce and mass transfer, and does not suffer deterioration or noise with repeated use. The maximum allowable number of bits per second (b/s) is the only limitation on digital technology providing full sound quality.

The bits are grouped into 'words' of 16 bits (1000110001101010) for domestic CD, and 18-bit 'words' for Pro-audio (100011000110101001). A word can consist of any number of bits. The number of words per second is called the 'sampling rate'. Domestic CD sampling rate is 44.1K words per second. Pro audio sampling rate is 48K words per second.

Digital audio

Bits per word   defines dynamic range.
Sampling rate   defines frequency response and must be greater than twice the highest audio frequency.
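
The sampling rate rule can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and it assumes an ideal sampler with no anti-alias filter in front of it. It shows that a tone below half the sampling rate is kept at its own frequency, while a tone above half the sampling rate folds back down and is heard as a different, lower frequency.

```python
def minimum_sampling_rate(highest_audio_hz):
    """Sampling rate must be greater than this value (twice the highest frequency)."""
    return 2 * highest_audio_hz


def alias_frequency(tone_hz, sampling_rate_hz):
    """Frequency heard after sampling a pure tone (hypothetical helper).

    Assumes an ideal sampler with no anti-alias filter. Tones above half
    the sampling rate fold back into the range 0 .. sampling_rate / 2.
    """
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


if __name__ == "__main__":
    print(minimum_sampling_rate(20000))       # 20 kHz audio needs more than this
    print(alias_frequency(20000, 44100))      # below half the rate: unchanged
    print(alias_frequency(30000, 44100))      # above half the rate: folds down
```

With the CD rate of 44.1K, a 20kHz tone survives as 20kHz, but a 30kHz tone would come back as 14.1kHz, which is why everything above half the sampling rate has to be filtered out before sampling.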

dBFS   Full Scale     0 dBFS is the highest level a word sample can represent.
Therefore lower audio levels are negative dBFS numbers.

1111 1111 1111 1111   =     0 dBFS
0000 0000 0000 0001   =   -96 dBFS.

16 bit word   =     96 dB dynamic range.
20 bit word   =   120 dB dynamic range.
24 bit word   =   144 dB dynamic range.
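
The figures above follow from the rule that each extra bit is worth about 6 dB. A minimal sketch of the arithmetic, assuming (as in the example words above) that a word is read as a plain unsigned number with all 1's as full scale; real PCM audio uses signed words, and the function names here are only for illustration:

```python
import math


def dbfs(sample_value, bits=16):
    """Level of a word relative to full scale (all bits set to 1)."""
    full_scale = 2 ** bits - 1
    return 20 * math.log10(sample_value / full_scale)


def dynamic_range_db(bits):
    """Ratio between the largest and smallest step a word can describe."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    print(round(dbfs(0b1111111111111111)))   # 0
    print(round(dbfs(0b0000000000000001)))   # -96
    for bits in (16, 20, 24):
        print(bits, "bit word =", round(dynamic_range_db(bits)), "dB")
```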

Note:   Only a group of exactly 8 bits is called a 'Byte', and Bytes refer to storage capacity. A CD can store 700 Mega Bytes (700MB). We must not confuse bits with Bytes: Bytes is upper case 'B', bits is lower case 'b'.

The greater the total number of bits per second (b/s), the faster the sampling rate and/or the greater the number of bits in a word. For a given number of bits per second there is a choice between a slow sampling rate with a large number of bits per word, or a fast sampling rate with a smaller number of bits per word. The continuing explanations will simply refer to total bits/second (b/s) to represent audio quality.

Domestic CD is 1,411,200 bits per second (1.411Mb/s) for 2 channels. Therefore each channel is 705,600 bits per second (705.6Kb/s). For the majority of people this bit rate is high enough for music fidelity to be indistinguishable from quality analogue formats (20Hz - 20kHz with 96dB dynamic range).
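
The CD bit rate is simply sampling rate x bits per word x channels. A small sketch of that arithmetic (the function name is made up for this example), which also shows the trade-off described earlier: the same bits per second can be spent on a fast sampling rate with short words or a slow sampling rate with long words.

```python
def bit_rate(sampling_rate_hz, bits_per_word, channels=1):
    """Uncompressed bit rate in bits per second."""
    return sampling_rate_hz * bits_per_word * channels


if __name__ == "__main__":
    print(bit_rate(44100, 16, channels=2))   # domestic CD, 2 channels
    print(bit_rate(44100, 16))               # one CD channel
    # Same bits per second, different split between rate and word length:
    print(bit_rate(29400, 24))               # slower sampling, longer words
```

The last line gives the same 705,600 b/s as one CD channel, but with a sampling rate of only 29.4K the highest audio frequency would have to stay below 14.7kHz.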

Before CD players were available, the only domestic digital recording medium was the video tape recorder. The fractional figure of 44.1K for the sampling rate (words per second) was the maximum allowable for high fidelity audio to be digitally recorded onto video tape formats, and the 44.1K sampling rate was retained when domestic CD arrived. Pro audio Digital Audio Tape (DAT) uses a 48K sampling rate.

Basic explanation   https://www.cs.columbia.edu/~hgs/audio/44.1.html
Audio demonstration of various sampling rates   https://www.cs.cf.ac.uk/Dave/Multimedia/node150.html
Quantization noise   wikipedia.org/Quantization

Digital audio

Resolution   The primary reason to have the highest bit rate possible (a fast sampling rate and a large number of bits per word) is to obtain the best small-signal resolution for fine harmonic detail and nuances within the music. The second reason is to enable the signal to be digitally EQ modified and processed. Because low-level signals use only the few lowest bits of each word, it is understandable that digital recording naturally favours increased level over fidelity.

Digital recording   The trend is to increase the recorded sound to the maximum possible level. This can easily be achieved without incurring overload by over-modulation, something digital recording is uniquely suited to in comparison with previous analogue recording. Modern productions are often made up of what appears to be an infinite number of competing 'sound grabs'. The dominant objective is to increase average level by adding more and more, including eliminating space within the music (easily achieved with modern software). Once taken to the highest level, the digital recording can then be compacted by excessive over-use of dynamic compression, enabling more to be added.

This childish behaviour in the misuse of digital recording has rendered harmonic detail and nuances inaudible, thereby masking the fidelity in music. Possibly the worst perpetrators of this problem are the proliferating, questionable audio recording schools and software merchants who promote these irresponsible practices to impressionable young people desperate to enter the recording and pop industries.

Digital compression

  • Loss-less un-compressed   format is for Pro-audio digital and domestic CD.   Silence is recorded and played back at the full file size bit rate.
  • Loss-less compressed   format is similar to a ZIP file: it reduces file size by storing silence and other repeated information in a shorter form. When played back, the original is restored exactly at full file size.
  • Lossy-compressed   Lossy compression is smoke and mirrors that evolved from psycho-acoustic research. Silence, along with selected detail within the music, can be discarded without the average person being able to notice, giving a 30% to 90% reduction in file size. Discarded information cannot be retrieved.
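
The loss-less compressed case can be tried with Python's built-in zlib module, which uses the same family of compression as ZIP files. This is only a sketch of the principle: dedicated loss-less audio formats use methods better suited to music, and the one-second test signals and helper names below are invented for this example. It shows that digital silence shrinks to almost nothing, a tone shrinks far less, and in both cases the restored data is bit-for-bit identical to the original.

```python
import math
import struct
import zlib

SAMPLING_RATE = 44100  # words per second, as for domestic CD


def make_pcm(samples):
    """Pack a list of integers into 16-bit little-endian words."""
    return struct.pack("<%dh" % len(samples), *samples)


def lossless_round_trip(pcm):
    """Compress, restore, and report (original size, compressed size, identical?)."""
    packed = zlib.compress(pcm)
    restored = zlib.decompress(packed)
    return len(pcm), len(packed), restored == pcm


# One second of digital silence and one second of a 440 Hz tone (test signals).
silence = make_pcm([0] * SAMPLING_RATE)
tone = make_pcm([
    int(20000 * math.sin(2 * math.pi * 440 * n / SAMPLING_RATE))
    for n in range(SAMPLING_RATE)
])

if __name__ == "__main__":
    for name, pcm in (("silence", silence), ("tone", tone)):
        original, compressed, identical = lossless_round_trip(pcm)
        print(name, original, "bytes ->", compressed, "bytes, identical:", identical)
```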

MPEG (MP3)   The Moving Picture Experts Group format uses a variety of techniques described as "perceptual noise shaping" or "perceptual sub-band transform coding". It is used for internet music downloads where only very small file sizes can be used. MP3 allows various compression rates to be chosen. But how much information can be thrown away without noticeably deteriorating the quality of sound?

256Kb/s     5:1 compression   Music quality almost indistinguishable from the original CD.
192Kb/s     7:1 compression   Popular choice for reasonable quality.
128Kb/s   11:1 compression   Popular for internet download music and iPods.
  96Kb/s   14:1 compression   Easily discernible lower sound quality than the original CD recording.
  64Kb/s   22:1 compression   Mono speech only; don't attempt music.
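
The compression ratios in this table are the uncompressed 2 channel CD bit rate divided by the MP3 bit rate, with the fraction dropped. A short sketch of the arithmetic (the names are made up for this example):

```python
CD_BITS_PER_SECOND = 44100 * 16 * 2  # sampling rate x bits per word x channels


def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than the CD stream."""
    return CD_BITS_PER_SECOND / (mp3_kbps * 1000)


if __name__ == "__main__":
    for rate in (256, 192, 128, 96, 64):
        # int() drops the fraction, which matches the figures in the table
        print("%3dKb/s   %d:1 compression" % (rate, int(compression_ratio(rate))))
```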

The best test to hear a comparison of MP3 lossy compression to original un-compressed CD sound is to use white or pink noise, audience applause, rain on a tin roof, a bundle of keys thrown up in the air and caught, and worst of all a Harpsichord.

Achieving an acceptable performance from a very limited bit rate is a technological miracle greater than the biblical story of the loaves and fishes. However, when it is applied to multi-channel cinema sound we must not lose sight of marketing deception, as when a brand promotes 'white bread' as vitamin enriched: a product is stripped of nutrients (or necessary bit rate) and then sold as being enriched with artificial vitamins (or clever deletion algorithms) that enable it to taste or sound acceptable. Hopefully when MP99 arrives it will discard clap-trap, boring cliché dialogue and TV commercials as well.

Vinyl   Many who grew up in the 60s and 70s with a high-quality sound system and a large vinyl collection can clearly hear the degradation of music quality in most lossy-compressed digital formats. But the majority of the modern digital generation have grown up in an excessively noise-polluted world where hearing fine detail in nature and music is often not possible.

Compressed - lossy     Digital cinema sound

It is argued that much of what is recorded on an un-compressed loss-less CD format cannot be heard. What cannot be heard, cannot be heard, therefore silence and any sounds below the threshold of hearing, or below the general ambient noise level of 40dBA, can be deleted. Loud sounds mask softer sounds, so the softer sounds can be deleted while louder sounds are being played. Some frequencies that are close together can mask each other, therefore the masked sounds can be deleted.

The perceived sound quality is dependent on the limits of listener attention: being in a highly reverberant environment with ambient noise, listening inattentively to a small, cheap, limited fidelity home cinema system while being distracted by vision. The final essential factor is that the listener's expectation has been influenced by marketing.

When all these external factors are combined they effectively mask the distortion anomalies created by the lossy compressed digital sound.

  • Ambient noise of a cinema is approx 40dBA; any sound below this level can be deleted.
  • Dynamic range between background noise and maximum level is approx 40dB.
  • Hearing sensitivity at low and high frequencies is limited; only 20dB dynamic range is required there.
  • Sound outside of our ability to hear direction (chirping cricket) can be collapsed to mono.
  • High level sounds psycho-acoustically mask similar sounds of lower level, which can be deleted.
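
The deletion rules listed above can be pictured with a toy example. This is a sketch only and not how any real codec works: real systems use far more elaborate psycho-acoustic models. It assumes the sound has already been split into neighbouring frequency bands with a level in dB for each, and the 20dB masking margin and the "next-door band only" rule are assumptions chosen for illustration. Only the 40dB ambient floor is taken from the list.

```python
AMBIENT_FLOOR_DB = 40   # from the list above: sound below cinema ambient noise
MASKING_MARGIN_DB = 20  # assumption for this sketch, not a published figure


def audible_bands(band_levels_db):
    """Return the band levels with deleted bands replaced by None.

    A band is deleted if it is below the ambient floor, or if a band
    directly next to it is louder by at least the masking margin.
    """
    kept = []
    for i, level in enumerate(band_levels_db):
        neighbours = band_levels_db[max(0, i - 1):i] + band_levels_db[i + 1:i + 2]
        below_floor = level < AMBIENT_FLOOR_DB
        masked = any(n - level >= MASKING_MARGIN_DB for n in neighbours)
        kept.append(None if below_floor or masked else level)
    return kept


if __name__ == "__main__":
    # Levels in dB for five neighbouring frequency bands (made-up numbers).
    print(audible_bands([85, 60, 35, 70, 72]))
```

In this example the 60dB band is thrown away because the 85dB band next to it is 25dB louder, and the 35dB band is thrown away because it sits below the ambient noise, leaving three of the five bands to be stored.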

Bit pool:   To achieve these deletions, plus many more, a sample of the total information (taken every fraction of a second) has to be stored in a bit pool for analysis. Instantaneous decisions are made about what information can be deleted. But when all channels are heavily used at the same time, beyond the capacity of the bit pool, some essential information may have to be dumped.

Depending on the percentage of deletion, unpredictable outcomes may occur. A frequency band in one channel may be deflected to another. Frequency bands that are similar in different channels may be deleted, leaving only the loudest to be heard. Similar bands from different channels may appear in the center channel. Ringing or pre-echo of percussive or transient sounds may occur, and so on.

Psycho-acoustic masking   These random artefacts occur within a fraction of a second and are averaged out by our hearing and expectation. The majority of a non-discerning cinema audience does not notice if all channels are collapsed to mono, or whether the surrounds are on. Sight can influence what we believe is the direction of sound. Also, none of these lossy compression techniques reduces frequency response, and for most people hearing high frequencies of any type is thought of as high fidelity.

Digital compression

In the compressed picture above (right) some colours are exaggerated and we clearly see the picture as distorted, because the picture remains static in time. However, because audio constantly changes in time, we can be more easily fooled. Stand back approx 3 meters (10ft) from the computer screen, slightly squint or de-focus the eyes, and notice how similar the two pictures become. This effect is similar to psycho-acoustic masking.

When audio is digitally compressed the outcome is similar: some high frequencies become exaggerated, giving a false perception of fidelity. However, an attentive listener can easily hear poor music resolution, smearing, image loss, reduced depth of field and chaotic imbalance between channels.

For the anomalies of lossy compression not to be heard, there must be a high correlation of similar sounds between the channels. A simple test to hear the limitations of lossy compressed 5.1 formats is to put different full fidelity music on each sound track.

  • Left channel       Mahler's 5th
  • Center channel   Beethoven's 4th
  • Right channel     Tchaikovsky's Nutcracker suite
  • Left-rear channel         Loud Rock music
  • Right-rear channel       Rap or Techno music
  • .1 Sub-bass channel     African drums

This test is unrealistic, as this level of sound separation is not required for multi-channel sound in film production. But it will reveal the limitations and demonstrate what can be achieved. The simplest test is to record people simultaneously speaking different languages, one per channel, and play them back; the separation from each channel is consistent. However, if the people are rotated as if on a merry-go-round, strange things can start to happen.

Demonstration trailers promoting digital sound often consist of loud impressive animated computer sounds with minimal transients and harmonics, heard through mostly limited fidelity speaker systems, where the limitations and anomalies of lossy compression are not heard.   Also the majority of companies behind digital technology are secretive, aggressively protect their interests and do not openly disclose problems and limitations.

Basic mixing rules   These can be applied to achieve a consistent outcome by minimising lossy compression artefacts, but they do not need to be obeyed:

  • Dialogue to center channel only
  • Front left and right for low level background music
  • Surrounds used sparingly, during minimal use of front channels

The future aim   When digital cinema is available, the audio will hopefully be delivered in loss-less compressed format and replayed exactly as the original recording was made. Only then will the sound fidelity and channel separation match the best analogue magnetic format of the past, without its limitations.

One idea is that the film will be delivered on large format CDs and loaded into high speed hard drives from which the film will then be shown.   However these decisions are not yet finalised.

www.www.mkpe.com   Has a good critical historical explanation of digital sound formats.
www.jpeg.org   go to JPEG 2000 link for new digital cinema standards.

Digital fidelity  "Multi-channel digital transmission (encoding and decoding) that utilizes low bandwidth optical recording techniques is a compromise that must be appreciated for what it represents as an alternative to previous linear formats.   Data compression, bit errors and dropouts are inherent in all restricted bandwidth systems that favour dynamic range as opposed to bandwidth and linearity.

Work done in the field of digital telephony has resulted in improved speech intelligibility at the expense of fidelity.   This is of little concern for a dialog or sound effects channel but is entirely another matter for music.   As the majority of cinematic productions are a complex mix of all these ingredients it is little wonder that music fidelity is masked by competing sound components that can be 6 to 10dB higher in level.

The musical content of a modern production can thus be seen as a series of  'sound grabs'  competing with every other component in the mix.   This aesthetic masking has now become a method of increasing the average levels without incurring overload by over-modulation".

Written by Keith McPherson (audio and telecommunications engineer)

End of Topic 1
Created: 12-Dec-2008