# Psychoacoustics - MULTIMEDIA

Recall that the range of human hearing is about 20 Hz to about 20 kHz (for people who have not gone to many dances). Sounds at higher frequencies are ultrasonic. However, the frequency range of the voice is typically only from about 500 Hz to 4 kHz. The dynamic range, the ratio of the maximum sound amplitude to the quietest sound humans can hear, is on the order of about 120 dB.

Recall that the decibel unit represents ratios of intensity on a logarithmic scale. The reference point for 0 dB is the threshold of human hearing: the quietest sound we can hear, measured at 1 kHz. Technically, this is a sound that creates a barely audible sound intensity of $10^{-12}$ watts per square meter. Our range of magnitude perception is thus incredibly wide: the level at which the sensation of sound begins to give way to the sensation of pain is about 1 W/m$^2$, so we can perceive a ratio of $10^{12}$!

The range of hearing actually depends on frequency. At a frequency of 2 kHz, the ear can readily respond to sound that is about 96 dB more powerful than the smallest perceivable sound at that frequency, or in other words a power ratio of about $2^{32}$. Table lists some of the common sound levels in decibels.
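The decibel arithmetic in the paragraphs above can be checked with a short sketch. The function names are illustrative and not taken from any library; the reference intensity is the threshold-of-hearing value quoted above.

```python
import math

I_REF = 1e-12  # W/m^2, threshold of hearing at 1 kHz (value quoted in the text)

def intensity_to_db(intensity, reference=I_REF):
    """Level in dB of an intensity relative to a reference intensity."""
    return 10.0 * math.log10(intensity / reference)

def db_to_power_ratio(level_db):
    """Power (intensity) ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 10.0)

pain_db = intensity_to_db(1.0)       # pain threshold of about 1 W/m^2
ratio_96 = db_to_power_ratio(96.0)   # the 96 dB range quoted at 2 kHz
bits_96 = math.log2(ratio_96)        # the same ratio written as a power of two
```

Under these definitions, an intensity of 1 W/m$^2$ comes out at about 120 dB, and a 96 dB range corresponds to a power ratio of roughly $2^{32}$.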

Equal-Loudness Relations

Suppose we play two pure tones, sinusoidal sound waves, with the same amplitude but different frequencies. Typically, one may sound louder than the other. The reason is that the ear does not hear low or high frequencies as well as frequencies in the middle range. In particular, at normal sound volume levels, the ear is most sensitive to frequencies between 1 kHz and 5 kHz.

Fletcher-Munson Curves. The Fletcher-Munson equal-loudness curves relate perceived loudness (in phons) to stimulus sound volume (sound pressure level, in dB) as a function of frequency. The following figure shows the ear's perception of equal loudness. The abscissa (shown in a semi-log plot) is frequency, in kHz. The ordinate is sound pressure level, the actual loudness of the tone generated in an experiment. The curves show the loudness with which such tones are perceived by humans. The bottom curve shows what level of pure tone stimulus is required to produce the perception of a 10 dB sound.

All the curves are arranged so that every tone on a curve is perceived as being as loud as a pure tone at 1 kHz with that curve's loudness level. Thus, the loudness level at the 1 kHz point is always equal to the dB level on the ordinate axis. The bottom curve, for example, is for 10 phons. All the tones on this curve will be perceived as being as loud as a 10 dB, 1,000 Hz tone. The figure shows more accurate curves, developed by Robinson and Dadson, than the Fletcher and Munson originals.

The idea is that a tone is produced at a certain frequency and measured loudness level, then a human rates the loudness as it is perceived. On the lowest curve shown, each pure tone between 20 Hz and 15 kHz would have to be produced at the volume level given by the ordinate for it to be perceived at a 10 dB loudness level. The next curve shows what the magnitude would have to be for pure tones to each be perceived as being at 20 dB, and so on. The top curve is for perception at 90 dB.

For example, at 5,000 Hz, we perceive a tone to have a loudness level of 10 phons when the source is actually only 5 dB. Notice that at the dip at 4 kHz, we perceive the sound as being about 10 dB, when in fact the stimulation is only about 2 dB. To perceive the same effective 10 dB at 10 kHz, we would have to produce an absolute magnitude of 20 dB. The ear is clearly more sensitive in the range 2 kHz to 5 kHz and not nearly as sensitive in the range 6 kHz and above.

Fletcher-Munson equal-loudness response curves for the human ear (remeasured by Robinson and Dadson)

At the lower frequencies, if the source is at level 10 dB, a 1 kHz tone would also sound at 10 dB; however, a lower, 100 Hz tone must be at a level 30 dB — 20 dB higher than the 1 kHz tone! So we are not very sensitive to the lower frequencies. The explanation of this phenomenon is that the ear canal amplifies frequencies from 2.5 to 4 kHz.

Note that as the overall loudness increases, the curves flatten somewhat. We are approximately equally sensitive to low frequencies of a few hundred Hz if the sound level is loud enough. And we perceive most low frequencies better than high ones at high volume levels. Hence, at the dance, loud music sounds better than quiet music, because then we can actually hear low frequencies and not just high ones. (A "loudness" switch on some sound systems simply boosts the low frequencies as well as some high ones.) However, above 90 dB, people begin to become uncomfortable. A typical city subway operates at about 100 dB.

Frequency masking occurs when the frequencies of two (or more) instruments compete for the same space, that is, when they share the same frequencies. It is probably one of the most common problems encountered when mixing.

Basically, a sound may have (in addition to its root sound) other harmonic sounds that contribute to its overall timbre. If two sounds (timbres) share similar frequencies, some of these harmonics can easily end up being masked in the mix, meaning that the instruments sound different than they do in isolation.

The general situation in regard to masking is as follows:

• A lower tone can effectively mask (make us unable to hear) a higher tone.
• The reverse is not true. A higher tone does not mask a lower tone well. Tones can in fact mask lower-frequency sounds, but not as effectively as they mask higher-frequency ones.
• The greater the power in the masking tone, the wider its influence: the broader the range of frequencies it can mask.
• As a consequence, if two tones are widely separated in frequency, little masking occurs.

Threshold of Hearing. The following figure shows a plot of the threshold of human hearing, for pure tones. To determine such a plot, a particular frequency tone is generated, say 1 kHz. Its volume is reduced to zero in a quiet room or using headphones, then turned up until the sound is just barely audible. Data points are generated for all audible frequencies in the same way.

Threshold of human hearing, for pure tones

Effect on threshold of human hearing for a 1 kHz masking tone

The point of the threshold of hearing curve is that if a sound is above the dB level shown — say it is above 2 dB for a 6 kHz tone — then the sound is audible. Otherwise, we cannot hear it. Turning up the 6 kHz tone so that it equals or surpasses the curve means we can then distinguish the sound.

An approximate formula exists for this curve, as follows:
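One widely cited approximation of this kind, usually attributed to Terhardt, is given below with $f$ in Hz. It is assumed here to be the formula meant; the text itself does not confirm it.

$$\text{Threshold}(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4$$

A minimal Python sketch of the same expression (the function name is illustrative):

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold of hearing (dB) for a pure tone at f_hz (Hz).

    Assumed form: the commonly quoted Terhardt-style approximation.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Sample the curve at a few frequencies across the audible range.
samples = {f: threshold_in_quiet_db(f) for f in (100, 1000, 2000, 3300, 6000)}
```

With this form the curve is close to 0 dB near 2 kHz, dips below 0 dB around 3 to 4 kHz, and gives roughly 2 dB at 6 kHz, in line with the example quoted above.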

The threshold units are dB. Since the dB unit is a ratio, we do have to choose which frequency will be pinned to the origin, (0, 0).

Frequency Masking Curves. Frequency masking is studied by playing a particular pure tone, say 1 kHz again, at a loud volume and determining how this tone affects our ability to hear tones at nearby frequencies. To do so, we would generate a 1 kHz masking tone at a fixed sound level of 60 dB, then raise the level of a nearby tone, say 1.1 kHz, until it is just audible. The threshold in the above figure plots this audible level.

It is important to realize that this masking diagram holds only for a single masking tone: the plot changes if other masking tones are used. The following figure shows how this looks: the higher the frequency of the masking tone, the broader a range of influence it has.

If, for example, we play a 6 kHz tone in the presence of a 4 kHz masking tone, the masking tone has raised the threshold curve much higher. Therefore, at its neighbour frequency of 6 kHz, we must now surpass 30 dB to distinguish the 6 kHz tone.

The practical point is that if a signal can be decomposed into frequencies, then for frequencies that will be partially masked, only the audible part will be used to set quantization noise thresholds.
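As a rough illustration of this point, the sketch below compares each frequency component's level with the masked threshold at that frequency and keeps only the excess. The component levels and thresholds are made-up numbers, and the rule of thumb of about 6 dB of signal-to-noise ratio per quantization bit is an assumption for illustration; this is not the bit-allocation procedure of any particular codec.

```python
import math

DB_PER_BIT = 6.02  # assumed rule of thumb: each bit adds about 6 dB of SNR

def audible_excess_db(level_db, masked_threshold_db):
    """Part of a component's level that rises above the masked threshold."""
    return max(0.0, level_db - masked_threshold_db)

def bits_needed(level_db, masked_threshold_db):
    """Bits needed to keep quantization noise at or below the masked threshold."""
    return math.ceil(audible_excess_db(level_db, masked_threshold_db) / DB_PER_BIT)

# Hypothetical components: (frequency in Hz, level in dB, masked threshold in dB)
components = [(1000, 60.0, 5.0), (1100, 40.0, 45.0), (6000, 35.0, 30.0)]
allocation = {f: bits_needed(level, thr) for f, level, thr in components}
```

In this toy example the fully masked 1.1 kHz component receives no bits at all, while the partially masked 6 kHz component needs only enough bits to cover the 5 dB that remains audible.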

Effect of masking tones at three different frequencies

Critical Bands

The term critical band, introduced by Harvey Fletcher in the 1940s, refers to the frequency bandwidth of the "auditory filter" created by the cochlea, the sense organ of hearing within the inner ear. Roughly, the critical band is the band of audio frequencies within which a second tone will interfere with the perception of a first tone by auditory masking. Psychophysiologically, beating and auditory roughness sensations can be linked to the inability of the auditory frequency-analysis mechanism to resolve inputs whose frequency difference is smaller than the critical bandwidth and to the resulting irregular "tickling" of the mechanical system (basilar membrane) that resonates in response to such inputs.

Critical bands are also closely related to auditory masking phenomena – reduced audibility of a sound signal when in the presence of a second signal of higher intensity and within the same critical band. Masking phenomena have wide implications, ranging from a complex relationship between loudness (perceptual frame of reference) and intensity (physical frame of reference) to sound compression algorithms.

At the low-frequency end, a critical band is less than 100 Hz wide, while for high frequencies, the width can be greater than 4 kHz. This indeed is yet another kind of perceptual nonuniformity.

Experiments indicate that the critical bandwidth remains approximately constant in width for masking frequencies below about 500 Hz — this width is about 100 Hz. However, for frequencies above 500 Hz, the critical bandwidth increases approximately linearly with frequency.

Generally, the audio frequency range for hearing can be partitioned into about 24 critical bands (25 are typically used for coding applications), as the following table shows. Notwithstanding the general definition of a critical band, it turns out that our hearing apparatus actually is somewhat tuned to certain critical bands. Since hearing depends on physical structures in the inner ear, the frequencies at which these structures best resonate are important. Frequency masking is a result of the ear structures becoming "saturated" at the masking frequency and nearby frequencies.

Hence, the ear operates something like a set of band-pass filters, each of which allows a limited range of frequencies through and blocks all others. Experiments that show this are based on the observation that a constant-volume sound will seem louder if it spans the boundary between two critical bands than it would were it contained entirely within one critical band. In effect, the ear is not very discriminating within a critical band, because of masking.

Critical bands and their bandwidths

Bark Unit. Since the range of frequencies affected by masking is broader for higher frequencies, it is useful to define a new frequency unit such that, in terms of this new unit, each of the masking curves has about the same width.

The new unit defined is called the Bark, named after Heinrich Barkhausen (1881-1956), an early sound scientist. One Bark unit corresponds to the width of one critical band, for any masking frequency. The conversion between a frequency f and its corresponding critical-band number b, expressed in Bark units, is as follows:
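A commonly quoted piecewise form of this conversion, with $f$ in Hz, is shown below. It is assumed here rather than confirmed by this text, but it agrees with the values given in this section (b = 5 at 500 Hz and b = 9 at 1 kHz).

$$b = \begin{cases} f/100, & f < 500 \\ 9 + 4\log_2(f/1000), & f \ge 500 \end{cases}$$

A minimal Python sketch (the function name is illustrative):

```python
import math

def bark_piecewise(f_hz):
    """Critical-band number b (Bark) for a frequency in Hz, assumed piecewise form."""
    if f_hz < 0:
        raise ValueError("frequency must be non-negative")
    if f_hz < 500.0:
        # Below 500 Hz each critical band is about 100 Hz wide.
        return f_hz / 100.0
    # At and above 500 Hz the band number grows with the logarithm of frequency.
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

barks = [bark_piecewise(f) for f in (250, 500, 1000, 2000)]
```

Each doubling of frequency above 500 Hz adds four Barks in this form, which is why the value moves from 5 to 9 between 500 Hz and 1 kHz.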

Effect of masking tones, expressed in Bark units

In terms of this new frequency measure, the critical-band number b equals 5 when f = 500 Hz. At double that frequency, for a masking frequency of 1 kHz, the Bark value goes up to 9. Another formula used for the Bark scale is as follows:

where f is in kHz and b is in Barks. The inverse equation gives the frequency (in kHz) corresponding to a particular Bark value b:

Frequencies forming the boundaries between two critical bands are given by integer Bark values. The critical bandwidth (df) for a given center frequency f can also be approximated by

where f is in kHz and df is in Hz
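The three formulas referred to in the preceding lines are commonly given in the following forms (the first and third are usually attributed to Zwicker and Terhardt). They are assumed here rather than confirmed by this text, with $f$ in kHz, $b$ in Barks, and $df$ in Hz:

$$b = 13.0\,\arctan(0.76 f) + 3.5\,\arctan\!\left(f^2 / 56.25\right)$$

$$f = \left[\frac{e^{0.219\,b}}{352} + 0.1\right] b - 0.032\,e^{-0.15\,(b - 5)^2}$$

$$df = 25 + 75\left[1 + 1.4 f^2\right]^{0.69}$$

A minimal Python sketch of the same expressions, with illustrative function names:

```python
import math

def bark_from_khz(f_khz):
    """Bark value for a frequency in kHz (assumed arctangent form)."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan(f_khz ** 2 / 56.25)

def khz_from_bark(b):
    """Approximate frequency in kHz for a Bark value b (assumed inverse form)."""
    return ((math.exp(0.219 * b) / 352.0) + 0.1) * b \
        - 0.032 * math.exp(-0.15 * (b - 5.0) ** 2)

def critical_bandwidth_hz(f_khz):
    """Approximate critical bandwidth in Hz at center frequency f_khz (kHz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

b_1k = bark_from_khz(1.0)             # Bark value at 1 kHz
f_b5 = khz_from_bark(5.0)             # frequency near the 5 Bark boundary
bw_100 = critical_bandwidth_hz(0.1)   # bandwidth at 100 Hz
bw_1k = critical_bandwidth_hz(1.0)    # bandwidth at 1 kHz
```

With these forms the bandwidth stays close to 100 Hz at low center frequencies and widens as frequency rises. The arctangent formula gives about 8.5 Barks at 1 kHz rather than exactly 9, so the different formulas agree only approximately.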

The idea of the Bark unit is to define a more perceptually uniform unit of frequency, in that every critical band's width is roughly equal in terms of Barks.

The louder the test tone, the shorter the amount of time required before the test tone is audible once the masking tone is removed

Recall that after the dance it takes quite a while for our hearing to return to normal. Generally, any loud tone causes the hearing receptors in the inner ear (little hairlike structures called cilia) to become saturated, and they require time to recover. (Many other perceptual systems behave in this temporally slow fashion — for example, the receptors in the eye have this same kind of "capacitance" effect.)

To quantify this type of behavior, we can measure the time sensitivity of hearing by another masking experiment. Suppose we again play a masking tone at 1 kHz with a volume level of 60 dB, and a nearby tone at, say, 1.1 kHz with a volume level of 40 dB. Since the nearby test tone is masked, it cannot be heard. However, once the masking tone is turned off, we can again hear the 1.1 kHz tone, but only after a small amount of time. The experiment proceeds by stopping the test tone slightly after the masking tone is turned off, say 10 msec later.

The delay time is adjusted to the minimum amount of time such that the test tone can just be distinguished. In general, the louder the test tone, the less time it takes for our hearing to get over hearing the masking tone. The above figure shows this effect: it may take up to as much as 500 msec for us to discern a quiet test tone after a 60 dB masking tone has been played. Of course, this plot would change for different masking tone frequencies.

Test tones with frequencies near the masking tone are, of course, the most masked. Therefore, for a given masking tone, we have a two-dimensional temporal masking situation. The closer the frequency to the masking tone and the closer in time to when the masking tone is stopped, the greater the likelihood that a test tone cannot be heard. The figure shows the total effect of both frequency and temporal masking.

The phenomenon of saturation also depends on just how long the masking tone has been applied. The following figure shows that for a masking tone played longer (200 msec) than another (100 msec), it takes longer before a test tone can be heard.

As well as being able to mask other signals that occur just after it sounds (post-masking), a particular signal can even mask sounds played just before the stronger signal (pre-masking).

Effect of temporal masking also depends on the length of time the masking tone is applied. Solid curve: masking tone played for 200 msec; dashed curve: masking tone played for 100 msec

Pre-masking has a much shorter effective interval (2-5 msec) in which it is operative than does post-masking (usually 50-200 msec).

MPEG audio compression takes advantage of these considerations, basically by constructing a large, multidimensional lookup table. It uses this table to transmit frequency components that are masked by frequency masking, temporal masking, or both, using fewer bits.