Noise & Health  
Year : 2011  |  Volume : 13  |  Issue : 53  |  Page : 277-285
Hearing speech in music

Audiological Research Centre (Ahlséns), University Hospital of Örebro, Sweden

Date of Web Publication: 14-Jul-2011

The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

Keywords: Hearing impairment, masking, music, normal hearing, speech

How to cite this article:
Ekström SR, Borg E. Hearing speech in music. Noise Health 2011;13:277-85

How to cite this URL:
Ekström SR, Borg E. Hearing speech in music. Noise Health [serial online] 2011 [cited 2021 Jan 25];13:277-85. Available from: https://www.noiseandhealth.org/text.asp?2011/13/53/277/82960

Introduction

Music is an increasingly prevalent component of the acoustic environment. Although it is normally played for pleasure and entertainment, it has a wide range of effects on the human mind and body - general physiological and possibly pathogenic effects, therapeutic and pain-relieving effects; [1],[2] and it can be hazardous to the ear (review [3] ). On the psychosocial level, it is possible to identify three types of music-listening situations:

Active listening: You choose to listen to music - in your home, on a dance floor or in a concert hall; or as art, entertainment, for relaxation, etc. Music has a positive value.

Positive background: You are surrounded by music while involved in another activity - at work, when driving, shopping, visiting a restaurant, etc. The music may influence your activity, feelings and behavior, but it is not disruptive or disturbing. You accept it, feel neutral or comfortable.

Disturbing and annoying environment: The music disrupts, disturbs and annoys you; it interferes with your present situation and you wish it would stop.

Some aspects of all three situations are the focus of the present investigation, and one general ambition is also to gather systematic empirical information on how music with different features interferes with understanding of running speech (or more formal recitations). There is a long tradition explaining how to compose and arrange music so that vocal parts and individual instruments are perceived in the orchestral background. Few scientific studies, however, have analyzed the relation between music features and speech masking. This topic has bearing on music as such but perhaps even more bearing on music as part of our general environment.

When the Department of Restaurant and Culinary Arts at Örebro University built up its restaurant education program - a program designed to reflect a tradition of good restaurant management and craftsmanship - five dimensions and five guidelines were formulated in relation to the Room, the Product, the Meeting, the Atmosphere and the Management control system. [4] The influence of music on these dimensions of restaurant visits has received very little attention in research, with the exception of purchase behavior and atmosphere, which have been studied by North and Hargreaves. [5] A restaurant conversation is the most obvious activity (Gustafsson et al. [4]) in which the acoustic environment has an impact. The atmosphere affects the encounter (the activity), generating the emotional basis for the interaction between the dinner participants. The individual preferences of the guests are of great importance, as is the general context of the dinner, whether it is a dignified celebration or a casual party.

The encounter is by definition built on communication, which is a multidimensional phenomenon. [6] One important component is verbal communication. The restaurant environment should therefore ensure the fulfillment of at least two important conditions:

It should be easy to hear members of your own party.

It should not be easy for people at surrounding tables to overhear your conversation, and vice versa.

The music played at the restaurant thus plays at least three roles for the communication among the guests:

To create a contextually relevant and positive acoustic atmosphere.

Not to interfere with relevant speech, either for individuals with normal hearing (NH) or for the large number of people [7] with hearing impairment (HI).

To prevent perception of irrelevant speech.

The latter two concern the masking effect of music on speech perception.

Interestingly, there are very few publications addressing these issues. The reason is probably at least partly related to the difficulty of defining music in acoustic terms. Music can cover the whole acoustic spectrum and range from just audible to painful. Tsaneva [8] investigated the effects of pop music, classical music and pop music played backwards, all at 65 dBA, on the perception of monosyllables. She found no difference between the music pieces in terms of masking ability, provided the signal-to-noise ratio was the same. Rhebergen et al.[9] found that a specific piece of music had an intermediate masking effect on speech, between that of frog song and construction noise. Music may also interfere with speech perception as "an irrelevant signal" in the same manner as speech itself has been found to be an irrelevant interfering factor (e.g., [10] ). An interesting finding was presented by Russo and Pichora-Fuller. [11] They found that word identification was better in the presence of familiar music than unfamiliar music or babble noise. This was only the case for young listeners, not for older ones. Trained musicians had better speech-in-noise performance than did non-musicians, as well as better frequency discrimination and working memory. [12]

In the area of speech, hearing and audiological science, practically all masking studies have focused on the perception of speech masked by synthetic noise or combinations of interfering speech. We have found no earlier studies on the masking effect of music on speech perception in relation to music features and hearing status. The music used for such a masking experiment has to be designed on the basis of current knowledge of auditory masking, with both the above-mentioned aspects in mind: Hearing relevant speech and not being distracted by irrelevant speech.

The literature on masking is extensive, and only a few aspects will be mentioned here. Masking means making a target sound (signal, often speech) inaudible (complete or total masking); or less audible, i.e., weaker (partial masking). A sound (a masker) that covers the spectrum of the target sound and that occurs at the same time causes direct, simultaneous masking. The masker, however, can cause masking even before it actually starts (backward or pre-masking) and, more importantly, after it has terminated (forward or post-masking). In addition, the masking effect can spread outside the physical spectrum of the masker. This spread of masking is most important towards higher frequencies (upward spread of masking) and to a lesser extent towards lower frequencies (downward spread of masking). The upward spread of masking increases at higher masker levels. The temporal effect of masking is most marked at and near the masker frequency, but it is also spread outside this range. A review on speech intelligibility in noise is provided by Bronkhorst. [13] The terminology of masking is, however, not unambiguous. For instance, the term "amount of masking" has two meanings: The level of the masker and the shift in the hearing threshold caused by the masking sound. Therefore, in the present article, "masking delivered" is suggested as a synonym for the level of the masker, and "the amount of masking produced" denotes the amount of shift in the hearing threshold caused by the masker.

These basic psychoacoustic properties also apply to masking of speech in realistic conditions, where the masker is music, speech from one speaker or several other speakers, or ambient noise. The masking effect of any sound is also highly dependent on the hearing status of the listener, whether he/she has NH or HI. Persons with HI need a higher signal-to-noise ratio than do individuals with NH. [14] Given that a large proportion of adults have HI, studies including subjects with HI are not solely of academic interest. This group must also be considered by those who wish to reach an audience including individuals over 60 years of age. [7]

A series of realistic masking noises has been constructed and evaluated within the International Collegium of Rehabilitative Audiology (ICRA) [15] and distributed widely. The material contains nine noises with spectra shaped by the long-term average spectrum for male or female voices, with the individual talking in a normal, raised or loud voice. The time course is shaped by one, two or six speakers. The envelope of the speech spectrum-weighted noise is thereby modulated with a peak at 4 Hz for the one-talker condition and practically unmodulated for the six-talker condition. [15]

As described above, the temporal fluctuations are simulated in the ICRA noises, but the spectral fluctuations in natural speech are not maintained. This is a deliberate simplification but also a potential limitation. By playing music at different speeds and in different octaves, one can circumvent this limitation. The effect of both temporal and spectral fluctuations can be maintained and manipulated selectively while the masker (music) is still perceived as music. Thus, the use of especially composed and systematically designed music as a masker of speech may be of interest from both a practical and a theoretical point of view. The focus of the present study is to highlight the importance of hearing speech in music, and music as a practically and theoretically important masker. In order for the results to achieve high everyday validity, the dependent variable was running speech rather than words or artificial sentences.

The overriding question of the present study is: How should music be designed to allow speech to be heard - or to prevent it from being heard? The specific question is: How does the masking effect vary when a masking piano piece is played at different tempi and in different octaves, for individuals with NH and for those with HI?

Methods


The participants consisted of two groups of young subjects, one comprising individuals with NH (n=15) and the other comprising individuals with varying degrees of sensorineural HI (n=14) [Table 1]. All participants were white-collar workers, and most had a university education. The age of the first group was 20-39 years; and of the second group, 18-38 years. The age range of the subjects with HI was chosen to match that of the NH group so as not to introduce confounding differences in cognitive abilities. The obvious drawback of this selection method is that the HI group was less representative of the majority of HI subjects, who are older on average. The hearing thresholds of the NH group were 20 dBHL (ISO 389) or better between 0.25 and 4.0 kHz. All 14 subjects with HI had attended clinical audiological rehabilitation programs, and all but one used hearing aids.
Table 1: Audiological characteristics of the hearing-impaired subjects


The audiometric features of the 14 subjects with HI are shown in [Table 1]. In [Table 1], the order of the subjects is identical to the order of the arrows in [Figure 1], including information on hearing aid status. The hearing aids were fitted using standard procedures and the fitting programs provided by the manufacturers, and were worn during the tests, thereby increasing the everyday validity of the measurements.
Figure 1: Individual values for unmasked and masked thresholds, JFC, Holgersson target, for 15 individuals with NH and 14 individuals with HI using hearing aids. The subjects with HI are presented in the same order from top of the figure as that in Table 1. Left end of arrow shows unmasked thresholds, arrowhead shows masked thresholds. Left: Masking with low-octave piano piece (O2), fast (180 beats/min). Right: Masking with high-octave piano piece (O5), slow (60 beats/min)


For the 13 subjects with bilateral HI, the thresholds were as follows:

M3 (the average value for the better ear at the frequencies 0.5, 1.0, 2.0 kHz, i.e., conventional Pure Tone Average, PTA) was in the range 6.7-75.0 dBHL. Mean value was 46.3 dBHL (SD=17.2 dB).

M4 (average for the frequencies 0.5, 1.0, 2.0, 4.0 kHz) was in the range 22.5-72.5 dBHL, and the mean was 49.5 dBHL (SD=14.2 dB).

M5 (average for the frequencies 0.5, 1.0, 2.0, 3.0, 4.0 kHz) was in the range 29.0-71.0 dBHL, and the mean was 50.9 dBHL (SD=13.3 dB).

The subject (Subject 9) with unilateral HI had the following thresholds on the affected ear: M3=58.0 dBHL; M4=57.5 dBHL; M5=57.0 dBHL.
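The three averages defined above can be computed directly from an audiogram. The following is a minimal sketch in Python; the function name and the audiogram values are hypothetical and chosen for illustration only, and they are not taken from [Table 1].

```python
from statistics import mean

# Frequencies (kHz) entering each average, as defined in the text.
PTA_FREQS = {
    "M3": (0.5, 1.0, 2.0),
    "M4": (0.5, 1.0, 2.0, 4.0),
    "M5": (0.5, 1.0, 2.0, 3.0, 4.0),
}

def pure_tone_averages(audiogram):
    """Return M3, M4 and M5 (dBHL) for one ear.

    `audiogram` maps frequency in kHz to hearing threshold in dBHL.
    """
    return {name: mean(audiogram[f] for f in freqs)
            for name, freqs in PTA_FREQS.items()}

# Hypothetical audiogram for one ear (illustration only).
example_ear = {0.25: 30, 0.5: 40, 1.0: 45, 2.0: 50, 3.0: 55, 4.0: 60}
averages = pure_tone_averages(example_ear)
print(averages)
```

For a subject with bilateral HI, the same function would be applied to each ear and the value for the better ear reported.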

Equipment and test conditions

The tests were performed in the acoustic environmental room (3 m × 3.4 m; height, 2.4 m) at the Ahlsén Research Institute; [16] the room's reverberation time is 0.2 second. The room is equipped with 12 loudspeakers (Bose, model 101 Music Monitor) in a circular arrangement (radius=1.4 m), and the masking sound and the target sounds were presented through a loudspeaker at zero degrees azimuth.

The subject was instructed to adjust the level of the target sound and to indicate when he/she could just follow the conversation ["just follow conversation" (JFC) method]. [16],[17],[18] For the present purpose, the JFC method was preferable to the conventional "speech reception threshold" (SRT) methods, because it is quicker and has greater face validity. This was a critical advantage in light of the large number of threshold determinations needed in the present study design. On the other hand, no comparisons with SRT estimations based on articulation index (AI) or speech intelligibility index (SII, e.g. [19] ) could be made. Such comparisons require a different study design and a new set of data.

Calibration of the masker and test sound in the acoustic environmental room was performed using a Larson Davis sound-level meter and a Brüel & Kjær 2209 sound-level meter. The audiometric thresholds were obtained using standard clinical procedures with audiometers calibrated according to ISO 389.

Test material

The target sound (the signal) was a male voice reading a story in Swedish from the Swedish book "The Wonderful Adventures of Nils Holgersson" by S. Lagerlöf (standardized by Borg et al. [16]). The long-term (first 3 minutes, male reader) spectrum of the target test-reading is shown in [Figure 2].
Figure 2: Spectrum (1/3-octave) of the first 3 minutes of the test material for determination of speech reception thresholds with the “just follow conversation” (JFC) method. Male reader. Read from “The Wonderful Adventures of Nils Holgersson” (S. Lagerlöf)


The masker was a pentatonic piano composition in five beats per bar [Figure 3], specially composed for this experiment (by S-R E). In order to achieve a homogeneous flow of tones, five tones on average were played within each bar. This simplified the procedure of adjusting the level of the target sound. The melodic range of the piano piece was limited to one octave (for example, the piano key f in the third octave to f in the fourth). The piece was played in four different octaves [octave 2 (O2) to octave 5 (O5)] and at three speeds: Slow (s), M=60 (M - Metronome speed, beats per minute); medium (m), M=120; and fast (f), M=180. With this design, it is possible to measure masking in relation to frequency range (octave) and speed.
Figure 3: Piano piece used for masking
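The timing implied by the three metronome speeds follows from simple arithmetic: one beat lasts 60/M seconds. The sketch below (Python, with hypothetical names) lists the beat and bar durations for the three tempi. Because the piece has five beats and, on average, five tones per bar, the beat duration also approximates the mean interval between tone onsets; that reading is an interpretation of the description above, not a measured value.

```python
# Metronome speeds (beats per minute) for the three tempi in the text.
TEMPI_BPM = {"slow": 60, "medium": 120, "fast": 180}
BEATS_PER_BAR = 5  # the piece is written in five beats per bar

def beat_interval_s(bpm):
    """Duration of one beat in seconds."""
    return 60.0 / bpm

for name, bpm in TEMPI_BPM.items():
    beat = beat_interval_s(bpm)
    bar = BEATS_PER_BAR * beat
    print(f"{name}: {beat:.3f} s per beat, {bar:.2f} s per bar")
```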


The physical acoustic features of the piano music are shown in [Figure 4]. The left side shows the time course of the sound level for the music in the fifth octave and at the three different speeds.
Figure 4: Acoustic features of the piano piece. Left side shows time course (slow, medium and fast). The vertical axis shows the sound level in dBA. The horizontal axis shows the time in seconds; Right side shows spectrum for the piano piece played in four octaves (O2-O5). The vertical axis shows the sound level in dBA. The horizontal axis shows the frequency in octaves, from 0.05 to 16 kHz


The time gaps are more pronounced for the slow speed diagram, which may allow for better sound perception during the gaps. In addition (not shown), the sound level had a slower decay and thereby shorter or nonexisting gaps between the piano key strokes when played in the lower octaves.

The right side of the figure shows the average acoustic spectrum of the first minute of the composition, played in octaves O2 to O5 at medium tempo. The sound level for all 15 maskers was the same, 50 dBA-equivalent level (well above the hearing threshold for all subjects, as the subjects with HI used their hearing aids in the test situation). Observe the upward shift of the low-frequency skirt of the spectrum from O2 to O5. In the O2 and O3 diagram, the sound level is clearly higher in the lower-frequency region than for the spectrum of the higher octaves, which may indicate more masking of speech for the piano piece played in the lower octaves.
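An equivalent level is an energy average rather than an arithmetic average of decibel values, which is why maskers with very different time courses can share the same 50 dBA figure. The following Python sketch shows the standard calculation for a series of equally spaced short-term levels. It assumes that the levels are already A-weighted, and the sample values and function names are hypothetical; the article itself only states that calibration was done with sound-level meters.

```python
import math

def equivalent_level_db(levels_db):
    """Energy-average a sequence of equally spaced sound levels (dB).

    Each level is converted to relative intensity, the intensities are
    averaged, and the mean is converted back to decibels.
    """
    mean_intensity = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_intensity)

def gain_to_target_db(levels_db, target_db=50.0):
    """Gain (dB) that would bring the equivalent level to `target_db`."""
    return target_db - equivalent_level_db(levels_db)

# Hypothetical short-term A-weighted levels (dBA) for one masker.
samples = [40.0, 55.0, 52.0, 35.0, 58.0, 45.0]
print(round(equivalent_level_db(samples), 1))
print(round(gain_to_target_db(samples), 1))
```

Loud segments dominate the result, so a masker with long quiet gaps needs higher peaks than a steady masker to reach the same equivalent level.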

In addition to the 12 piano maskers, three standard maskers were also included. Two ICRA (International Collegium of Rehabilitative Audiology [15]) noises were selected, consisting of two- and six-person babble noise (tracks 5 and 8 of the ICRA-CD, normal vocal effort). ICRA noises are artificial noise signals with speech-like temporal properties, used for speech-perception studies and hearing-instrument assessment. In addition, one SPN (speech spectrum-filtered noise; cut-off at 1.0 kHz, -12 dB/oct [20]) was used from a standard clinical diskette, CA Tegnér AB, Stockholm, Sweden.


After ear inspection and verification of normal eardrum status, a standard clinical audiogram was obtained. The subjects with HI used their hearing aids in the tests [Table 1]. In the masking measurements, the subjects were seated in the test room and instructed about the test conditions and the JFC method. The participants adjusted the target reading until it was possible to just follow the conversation, without necessarily hearing every word.

The different maskers were presented in random order to prevent a learning effect. [21] Each masker was repeated twice, in descending order (O5 fast tested first) and ascending order (O2 slow tested first). If the masked threshold differed by 5 dBA or more, the masker was presented once more. The subjects were allowed to take the time they needed to assess their JFC threshold. At the end of the session, the subjects were asked whether or not they thought the music was enjoyable or annoying.
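The repeat criterion described above can be written as a simple check. This is a minimal Python sketch with hypothetical names and threshold values; how the final threshold was derived from the two or three determinations is not stated here, so the sketch covers only the decision to present a masker once more.

```python
def needs_third_presentation(first_db, second_db, criterion_db=5.0):
    """True if two masked thresholds differ by `criterion_db` or more."""
    return abs(first_db - second_db) >= criterion_db

# Hypothetical pairs of masked JFC thresholds (dBA), illustration only.
runs = {"O2f": (48.0, 54.0), "O5s": (37.0, 39.5)}
for masker, (first, second) in runs.items():
    print(masker, needs_third_presentation(first, second))
```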

Statistical procedures

Standard parametric methods were used: Student's t test and the Pearson correlation coefficient (r). Nonparametric tests (the Mann-Whitney U test and Spearman's rho) were also used, with SPSS versions 15 and 17. Repeated-measures ANOVA (RM-ANOVA) and t tests were used to assess the significance of octave and tempo effects on JFC thresholds and the role of hearing status. Mauchly's test of sphericity was used to test for violations of sphericity, and the Huynh-Feldt correction was applied if sphericity could not be assumed.


The study was approved by the Regional Ethics Committee, Uppsala, Sweden (Reg. no. 2007/141).

Results

Masked JFC thresholds for the NH group (mean±SD) and for two typical individuals are shown in [Figure 5] for the piano piece played in octaves O2 to O5 at slow speed. As seen in the figure, the masking effect decreases as the frequency of the piano piece is increased, that is, when it is performed in a higher octave. There is no individual peak indicating a possible preferred combination of speed and octave.
Figure 5: Just-follow-conversation (JFC) thresholds (dBA) during masking with a piano piece played at slow speed, S=60 beats/min, for different octaves (O2-O5). Two subjects with NH and mean for the whole group with NH


The average masked JFC thresholds (±SEM, standard error of the mean; see figure legend) for the groups of subjects with NH and HI are shown in [Figure 6]. The vertical axis indicates the level of the hearing threshold in dBA. The 15 different maskers are specified along the horizontal axis of the diagram. First from the left is the value of the hearing threshold with no noise (no masker). Then there are the values for the 12 pieces of piano music (more precisely, 12 versions of the piano piece of [Figure 3]), starting with the piece in the second octave played at the slow tempo, O2s, followed by the piece in the same octave played at the medium tempo, O2m. The last of the piano pieces is in the highest octave played at the fast tempo, O5f. Finally, there are values for the two ICRA noises and the SPN. The three speeds (tempi) are identified by the three lines: Continuous line for slow speed, long dashes for medium speed, and short dashes for fast speed.
Figure 6: Mean values for 15 subjects with NH (SEM, 1.00-1.74 dB) and 14 subjects with HI using hearing aids (SEM, 1.25-2.57 dB) for unmasked thresholds and thresholds obtained with masking with a piano piece played at a slow (s, 60 beats/min), medium (m, 120 beats/min) and fast (f, 180 beats/min) tempo and in four octaves (O2-O5). The three speeds (tempi) are identified by the three lines: continuous line for slow, long dashes for medium, and short dashes for fast speed. Thresholds for ICRA noise with 2 (SEM for subjects with NH and HI, 0.97 and 1.30 dB, respectively) or 6 (SEM for subjects with NH and HI, 0.87 and 1.56 dB, respectively) talkers and SPN (SEM for subjects with NH and HI, 0.71 and 1.30 dB, respectively) are shown to the right


Two tendencies are clear, but more pronounced for the NH group. First, the masked threshold decreases when the octave of the music piece increases, by 3.5 dB/octave. For all tempi, the octave is a significant factor for both subjects with NH and subjects with HI (e.g., for slow speed for subjects with NH: RM-ANOVA F(2.0,42)=110; P<0.001, Huynh-Feldt correction).

Second, the masked threshold increases with increased speed, by 1.5 dB per increment of 60 beats per min. Speed was found to have a significant effect for all octaves for the NH subjects (RM-ANOVA P<0.002, Huynh-Feldt correction, independently of octave). For the subjects with HI, speed had a significant effect for octaves O2, O4 and O5 (RM-ANOVA P<0.01, Huynh-Feldt correction) but not for O3.
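As a rough illustration of the two tendencies, the reported slopes can be combined into a linear approximation of the masked threshold relative to a reference condition. This Python sketch rests on two assumptions that are not stated in the text: that the octave and tempo effects simply add, and that the slopes are constant over the tested range. The function and constant names are hypothetical, and the trends are described as more pronounced for the NH group.

```python
DB_PER_OCTAVE = -3.5  # threshold change per octave step upward (from the text)
DB_PER_60_BPM = 1.5   # threshold change per 60 beats/min faster (from the text)

def relative_threshold_db(octave, bpm, ref_octave=2, ref_bpm=60):
    """Approximate masked threshold relative to the reference condition.

    Assumes additive, linear octave and tempo effects (an assumption).
    """
    octave_term = DB_PER_OCTAVE * (octave - ref_octave)
    tempo_term = DB_PER_60_BPM * (bpm - ref_bpm) / 60
    return octave_term + tempo_term

for octave in (2, 3, 4, 5):
    row = [relative_threshold_db(octave, bpm) for bpm in (60, 120, 180)]
    print(f"O{octave}:", row)
```

Under these assumptions, the predicted spread between O2 at the fast tempo and O5 at the slow tempo is 13.5 dB.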

To summarize, the fast piano music played in the lowest octave gives the highest masked thresholds, whereas the slow piano music played in the highest octave gives the lowest masked thresholds, both for subjects with NH and for those with HI. Furthermore, the masked threshold for the ICRA noise and the SPN is higher than that of the O2f piano music (for NH: P< 0.01; for HI: P< 0.05, Mann-Whitney U test).

The variation in masked threshold between the different masking conditions is much greater for the subjects with NH than for the subjects with HI. The difference between the lowest and the highest masked thresholds (for O5s and O2f, respectively) was significantly greater for subjects with NH than for those with HI (P<0.01, df=27 equal variances assumed, t test).

The two piano pieces that masked the most and the least are shown in [Figure 1], left and right, respectively. In [Figure 1] (left), the piano piece is played in the second octave at fast tempo (O2f). In [Figure 1] (right), it is played in the fifth octave and at slow tempo (O5s). The figures show how each individual JFC threshold is affected by the two different maskers. The two groups of horizontal arrows represent the two groups of subjects. The group of 15 arrows in the upper part of the figure shows the results for the individuals with NH. The group in the lower part represents the individuals with HI, who are presented in the same order as in [Table 1]. The horizontal axis gives the JFC threshold value in dBA. The left point of the arrow indicates the JFC threshold without masker, and the right point, the arrowhead, indicates the JFC threshold when masked by the piano music; and the length of the arrow indicates the amount of masking produced.

As seen in the figure, the amount of masking produced is generally greater for the NH group than for the group with HI. Within the groups as well, there is a relationship between the amount of masking produced (the increase in the JFC threshold, i.e., the length of the arrow) and the unmasked JFC threshold: The lower the unmasked threshold, the larger the amount of masking produced. For the subjects with HI, the correlation is significant (rho=−0.89, P<.01 for O2f; nonsignificant for O5s). For the NH subjects, there was no correlation, and some subjects showed a markedly larger amount of masking produced than the majority (e.g., NH subject number 13 in [Figure 1], octave O5s). Individual threshold criteria probably play an important role in the explanation of such differences (see "Methodological aspects" in the Discussion).
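The relationship between the unmasked threshold and the amount of masking produced can be examined with a rank correlation. The sketch below, in plain Python, computes the amount of masking produced as the masked minus the unmasked threshold and then Spearman's rho as the Pearson correlation of the ranks. All threshold values and function names are hypothetical and serve only to illustrate the calculation; the study itself used SPSS.

```python
def amount_of_masking(unmasked_db, masked_db):
    """Threshold shift per subject: masked minus unmasked threshold (dB)."""
    return [m - u for u, m in zip(unmasked_db, masked_db)]

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical JFC thresholds (dBA) for five subjects, illustration only.
unmasked = [30.0, 35.0, 42.0, 50.0, 58.0]
masked = [52.0, 53.0, 55.0, 57.0, 60.0]
shift = amount_of_masking(unmasked, masked)
print(shift)
print(spearman_rho(unmasked, shift))
```

A negative rho corresponds to the pattern described above: the lower the unmasked threshold, the larger the shift.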

The music was perceived as annoying by 3 of the subjects with NH and by 5 of those with HI (a nonsignificant difference).

In summary, the masking effect (masked threshold and amount of masking produced) of the tested piece of piano music increased with increasing tempo and decreasing octave.

Discussion

Methodological aspects

The choice of JFC as the dependent variable was motivated by the higher validity of everyday conversation as compared to monosyllables or synthetic sentences. The speed of the method was also an advantage, given the large number of measurements in the present study design. A comparison between JFC and 50% SRT determinations [17] shows about 3 dB higher thresholds for JFC than for 50% SRT for subjects with NH, and about 1.5 dB higher thresholds for subjects with HI. Larsby and Arlinger [22] found a considerably larger difference, a 10.5 dB higher signal-to-noise ratio for the JFC than for the 50% SRT method, pooled over subjects with NH and HI and different masker noises. An analysis of individual correlations showed that there was no significant correlation between JFC thresholds and 50% SRTs in a steady noise masker, and a moderate correlation with speech maskers, 0.71-0.79.

The differences between SRTs and JFC thresholds, as well as the individual differences in masked thresholds in the NH subjects of the present study [Figure 1], may reveal other interesting aspects of communication and failures in communication. For example, the amount of masking produced in one of the NH individuals (represented by the third arrow from the bottom in the right column of [Figure 1]) is much greater than that produced in the others. The audiogram for this susceptible individual is normal (M5: 5 and 7 dBHL). This individual is one of three in the NH group who were annoyed by the piano music during the test. Thus, this observation indicates the importance of emotional components. Furthermore, the JFC method entails the possibility of an influence of an individual threshold criterion. It is probable that the NH subjects with great masking produced (long arrows in [Figure 1]) have high demands on signal quality before they feel they can just follow the conversation. In some, but not all cases with HI, both unmasked and masked thresholds are elevated. Among the subjects with HI, there are also cases for which the thresholds are unexpectedly high, or low, in relation to their audiograms. Because the subjects with HI used hearing aids, no detailed analysis can be made for those cases in the absence of data on free-field thresholds with hearing aids.

An obvious disadvantage of the above-mentioned lack of agreement between JFC thresholds and SRTs is that it becomes difficult to compare the present threshold data with the data obtained when applying different AI models with the present music maskers. Furthermore, in the test situation, the subjects with HI used their hearing aids in order to achieve a high everyday validity in the findings. This also made application of AI and SII models more complicated (see below, in section "Consideration of Speech Intelligibility Index models").

Music as masker

The reason why the speech-masking effect of music has not previously been investigated is at least partly related to the difficulty of defining music in acoustic terms. Music can cover the whole acoustic spectrum and range from just audible to painful. Obviously, no general conclusions can be drawn about music, except for the effect of sound level. Therefore, we were presented with a challenge some years ago when a young subject with HI asked, "Why do you use speech as a masker? It's music that keeps us from hearing." This question gave the idea to start an investigation on the masking effect of music. Our ambition has then been to find out the answer to the following question: How can a piece of music be created to cause minimum versus maximum masking, while still maintaining the same level, a similar character and being perceived as music?

The spectrum of the music of the lowest octave (O2) overlapped with the speech frequencies and had a considerable masking effect on vowels and low-frequency consonants ([Figure 4] and the spectra in Lidén's work [20]). Considering the upward spread of masking, the high-frequency consonants are also influenced. Music played in the higher octaves, on the other hand, only influenced part of the highest speech-frequency range. Therefore, it has less influence on perception of speech (a small influence on the JFC threshold).

The dependence of the masking effect on the level of the unmasked threshold is clear: The lower the threshold, the larger the increase during masking. This is compatible with recruitment and other masking data. For the NH group, the importance of other factors is also evident. The individual's threshold criterion has an obvious influence. A person requiring that most of the words be clearly heard will have a higher JFC threshold than a person who feels a low percentage is enough. A person who is familiar with the text is also likely to indicate a lower JFC threshold than a person for whom the text is new. Furthermore, cognitive capacity is important, particularly for listening in fluctuating noise. [23] The benefit of fluctuation of the masking noise is much less for subjects with HI than for those with NH (first shown by Carhart and Tillman [24]). Advanced age also decreases the listener's ability to utilize the silent periods. The advantage of having a young study group, as in the present study, is that it minimizes the confounding effect of low and variable cognitive capacity. An emotional component might have influenced one NH subject (Subject no. 13 in [Figure 1]) with an exceptionally large shift, and this observation merits further investigation.

The observed decrease in the masking effect when the peak of the spectrum of the masker was shifted from the lower to the higher part of the speech spectrum is compatible with the fact that the upward spread of masking is more pronounced than is the downward spread (de Maré and Rösler [25] and others). The lower masking effect of O3 compared to O2 is, however, somewhat surprising, as O3, too, covers most of the important middle and high frequencies of the target speech, as shown in [Figure 4].

The different presentation speeds correspond to differences in the periods of low sound level (silent intervals) between the piano key strokes. The length of these silent intervals depends on the octave, the lower tones having a longer duration and shorter pauses. For the O5 octave [Figure 4], the silent interval is more than 0.5 second at low speed. At high speed, the sound level seldom reaches baseline, and there is virtually no silent period. This can be compared with the data on ICRA noise as described by Wagener et al. [26] In the original recording with one male talker, the pauses were up to 2 seconds long. In a revised version of the ICRA noise, Wagener et al. [26] tested 250-millisecond and 62.5-millisecond pauses. They found that the masking effect increased for the shorter pauses.
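
The point that maskers equalised to the same equivalent level can still differ widely in their silent intervals can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the on/off envelope, the frame counts, the 30 dB floor and the function names are hypothetical, levels are in dB relative to an arbitrary reference rather than in dBA, and the sketch does not reproduce the piano recordings or the analysis used in the study.

```python
import math

def equivalent_level_db(samples, ref=1.0):
    """Equivalent continuous level: 10*log10(mean square / ref^2)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (ref * ref))

def pulsed_envelope(n_frames, period, on_frames):
    """Hypothetical masker envelope: 1.0 for on_frames of every period, else 0.0."""
    return [1.0 if (i % period) < on_frames else 0.0 for i in range(n_frames)]

def scale_to_level(samples, target_db):
    """Apply one amplitude gain so the equivalent level equals target_db."""
    gain = 10.0 ** ((target_db - equivalent_level_db(samples)) / 20.0)
    return [s * gain for s in samples]

def silent_fraction(samples, floor_db):
    """Fraction of frames whose instantaneous level lies below floor_db."""
    quiet = 0
    for s in samples:
        level = -math.inf if s == 0 else 20.0 * math.log10(abs(s))
        if level < floor_db:
            quiet += 1
    return quiet / len(samples)

# Assumed duty cycles: "slow" sounds for 40% of each period, "fast" for 95%.
slow = scale_to_level(pulsed_envelope(1000, period=100, on_frames=40), 50.0)
fast = scale_to_level(pulsed_envelope(1000, period=100, on_frames=95), 50.0)

print(round(equivalent_level_db(slow), 3), silent_fraction(slow, 30.0))
print(round(equivalent_level_db(fast), 3), silent_fraction(fast, 30.0))
```

With these assumed duty cycles, both envelopes have the same equivalent level, yet the "slow" one lies below the floor for 60% of the frames and the "fast" one for only 5%. The equivalent level alone therefore says nothing about how much silent time is available to the listener.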

Consideration of speech intelligibility index models

As mentioned in the section "Methodological aspects", the AI and SII models cannot be applied to the present masking data because the dependent variables differ: JFC versus 50% correct response. Furthermore, AI and SII are valid only for low-probability test material (SRT), whereas JFC is based on high-probability material; JFC thereby has greater validity for everyday communication than SRT. The problem is made insurmountable by the poor correlation between the variables [22] and is further complicated by the use of hearing aids by the subjects with HI. However, some comparisons can be made at a semi-quantitative level.

The extended speech intelligibility index (ESII) is constructed to predict SRT in masking noises that vary in both the spectral and temporal domains. [9],[19] It can be deduced from the data in the studies by Rhebergen et al. [9],[19] that the ESII model gives a better prediction in time-varying masking noise than does the conventional SII model. Intermittent noises with longer silent intervals gave less masking than did noises with short intervals, and low-frequency and wide-band noises gave a larger amount of masking than did high-frequency time-varying noises. At this crude level of comparison, the present data with piano music as the masker give the same pattern of results as that expected from ESII.
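
The qualitative reasoning behind a frame-wise index can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the published ESII procedure: it uses a single frequency band, omits band-importance weighting and any temporal smearing, and all function names and numerical values (speech at 45 dB, masker at 50 dB, 100 frames, a 50% duty cycle) are hypothetical. It only assumes the SII-style mapping of signal-to-noise ratio to audibility, (SNR + 15)/30 clipped to the range 0 to 1, evaluated per frame and then averaged over time.

```python
import math

def band_audibility(speech_db, masker_db):
    """SII-style audibility in one band: (SNR + 15) / 30, clipped to [0, 1]."""
    snr = speech_db - masker_db
    return min(1.0, max(0.0, (snr + 15.0) / 30.0))

def frame_averaged_index(speech_db, masker_frames_db):
    """Average the per-frame audibility over all masker frames."""
    values = [band_audibility(speech_db, m) for m in masker_frames_db]
    return sum(values) / len(values)

def gated_masker_frames(leq_db, duty, n_frames, off_db=0.0):
    """Frame levels of an on/off masker whose equivalent level is
    approximately leq_db; the 'off' frames sit at off_db, which is assumed
    to be low enough to contribute negligible energy."""
    on_db = leq_db - 10.0 * math.log10(duty)
    n_on = round(duty * n_frames)
    return [on_db] * n_on + [off_db] * (n_frames - n_on)

speech_db = 45.0  # hypothetical speech level
steady = frame_averaged_index(speech_db, [50.0] * 100)
gated = frame_averaged_index(speech_db, gated_masker_frames(50.0, 0.5, 100))

print(round(steady, 3), round(gated, 3))
```

Under these assumptions the gated masker yields a higher averaged index than the steady masker at the same equivalent level, because the silent frames contribute full audibility. This is the same direction of effect as described above for intermittent noises with longer silent intervals.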


The present results can be considered when designing music for dining and restaurants. One can focus on either maximum speech intelligibility at a certain table, or a high level of privacy, i.e., masking away the speech coming from other tables in the restaurant. The extreme condition is maximum masking in order to minimize verbal interaction. This may lead to maximal intake at the bar or minimal eating time in the lunch restaurant. The present results clearly indicate that one should select low-frequency, fast music for a decrease in speech intelligibility, and high-frequency, slow music for optimal speech reception. In addition, the overall intensity can of course be varied, creating, together with frequency and tempo, a wide range of possibilities to tailor music optimally for various communication purposes.

The higher masked thresholds for the subjects with HI (also when using hearing aids [Figure 1] and [Figure 6]) and the associated higher risk of becoming fatigued emphasize the importance of focusing on the acoustic features of social settings. Restaurant owners should be encouraged to create areas with different acoustic conditions. The guest with HI should be specific and expressive when selecting a restaurant.

In composing music, there are also a number of obvious general problems concerning masking between different instruments, which will not be discussed here. Only the possible role of a hearing impairment in the composer or the performer is considered. As is shown in the present study, the masking pattern of one musical sound source (the piano) is altered for listeners with HI. This may influence the creation as well as the performance of music. One direct application of the present results is in relation to the music accompanying vocalists - often the piano, but also an entire orchestra, e.g., in opera performances. A composer with a sensorineural HI may prefer a slower tempo or a higher octave for the accompanying instruments than would a composer or performer with NH. In this way, the audience may enjoy more of the singer's articulation. The masking problem is, however, mutual. It is not only the music that can mask the perception of the voice. The singer's own voice can mask other non-vocal as well as vocal sounds for the audience, in the singer's own ear [27] and in the ears of companion singers, e.g., in a choir. [28]

Conclusions

The piano composition had the greatest masking effect when played at a high speed in a low octave and the least masking effect when played at a slow speed in a high octave, even though the equivalent level was the same (50 dBA). The masked threshold was higher for the hearing-impaired than for the normal-hearing subjects, but the difference in the masked threshold between different octaves and tempi of the masking piano music was smaller. The everyday validity of the results was increased by using JFC thresholds. The present article points out the importance of creating a balance between speech, singing and accompanying instruments in order to optimize the hearing of speech in music - at the opera and concerts, as well as at restaurants.

Acknowledgments

This study was supported by grants from the Swedish Council for Working Life and Social Research (FAS, Forskningsrådet för Arbetsliv och Socialvetenskap), the University Hospital of Örebro and Örebro University. Thanks to Christina Bergkvist for performing the measurements.

References

1. Borg E. Physiological and medical effects of noise. Acta Otolaryngol 1981;381:1-68.
2. Nilsson U. The anxiety- and pain-reducing effect of music interventions: A systematic review. AORN J 2008;87:780-807.
3. Kähäri KR. The influence of music on hearing, Dissertation. Dept. of Otolaryngology. Sweden: University of Gothenburg; 2002. p. 1-73.
4. Gustafsson I, Öström Å, Johansson J, Mossberg L. The five aspects meal model: A tool for developing meal services in restaurants. J Foodservice 2006;17:84-93.
5. North AC, Hargreaves DJ. The effect of music on atmosphere and purchase intentions in a cafeteria. J Appl Soc Psychol 1998;28:2254-73.
6. Hansson H. Monaurally deaf persons. Sweden: Stockholm University; 1993.
7. Davis AC. Hearing in adults. The prevalence and distribution of hearing impairment and reported disability in the MRC Institute of Hearing Research's National Study of Hearing. London: Whurr Publishers; 1995. p. 1-1101.
8. Tsaneva L. Masking effect of music upon oral information processing. Centr Eur J Public Health 2003;11:173-5.
9. Rhebergen KS, Versfeld NJ, Dreschler WA. Prediction of the intelligibility for speech in real-life background noises for subjects with normal hearing. Ear Hear 2008;29:169-75.
10. Schlittmeier SJ, Hellbrück J, Thaden R, Vorländer M. The impact of background speech varying in intelligibility: Effects on cognitive performance and perceived disturbance. Ergonomics 2008;51:719-36.
11. Russo FA, Pichora-Fuller MK. Tune in or tune out: Age-related differences in listening to speech in music. Ear Hear 2008;29:746-60.
12. Parbery-Clark A, Skoe E, Lam C, Kraus N. Musician enhancement for speech-in-noise. Ear Hear 2009;30:653-61.
13. Bronkhorst AW. The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acustica 2000;86:117-28.
14. Hagerman B. Clinical measurement of speech reception thresholds in noise. Scand Audiol 1984;13:57-63.
15. Dreschler WA, Verschuure H, Ludvigsen C, Westermann S. ICRA noises: Artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment. International Collegium of Rehabilitative Audiology. Audiology 2001;40:148-57.
16. Borg E, Wilson M, Samuelsson E. Towards an ecological audiology, stereophonic listening chamber and acoustic environmental tests. Scand Audiol 1998;17:195-206.
17. Hawkins DB, Montgomery AA, Mueller HG, Sedge RK. Assessment of speech intelligibility by hearing-impaired listeners. In: Berglund B, Karlsson J, Lindvall T, editors. Noise as a Public Health Problem. Vol 2. Stockholm, Sweden: Swedish Council for Building Research; 1988. p. 241-6.
18. Hygge S, Rönnberg J, Larsby B, Arlinger S. Normal-hearing and hearing-impaired subjects' ability to just follow conversation in competing speech, reversed speech, and noise backgrounds. J Speech Hear Res 1992;35:208-15.
19. Rhebergen KS, Versfeld NJ, Dreschler WA. Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise. J Acoust Soc Am 2006;120:3988-97.
20. Lidén G. Speech audiometry. An experimental and clinical study with Swedish language material. Acta Otolaryngol 1954;114:1-145.
21. Rhebergen KS, Versfeld NJ, Dreschler WA. Learning effect observed for the speech reception threshold in interrupted noise with normal hearing listeners. Int J Audiol 2008b;47:185-8.
22. Larsby B, Arlinger S. Speech recognition and just-follow-conversation tasks for normal-hearing and hearing-impaired listeners with different maskers. Audiology 1994;33:165-76.
23. Houtgast T, Festen JM. On the auditory and cognitive functions that may explain an individual's elevation of the speech reception threshold in noise. Int J Audiol 2008;47:287-95.
24. Carhart R, Tillman TW. Interaction of competing speech signals with hearing losses. Arch Otolaryngol 1970;91:273-9.
25. de Maré G, Rösler G. [Investigations on the masking effect in conductive impairment and middle ear impairment]. Acta Otolaryngol 1950;38:179-90.
26. Wagener KC, Brand T, Kollmeier B. The role of silent intervals for sentence intelligibility in fluctuating noise in hearing impaired listeners. Int J Audiol 2006;45:26-33.
27. Borg E, Bergkvist C, Gustafsson D. Self-masking during vocalization. Normal hearing. J Acoust Soc Am 2009;125:3871-81.
28. Ternström S. Hearing myself with others: Sound levels in choral performance with separation of one's own voice from the rest of the choir. J Voice 1994;8:293-302.

Correspondence Address:
Erik Borg
Audiological Research Centre (Ahlséns), University Hospital of Örebro; SE 70185, Örebro

Source of Support: Swedish Council for Working Life and Social Research, Conflict of Interest: None

DOI: 10.4103/1463-1741.82960



  [Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5], [Figure 6]

  [Table 1]

This article has been cited by
1 Music mixing preferences of cochlear implant recipients: A pilot study
Wim Buyens,Bas van Dijk,Marc Moonen,Jan Wouters
International Journal of Audiology. 2014; : 1
2 Susceptibility to interference by music and speech maskers in middle-aged adults
Deniz Baskent,Suzanne van Engelshoven,John J. Galvin
The Journal of the Acoustical Society of America. 2014; 135(3): EL147
3 Self-generated sounds of locomotion and ventilation and the evolution of human rhythmic abilities
Matz Larsson
Animal Cognition. 2013;
4 Accuracy of cochlear implant recipients in speech reception in the presence of background music
Gfeller, K. and Turner, C. and Oleson, J. and Kliethermes, S. and Driscoll, V.
Annals of Otology, Rhinology and Laryngology. 2012; 121(12): 782-791
5 Music masking speech in hybrid cochlear implant simulations
Hossain, S., Assmann, P.
Proceedings of Meetings on Acoustics. 2011;