Year : 2003 | Volume : 6 | Issue : 21 | Page : 39-50
The use of male or female voices in warnings systems : A question of acoustics
J Edworthy, E Hellier, J Rivers
Department of Psychology, University of Plymouth, Plymouth, United Kingdom
Speech warnings and communication systems are increasingly used in noisy, high workload environments. An important decision in the development of such systems is the choice of a male or a female speaker. There is little objective evidence to support this decision, although there are many misconceptions and misunderstandings on this topic. This paper suggests that both acoustic and non-acoustic differences (such as social attributions towards speakers of different sexes) between male and female speakers are negligible; therefore the choice of speaker should depend on the overlap of noise and speech spectra. Female voices do however appear to have an advantage in that they can portray a greater range of urgencies because of their usually higher pitch and pitch range. An experiment is reported showing that knowledge about the sex of a speaker has no effect on judgements of perceived urgency, with acoustic variables accounting for such differences.
Keywords: speaker sex, auditory warnings, speech communication, intelligibility, urgency
|How to cite this article:|
Edworthy J, Hellier E, Rivers J. The use of male or female voices in warnings systems : A question of acoustics. Noise Health 2003;6:39-50
| Introduction|| |
Communication in noisy environments is associated with a number of problems, not all of which are acoustic in nature. The obvious acoustic problems include masking of wanted by unwanted sounds and noise, and loss of intelligibility. Intelligibility is particularly a problem if speech messages are intended to be passed from one person to another, or from a speech communication system (such as a warning system) to a sometimes very busy listener. Whilst guidelines and theories exist for helping the designer of speech systems for use in noisy environments to cut down on loss of intelligibility and to minimise the effects of masking, there are also problems associated with noise that are a by-product of that noise rather than a direct result of it. For example, noise has effects on task performance, affects arousal, increases lack of control of tasks, and influences the way people perform a range of manual and particularly cognitive tasks (Weinstein et al., 2002; Baker and Stephenson, 2000; Ballard, 1996; Carter, 1996; Baker and Holding, 1993; Hanson et al, 1993; Jones and Broadbent, 1987). Noise also affects attention (Ballard, 1996; Baker and Holding, 1993).
Designers of warnings and communication systems therefore, not surprisingly, will make use of any design variables which appear to improve performance under such circumstances. One such variable is the sex of the speaker. If it can be shown that using a male voice, or a female voice, enhances the acoustic, performance, or other aspects of a communication system in noise, then it should be implemented in preference to the other sex.
Male and female voices are acoustically different, and listeners can differentiate male from female voices without having to see the speaker. Although the gender divide may be becoming more blurred in society, the associations that voices have with particular roles are, at least anecdotally, still prevalent. Male and female voices are each unusual in particular circumstances; for example a male voice is unusual in a nursery or a creche, while a female voice is unusual in a helicopter cockpit (though female pilots are becoming more numerous nowadays). Female voices are often associated with nurturing, childhood, safety, security and so on, whereas male voices might typically be associated with authority, dependability, and strength (e.g. McMinn et al, 1993). Developers of speech warning systems sometimes believe that one voice is preferable to another (typically, that a female voice is better than a male voice), although the reasons vary and are generally unfounded in the scientific literature.
Some studies suggest that there are preferences for the voice of one sex over another, and this tends to favour the female voice, or the female listener, or both. For example, Whipple and McManamon (2002) showed that the sex of advertisement presenters had a significant effect on the perception of gender-imaged products, but more specifically that female voices enhanced female-specific products, whereas the sex of the speaker had no effect for male-specific products. The sex of the speaker also had no effect for neutral products (products directed at neither sex specifically). An earwitness testimony study (Wilding and Cook, 2000) showed that while males showed no differentiable ability to recognise male over female voices, females showed an enhanced ability to recognise female voices, as well as being more accurate overall. Thus when a preference is expressed, it seems to be for the female voice, and female listeners appear more attuned to speaker characteristics than do males.
This paper presents a study of the perception of signal words (such as 'Deadly', 'Danger', 'Note' and so on) in a quiet laboratory setting under varying sets of instructions concerning the sex of the speakers. The purpose of the study is to address the issue as to how factors other than the acoustic structure of voices might affect the perception of the words. In that the words are presented in quiet surroundings and the stimuli consist of only single words, the results are not aimed directly at application in themselves. However, the results do inform broader issues concerning the use of male and female voices in warnings and communications systems in both quiet and noise. Some of the relevant literature in this area is reviewed below.
Intelligibility differences between male and female voices
While evidence of real differences in people's more global judgement of spoken information depending on the sex of the speaker is scant, a more compelling argument might be that one sex should be chosen above another because it is more intelligible in noise. However, evidence is again minimal on this topic, most likely because each complex noise spectrum requires its own spectrally-tailored solution; whether to use male or female (or indeed whether the identification of the speaker's sex is important at all) will depend on the overlap of the noise spectrum and the spectrum of the voice or voices used.
Studies by Nixon et al (1996) suggest that although the intelligibility of male and female speech is approximately equal under ordinary noise and listening conditions, specific noise spectra may dictate the use of one or other voice, for example finding a niche in the spectrum where there is a lower level of noise. They carried out intelligibility tests in high level cockpit noise at levels ranging from 95dB to 115dB, finding that the intelligibility of female speech was significantly lower at the highest level of noise. However, when vocoded speech was tested using speech recognition (Nixon et al., 1998) no differences were found between the two sexes of speaker.
Nixon et al's study suggests that for the noise spectrum tested, there was a small advantage for the male voice in some conditions. If the overlap of noise and speech spectra is at the heart of this advantage, then we might equally expect there to be other complex noise environments (for example, a spectrum with a lot of low frequency noise) in which female voices might show some small advantages. Obviously the signal-to-noise ratio is key to the intelligibility of speech, and in complex noisy environments the determination of speech intelligibility is a function not only of overall levels of noise and speaker, but of the precise relationship between the spectrum of the noise and the spectrum of the speaker's voice, which will in turn be influenced by the functioning of auditory filters (Patterson 1974, 1976). Noise characterized by a spectrum that is similar to that of the voice will interfere more than noise with a different spectrum. Speech can also be digitally enhanced, so that potential differences between the loudness of a male and a female voice can be equalised, and also through peak clipping consonants can be boosted relative to vowels, in order again to make speech intelligible in noise. Filtering is also a possibility, in order to further improve the fit of the speech to the noise spectrum. However, it should be borne in mind that such transformations do influence the determination of factors such as the sex and age of the speaker, and their emotional state (Sorkin and Kantowitz, 1987). In speech communication systems the first two may be of no relevance, provided the message is understood, whereas understanding of a speaker's emotional state (such as the urgency with which they are speaking) may be of considerable importance. If more than one voice is to be used, then they should be different in pitch and quality. If this were allowed by the noise spectrum, it would be advantageous to use one female and one male voice.
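The dependence of intelligibility on the overlap between the noise spectrum and the voice spectrum can be illustrated with a rough per-band calculation. The following is only a sketch under stated assumptions, not a method taken from the studies cited: the band levels are invented for illustration, and the function names `band_snr_db` and `audibility_index`, together with the 0 to 30 dB clipping range, are hypothetical choices loosely in the spirit of articulation-index style measures.

```python
def band_snr_db(speech_db, noise_db):
    """Per-band signal-to-noise ratio in dB (speech level minus noise level)."""
    return [s - n for s, n in zip(speech_db, noise_db)]


def audibility_index(speech_db, noise_db, floor=0.0, ceiling=30.0):
    """Crude 0..1 index: per-band SNRs clipped to [floor, ceiling], then averaged.

    The clipping range is an assumption made for this sketch.
    """
    snrs = band_snr_db(speech_db, noise_db)
    clipped = [min(max(x, floor), ceiling) for x in snrs]
    return sum(c - floor for c in clipped) / (len(clipped) * (ceiling - floor))


# Hypothetical levels (dB) in four bands, ordered from low to high frequency.
low_freq_noise = [90, 80, 65, 55]   # noise concentrated in the low bands
lower_voice = [80, 78, 70, 60]      # voice with more low-frequency energy
higher_voice = [70, 78, 76, 68]     # voice with more high-frequency energy

print(audibility_index(lower_voice, low_freq_noise))
print(audibility_index(higher_voice, low_freq_noise))
```

With these invented numbers the higher-pitched voice scores better because more of its energy falls in bands where the noise is weak; swapping in a noise spectrum concentrated in the high bands would reverse the ordering, which is the case-by-case point made above.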
There are other less direct acoustic influences on speech intelligibility, which can have as significant an impact as the acoustic structure of messages. For example, the relative probability of any message also has an impact on its intelligibility. Thus if a speech warning system has only a small number of already-learned messages then understanding may be achieved with lower signal-to-noise ratios; if that same system is also used to convey very rare, but potentially fatal, events then intelligibility will need to be higher. Thus appropriate attention in the design stage to the nature of the messages to be heard is likely to be as important as the question of overall intelligibility. In addition, advances in technology, particularly neural networks, make it possible to develop systems which produce signals at a level that adapts to current ambient noise levels (Kurisu and Fukuyama, 1996) and to boost secondary signals so that they can be heard appropriately in the context of primary signals (Cao et al., 1996). There are also other adaptations which improve signal-to-noise ratios. For example, Yoon and Yoo (2002) propose a novel way of reducing additive time-varying noise by deciding whether each band in each time frame is noise-dominant or speech-dominant, and then reducing noise in the time-frequency domain using modified spectral subtraction. This has the effect of reducing noise while minimizing speech distortion.
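The general idea of band-wise spectral subtraction with a speech/noise dominance decision can be sketched as follows. This is a generic illustration and should not be read as the algorithm of Yoon and Yoo (2002): the function name `subtract_noise`, the dominance rule (band power more than twice the noise estimate), the over-subtraction factor and the spectral floor are all assumptions chosen for the example.

```python
def subtract_noise(frame_power, noise_power, over=2.0, floor=0.05):
    """Apply band-wise spectral subtraction to one time frame.

    frame_power: noisy power in each frequency band for this frame.
    noise_power: estimated noise power in each band.
    over:  extra subtraction factor for noise-dominant bands (assumed value).
    floor: fraction of the noisy power always retained (assumed value).
    """
    cleaned = []
    for p, n in zip(frame_power, noise_power):
        # Hypothetical decision rule: a band counts as speech-dominant
        # when its power is more than twice the noise estimate.
        speech_dominant = p > 2.0 * n
        alpha = 1.0 if speech_dominant else over
        # Subtract the (scaled) noise estimate, but never go below the floor.
        cleaned.append(max(p - alpha * n, floor * p))
    return cleaned


frame = [10.0, 3.0, 1.2]   # invented band powers for one frame
noise = [1.0, 2.0, 1.0]    # invented noise estimate
print(subtract_noise(frame, noise))
```

Subtracting less in speech-dominant bands and more in noise-dominant bands is one simple way to trade noise reduction against speech distortion; the floor prevents bands from being zeroed out entirely.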
While advances in technology continue to reduce intelligibility problems, there will always be noise environments where speech intelligibility remains a problem. In these cases, the best solution may be not to use speech at all, but to use something more resistant to distortion such as nonverbal acoustic signals, or even visual signals if possible (though the kinds of environments where noise will typically be a problem are often those where the visual environment is already rather crowded). As to the choice of sex of speaker, we suggest that decisions should be made primarily on spectral grounds on a case-by-case basis, as most other differences between male and female voices can be ironed out.
A question which can shed some light on the nature of the relationship between male and female voices, and in particular whether the male-female transition is a continuum or categorical in nature, is that of the ability of listeners to recognise the sex of a speaker.
There is evidence both to suggest that the perception of speaker sex is speaker-independent and unambiguous, and that the identification of speaker sex is not categorical but exists along a continuum. More specifically, it seems that there is a distinction between basic acoustic cues and phonetic cues. The male-female voice transition might be a continuum in purely acoustic terms but may be more categorical when one looks at phonetic influences. For example, Mullennix et al (1995) suggest that voice sex perception is based on an acoustic continuum, different in kind from the representation of phonetic information which appears to make the distinction between male and female voices more reliable. Wu and Childers (1991) showed that information about speaker gender in automatic speech recognition is time-invariant, phoneme independent, and speaker independent for both speaker genders.
They go on to distinguish ten vowels which appear to be responsible for distinguishing a speaker's sex. Mendoza et al (1996) analysed the speech of a number of male and female speakers using long-term speech analysis, and showed differences in amplitude across frequency bands for male and female voices. Female voices also showed greater levels of aspirational noise (making the voice more 'breathy') and lower spectral tilt. At a more general level, Gelfer and Young (1997) suggest that overall there is little difference between the conversational intensity levels of male and female speakers. Thus the evidence suggests that acoustical differences between the sex of speakers may exist along a continuum (so thus there could be genuinely ambiguous voices), but at a voiced and phonetic level the differences may become clearer.
If the male-female voice transition is a continuum for many important acoustic variables, then this makes it less likely that there are qualitatively different responses to speech as a function of the sex of the speaker. Rather, it strengthens the view that responses are quantitative in nature and that it is quantitative differences in key acoustic parameters (such as pitch and level) which underpin differences between perception of male and female voices. The evidence presented in the following sections suggests that apparent differences between the perceived urgency of male and female voices are entirely underpinned by acoustic differences between the voices, and will vary systematically as the size of these differences varies.
Perceived urgency in speech warnings
The purpose of this paper is to argue that any preferences or advantages that exist between male and female voices are purely a result of differences between the acoustic structure of female and male voices. It is clear that as a phenomenon, perceived urgency is almost wholly underpinned by acoustic variables. We will also show that the differences in perceived urgency between male and female voices are underpinned by those same acoustic variables. We also suggest that with respect to perceived urgency there might be certain advantages in using female voices. A simple experiment using an obviously female voice and an ambiguous voice demonstrates the dominance of acoustic variables over knowledge of the sex of the speaker in urgency judgements.
There has been a considerable amount of research on perceived urgency in nonverbal warnings and the charting of the acoustic factors which underpin subjective assessments of urgency (Momtahan, 1990; Edworthy et al., 1991; Hellier et al., 1993; Haas and Casali, 1995; Burt et al., 1995; Haas and Edworthy, 1996). This work shows that the perceived urgency of nonverbal warnings can be determined very specifically through manipulation of acoustic parameters such as pitch, pitch range, level, speed, spectral structure and so on. More recently, Guillaume et al (2002) have demonstrated through multidimensional scaling studies that urgency differences between an experimental set of warnings generated by Edworthy et al (1991) are largely determined by their acoustical differences, as mappings based on difference judgements mirror very closely mappings based on urgency judgements.
More recently, attention has turned to acoustic influences on the perceived urgency of speech messages. The small body of research that has addressed the perceived urgency of speech warnings has focused either on single signal words (words which are typically used to denote a hazard, such as 'Warning', 'Danger', 'Deadly', 'Note' and so on) or on very short phrases. Two reasons for this are, first, that acoustic analysis of long speech messages is inevitably complex and noisy, and second, that there is considerable research on signal words in the visual domain (e.g. Leonard et al., 1989; Wogalter and Silver, 1990, 1995; Laughery et al., 1993; Braun and Silver, 1995; Wogalter et al, 1998; Hellier et al., 2000a; Edworthy et al., in press(a)). In particular, it is a well established point that signal words vary in their 'arousal strength' (Wogalter and Silver, 1995), a measure of a word's alerting quality which can be seen as the visual corollary of perceived urgency. Consistent patterns have been shown whereby some words such as 'Deadly' and 'Danger' are attributed a higher arousal strength than words such as 'Attention' and 'Note'. Thus word semantics have an impact on people's perception of the word, an additional factor which can be useful in application. Aside from replicating these effects for those words when presented visually, studies on the perception of spoken signal words have largely replicated the acoustic effects found for nonverbal warnings, as would be expected.
Many of the studies carried out concern the relationship between the speaker and the listener without detailed acoustic analysis being performed. However, we can infer from these studies that speakers are able to imbue a specific level of arousal in their utterances and that listeners interpret this accurately. For example, Barzegar and Wogalter (1998a, 1998b, 2000) found that words spoken in an 'emotional' way were rated higher in terms of the carefulness the listeners would show when hearing the words than words spoken either in a monotone fashion or a whisper. Hollander and Wogalter (2000) demonstrated that some of the main acoustic variables shown by Edworthy et al (1991) to affect urgency in nonverbal warnings also affect the perceived urgency (or arousal strength) of a small set of spoken warning words. These same acoustic variables are known to influence listeners' perception of fear (Scherer, 1986) as well as to be important in speech synthesis systems (e.g. Murray and Arnott, 1995).
Hellier et al. (2002) demonstrated the acoustic underpinning of these effects by carrying out detailed analysis of utterances. One male and one female speaker spoke a series of warning signal words in an 'urgent' and a 'nonurgent' way, and listeners' judgements confirmed that the urgent utterances were rated as more urgent than the nonurgent utterances. Acoustic analysis showed that the pitch, speed and level of those utterances varied across speaker, word, and utterance type, and when mapped against subjective responses, these parameters very closely followed the urgency judgements. The results of the acoustic analysis were used to make synthetic 'urgent' and 'nonurgent' versions of the same set of signal words, with the differences between them being the mean of the pitch, speed and level differences between the urgent and nonurgent utterances produced by the human speakers. The results confirmed that urgency judgements almost exactly follow the acoustic manipulations made, although of course the story is a complex mixture of the relative strengths of the three acoustic parameters manipulated. Other acoustic variables are also likely to affect judgements, but these were not explored in this study.
Studies on the perceived urgency of speech have shown specific and replicable interactions that make an acoustic explanation likely. For example, one of the main findings from these studies is that female voices tend (though not always) to produce higher urgency ratings than male voices. More importantly and consistently, female voices tend to produce a greater range of responses. Urgent utterances in a female voice are rated as more urgent than those in a male voice, while nonurgent utterances tend to produce lower ratings for female than for male voices (Barzegar and Wogalter 1998a, 1998b, 2000; Hellier et al, 2002; Edworthy et al, in press (a)). Edworthy et al (in press (a)) showed that when pitch alone is controlled but voices are still clearly male and female (the latter being more breathy and more typically female than the other), no differences in urgency judgments between male and female voices are to be found. This suggests not only that acoustic differences underpin sex differences, but also that the pitch of the voice is a key acoustic variable under such circumstances.
In cases where male speakers are perceived as being more urgent than female speakers, this can often be attributed to the male speaker talking at a considerably higher level than the female speakers, thus providing further evidence for an acoustic explanation of male/female voice differences. For example both Hellier et al (2002) and Barzegar and Wogalter (2000) found that, particularly in nonurgent conditions, the male voice was louder than the female voice and thus produced higher urgency ratings.
In the following experiment two voices are heard, one unambiguously female and one ambiguous (not clearly labelled as male or female), speaking a selection of signal words in a nonurgent and urgent manner. Participants were required to make urgency judgements of these words. In one condition participants were told that both voices are female; in another they were told that one is female, but that the ambiguous voice is male; and in the third they were told nothing about the sex of the speakers. Our hypothesis is that if urgency judgements are based entirely on acoustic structure then we should find the usual and replicable differences between urgent and nonurgent utterances, and between the speakers, but no differences in judgements across the three conditions.
A set of ten warning signal words was used in the study described below. The set was chosen because they have been shown to produce reliable differences in arousal strength and for the effects to be replicable across different scaling methods (Hellier et al, 2000a) and in the auditory as well as the visual modality (Weedon et al., 2000).
| Method|| |
The experiment was a mixed design with three within-subjects variables and one between-subjects variable: 2 (speaker: ambiguous female voice and unambiguous female voice; within) x 2 (speaking style: urgent and nonurgent; within) x 10 (signal words; within) x 3 (instructions given to listeners: two female speakers; one male and one female speaker; no instructions; between).
Thirty (24 female and 6 male) stage 1 psychology undergraduates from the University of Plymouth participated in the study. Each participant received course credit for participating. All had normal or corrected to normal hearing. One participant was visually impaired and therefore needed their answers to be recorded by the experimenter. The age of the participants ranged from 18-48, with a mean of 22.3 years.
Pretest: Three female speakers with potentially ambiguous (male or female) voices were recorded speaking a set of ten signal words. The resultant tapes were then played to a set of twelve participants, in quiet surroundings, who were asked to state whether they believed the speaker was male or female. The most ambiguous speaker scored approximately equally (5 voted the speaker male, 7 female) so this speaker was selected as the experimental speaker in the ambiguous voice condition. A similar test of the single unambiguously female voice was carried out, with all judges confirming their belief that the voice was that of a woman.
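As a worked check on the pretest figures, the 5 to 7 split among twelve judges can be compared with chance using an exact binomial calculation. The original report does not describe such a test; the sketch below is added purely for illustration, and the helper name `binomial_two_sided_p` is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties with the observed outcome are included.
    return sum(q for q in probs if q <= observed * (1 + 1e-9))


# 7 of 12 judges labelled the ambiguous speaker as female.
p_value = binomial_two_sided_p(7, 12)
print(round(p_value, 3))  # 0.774
```

A value this far from conventional significance thresholds is consistent with describing the voice as ambiguous, although twelve judges give little power to detect a modest bias.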
Experiment proper: 10 warning signal words were used: Deadly, Danger, Warning, Caution, Risky, No, Hazard, Attention, Beware and Note. Two female speakers (one unambiguous female and one ambiguous female) recorded the words onto tapes. They were asked to speak each word in an urgent and a nonurgent manner. When speaking urgently they were asked to imagine that someone they loved was in immediate danger, and when speaking nonurgently they were asked to speak as if the word had cropped up in a normal conversation (as in Barzegar and Wogalter, 1998a; Hellier et al., 2000b; Weedon et al., 2000). The intensity level of the urgent utterances was approximately equalised for both of the speakers, at about 80dB(A) at the ear. Aside from this, the stimuli were allowed to vary freely, as there was little variation in their level on recording. The intensity level of the nonurgent utterances ranged over 65-70dB(A) for both speakers. The average fundamental frequency of one female speaker was around 600Hz ('female') while the other was around 400Hz ('ambiguous').
Experimental tapes: The order of the twenty words for each speaker (10 words x 2 styles) was randomised, then recorded onto audiotape in sets of four words. The ten tapes (5 for each speaker) were then presented to the participants in different orders.
Each participant was given 10 score sheets with four rating scales on each sheet, in order to rate the 40 words.
Each participant was randomly allocated to one of three instruction groups, resulting in 10 participants in each group. In the first condition the participants were told they would hear one male and one female voice, in the second condition they were told they would hear two female voices and in the last condition they were told nothing about the two voices. Participants were instructed that they would hear two voices speaking a set of words and were asked to rate the urgency of each word on a scale of one to ten giving a holistic judgement (1= not at all urgent, and 10 = extremely urgent). The order of the tapes was counterbalanced across listeners. When all 40 words had been heard, each participant was debriefed and thanked for their participation. All stimuli were presented in quiet conditions.
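The structure of the design described in this section (2 speakers x 2 speaking styles x 10 words, with 30 participants split equally across three instruction groups) can be laid out in a short sketch. The code is illustrative only: the function names, the participant numbering and the fixed random seed are assumptions, and it does not reproduce the actual tape orders or allocation used in the study.

```python
import random
from itertools import product

WORDS = ["Deadly", "Danger", "Warning", "Caution", "Risky",
         "No", "Hazard", "Attention", "Beware", "Note"]
SPEAKERS = ["female", "ambiguous"]
STYLES = ["urgent", "nonurgent"]
INSTRUCTIONS = ["one male, one female", "two female", "none"]


def build_stimuli():
    """All speaker x style x word combinations (2 x 2 x 10 = 40)."""
    return list(product(SPEAKERS, STYLES, WORDS))


def allocate(participants, rng):
    """Randomly split participants into equal-sized instruction groups."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // len(INSTRUCTIONS)
    return {label: shuffled[i * size:(i + 1) * size]
            for i, label in enumerate(INSTRUCTIONS)}


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
groups = allocate(list(range(1, 31)), rng)  # hypothetical participant IDs 1..30
print(len(build_stimuli()), [len(g) for g in groups.values()])
```

Each participant therefore rates all 40 stimuli (the within-subjects factors), while the instruction condition varies only between the three groups of ten.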
| Results|| |
Two 3-way analyses of variance were carried out. The first was collapsed across signal word and the second was collapsed across the instructions given to participants. The first 3-way mixed analysis of variance (urgent/nonurgent x female/ambiguous speaker x female/mf/no instructions) showed a large significant effect for urgency (F (1,27) = 236.5, p < .001). Overall means were 6.07 for urgent utterances and 3.2 for nonurgent utterances. There was also a significant main effect for speaker (F (1,27) = 221.3, p < .001). The female speaker produced a higher mean urgency score (5.9) than the ambiguous speaker (3.4). There was no effect for instructions (F (1,27) = 1.5, p = .240). An interaction between urgency and speaker was found (F (1,27) = 26.9, p < .001). This is shown in [Figure - 1]. All permissible comparisons were significantly different from one another (urgent female vs urgent ambiguous t = 12.6, p < .001; nonurgent female vs nonurgent ambiguous t = 13.4, p < .001; female urgent vs female nonurgent t = 14.4, p < .001; ambiguous urgent vs ambiguous nonurgent t = 12.9, p < .001). There were no other significant interactions.
As there were no effects for instructions, this variable was collapsed across for the second 3-way ANOVA (urgent/nonurgent x female/ambiguous x deadly/danger/warning/caution/risky/no/hazard/attention/beware/note). As well as the main effects for urgency and speaker found in the first analysis, this analysis showed a significant effect for word (F (1,27) = 20, p < .001) and all possible interactions: between urgency and speaker, between urgency and word (F (9, 21) = 5, p < .001), between speaker and word (F (9,21) = 4.7, p < .001), and a 3-way interaction between urgency, speaker, and word (F (9, 21) = 3.2, p < .001). This interaction is shown in [Figure - 2]. These effects account for little of the total variance, however.
| Discussion|| |
The results clearly show that nearly all the variance found in the study is due to the way the words are spoken (the urgent vs nonurgent comparison) and the differences between the two voices (female vs ambiguous). Although there are some other effects, they are small by comparison. Most notably, although participants were told that the voices were either both female, or one male and one female, or were given no instructions, and although it had previously been determined that the ambiguous voice was truly ambiguous, no differences were found between the three groups. Therefore it is safe to conclude that the differences found between the speakers are due to the physical differences between the voices and not to assumptions that the listener brings to bear in the light of the knowledge that the speaker is either male or female. The key acoustic differences between the speakers were the difference in pitch in their voices and some differences in voicing. These differences are enough to produce very large differences in urgency ratings between the speakers. As striking, if not more so, is the difference between the urgent and the nonurgent utterances, with urgent utterances being rated much more urgent than nonurgent utterances, for both speakers.
With respect to the interactions, a pattern similar to earlier studies was found whereby the more urgent voice overall (the female voice) produces a greater range of urgency ratings across the two voicing conditions. This has been replicated many times (Barzegar and Wogalter 1998a, 1998b, 2000; Hellier et al, 2002; Edworthy et al, in press (a)) using female and male voices. As is to be expected, these other findings are more extreme in that they show the female voice to be more urgent in the urgent condition, but the male voice to be more urgent in the nonurgent condition; our results here show an advantage for the female speaker in both urgent and nonurgent conditions, with the differences being smaller in the nonurgent condition. The 3-way interaction [Figure - 2] shows just how influential acoustics are in these judgements, as it shows that responses to the nonurgent female voice are very similar to responses to the urgent ambiguous voice. Thus the female voice is much more effective in creating urgent utterances than the ambiguous voice. Here, as in other studies, the evidence points to urgency judgements being based entirely on acoustic parameters, and on pitch in particular.
As a methodological issue, it needs to be noted that studies in this area have typically used very few speakers. However, Barzegar and Wogalter (1998a, 1998b, 2000) used three speakers of each sex and found the same pattern of results within speaker sex, and Hollander and Wogalter (2000) took into account both the sex of the speaker and their individual voices and found no main effect for individual speaker. Moreover, if the effects are largely acoustic, then the number of speakers used does not matter, as subjective judgments will follow acoustic changes in a systematic way. The more speakers vary from one another, the more urgency judgements based on those voices will differ. In exactly the same way, differences in judgements between urgent and nonurgent utterances will vary more if the utterances are acoustically more different from one another. Hellier et al (2002) have already shown that physical differences in acoustic parameters can produce specific and systematic effects on urgency. Thus urgency judgements appear to be based on an acoustic continuum and are unaffected by secondary aspects such as knowledge of the sex of the speaker, and any implications that knowledge might have.
The effects for the individual words are also of some interest. A small but significant effect was found for signal word, and this effect largely replicates earlier findings both for spoken words and for written versions of those same words (e.g. Leonard et al, 1989; Wogalter and Silver, 1990, 1995; Laughery et al, 1993; Braun and Silver, 1995; Wogalter et al, 1998; Hellier et al, 2000b; Edworthy et al, in press (a)). Words such as 'Deadly' and 'Danger' produce higher urgency ratings than words such as 'Attention' and 'Note'. Whilst some of these effects will be due to the particular way in which individual words were spoken (as evidenced by the 3-way speaker x urgency x word interaction), there is plenty of evidence to suggest that part of the difference between the words is due to the semantic strength of the words themselves. For example, the word 'Deadly' has a more serious implication than the word 'Note' or 'Attention' and would therefore be expected to produce higher ratings of urgency just on the meaning of the word alone, independently of the way in which it is spoken. Research on written signal words shows that when font size, colour and other variables are controlled some words are still rated as more arousing or urgent than others. Similar work on spoken versions of the words shows that when the acoustic presentation of the words is controlled as far as is possible, using synthetic voices (Edworthy et al (in press (a)), Experiment 2), these differences remain, and in fact become more clearly delineated. For visual presentations of words, the word 'Lethal' is rated more highly than is found for auditorily presented words, which can be attributed to the softness of its articulation. A few other differences such as this exist, though none as obvious as 'Lethal'. The interaction between semantics, acoustics and phonetics is one which warrants further investigation.
Practical relevance to speech warnings and information systems
Although our studies were conducted in quiet surroundings, they address an important issue concerning the relative importance of acoustic versus other factors (in this case, knowledge of the sex of the speaker). As acoustics seem to dominate judgements, we argue that in noise, it should be the noise spectrum above all else which determines the type of voice used in any warnings or communications system.
Intelligibility of signals in noise will always be a problem, and particularly so for speech because of its complexity and unevenness. However, there are a variety of ways in which speech signals can be enhanced and background noise can be reduced. Signals can be enhanced relative to noise, overall noise levels can be reduced, and the signals themselves can be enhanced in a variety of ways so that they are more readily understood even at fixed signal-to-noise ratios. Aside from the physical boosting of signals, controlling the predictability of a set of speech messages can improve intelligibility without the need for physical enhancement.
In complex noise environments, there is no a priori case for selecting either a male or a female voice; the choice of the sex of the speaker is an irrelevance. What matters is the overlap of the noise spectrum and the speech spectrum. In some cases this will favour a female voice and in others a male voice, so the choice should be made on a case-by-case basis. It should not matter even whether the voice can be correctly labelled as male or female, as there appears to be no real evidence of performance advantages of either male or female voices resulting from extra-acoustic variables. Thus when selecting a voice for use, the decision should be based on the noise spectrum and the voice spectrum. The voice spectrum can be equalised, enhanced, altered in several different ways, filtered, and so on, and checks done on its resultant intelligibility. If more than one voice is used, the voices should be significantly different from one another. The more obvious choice, if the noise spectrum allows it, is to use one female and one male voice (or one high and one low pitched voice).
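The idea of choosing a voice by comparing its spectrum with the noise spectrum can be sketched in code. The following is a minimal illustration only, not the procedure used in this paper: the octave-band levels, the two voice profiles, the equal band weights and the 30 dB audibility range are all hypothetical assumptions made for the example. It scores each candidate voice by how far its band levels stand clear of the noise and selects the higher-scoring voice.

```python
# Illustrative sketch only. All band levels, weights and the 30 dB
# audibility range are hypothetical assumptions, not data from the study.

BANDS_HZ = [250, 500, 1000, 2000, 4000]  # octave-band centre frequencies


def audibility_score(speech_db, noise_db, weights=None):
    """Weighted mean of per-band audibility, each clipped to [0, 1].

    Assumption: a band is fully audible when speech exceeds noise by
    15 dB or more, and fully masked when it is 15 dB or more below it.
    """
    if weights is None:
        weights = [1.0] * len(speech_db)  # equal band importance (assumed)
    total = 0.0
    for s, n, w in zip(speech_db, noise_db, weights):
        snr = s - n
        audible = min(max((snr + 15.0) / 30.0, 0.0), 1.0)
        total += w * audible
    return total / sum(weights)


def pick_voice(voices, noise_db, weights=None):
    """Return (best_voice_name, scores) for a dict of candidate voices."""
    scores = {
        name: audibility_score(levels, noise_db, weights)
        for name, levels in voices.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical band levels (dB) for two candidate voices.
voices = {
    "lower-pitched": [68, 66, 60, 50, 42],
    "higher-pitched": [58, 64, 64, 58, 50],
}

# Two hypothetical noise environments.
noise_low_freq = [75, 70, 60, 50, 45]   # energy concentrated at low frequencies
noise_high_freq = [50, 55, 62, 70, 72]  # energy concentrated at high frequencies

for label, noise in [("low-frequency noise", noise_low_freq),
                     ("high-frequency noise", noise_high_freq)]:
    best, scores = pick_voice(voices, noise)
    print(label, "->", best, scores)
```

With these assumed values the selected voice changes with the noise environment, which is the case-by-case behaviour argued for above. In practice the band levels would come from measurements of the actual noise and of the recorded or processed voices, and the resulting intelligibility would still need to be checked with listeners.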
If the noise spectrum is even, or moderate, then we suggest that there is some advantage in selecting a female voice. If adequate acoustic control is placed over the stimuli (such as ensuring that loudness levels are equal), there is little evidence to show that either female or male voices work better in such systems. Our data show, however, that in the perception of urgency the female voice has a distinct advantage in that it can cover a larger range of urgencies, seemingly a result of the higher pitch and the greater available pitch range of the female voice. This might be of practical importance for a number of reasons. First of all, in any speech warning system it is likely that different situations will have different priorities. If the voice can convey these different priorities through differences in urgency (either through a fixed or adaptive warning system) then this will be a design advantage. A set of warnings which is matched in urgency to the priority of the situations they are signalling produced better performance than a set of warnings inappropriately matched to the same set of tasks (Edworthy et al., 2000). Urgency mapping appears to be an advantage also for speech warnings.
Another more obviously acoustic reason for favouring a female voice over a male is that, if some emotional connotation is lost through artificial boosting and equalising of signals (Sorkin and Kantowitz, 1987), then female voices might retain more emotion (urgency specifically here) if they are initially imbued with a greater range of urgency. Also, if we assume that pitch is the primary factor in determining urgency, and that it is probably easier to enhance the level of a speech message than it is to alter its pitch, this too favours use of female voices. Male voices are likely to be louder than female voices, but female voices can be increased in loudness. It may not be as easy to increase the pitch or pitch range of a male voice.
As far as task performance is concerned, there is probably more to be gained by focusing on the nature of the messages conveyed rather than on the male versus female question. Simple manipulations such as the use of appropriate signal words can convey appropriate levels of importance to the listener, and expressing risks in particular ways, such as increasing explicitness (Laughery et al, 1993) and using personal pronouns (Edworthy et al., in press, b), can directly influence compliance with warnings and messages. These kinds of enhancements need to be implemented at an early stage of the design process, and are likely to produce significant improvements in responses to speech warning systems.
| References|| |
|1.||Baker, M.A. and Holding, D.H. (1993) The effects of noise and speech on cognitive task performance. Journal of General Psychology, 120, 339-55. |
|2.||Baker, S.R. and Stephenson, D. (2000) Uncertainty of outcomes as a component of active coping: Influence of predictability and feedback on heart rate reactivity and task performance. Journal of Psychophysiology, 14, 241-51. |
|3.||Ballard, J.C. (1996) Computerized assessment of sustained attention: Interactive effects of task demand, noise, and anxiety. Journal of Clinical and Experimental Neuropsychology, 18, 864-82. |
|4.||Barzegar, R. S. and Wogalter, M.S. (2000). Intended carefulness ratings for voiced warning statements. Proceedings of the IEA 2000/HFES 2000 Congress, 3, 686-9. Human Factors and Ergonomics Society: Santa Monica, CA. |
|5.||Barzegar, R and Wogalter, M S (1998a) Effects of auditorily-presented warning signal words on intended carefulness. In M A Hanson (Ed) Contemporary Ergonomics 1998, 311-5. London: Taylor and Francis. |
|6.||Barzegar, R. S., and Wogalter, M. S. (1998b). Effects of Auditorily-presented Warning Signal Words on Intended Carefulness. Proceedings of Human Factors and Ergonomics Society, 42, 1068-1072. |
|7.||Braun, C.C. and Silver, N.S. (1995) Interaction of signal word and colour on warning labels: Differences in perceived hazard and behavioural compliance. Ergonomics, 38, 2207-20. |
|8.||Burt, J. L., Bartolome, D. S., Burdette, D. W., and Comstock jr, J. R. (1995). A Psychophysical Evaluation of the Perceived Urgency of Auditory Warning Signals. Ergonomics, 38 (11), 2327-2340. |
|9.||Carter, N.L. (1996) Transportation noise, sleep, and possible after-effects. Environment International, 22, 105-16. |
|10.||Cao, Y C, Sridharan, S and Moody, M (1996) Simulation of cocktail party effect with neural network controlled iterative wiener filter. IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, E79A, 944-6. |
|11.||Edworthy, J, Hellier, E J, Walters, K, Clift-Matthews, W, and Crowther, M (in press, a) Acoustic, semantic and phonetic influences in spoken warnings signal words. To appear in Applied Cognitive Psychology |
|12.||Edworthy, J., Hellier, E., Lambell, N., Grey, C., Aldrich, K. and Lee, A. (in press, b). Linguistic and location effects in compliance to pesticide warning labels. To appear in Human Factors. |
|13.||Edworthy, J, Hellier, E., Walters, K, and Weedon, B. (2000) Comparing Speech and Non Speech Warnings. Proceedings of the IEA 2000/HFES 2000 Congress, 3, 746-9. Human Factors and Ergonomics Society: Santa Monica, CA. |
|14.||Edworthy, J., Loxley, S., and Dennis, I. (1991). Improving Auditory Warning Design: Relationship between Warning Sound Parameters and Perceived Urgency. Human Factors, 33 (2), 205-231. |
|15.||Gelfer, M P and Young, S R (1997) Comparison of intensity measures and their stability in male and female speakers. Journal of Voice, 11(2), 178-86. |
|16.||Guillaume, A, Drake, C, and Pellieux, L (2002) Perception of urgency and auditory warning signals. Paper presented to 'Design Sonore' (Societe francaise d'acoustique), Paris, France, March 2002. |
|17.||Haas, E. C. and Edworthy, J. (1996). Designing urgency into auditory warnings using pitch, speed and loudness. Computing and Control Engineering Journal, 7(4), 193-198. |
|18.||Haas, E. C., and Casali, J. G. (1995). Perceived Urgency of and Response time to Multi-tone and Frequency-modulated Warning Signals in Broadband Noise. Ergonomics, 38 (11), 2313-2326. |
|19.||Hanson, E.K.S., Schellekens, J.M.H., Veldman, J.B.P., and Mulder, L.J.M. (1993) Psychomotor and cardiovascular consequences of mental effort and noise. Human Movement Science, 12, 607-26. |
|20.||Hellier, E., Edworthy, J Weedon, B., Walters, K. and Adams, A. (2002). The Perceived Urgency of Speech Warnings 1: Semantics vs Acoustics. Human Factors, 44(1), 1-17. |
|21.||Hellier, E. J., Wright, D. B., Edworthy, J. and Newstead, S. (2000a). On the stability of the arousal strength of warning signal words. Applied Cognitive Psychology, 14, 577-592. |
|22.||Hellier, E J, Weedon, B, Edworthy, J and Walters, K (2000b) Using psychophysics to design speech warnings. Proceedings of the IEA 2000/HFES 2000 Congress, 3, 698-701. San Diego, California: Human Factors and Ergonomics Society. |
|23.||Hellier, E.J., Edworthy, J. and Dennis, I. (1993). Improving auditory warning design: Quantifying and predicting the effects of different warning parameters on perceived urgency. Human Factors, 35,(4), 693-706. |
|24.||Hollander, T.D. and Wogalter, M.S. (2000). Connoted hazard of voiced warning signal words: an examination of auditory components. Proceedings of the IEA 2000/HFES 2000 Congress, 3, 702-5. Human Factors and Ergonomics Society: Santa Monica, CA |
|25.||Jones, D M and Broadbent, D E (1987) Noise. In G Salvendy (Ed.) Handbook of Human Factors. New York: John Wiley and Sons. |
|26.||Kurisu, K and Fukuyama, K (1996) Controlling public address systems based on fuzzy inferences and neural network. Neurocomputing, 13(2-4), 231-45. |
|27.||Laughery, K R, Vaubel, K P, Young, S L, Brelsford, J W and Rowe, A L (1993) Explicitness of consequence information in warnings. Safety Science, 16(5/6), 597-614. |
|28.||Leonard, S.D., Hill, G.W. IV, and Karnes E.W. (1989) Risk perception and use of warnings. In Proceedings of the 33rd Annual Meeting of the Human Factors Society, 550-4. Santa Monica: Human Factors Society. |
|29.||McMinn, M.R., Brooks, S.D., Triplett, M.A., Hoffman, W.E. and Huizinga, P.G. (1993) The effects of God language on perceived attributes of God. Journal of Psychology and Theology, 21, 309-21 |
|30.||Mendoza, E, Valencia, N and Munoz, J (1996) Differences in voice quality between men and women: Use of the long-term average spectrum (LTAS). Journal of Voice, 10(1), 59-66. |
|31.||Momtahan, K L (1990) 'Mapping of psychoacoustic parameters to the perceived urgency of auditory signals'. Unpublished master's thesis, Carleton University, Ottawa, Ontario, Canada. |
|32.||Mullenix, J W, Johnson, K A, Topcu-Durgun, M, and Farnsworth, L M (1995) The perceptual representation of voice gender. Journal of the Acoustical Society of America, 98(6), 3080-95. |
|33.||Murray, I.R. and Arnott, J.L. (1995). Implementation and testing of a system for producing emotion-by-rule in synthetic speech. Speech Communication, 16, 369-90. |
|34.||Nixon, C, Anderson, T, Morris, L, McCavitt, A, McKinley, R, Yeager, D and McDaniel, M (1998) Female voice communications in high level aircraft cockpit noises - Part II: Vocoder and automatic speech recognition systems. Aviation, Space and Environmental Medicine, 69(11), 1087-94. |
|35.||Nixon, C, Anderson, T, Morris, L, McCavitt, A, McKinley, R, Yeager, D and McDaniel, M (1996) Female voice communications in high level aircraft cockpit noises - Part 1: Spectra, levels, and microphones. Speech and Hearing Research, 39(6), 1159-70. |
|36.||Patterson, R D (1976) Auditory filter shape derived with noise stimuli. Journal of the Acoustical Society of America, 59, 640-54. |
|37.||Patterson, R D (1974) Auditory filter shape. Journal of the Acoustical Society of America, 55, 802-9. |
|38.||Scherer, K. R. (1986). Vocal Affect Expression: A Review and a Model for Future Research. Psychological Bulletin, 99 (2), 143-165. |
|39.||Sorkin, R D and Kantowitz, B H (1987) Speech communication. In G Salvendy (Ed.) Handbook of Human Factors. New York: John Wiley and Sons. |
|40.||Weedon, B., Hellier, E., Edworthy, J. and Walters, K. (2000) Perceived Urgency in Speech Warnings. Proceedings of the IEA 2000/HFES 2000 Congress, 3, 690-3. Human Factors and Ergonomics Society: Santa Monica, CA. |
|41.||Weinstein, S.E., Quigley, K.S., and Mordkoff, J.T. (2002) Influence of control and physical effort on cardiovascular reactivity to a video game task. Psychophysiology, 39, 591-8. |
|42.||Whipple, T W and Mcmanamon, M K (2002) Implications of using male and female voices in commercials: An exploratory study. Journal of Advertising, 31(2), 79-91. |
|43.||Wilding, J and Cook, S (2000) Sex differences and individual consistency in voice identification. Perceptual and Motor Skills, 91(2), 535-8. |
|44.||Wogalter, M.S., Kalsher, M.J., Frederick, L. J., Magurno, A.B., and Brewster, B.M. (1998) Hazard level perceptions of warning components and configurations. International Journal of Cognitive Ergonomics, 2(1), 123-43. |
|45.||Wogalter, M. S., and Silver, N. C. (1995). Warning Signal Words: Connoted Strength and Understandability by Children, Elders, and Non-native English Speakers. Ergonomics, 38 (11), 2188-2206. |
|46.||Wogalter, M.S. and Silver, N.C. (1990) Arousal strength of signal words. Forensic Reports, 3, 407-20. |
|47.||Wu, K and Childers, D G (1991) Gender recognition from speech 1: Coarse analysis. Journal of the Acoustical Society of America, 90(4), 1828-40. |
|48.||Yoon, S and Yoo, C D (2002) Speech enhancement based on speech/noise-dominant decision. IEICE Transactions on Information and Systems, E85D(4), 744-50. |
Department of Psychology, University of Plymouth, Drake Circus, Plymouth, Devon, PL4 8AA
Source of Support: None, Conflict of Interest: None
[Figure - 1], [Figure - 2]