|Year : 2023 | Volume : 25 | Issue : 117 | Page : 104-112|
|Development of an Arabic “Command in Noise” Hearing Test to Assess Fitness for Duty
Iman Rawas1, Daniel Rowan2, Hannah Semeraro2, Stefan Bleeck2, Afaf Bamanie3
1 Institute of Sound and Vibration Research, University of Southampton, Southampton, UK; Department of Otorhinolaryngology–Head and Neck Surgery, King Abdul-Aziz University, Jeddah, Saudi Arabia
2 Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
3 Department of Otorhinolaryngology–Head and Neck Surgery, King Abdul-Aziz University, Jeddah, Saudi Arabia
|Date of Submission||28-Nov-2022|
|Date of Decision||13-Jan-2023|
|Date of Acceptance||31-Mar-2023|
|Date of Web Publication||10-May-2023|
Objective: The goal is to implement the developed speech material in a hearing test to assess auditory fitness for duty (AFFD), specifically in areas where the intelligibility of spoken commands is essential. Design: In study 1, a speech corpus with equal intelligibility was constructed using the method of constant stimuli to measure each target word’s psychometric function. Study 2 used an interleaved adaptive procedure to assess and optimize the equalized words. Study 3 used Monte Carlo simulations to determine speech test accuracy. Study sample: Study 1 (n = 24) and study 2 (n = 20) were completed by civilians with normal hearing. Study 3 ran 10,000 simulations per condition across conditions varying in slope and speech recognition threshold (SRT). Results: Studies 1 and 2 produced three 8-word wordlists. The mean ± standard deviation in dB SNR is −13.1 ± 1.2 for wordlist 1, −13.7 ± 1.6 for wordlist 2, and −13.7 ± 1.3 for wordlist 3, with word SRTs within 3.4 dB SNR. Study 3 revealed that a 6 dB SNR range is appropriate for equally intelligible speech using a closed-set adaptive technique. Conclusion: The developed speech corpus may be used in an AFFD measure. Concerning the homogeneity of speech-in-noise test material, care should be taken when generalizing and using ranges and standard deviations from multiple tests.
Keywords: Arabic language, military, psychoacoustics, speech, noise hearing tests
|How to cite this article:|
Rawas I, Rowan D, Semeraro H, Bleeck S, Bamanie A. Development of an Arabic “Command in Noise” Hearing Test to Assess Fitness for Duty. Noise Health 2023;25:104-12
| Introduction|| |
Military environments are often acoustically challenging. The auditory environment is, in most cases, noisy, and roles commonly involve speech communication in background noise. Auditory fitness for duty (AFFD) was defined by Tufts et al. as “the possession of hearing abilities for safe and effective job performance.” Military institutions must therefore ensure that recruits have sufficient hearing to be fit for their respective duties. The current method of assessment is mainly pure tone audiometry (PTA). Although PTA is a standard diagnostic test for hearing loss in quiet conditions, it is known to correlate less well with performance on more complex listening tasks, such as hearing speech in noise (SIN). Directly measuring SIN performance is arguably a more accurate predictor of AFFD, and this type of testing has been recommended for, and has begun to be implemented in, AFFD assessment standards. For Arabic institutions such as the military, a critical roadblock to using SIN testing for AFFD assessment is the lack of suitable, validated Arabic speech test material. This study therefore focuses on developing and optimizing Arabic military-relevant speech materials to be implemented in an AFFD measure for military use.
Valid speech test material must also meet the criterion of homogeneity, meaning that all components of the speech corpus are equal and consistent in audibility under different background noise levels. To ensure homogenous audibility of a speech corpus, the intelligibility of the words is equalized to a target criterion, in most cases the mean SRT of the test material. Since the test is designed for humans, inter-individual variation must be considered, and a range within which results are considered equal in intelligibility is derived from the study sample. Studies report validated equalized results, but no standard exists for the amount of acceptable variation in homogenous speech test material.
Despite an increase in the number of SIN tests available in multiple languages, there is still no consensus on “how close is close enough” regarding the range of speech recognition thresholds (SRTs) for words within a speech corpus.
This paper therefore discusses three experiments. Experiment I developed and equalized the speech material using the method of constant stimuli (MoCS); experiment II assessed and optimized the equalized material using an interleaved adaptive procedure (IAP); and experiment III used Monte Carlo simulations (MCSs) to explore the acceptable amount of variation in a homogenous speech set.
| Materials and methods|| |
Development of speech material
The first step was to develop a set of target words with characteristics suitable for implementation in an AFFD test, the main characteristic being similar intelligibility of the words in speech-shaped noise. This entailed material selection, recording, and preparation of the test format. A sentence test was preferred to a word test, given its better simulation of real-world communication.
The test’s general structure was adapted from the Coordinate response measure (CRM) test due to its close representation of military communication, which is required in many occupational settings, especially military and emergency services. All the resulting sentences were syntactically but not necessarily semantically correct. For example, the call signs in the Saudi Arabian military are in Arabic letters or names. The test format chosen was "From (letter) to (codename) go (directive command) now." The words “from,” “to,” “go,” and “now” in the test sentence format are carrier phrases preceding the scored target words. The letters of the Arabic alphabet are divided into CV letters such as “taa” and CVC letters such as “meem.” CV letters were not included as they are phonemic in nature, and the unnecessary difficulty of phonemes does not serve the purpose of the test. Initially, nine CVC letters were selected from the Arabic alphabet. Based on mapping letter occurrence frequency, the most frequently occurring letters were included, except for the most frequent letter of the Arabic alphabet (alif), as it is disyllabic. Communication in the military takes the form of instructions vocalized with urgency and command. As such, the recordings were required to have the same quality. The words used in study 1 are listed in [Table 1]. Although phonetic balance is not crucial to valid speech material, it is favorable. The phonetically balanced speech material contains all speech sounds encountered in conversation of the test’s language. In this case, the emphasis was on the speech sounds in the conversational language used in the military. To assure the phonetic coverage of the wordlists, phrases and words used by the Royal Saudi Air Defence Forces (RSADF) during training were compiled. Phonetic analysis was done on both the compilation and the wordlists to ensure the developed material encompassed all the phonetic sounds used in military speech.
Analysis showed the developed wordlists cover the range of phonetic sounds used commonly and critically by the RSADF.
Recordings were made using Adobe Audition, an RME Babyface soundcard, and a Rode M5 microphone in a small foam-lined sound-attenuated chamber at the University of Southampton. The recordings were then cut into individual words and adjusted to have the same root mean square (RMS) amplitude.
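The RMS adjustment described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the signals are synthetic stand-ins for recorded words, and the function names and the target level `TARGET_RMS` are assumptions introduced here.

```python
import numpy as np

def rms(signal):
    """Root mean square amplitude of a 1-D signal."""
    return float(np.sqrt(np.mean(np.square(signal))))

def normalise_rms(signal, target_rms):
    """Scale `signal` so that its RMS amplitude equals `target_rms`."""
    current = rms(signal)
    if current == 0.0:
        raise ValueError("cannot normalise a silent signal")
    return signal * (target_rms / current)

# Synthetic stand-ins for two recorded words (hypothetical, not study data).
rng = np.random.default_rng(0)
word_a = 0.30 * rng.standard_normal(16000)
word_b = 0.05 * rng.standard_normal(16000)

TARGET_RMS = 0.1  # assumed common target level
equalised = [normalise_rms(w, TARGET_RMS) for w in (word_a, word_b)]
```

After this step every word has the same RMS amplitude, which gives a common starting level before intelligibility is equalized.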
Equalization of the speech material
In this step, the test material was optimized by equalizing the intelligibility of the words based on their SRTs (experiment I). This was done by testing the words and adjusting the RMS amplitude of each to match the word-group mean. There is no agreed-upon sample size calculation technique for this context. A total of 24 participants were recruited for the study, similar to previous studies in which 20 to 50 participants were recruited. Participants were screened using an otological health questionnaire and PTA (15 dB HL at 0.25–8 kHz), as recommended by the British Society of Audiology. The MoCS procedure was used for testing. This procedure repeatedly presents the same set of predefined stimulus levels. The lowest level represents an almost undetectable stimulus, whereas the highest represents an almost always detectable stimulus. The remaining levels are spread between them so that the psychometric functions (PFs) of performance can be plotted.
The experiment, approved by the University of Southampton (ERGO ref: 45925) and King Abdul-Aziz University (KAUH Ethics ref. 754-19), consisted of one session in which closed-set word lists were presented binaurally through circumaural headphones at six selected signal-to-noise ratios (SNRs) ranging from −5 to −18 dB SNR in a background of stationary speech-spectrum noise (SSSN). The noise was generated by digitally filtering Gaussian noise to have the same long-term average speech spectrum (LTASS) as the recorded words concatenated together. Stationary noise has fewer fluctuations in level, making it better suited to this developmental phase. The resulting MoCS data were expressed as psychometric functions, explained in detail below. The data were evaluated for goodness of fit through bootstrapping, threshold similarity, and steepness of slopes.
Assessment of intelligibility of the equalized material using an IAP
Based on the results of the word assessment in experiment I, some words were removed and the required level adjustments were applied. This was done by adjusting the RMS amplitude of each word accordingly to match the mean true SRT across the three wordlists.
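As a rough illustration of the level adjustment, the sketch below computes the gain each word would need to move its SRT onto the mean. It assumes that raising a word's level by x dB lowers its SRT by x dB in fixed noise; the SRT values and function names are hypothetical and are not the study's data.

```python
def gain_db_to_match(word_srt_db, target_srt_db):
    """Level change (dB) that would shift a word's SRT onto the target.

    Assumes a 1 dB increase in word level lowers its SRT by 1 dB.
    """
    return word_srt_db - target_srt_db

def db_to_amplitude_factor(gain_db):
    """Convert a gain in dB into a linear amplitude scaling factor."""
    return 10.0 ** (gain_db / 20.0)

# Hypothetical word SRTs in dB SNR (illustrative values only).
word_srts = {"word_1": -10.9, "word_2": -12.9, "word_3": -14.9}
target = sum(word_srts.values()) / len(word_srts)

gains_db = {w: gain_db_to_match(s, target) for w, s in word_srts.items()}
factors = {w: db_to_amplitude_factor(g) for w, g in gains_db.items()}
```

In this toy case the hardest word (highest SRT) is raised by about 2 dB and the easiest is lowered by about 2 dB, so the scaling factors bracket 1.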
The MoCS is subject to floor and ceiling effects and cannot provide the exact value of each participant’s SRT on the curve, as many observations occur at regions distant from the threshold. It is also better suited to larger sample sizes. To adequately assess the equalized words and overcome the shortcomings of the MoCS, an interleaved adaptive procedure (IAP) was used. In adaptive procedures, the stimulus level in each trial depends on previous responses, which makes the test faster, eliminates floor and ceiling effects, and focuses testing on the dynamic range of the psychometric function (PF) illustrated in [Figure 1]. One shortcoming of the regular adaptive procedure is that it produces a single SRT for the word set, without information about each word individually. This experiment aimed to determine the individual SRTs of the words in each wordlist. Performing the adaptive procedure for each word in turn would enable participants to anticipate the difficulty and nature of the stimuli, which would systematically bias the procedure. To overcome this, the test was modified to present the tracks for all words in a wordlist interleaved non-sequentially. This randomizes the stimuli and counters familiarization with the test material, thus minimizing bias and reducing predictability. Each wordlist was tested separately, resulting in an SRT for each word in each wordlist.
|Figure 1 Representative points of a psychometric function, adapted from Strasburger (2001).|
The accuracy and precision of the developed material were also investigated using the interleaved adaptive procedure. Accuracy was determined from the difference between the individual and population SRT means for each word, expressed as the standard error (SE). Precision was expressed as the equivalent standard deviation.
Commonly used adaptive staircase procedures require several essential parameters to be defined, which is not always straightforward, and the effect of these choices on the result is often overlooked. The parameters are the starting value, the step size, and the termination rule.
The starting value is the level at which the test begins. The optimum starting value is close to the threshold, but in many cases the exact value of the threshold is not known in advance, and different starting values can lead to different results. Based on the results of previous experiments and the known PFs of all words, a starting value of −6 dB SNR was chosen.
The size of the steps in the staircase method affects the speed and efficiency of convergence to the final SRT. The larger the step size, the quicker the track converges on the region of the SRT, and vice versa. For this test, the chosen step sizes were 8 dB up to the first reversal, 4 dB up to the second, and 2 dB for the remaining reversals, in line with previous experiments. The larger initial step sizes make the first trials easier and allow quicker convergence on the region of the estimated threshold.
The step size rule determines when the stimulus level changes direction. A reversal occurs when the stimulus value changes direction relative to the value directly preceding it. Increasing the number of reversals increases the test’s precision and ensures sufficient data collection for threshold estimation, but lengthens the test procedure. A 1 down/1 up staircase method, described below, was used in this study to simplify the test. The stopping rule determines the terminating point of the test procedure.
Standard practice in psychophysical testing is to discard the first few reversals, as these range-finding reversals typically show higher variability. To balance sufficient data collection against a reasonable procedure length, six reversals were chosen for each test repeat; the first two were discarded, leaving four scored reversals. Each wordlist was repeated twice, and repeats were averaged, resulting in eight scored reversals across repeats. This number was chosen based on similar tests. All experiments were performed in one session. The University of Southampton and King Abdul-Aziz University approved this experimental method (ERGO ref: 53345). The procedure was set up to calculate the mean of scored reversals (MSR) for each wordlist and to terminate after the required number of scored reversals. The MSR is a simple method of estimating the SRT by averaging the peaks and valleys of all runs. The session consisted of six blocks: each word list was repeated twice, one list per block. Each block required approximately 10 to 15 minutes to complete. All experiments were designed to test whether the mean word SRT is constant across repeats and words.
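The staircase parameters above can be illustrated with a single (non-interleaved) 1 down/1 up track. This is a simplified sketch rather than the authors' test software: the function names are hypothetical, and the listener is replaced by an idealised deterministic responder so that the trace is easy to follow by hand.

```python
def run_staircase(respond, start_snr=-6.0, step_sizes=(8.0, 4.0, 2.0),
                  n_reversals=6, n_discard=2):
    """Run one 1 down/1 up track and return (MSR, list of reversal levels).

    `respond(snr)` returns True for a correct response at that SNR.
    The step is 8 dB until the first reversal, 4 dB until the second,
    then 2 dB; the first `n_discard` reversals are not scored.
    """
    snr = start_snr
    reversals = []
    last_direction = 0  # -1 = level went down (harder), +1 = up (easier)
    while len(reversals) < n_reversals:
        direction = -1 if respond(snr) else +1
        if last_direction != 0 and direction != last_direction:
            reversals.append(snr)  # direction changed at this level
        last_direction = direction
        step = step_sizes[min(len(reversals), len(step_sizes) - 1)]
        snr += direction * step
    scored = reversals[n_discard:]
    return sum(scored) / len(scored), reversals

# Idealised listener (assumption): always correct at or above -13 dB SNR.
msr, reversals = run_staircase(lambda snr: snr >= -13.0)
print(msr, reversals)
```

With this idealised listener the track reverses at −14, −10, −14, −12, −14, and −12 dB SNR, and the four scored reversals average to −13 dB SNR. In an interleaved version, one such track per word would be advanced in random order within a block.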
Exploring the acceptable amount of variation in the SRTs of a homogenous speech set
To investigate the acceptable amount of variation in the homogenous speech test material and to evaluate the repeatability of the test in terms of average SRT, MCSs were conducted by manipulating the parameters of slope and the ranges across which true SRTs are distributed to generate conditions ranging from very low to high amounts of potential variation. A MCS is a computerized, mathematical, and analytical technique that generates random data from a population, fits a predefined mathematical model to it, and obtains the necessary information from the model, repeating this process multiple times to determine the properties and most likely outcomes of the test. MCSs are frequently used in psychophysical research to assess the accuracy, efficiency, and overall performance of test procedures, and to evaluate the goodness of fit. In this study, the model generates the data from the PF, predetermined by the slope and location range, which vary across simulations, while all other parameters are fixed. The following parameters were considered:
The slope of a PF [Figure 1] represents the rate of change in performance in response to a change in stimulus level. Steeper slopes yield higher sensitivity to smaller changes in stimulus level. Slope values of 0.25, 0.5, and 1 were explored to represent shallow, medium, and steep slopes, respectively. These correspond to slopes of 6.25%, 12.5%, and 25%/dB SNR, respectively.
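The stated correspondence between slope values and %/dB is what a logistic PF would give, since its gradient at the midpoint is one quarter of the slope parameter. The sketch below assumes a logistic form without a guess rate; the source does not state the exact PF equation, so the function names and form are illustrative.

```python
import math

def logistic_pf(snr_db, location_db, slope):
    """Proportion correct for an assumed logistic PF (no guess rate)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - location_db)))

def max_gradient_percent_per_db(slope):
    """Gradient at the midpoint of the logistic PF, in % correct per dB."""
    return 100.0 * slope / 4.0

for slope in (0.25, 0.5, 1.0):
    print(slope, max_gradient_percent_per_db(slope))
```

Under this assumption the three slope parameters map to 6.25, 12.5, and 25 %/dB, matching the values quoted above.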
The location (“true SRT”) is an independent variable representing the steepest point of the PF. Its value does not affect the simulation in isolation but is significant relative to the choice of starting value.
The location range is the range within which the words’ locations are spread. For example, if the range is 3 dB SNR, the word locations are distributed within a maximum of 3 dB of each other.
Step size rule and SRT
The achieved SRT percent correct is estimated according to the step size rule. For example, a step size rule of 1 down/1 up corresponds to an SRT of 50%. This means the probability of a step up and a step down are equal.
The simulations were grouped into:
- Eighteen hypothetical conditions, not applicable to real-world testing, to assess the effect of SRT variation on homogeneity.
- Sixteen real-world conditions, utilizing given speech material development parameters to assess the performance of the adaptive procedure.
For each condition, 10 000 runs were performed using a MATLAB function. Considering the parameters that affect test performance variation to isolate the effects of location range and slope as much as possible, the following parameters were chosen for the simulation of the hypothetical conditions (experiment IIIa):
Three slope values were used: 0.25, 0.5, and 1. Each slope was tested at SRT ranges of 0, 3, 5, 7, 10, and 20 dB SNR, with locations spread evenly. The starting value was −6 dB SNR. Fifty reversals were required to complete the test, with eight scored reversals. A 1 down/1 up adaptive procedure was run with step sizes of 4, 2, and 1 dB and a guess rate of 0.0014, as three words had to be identified correctly for a correct response.
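The guess rate of 0.0014 is consistent with three independent guesses among nine options each. The nine-option figure is an assumption taken from the initial nine-item lists described earlier; the helper name is hypothetical.

```python
def chance_all_correct(n_options, n_words):
    """Probability of guessing every word correctly by chance alone."""
    return (1.0 / n_options) ** n_words

guess_rate = chance_all_correct(n_options=9, n_words=3)
print(round(guess_rate, 4))  # 0.0014
```

With eight options per word, the same calculation would give roughly 0.002 instead.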
For the real-world conditions (experiment IIIb), the following parameters were specified:
Two slope values were used: 0.5 and 1. The shallow slope was not included, as the slopes of the developed material were steep. Each slope was tested at location ranges of 0, 1, 2, 3, 4, 5, 6, and 7 dB SNR, with locations spread randomly to better simulate human performance. The starting value was −8 dB SNR. Twelve reversals were required to complete the test, with eight scored reversals. A truncated adaptive procedure was run with step sizes of 4, 2, and 2 dB and a guess rate of 0.0014.
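A condition of this kind can be sketched as a small Monte Carlo simulation. The code below is an illustrative approximation, not the authors' MATLAB function: it assumes a logistic PF with a guess rate, a plain 1 down/1 up rule instead of the truncated procedure, evenly spread word locations, a hypothetical centre of −13 dB SNR, and 2,000 runs instead of 10,000 to keep it quick.

```python
import numpy as np

def p_correct(snr, location, slope, guess=0.0014):
    """Assumed logistic PF with a guess-rate floor."""
    core = 1.0 / (1.0 + np.exp(-slope * (snr - location)))
    return guess + (1.0 - guess) * core

def simulate_run(rng, locations, slope, start=-8.0, steps=(4.0, 2.0, 2.0),
                 n_reversals=12, n_scored=8):
    """One simulated 1 down/1 up track; returns the mean of scored reversals."""
    snr, last_dir, reversals = start, 0, []
    while len(reversals) < n_reversals:
        loc = rng.choice(locations)  # word drawn at random on each trial
        correct = rng.random() < p_correct(snr, loc, slope)
        direction = -1 if correct else 1
        if last_dir and direction != last_dir:
            reversals.append(snr)
        last_dir = direction
        snr += direction * steps[min(len(reversals), len(steps) - 1)]
    return float(np.mean(reversals[-n_scored:]))

def simulate_condition(slope, location_range, centre=-13.0, n_words=8,
                       n_runs=2000, seed=0):
    """Mean and SD of simulated SRTs for one slope/location-range condition."""
    rng = np.random.default_rng(seed)
    half = location_range / 2.0
    locations = centre + np.linspace(-half, half, n_words)
    srts = np.array([simulate_run(rng, locations, slope) for _ in range(n_runs)])
    return float(srts.mean()), float(srts.std(ddof=1))

mean_srt, sd_srt = simulate_condition(slope=1.0, location_range=3.0)
```

Repeating `simulate_condition` over a grid of slopes and location ranges gives the kind of mean and SD comparison reported in the results, although the exact numbers depend on the assumptions listed above.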
| Results|| |
Experiment I (pre-equalization and equalization)
Data were gathered from 24 normal-hearing participants. List 1, “letters,” exhibited the highest variation in SRT. The words had a mean location of −12.5 dB SNR ranging across ±3.7 dB SNR. List 2, “codenames,” ranged within ±2.5 dB SNR with a mean location of −12.6 dB SNR. List 3, “directions,” ranged within ±4.2 dB SNR, with a mean of −12.9 dB SNR.
The slopes for list 1 were all steep except for two words. List 2 exhibited steep slopes except for one (8%/dB). The slopes for list 3 were all steep. All words exhibited a perfect fit except for two from the directions list (pDev = 0.8), a value still considered acceptable. Standard errors for locations of all the lists were all <2 dB SNR except for one word in each list. Standard errors for the slopes were also all <5 except for one word in list 3. Equalization was done based on the mean SRT across lists and equalized PFs were obtained [Figure 2].
Experiment II (post-equalization with an IAP)
Data were collected from 20 normal hearing subjects. Speech recognition thresholds were obtained by calculating MSRs. The mean SRT of the words was averaged across participants to find the mean SRT for each word and compare the words’ mean SRTs in terms of variability for each wordlist. Results are reported in mean SRT ± SD. Words ranged within ±1.8 for list 3 to ±2.7 dB SNR for lists 2 and 3. Wordlist 1 had an SRT of −13.3 ± 1.6 dB SNR. Wordlist 2 had an SRT of −3.2 ± 2.4 dB with a notably large SD for “Saham” = 6.5 dB SNR, as seen in [Figure 3], and the largest SE across wordlists SE = 1.3. Wordlist 3 had an SRT of −13.1 ± 1.6 dB SNR with a notably large SD for “khutwa” = 4.2 dB SNR and SE = 0.9.
A two-way repeated measures ANOVA was run to determine the effect of different words over time on test performance (measured by SRT). This was done to ensure the absence of an interaction between variation in word SRTs and repeats of the test. Each wordlist was analyzed separately. It was hypothesized that: all sample mean SRTs are equal across words averaged across test repeats; all sample mean SRTs are equal across repeats for all words; and variation in mean SRT across words is independent of test repeats, meaning that existence of variation in SRTs between words in a wordlist is not due to the training effect of repeating the test. A significant effect was found for the words and the repeats on test performance in all lists. No significant effect of repeats on words was found as illustrated in [Table 3].
Experiment III (Monte Carlo simulations)
A linear mixed model (LMM) was used to estimate the location-to-location variability. The model was fit to the data for each experiment. The model contains a random effect for location as well as an overall mean. The model assumes the following form for $y_{ij}$, the $i$th row of the dataset, in location $j$:
$$y_{ij} = \mu + b_j + \varepsilon_{ij}$$
where $y_{ij}$ is the observation, $\mu$ is the intercept (grand mean), $b_j \sim N(0, \sigma_b^2)$ is the effect of location $j$, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the random error. $N$ denotes the normal distribution.
From the model, the intra-class correlation coefficient (ICC) was calculated. Intra-class correlation coefficients characterize the relationship among variables of the same class and are widely used in reliability analyses. A one-way random effect, absolute agreement, and multiple ICC measurements were chosen for this experiment. The ICC is the proportion of the variability in $y$ that is due to location; this is
$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$$
where $\sigma_b^2$ and $\sigma^2$ are the variance components estimated using the restricted maximum likelihood (REML) method. Homogeneity can be assumed across locations if the proportion of location-to-location variability is small [Figure 4] [Figure 5].
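The ICC can be illustrated with a small calculation. The sketch below does not use REML; it swaps in the one-way ANOVA (method-of-moments) estimator of the variance components, which is a simpler stand-in for balanced data. The function name and the toy data are hypothetical.

```python
import numpy as np

def icc_one_way(groups):
    """ICC = var_location / (var_location + var_error) for balanced data.

    `groups` has shape (k locations, n observations per location).
    Variance components come from one-way ANOVA mean squares, not REML.
    """
    data = np.asarray(groups, dtype=float)
    k, n = data.shape
    group_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_between = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)
    ms_within = np.sum((data - group_means[:, None]) ** 2) / (k * (n - 1))
    var_location = max((ms_between - ms_within) / n, 0.0)  # clamp at zero
    var_error = ms_within
    return float(var_location / (var_location + var_error))

# Toy data: three locations whose means differ widely, so the ICC is high.
spread = icc_one_way([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Toy data: identical locations, so no variance is attributed to location.
same = icc_one_way([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

A value near 0 means little of the variation is attributable to location, which is the situation interpreted as homogeneous in this study.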
|Figure 4 Boxplots depicting the mean SRTs for 10 000 runs across six location ranges in three conditions from left to right: poor slopes (0.25), medium slopes (0.5), and steep slopes (1), in experiment IIIa.|
|Figure 5 Boxplots depicting the mean SRTs for 10 000 runs across eight location ranges in two conditions from left to right: medium slopes (0.5) and steep slopes (1), in experiment IIIb.|
| Discussion|| |
This study aimed to develop and optimize speech material for use in an AFFD measure and assess the acceptable variation range in material considered homogenous.
Experiments I and II
The first aim of developing and optimizing an Arabic speech test relevant to fitness in duty was achieved. The test material was selected and tested to assess word intelligibility in noise. Based on the resulting PFs from experiment I, words were RMS equalized to the mean across the three wordlists and retested in experiment II.
The first study’s results showed a significant difference in test performance between words. This indicated that not all of the words were homogenous, meaning that the words differed in their recognition thresholds at the same SNR. This was due to one word in list 2 and another in list 3, which exhibited higher ranges and SDs than the other words. Statistical analysis showed that the word “djeem” in list 1 was significantly easier than the rest of the words. The improvement in performance in the second repeat, ranging from 0.7 to 1.1 dB SNR across lists, is most likely due to the training effect. This effect is documented in most SIN tests and may reach a decrease of up to 2 dB SNR between the first and second measurements. The observed effect of training is comparable to previous studies.
The problematic words in each list were removed for the rest of the study, bringing the number of options in each word list down to eight, as seen in [Table 2]. This is still an acceptable number of options in a closed-set test and has been used previously in similar SIN tests. The 8-word lists were piloted on 15 participants using the same method as experiment II. The word lists are ready for implementation into a speech-in-noise sentence test [Table 4]. Standard deviations and the range of spread of word SRTs are now comparable to previous studies with similar test materials and procedures.
|Table 4 Means, SDs, and SRT ranges for 15 participants on the final 8-word lists|
The purpose of the simulations was to explore acceptable variation in homogenous speech material. This was done by evaluating the means, SDs, and ICCs of simulated 1 down/1 up adaptive procedures across varying slopes and word-location ranges. Extreme location ranges and poor slopes were included, despite their known insensitivity, to emphasize that looking at SDs or slopes alone could be misleading. Standard deviations did not exceed 1.7 dB SNR, even for the poorest condition, a value comparable to SDs reported in the literature on validated equalized speech material of commonly used and acknowledged tests such as the multilingual matrix test, the digit triplet test (DTT) (Smits et al.), and the hearing in noise test (HINT). The means were found to be within a range of 1 dB SNR across all slopes up to a location range of 7 dB SNR in both experiments IIIa and IIIb, and within 2 dB SNR up to a location range of 10 dB SNR in experiment IIIa. The best predictions were obtained from the ICCs. These were calculated to evaluate the proportion of variation in the model attributed to location variation, as opposed to the proportion explained by random effects. To meet the homogeneity requirements, the proportion attributable to location variation needed to be low; hence, ICC values below 0.2 were targeted in the settings with steep slopes. This value was found at a location range of 5 dB SNR in experiment IIIa and at a location range of 6 dB SNR in experiment IIIb. Other studies with different aims considered the effect of the parameters explored in this experiment (type of adaptive procedure, slope, location, number of sentences, and scored words) and found them to interact, with minor changes in procedure impacting performance.
| Limitations and future work|| |
The developed speech material was tested only in the background of SSSN. Reassessment of test material is required if it is to be presented in different types of noise. The MCSs were limited by the mentioned test conditions in this study. Conditions were not assessed with different adaptive procedures, additional variables, or word options. Further research with a broader scope of variation is recommended.
| Conclusion|| |
This study detailed the development and validation of a speech test for use as an AFFD assessment tool. The test is named the Arabic commands in noise test (ACINT) and is suitable to be assessed for validity as a measure of AFFD in a military population.
Monte Carlo simulations to assess the acceptable amount of variation in homogenous speech material for a closed set adaptive procedure with three variables and eight options for each variable showed an acceptable range of 6 dB SNR in homogenous speech material. Given that many papers reporting speech tests in the past have not reported the observed variation and given the degree of variation that can be assumed between words, this is a reassuringly extensive range to validate these studies post hoc.
Data Availability Statement
Datasets analyzed during the current study are available within the manuscript.
This research was approved by the University of Southampton (ERGO ref: 45925) and King Abdul-Aziz University (KAUH Ethics ref. 754-19). Informed consent was obtained before the questionnaire was distributed to participants.
Designing, executing, analyzing, and writing this article were the responsibilities of all contributors.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Nindl BC, Billing DC, Drain JR et al. Perspectives on resilience for military readiness and preparedness: report of an international military physiology roundtable. J Sci Med Sport 2018;21:1116-24.
Clasing JE, Casali JG. Warfighter auditory situation awareness: effects of augmented hearing protection/enhancement devices and TCAPS for military ground combat applications. Int J Audiol 2014;53:S43-S52.
Nakashima A, Abel SM, Smith I. Communication in military environments: influence of noise, hearing protection and language proficiency. Appl Acoust 2018;131:38-44.
Tufts JB, Vasil KA, Briggs S. Auditory fitness for duty: a review. J Am Acad Audiol 2009;20:539-57.
Holmes E, Griffiths TD. ‘Normal’ hearing thresholds and fundamental auditory grouping processes predict difficulties with speech-in-noise perception. Sci Rep 2019;9:1-11.
Smoorenburg GF. Speech reception in quiet and in noisy conditions by individuals with noise‐induced hearing loss in relation to their tone audiogram. J Acoust Soc Am 1992;91:421-37.
Vermiglio AJ, Soli SD, Freed DJ, Fisher LM. The relationship between high-frequency pure-tone hearing loss, hearing in noise test (HINT) thresholds, and the articulation index. J Am Acad Audiol 2012;23:779-88.
Brammer AJ, Laroche C. Noise and communication: a three-year update. Noise Health 2012; 14:281. [Full text]
Giguère C, Laroche C, Vaillancourt V, Soli SD. Development of hearing standards for Ontario’s constable selection system. Int J Audiol 2019;58:798-804.
Vaillancourt V, Laroche C, Giguere C, Beaulieu M-A, Legault J-P. Evaluation of auditory functions for Royal Canadian Mounted Police officers. J Am Acad Audiol 2011;22:313-31.
Kollmeier B, Warzybok A, Hochmuth S, Zokoll M. The multilingual matrix test: principles, applications and comparison across languages-a review. Int J Audiol 2015; 54(suppl):3-16.
Nielsen JB, Dau T. Development of a Danish speech intelligibility test. Int J Audiol 2009;48:729-41.
Smits C, Kapteyn TS, Houtgast T. Development and validation of an automatic speech-in-noise screening test by telephone. Int J Audiol 2004;43:15-28.
Soli SD, Wong LL. Assessment of speech intelligibility in noise with the Hearing in Noise Test. Int J Audiol 2008;47:356-61.
Semeraro HD, Rowan D, Van Besouw RM, Allsopp AA. Development and evaluation of the British English coordinate response measure speech-in-noise test as an occupational hearing assessment tool. Int J Audiol 2017;56:749-58.
Brierley C, Sawalha M, Heselwood B, Atwell E. A verified Arabic-IPA mapping for Arabic transcription technology, informed by Quranic recitation, traditional Arabic Linguistics, and modern phonetics. J Semitic Stud 2016;61:157-86.
Nissen SL, Harris RW, Channell RW, Conklin B, Kim M, Wong L. The development of psychometrically equivalent Cantonese speech audiometry materials. Int J Audiol 2011;50:191-201.
Vaez N, Desgualdo-Pereira L, Paglialonga A. Development of a test of suprathreshold acuity in noise in Brazilian Portuguese: a new method for hearing screening and surveillance. BioMed Res Int 2014;2014:652838.
Houben R, Koopman J, Luts H et al. Development of a Dutch matrix sentence test to assess speech intelligibility in noise. Int J Audiol 2014;53:760-3.
Ozimek E, Kutzner D, Sęk A, Wicher A. Polish sentence tests for measuring the intelligibility of speech in interfering noise. Int J Audiol 2009;48:433-43.
Brungart DS, Sheffield BM, Kubli LR. Development of a test battery for evaluating speech perception in complex listening environments. J Acoust Soc Am 2014;136:777-90.
BSA. Recommended Procedure Pure-tone Air-Conduction and Bone-Conduction Threshold Audiometry with and without Masking; 2018.
Gescheider GA. Psychophysics: The Fundamentals. 3rd ed. New York: Psychology Press 1997.
Leek MR. Adaptive procedures in psychophysical research. Percept Psychophys 2001;63:1279-92.
Hagerman B. Attempts to develop an efficient speech test in fully modulated noise. Scand Audiol 1997;26:93-8.
Killion MC, Niquette PA, Gudmundsen GI, Revit LJ, Banerjee S. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2004;116:2395-405.
Treutwein B. Adaptive psychophysical procedures. Vision Res 1995;35:2503-22.
Brand T, Kollmeier B. Efficient adaptive procedures for threshold and concurrent slope estimates for psychophysics and speech intelligibility tests. J Acoust Soc Am 2002;111:2801-10.
Semeraro H. Developing a Measure of Auditory Fitness for Duty for Military Personnel. University of Southampton; 2015.
Gelfand SA. Hearing: An Introduction to Psychological and Physiological Acoustics. 5th ed. UK; 2004.
Koo T, Li M. Cracking the code: providing insight into the fundamentals of research and evidence-based practice a guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropractic Med 2016;15:155-63.
Theodoridis GC, Schoeny ZG. Procedure learning effects in speech perception tests. Audiology 1990;29:228-39.
Willberg T, Sivonen V, Hurme S, Aarnisalo AA, Löppönen H, Dietz A. The long-term learning effect related to the repeated use of the Finnish matrix sentence test and the Finnish digit triplet test. Int J Audiol 2020;59:753-62.
Yund EW, Woods DL. Content and procedural learning in repeated sentence tests of speech perception. Ear Hear 2010;31:769-78.
Vlaming MS, MacKinnon RC, Jansen M, Moore DR. Automated screening for high-frequency hearing loss. Ear Hear 2014;35:667.
Pedersen ER, Juhl PM. Simulated critical differences for speech reception thresholds. J Speech Lang Hear Res 2017;60:238-50.
Institute of Sound and Vibration Research, University of Southampton, Southampton
Source of Support: None, Conflict of Interest: None