Manual pure tone audiometry is considered the gold standard for the assessment of hearing thresholds and has been in consistent use for a long time. Increasing legislative requirements to monitor and screen workers, and a growing body of legislation relating to hearing loss, are placing greater reliance on it as a tool. There are a number of questions regarding the accuracy of pure tone audiometry when undertaken in field conditions, particularly relating to the difference in conditions between laboratory calibration and clinical or industrial screening use. This study analyzed the output sound pressure level of four different commercial audiometers, all using TDH-39 headphones and each of which had recently undergone calibration at an appropriate laboratory. Levels were measured using a Brüel & Kjær Head and Torso Simulator, which replicates the size and shape of a human head, including the ears. A clinical environment was simulated by a trained audiometrist replacing the headphones for each test. Tests were undertaken at three presentation levels, and at frequencies of 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz and 6 kHz. The results showed a high level of test-retest variability, both between different audiometers and within the same audiometer. The maximum variation in sound pressure level at the ear for the same tone presentation was 21 dB, with a particularly high level of variation at 6 kHz for all audiometers. An audiometer with attenuating cups exhibited significantly higher variation than those using supra-aural headphones. Overall, the variation exhibited suggests that there is a higher degree of potential error in screening pure tone audiometry than is commonly assumed, and that results, particularly at 6 kHz, need to be assessed carefully alongside other methods such as speech audiometry.
Keywords: Audiometer, audiometry, calibration, hearing threshold, variance
How to cite this article:
Barlow C, Davison L, Ashmore M, Weinstein R. Amplitude variation in calibrated audiometer systems in Clinical Simulations. Noise Health 2014;16:299-305
Introduction
Manual pure tone audiometry, carried out to the appropriate standard (American Standards Association S3.6-2010 in the USA and ISO 8253-1:2010 in Europe), is described as the "standard for clinical testing" and the "gold standard" for the assessment of hearing thresholds by air conduction.
Exposure to high levels of noise has long been recognized as a health hazard, with the long-term result of noise-induced hearing loss (NIHL) in the majority of people. Several sources, including Action on Hearing Loss, the UK Health and Safety Executive and the American Speech-Language-Hearing Association, state that long-term exposure to sound pressure levels as low as 80 dBA poses some risk of damage to the hearing system for some people, with levels regularly above 85 dB LAeq posing a risk of mild hearing damage to most people. In particular, a common symptom is an audiometric "notch" between 4 and 6 kHz, in which hearing sensitivity is disproportionately reduced, though this is not always observed.
With a significant global cost impact of NIHL, and legislative requirements to screen and protect workers from occupational hearing damage, the reliance on pure tone audiometry means that it is important to continually assess the test procedure and equipment used for repeatability and accuracy. Critically, it is important to verify that audiometers are being calibrated and maintained to the same standard and hence that results are repeatable and accurate.
The fundamental methodology and equipment for audiometry have stayed very similar for a long period of time, with particular transducers (notably the Telephonics® TDH-39 and TDH-49 supra-aural headphones) remaining at the core of audiometric screening. Although there has been discussion of the reliability of different types of screening (automated, computer controlled, manual), and of the need for traceable calibration, there has been little analysis in recent years of the degree of variation in performance of calibrated audiometers in clinical situations.
Originally developed from "tuning fork" tests at the turn of the 20th century, pure tone audiometry has now been in use for over 90 years, with the first use of a "commercially available" audiometer (the Western Electric 1A) reported by Fowler and Welch in 1923.
The basic method of all pure tone audiometers uses a tone generator to present pure tones to a listener via headphones. The system (manual or computer controlled) varies the amplitude and frequency of the tones presented. The tester notes the intensities at which the listener responds and those at which the listener does not respond, in order to determine the threshold level of hearing at each frequency.
There are a number of different systems for pure tone audiometry, including computer controlled and automatic, and authors have reported variation in validity and reliability between systems. However, the international standards currently in use have standardized on a common method for manual pure tone audiometry as the benchmark for audiological evaluation, in which the test is controlled by a qualified audiometrist or audiologist.
Equipment standards, the maximum permissible background noise levels and calibration requirements are laid out in various national and international standards, which state the calibration levels to be used with standard headphone types and standard couplers. This would suggest that audiometry should be a reliable measure in clinical practice.
There are, however, a number of factors that suggest calibrated audiometers may not be as reliable as generally thought. One issue is that the specifications do not currently require accreditation in the calibration process, with the guidelines stating simply that the calibration should be performed by "a competent laboratory." This leaves the standard open to interpretation, and many audiometer manufacturers recommend that annual calibration take place in their own facilities, which may or may not be accredited. This lack of accreditation creates the potential for errors in the production of tones by audiometer systems, with no centralized standardizing authority to supervise.
A second, but potentially more important, issue is the level of uncertainty in "field" testing compared to laboratory conditions. The acoustic coupler (artificial ear) defined in IEC 60318-1:2009 and used to assess particular headphones is a regular shape, standardized to particular dimensions. The headphone is coupled to the artificial ear with a static force of 4.5 N (±0.5 N) from either a mass or a calibrated jig, rather than using the tension from the headphone band.
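As a rough illustration of what the specified static force means when a mass is used to apply it, the equivalent mass follows from m = F/g. The value g = 9.81 m/s² is an assumption made here for the arithmetic and is not taken from the standard.

```latex
m = \frac{F}{g} = \frac{4.5\,\mathrm{N}}{9.81\,\mathrm{m\,s^{-2}}} \approx 0.46\,\mathrm{kg},
\qquad
\frac{4.0\,\mathrm{N}}{9.81\,\mathrm{m\,s^{-2}}} \approx 0.41\,\mathrm{kg}
\;\le\; m \;\le\;
\frac{5.0\,\mathrm{N}}{9.81\,\mathrm{m\,s^{-2}}} \approx 0.51\,\mathrm{kg}
```

In other words, the calibration tolerance corresponds to a loading mass of roughly 0.41 to 0.51 kg, a far more tightly controlled condition than the tension of a headband on heads of different sizes.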
While this method allows for a high level of standardization in the testing of the transducer and tone generator in the system, it assumes that there is a minimal effect on the sound pressure level presented at the ear from nonstandard shapes and sizes of ears and heads, as well as variations in force of coupling.
It also does not account for the use of "attenuating cups", which are noise isolating enclosures for the TDH-39 model drive units, replacing the headband and earcup construction with a larger, semi-circumaural cup design. They are commonly used in industrial screening situations, where background noise is higher than the recommendations for a screening environment. According to IEC 60645-1, attenuating cups should be removed for calibration. 
Procedures laid out for audiometric testing aim to minimize the uncertainty caused by field conditions, by specifying guidelines for earphone placement, as well as for the background noise levels of the testing environment. However, there is still a question of how much the tone presentation from calibrated audiometers can vary at the ear between audiometers and between tests.
This study aimed to assess the level of variation between audiometer measurements under laboratory conditions, using a variety of different manual audiometers.
The study used four commercially available audiometers, which represent the cost range of typical industrial screening audiometers, from £995 for the least expensive to £4500 for the most expensive. The sample size was constrained by budget and accessibility; however, the method was designed to give a representative sample of the performance of typical audiometers. Each of the audiometers had recently undergone certified traceable calibration by its recommended laboratory, meaning that the tone presentation from each should theoretically be identical.
The test system used a calibrated Brüel & Kjær® Head and Torso Simulator (HATS) of type 4100, with a Brüel & Kjær® 4231 field calibrator used to calibrate the microphone input level for each test. The HATS is designed to represent the "average" shape of a human head and has realistic molded pinnae. It was used to represent the fitment of headphones on a human listener more closely than is possible with the ear simulator specified in IEC 60318.
Microphones were polarized using a Brüel & Kjær® 2829 microphone signal conditioning unit and Brüel & Kjær® type 2269 preamplifiers. Signals were measured and recorded using an NTi XL2™ sound level meter. The tests took place in a hemi-anechoic chamber with a noise floor of 16 dBA, with performance meeting the absolute noise criteria of ISO 3745:2012.
All of the audiometers used factory-calibrated TDH-39 headphones. One of the audiometers had attenuating cups fitted to the headphones, while the others had the standard (supra-aural) fitment.
Method
The regularity of performance of the audiometers was assessed by measuring the sound pressure level of a range of tones presented at specified thresholds by each audiometer. The absolute sound pressure level at the ear of the HATS was measured for each given tone presentation and the data analyzed for the degree of variation.
Each audiometer was measured at three presentation levels, over six frequencies. The levels were respectively 30, 50 and 80 decibels of hearing level (dB HL) and frequencies were 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, and 6 kHz.
Each test was run on the left and then the right ear of the HATS. Each tone was presented continuously at its given level. Once the level at the ear had stabilized, the equivalent continuous sound pressure level (Leq in dB, unweighted, fast response) was measured over a timed period of 5 s.
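The equivalent continuous level over a timed period is the level of the mean squared pressure relative to 20 µPa. The following is a minimal sketch of that calculation, assuming access to raw pressure samples; the function names, the 48 kHz sample rate and the synthetic 1 kHz tone are illustrative assumptions, and the sketch does not model the meter's time weighting or the actual measurement chain used in the study.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def leq_db(pressures_pa):
    """Equivalent continuous level (dB re 20 µPa) of a block of pressure samples."""
    mean_square = sum(p * p for p in pressures_pa) / len(pressures_pa)
    return 10.0 * math.log10(mean_square / (P_REF ** 2))


def tone(freq_hz, rms_pa, duration_s=5.0, sample_rate=48000):
    """Synthetic pure tone with the given RMS pressure (hypothetical test signal)."""
    n = int(duration_s * sample_rate)
    peak = rms_pa * math.sqrt(2.0)
    return [peak * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


# A 1 kHz tone at 0.02 Pa RMS corresponds to 60 dB SPL.
samples = tone(1000.0, rms_pa=0.02)
level = leq_db(samples)
print(round(level, 1))
```

Averaging the squared pressure rather than the decibel values is what distinguishes an Leq from a simple mean of instantaneous level readings.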
The full test was completed three times for each device, replacing the headphones each time and allowing time for them to settle. This allowed assessment of the potential variation through placement of headphones on different users. Positioning of headphones was undertaken by a certified audiometrist to replicate headphone positioning in a clinical setting.
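The size of the resulting data set can be sketched by enumerating the test conditions described above. The ordering of levels and frequencies within each ear is an assumption here, as are all the names used; only the counts (four audiometers, three repeats, two ears, three levels, six frequencies) come from the method.

```python
from itertools import product

AUDIOMETERS = (1, 2, 3, 4)
REPEATS = (1, 2, 3)               # headphones refitted before each repeat
EARS = ("left", "right")          # left ear tested first
LEVELS_DB_HL = (30, 50, 80)
FREQUENCIES_HZ = (250, 500, 1000, 2000, 4000, 6000)


def measurement_schedule():
    """All (audiometer, repeat, ear, level, frequency) combinations."""
    return list(product(AUDIOMETERS, REPEATS, EARS, LEVELS_DB_HL, FREQUENCIES_HZ))


schedule = measurement_schedule()
print(len(schedule))  # 432
```

Each audiometer therefore contributes 108 readings, of which three (one per refit) share any given ear, level and frequency.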
Results
[Table 1] shows the mean sound pressure level for each frequency and presentation level, and also gives the range (maximum to minimum) of the mean sound pressure levels recorded at each frequency/level combination. This range varies from 3 to 12 dB between audiometers.
Mean and standard deviation sound pressure level values for each audiometer at each tone presentation are shown in [Figure 1].
Figure 1: Mean and standard deviation sound pressure levels for each audiometer (dB)
Results of the analysis of variation within the set of audiometers are presented in [Figure 2], showing variation both across different audiometers and within individual audiometers. The full data can be found in [Table 2].
Variation within recordings from the same audiometer shows a standard deviation of around ±5 dB for the majority of tones, with considerably higher deviations for some tones, up to a maximum of around ±10 dB.
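The two measures of variation reported here can be sketched for a single frequency/level combination as follows. The readings are invented purely for illustration and are not data from this study; the function names are likewise hypothetical, and the sample standard deviation is assumed since the exact estimator is not stated.

```python
from statistics import mean, stdev

# Hypothetical dB SPL readings: three refits per audiometer, one ear,
# one frequency/level combination. NOT measured data.
readings = {
    "audiometer_1": [70.0, 72.0, 74.0],
    "audiometer_2": [65.0, 66.0, 67.0],
    "audiometer_3": [75.0, 75.0, 78.0],
    "audiometer_4": [60.0, 70.0, 68.0],
}


def within_meter_sd(values):
    """Sample standard deviation across refits of one audiometer."""
    return stdev(values)


def between_meter_range(all_readings):
    """Range (max minus min) of the per-audiometer mean levels."""
    means = [mean(v) for v in all_readings.values()]
    return max(means) - min(means)


for name, values in readings.items():
    print(name, round(mean(values), 1), round(within_meter_sd(values), 2))
print("range of means:", between_meter_range(readings))
```

Keeping the two measures separate matters: a small range of means can hide large refit-to-refit scatter within a single audiometer, and vice versa.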
Discussion
The high level of variation in measured sound pressure levels at the ear, within tests from the same audiometer, between different audiometers, and between left and right ears, suggests that there is a significant margin of error in audiometric screening. While the sample size was small, the audiometers used are typical of the manufacturers and types of audiometer used in the UK and Europe, and each was laboratory calibrated to the appropriate standards. Repeated tests on each ear showed deviations of tone presentation in a "simulated clinical" setting, which indicates that this would also be likely to occur in clinical situations. As each audiometer should theoretically present identical tones to the ear, any significant deviation is a cause for concern.
A high degree of variability was exhibited between the results from different audiometers which were nominally presenting the same frequency and level to the "listener". The range of mean levels between audiometers varied from 3 to 12 dB, as shown in the data set in [Table 3]. However, the maximum absolute variation between tone presentations was encountered at 6 kHz with a presentation level of 80 dB HL. At this presentation level, a range of 21 dB was measured between the absolute maximum and minimum values recorded in either ear across all audiometers.
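To put the 21 dB range in physical terms, a difference in sound pressure level converts to a ratio of sound pressures through the standard relation below (the symbols p_max and p_min are introduced here for illustration only):

```latex
\Delta L = 20 \log_{10}\!\left(\frac{p_{\max}}{p_{\min}}\right)
\quad\Longrightarrow\quad
\frac{p_{\max}}{p_{\min}} = 10^{\Delta L / 20} = 10^{21/20} \approx 11.2
```

That is, for a nominally identical tone presentation, the highest recorded sound pressure at the ear was roughly eleven times the lowest.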
As the systems were calibrated, and measured on a calibrated test system with identical microphones and preamplifiers in left and right ears, meeting the requirements for a Class 1 sound level meter, there should be no variation caused by actual acoustic output of the headphones or variation of the measurement system. It is therefore reasonable to assume that this increased variation is due to headphone positioning and acoustic coupling of the headphone to the auditory canal on the HATS, despite being fitted by a qualified audiometrist.
Subjectively, it was harder to position the attenuating cups accurately over the pinna due to reduced visibility of the transducer part of the headphone when fitting. This variation is also likely to occur on real subjects taking audiometric tests, as it is not possible to obtain identical fitment each time the headphones are replaced. This study identifies an issue between the "calibrated" level, in which headsets are placed on a smooth surfaced, symmetrical test system which perfectly couples the transducer to the microphone, compared to the "clinical" level, in which headphones are placed on the (asymmetrical) pinnae and head, and therefore the quality of acoustic coupling is significantly reduced, while potential for positioning error is increased.
Although the correlation between frequency and range of measured level is generally fairly weak, the 6 kHz tone in particular results in high variability across all presentation levels, with 30 dB HL and 50 dB HL also giving maximum absolute variations of 20.8 dB and 20.9 dB respectively.
Further variation is found within the results of each audiometer. Absolute variation between tests by the same audiometer at a particular presentation level in a single ear varies from 0.2 dB in the best case (well below the threshold of audibility) to 11.4 dB in the worst case, at the 6 kHz presentation.
It is suggested that these results are likely to be due to the highly directional nature of the headphones at 6 kHz, where a slight variation in positioning could result in some degree of attenuation by the structure of the ear, for instance by blockage by the tragus. Acoustic absorption of most materials increases with frequency,  and the potential for attenuation by a small degree of variation in placement is high. All audiometers performed similarly poorly at this frequency. This performance could also be linked to the headphone design itself, which is very dated, but a further study comparing different headphones will be needed to ascertain this.
Interestingly, the hearing threshold level at 6 kHz has been questioned by some authors for a number of years,  and it has been suggested that the threshold is set too high given the high proportion of patients who present a threshold shift at this frequency. The results of this study suggest that this high proportion of threshold shift could instead be linked to the variation in performance of the headphones when placed slightly differently or with insufficient tension on the patient's ears.
It is noteworthy that the audiometer which used attenuating cups had greater variation than the other audiometers in this test, with mean levels significantly different to two out of the three other audiometers (P = 0.02 and P = 0.01 respectively).
The audiometer using attenuating cups also has a significantly higher output level than the other audiometers at a number of frequencies. This is particularly pronounced at 250 Hz, 500 Hz, and 6 kHz, where the mean output levels range between 5 and 12 dB higher than the mean output level of the other audiometers (P < 0.01 in all cases).
The reason for this is unclear; however, the use of attenuating cups on this type of headphone is likely to create a calibration error. This is because the calibration methods specified by IEC 60318-1 dictate the use of a standardized coupler or artificial ear, and use of these systems requires removal of the supra-aural TDH-39 headphones from the attenuating cups. When they are replaced, the attenuating cups themselves are likely to introduce a degree of resonance, which will change the frequency response of the headphones at the ear.
A potentially important cause of variation between audiometers is the use of different headband designs and tensions. The headphones on Audiometer 4 were subjectively the loosest fitting of the standard TDH-39 types, and those on Audiometer 3 the tightest. This could be a contributing factor to the relatively large standard deviations of the results from Audiometer 4 compared to the lower variation of Audiometer 3. Audiometer 3 (the most expensive of the units in the test) was the most consistent performer, with generally low standard deviations, as can be seen from [Figure 1].
Another important consideration is that this effect is likely to be exacerbated in clinical testing due to the variability of human head size and shape. Although this study simulated clinical testing, it made use of a HATS, in which each measurement is undertaken on a stationary head of identical proportions. This implies that results could be even worse when dealing with human patients, where variation in head size and shape and in the size and shape of the pinnae will affect the accuracy of transducer placement, acoustic coupling and headband tension. Laboratory calibration is undertaken with a specific force (4.5 N ±0.5 N) applied to the transducer to ensure good coupling between the headphone and the artificial ear. Slack headband tension could reduce the coupling between the transducer and the ear, varying the sound pressure level; the British Society of Audiology identifies this as a problem, stating that headband tension has an impact on the sound levels delivered.
In this study, the HATS has a constant shape and size and remains perfectly still during testing. In a clinical environment, fitment to different patients, as well as some movement of the head during testing will cause slight variation in position of the transducer in relation to the auditory canal.
Conclusion
Results suggest that there is still considerable room to improve the performance of audiometry, and that test results from conventional screening systems need to be carefully assessed for the possibility of error or misdiagnosis. In order to improve the accuracy of audiometry, headband tension needs to be sufficiently high to ensure good coupling between the headphone and ear. This may require the use of higher headband tensions or different headsets appropriate to different sizes of head. As this is potentially an important contributing factor, further research needs to be done on the exact impact of headband tension on results. However, based on this study, it is suggested that measurements of headband tension should form part of the calibration process for audiometers with a minimum level of tension specified.
Even the median variation in sound pressure at the ear could contribute an error of 4 dB in hearing threshold values, which is sufficient to cause misdiagnosis on an audiogram. Where the degree of variation is at its highest, there is a potential error of 20 dB, which even in a single frequency band could lead to the misdiagnosis of a patient due to its contribution to the values used to categorize hearing loss. 
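The way a single-frequency error feeds into a categorization value can be sketched with a simple threshold average. The four-frequency average, the function name and the threshold values below are all assumptions for illustration; the source does not specify which averaging scheme or categories are meant, and different schemes weight frequencies differently.

```python
# Hypothetical averaging scheme: the frequencies included are an assumption.
AVERAGED_FREQUENCIES_HZ = (1000, 2000, 4000, 6000)


def pure_tone_average(thresholds_db_hl, frequencies_hz=AVERAGED_FREQUENCIES_HZ):
    """Arithmetic mean of the thresholds (dB HL) at the chosen frequencies."""
    return sum(thresholds_db_hl[f] for f in frequencies_hz) / len(frequencies_hz)


# Invented "true" thresholds in dB HL; NOT data from this study.
true_thresholds = {1000: 15.0, 2000: 15.0, 4000: 20.0, 6000: 20.0}

# The same ear with a 20 dB presentation error at 6 kHz only.
measured_thresholds = dict(true_thresholds)
measured_thresholds[6000] += 20.0

shift = pure_tone_average(measured_thresholds) - pure_tone_average(true_thresholds)
print(shift)  # 5.0
```

Under this assumed scheme, a 20 dB error in one band moves the average by 5 dB; whether that crosses a category boundary depends on the classification scheme in use and on how close the true average already sits to a boundary.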
A key finding of this study is the effect of attenuating cups on audiometry, with significantly increased variability in frequency response and high test-retest variability. It is therefore recommended that audiometrists avoid the use of attenuating cups.
These results also support the ideal proposed by McNeill et al. of moving away from the use of TDH-39 type headphones and standardizing on a more modern and uniform headphone design, with the potential for improved frequency response and better sound isolation without the need for attenuating cups.
Overall the results suggest that the current standards for audiometry calibration do not take into account the issues faced by clinical practice, and there is a need to address aspects which may cause significant variation in results, such as headband tension and transducer positioning. This further indicates that the results of pure tone screening need to be carefully considered alongside other tests such as speech testing.
References
1. American National Standards Institute. Methods for Manual Pure-tone Threshold Audiometry. New York: ANSI/ASA; 2004.
2. International Organisation for Standardization. ISO 8253-1:2010 Acoustics-Audiometric Test Methods. Part 1: Pure Tone and Bone Conduction Audiometry. Geneva: ISO; 2010.
3. Franks JR. Hearing measurement. In: Goelzer B, Hansen CH, Sehrndt GA, editors. Occupational Exposure to Noise: Evaluation, Prevention and Control. Geneva: World Health Organisation; 2001. p. 183-231.
4. Sindhusake D, Mitchell P, Smith W, Golding M, Newall P, Hartley D, et al. Validation of self-reported hearing loss. The Blue Mountains Hearing Study. Int J Epidemiol 2001;30:1371-8.
5. Roeser RJ, Clarke J. Pure tone tests. In: Roeser RJ, Valente M, Hosford-Dunn H, editors. Audiology: Diagnosis. New York: Thieme Medical; 2000. p. 238-260.
6. Action on hearing loss. Action on hearing loss; 2012. Available from: http://www.actiononhearingloss.org.uk/your-hearing/about-deafness-and-hearing-loss/types-and-cause-of-hearing-loss/noise.aspx. [Last accessed on 2013 Oct 27].
7. Butterfield D. Measurement of Noise Levels that Staff are Exposed to at Live Music Events. Norwich: HMSO; 2006.
8. American Speech-Language-Hearing Association. American Speech-Language-Hearing Association; 2013. Available from: http://www.asha.org/public/hearing/disorders/noise.htm. [Last accessed on 2013 Oct 25].
9. Borchgrevink HM. Does health promotion work in relation to noise? Noise Health 2003;5:25-30.
10. Nelson DI, Nelson RY, Concha-Barrientos M, Fingerhut M. The global burden of occupational noise-induced hearing loss. Am J Ind Med 2005;48:446-58.
11. Barlow C, Castilla-Sanchez F. Occupational noise exposure and regulatory adherence in music venues in the United Kingdom. Noise Health 2012;14:86-90.
12. Health and Safety Executive. UK Statutory Instrument 2005 no 1643. The Control of Noise at Work Regulations 2005. Norwich; 2005.
13. McNeill HA, Toor GR, Sherwood TR. Differences in the Performance of Metal- and Plastic-Cased TDH-39 and TDH-49 Audiometric Earphones, and Consequences for their Calibration. Teddington: National Physical Laboratory; 1995.
14. Vogel, McCarthy PA, Bratt GW, Brewer C. The clinical audiogram: Its history and current use. Commun Disord Rev 2007;1(2):83-94.
15. Feldmann H. A History of Audiology; a Comprehensive Report and Bibliography from the Earliest Beginnings to the Present. Translations of the Beltone Institute for Hearing Research. Vol. 22. No. 17. Chicago: The Beltone Institute; 1970.
16. ASHA. Guidelines for Manual Pure-tone Threshold Audiometry. Rockville, MD: American Speech-Language-Hearing Association; 2005.
17. International Electrotechnical Commission. IEC 60645-1:2001. Electroacoustics: Audiological Equipment. Part 1: Pure Tone Audiometers. Geneva: IEC; 2001.
18. British Society of Audiology. Recommended procedure-pure-tone air-conduction and bone conduction threshold audiometry with and without masking. Reading: BSA; 2011.
19. British Standards Institute. BS EN ISO 389-1: 2000 Acoustics - Reference zero for the calibration of audiometric equipment. Part 1: Reference equivalent thresholds for pure tones and supra aural headphones: British Standards Institute; 2000.
20. American National Standards Institute. ANSI S3.6-2004 Specifications for Audiometers. New York: ANSI/ASA; 2004.
21. International Electrotechnical Commission. IEC 60318-1:2009. Electroacoustics-Simulators of Human Head and Ear-Part 1: Ear Simulator for the Measurement of Supra-aural and Circumaural Earphones. Geneva: IEC; 2009.
22. International Organisation for Standardisation. ISO 3745:2012. Acoustics. Determination of Sound Power Levels and Sound Energy Levels of Noise Sources Using Sound Pressure. Precision Methods for Anechoic Rooms and Hemi-anechoic Rooms. Geneva: ISO; 2012.
23. Howard DM, Angus J. Acoustics and Psychoacoustics. Oxford: Focal Press; 2005.
24. Robinson DW. Threshold of hearing as a function of age and sex for the typical unscreened population. Br J Audiol 1988;22:5-20.
25. McBride D, Williams S. Characteristics of the audiometric notch as a clinical sign of noise exposure. Scand Audiol 2001;30:106-11.
Dr. Christopher Barlow
School of Technology, Southampton Solent University, East Park Terrace, Southampton SO14 0RD
Source of Support: This project was supported by the Technology Strategy Board, UK 2013. Grant reference: SKTP 1000821. Conflict of Interest: None