Published ahead of print on May 13, 2004, doi:10.1164/rccm.200307-968OC
© 2004 American Thoracic Society
Occupational Screening for Obstructive Sleep Apnea in Commercial Drivers
Center for Sleep and Respiratory Neurobiology and Division of Sleep Medicine, Department of Medicine, University of Pennsylvania Medical Center, and Pulmonary and Critical Care and Sleep Section, Philadelphia Veterans Affairs Medical Center, Philadelphia, Pennsylvania
Correspondence and requests for reprints should be addressed to Indira Gurubhagavatula, M.D., M.P.H., Center for Sleep and Respiratory Neurobiology, Hospital of the University of Pennsylvania, 9th Floor, Maloney Building, 3600 Spruce Street, Philadelphia, PA 19104-4283. E-mail: gurubhag{at}mail.med.upenn.edu
Excluding the presence of obstructive sleep apnea in commercial drivers is valuable, as the syndrome may increase their risk of sleepiness-related accidents. Using polysomnography as the criterion standard, we prospectively compared accuracies of five strategies in excluding the presence of severe sleep apnea and, secondarily, any sleep apnea among 406 commercial drivers. These strategies were as follows: (1) symptoms; (2) body mass index; (3) symptoms plus body mass index; (4) a two-stage approach with symptoms plus body mass index for everyone, followed by oximetry for a subset; and (5) oximetry for all. For excluding severe apnea, the two-stage strategy was highly successful, with 91% sensitivity and specificity, and a negative likelihood ratio of 0.10. This strategy was comparable in accuracy to oximetry, which had a negative likelihood ratio of 0.12, and was 88% sensitive and 95% specific. If we avoided oximetry altogether, then symptoms together with body mass index were 81% sensitive and 73% specific, with a negative likelihood ratio of 0.26. On the other hand, excluding any apnea could not be done with reasonable accuracy unless oximetry was used. We conclude that two-stage screening is likely to be a viable means of excluding severe sleep apnea among commercial drivers.
Key Words: nocturnal pulse oximetry; polysomnography; questionnaire
Obstructive sleep apnea (OSA) with daytime sleepiness affects 2–4% of Americans (1). Untreated OSA may lead to decreased cognitive function (2), psychomotor impairment (3), and a decrement in driving skills (4), including increased off-road deviations in driving simulators (4) and increased risk of vehicular accidents (5–7). In the occupational setting, then, OSA is a salient issue for commercial drivers. Among commercial drivers, severe apnea, defined as having an apnea–hypopnea index (AHI) of 30 events or more per hour (8) in a sleep study, may lead to marked sleepiness and impaired task performance (9). Moreover, treating severe sleep apnea with positive airway pressure (10) improves alertness (11), reduces crash risk (12), and improves performance assessed by a driving simulator (13), benefits that are not realized after administering placebo (13). However, data for treating mild to moderate sleep apnea, with an AHI between 5 and 30 events per hour (8), are less compelling (14–18). Thus, identifying and then treating severe OSA in commercial drivers are of particular interest.
Identification of sleep apnea has long relied on in-laboratory polysomnography as the diagnostic standard (19). However, polysomnography is expensive (20) and not easily accessible, and remains unsuitable for occupational screening. Other case identification strategies are readily available and avoid reliance on a specialized laboratory, but have never been evaluated in this occupational setting. We evaluated several such strategies alone and in various combinations, each with increasing complexity. The simplest strategy we explored depended on responses to questions about three apnea-related symptoms. We also chose body mass index (BMI), because obesity is a major OSA risk factor (20).
In addition, we looked at a risk score that combined information about these symptoms with BMI as well as age and sex, in a tool we developed and called the multivariable apnea prediction index (21). Another strategy we chose combined multivariable prediction with nocturnal oximetry in two stages, limiting oximetry to a subset of drivers. The final strategy used oximetry for all drivers, counting the desaturation frequency as a measure of sleep-disordered breathing. In a large cohort of commercial drivers, we compared how well these strategies could exclude individuals with severe sleep apnea. Secondarily, we determined how well they exclude any sleep apnea, defined as an AHI of five events or more per hour. We have presented results of this study in abstracts (9, 22–25), and a manuscript describing results of performance tests in this sample is in review (Pack AI, Staley B, Pack FM, Rogers WC, George CFP, Dinges DF, Maislin G. Impaired performance in commercial drivers: role of short sleep durations and sleep apnea [submitted manuscript]).
See the online supplement for additional detail regarding subject selection, diagnostic studies, and data analysis.
Subject Selection
Multivariable Prediction
Pulse Oximetry
Polysomnography and Mitigating Bias in Relationships between Predictive Variables and Apnea Status
We assessed the comparability of survey responders, nonresponders, and in-laboratory participants to evaluate our degree of success in mitigating participation bias. We compared age, sex, and ZIP code (a surrogate for ethnicity and socioeconomic status); the role of these variables in OSA has been reviewed previously (28). We could not compare body mass indices, as these data were unavailable through Pennsylvania Driver Licensing Services.
Two-stage Strategy
Determination and Comparisons of Optimal Cut Points for Single-stage Strategies and Optimal Parameter Set for Two-stage Strategy
We determined the discriminatory characteristics of symptoms, BMI, multivariable prediction, and oximetry (29) by computing the area under the curve (AUC) (30, 31) for receiver operating characteristic (ROC) curves. These were constructed by computing sensitivity and specificity at various cut points, using the ROCKIT software package (University of Chicago, Chicago, IL [32]). The optimal sensitivity and specificity, and the associated cut point, were derived by extrapolating from the ROC curve at the point where the slope equaled 1 (33). This slope was selected after assigning relative weights to false-positive and false-negative diagnoses (see the online supplement). Negative likelihood ratios were computed as (1 − sensitivity) divided by the specificity at this optimal cut point.
For the two-stage method, we used SAS (Cary, NC) programming to compute sensitivity and specificity for each of 180 combinations of upper bound, lower bound, and oxyhemoglobin desaturation index (ODI) threshold (29). To do this, for each combination of lower bound, upper bound, and ODI threshold, we compared the binary prediction of the screening strategy against the AHI value of each subject's polysomnography. We computed the total number of false-positive and false-negative predictions. Sensitivity was defined as 1 minus the false-negative rate, whereas specificity was calculated as 1 minus the false-positive rate. We plotted these values of sensitivity against 1 minus the specificity, and computed the AUC (SigmaPlot; Rockware, Golden, CO) (see the online supplement for details). We determined an optimal parameter set for the two-stage strategy, using a procedure analogous to that applied for the one-stage strategy. We found one parameter set for excluding severe apnea, and a different set of values for any apnea.
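The definitions above (sensitivity as 1 minus the false-negative rate, specificity as 1 minus the false-positive rate, and the negative likelihood ratio as (1 − sensitivity) divided by specificity) can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS or ROCKIT code used in the study; the function names and the small synthetic data set are assumptions introduced here.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """(1 - sensitivity) / specificity, as defined in the text."""
    return (1.0 - sensitivity) / specificity


def screening_metrics(predicted_positive, has_disease):
    """Return (sensitivity, specificity, negative likelihood ratio).

    Both arguments are equal-length sequences of booleans: the binary
    prediction of a screening strategy and the reference-standard status.
    """
    pairs = list(zip(predicted_positive, has_disease))
    tp = sum(1 for p, d in pairs if p and d)
    fn = sum(1 for p, d in pairs if not p and d)
    tn = sum(1 for p, d in pairs if not p and not d)
    fp = sum(1 for p, d in pairs if p and not d)
    sensitivity = tp / (tp + fn)   # 1 - false-negative rate
    specificity = tn / (tn + fp)   # 1 - false-positive rate
    return sensitivity, specificity, negative_likelihood_ratio(sensitivity, specificity)


# Synthetic illustration (hypothetical data, not study data).
predicted = [True, True, False, False, True, False]
diseased = [True, False, False, False, True, True]
print(screening_metrics(predicted, diseased))

# Rounded values reported in the abstract reproduce the reported ratios:
print(negative_likelihood_ratio(0.91, 0.91))  # approximately 0.10 (two-stage)
print(negative_likelihood_ratio(0.81, 0.73))  # approximately 0.26 (symptoms plus BMI)
```

Because the published sensitivities and specificities are rounded, ratios recomputed from them may differ slightly in the second decimal place from the values reported in the paper.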
Using bootstrap resampling (34), we computed nonparametric 95% confidence intervals around the sensitivity, specificity, negative likelihood ratio, and AUC (see the online supplement).
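As a rough illustration of bootstrap resampling for a nonparametric confidence interval, the sketch below resamples subjects with replacement and takes percentiles of the recomputed statistic. The percentile method, the number of resamples, the function names, and the synthetic records are all assumptions made for illustration; the study's actual procedure is described in its online supplement.

```python
import random


def sensitivity(records):
    """records: list of (predicted_positive, has_disease) boolean pairs."""
    tp = sum(1 for pred, dis in records if pred and dis)
    fn = sum(1 for pred, dis in records if not pred and dis)
    return tp / (tp + fn)


def bootstrap_ci(records, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(records) (assumed method)."""
    rng = random.Random(seed)
    n = len(records)
    estimates = []
    for _ in range(n_resamples):
        # Draw n subjects with replacement from the original sample.
        sample = [records[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(statistic(sample))
        except ZeroDivisionError:
            continue  # resample contained no diseased subjects; skip it
    if not estimates:
        raise ValueError("statistic could not be computed on any resample")
    estimates.sort()
    lo_idx = int((alpha / 2) * len(estimates))
    hi_idx = max(int((1 - alpha / 2) * len(estimates)) - 1, 0)
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical data: 20 diseased subjects (18 detected), 80 non-diseased.
records = ([(True, True)] * 18 + [(False, True)] * 2
           + [(False, False)] * 75 + [(True, False)] * 5)
low, high = bootstrap_ci(records, sensitivity)
print(f"sensitivity = {sensitivity(records):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The same helper could be passed a specificity or negative likelihood ratio function in place of `sensitivity`.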
Demographics and Apnea Occurrence
Among respondents, 93.5% were male, with an average (± SD) age of 44.4 (± 11.2) years. The sample contained 85% white individuals, 12.5% African-Americans, and 1.9% Hispanics. The data are summarized in Table 1 for 247 high-risk subjects, 159 low-risk subjects, the weighted average of both groups, and the 1,329 respondents. The proportion of OSA in the weighted sample was 28.1%, using an AHI of five or more episodes per hour to define any apnea, and 4.7%, using an AHI of 30 or more episodes per hour to define severe apnea. The weighted averages were computed as (0.42 × higher risk mean) + (0.59 × lower risk mean).
Distribution of BMI
Figure 2 shows the BMI frequency distribution of the weighted sample for the following categories: BMI less than 25 kg/m², 25–29.9 kg/m² (overweight), 30–34.9 kg/m² (obese, Class I), 35–39.9 kg/m² (obese, Class II), and 40 kg/m² or more (extremely obese). Approximately half were obese, with a BMI of 30 kg/m² or more; another 38% were overweight, with a BMI of 25–29.9 kg/m².
Two-stage Strategy: Determining Optimal Parameters
To identify severe apnea, an upper bound of 0.9, a lower bound of 0.3, and a desaturation threshold of 10 events/hour are the optimal parameters. To identify any apnea, these optima are as follows: upper bound, 0.9; lower bound, 0.2; and desaturation threshold, 5 events/hour.
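One way to read these three parameters is as a simple decision rule, sketched below. The routing logic is an assumption inferred from the description of the two-stage strategy (multivariable prediction for everyone, oximetry only for a subset): scores below the lower bound are treated as low risk without oximetry, scores above the upper bound as high risk without oximetry, and scores in between are resolved by the desaturation index. The function name, the handling of values exactly at a bound, and the return format are hypothetical.

```python
def two_stage_screen(risk_score, odi=None, lower=0.3, upper=0.9, odi_threshold=10.0):
    """Return (predicted_positive, needs_oximetry) for one driver.

    risk_score    -- multivariable apnea prediction value (0 to 1)
    odi           -- oxyhemoglobin desaturation index in events/hour, if measured
    lower, upper  -- Stage I bounds (defaults: reported optima for severe apnea)
    odi_threshold -- Stage II desaturation threshold in events/hour

    Boundary handling (strict vs. inclusive comparisons) is an assumption.
    """
    if risk_score < lower:
        return False, False        # low risk: no oximetry needed
    if risk_score > upper:
        return True, False         # high risk: positive without oximetry
    if odi is None:
        raise ValueError("intermediate risk score: oximetry result required")
    return odi >= odi_threshold, True


# Severe-apnea parameters reported above.
print(two_stage_screen(0.2))             # (False, False)
print(two_stage_screen(0.5, odi=12.0))   # (True, True)

# Any-apnea parameters reported above.
print(two_stage_screen(0.25, odi=3.0, lower=0.2, upper=0.9, odi_threshold=5.0))  # (False, True)
```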
Discriminatory Power of Screening Strategies
For severe sleep apnea, the negative likelihood ratio was highest (0.62) when symptoms alone were used. The ratio improved (i.e., decreased) as the complexity of the strategy increased: it was 0.33 for BMI, 0.26 for multivariable prediction, 0.10 for the two-stage strategy, and 0.12 for oximetry. We looked at whether increasingly complex strategies increased discriminatory power by increasing sensitivity or specificity. We report these results for single cut points, which were specifically chosen using the optimization strategy we described (see the online supplement). This strategy takes into account two salient criteria: OSA prevalence, and a ratio of false-positive to false-negative diagnoses. These ratios were assigned so that missing a case was considered more important than wrongly labeling a normal driver as having apnea, particularly so in the case of severe apnea. Using these optimized cut points, symptoms alone were least sensitive and specific; BMI was more sensitive and specific than symptoms alone. Multivariable prediction augmented the sensitivity offered by BMI alone, from 77 to 81%, and the two-stage strategy raised sensitivity and specificity to 91%. Oximetry enhanced specificity, raising it to 95%, with sensitivity similar to that of the two-stage strategy. The two-stage strategy missed few cases of severe apnea, preserved specificity, and had the lowest negative likelihood ratio.
All strategies identified any apnea less accurately than severe apnea (see Table 3). Again, symptoms alone were not particularly useful, with a negative likelihood ratio of 0.71. This ratio again improved (i.e., decreased) as the complexity of the strategy increased, with the two-stage strategy and oximetry having better discriminatory power (negative likelihood ratio, 0.29). No strategy, however, excluded drivers with AHIs of 5 or more per hour with acceptably high sensitivity.
Identifying commercial drivers with severe apnea, defined as an AHI of 30 events per hour or more, was our primary goal because this cut point is associated with marked sleepiness and impaired task performance in our studies (9). In addition, patients with severe sleep apnea derive benefits from positive airway pressure therapy (10), including reduction in crashes (12) and improved driving performance as measured by a driving simulator (13). In this regard, our analysis shows that a two-stage strategy combining symptoms and BMI with oximetry performed very well, with 91% sensitivity and specificity, and yielded the best negative likelihood ratio, 0.10. Applying a standard Bayesian nomogram (35), if this strategy predicts low risk for severe apnea, then given our pretest probability of 4.7%, the likelihood of having severe apnea is below 0.5%. This excludes severe apnea with high confidence (36). This result is particularly useful, because confirmatory polysomnography testing is expensive and often inaccessible. In addition, the strategy predicted that oximetry is not necessary in 31% of our sample, nor is polysomnography necessary in 86%. The false-positive rate was 8.9%, the false-negative rate was 0.5%, and the negative predictive value was 99%. Oximetry applied to every subject maintained the negative predictive value at 99%, but is more expensive and less convenient than the two-stage strategy, and did not improve the negative likelihood ratio. Thus, to exclude severe apnea, the two-stage strategy that we proposed previously (29) is optimal for this sample.
All of our strategies performed relatively worse in the prediction of any apnea, compared with severe apnea (see Table 3). Again, symptoms alone had little value in this population, with a negative likelihood ratio of 0.71, and oximetry offered minimal additional predictive advantage over the two-stage strategy, with a negative likelihood ratio of 0.29.
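The nomogram calculation for severe apnea described above can be reproduced arithmetically by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. The short Python sketch below does this with the reported pretest probability (4.7%) and negative likelihood ratio (0.10); the function name is an assumption introduced for illustration.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: posttest odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Reported values: pretest probability of severe apnea 4.7%,
# negative likelihood ratio of the two-stage strategy 0.10.
p = posttest_probability(0.047, 0.10)
print(f"{p:.4f}")  # 0.0049, i.e., just under 0.5%
```

The complement of this value, roughly 99.5%, is in line with the negative predictive value of 99% reported for the two-stage strategy.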
For our two-stage strategy, we chose multivariable prediction, which combined symptoms with BMI, rather than using BMI alone. Although a BMI of less than 25 kg/m² was successful in excluding all but 3 cases of apnea, only 46 subjects (20% of the weighted sample) met this criterion, whereas 88 (38%) drivers had a multivariable prediction of less than 0.3. Thus, addition of symptoms was more useful in this setting than use of BMI alone, because more subjects could potentially be prevented from requiring oximetry.
Evaluating these strategies in this group of commercial drivers is likely to be important for public health reasons, as U.S. accident reports indicate that in 2001, large trucks were involved in 429,000 crashes. Nearly 5,000 of these crashes were fatal, responsible for 12% of all traffic deaths, whereas an additional 130,000 victims suffered nonfatal injuries. Commercial crashes are also expensive, costing on average $75,637 per crash and $3.54 million per fatal crash (37). Sleepiness has been shown to account for 31–41% of major crashes of commercial vehicles (38, 39). Although we know little about the role of OSA in crashes in commercial vehicles, studies of passenger cars have shown increased crash risk in drivers with apnea (5–7) and that effective treatment of OSA reduces such risk (12).
Our strategies assess the risk of OSA, rather than OSA syndrome, in which the presence of sleep-disordered breathing is accompanied by subjective sleepiness. We did not limit our analysis to those drivers who reported sleepiness in our study. Such sleepiness may itself be subject to reporting bias, particularly in an occupational setting. Therefore, we propose that enrollment of drivers regardless of the report of sleepiness remains one of the study's strengths.
Without taking sleepiness into account, the proportion of OSA in our population was similar to that reported by Young and coworkers, who reported that 24% of middle-aged men had an AHI of at least 5 events per hour and 9.1% had an AHI of at least 15 events per hour (1). Stoohs and coworkers (40) reported a much higher value of 78% with an oxyhemoglobin desaturation index at or exceeding 5 per hour, and 10% with an oxyhemoglobin desaturation index at or exceeding 30 per hour. In our population, we note that 29% of the weighted sample had an oxyhemoglobin desaturation index of 5 or more per hour, and 3% had an oxyhemoglobin desaturation index of 30 or more per hour. Diagnostic ascertainment methodologies differed between these studies; similar to Young and coworkers, we used full sleep study data, whereas Stoohs and coworkers used a less rigorous definition of snoring with 3% desaturation. Because electroencephalogram and airflow data were not recorded in the study by Stoohs and coworkers, desaturations without airflow limitation could have been scored and could have raised the reported prevalence considerably (41), particularly because that group had a high prevalence (44%) of smoking. This rate is higher than the smoking rate we report in our sample (see Table 1). Pulmonary function data, were they available, might explain a high prevalence of desaturation. In addition, those subjects were recruited from a single employer, rather than being drawn from a more representative, community-based population.
We address whether this difference in proportion could be an indication of differential participation in our study, not merely on the basis of a given predictor, but rather on both the predictor and apnea status simultaneously (e.g., greater participation among obese subjects who also have OSA, versus obese subjects without OSA). Although this bias may limit generalizability, more serious is the potential threat to internal validity.
To mitigate this bias, then, given the value of any predictor, participation must be independent of apnea status. Specific design strengths limit, but do not eliminate, the likelihood of such bias in our study: prospective data collection, objective disease ascertainment, and blinded scoring of results. We also conducted a nonresponder analysis to assess the likelihood of such bias, were it to occur. Because we performed sleep studies only after questionnaire administration, reporting symptoms on the basis of a priori knowledge of apnea status was unlikely. In addition, an overnight sleep study is the objective standard for apnea diagnosis. Moreover, technicians who scored the sleep studies had no knowledge of any other screening test result. Finally, our comparison of age, sex, and sociodemographic factors as assessed by ZIP code between responders, nonresponders, and in-laboratory subjects yielded no statistically significant differences. Indeed, the age and sex distributions were nearly identical (data not shown). Although this comparability does not guarantee the absence of participation bias, we are encouraged that responders and nonresponders were similar in the variables we assessed. The availability of BMI data in nonresponders would have further strengthened our analysis. Despite these limitations, this is the first and largest scale study to address screening for OSA in any high-risk population, where public safety remains a distinct concern. We also note that we validated our strategy in the same cohort in which it was derived, which may artificially inflate its predictive value, a phenomenon known as regression toward the mean (42). Developing the strategy in a subset of our population and validating it in another would have been a stronger approach. However, this split-sample approach is not viable in this study because of insufficient numbers of subjects with severe apnea. 
Oximetry was conducted concurrently with polysomnography in our study, raising the question of whether the two tests could be scored independently. However, our scorer of oximetry data had no prior knowledge of sleep study data. In addition, rescoring of a random 10% sample of oximetry tracings by a second as well as by the original interpreter showed no significant differences in scores. Test–retest and interrater reliabilities were high, with intraclass correlation coefficients of 99 and 97%, respectively. Performing oximetry and sleep studies together in the laboratory also ensured that the driver was sleeping in an identical position and was in an identical stage of sleep for both studies. Future studies need to evaluate oximetry done independently, and also at home.
Although we chose oximetry for Stage II of our two-stage strategy, we also considered the selection of multivariable prediction for the first stage of this strategy, rather than BMI. Although BMI may be a simpler alternative, multivariable prediction incorporates other information in addition to BMI, including age, sex, and symptom frequency. The predictive utility of these additional variables becomes most important when the subject is not obese (21). Thus, we expect our algorithm to have incremental value among populations with lower prevalence of obesity compared with using BMI as the Stage I screen. Even in this relatively obese population, however, multivariable apnea prediction provided substantial improvement in discriminatory power over BMI alone. Moreover, the optimal cut points we selected for BMI and multivariable prediction, which take into account prevalence and misclassification rates, show that multivariable prediction excluded a much larger proportion of drivers from requiring further testing as compared with BMI alone.
Our study raises the question of whether commercial drivers should be screened routinely for severe OSA, perhaps during preemployment physical examination.
Our study is the first step toward addressing this question; the two-stage screening strategy we propose is optimal and highly accurate in excluding severe apnea. Before instituting routine screening, however, we propose that additional data be acquired. First, a case-control study similar to that done among drivers of passenger cars (7) should be conducted to evaluate the role of OSA as a risk factor for crashes of commercial vehicles, particularly those involving injury or death. A finding that an AHI of 30 events per hour or more is a risk factor for crashes would strengthen our conclusion that the two-stage strategy would be a reasonable approach to screen commercial drivers for this disorder. However, if further studies indicate that an AHI below this level is a more appropriate cutoff, then the screening strategy to be adopted will require further refinement. Second, controlled, randomized trials should assess whether drivers identified by screening will use and benefit from therapy. The costs of screening need to be weighed against the costs of outcomes without screening. Confirming treatment benefits at acceptable economic costs would provide justification for institution of a routine screening program in this occupational setting.
We conclude that among our community-based sample of commercial drivers, a two-stage screening algorithm that incorporates questionnaire data in all, followed by oximetry in a subset, is useful in excluding drivers with severe sleep apnea, a group that may be at risk for fall-asleep accidents judged by off-road deviations in driving simulators (4).
Supported by Trucking Research Institute contract DTFH61-93-C-00088 funded by the Federal Highway Administration (now the Federal Motor Carriers Safety Administration). The Trucking Research Institute is part of the American Trucking Association. Also supported by NIH grants 3-M01-RR00040, P01-HL-60287, and K23 RR16068-03. As part of the contract, both the Trucking Research Institute and Federal Highway Administration could comment on the manuscript, but could not mandate change. Nellcor, Inc., and Ohmeda, Inc., provided partial support for the study. This article has an online supplement, which is accessible from this issue's table of contents online at www.atsjournals.org Conflict of Interest Statement: I.G. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; G.M. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; J.E.N. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript; A.I.P. has a grant from ResMed, Inc., to study the relative role of ambulatory recording of sleep-disordered breathing as it compares with full sleep study and also receives royalties from Marcel Dekker for a book he edited, entitled Pathogenesis, Diagnosis and Treatment of Sleep Apnea. In addition, Nellcor, Inc. and Ohmeda, Inc. contributed oximeters in a study of sleep apnea in commercial drivers and A.I.P. has a patent pending related to the use of serotonin agonist to treat sleep apnea in mammals. Received in original form July 16, 2003; accepted in final form May 10, 2004