Published ahead of print on June 15, 2007, doi:10.1164/rccm.200612-1819OC
© 2007 American Thoracic Society doi: 10.1164/rccm.200612-1819OC
The Use of Gene-Expression Profiling to Identify Candidate Genes in Human Sepsis

1 Department of Intensive Care Medicine, Nepean Hospital and Western Clinical School, University of Sydney, Penrith, Australia; and 2 Ramaciotti Centre for Gene Function Analysis and School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia

Correspondence and requests for reprints should be addressed to Ruby C. Y. Lin, Ph.D., Ramaciotti Centre for Gene Function Analysis, School of Biotechnology and Biomolecular Sciences (D26), University of New South Wales, NSW 2052, Australia. E-mail: rubyl@unsw.edu.au
Rationale: Our understanding of the pathophysiology of sepsis remains incomplete. A genomewide study offers an unbiased, systems biology approach to examining the expression patterns of circulating leukocytes and may reveal novel insights into the host response to sepsis.

Objectives: We examined whether gene-expression profiling of neutrophils could identify signature genes and important pathways in the clinical syndrome of sepsis.

Methods: Gene-expression profiling was performed using oligonucleotide microarrays on peripheral blood samples from 94 critically ill patients (71 septic and 23 nonseptic). Using a supervised learning algorithm based on a support vector machine, a molecular signature of sepsis was generated from a training set of 44 samples and validated in an independent set of 50 samples. The diagnostic performance of the signature genes was assessed against a reference standard based on the International Sepsis Forum Consensus Conference definition of infection.
Measurements and Main Results: A set of 50 signature genes correctly identified sepsis with a prediction accuracy of 91% in the training set and 88% in the validation set. The diagnostic performance remained high regardless of patients' age, comorbidities, or prior antibiotic treatment. Compared with controls, genes involved in immune modulation and the inflammatory response had reduced expression in patients with sepsis. In particular, the activation of nuclear factor-κB …

Conclusions: The signature genes reflect suppression of neutrophils' immune and inflammatory function by sepsis. Gene-expression profiling therefore provides a novel approach to advancing our understanding of the host response in sepsis.
Key Words: microarray analysis; sepsis syndrome
Diagnosing sepsis remains difficult in critically ill patients. Delayed diagnosis is associated with increased mortality (1). Diagnostic uncertainty also leads to indiscriminate use of antibiotics and the emergence of multiresistant organisms in critically ill patients, in whom antibiotic use can be 10 times greater than in general hospital patients (2). With the global incidence of sepsis steadily increasing, there is a need to improve the management of patients with sepsis through more timely and accurate diagnosis (3).

Although clinical evaluation has been the mainstay of sepsis diagnosis, the physical signs of sepsis can be nonspecific. For example, fever and new infiltrates on a chest radiograph can be found not only in pneumonia but also in pulmonary embolism or postoperative atelectasis. Traditional markers of infection, such as leukocytosis or C-reactive protein, are often unhelpful because they are frequently elevated in critically ill patients. In addition, patients are commonly sedated and ventilated and may have multiple comorbidities, making it even harder to differentiate sepsis from other serious illnesses. Microbiological culture is routinely used to assist diagnosis. However, it has several limitations, including poor sensitivity, a delay of up to 48 hours, reduced yield because of prior antibiotic therapy, difficulty in distinguishing colonizing organisms from pathogens, and difficulty in interpreting the importance of organisms that are normally of low virulence (4). As a result, a significant proportion of patients treated for sepsis do not have microbiological documentation of infection (5).

For many years, researchers have searched for a diagnostic biomarker that is specific for sepsis. Although many molecules have been studied, none has gained widespread acceptance (6). Given the complexity of the signaling network in sepsis, the search for a single biomarker seems unlikely to succeed.
A more rational approach is to study the whole-genome expression profile of sepsis. The aims of the present study were twofold. First, we investigated whether whole-genome profiling of circulating leukocytes could identify candidate genes of sepsis. Second, we examined whether such candidate genes could distinguish patients with sepsis from those without sepsis. Microarray technology was used to perform whole-genome analysis of gene expression, an approach that has been used successfully for tumor diagnosis in oncology (7).
The study protocol was approved by the institutional review board of the hospital. Written, informed consent was obtained from all patients or their families. The reporting of study findings is in accordance with the Standards for Reporting of Diagnostic Accuracy (8, 9).
Study Population and Eligibility Criteria
Patients were recruited in two separate phases (Figure 1B). In the first phase, a "training set" of 44 samples was obtained from patients with retrospectively confirmed sepsis and from control subjects, according to the reference diagnosis (see below). In the second phase, a "validation set" of 50 samples was obtained from prospectively enrolled patients with suspected sepsis. The training set included patients with sepsis in whom infection was confirmed and control patients in whom infection had been ruled out. The validation set included patients who were admitted because sepsis was suspected and intensive care unit patients whose condition had deteriorated and in whom sepsis was the suspected cause.
Enrollment and Data Collection

Data were collected prospectively using prespecified data entry forms that included information such as patients' demographics, comorbidities, clinician's diagnosis, APACHE (Acute Physiology and Chronic Health Evaluation) II scores, treatment, and microbiological results.
Diagnostic Categories

We established the reference diagnosis (the presence or absence of sepsis) retrospectively at the end of each patient's hospital stay (or after death) by reviewing the patient's medical records. The diagnosis was ascertained using all the information available in the medical records, including microbiological reports, polymerase chain reaction results, imaging studies (e.g., computed tomography scans), surgical findings, tissue histopathology reports, and the patient's response to antibiotics. The investigator who determined the reference diagnosis was blinded to the results of the microarray analysis. To minimize misclassification error, we followed all patients until hospital discharge or in-hospital death and documented all subsequent microbiological results, where available, and any clinical events that might suggest an infection initially missed on admission.
RNA Extraction and Gene-Expression Profiling

A dual-channel, common reference design using oligonucleotide arrays was used for all microarray experiments. Universal human reference RNA (Stratagene, Santa Clara, CA) was hybridized to the reference channel and patient sample RNA to the other channel. The universal human reference RNA comprises RNA pooled from 10 cell lines to provide broad gene coverage (13). Expression of the sample RNA was therefore measured relative to the reference RNA: for each gene, the ratio of the sample intensity to the reference intensity was calculated and log-transformed before analysis. The experimental design, RNA extraction, and microarray experiments in this study are all MIAME (minimum information about a microarray experiment)-compliant. The complete raw and normalized microarray data are available through the Gene Expression Omnibus of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/geo/, accession number GSE5772).
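The ratio computation in the common reference design can be sketched as follows. This is a minimal illustration with several assumptions. It takes background-corrected intensities already aligned gene by gene, and it uses a base-2 logarithm (the text says only that ratios were log-transformed). The function name `log2_ratios` is hypothetical, and the array normalization that would normally precede analysis is omitted.

```python
import math


def log2_ratios(sample_intensities, reference_intensities):
    """Return log2(sample / reference) for each gene on a two-channel array.

    Both arguments are sequences of background-corrected spot intensities,
    aligned gene by gene (an assumption of this sketch). Genes with a
    non-positive intensity in either channel cannot be log-transformed
    and are returned as None, i.e. treated as missing.
    """
    if len(sample_intensities) != len(reference_intensities):
        raise ValueError("channels must have the same number of genes")
    ratios = []
    for s, r in zip(sample_intensities, reference_intensities):
        ratios.append(math.log2(s / r) if s > 0 and r > 0 else None)
    return ratios


# Twofold up, twofold down, and an unusable spot.
print(log2_ratios([800, 200, 0], [400, 400, 100]))  # [1.0, -1.0, None]
```

On the log2 scale, the twofold difference used in the sample size calculation corresponds to a difference of 1 unit.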
Statistical Analysis

The misclassification error of the model within the training set was assessed by cross-validation using bootstrapping (14). To minimize bias, gene selection was repeated at each step of the cross-validation (15). To ensure that the results were not due to chance, 1,000 permutations of the dataset were performed to obtain a permutation P value. Finally, the signature genes were tested in the independent validation set to provide further validation of the model's prediction accuracy.

Sample size was calculated separately for the training and validation sets. For the training set, we calculated the number of patients needed to identify genes with a twofold difference in expression level between sepsis and control. To have 95% power to detect such a difference, at least 44 patients would be required, based on a sepsis-to-control patient ratio of 2:1 and a proportion of false-positive discoveries of no more than 0.1%. For the validation set, the required sample size was 50 patients, assuming an expected prediction accuracy of at least 80% with a 95% confidence interval of 68 to 92%. More details of the development of the prediction model and the statistical methods can be found in the online supplement.
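The key safeguard in this cross-validation is repeating gene selection inside every resample, so that held-out samples never influence which genes are chosen. It can be sketched in plain Python as below, under several stated assumptions. This is not the study's model. A nearest-centroid classifier stands in for the support vector machine, and genes are ranked by a Welch t statistic, which is also an assumption. The error is the plain out-of-bag bootstrap estimate, because the exact estimator of reference 14 is not specified here. All function names and defaults are hypothetical; `k=50` merely echoes the size of the signature.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t statistic for one gene (class 1 values a minus class 0 values b)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    # Genes with zero variance in both groups are scored as uninformative.
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Indices of the k genes with the largest |t| between classes 1 and 0."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def fit_centroids(X, y, genes):
    """Class means over the selected genes (a stand-in for training an SVM)."""
    return {
        c: [statistics.mean([row[g] for row, label in zip(X, y) if label == c]) for g in genes]
        for c in (0, 1)
    }


def predict(centroids, genes, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min((0, 1), key=dist)


def bootstrap_error(X, y, k=50, n_boot=100, seed=0):
    """Out-of-bag bootstrap error, re-selecting genes inside every resample."""
    rng = random.Random(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        yb = [y[i] for i in idx]
        if not oob or yb.count(0) < 2 or yb.count(1) < 2:
            continue  # nothing held out, or too few of one class for a t statistic
        Xb = [X[i] for i in idx]
        genes = select_genes(Xb, yb, k)  # selection sees only in-bag samples
        centroids = fit_centroids(Xb, yb, genes)
        wrong = sum(predict(centroids, genes, X[i]) != y[i] for i in oob)
        errors.append(wrong / len(oob))
    if not errors:
        raise ValueError("no usable bootstrap resamples")
    return statistics.mean(errors)


def permutation_p_value(X, y, n_perm=1000, seed=1, **kwargs):
    """Share of label permutations whose error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = bootstrap_error(X, y, **kwargs)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if bootstrap_error(X, shuffled, **kwargs) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The permutation P value uses the (hits + 1)/(n + 1) form, so it can never be exactly zero. Selecting genes once on the full dataset and then cross-validating only the classifier would leak information from the held-out samples and make the error estimate optimistic. Reference 15 addresses this bias.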
A prospective, single-center observational study of 94 critically ill patients admitted to Nepean Hospital, Sydney, was conducted from October 2004 to May 2006. During the 18-month study period, a total of 109 patients were enrolled, with 94 patients included in the final analysis. Fifteen patients did not receive microarray analysis due to technical difficulties (Figure 1A). The mean age of the patients was 63.5 years; 60.6% were male. Baseline characteristics between the training and validation sets were similar except that the validation set had older patients and more episodes of sepsis (Table 1). Of the 94 patients, 92 met two or more of the criteria for systemic inflammatory response syndrome (SIRS), based on the international consensus definition (10). There was no difference in baseline characteristics between the 15 patients not included in the analysis and the 94 who were analyzed.
Reference Diagnosis

A reference diagnosis was established for all 94 patients. Of the 71 patients diagnosed as septic, 53 had infection confirmed by identification of causative pathogens, including in specimens obtained by invasive procedures or at postmortem examination (Table E2). The remaining 18 patients had clinical evidence highly suggestive of infection and were classified as having probable infection, in accordance with the International Sepsis Forum Consensus Conference guidelines (12). Twenty-three patients were classified as nonseptic control subjects.
Diagnostic Performance

To influence clinical decision making, the signature genes must be shown to be robust when applied to a heterogeneous population of patients. The validation set represented such a population, covering the full spectrum of sepsis syndrome commonly seen in clinical practice. As expected, the diagnostic performance of the signature genes decreased slightly in the validation set (Table 2). However, the sensitivity and positive predictive value in the validation set remained high (0.95 and 0.91, respectively), preserving a high overall prediction accuracy of 88%. To assess whether other clinical factors could affect diagnostic performance, we reanalyzed the dataset after grouping samples by age, sex, coexisting diseases, or prior use of antibiotics. The signature genes remained accurate even in older patients with multiple comorbidities who had received antibiotics (Figures 2B and 2C).
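The measures reported here all follow from a 2 × 2 table that compares signature-gene predictions with the reference diagnosis. The minimal sketch below shows how sensitivity, positive predictive value, and overall accuracy are derived. The function name is hypothetical, and the counts in the example are invented for illustration, because Table 2 is not reproduced here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for predicted vs reference diagnosis.

    tp: predicted septic, reference septic    fp: predicted septic, reference control
    fn: predicted control, reference septic   tn: predicted control, reference control
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only (not the study's Table 2).
m = diagnostic_metrics(tp=18, fp=2, fn=2, tn=8)
```

Unlike sensitivity and specificity, the predictive values and overall accuracy depend on the proportion of septic patients in the sample. This proportion was high in the validation set, so these values would be expected to shift in populations with a lower prevalence of sepsis.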
Misclassification bias occurs when the reference diagnosis incorrectly classifies a control patient as septic or a septic patient as a control. The impact of such bias was assessed by a sensitivity analysis. Based on published data (16), we estimated the likely false-positive rate in the 18 patients who were classified as septic but had no pathogens identified. Likewise, we estimated the false-negative rate in the 12 patients who were classified as controls but had no follow-up microbiology available. We then recalculated the prediction accuracy, which gave an adjusted prediction accuracy of 86% in the training set and 82% in the validation set. (Further details of the sensitivity analysis are given in the online supplement.)
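One way to express this kind of adjustment is as an expected accuracy. Each patient with an uncertain reference label is assigned a probability that the label is wrong. For a binary diagnosis, a wrong reference label turns an apparent agreement into a disagreement, and vice versa. The sketch below rests on that assumption and is not necessarily the method in the online supplement. The function name is hypothetical, and the misclassification rates derived from reference 16 are not reproduced here.

```python
def adjusted_accuracy(correct_flags, flip_prob):
    """Expected accuracy when some reference labels may be wrong.

    correct_flags[i] is True when the classifier agreed with the recorded
    reference diagnosis for patient i. flip_prob[i] is the assumed
    probability that patient i's recorded reference label is wrong
    (0 for patients whose diagnosis is considered secure).
    """
    if len(correct_flags) != len(flip_prob):
        raise ValueError("one flip probability is needed per patient")
    expected_correct = 0.0
    for ok, p in zip(correct_flags, flip_prob):
        # Label right (prob 1 - p): agreement stands. Label wrong (prob p): it reverses.
        expected_correct += (1 - p) * ok + p * (not ok)
    return expected_correct / len(correct_flags)
```

With every flip probability set to 0, the function returns the unadjusted accuracy. Raising the flip probabilities for patients the classifier agreed with lowers the expected accuracy, which is the direction of the adjustment reported above.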
The Molecular Signature of Sepsis
The largest cluster comprised genes involved in the inflammatory response, which were suppressed in patients with sepsis (Figure 3B). The second cluster included genes involved in immune regulation. Genes involved in positive regulation of the immune system were less expressed in patients with sepsis than in control patients, whereas genes involved in negative regulation of the immune system were more highly expressed in patients with sepsis (Figure 3C). These findings suggest that sepsis might have an inhibitory effect on immune regulation. The third cluster included genes involved in mitochondrial functions, such as oxidative phosphorylation and ATP synthesis. The extent of inhibition of these genes was significantly less in patients with sepsis (Figure 3D).
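Gene clusters like these are typically produced by hierarchical clustering, which the Discussion mentions. The minimal sketch below shows bottom-up average-linkage clustering of gene expression profiles. It assumes 1 minus the Pearson correlation as the distance, so genes that rise and fall together across patients are merged first. This section does not state the distance metric or linkage the study actually used, and the function names are hypothetical.

```python
import statistics


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def average_linkage(profiles, n_clusters):
    """Merge gene profiles bottom-up until n_clusters remain (distance = 1 - r)."""
    n = len(profiles)
    dist = {
        (i, j): 1 - pearson(profiles[i], profiles[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster gene pairs.
                link = statistics.mean(d(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Genes 0 and 1 rise together across four patients; genes 2 and 3 fall together.
print(average_linkage([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]], 2))
# [[0, 1], [2, 3]]
```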
To further elucidate the effect of sepsis on the host response, we undertook gene set comparison analysis. This method identified cellular pathways (each represented by a gene set) that were differentially expressed between patients with sepsis and control subjects. The pathways were defined by Gene Ontology categories and other publicly available databases, such as KEGG and BioCarta pathways. The method tests the null hypothesis that the degree of differential expression within a pathway is no different from what would be expected by chance (details of the statistical method are given in the online supplement). Using this method, the nuclear factor (NF)-κB …
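The null hypothesis described above can be tested by resampling. The pathway's genes are summarized by one statistic, and that value is compared with the same statistic computed for random gene sets of the same size. In the minimal sketch below, the set statistic is the mean of -ln(p) over single-gene p values, which is one common choice. The statistic and the function name are assumptions, not the method detailed in the online supplement.

```python
import math
import random
import statistics


def gene_set_p_value(gene_p_values, gene_set, n_resamples=10000, seed=0):
    """Resampling test: is this gene set more differentially expressed than chance?

    gene_p_values maps every gene on the array to its single-gene p value
    (sepsis vs control). Genes in gene_set that are absent from the array
    are ignored. The null distribution comes from random gene sets of the
    same size drawn from all genes on the array.
    """
    members = [g for g in gene_set if g in gene_p_values]
    all_genes = list(gene_p_values)

    def score(genes):
        # Mean of -ln(p): larger values mean stronger differential expression.
        return statistics.mean(-math.log(gene_p_values[g]) for g in genes)

    observed = score(members)
    rng = random.Random(seed)
    hits = sum(
        score(rng.sample(all_genes, len(members))) >= observed
        for _ in range(n_resamples)
    )
    return (hits + 1) / (n_resamples + 1)
```

Comparing against random sets of the same size matters because the average of a few extreme single-gene p values can look impressive by chance alone.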
This study has shown that gene-expression profiling of neutrophils could reliably distinguish sepsis from other noninfectious conditions in critically ill patients, particularly those with SIRS. It is important to discriminate sepsis from SIRS because the two conditions are treated differently. Although gene-expression profiles of sepsis and SIRS have been described (17, 18), the findings of such studies have not been applied in clinical practice because of the difficulty of sifting hundreds of potential candidate genes for useful diagnostic markers. In this study, we have identified such diagnostic markers in neutrophils and have shown that the identified genes could accurately distinguish sepsis from SIRS. We validated our findings in an independent set consisting of a heterogeneous patient population resembling what clinicians encounter in routine practice, providing further support for the clinical applicability of these findings. The signature expression profile consists of 50 highly predictive genes. These 50 genes provide a unique gene-expression profile that is pathognomonic of sepsis and reveal important biological insights into the host response mediated by neutrophils. For example, RAF1 (v-raf-1 murine leukemia viral oncogene homolog 1) encodes a protein that regulates the mitogen-activated protein kinase pathway, which plays an important role in inflammation (19). The reduced expression of genes involved in inflammatory pathways in patients with sepsis suggests that infection may have suppressed the neutrophils' natural inflammatory response to pathogens. Inhibition of mitochondrial function in systemic inflammation has been well described (20, 21). In overwhelming inflammation, reduced mitochondrial function is believed to be an adaptive host response that preserves cellular integrity and improves the chance of subsequent recovery (22).
Our findings showed that the mitochondrial inhibition was significantly less in patients with sepsis, providing the first in vivo evidence that this adaptive response in neutrophils may be impaired in the presence of infection.
Findings from the hierarchical clustering and the pathway analysis provided further evidence that infection inhibited host immunity, in particular the NF-κB …

Our study has several strengths. First, two independent cohorts were used: one for developing the prediction model (training set) and the other for testing it (validation set). This design allowed a stepwise, systematic assessment of the index test (gene-expression profiling) along a continuum of diagnostic uncertainty, from the training set, where the index test was developed under ideal conditions, to the validation set, where its performance was tested in more realistic clinical situations (32). This design strengthened the validity and generalizability of our findings. Second, we used enriched neutrophils instead of whole blood to optimize the sensitivity and precision of our gene-expression profiling. Whole blood contains a mixed population of leukocytes, whose proportions vary with the stage of sepsis and between individuals. Using purified leukocytes allowed us to minimize tissue heterogeneity and improve the signal-to-noise ratio of our experiments. Third, we enriched neutrophils in preference to other leukocytes because marked neutrophilia is part of the early host response to sepsis. Changes in neutrophils therefore reflect changes that occur early in sepsis and may consequently yield valuable diagnostic information.

This study also has several limitations. First, the findings are from a single institution, and the sample size is relatively small. As a result, the generalizability of our findings is limited. Further studies, preferably with larger sample sizes, are needed to assess the diagnostic performance of the predictive genes in other populations of critically ill patients. Second, there is no gold standard for sepsis diagnosis, so misclassification bias is inherent in any diagnostic study.
However, we followed up all patients to minimize misclassification error. In addition, a sensitivity analysis showed that, after adjustment for misclassification error, the accuracy of gene-expression diagnosis remained high. Third, the number of subjects without sepsis in our study was small (n = 23). However, our sample size calculation showed that the study was adequately powered to identify predictive genes: the 13 patients without sepsis in the training set provided 95% power to detect differentially expressed genes. Fourth, we did not use healthy subjects as controls, because we were interested mainly in gene-expression differences between patients with and without sepsis, not between patients with sepsis and healthy subjects. Including healthy controls would, however, have allowed us to quantify the extent of gene up-regulation and down-regulation with greater precision. Fifth, it would have been interesting to investigate the relationship between the timing of blood sampling and the onset of sepsis. We did not have such data because our study was neither designed nor statistically powered to investigate this relationship, and future studies are required to pursue it. Last, our cohort comprised infections of different types and at different sites, giving rise to substantial clinical heterogeneity. However, such heterogeneity is inherent in most sepsis studies. Furthermore, heterogeneity in the validation set was an advantage, because we wanted to assess the applicability of the signature genes in a heterogeneous population.

The implications of this study are twofold. First, sepsis research has focused on a few molecules as targets for therapeutic intervention or as biomarkers for early diagnosis.
This study demonstrated that a genomewide, systems biology approach may yield more discoveries than the traditional "one molecule, one diagnosis/intervention" method. Second, the small number of genes in our signature set makes them amenable to simple, conventional assays, such as quantitative reverse transcriptase–polymerase chain reaction (33). With polymerase chain reaction increasingly available in routine laboratory diagnosis, this opens up the possibility of genomic testing with our signature genes in routine clinical settings.

The findings in this study are limited to neutrophils. It is uncertain whether similar changes occur in other leukocyte subtypes (e.g., lymphocytes and macrophages). Because these cells play different roles from neutrophils in sepsis, their expression profiles are likely to differ. Future studies of these cell types are needed for a more complete understanding of the host response in sepsis.

In summary, gene-expression profiling identified important functional pathways in neutrophils that contribute to the clinical syndrome of sepsis. Genomic technology therefore provides a novel approach to gaining important biological insights into the cellular response to sepsis in critically ill patients.
The authors thank Chris Nguyen for his help in providing hardware and database support for this analysis.
Supported by Nepean Critical Care Research Fund, National Health and Medical Research Council of Australia, and Australian Research Council.

This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org

Originally Published in Press as DOI: 10.1164/rccm.200612-1819OC on June 15, 2007

Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.

Received in original form December 15, 2006; accepted in final form June 14, 2007