Introduction

In late 2019, the world started to face a new pandemic disease, the Coronavirus Disease of 2019 (COVID-19) caused by the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV 2). Interstitial pneumonia is the most important clinical manifestation of COVID-19, leading to severe acute respiratory failure and high mortality rates [1, 2].

The rapid and devastating onset of this disease required a re-organization of hospitals and healthcare facilities, with ordinary wards managing less severe COVID-19 patients and with a progressive increase in the number of Intensive Care Unit (ICU) beds. However, given the unexpectedly high rates of patients requiring non-invasive ventilation, also Internal Medicine wards were re-organized for the management of higher intensity patients.

Several studies reported typical patterns of laboratory tests in patients with COVID-19 and some independent predictors of disease severity and mortality, such as C-reactive protein (CRP), ferritin, and D-dimer, were identified [1, 3,4,5]. Likewise, individual patient characteristics such as advanced age and male sex, and comorbidities such as obesity, hypertension, diabetes, and coronary heart disease were found to be associated with mortality in COVID-19 patients [6,7,8,9,10].

The possibility to apply a clinical prediction model which includes factors associated with patient prognosis could allow clinicians to stratify COVID-19 patients and to rapidly identify the optimal management strategy. Two models including clinical, laboratory, and imaging variables were proposed by two Chinese groups of researchers, but none has been successfully implemented in clinical practice, as of yet [11, 12]. We aimed to derive and validate a new simple score using three different statistical approaches to be used for the early stratification of COVID-19 patients.

Methods

A retrospective, observational, multicentre registry of patients hospitalized for COVID-19 in Italian hospitals was designed and promoted by the Italian Society of Internal Medicine (SIMI). Five centers participating in this registry contributed to the present study, two in Milan, one in Varese, one in Verona, one in Modena, Italy. As the registry aimed to record standard local practices, no specific treatments, tests, or procedures were mandated by the study protocol. All participating centers received approval from the local Ethics Committees.

Each participating center enrolled patients with a diagnosis of COVID-19 aged 18 years or older admitted to the Emergency Department or to a Medical Ward directly from the Emergency Department between February 17th and May 8th 2020. All patients were followed up for the duration of hospitalization. All data were collected using electronic medical records and gathered in an anonymized case report form (CRF). The completeness and accuracy of data collected from the patient medical records were checked by the registry-coordinating center.

Nasopharyngeal and oropharyngeal swab was collected on the day of admission or in the morning of the day after. Specimen analysis was carried out with reverse-transcriptase polymerase chain reaction (RT-PCR) method.

For the purpose of this study, information on demographic variables (age and sex), delay from symptoms onset to hospitalization, and medical history (hypertension, diabetes, chronic obstructive pulmonary disease, and coronary heart disease) was collected. Data on the following laboratory findings were included: white blood cells, lymphocytes, neutrophils, alanine transaminase (ALT), aspartate aminotransferase (AST), serum creatinine, D-dimer, and CRP levels were included. These data were acquired by physicians and were the results of an examination on the first day after admission.

Information on patient outcomes was collected until discharge. Severe outcome was defined as the composite of need for non-invasive ventilation, need for orotracheal intubation, or death, whichever came first. All other patients were classified as having a non-severe outcome.

Statistical methods

Categorical variables are expressed as frequencies and percentage; continuous variables as mean and standard deviation or as medians and interquartile range, as appropriate.

Due to the high correlation between white blood cells, lymphocytes and neutrophils, we considered as potential predictors only white blood cells and neutrophils to lymphocytes ratio.

We used multiple imputation to deal with missing data. The missing values of all covariates were imputed by assuming that data were missing at random with 20 imputations; discriminant function and predictive mean matching were applied to impute binary responses and continuous variables, respectively.

To build the score, we considered as development cohort the patients belonging to the centers of Varese and Milan and as validation cohort the patients from Verona and Modena (geographical validation).

For the continuous predictors, the relationship with outcome was studied. We founded that linear relationship was a good approximation for white blood cells, ALT and creatinine; for the others, we utilized restricted cubic spline to assess optimal cut-offs.

Selection of predictors was made using three different techniques: (a) multivariate logistic regression with backward selection, (b) penalized logistic regression (Least Absolute Shrinkage and Selection Operator, LASSO, method), and (c) Random Forest (variable selection based on accuracy, mean minimal depth and times a root parameter). All strategies started with the full model.

Odds Ratio together with 95% confidence interval for derivation, validation and complete datasets were computed. The Akaike information criterion, Schwarz criterion and area under the curve (AUC) were evaluated for each model and the best model was chosen by comparing these criteria.

Score assignment for each predictor variable was found on its associated regression coefficient.

In addition, for each risk scores, receiver operating characteristics (ROC) curves were displayed and sensitivity, specificity, positive and negative predictive values calculated.

Model's calibration was assessed by Hosmer–Lemeshow C-test.

Analyses were conducted using SAS (Version 9.4, SAS Insitute, Cary, NC) and R (R Core Team, 2015).

Results

A total of 610 patients were included in the analysis, 313 (51.3%) had a severe outcome and 297 (48.7%) had a non-severe outcome. Of patients with severe outcome, 145 required non-invasive ventilation (and 45 of them subsequently died), 39 required intubation (and 7 of them died), and a total of 181 patients died. The subset for the derivation analysis included 335 patients, of whom 173 (51.6%) had a severe outcome; the subset for the validation analysis included 275 patients, 140 with a severe outcome (50.9%). Baseline characteristics, comorbidities, laboratory results for each group are reported in Table 1. Briefly, in both derivation and validation cohorts patients with severe outcomes were older than patients with non-severe outcomes and the prevalence of male sex, comorbidities, as well as the levels of most laboratory values, with the exception of ALT and albumin, tended to be higher. The validation cohort was older (mean age 72 vs. 65 years), had more hypertensive patients (59% vs. 43%), and higher mean values of PCR (13.1 vs. 9.8 mg/L), creatinine (1.3 vs. 1.1 mg/dL) and D-dimer (1115 vs. 917) than the derivation cohort. However, the Hosmer–Lemeshow test showed a good calibration, which suggests that these differences did not impact the score.

Table 1 Baseline characteristics of the study participants—multiple imputation

Selection of predictors

Using Backward selection analysis, the following variables predicted severe outcome: CRP levels greater than 2.0 mg/dL, AST greater than 35 U/L, and D-dimer equal to or greater than 917 μg/L (Table 2). Overall, the area under the curve was 0.72; 0.78 in the derivation set and 0.66 in the validation set.

Table 2 Backward selection—multivariate logistic regression

The LASSO selection identified 6 variables that predicted severe outcome: age older than 50 years, history of coronary heart disease, CRP levels greater than 2.0 mg/dL, AST greater than 35 U/L, D-dimer equal to or greater than 917 μg/L, and neutrophil/lymphocyte ratio greater than 3.3 (Table 3). Overall, the area under the curve was 0.76; 0.79 in the derivation set and 0.80 in the validation set (Fig. 1).

Table 3 LASSO selection—multivariate logistic regression
Fig. 1
figure 1

Area under the curve of the SIMI score (Lasso selection)

Finally, the Random Forest selection identified the following three variables: CRP greater than 2.0 mg/dL, AST greater than 35 U/L, and N/L ratio greater than 3.3 (Table 4). Overall, the area under the curve was 0.72; 0.75 in the derivation cohort and 0.70 in the validation cohort.

Table 4 Random Forest selection—multivariate logistic regression

Selection of the predictive scores

Based on the results of the multivariate analyses, a score was built after each method was used. After backward selection analysis, we obtained a score that included three variables and a total of 11 points (Table 6 available in the Appendix). Using a cut-off of six in the validation set, the score resulted in a sensitivity of 0.88, a specificity of 0.34, a positive predictive value of 0.58 and a negative predictive value of 0.73.

After the LASSO selection, we obtained a score with six variables and a total of 13 points (Table 5 and Table 6 in the Appendix). Using a cut-off of seven in the validation set, the score resulted in a sensitivity of 0.93, a specificity of 0.34, a positive predictive value of 0.59 and a negative predictive value of 0.82.

Table 5 SIMI score

With Random Forest selection, a score with three variables and a total of nine points was obtained (Table 6 in the Appendix). Using a cut-off of five in the validation set the score resulted in a sensitivity of 0.86, a specificity of 0.38, a positive predictive value of 0.59 and a negative predictive value of 0.73.

Discussion

Using the database of an Italian registry, we aimed to derive and validate a simple score to stratify the risk of adverse outcomes in patients hospitalized for COVID-19. After comparing three different statistical approaches, the score obtained using the LASSO selection resulted as the best performing score, with an area under the curve of 0.80 in the validation cohort, a sensitivity of 0.93 and a negative predictive value of 0.82. The accuracy of the score in the validation set was nearly identical to that observed in the derivation set. With six variables, four of which obtained after laboratory testing, this score named SIMI score after the name of the society that promoted the registry (Società Italiana di Medicina Interna), may accurately identify lower risk patients who can be conservatively managed in lower intensity settings. In our cohort, these patients survived after hospitalization without requiring Continuous Positive Airway Pressure or orotracheal intubation.

COVID-19 has become an important challenge for health organizations in particular because of the need for intensive or sub-intensive care management for a relevant number of patients. Due to the lack of a sufficient number of beds in the Intensive Care Units, several Internal Medicine wards dedicated to COVID-19 patients were partially re-organized to manage patients requiring ventilatory support with non-invasive ventilatory support. During the peaks of the pandemic, it becomes crucial to preserve these beds for higher risk patients and to find alternative paths for lower risk patients, including early discharge or admission to dedicated post-acute facilities. In our study, these lower risk patients represented slightly less than 50% of the population. For the remaining, non-low-risk patients, a correct stratification may allow to more timely start adequate management strategies such as ventilatory support and pharmacologic treatment with the aim to reduce the need for Intensive Care Unit beds.

Two studies carried out in China have proposed risk assessment models for COVID-19 patients. A study from Wuhan included 377 patients and found age, neutrophils to lymphocytes ratio, CRP, and D-dimer as predictors of the severity of COVID-19, defined as severe pneumonia and non-severe pneumonia [11]. Severe pneumonia was defined by the presence of respiratory rate of greater than 30 breaths/min, severe respiratory distress, or SpO2 of less than 90% on room air. Patients with acute respiratory distress syndrome (ARDS), sepsis, or septic shock were also included in the definition. The proposed model resulted in a negative predictive value of 0.93, a positive predictive value of 0.41, a specificity of 0.70 and a sensitivity of 0.89. In a larger study from China including 1590 patients for the derivation set and 710 patients for the validation set, 10 variables were identified as independent predictive factors for adverse outcome defined as admission to the Intensive Care Unit, need for invasive ventilation, or death [12]. These variables included chest radiographic abnormalities, age, hemoptysis, dyspnea, unconsciousness, number of comorbidities, cancer history, neutrophils to lymphocytes ratio, lactate dehydrogenase, and direct bilirubin. The Area Under the Curve in the validation cohort was 0.88. The SIMI score has the advantage of a clear definition of the outcome, as in the latter score from China, and of the use of fewer variables, which potentially makes it more suited for daily clinical practice in the Emergency Room. A clinical prediction model with 8 variables was recently proposed by an Italian group [13]. Despite promising results, the model was tested on a small sample of patients and requires further validation.

This study has some limitations that need to be acknowledged. First, the sample size for the derivation set and the validation set was small and the results need to be confirmed in larger cohorts. Second, the study was conducted in a single country, and subsequent validation in different geographic areas with different health organizations would be required. Third, given the observational nature of the study, we may have missed other variables that are not routinely tested in these patients on admission and that may also result to be predictive of adverse outcomes. Finally, the clinical impact of this score also needs to be assessed in management studies which randomize patients or centers to the use of the score or to gestalt.

In conclusion, we here propose a simple score based on six variables to assist clinicians in the stratification of the risk of patients admitted for COVID-19. The SIMI score has a good accuracy, and, in particular, a good sensitivity to identify patients at low risk for adverse outcome during hospitalization. Future studies should assess the clinical impact of this score.