Abstract
ROC curve analysis is often applied to measure the diagnostic accuracy of a biomarker. The analysis yields two outcomes: the diagnostic accuracy of the biomarker and the optimal cut-point value. Many methods have been proposed in the literature to obtain the optimal cut-point value. In this study, a new approach is proposed as an alternative to these methods. The proposed approach is based on the value of the area under the ROC curve. It defines the optimal cut-point as the value whose sensitivity and specificity are closest to the area under the ROC curve while the absolute difference between sensitivity and specificity is minimum. The approach is simple to apply in practice. The results of the proposed method are compared with those of the standard approaches by using simulated data with different distributions and homogeneity conditions as well as a real data set. According to the simulation results, the use of the proposed method is advised for finding the true cut-point.
1. Introduction
The ROC curve is a plot of sensitivity against 1 − specificity over all possible values of the cut-point between cases and controls. To measure the diagnostic ability of a biomarker, it is common to use summary measures such as the area under the ROC curve (AUC) and/or the partial area under the ROC curve (pAUC) [1]. A biomarker with AUC = 1 discriminates individuals perfectly as diseased or healthy, whereas AUC = 0.5 means that there is no apparent distributional difference between the biomarker values of the two groups [2].
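As an illustration of the definition above, the following minimal Python sketch builds the empirical ROC points and computes the AUC by the trapezoidal rule. It assumes that higher biomarker values indicate disease and that a subject is called positive when the value is at or above the cut-point; the function names `roc_points` and `trapezoid_auc` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def roc_points(values, diseased):
    """Return (fpr, tpr), i.e. 1 - specificity and sensitivity, over all cut-points.

    Assumption: a subject is called positive when value >= cut-point.
    """
    values = np.asarray(values, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    # Start at +inf so the curve begins at (0, 0), then lower the cut-point.
    cuts = np.concatenate(([np.inf], np.unique(values)[::-1]))
    tpr = np.array([np.mean(values[diseased] >= c) for c in cuts])
    fpr = np.array([np.mean(values[~diseased] >= c) for c in cuts])
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data, invented for illustration only (0 = healthy, 1 = diseased).
values = [1, 2, 3, 5, 4, 6, 7, 8, 9]
diseased = [0, 0, 0, 0, 1, 1, 1, 1, 1]

fpr, tpr = roc_points(values, diseased)
auc = trapezoid_auc(fpr, tpr)
print(f"AUC = {auc:.3f}")
```

The curve runs from (0, 0) at an infinitely high cut-point to (1, 1) at the lowest observed value, so an AUC close to 1 reflects nearly complete separation of the two groups.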
ROC analysis provides two main outcomes: the diagnostic accuracy of the test and the optimal cut-point value for the test. A cut-point dichotomizes the test values and thereby yields the diagnosis (diseased or not). Identifying the cut-point value requires a simultaneous assessment of sensitivity and specificity [3]. A cut-point is referred to as optimal when it classifies most of the individuals correctly [4, 5].
AUC, sensitivity, and specificity values are useful for evaluating a marker; however, they do not specify an “optimal” cut-point directly. The literature contains many approaches that use both sensitivity and specificity for cut-point selection [4–9]. One of the commonly used methods is the Youden index (J) method [5]. It defines the optimal cut-point as the point maximizing the Youden function, which is the difference between the true positive rate and the false positive rate, over all possible cut-point values [6, 7]. Another approach, the point closest to the (0,1) corner in the ROC plane (ER), defines the optimal cut-point as the point minimizing the Euclidean distance between the ROC curve and the (0,1) point [4]. A third approach is based on the maximum achievable value of the chi-square statistic (minP), which is derived from the cross-tabulations of true disease status against the binary variables obtained by dichotomizing the biomarker at each possible cut-point value [8]. A more recent approach, proposed by Liu [9], defines the optimal cut-point as the point maximizing the product of sensitivity and specificity (CZ). Other studies have compared optimality metrics derived from sensitivity, specificity, agreement, and distance [10, 11]; they generally recommend that researchers select the one that is most clinically relevant.
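The J, ER, and CZ criteria described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited works: it assumes that higher biomarker values indicate disease, evaluates every observed value as a candidate cut-point, and resolves ties by taking the smallest cut-point. The minP criterion is omitted for brevity. The function name `classical_cut_points` and the toy data are hypothetical.

```python
import numpy as np

def classical_cut_points(values, diseased):
    """Cut-points chosen by the Youden index (J), closest-to-(0,1) (ER), and CZ.

    Assumption: a subject is called positive when value >= cut-point.
    """
    values = np.asarray(values, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    cuts = np.unique(values)  # every observed value is a candidate

    # Sensitivity and specificity at each candidate cut-point.
    se = np.array([np.mean(values[diseased] >= c) for c in cuts])
    sp = np.array([np.mean(values[~diseased] < c) for c in cuts])

    youden = se + sp - 1.0                      # TPR - FPR
    er = np.sqrt((1.0 - se) ** 2 + (1.0 - sp) ** 2)  # distance to (0, 1)
    cz = se * sp                                # product of Se and Sp

    # argmax/argmin return the first optimum, i.e. the smallest cut-point on ties.
    return {
        "J": float(cuts[np.argmax(youden)]),
        "ER": float(cuts[np.argmin(er)]),
        "CZ": float(cuts[np.argmax(cz)]),
    }

# Toy data, invented for illustration only (0 = healthy, 1 = diseased).
values = [1, 2, 3, 5, 4, 6, 7, 8, 9]
diseased = [0, 0, 0, 0, 1, 1, 1, 1, 1]

result = classical_cut_points(values, diseased)
print(result)
```

On well-separated data the three criteria tend to agree; they can diverge when the ROC curve is asymmetric, which is the situation the comparison studies address.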
In this study, a new approach is proposed for identifying the optimal cut-point value in ROC analysis. The approach is based on the area under the ROC curve (AUC), sensitivity, and specificity values. It defines the optimal cut-point as the point minimizing the sum of the absolute differences between sensitivity and AUC and between specificity and AUC, provided that the difference between sensitivity and specificity is minimum.
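In symbols, the criterion stated above is |Se(c) − AUC| + |Sp(c) − AUC|, minimized over the candidate cut-points c. The Python sketch below is one possible reading of that definition, not the authors' implementation. It makes three assumptions: higher biomarker values indicate disease, the AUC is estimated empirically from all diseased/healthy pairs, and the condition on the difference between sensitivity and specificity is applied as a tie-breaker among cut-points with equal criterion values. The function names and toy data are hypothetical.

```python
import numpy as np

def empirical_auc(values, diseased):
    """Share of (diseased, healthy) pairs ranked correctly; ties count one half."""
    values = np.asarray(values, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    diff = values[diseased][:, None] - values[~diseased][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def auc_based_cut_point(values, diseased):
    """Cut-point minimizing |Se - AUC| + |Sp - AUC|.

    Assumptions: positive means value >= cut-point, and |Se - Sp| is used
    only to break ties between equally good cut-points.
    """
    values = np.asarray(values, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    auc = empirical_auc(values, diseased)
    cuts = np.unique(values)

    se = np.array([np.mean(values[diseased] >= c) for c in cuts])
    sp = np.array([np.mean(values[~diseased] < c) for c in cuts])

    criterion = np.abs(se - auc) + np.abs(sp - auc)
    # lexsort orders by the LAST key first: criterion, then |Se - Sp|.
    best = np.lexsort((np.abs(se - sp), criterion))[0]
    return float(cuts[best]), auc, float(criterion[best])

# Toy data, invented for illustration only (0 = healthy, 1 = diseased).
values = [1, 2, 3, 5, 4, 6, 7, 8, 9]
diseased = [0, 0, 0, 0, 1, 1, 1, 1, 1]

cut, auc, crit = auc_based_cut_point(values, diseased)
print(f"cut-point = {cut}, AUC = {auc:.3f}, criterion = {crit:.3f}")
```

Because the criterion pulls both sensitivity and specificity toward the same target value, it tends to favor cut-points at which the two are balanced.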
In the following section, the background methodologies of the previous methods are summarized first, and then the proposed method is introduced. In Section 3, the performance of the previous methods is compared with that of the proposed one by using data simulated under normal and gamma distribution models for the biomarker. In Section 4, using data from a real-world study of heart-failure patients [12], the cut-points for pulse pressure, plasma sodium, LVEF, and heart rate in the prediction of mortality are calculated by applying the proposed and the previous methods. Finally, conclusions are given in Section 5.