Receiver Operator Characteristics (ROC) Curve Analysis

In Biology things are rarely absolute. The molecules of life are seldom either fully present or completely absent. Rather, those molecules operate over a range of concentrations, which differs from one individual to another. The concentration also varies within the same individual over time; after a meal rich in carbohydrates, the blood sugar concentration increases and that variation is still within normality. Thus, rather than a single normal value, there is a range of normal concentrations, and we say that a person has a “normal” value if it falls within that range.

If we plot the concentration of a given blood molecule, say sugar, measured in a group of people, we obtain the familiar bell-shaped curve (Gaussian distribution) shown in Figure 1.

Normal distribution

Figure 1

This curve simply reflects the fact that the points tend to cluster around a Mean or Average, at the top of the curve (in Figure 1, the Mean is 2). The farther we go towards either side, the fewer the points, meaning that the probability of finding normal cases with very high or very low values is reduced.

A similar curve is obtained if instead of blood sugar we were considering blood RECAF, the cancer marker upon which we base our technology. Cancer markers are molecules that are produced in much higher amounts in cancer cells than in normal cells. However, there is a small amount produced by normal cells which is multiplied by the trillions of cells in an adult human and therefore there is a basal, low level of RECAF in the blood of normal people. This, by the way, also happens for other cancer markers, such as CEA for colon, CA125 for ovary, or PSA for prostate.

If we plotted the normal values of RECAF or any other of these cancer markers (the example also works for blood sugar), we would also find the bell-shaped curve shown above. Cancer patients, who have a higher blood RECAF value, would also exhibit a bell-shaped curve (see note 1), but shifted towards higher values as shown in Figure 2.

Overlapping ROCS

Figure 2 (points are fictitious)

Are these two curves really different? As you can see, there is an overlap in the middle, and a cancer patient with a value of, say, 2.9 units could very well be considered normal. Such a test could still be adequate if the overlap is not too big. How do we determine when it is too big? To answer the question, there is a statistical test called the “independent t-test” that compares the means of the two groups. It assumes a Gaussian distribution of the points and calculates the chance that a difference as large as the one observed would arise just at random (see note 2). If that chance (expressed using the letter p, for probability) is less than 5% (p<0.05), then in medicine (but not in a parachute factory!) the difference is considered significant.
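As a rough illustration of what the t-test computes, the sketch below works out the t statistic for two small groups. The marker values are invented for the example, the function name is hypothetical, and the pooled-variance form is assumed because the text refers to an independent t-test. Turning the statistic into a p value requires the t distribution (statistical packages such as SciPy's scipy.stats.ttest_ind do this), so only the statistic and its degrees of freedom are computed here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's independent two-sample t statistic (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance: each group is weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_b) - mean(group_a)) / standard_error
    return t, n_a + n_b - 2  # statistic and degrees of freedom

# Fictitious marker values, invented for illustration only.
normal = [1.6, 1.8, 2.0, 2.0, 2.2, 2.4]
cancer = [2.9, 3.3, 3.5, 3.5, 3.7, 4.1]

t_value, dof = pooled_t_statistic(normal, cancer)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

With 10 degrees of freedom, a two-sided p<0.05 corresponds to an absolute t value above roughly 2.23, so these two invented groups would be reported as significantly different.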

Unfortunately, a t-test is limited to establishing whether or not two populations (in our case, normal vs. cancer patients) are different. It does not say much about how good the discrimination is, and it does not tell us where the best cutoff value lies, i.e. the point above which patients are considered as having cancer (represented by the arrow in Figure 2). To do that, we use a procedure called ROC analysis.

The principle of a ROC analysis is simple. We take all the values (for both normal and cancer samples) and look at the range they cover (0.4 to 5.0 in Figure 2). Then we divide that range into, say, 20 “thresholds” or, as they are properly called, cutoff values. For example, between 0.4 and 5.0 we could consider 20 cutoff values, each 0.23 units higher than its predecessor. Next we choose the first cutoff value (in the example, 0.4 + 0.23 = 0.63) and count the patients that we know have cancer and are above 0.63. Those are true positives (TP) for that cutoff value, because they have cancer and they test positive. We also count the normal people below 0.63. Those are true negatives (TN), because we know they do not have cancer and they test negative at that cutoff value. We might also have known cancer patients below 0.63; those are false negatives (FN). Finally, we might have normals that test higher than 0.63; those are false positives (FP).

With those four counts we can calculate the Sensitivity and the Specificity of the test for each cutoff value. The Sensitivity is the fraction of all cancer samples (detected, TP, plus not detected, FN) that the test catches: Sensitivity = TP/(TP+FN). The Specificity is the fraction of known normal samples that test negative: Specificity = TN/(TN+FP).
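The counting procedure can be sketched in a few lines of Python. The sample values and function names below are made up for illustration, and the sketch assumes that a value exactly equal to the cutoff counts as a negative result, a detail the text does not specify.

```python
def confusion_counts(normal_values, cancer_values, cutoff):
    """Count TP, TN, FP and FN for one cutoff (values above it test positive)."""
    tp = sum(v > cutoff for v in cancer_values)   # cancers that test positive
    fn = len(cancer_values) - tp                  # cancers that are missed
    fp = sum(v > cutoff for v in normal_values)   # normals that test positive
    tn = len(normal_values) - fp                  # normals that test negative
    return tp, tn, fp, fn

def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Fictitious marker values, invented for illustration only.
normal = [0.4, 0.9, 1.5, 2.0, 2.1, 2.6, 3.0]
cancer = [2.9, 3.2, 3.8, 4.1, 4.6, 5.0]

counts = confusion_counts(normal, cancer, cutoff=2.95)
sensitivity, specificity = sensitivity_specificity(*counts)
print(counts, round(sensitivity, 3), round(specificity, 3))
```

Repeating the same call for every cutoff value yields one row of the results table per cutoff.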

Thus, we can easily calculate the Sensitivity and Specificity of the assay for each cutoff value. The data is usually represented in a table such as the one below:

Cutoff   Sensitivity   Specificity   TP    TN    FP    FN
2.5      100.0%        1.0%          162   1     102   0
3.1      100.0%        14.6%         162   15    88    0
3.6      100.0%        27.2%         162   28    75    0
3.8      100.0%        43.7%         162   45    58    0
4.5      97.5%         86.4%         158   89    14    4
5.0      95.7%         94.2%         155   97    6     7
5.5      88.3%         100.0%        143   103   0     19
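The percentages in the table follow directly from the four counts in each row. As a quick check, the short sketch below recomputes them from the TP, TN, FP and FN columns using the two formulas given earlier; the variable names are arbitrary.

```python
# (cutoff, TP, TN, FP, FN) copied from the table above.
rows = [
    (2.5, 162, 1, 102, 0),
    (3.1, 162, 15, 88, 0),
    (3.6, 162, 28, 75, 0),
    (3.8, 162, 45, 58, 0),
    (4.5, 158, 89, 14, 4),
    (5.0, 155, 97, 6, 7),
    (5.5, 143, 103, 0, 19),
]

recomputed = []
for cutoff, tp, tn, fp, fn in rows:
    sensitivity = round(100 * tp / (tp + fn), 1)  # percent of cancers detected
    specificity = round(100 * tn / (tn + fp), 1)  # percent of normals testing negative
    recomputed.append((cutoff, sensitivity, specificity))
    print(f"{cutoff:>4}  {sensitivity:>5}%  {specificity:>5}%")
```

Every row adds up to the same 162 cancer samples (TP + FN) and 103 normal samples (TN + FP), which is what one expects when a single data set is evaluated at several cutoffs.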

Please note that as the cutoff value increases, so does the Specificity, at the expense of the Sensitivity, which moves the opposite way. In practice, the cutoff values are not selected at equal intervals but taken from the actual points in the data set. This gives a more accurate description of the cutoff value for a given set of data, but it also means that the cutoff values differ slightly from one data set to another. It is then simple to select the best cutoff value, above which we shall say a person tested positive: all we have to do is find an acceptable combination of Sensitivity and Specificity and use the corresponding value in the cutoff column.
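What counts as an “acceptable” combination depends on the application, so any automatic rule is a judgment call. The sketch below shows one possible rule, chosen here purely for illustration: require a minimum Sensitivity and, among the cutoffs that meet it, take the one with the highest Specificity. The function name and the rule itself are assumptions, not part of the method described above.

```python
# (cutoff, sensitivity %, specificity %) copied from the table above.
ROWS = [
    (2.5, 100.0, 1.0),
    (3.1, 100.0, 14.6),
    (3.6, 100.0, 27.2),
    (3.8, 100.0, 43.7),
    (4.5, 97.5, 86.4),
    (5.0, 95.7, 94.2),
    (5.5, 88.3, 100.0),
]

def pick_cutoff(rows, min_sensitivity):
    """Return the cutoff with the best specificity among rows meeting the sensitivity floor."""
    acceptable = [row for row in rows if row[1] >= min_sensitivity]
    if not acceptable:
        return None  # no cutoff in the table is sensitive enough
    return max(acceptable, key=lambda row: row[2])[0]

cutoff_for_95 = pick_cutoff(ROWS, min_sensitivity=95.0)
cutoff_for_100 = pick_cutoff(ROWS, min_sensitivity=100.0)
print(cutoff_for_95, cutoff_for_100)
```

Demanding that no cancer be missed pushes the cutoff down to 3.8 and costs more than half of the Specificity, whereas accepting 95% Sensitivity allows a cutoff of 5.0.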

It does not escape the attention of the reader that the larger the number of samples, the more accurate the cutoff value and the sensitivity and specificity results.

Since the tables are usually very large and the numbers difficult to interpret at a glance, the results are usually plotted as a ROC (Receiver Operator Characteristics) curve. In a ROC curve, the Sensitivity at each cutoff value is plotted against the Specificity, except that, for visual purposes, the 100% (or 1) Specificity value is placed on the left and 0% on the right; in other words, the horizontal axis shows 1 - Specificity. We then get a curve such as the one below.
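To make the construction concrete, the sketch below turns two small sets of values into the points of a ROC curve, using every observed value as a cutoff. The data are fictitious and the function name is hypothetical; the horizontal coordinate is 1 - Specificity, which equals FP/(FP+TN), and values equal to the cutoff are assumed to test negative.

```python
def roc_points(normal_values, cancer_values):
    """Return (1 - specificity, sensitivity) pairs, one per observed cutoff."""
    cutoffs = sorted(set(normal_values) | set(cancer_values))
    # A cutoff below every value makes everyone test positive: top-right corner.
    points = [(1.0, 1.0)]
    for cutoff in cutoffs:
        tp = sum(v > cutoff for v in cancer_values)
        fp = sum(v > cutoff for v in normal_values)
        points.append((fp / len(normal_values), tp / len(cancer_values)))
    return sorted(points)  # left to right, ready for plotting

# Fictitious marker values, invented for illustration only.
normal = [1.0, 2.0, 3.0, 3.6]
cancer = [2.5, 3.5, 4.0, 4.5]

points = roc_points(normal, cancer)
for x, y in points:
    print(f"1 - Specificity = {x:.2f}   Sensitivity = {y:.2f}")
```

Plotting these pairs with any charting tool, x against y, gives the staircase-shaped curve typical of small data sets.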

ROC curve

Figure 3

Please note that a complete lack of discrimination between cancer and non-cancer results in a diagonal line (represented in gray). On the other hand, if the assay discriminated perfectly well (100% Sensitivity with 100% Specificity), then the curve would go up parallel to the vertical axis until it reached 100% (or 1) and then horizontally until the 100% value in the horizontal axis is reached. This is represented in light blue.

The actual curve splits the plot area in two and the area under the curve (AUC) can then be as low as 0.5 or as high as 1.0 of the total plotting area. Thus, the AUC is a measure of the quality of the test.
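One common way to obtain the AUC from the plotted points is the trapezoidal rule, sketched below. The function name is hypothetical, and the two test curves are the limiting cases described above: the diagonal of a test with no discrimination and the corner-hugging curve of a perfect test.

```python
def area_under_curve(points):
    """Trapezoidal area under a ROC curve given (1 - specificity, sensitivity) pairs."""
    ordered = sorted(points)
    area = 0.0
    # Add the area of the trapezoid between each pair of consecutive points.
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

no_discrimination = [(0.0, 0.0), (1.0, 1.0)]            # the gray diagonal
perfect_test = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]     # straight up, then across

auc_diagonal = area_under_curve(no_discrimination)
auc_perfect = area_under_curve(perfect_test)
print(auc_diagonal, auc_perfect)
```

Real assays fall somewhere between these two extremes.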

Now, by looking at the curve one can tell how good the discrimination is and by looking at the x and y values at the point of inflexion (if any), one gets a rough idea of the Specificity and Sensitivity of the test. Figure 4 shows an actual ROC curve for prostate cancer using PSA in its different flavors and Figure 5, the equivalent for prostate cancer obtained with RECAF. The AUC for PSA = 0.68 and for RECAF = 0.93.

PSA ROC

Figure 4

As we mentioned above, any combination of Sensitivity and Specificity can be read off simply by tracing lines from the axes to the curve. For example, if for PSA we trace a horizontal line crossing the y axis (Sensitivity) at 0.8 (80%) and, from the point where that line hits the curve, trace a vertical line, it will cross the x axis at approximately 0.65. Since x = 1 - Specificity, the Specificity is 1 - 0.65 = 0.35, or 35%, for a Sensitivity of 80%.

RECAF ROC

Figure 5

The curve is very different for the RECAF marker: It climbs upwards very close to the y axis and at approximately 85% Sensitivity, it goes almost horizontally to the right. Tracing the lines as described above, one gets 85% Sensitivity at 95-100% Specificity. 


1 In certain cases, the distribution might not be Gaussian and might require a transformation, for example to a logarithm, prior to statistical analysis.

2 To visualize that, imagine a group of test tubes with blood coming only from the normal individuals in the left-side curve. A blindfolded man separates the samples into two groups, A and B, and sends the two sets to a lab for testing. One would expect the laboratory to conclude that the values in groups A and B are more or less the same. However, there is a small possibility that the blindfolded man placed, just by chance, most of the high values in group A and most of the low values in group B. In such a case, the lab would report a significant difference between the groups, labeling group A as cancer and group B as normal. With an independent t-test we can calculate the probability (p) of that happening; in other words, how often a blindfolded person separating the samples at random would produce a difference that large. This explanation is relevant because it makes it obvious that the more samples our blindfolded subject has to sort, the lower the chance of stacking the results in this way. The t-test takes the number of samples into consideration when it calculates that probability, so if a real difference exists, increasing the number of samples usually makes it even more significant. Here is where common sense comes into play: if a t-test shows that possibility to be 1:100,000 with 30 samples, more samples might bring it to 1:1,000,000, but what does it matter in practical terms? 1:100,000 is very good already. In statistics, the words “the probability that the difference is due to chance” are replaced by the letter p (for probability). In Biology and Medicine, values of p lower than 5% (p<0.05) are considered significant. Loosely speaking, since p represents the chance of being “wrong” in the assumption that the two groups are different, p=0.05 is commonly read as being 95% certain that the assumption is right.