
[Image: ferning (arborization) pattern of dried amniotic fluid under the microscope]

The diagnosis of ruptured membranes in pregnancy is clinically very important. Decisions about delivery, hospitalization, and even termination of pregnancy often depend on getting this diagnosis right. Knowing how to diagnose ruptured membranes is fundamental to the basic practice of obstetrics; but understanding the clinical reasoning and statistics that underlie that diagnosis is fundamental to all of clinical medicine, and may be the most important skill a clinician can have!

Walking through the backbone of the clinical reasoning involved in diagnosing ruptured membranes is a simple, efficient way to demonstrate Bayesian inference and its implications for clinical medicine.

Numerous cognitive errors and distortions lead to mistakes in the diagnosis and management of patients. But there is no bigger problem than the non-intuitive nature of statistics, a poor understanding of Bayesian probability, and the resulting misapplication of clinical data.

First, let’s look at tests that are commonly available for the diagnosis of ruptured membranes:

  • History. Yes, history is a type of test; and, as we will see, it’s necessary to use history to determine the pretest probability for our other clinical exams and tests.
  • Nitrazine or pH test. Amniotic fluid is more basic (has a higher pH) than normal vaginal secretions, so the nitrazine test is a quick screen of the vaginal pH. Amniotic fluid will turn the test positive, but so might blood, semen, urine, and some vaginal infections.
  • Ferning test. This is a microscopic test of amniotic fluid that has been allowed to dry on a slide. The picture at the top of the post is an example of the arborization or ferning of NaCl crystals in amniotic fluid. False positives are commonly due to fingerprints on the slide.
  • Vaginal pooling. Placing a speculum in the vagina (or sometimes just seeing fluid overflow out of the vagina) is fairly specific for ruptured membranes, but it might be due to other things, like vaginal infections, urine, or vaginal transudate.

To use these tests, we first need to know how good they are. We can understand this by knowing the Sensitivity and the Specificity (in fact, you shouldn’t use any test or exam without having at least a general knowledge of these test parameters).

  • Sensitivity (or the True Positive Rate) is the percent of patients who have the disease who will test positive.
  • Specificity (or the True Negative Rate) is the percent of patients who don’t have the disease who will test negative. (Both definitions are illustrated in the short sketch after this list.)
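If it helps to see those definitions in code, here is a minimal Python sketch. The counts are made up for illustration (chosen to roughly match the nitrazine numbers in the table below); they are not from any particular study.

```python
# Hypothetical validation study of a test, with made-up counts for illustration
true_positives = 93    # patients with the disease who test positive
false_negatives = 7    # patients with the disease who test negative
true_negatives = 83    # patients without the disease who test negative
false_positives = 17   # patients without the disease who test positive

sensitivity = true_positives / (true_positives + false_negatives)   # 0.93 -> true positive rate
specificity = true_negatives / (true_negatives + false_positives)   # 0.83 -> true negative rate
print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```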

Here are some numbers for these particular tests:

| Test | Sensitivity (%) | Specificity (%) |
| --- | --- | --- |
| Nitrazine | 93 | 83 |
| Ferning (in labor) | 98.0 | 88.2 |
| Ferning (not in labor) | 51.4 | 70.8 |
| Pooling | Low | High |

Knowing this, let’s ask a simple question: If a woman presents with questionable leakage of fluid and you perform a ferning test that is positive, what is the chance that she actually has ruptured membranes? Unfortunately, most clinicians are tempted to answer either 88.2% or 70.8%, depending on whether she is in labor or not, since this is the “specificity.” In other words, many clinicians treat the specificity of a test as roughly equivalent to its accuracy or predictive value. But nothing could be further from the truth. Recall that specificity is actually the true negative rate.

The actual answer to the question is: we need more information. Neither sensitivity nor specificity reflects the prevalence of the disease. The question we really want answered is, What is the post-test likelihood that the patient’s membranes are ruptured? This is also known as the positive predictive value (PPV); its counterpart for a negative result is the negative predictive value (NPV). We don’t know enough yet to determine either one.

In simplest terms, the PPV = number of true positives/(number of true positives + number of false positives).

So what’s a true positive (TP)? It’s the sensitivity (the true positive rate) x the prevalence (the proportion of the population having the disease). And what’s a false positive (FP)? It’s the analogous quantity built from the specificity: the false positive rate (1-spec) x the proportion of the population without the disease, so FP = (1-spec) x (1-prev).

Therefore, PPV = TP/(TP+FP) = (sens x prev) / [(sens x prev) + (1-spec) x (1-prev)].
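As a minimal sketch of these formulas in Python (the function names ppv and npv below are purely illustrative, not from any library):

```python
def ppv(sens, spec, prev):
    """Positive predictive value: chance a positive result is a true positive."""
    true_pos = sens * prev                 # has the disease AND tests positive
    false_pos = (1 - spec) * (1 - prev)    # disease-free AND tests positive
    return true_pos / (true_pos + false_pos)

def npv(sens, spec, prev):
    """Negative predictive value: chance a negative result is a true negative."""
    true_neg = spec * (1 - prev)           # disease-free AND tests negative
    false_neg = (1 - sens) * prev          # has the disease AND tests negative
    return true_neg / (true_neg + false_neg)
```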

What we are missing from this formula is the prevalence of the disease. How do we know the prevalence?

Well, prevalence is easy if we are talking about a well-defined population for a screening test like a mammogram; for example, the prevalence of breast cancer in asymptomatic white women in their 40s. We can look that prevalence up in a study if we are screening an asymptomatic white woman in her 40s.

But usually in clinical medicine, we are not talking about a well-defined population. We are instead considering a patient who presents with symptoms. The prevalence of ruptured membranes depends on the patient’s history! It would be silly to use the prevalence of ruptured membranes among all pregnant women as if they were one general population. So what we need to do is determine the pretest probability of ruptured membranes in the individual patient before us, based on history and other clinical information, including risk factors.

Imagine that Patient #1 presents at 22 weeks’ gestation after noting a small 3 cm wet spot in her underwear when she got home from exercising. There has been no subsequent leakage. She has no contractions, bleeding, or cramping.

Patient #2 presents at 41 weeks’ gestation and reports a large gush of fluid while shopping for pickles, with fluid running down into her shoes. She reports continued leakage and is now contracting painfully every 3 minutes. She also reports that some of the fluid she has seen is green, looking very much like meconium. She is 8 cm dilated.

Is there a difference in the pretest probability of ruptured membranes in these two patients? Absolutely. This is the whole point of clinical medicine. It is the purpose of the history (and, in many cases, the physical). We don’t know the exact pretest probability of ruptured membranes in these two patients; but, we know that the probability of ruptured membranes in Patient #1 is very low, say 5% or less, and we know that the probability of ruptured membranes in the second patient is very high, at least 95%.

Now we can plug this information into the formula and determine the positive (and negative) predictive values of one of our tests applied to each individual patient. Let’s say that we do a nitrazine test on both patients – the same test, performed in the same way, by the same doctor, on the same day. Each test has the exact same performance characteristics, determined by its sensitivity and specificity. If the test comes back positive in both cases, what is the positive predictive value in each case? (There are calculators to do this for you, so don’t worry about the formula.)

  • Patient #1: PPV = 22.4%
  • Patient #2: PPV = 99.05%

The chance that a positive result is actually a false positive is simply 100%-PPV, or 77.6% and 0.95% respectively.

What if the test were to come back negative in each case? What would be the negative predictive value (NPV)?

  • Patient #1: NPV = 99.56%
  • Patient #2: NPV = 38.4%

The chance that a negative result is actually a false negative is 100%-NPV, or 0.44% and 61.6% respectively.
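For anyone who wants to check these numbers, the hypothetical ppv and npv functions sketched earlier reproduce them using the nitrazine test characteristics and the assumed pretest probabilities of 5% and 95%:

```python
sens, spec = 0.93, 0.83        # nitrazine sensitivity and specificity
print(ppv(sens, spec, 0.05))   # ~0.224  -> PPV 22.4%  (Patient #1)
print(ppv(sens, spec, 0.95))   # ~0.9905 -> PPV 99.05% (Patient #2)
print(npv(sens, spec, 0.05))   # ~0.9956 -> NPV 99.56% (Patient #1)
print(npv(sens, spec, 0.95))   # ~0.384  -> NPV 38.4%  (Patient #2)
```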

All of this makes sense intuitively. In fact, Bayesian inference is how we naturally interpret the world. In the case of Patient #1, if the test had come back positive, we would assume it was likely a false positive and not trust the result. And in the case of Patient #2, if the test had come back negative, we would assume that it was a false negative. The statistics bear this out, showing that our assumptions are statistically, or probabilistically, correct.

It should be clear that you cannot interpret tests – any test – without knowing the prevalence of the disease or the pretest probability of the disease based on clinical information, including risk factors. In fact, this is the whole reason we learn about the risk factors of particular conditions. For screening tests, we use the prevalence of the disease in the population being screened. For symptomatic patients, we use our clinical knowledge and the likelihood of the items on our differential diagnosis to inform the pretest probability. This, in turn, combined with the performance characteristics of the test (the sensitivity and specificity), determines the positive and negative predictive values of the test, which is the knowledge we need to actually utilize the test.

In the diagnostic process, many tests are often used; as we learn new information we change our understanding of the probability of the disease in which we are interested. It’s a continuous, dynamic process that is changing with each new fact we learn or each new test result.

Another important point to observe is that patients can be harmed by both false negative and false positive tests. Patient #1 might be harmed by a false positive exam and Patient #2 might be harmed by a false negative exam. In reality, you would intuitively distrust either of those false results and would treat the patient without regard to the test result. This raises the question, Why order the test in the first place? The answer is: you should not. Tests are meant for patients with indeterminate probabilities, not for patients with extremely low or extremely high probabilities. If you are going to treat the patient regardless of the test outcome, don’t order the test in the first place unless it is important to gain confirmation.

Tests are not simply “positive” or “negative.” They are only useful for updating our probability estimates, and at some point we have to gain enough confidence, statistically, to either treat for a condition or not treat. But we are never certain.

We will talk another day about Bayes’ theorem and the ideas of Bayesian inference. Understanding these ideas well will forever change your view of knowledge and the world around us. But for now, one more clinical example to illustrate the principles we have learned.

Cell-free DNA (cfDNA) is a revolutionary new test used to detect fetal aneuploidies, like Down Syndrome. It directly tests fetal DNA in the maternal blood and has remarkable performance characteristics. Sensitivities and specificities vary by study, but, for Down Syndrome, the sensitivity is about 99.5% and the specificity is about 99%. This is not only an excellent test; these performance characteristics are extraordinary. Almost no test in any field of medicine functions this well. So how does this extraordinary test actually perform? Let’s again imagine two patients. The first is 20 years old and the second is 40 years old. We can look up the risk of Down Syndrome for those ages in a table (about 1/1000 and 1/70, respectively).

Now each patient is sitting in front of you with a positive test result for Down Syndrome; again, the same test, performed on the same day, analyzed by the same lab. What do you tell them? Remember, this is one of the best tests in medicine. Each patient is considering terminating the pregnancy in the event of Down Syndrome. So what’s the PPV? Here are the results, with a quick arithmetic check after the list:

  • 20-year-old patient: PPV = 9%
  • 40-year-old patient: PPV = 59%
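A quick check with the same hypothetical ppv function from earlier shows where these figures come from; the only thing that differs between the two patients is the pretest probability:

```python
sens, spec = 0.995, 0.99          # cfDNA for Down Syndrome (approximate)
print(ppv(sens, spec, 1 / 1000))  # ~0.09 -> PPV ~9%  (20-year-old, prior ~1/1000)
print(ppv(sens, spec, 1 / 70))    # ~0.59 -> PPV ~59% (40-year-old, prior ~1/70)
```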

This result is very much nonintuitive. Most obstetricians are shocked by these results. Most obstetricians would recommend termination for the 40-year-old patient without further testing (like CVS or amniocentesis) and many, in fact, would recommend termination for the 20-year-old. If this is one of the best tests in medicine, imagine the issues with other tests that perform far worse, like CT scans, CBCs, TSHs, X-rays, and even biopsies. We will talk about some of these in later posts.

If you would like to play around with these ideas for other tests, check out getthediagnosis.org.