Uncontrolled variables are a huge problem with every scientific study. When we compare two blood pressure medications to one another, the results will have more validity if the two groups of patients are very similar in every characteristic. Differences in age, severity of preexisting disease, ethnicity, gender, etc. could make whatever conclusions are found next to worthless. Randomization and blinding are our best available tools to control for variables – both those we recognize and those we don’t – as well as for controlling for biases that we are aware of and those we haven’t even imagined.
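To make the point about randomization concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the age distribution, the 80/20 steering rule, and the function names are assumptions chosen for illustration, not data from any trial. It compares how far apart the two groups end up in average age when assignment is a coin flip versus when older patients are preferentially steered toward one drug.

```python
import random
from statistics import mean


def age_gap(assign, n_patients=2000, seed=0):
    """Return |mean age on drug A - mean age on drug B| under an assignment rule."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for _ in range(n_patients):
        age = rng.gauss(60, 12)  # hypothetical age distribution
        groups[assign(age, rng)].append(age)
    return abs(mean(groups["A"]) - mean(groups["B"]))


def randomized(age, rng):
    # Coin flip: the assignment ignores every patient characteristic.
    return "A" if rng.random() < 0.5 else "B"


def clinician_choice(age, rng):
    # Assumed bias: older patients are steered toward drug A 80% of the time.
    p_a = 0.8 if age > 60 else 0.2
    return "A" if rng.random() < p_a else "B"


if __name__ == "__main__":
    print("age gap, randomized:      ", round(age_gap(randomized), 2))
    print("age gap, clinician choice:", round(age_gap(clinician_choice), 2))
```

Under these assumptions the coin flip leaves the groups nearly identical in age, while the steering rule separates them by many years. The same balancing applies to characteristics nobody thought to measure, which is the real value of randomization.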
Yet studies comparing one drug to another – fraught as they are with complexity and unknowns – are still incredibly simple when compared to studies that compare surgical techniques to one another. Not only are randomization and blinding major issues, but the individual skill of the operator (which can’t be controlled for) may turn out to be the biggest variable, and there are serious concerns about the general applicability of the findings of any surgical technique study.
Imagine we were comparing the quality of kitchen cabinets built with two different machines, say two different types of wood shapers. We select a variety of outcomes to follow. Some are subjective, like beauty, sales price, and machine operator satisfaction (compare to pain scores, cosmesis ratings, or surgeon satisfaction), while others are objective, like time to manufacture, material wastage, and cost (compare to operating room time, complication rate/blood loss, and cost of procedure). The tricky bit comes next. In order to produce a higher quality study, we would like to track several thousand cabinets of different sizes and styles, and we would like to do so in a short amount of time.
We could find two different factories that already use the two different methods and look at their outcomes, but that’s not very satisfying. There are many hundreds of other important differences between the two factories that might actually be responsible for the different outcomes, and adjusting for even the known variables is an almost impossible task, let alone the unknown variables. So we can either have one factory use one method and then switch to the other, have different factories that use a third method switch to the new methods, or try some other mishmash. What’s worse, individual machine operators within each factory may have vastly different levels of skill and expertise.
A very competent and seasoned machine operator may get excellent results out of both machines, while I (having never used either) may get bollocks from both. Or, someone very competent with one machine may struggle adapting to the other machine and vice versa. A new method or machine may have a steep learning curve, and learning curves vary widely. Someone naive to both machines may still have transferable skill from another similar process with which he has competency. A group of naive operators may produce better results overall with the machine with a shorter learning curve, even though in the long term (after the study has ended), the other machine is vastly superior. What’s more, the folks who design and implement the study may favor one machine over another (perhaps their company makes that machine) and when they teach operators how to use the two machines, they are simply better at teaching their own machine (because they have more experience with it, more familiarity, truly believe in it, etc.).
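The learning-curve trap can be sketched numerically. The exponential curve and every parameter below (starting quality, plateau, and the time constant `tau`) are made-up assumptions for illustration only. The sketch shows how a short study of novice operators can favor the machine that is easier to learn even when the other machine has the higher long-run ceiling.

```python
import math


def quality(case_number, start, plateau, tau):
    """Exponential learning curve: quality rises from `start` toward `plateau`."""
    return plateau - (plateau - start) * math.exp(-case_number / tau)


# Hypothetical machines: A is quick to learn but has a lower ceiling,
# B is slow to learn but ultimately better.
MACHINE_A = dict(start=40, plateau=80, tau=20)
MACHINE_B = dict(start=30, plateau=95, tau=200)


def study_mean(machine, n_cases=50):
    """Average quality over the first n_cases a novice produces during the study."""
    return sum(quality(n, **machine) for n in range(n_cases)) / n_cases


if __name__ == "__main__":
    print("study mean, machine A:", round(study_mean(MACHINE_A), 1))
    print("study mean, machine B:", round(study_mean(MACHINE_B), 1))
    print("case 1000, machine A: ", round(quality(1000, **MACHINE_A), 1))
    print("case 1000, machine B: ", round(quality(1000, **MACHINE_B), 1))
```

With these numbers, a 50-case study would declare machine A the clear winner, yet an operator who has done a thousand cases does better on machine B. A study that ends before operators reach their plateau is measuring the learning curve, not the machine.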
Such are the problems with surgical technique/tool studies. It should suffice to say that an excellent surgeon is likely to produce better results with the worst technique on most days than a bad surgeon will produce with the very best techniques on the best days; and, in most cases, a surgeon will do much better with a technique with which she is familiar and accomplished than she will with a new technique with a steep learning curve. It turns out, people truly are the most important variable in such studies.
Randomization is difficult. Usually in such a study, different operators who use different techniques are being compared, so it is difficult to truly randomize the technique to the patient. Blinding is nearly impossible; obviously the surgeon knows what method she is using. These pitfalls automatically tend to land studies about surgical technique near the bottom of the quality-of-evidence pile.
Even if these limitations can be overcome, the general applicability of a technique may be wanting. Just because I’m really, really good at something – and demonstrate it with super-awesome reports of my amazingness – doesn’t mean that everyone else can read a paper or watch a video or attend a weekend workshop and all of a sudden share in those amazing outcomes. A technique may be amazing, but if it is not generalizable to a large population of average-skilled surgeons (and assistants), it doesn’t mean a whole lot.
Take this study for example, which found that laparoscopic hysterectomy (LH) was associated with less pain, less need for pain medicine, and a shorter length of stay than vaginal hysterectomy (VH). This study was heavily promoted by the surgical equipment industry, since it purported to show a definitive advantage of LH over VH. The patients were randomized to receive one of the two surgical approaches and the surgeries were performed by the same team, who presented themselves as adept at both approaches. But were they? Are these findings generally applicable?
For starters, they did not use two techniques of VH that are known to decrease postoperative pain (an energy sealing device for sealing pedicles and intraoperative paracervical blockade). But aside from this, the most telling statistics presented in the paper are the average lengths of stay: 1 day for LH and just over 2 days for VH. This is simply an amazing statistic for VH length of stay in an era where same day discharge for VH is common (I personally have sent hundreds of VHs home within 5 hours of surgery). This bizarre finding tells me that the surgeons were simply more skilled with LH than VH.
What was purported to be a strength of the study (the same surgeons performing both approaches) is actually a weakness when we realize that they are not equally adept at both techniques. The article reports no conflicts of interest, but a simple Google search reveals that the lead author (Ghezzi) has a financial relationship with Karl Storz GmbH & Co. KG, whose products he endorses in this and other articles. Hmm.
Not all great surgeries are generalizable. Here are Part 1 and Part 2 of an awesome straight-stick, laparoscopic extraperitoneal aortic lymph node dissection. This guy is fantastic and if I were a woman with cancer I would let him operate on me. But his skill level and seeming ease in doing a complex surgery are not necessarily teachable to average surgeons. Fun to watch, but I don’t expect the average surgeon to be doing this anytime soon. Sometimes the techniques that get published for certain surgeries or a certain series of patients reflect outcomes and complication rates not attainable by us mere mortals. That’s okay, but it reinforces the idea that sometimes the best surgery is the one we can all do well.
The point of all of this is that when it comes to surgery, the surgeon (in most cases) is by far the most important factor in outcome differences. This excellent piece discusses some research that drives home this point (and shows some cool videos). If you are interested in finding out who the “good” surgeons are, by the way, don’t waste too much time looking. With a few noted exceptions, such transparent data simply isn’t available. Because surgeons fight against such transparency, there is sometimes an idea that we are all interchangeable cogs in a machine; scientific studies need to assume this for standardization, and employers and payers don’t always recognize the importance of high quality physicians.
But the skill of the surgeon is likely the single most important variable of any surgery. Are all board-certified OB/Gyns equal? Of course not. There is a wide variation in competencies and outcomes. Cesarean delivery rates range from around 10% to over 60% among obstetricians in similar communities who are supposedly all following the same evidence-based labor management guidelines. Vaginal hysterectomy rates vary from 0% to 95+% among board-certified gynecologists. Are we all equally competent? Hardly.
The surgical skill of surgeons, like most things in life, tends to fall along a bell-shaped curve. If we want to improve the quality of care provided to a wide variety of our patients, there is only so much that we can do in terms of increasing the surgical skill of surgeons. Residency programs are providing fewer opportunities than ever to develop the surgical skills of our future physicians (too much to learn, too little time to do it, and fewer patients who need complex surgeries). Some skills are being lost to history as the techniques and practice of them are going extinct with a retiring generation of physicians (e.g., breech delivery, Scanzoni maneuver) while other skills are threatened species that exist only in some zoos and preserves (e.g., vaginal hysterectomy, external cephalic version, forceps delivery).
To increase the quality and safety of surgery, we need to address improving the surgical skills and education of our residents and young physicians in practice. But we also need to focus on enabling technologies.
An enabling technology is an innovation or invention that can be used to enhance the ability of a user. The personal computer is an example of an enabling technology. I can do a lot more things today (make movies, edit photos, write this blog, do calculus, make music, search the world’s libraries, etc.) than those who lived a generation before me; and I can do them better, quicker, and cheaper.
Ted Anderson (Vanderbilt University) has described this concept extensively in the field of gynecology. He points out that global endometrial ablation devices (like the NovaSure) are an example of an enabling technology. Ted is an expert in rollerball endometrial ablation. Yet it is unrealistic that he will be able to train the vast majority of gynecologists to be as good as he is; apart from innate skill, there just aren’t enough cases available for learners to gain sufficient experience. The outcomes of rollerball ablations performed by his trainees are considerably subpar compared to outcomes of his own series of hundreds of rollerball ablations. But with endometrial ablation devices, his learners can achieve similar outcomes. What’s more, the safety of endometrial ablation devices, cost, length of surgery, etc., are all superior. So without having to make surgeons dramatically better, we can extend the safe, quality outcomes of endometrial ablation to millions of women, not just the few thousand who have access to high quality surgeons like Dr. Anderson.
The mid-urethral sling is another example of an enabling technology. Prior to tension-free vaginal tape (TVT) and transobturator tape (TOT), incontinence procedures required considerably greater surgical skill and were more morbid for patients. Only a small percentage of gynecologists were good at things like retropubic urethropexies (i.e., Burches and MMKs) or pubovaginal slings with harvested autologous materials. I enjoyed doing laparoscopic Burch procedures and always thought it was a fun surgery; but I would much rather teach a resident how to do a TOT. I can teach almost any resident to do a TOT, but a laparoscopic Burch to only a handful. A TOT is therefore an enabling procedure: it greatly expands the reach and safety of incontinence procedures.
Use of an energy-sealing device (like the LigaSure) for vaginal hysterectomy is my favorite enabling technology. The device greatly expands the number of gynecologists who can competently perform vaginal hysterectomy and it also greatly expands the number of women who are candidates for vaginal hysterectomy. Patient outcomes are uniformly better, and even though we have to spend money on the device, total cost goes down due to shorter surgeries, fewer complications, and shorter lengths of stay.
An enabling technology might also be a particular technique for a surgery. An enabling technology doesn’t necessarily provide the best outcome in the best hands, but it provides the best outcome in average hands. There are dozens of similar examples. Yet, many surgeons are opposed to using enabling technologies. Imagine if you went to work for a typesetter who refused to let you use your computer and laser printer because “real” typesetters should know how to make lead type by hand or use a Linotype machine. It may make him feel superior to all the ‘hacks out there using Macs’ but he is providing work to fewer clients at a higher cost. He will soon find himself without any clients (or students).
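The idea that an enabling technology gives the best outcome in average hands, though not necessarily in the best hands, can be put into a toy model. The linear outcome functions and all of their coefficients are invented for illustration, and skill is assumed to follow a standard bell curve (a z-score with mean 0 and standard deviation 1).

```python
from statistics import NormalDist


def traditional(z):
    # Skill-dependent technique: outcome swings widely with operator skill.
    return 50 + 15 * z


def enabling(z):
    # Enabling technique: higher baseline, much less dependent on skill.
    return 75 + 3 * z


def crossover_skill():
    """Skill z-score above which the traditional technique gives the better outcome."""
    return (75 - 50) / (15 - 3)


def share_better_with_traditional():
    """Fraction of surgeons above the crossover, assuming skill ~ N(0, 1)."""
    return 1 - NormalDist().cdf(crossover_skill())


if __name__ == "__main__":
    for z in (-1, 0, 1, 3):
        print(f"skill z={z:+d}: traditional={traditional(z)}, enabling={enabling(z)}")
    print("share better off with traditional:", round(share_better_with_traditional(), 3))
```

Under these made-up numbers, only about 2 percent of surgeons (those more than two standard deviations above average) get better results with the skill-dependent technique. Everyone else, and therefore nearly every patient, does better with the enabling one.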
We should all embrace enabling technologies. The hallmarks of a great surgeon are not stubbornness and anachronism but rather flexibility and innovation.