Bioinformatics Still (Mostly) Runs on Old Plumbing

Some of the most valuable work happens in the tools underneath the tools

Jun 17, 2026

Illustration: a pen-and-ink drawing in blue ink of a busy city skyline above a dense underground network of water pipes, valves, and a subway tunnel, with the pipes and fittings labeled samtools, GATK, Picard, htsjdk, Snakemake, Nextflow, IGV, GLIMPSE, libdeflate, and fgbio.

Bioinformatics runs on a hidden layer of shared infrastructure. samtools, GATK, Picard, htsjdk, Snakemake, Nextflow, and the libraries beneath them carry a huge share of day-to-day work. Most of it stays out of sight, like city plumbing.

Bioinformatics has a weakness for novelty. That is understandable enough in a field built around publications that prefer new methods, new assays, and new ways of extracting signal from data, but it leaves a blind spot. A lot of the work still runs through old libraries, old file-format code, old command-line utilities, and old assumptions about what counts as acceptable performance. Once those things become standard, people stop looking at them very hard. They learn the rough edges, budget around the delays, and treat the friction as part of the landscape.

I don’t think that is a great habit.

If a tool sits in the path of thousands of workflows, any inefficiency in that tool gets paid again and again. The same goes for awkward implementations, stale assumptions, and missing features that everyone has worked around for long enough that they no longer seem strange. Familiarity has a way of lowering standards. A lot of core tooling in bioinformatics gets treated as settled long after it has stopped being current.

One thing that makes this especially odd is that people do recognize the value of this work when it shows up in a product. Nobody is confused about why faster software is worth paying for. Products like DRAGEN and Sentieon make that point well enough. What gets less attention is the open-source tooling underneath a much larger share of the field’s day-to-day work. The value there is just as real, but it is spread across enough users and enough workflows that it often goes unclaimed.

I have been spending a fair amount of time on that layer of the field lately, partly because I think it is undervalued and partly because the leverage is often better than people realize. When you improve a method that gets used once in a niche workflow, you have improved a method. When you improve a library, file-format implementation, or utility that other tools build on, you end up improving a much larger slice of the field all at once.

File formats and I/O still deserve real engineering effort

A recent example is the work that went into HTSJDK. I put a lot of effort into a recent release that adds CRAM 3.1 writing and makes CRAM and BAM read/write substantially faster. That work lands in one library, but the effect carries into Picard, GATK, IGV, fgbio, and a long tail of other tooling that depends on it. There is nothing glamorous about faster BAM and CRAM handling, but there is also no serious argument that it is unimportant when so much other software is standing on top of it.

One small part of that work was building jlibdeflate, Java bindings for libdeflate. Libdeflate has been around for close to a decade and is widely recognized as one of the fastest block-deflate implementations, so it was surprising to realize the Java ecosystem still lacked a clean way to use it. That is a pretty good example of the kind of gap I mean here: not some grand new method, just an obvious missing piece in heavily used tooling that, once filled in, improves a lot of downstream software.

There is a tendency to talk about work like this as though it were just maintenance. Sometimes it is maintenance. Sometimes it is finally correcting an obvious deficiency in widely used tooling that had gone unaddressed for years. I would put a lot of this in the second category.

The boring steps in a workflow still run on every dataset

The same pattern shows up lower down in the pipeline. I am getting ready to release chelea, a faster, more accurate tool for short-read adapter trimming. My co-founder, Nils, has released mako, which sorts BAM files substantially faster than samtools sort. Nobody is going to confuse adapter trimming or BAM sorting with the exciting part of genomics, but they are steps that show up constantly across real production workflows, where inefficiency compounds very quickly.

In one sense this is no different from why people value DRAGEN or Sentieon. Speed, better implementation, and fewer operational headaches are easy to appreciate when they show up in steps you run all the time. The difference is that open-source plumbing often has no obvious owner, even when the user base is broader and the cumulative waste is larger.

A lot of this work lingers for the same reason. The payoff is broad, but not concentrated. Faster compression, better BAM and CRAM handling, a faster adapter trimmer, a better BAM sorter, or a speedup in a heavily used imputation or QC tool can improve a lot of workflows at once. Usually, though, no single group feels enough of the pain to justify taking it on, so the work sits there until somebody gets annoyed enough to do it for the broader user base.

Some of the best work is improving tools other people already use

Not all of this shows up as a new release from us. Some of it is just contributing back to tools that are already broadly useful and already embedded in other people’s work.

I have open pull requests into GLIMPSE that improve single-sample low-pass imputation performance in glimpse2_phase by about 30 to 50 percent. I have also had pull requests merged into verifyBamID that make the compute-heavy optimization phase roughly 20 times faster and add support for non-human genomes. Nils has also been spending time on bwa-mem3, a fork of bwa-mem2 that is faster, includes a number of quality-of-life improvements, and supports non-Intel architectures. None of that fits neatly into the usual story people like to tell about innovation, but it is the sort of work that makes existing workflows better in ways users feel immediately.

I don’t think every old tool needs a rewrite, and I don’t think starting over is automatically admirable. In plenty of cases the better answer is to speed up the thing people are already using, fix the part that is wasteful, or add the capability that should have been there in the first place. There is a lot of value in meeting the ecosystem where it is instead of pretending value only appears when you create something brand new.

We underrate accumulated drag

Part of the reason this work gets less attention is that accumulated drag is hard to see. A new method is easy to point at. An old dependency that is 30 percent slower than it needs to be, or a file-format implementation that has not kept up, tends not to announce itself with the same clarity. People absorb the cost in small increments. They wait a little longer, provision a little more compute, put up with some awkwardness, and move on.

A lot of the value in this layer of the stack is real, but diffuse. Faster BAM and CRAM handling in HTSJDK, Java bindings for libdeflate, a better adapter trimmer, a faster BAM sorter, a speedup in GLIMPSE, or a much faster optimization loop in verifyBamID all make real workflows better. The problem is that the gain often lands a little bit everywhere, which makes it harder for any one group to justify doing the work. So a lot of it sits there until somebody gets annoyed enough to fix it for the broader user base.

AI changes that calculus somewhat. A lot of this plumbing work used to be easy to defer because it was tedious, sprawling, and hard to justify against more visible priorities. AI tools lower the cost of implementation, which makes more of it tractable. That does not make the work glamorous, and it does not remove the need for careful engineering, but it does make it more practical to go fix things that many people depend on and no one had quite gotten around to fixing.

Some of the most valuable work happens in the tools underneath the tools

The field does not need less ambition. It does need a broader view of what ambitious work looks like.

Sometimes it is a new method. Sometimes it is a new model. Sometimes it is a much faster BAM/CRAM implementation, a missing binding that should have existed years ago, a better adapter trimmer, a faster sort, or a pull request that takes a painful optimization loop and makes it 20 times faster. The common thread is leverage. When a tool sits low enough in the stack and gets used widely enough, improving it pays off across a lot of other work.

That is why I think this layer deserves more attention than it gets. Not because it is fashionable, and not because every old tool is secretly broken, but because a lot of scientific software still depends on code that can be made materially better with focused effort.

That will never be the glamorous part of bioinformatics.

It is still some of the most useful work you can do.

Tim Fennell is a Founding Partner at Fulcrum Genomics, where he builds bioinformatics tools and pipelines for the genomics community. He is a creator of Picard and a co-author of the SAMtools paper. You can find him on LinkedIn or reach Fulcrum at contact@fulcrumgenomics.com.

Fulcrum Genomics is a bioinformatics consulting firm built by scientists at the forefront of large-scale genomic research, with deep expertise in sequencing technology, pipeline engineering, and genomic data analysis for biotech, pharma, and academia. Engage us through project-based work, fractional R&D, or hourly consulting. Contact us to discuss your project.
