Backups, Parquet, and the Margarine Tub
What I learned about disaster recovery from a 1970s commercial
This was not the next essay I planned to write, but it practically wrote itself in the car on my drive home. I knew I had to write it and publish it now.
As I’m trying to wind things down at work, I’ve been sharing the last bits of context with the folks taking over my projects. One of the things I mentioned was that before running a new CI/CD pipeline against production, I wanted to take a backup. Not because I expected anything to go wrong, but because I’ve been doing this long enough to know that “just in case” is not paranoia. It’s experience.
For context, I’ve been building out a dev and production Azure Data Explorer cluster and the databases that go with it. Azure Data Explorer is a fast, distributed analytics engine designed for large volumes of log and telemetry data. It’s fantastic for querying at scale, but it does not behave like a traditional relational database. Which brings me to the problem.
Azure Data Explorer does not have built-in backups the way SQL Server does. No right-click, no “backup database,” no warm fuzzy feeling. The simplest way to create a backup is to export the data to Parquet.
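For the curious, a one-time export like that can be sketched in a few lines. This is a minimal sketch under stated assumptions, not necessarily the exact command used here: the table name, storage URI, and file prefix are made-up placeholders, and the Python helper only builds the Kusto `.export` management command as a string. You would still run the result through the Azure Data Explorer web UI or a client library.

```python
def build_export_command(table: str, storage_uri: str, name_prefix: str) -> str:
    """Build a one-time Kusto `.export` command that writes a table to Parquet.

    All arguments are hypothetical placeholders; substitute your own values.
    """
    # h@"..." marks the connection string as obfuscated so it is hidden in logs.
    return (
        f'.export async to parquet (h@"{storage_uri}") '
        f'with (namePrefix="{name_prefix}") '
        f"<| {table}"
    )


# Placeholder values for illustration only (assumed, not from the real system).
command = build_export_command(
    table="DailySnapshots",
    storage_uri="https://<account>.blob.core.windows.net/<container>;managed_identity=system",
    name_prefix="pre_pipeline_backup",
)
print(command)
```

Keeping the command in a small helper like this means the same backup step can be repeated for each table before a risky change, instead of being retyped by hand.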
Parquet, for anyone who hasn’t run into it, is a columnar storage format that’s efficient, compressed, and great for analytics. It’s also pronounced par-kay, which immediately made me think of the old Parkay margarine commercial from the 1970s. The one where the tub talks. Here’s the commercial if you’ve never seen it.
Anyway. I asked the folks who will own this going forward whether they wanted me to build out a continuous Parquet export for BCDR. BCDR is business continuity and disaster recovery, which is the grown-up way of saying “when something breaks, how fast can we get back online.”
The response was, “I don’t know either of those. I’ll need to research.”
Fair enough. Not everyone lives in the land of backups and disaster scenarios. But this is one of those little old lady things I think is important.
Part of what I built was the ability to create the entire database with one button. That’s partly the CI/CD pipeline, but it’s also because I checked all the reference data into the repo. The pipeline pushes it to storage, and Azure Data Explorer auto-ingests it. All the data where we are the source of truth is already in blob storage, and I have a re-ingestion script ready to go. The rest of the data comes from daily snapshots of other systems.
So if we were to lose anything, it would be those daily snapshots. That’s where the Parquet backup closes the gap. Done in the right order, we could be live again in an hour.
The backup I took before the pipeline run was a one-time snapshot. What I proposed to the team is different: a continuous Parquet export, so that gap closes automatically going forward instead of depending on someone remembering to take a manual backup before every change.
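Here is roughly what the continuous version could look like. Again, a sketch under assumptions: the export name, table names, and one-hour interval are hypothetical, and it assumes an external table backed by blob storage and defined with the Parquet data format already exists. The helper only assembles the Kusto `.create-or-alter continuous-export` command text.

```python
def build_continuous_export_command(
    export_name: str,
    source_table: str,
    external_table: str,
    interval: str = "1h",
) -> str:
    """Build a Kusto continuous-export command as a string.

    Assumes `external_table` already exists and was created with the
    Parquet data format. All names here are hypothetical placeholders.
    """
    return (
        f".create-or-alter continuous-export {export_name} "
        f"over ({source_table}) "          # table(s) whose new records are exported
        f"to table {external_table} "      # destination external table
        f"with (intervalBetweenRuns={interval}) "
        f"<| {source_table}"               # query that selects what to export
    )


# Placeholder values for illustration only (assumed, not from the real system).
command = build_continuous_export_command(
    export_name="DailySnapshotsBackup",
    source_table="DailySnapshots",
    external_table="DailySnapshotsParquet",
)
print(command)
```

Because the command is create-or-alter, rerunning it from a pipeline updates the existing export rather than failing, which fits the "one button" approach.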
Yes, these are things I think about at the beginning of a project. I would love to have all of this in place before any data hits the database, but at this point we’re close.
You may be thinking, why is this important? And I get it. It’s extra work. It’s not glamorous. It doesn’t demo well. But adding it later is a major pain. I know you’re trying to get something out fast. You’re just playing around. But what I’ve learned, over and over, is that three seconds after I show a prototype to someone, I have ten people asking for access and immediately taking a dependency.
It does not matter how many times I say, “This isn’t reliable, you can’t trust this, I’m still playing around.” They are all over it. And then when something goes wrong... sigh.
Many years ago I built a prototype for someone. I showed it to them. They loved it. They said, “OK, ship it.” I said, “No, this is bubble gum and baling wire.” They didn’t care. I have learned this lesson more often than I care to admit because I love playing with tech. But I try.
So yes, I think about backups. I think about BCDR. I think about Parquet. I think about the Parkay margarine tub insisting on its own name. And I think about the future engineer who will thank past me for making sure they can recover the system in an hour instead of a week.
That’s what coding like a little old lady looks like.
Alison + The Inner Chamber


