TL;DR: Firefox Test Pilot is becoming a statically generated site built from content in flat files. We're moving away from Django and PostgreSQL, and it's been a bit of a journey.
I've been working on Firefox Test Pilot for over a year, but I haven't written about it here before now. Mostly because I've been busy and lazy and busily shaving yaks.
But, there have been big things afoot lately, and I figured they were worth writing about - if only because they're invisible, behind-the-scenes things that nonetheless took a lot of work to accomplish.
Be prepared - but for what?
When we started building Test Pilot last summer, we based the server-side on Django & PostgreSQL. We had assumptions about the future:
We'd need to collect measurements from experiments.
We thought experiments would need some active server-side resources provided by the mothership.
We'd need to manage user profiles & preferences, so we required sign-in with a Firefox Account.
A year later, these assumptions didn't quite pan out:
Rather than reinvent the wheel by collecting & analyzing measurements ourselves, we took advantage of Google Analytics and the efforts of the Firefox Telemetry team.
We found it's best to stay out of the way of teams building Test Pilot experiments - let them manage their own services as necessary, rather than be tied to the delivery cadence of the core project.
The sign-in requirement turned away many potential users. But, we didn't need accounts to facilitate experiment participation anyway. Our metrics are anonymous and a Firefox add-on manages opt-in.
Accounts ended up being private data we had to keep secure, but only used for email notifications. We have better ways to manage email subscriptions across Mozilla - so one less wheel to reinvent!
Didn't need that server anyway
There was just one last reason to use Django & PostgreSQL on Test Pilot: A web-based content management system to update the site without heavyweight server deployments & database migrations.
But, wait a minute: If the other reasons for a server dropped away - why do we need complex deployments?
Furthermore, why maintain content in a database at all?
The whole Test Pilot team knows their way around text editors and GitHub - so let's make that our CMS. We can bake the whole site from flat files. Deployment is running a build script and uploading the result to a web server. We get revision control & collaboration along with the rest of the project. And as a security bonus, we stop shipping the tools to change the site along with the deployed site itself.
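To make "bake the whole site from flat files" concrete, here is a minimal sketch of what such a build step can look like. It is not our actual build script: the `bake_site` function, the `title` and `description` fields, and the one-JSON-file-per-page layout are all assumptions made for illustration, and it is written in Python with only the standard library so it stays self-contained.

```python
import json
from html import escape
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would use a richer layout.
PAGE = Template(
    "<!DOCTYPE html>\n"
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1><p>$description</p></body></html>\n"
)


def bake_site(content_dir, output_dir):
    """Render one HTML page per JSON content file and return the paths written."""
    content_dir, output_dir = Path(content_dir), Path(output_dir)
    written = []
    for source in sorted(content_dir.glob("*.json")):
        # Each flat file is assumed to hold a "title" and a "description".
        item = json.loads(source.read_text(encoding="utf-8"))
        target = output_dir / source.stem / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        # Escape the content so it cannot inject markup into the page.
        html = PAGE.substitute(
            title=escape(item["title"]),
            description=escape(item["description"]),
        )
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```

Calling `bake_site("content", "dist")` would leave a directory of plain HTML files that can be uploaded to any web server, and the content files themselves live in the same repository as the code.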
None of this is revolutionary. Aaron Swartz's "Bake, Don't Fry" is over 14 years old: Why fry up a new web page for every visit when you can pre-bake the whole site ahead of time? I used Blosxom back in the day and Gulp bakes this blog now. Static site generators are numerous & popular - GitHub itself offers GitHub Pages powered by Jekyll.
It sounds obvious in retrospect, but it took a while to realize our site could be stripped down to so little. We assumed we needed all those moving parts - or would need them someday. But, it appears that we can get away with being nearly serverless. And if someday a feature requires more, we can stand up some loosely-coupled microservices - or better yet, find that another team at Mozilla has already solved the problem.
The show must go on
But, having realized all of this, we couldn't just burn down the site and start over. Because we're working on a vehicle in motion, we've been doing this in increments over the summer:
We switched data sources for displaying the number of folks participating in experiments from our own Django API to a Telemetry-based resource.
Next, I implemented a feature flag in Django to substitute static JSON for content from the database. Thus, we can start managing content in YAML now, maintaining our current infrastructure until we work out a new stripped-down deployment process.
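The shape of that feature flag is roughly the following. This is a sketch rather than the code in our repository: `load_experiments`, `use_static_content`, and `query_database` are hypothetical names, and in a Django project the flag would more likely be read from settings than passed as an argument.

```python
import json
from pathlib import Path


def load_experiments(use_static_content, static_path, query_database):
    """Return experiment content from a static JSON file or from the database.

    use_static_content: the feature flag (hypothetical name).
    static_path: path to a JSON file generated ahead of time.
    query_database: callable that returns the same data from the database.
    """
    if use_static_content:
        # Flag on: serve pre-built content and never touch the database.
        return json.loads(Path(static_path).read_text(encoding="utf-8"))
    # Flag off: fall back to the existing database-backed path.
    return query_database()
```

The useful property is that both branches return the same data structure, so the rest of the site cannot tell which source is active and the flag can be flipped back if something goes wrong.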
Soon, we'll be able to update the site by pushing to the appropriate branch on GitHub. We've got tasks to generate stub pages for all the front end app routes. We're also looking into enforcing a requirement to sign our commits and tags on the way to release.
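Stub pages matter because a plain static host has no server-side routing: every URL the single page app can handle needs a real file behind it, or a direct visit to that URL returns a 404. A minimal sketch of such a task, assuming a hypothetical `write_route_stubs` helper and a single HTML shell shared by every route:

```python
from pathlib import Path


def write_route_stubs(shell_html, routes, output_dir):
    """Write a copy of the app shell at <output_dir>/<route>/index.html per route."""
    output_dir = Path(output_dir)
    written = []
    for route in routes:
        # "/" maps to the site root; "/experiments/demo" maps to a nested folder.
        target = output_dir / route.strip("/") / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(shell_html, encoding="utf-8")
        written.append(target)
    return written
```

Every stub contains the same shell, so the front end still decides what to render once it loads. The route list shown in any call to this function is illustrative, not the site's actual route table.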
After that, we plan to go even further with static site generation. Test Pilot is currently a single page app that pulls content from JSON. But, we can do better by pre-rendering those HTML pages in our build process ahead of time.
There's a funny thing about all of this: If we're successful, no one visiting the site should notice anything different. We're developing some new features & experiments - but all this work to rid our infrastructure of Django & PostgreSQL should ideally be a non-event for anyone visiting the site. This is the least glamorous sort of work one occasionally has to do on a software project - change everything, but don't break anything.
The real benefit will be that we're able to do a lot of things faster and more easily. For instance, there are now fewer places that need changes to display a new piece of information on a page. We don't have to monitor as many third-party dependencies - which we weren't doing very well to begin with.
Our development stack shrinks from Docker containers with Django & PostgreSQL & Node.js - down to just Node.js v6.2.0. The whole system has gotten simpler and more direct.
But, wait, there's more: Along with totally changing our server-side infrastructure, we've also rewritten the front end of the site to switch from Ampersand to React & Redux. It should make static site generation easier. It's also eased development on a handful of new features in the past week or so.
It's a big deal - and another thing that, in retrospect, seems more obvious now than it did a year ago. But, I'm going to save writing about that for my next post.