Fixing the broken build

Context: Most of our projects are written in Ruby and run on JRuby due to some legacy system integration requirements. A build script in Jenkins uses the warbler gem to create a .war package that gets deployed to a Tomcat server.

A few days before I got back from vacation another dev merged a handful of changes into a new release to be QA’d. Some of the changes were mine, and some were his, but they had all been through code review. Release 1.x.14 was already in production, and this was going to be 1.x.15. The changes weren’t very big and nothing was suspicious, just adding a Rake task and some new Vue components. The build script succeeded, but when the build was deployed the app failed to boot. The app hadn’t loaded completely, so the logs weren’t configured at this point. We were given an error message and a line number, but it just indicated a system exit raised by Bundler without giving a reason.

Up to this point I had never deployed a .war package to Tomcat (I didn’t even have Tomcat on my machine). Locally we use Puma, and we depend on scripts to deploy to servers outside of development. It took a bit of setup, but it wasn’t complicated to get Tomcat installed and running on my machine so I could test a similar deployment without relying on the QA server. Before long I was able to deploy the generated .war to my server, and it raised the same error as QA.

It turns out Bundler was catching an error that occurred when it couldn’t find a gem and logging details to STDERR, but they didn’t show up in the Tomcat logs. There is probably some configuration to make that work, but rather than try to dig into unfamiliar configs I commented out the rescue block so the error bubbled up to where I could see it. Bundler claimed it couldn’t find the gem http-parser, which is a dependency of the http gem that had been added in a previous release.

Initially I removed the gem since I didn’t think it was being used, and the server booted successfully. But a tester quickly discovered that we did, in fact, use the gem, and one a feature broke when I removed it. 🤦🏻‍♂️ After a lot more digging I finally discovered that the version of http-parser it relied on didn’t have a Java extension, so Bundler couldn’t find a version it could use. My final solution involved rolling the http gem back to a previous version that didn’t have the http-parser dependency.

I’m still unsure how the app can run locally but not on the server or what caused the dependency to break (I hope to find out in the future) but at least we can move forward with testing new features.

Takeaways

  • If the way your server runs in development is significantly different than how it runs in QA, staging, production, etc. you’re going to have a hard time tracking down the cause of server issues. I’m not saying you need to compile assets and run in production mode all the time (that’s actually a pretty bad idea), but the hardest issues to fix are the ones you can’t recreate.

  • Relying on something that “just works” (like our deployment process) is great… until something breaks. You don’t need to be an expert at the internals of every dependency you use (a big reason to use a dependency is because it does something so you don’t have to figure it out on your own), but be prepared to do a deep dive into it if something breaks in the future.

TIL

  • Always double check before removing a dependency you think is unused. Tests passing and the server booting don’t mean you didn’t break something. 😅

Listening to: