We love Heroku. It makes deployment so easy and quick. However, it can start to get pricey when you add additional dynos at $35 each a month.
With a small amount of work, you can get a lot more out of your Heroku hosting whilst drastically improving the performance of your site. You might need to spend a little bit of cash on other services, but a lot less than if you simply moved the dyno slider up a few notches, and the result will be much better scalability.
So how do we max out the performance of our Heroku apps? First we stop using Heroku for things it’s bad at, then we let it do more of what it is good at, running your application code.
By default a Heroku dyno is responsible for serving all the assets for your site, so every page load will involve multiple requests to the dyno.
The asset_sync gem modifies asset pre-compilation to sync all of your assets to an Amazon S3 bucket from where they are served directly and freeing up your dyno to handle more requests.
If you want to speed things up even more, you can slap Amazon’s Cloudfront CDN in front of your S3 bucket with multiple subdomains. Michel Sikkes has an excellent guide to serving you assets with S3 and cloudfront. Your assets will be served through Cloudfront from multiple subdomains (e.g. assets[0-3].myapp.com), all of which point to the same bucket. Not only will your assets be served through CloudFront’s speedy global CDN, but most will be downloaded in parallel. Browsers make a limited number of concurrent requests per host name (2 for IE, more for other browsers) so using multiple CNAMEs increases the number of concurrent connections, significantly reducing the page load time for users with good connections.
The cost of serving assets from S3, even with CloudFront, is very cheap and scales directly with the amount of data. Compared to adding another Heroku dyno this is great value, and has the added benefit of speeding up overall page loads.
If you use something like CarrierWave or Paperclip, by default the uploading and processing of images is done by your dyno. While this is happening your dyno is completely tied up, unable to handle requests from any other user. If one person uploads a 2Mb image on a slow connection everyone else will be locked out for the duration.
To prevent this from happening you need to decouple the upload process from your dyno. The CarrierWave Direct gem does just this. With a bit of client-side magic it uploads files to S3 directly, rather than through the dyno. The images then get resized by background processes using DelayedJob or Resque. This obviously has the downside that you’ll need a worker running, but there are ways to manage these cost-effectively which I’ll talk about next.
Another option, which we’ve used recently, is the awesome Cloudinary service. They provide direct image uploading, on-demand image processing (including face detection, which even seems to work on cats) and a worldwide CDN all in one package. There is a free tier to get you started, and for $39 (slightly more than one Heroku dyno) their Basic plan will be more than enough for many sites. Obviously you could just spend money on another dyno, but that just scales your performance linearly, without really solving the fundamental performance bottleneck.
Background processing with Delayed::Job is a great way of speeding up your web requests. Potentially slow tasks like image processing or sending signup emails can happen outside of the request-response cycle, making it much snappier and freeing up your dyno to handle more requests. The downside is that you need to run a worker dyno at $35/month.
Michael van Rooljen’s HireFire modifies Delayed::Job and Resque to automatically scale the number of worker dynos based on the jobs in the queue. Because Heroku charge by the dyno/second, spinning up 10 workers for one minute costs the same as one worker for ten minutes, so with HireFire you can potentially get things done quicker while paying less than you would if you ran a dedicated worker dyno.
HireFire does have one limitation, it only works for jobs scheduled for immediate execution. If that is an issue Michael has a HireFire service that will monitor your application for you, so jobs scheduled in the future will be run.
If you have an application that needs to perform complex searches over large datasets don’t do it in your application directly. If searches regularly take a long time (a couple of seconds or more) consider using something like Solr (available as a Heroku plugin), Amazon CloudSearch, or one of the many Search as a Service providers. You’ll not only get faster search performance in many cases, but you’ll save vast amounts of development time trying to optimise your in-app search. Of course, if you have a simple application with straight-forward search then this probably won’t be worth it the cost, but it’s something worth considering.
If you’ve not encountered caching in Rails, stop reading this article right now, go read the Rails Guide to Caching and then DHH’s short guide to key based cache expiry. Caching in Rails 4 will be even better, with improved support for “Russian Doll” caching.
View caching in Rails can have a profound effect on your application’s response time. In the past we have found that rendering pages, especially complex ones with lots of partials, can easily account for two-thirds of the total processing time, much more than you might expect.
Simply using caching will help speed up your application, but the default cache store is not shared between dynos so the benefits are limited. In contrast, a Memcache store is shared between your dynos so they all benefit from any cached item. Heroku has two add-ons that let you very easily add memcache to your project. The Memcachier addon gives you 25mb for free, and is pretty reasonably priced from there on up. Just adding a small cache store of 25mb can make a significant difference to the load time of your pages.
So after spending a little bit of time, and a relatively small amount of cash, we’ve offloaded much of the work that was being done by our web dyno and onto services that are better suited to it; drastically speeding up our request-response cycle. Our single dyno can now handle significantly more users per minute, who are happier because they get a much faster response from the site.
However the default Heroku dyno configuration only handles one request at a time. If you wanted to increase your level of concurrency in the past you would have to increase the number of dynos. That’s all changed with the release of the Rack server Unicorn which can handle multiple concurrent connections. For most applications a single dyno should be able to handle between two and four connections at a time. The main constraint will be memory (limited to 512mb per dyno), so keep an eye on the gems you are loading in your production environment. Florian at Rails On Fire has done a great introduction on setting up Unicorn on Heroku. If you’ve followed the previous steps you should be using less memory on your web dynos, allowing you to use more threads.
At the end of all this we’ve freed up our Heroku dyno from doing things it’s not very good at like serving static files and uploads, and juiced up its performance when doing what it’s great at, serving Rails application requests with no sys-admin in sight.
Each technique can be easily applied to your existing applications, but if you develop with them in mind from the start you get all the benefits with almost no additional work. On their own each one will help the performance of your application, but combining them together will significantly extend the amount of time before you have to start forking out for lots more dynos, and when you do you’ll get much more bang for each of your thirty-five Heroku bucks.
If you’ve got any other tips for getting the most out of a Rails application, whether or not it’s on Heroku, we’d love to hear about it them!