Almost any site can benefit from a CDN. Until fairly recently, however, CDNs were cost-prohibitive for many sites. With Cloudflare, Cloudfront and others becoming mainstream, there's little reason not to put at least part of your site on one. There are pitfalls, though, and setting up a CDN isn't always straightforward. This post covers some CDN basics and a few lessons we learned while setting ours up.
What is a CDN, and why do I want one?
First off, let's talk about what CDNs are and why you'd use one. CDN stands for Content Delivery Network, and its primary purpose is to deliver content to your users faster. It does this by keeping copies of your content on servers spread throughout the world. Using various DNS and routing techniques, a CDN has each user download content from a server close to them instead of from your actual webserver, which may be far away.
In addition, CDN servers are highly optimized for serving static content. They can typically serve it much faster than your webserver, which also has to handle dynamic requests. You also get to lean on far more hardware than you run yourself.
Types of CDNs
There are two primary types of CDNs: reverse proxy and object storage. In both cases, you need to either update DNS or use a new hostname for the content you want served through the CDN.
Object storage
Object storage CDNs are probably the most straightforward in terms of how they work. Instead of serving images, JS, CSS and other static content from your own webserver, you upload it to the CDN, and the CDN takes care of delivering it for you. This sounds great, but it has a couple of downsides:
- You need to update your deploy process to send copies of your content to the CDN so that it can serve them.
- You need some way to clean up old assets that are no longer being served.
A relatively common object storage setup is Cloudfront backed by Amazon S3. For Rails, the asset_sync gem can take care of uploading the assets for you as part of asset precompilation.
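For the cleanup downside, one approach (independent of asset_sync) is to compare what's in the bucket with what the current build produced, and delete old fingerprinted versions beyond the few most recent ones. The Ruby sketch below assumes you can already list the bucket as [key, upload time] pairs and read the current keys from your asset manifest; the method name, the fingerprint pattern, and the keep-two default are illustrative assumptions, not part of any gem.

```ruby
require "set"

# Assumed fingerprint format: "assets/application-3f2a1b9c.css" (or ".css.gz"),
# i.e. logical name, a dash, a hex digest of 8+ chars, then the extension.
FINGERPRINTED = /\A(?<logical>.+)-[0-9a-f]{8,}(?<ext>\.[^.\/]+(?:\.gz)?)\z/

# Hypothetical cleanup helper for an object-storage CDN.
#   remote_objects: [key, uploaded_at] pairs listed from the bucket
#   current_keys:   keys produced by the current asset build
#   keep:           older versions of each asset to retain, so pages rendered
#                   before the deploy can still load the assets they reference
def stale_assets(remote_objects, current_keys, keep: 2)
  current = current_keys.to_set
  old = remote_objects.reject { |key, _| current.include?(key) }

  # Group old versions by logical name + extension, ignoring the digest.
  # Non-fingerprinted keys each form their own group, so they are only
  # removed when keep is 0.
  groups = old.group_by do |key, _|
    m = FINGERPRINTED.match(key)
    m ? "#{m[:logical]}#{m[:ext]}" : key
  end

  groups.values.flat_map do |versions|
    # Newest first; anything beyond the newest `keep` versions is stale.
    versions.sort_by { |_, uploaded_at| -uploaded_at.to_i }
            .drop(keep)
            .map(&:first)
  end
end
```

Keeping a couple of previous versions, rather than deleting everything not in the current build, is a hedge against HTML that was rendered (or cached) before the deploy and still points at the old fingerprints.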
Reverse proxy
Reverse proxy CDNs are somewhat more complicated, but can be easier to maintain long-term. When a request for content comes in, the CDN checks whether that content is already in its cache. If it is, the CDN serves it directly to the user. If not, it requests the content from your server (called the origin), caches it, returns it to the user, and serves subsequent requests for that content from its cache. This has a couple of benefits:
- No need to send assets to the CDN. It grabs them from you on request.
- No need to manually expire old content on the CDN. Content ages out of the CDN's cache on its own.
There is one fairly big downside to reverse proxies, though. Because they request content from your servers and cache the responses, how your server responds matters a great deal. Bad or incorrect content can end up cached, or your server may respond in a way that keeps the CDN from caching it (for example, with a redirect or with headers that forbid caching), so the CDN goes back to your server on every request, defeating the purpose of the CDN in the first place!
Despite these challenges, we at PLM opted for a reverse proxy CDN. After some trouble getting the configuration right, it has served us quite well.
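The reverse proxy flow described in this section can be sketched as a toy in-memory cache in Ruby. This is not how Cloudfront or any real CDN is implemented; it only illustrates hits, misses, and why the origin's response matters. The rule that only 200 responses without no-store/private get cached is a simplification for illustration; real CDNs have their own per-status and per-header rules.

```ruby
# Toy reverse-proxy cache, for illustration only (not how any real CDN works).
# `origin` is any callable taking a path and returning [status, headers, body].
class ToyEdgeCache
  attr_reader :origin_requests

  def initialize(origin)
    @origin = origin
    @store = {}            # path => [status, headers, body]
    @origin_requests = 0   # how many times we had to go back to the origin
  end

  def get(path)
    if @store.key?(path)
      # Cache hit: answer without touching the origin.
      status, headers, body = @store[path]
      return [status, headers.merge("X-Cache" => "Hit"), body]
    end

    # Cache miss: fetch from the origin, store the response if allowed, then answer.
    @origin_requests += 1
    status, headers, body = @origin.call(path)
    @store[path] = [status, headers, body] if cacheable?(status, headers)
    [status, headers.merge("X-Cache" => "Miss"), body]
  end

  private

  # Simplified rule: only 200s whose Cache-Control doesn't forbid shared caching.
  def cacheable?(status, headers)
    status == 200 && !headers.fetch("Cache-Control", "").match?(/no-store|private/)
  end
end
```

With an origin that answers a path with a redirect (say, from a bare domain to www), every request for that path goes back to the origin, which is exactly the failure mode described above.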
Before you start
Before you even start configuring a CDN, make sure your static content is already being served correctly from your own servers. This means:
- Far-future Expires and Cache-Control headers.
- Assets are served compressed where appropriate.
- The asset pipeline is working, assets have fingerprints in their filenames, etc.
- Assets are served over SSL where appropriate.
These steps matter for your site's performance regardless of the CDN, and they ensure you can temporarily turn off the CDN if you need to without your servers buckling under the load.
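For the far-future headers in the list above, here is a plain-Ruby sketch of typical values: a one-year max-age (a common convention, not a requirement) and a matching Expires date. The method name is hypothetical.

```ruby
require "time"

# Hypothetical helper: far-future caching headers for fingerprinted assets.
# Fingerprinted filenames change whenever the content changes, so a long
# max-age is safe: clients never need to revalidate an old URL.
ONE_YEAR = 365 * 24 * 60 * 60 # seconds

def far_future_headers(now: Time.now, max_age: ONE_YEAR)
  {
    "Cache-Control" => "public, max-age=#{max_age}",
    "Expires" => (now + max_age).httpdate # RFC 1123 date in GMT
  }
end
```

In Rails 5 and later, a hash like this can be assigned to config.public_file_server.headers for files served from public/. Since the Expires date would be computed once at boot, Cache-Control does the real work; HTTP/1.1 caches prefer max-age over Expires when both are present.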
General process
First, pick a CDN provider. Cloudfront as a reverse proxy met our needs, so that's what we went with. Next, configure the CDN and get a domain that maps from your CDN to your origin; with Cloudfront, that's something like abc123.cloudfront.net. MAKE SURE that it's mapped directly to the origin domain you serve assets from, without redirects. If your origin answers the CDN's request with a redirect to somewhere else, the CDN will not cache the asset itself!
A combination of www and SSL redirects can make this trickier than you expect, especially if your site accepts both SSL and non-SSL requests but always redirects to SSL. CDNs typically have their own settings for handling SSL and redirects, so research how your provider's settings work and experiment a bit.
It's also a good idea to set up the CDN in a staging environment first, since some content you serve may not work when loaded from a different domain. We had problems with TinyMCE, for example, and needed to set up exceptions for it.
Once you think you have the domain mapped correctly, use curl to make some sample requests. Look for headers your CDN sends to signal cache hits versus cache misses, and make sure you are getting 200 responses, not 3xx responses.
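The same check can be scripted. Cloudfront reports cache status in an X-Cache header ("Hit from cloudfront", "Miss from cloudfront"); other providers use different header names, so check your provider's documentation. The Ruby sketch below uses Net::HTTP from the standard library to make a HEAD request through the CDN and summarize the status and cache header. The method names are hypothetical.

```ruby
require "net/http"
require "uri"

# Summarize one CDN response. Cloudfront uses an X-Cache header; other
# providers use other header names, so cache_header is a parameter.
def cdn_report(status, headers, cache_header: "x-cache")
  cache = headers[cache_header].to_s.strip
  {
    ok: status == 200,
    redirect: (300..399).cover?(status),
    # First word of the header value, e.g. :hit or :miss; :unknown if absent.
    cache_status: cache.empty? ? :unknown : cache.split.first.downcase.to_sym
  }
end

# HEAD request for a single asset URL through the CDN (redirects are not followed).
def check_asset(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  # Net::HTTP stores header names in lowercase.
  cdn_report(response.code.to_i, response.each_header.to_h)
end
```

Calling check_asset twice on the same asset URL under your CDN domain would typically report a miss and then a hit. Seeing redirect: true, or a miss on every request, points back at the origin mapping and redirect issues described above.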
Once you've done some testing on the domain, you are ready to enable it in your staging environment. In Rails, the setting you want is config.action_controller.asset_host, documented in the Rails API under ActionView::Helpers::AssetUrlHelper. We used its proc form to serve TinyMCE from our own servers and everything else from Cloudfront.
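Rails accepts a proc (or lambda) for asset_host: it is called with the asset's path, and if it returns nil, Rails generates a URL on your own host instead. The sketch below shows an exception along the lines of the TinyMCE one described above. It assumes TinyMCE's files live under a /tinymce/ path segment; the constant names are hypothetical and the hostname is the placeholder Cloudfront domain from earlier.

```ruby
# Hypothetical asset_host proc: serve TinyMCE from our own servers and
# everything else from the CDN. In a Rails app this would be assigned in
# config/environments/<env>.rb:
#   config.action_controller.asset_host = CDN_ASSET_HOST
CDN_HOST = "https://abc123.cloudfront.net" # placeholder Cloudfront domain

CDN_ASSET_HOST = lambda do |source|
  # Returning nil makes Rails fall back to a host-relative URL on our own domain.
  source.include?("/tinymce/") ? nil : CDN_HOST
end
```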
Once you've tested on staging and are satisfied, you're ready to go to production! Repeat the asset_host configuration for production, deploy, and watch requests for static assets to your servers drop off!
What to do if things go badly?
If something goes wrong (your CDN has an outage, bad content gets cached, or anything else), you can change asset_host to point back to your webserver and take the CDN out of the loop. You should also read your CDN provider's documentation on manually expiring (invalidating) content, in case you ever need to.
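One way to make that rollback a configuration change rather than a code change is to read the CDN host from an environment variable, so that unsetting it takes the CDN out of the loop on the next restart. CDN_HOST as a variable name and the method name are hypothetical.

```ruby
# Hypothetical toggle: read the CDN host from the environment so the CDN can
# be disabled without a code change. In a Rails production.rb, the equivalent
# one-liner would be:
#   config.action_controller.asset_host = ENV["CDN_HOST"].presence
def cdn_asset_host(env = ENV)
  host = env["CDN_HOST"].to_s.strip
  host.empty? ? nil : host # nil => Rails serves assets from the app's own host
end
```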
Downsides to CDNs
CDNs are often very useful, but there are a couple of things to keep in mind:
- They aren't free
- They can require some upkeep, maintenance, and documentation
Still, it's typically worth figuring out how to make a CDN work for your site. Your users will have a better experience because of it.