Recently I have been busy working on wagtail-storages, a third-party package that makes using S3 as a storage back-end in Wagtail a breeze, especially if you are looking for an efficient way to serve public documents while still letting editors share private ones, e.g. documents restricted with a password or to a certain user group.

Why

Currently people seem to use a vanilla django-storages setup. That is not enough for websites that need both truly private and public documents at scale on the same Wagtail instance. The best option at the moment is the default Wagtail document serve view, which proxies the files. It is inefficient because it needlessly ties up application server resources downloading files and passing them on to users. On platforms like Heroku there is a 30-second request limit, so big files may not download in time. The question comes to mind: why not let users download files from S3 directly?

Wagtail 2.7 addresses part of this problem by adding the “WAGTAILDOCS_SERVE_METHOD” setting. However, it only offers a blanket redirect, with no distinction between public and private files. It is better than the old proxy view, but to keep truly private documents private, all of the public documents have to be served straight from S3 with signed URLs as well, which is wasteful.
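
For illustration, here is roughly what the relevant settings look like with django-storages. This is a sketch rather than a complete configuration; AWS_QUERYSTRING_AUTH is django-storages' option that controls URL signing:

    # settings.py
    # Wagtail 2.7+: redirect document requests to the storage back-end's URL
    # instead of proxying the file through the application server.
    WAGTAILDOCS_SERVE_METHOD = "redirect"

    # django-storages signs every URL it generates while this is True (the
    # default), so even public documents get redirected to signed S3 URLs.
    AWS_QUERYSTRING_AUTH = True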

You can also put a CDN in front, make all your documents public and not use your application server at all. The issue with that approach is that you lose the option of having private documents on your Wagtail instance. Also, if a document is re-uploaded, it is not purged from the CDN. On most content websites it is generally fine to assume that all documents are public, but why not make it work properly? Why give editors an interface to restrict access to files and then not actually enforce those restrictions? Most Wagtail websites these days are probably hosted in the cloud, use S3, and rely on one of these botched ways to serve documents.

What

wagtail-storages is a collection of signal handlers that do the following (a rough sketch follows the list):

  • When a document or its privacy settings change, update the object ACL on S3: if the document has restrictions, set the ACL to private; if it does not, set it to public-read.
  • Purge the document from the front-end cache on changes. You can optionally configure a separate front-end cache invalidator for documents, besides the one used for purging pages. This feature builds on Wagtail’s excellent built-in front-end cache module, so it was a simple job.
  • When a document is served, the serve view decides whether to redirect the user to the CDN or to generate a one-off signed URL for a private document.
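
To make that concrete, here is a minimal sketch of the idea, not wagtail-storages' actual implementation. It assumes django-storages' S3Boto3Storage as the document storage back-end, and CDN_BASE_URL is a hypothetical setting used only for illustration:

    # Rough sketch of the idea; not the package's actual code.
    from django.conf import settings
    from django.db.models.signals import post_save
    from django.dispatch import receiver
    from wagtail.contrib.frontend_cache.utils import purge_url_from_cache
    from wagtail.documents.models import get_document_model

    Document = get_document_model()


    def is_public(document):
        # A document is public when no view restriction applies to its collection.
        return not document.collection.get_view_restrictions().exists()


    @receiver(post_save, sender=Document)
    def update_acl_and_purge(sender, instance, **kwargs):
        # Mirror Wagtail's privacy setting onto the S3 object ACL.
        acl = "public-read" if is_public(instance) else "private"
        instance.file.storage.bucket.Object(instance.file.name).Acl().put(ACL=acl)
        # Invalidate any stale copy held by the front-end cache.
        purge_url_from_cache(instance.url)


    def document_location(document):
        # The serve-view decision: public documents go via the CDN,
        # private ones get a one-off signed S3 URL.
        if is_public(document):
            return f"{settings.CDN_BASE_URL}/{document.file.name}"
        storage = document.file.storage
        return storage.connection.meta.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": storage.bucket_name, "Key": document.file.name},
            ExpiresIn=60,  # short-lived, effectively one-off
        )

The real package also has to react to changes on the privacy controls themselves, not just on document saves, but the shape of the logic is the same.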

It is a very simple package, but copying and pasting the same boilerplate between projects to get this functionality, without proper unit tests, was a hassle and a no-go.

The project can be found on GitHub and PyPI. If you use Wagtail and S3 with django-storages, give it a try. The README also explains the recommended S3 setup for a Wagtail project, if you want to learn more about that.