Using GruntJS to sync with Amazon's S3

S3 continues to be a great place to host static files for web use. It’s relatively cheap, has good uptime, and when integrated with CloudFront it becomes a full content delivery network (CDN) with many geo-located edge servers. Getting development files to S3, and deciding what to do with them once they are there, makes for a long day and should never be a manual process. I have a setup that I use on personal projects and at work that solves the majority of the problems I have faced using S3 in development for cached static files. Previously I used bash-fu or Phing for this, but I recently switched to NodeJS.

A quick example config for uploading files to S3 in Grunt

var path = require('path');

cfg['s3'] = {
  options: {
    key: cfg.Ss.s3.key,
    secret: cfg.Ss.s3.secret,
    bucket: cfg.Ss.s3.bucket,
    access: 'public-read',
    headers: {
      // Two-year cache policy.
      // max-age is in seconds (60 * 60 * 24 * 730);
      // Expires is computed in milliseconds (1000 * 60 * 60 * 24 * 730).
      'Cache-Control': 'max-age=63072000, public',
      'Expires': new Date(Date.now() + 63072000000).toUTCString()
    }
  },
  prod: {
    sync: [{
      // The regular js files
      src: path.join(cfg.Ss.path.release, 'build/cdn/js/**/*.js'),
      dest: 'js',
      rel: path.join(cfg.Ss.path.release, 'build/cdn/js')
    }, {
      // The gzip js files
      src: path.join(cfg.Ss.path.release, 'build/cdn/js/**/*.js'),
      dest: 'jsgz',
      rel: path.join(cfg.Ss.path.release, 'build/cdn/js'),
      options: { gzip: true }
    }, {
      // The regular css files
      src: path.join(cfg.Ss.path.release, 'build/cdn/css/**/*.css'),
      dest: 'css',
      rel: path.join(cfg.Ss.path.release, 'build/cdn/css')
    }, {
      // The gzip css files
      src: path.join(cfg.Ss.path.release, 'build/cdn/css/**/*.css'),
      dest: 'cssgz',
      rel: path.join(cfg.Ss.path.release, 'build/cdn/css'),
      options: { gzip: true }
    }]
  }
};
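
The arithmetic behind the two cache headers is easy to get wrong, because Cache-Control's max-age is expressed in seconds while JavaScript dates work in milliseconds. Here is a minimal sketch of a helper that derives both values from a number of days. The name `cacheHeaders` and its parameters are hypothetical and not part of Grunt-S3.

```javascript
// Hypothetical helper, not part of Grunt-S3: builds far-future cache headers.
// `days` is the cache lifetime; `now` is an optional start time in
// milliseconds since the epoch (defaults to the current time).
function cacheHeaders(days, now) {
  var seconds = days * 24 * 60 * 60;          // max-age is in seconds
  var start = (now === undefined) ? Date.now() : now;
  return {
    'Cache-Control': 'max-age=' + seconds + ', public',
    // Date arithmetic is in milliseconds, hence the factor of 1000
    'Expires': new Date(start + seconds * 1000).toUTCString()
  };
}

var headers = cacheHeaders(730);
console.log(headers['Cache-Control']); // max-age=63072000, public
```

Two years of 365 days is 730 * 86,400 = 63,072,000 seconds, or 63,072,000,000 milliseconds for the Expires date.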

The pattern I use

The basics are simple to follow. If a file with the same name already exists on S3, do not upload it again. If you want to verify the file rather than trust its name, compare the MD5 hash of the S3 copy with that of the local file, and also check the last modified time. If the hashes match, we are dealing with the same file and can skip it. If the hashes differ and the local file has been modified more recently, upload the local file to S3, even though the name is the same. It’s always best to put the version of a file in its name, similar to something like jQuery-1.3.2.js (my favorite version personally). That way a simple existence check tells you whether that version is on S3 or not. In my code, a file’s SHA hash is in the name, which is what I consider the version, or revision, and that solves all caching issues down the line. A file then ends up with a name similar to this.

selectivizr-5a98e629a16ea2c02c322b4a35be249766ae2edf.js
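
The skip-or-upload decision described above can be sketched as a small function. This is an illustration of the pattern and not the Grunt-S3 source. The names `shouldUpload` and `md5Hex`, and the shape of the `local` and `remote` objects, are assumptions made for the example.

```javascript
var crypto = require('crypto');

// Hypothetical helper: MD5 of a string or Buffer as a hex string.
function md5Hex(content) {
  return crypto.createHash('md5').update(content).digest('hex');
}

// Hypothetical helper: decides whether a local file should be pushed to S3.
// `local` and `remote` are plain objects of the form { md5, modified },
// assumed for illustration. `remote` is null when no object with that
// name exists in the bucket.
function shouldUpload(local, remote) {
  if (!remote) {
    return true;                  // nothing on S3 yet, so upload
  }
  if (local.md5 === remote.md5) {
    return false;                 // identical content, skip it
  }
  // Content differs: only overwrite when the local copy is newer
  return local.modified.getTime() > remote.modified.getTime();
}

var local = {
  md5: md5Hex('body { color: red; }'),
  modified: new Date('2013-06-02T00:00:00Z')
};
var remote = {
  md5: md5Hex('body { color: blue; }'),
  modified: new Date('2013-06-01T00:00:00Z')
};
console.log(shouldUpload(local, remote)); // true
```

When the hash is already part of the filename, only the first branch matters: either the name exists on S3 or it does not.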

If you have not used GruntJS before, then this article is a good starting point. GruntJS is a robust, capable, out-of-the-box set of libraries that can be configured in a single file, and it easily accomplishes everything a web build process needs.

Primer

GruntJS installs using NPM and a package.json file (sudo npm install -g grunt-cli; sudo npm install grunt). I began by using the Grunt-S3 plugin when I was working on pushing files to S3 for work, but it was missing a critical feature: synchronization of files. I took a look at Grunt-S3 on GitHub. The project is fairly active, so I decided to add the sync feature myself. I only needed to push a file if it did not already exist. At the time I was relying on my bash-fu and curl to cover my bases. I implemented my process in Grunt-S3, and now all of the pieces fit.

A common first step in a build process is to prepare the files by concatenating them and then minifying them using something like UglifyJS or Google Closure Compiler. My next step is renaming files based on their content: the SHA1 hash of the file content is added to the filename.

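var crypto = require('crypto');

var hash = function (type) {
  // Find the files that still need a hash, e.g. <temp>js/**/*.js.prehash
  var files = Ss.path.temp + type + '/**/*.' + type + '.prehash';

  var filesFound = grunt.file.expand(files);
  filesFound.forEach(function (file) {
    // A Hash object can only be digested once, so create one per file
    var sha = crypto.createHash('sha1');
    // encoding: null returns a raw Buffer, so the exact bytes are hashed
    var rfile = grunt.file.read(file, { encoding: null });
    sha.update(rfile);
    var digest = sha.digest('hex');
    console.log(digest);
  });
  // Next step (not shown here): rename each file to include its hash
};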
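
The renaming step can be sketched with Node's standard library alone. This is one possible approach and not necessarily how my build does it. The helper name `hashedName` is hypothetical, and the path in the usage line is only an example.

```javascript
var crypto = require('crypto');
var path = require('path');

// Hypothetical helper: returns the file path with the SHA1 of its content
// inserted before the extension, e.g. app.js -> app-<sha1>.js
function hashedName(filePath, content) {
  var sha = crypto.createHash('sha1').update(content).digest('hex');
  var ext = path.extname(filePath);             // '.js'
  var base = path.basename(filePath, ext);      // 'app'
  return path.join(path.dirname(filePath), base + '-' + sha + ext);
}

// Example usage with inline content instead of a file read from disk
console.log(hashedName('build/cdn/js/selectivizr.js', 'abc'));
```

The rename itself could then be done with `fs.renameSync(oldPath, hashedName(oldPath, content))`, where `content` is the Buffer read from `oldPath`.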

Using Grunt-S3 is a simple config away

At this point, you can refer to the Grunt-S3 plugin for moving this code onto S3 using the new “sync” feature I contributed. The beauty is that when you do update your web code, all users get the new files on their next refresh (because you are literally serving a new file due to the filename change), rather than waiting for a cache timeout. I hope this feature is useful for all, not just me. I’ve also recently added documentation for the feature, since it is unreasonable to expect users to read cryptic source files.