Automating tasks based on file changes

Triggering a script when a file or folder changes is a common advanced task. Applications like Dropbox and BitSync have come a long way from the days of rsync magic, but what about triggering code when a sync occurs? It could be as simple as emailing a photo to your parents once 5 pictures are updated in Dropbox, or as advanced as rebuilding a static blog when content or configuration files in a folder change. This post focuses on the Linux tooling for handling these events, which relies on inotify. Other platforms have similar mechanisms that are not covered here, such as FSEvents on OS X and kqueue on BSD, and Windows has its own change-notification APIs.

I ran into this when setting up a Dropbox sync that needed to kick off a Grunt process to build a staging server. Each sync had to trigger an Assemble.io build that turns the YAML files held in the Dropbox path into HTML.

inotify is the Linux kernel’s mechanism for reporting filesystem changes. The easiest way to tap into inotify events is to install a daemon called incrond (the inotify cron daemon), which wraps the inotify system calls in a manageable, cron-like table format.

Installing InCron

Type the following command under RHEL / Fedora / CentOS Linux:

$ sudo yum install incron

Type the following command under Debian / Ubuntu Linux:

$ sudo apt-get install incron

Once incron is installed, you need to configure it. Which users need to take action (i.e. run scripts) determines the configuration you will want. Begin by taking a look at ‘/etc/incron.conf’ to make global changes to incron. The configuration options mostly speak for themselves, but here they are for clarity.

system_table_dir
This directory is examined by incrond for system table files.
Default: /etc/incron.d
user_table_dir
This directory is examined by incrond for user table files.
Default: /var/spool/incron
allowed_users
This file lists the users who are allowed to use incron.
Default: /etc/incron.allow
denied_users
This file lists the users who are not allowed to use incron.
Default: /etc/incron.deny
lockfile_dir
This directory holds the lock file that prevents multiple instances of incrond from running.
Default: /var/run
lockfile_name
This name (with ‘.pid’ appended) is used for the lock file that prevents multiple instances of incrond from running.
Default: incrond
editor
This name or path is the editor used for editing incron tables.
Default: none; the system editor is used unless this option overrides it.

To start using incron, first add the users you want to the allowed_users file (/etc/incron.allow), one user per line. For instance, if you wanted to run a script as root (please don’t, for security reasons), the file would contain a single line reading ‘root’.
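Adding a user to the allow file is easy to script. This is a minimal sketch, assuming the default /etc/incron.allow path listed above; `allow_incron_user` is a name made up for this example, and writing to the real file needs root.

```shell
# Append a user to an incron allow file, one user per line, skipping duplicates.
# allow_incron_user is a hypothetical helper, not part of incron.
allow_incron_user() {
    user="$1"
    allow_file="${2:-/etc/incron.allow}"   # default path from incron.conf
    touch "$allow_file" || return 1
    # -x matches the whole line, -F treats the name as a literal string
    if ! grep -qxF -- "$user" "$allow_file"; then
        printf '%s\n' "$user" >> "$allow_file"
    fi
}

# Usage (as root): allow_incron_user autoscript
```

Checking for an existing line first keeps the file clean when the setup script is run more than once.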

Next, as the user that will watch the files, you can do one of two things: edit that user’s table file directly, or run the command `incrontab -e`. Running incrontab can be easier, but it’s also, IMO, restrictive.

Example

Create an autoscript user (on Linux):

$ sudo -i
$ adduser -M -s /dev/null -G "www-data" autoscript
$ passwd autoscript
*******

Edit the user’s incron table, which lives in the user table directory from the configuration above. Add the folders you wish to watch, the events to watch for, and the command to run when they are triggered.

$ vi /var/spool/incron/autoscript
$ chown autoscript:autoscript /var/spool/incron/autoscript

Here is what the file could contain:

/home/someuser/Dropbox/somefolder/ IN_MODIFY,IN_ATTRIB,IN_CREATE,IN_DELETE /bin/bash /home/someuser/update.sh
/home/someuser/Dropbox/somefolder/someinnerfolder/ IN_MODIFY,IN_ATTRIB,IN_CREATE,IN_DELETE /bin/bash /home/someuser/update.sh
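incron can also pass details about the event to the command. In a table line, `$@` expands to the watched path, `$#` to the name of the file that triggered the event, and `$%` to the event name, so a rule can end in `/bin/bash /home/someuser/update.sh $@ $# $%`. The sketch below shows what a handler could look like under that assumption. `handle_event` and the log path are made up for illustration; this is not the contents of the original update.sh.

```shell
# Hypothetical handler for a rule such as:
#   /home/someuser/Dropbox/somefolder/ IN_MODIFY,IN_CREATE /bin/bash /home/someuser/update.sh $@ $# $%
# incron replaces $@ with the watched path, $# with the file name and $% with the event name.
handle_event() {
    watched_path="$1"
    file_name="$2"
    event="$3"
    log_file="${4:-/tmp/incron-update.log}"   # assumed log location

    # Skip hidden files, editor swap files and other temporary files
    case "$file_name" in
        .*|*~|*.swp|*.tmp) return 0 ;;
    esac

    printf '%s %s %s\n' "$event" "$watched_path" "$file_name" >> "$log_file"
    # A real script would kick off the rebuild here.
}

# When saved as update.sh, forward the arguments incron supplies
if [ "$#" -ge 3 ]; then
    handle_event "$@"
fi
```

Filtering temporary files matters with sync tools, which tend to write scratch files into the watched folder before the final file appears.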

You may have noticed in the example above that watches are not recursive: each subfolder needs its own rule. If you add folders frequently, you should automate the creation of the rules and rerun that script whenever the folder structure changes. Folders are not required, either; a rule can target an individual file.
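Generating the rules can be a small shell function. This is a sketch, assuming directory names without spaces; `generate_incron_rules` and the /tmp/autoscript.table path are hypothetical, and the event list and command mirror the example table above.

```shell
# Print one incron rule per directory under a root folder.
# generate_incron_rules is a hypothetical helper, not part of incron.
generate_incron_rules() {
    root="$1"
    cmd="$2"
    events="IN_MODIFY,IN_ATTRIB,IN_CREATE,IN_DELETE"
    # -type d lists the root and every nested directory
    find "$root" -type d | sort | while IFS= read -r dir; do
        printf '%s/ %s %s\n' "${dir%/}" "$events" "$cmd"
    done
}

# Usage: regenerate the table, then load it as the current user's table
# generate_incron_rules /home/someuser/Dropbox/somefolder "/bin/bash /home/someuser/update.sh" > /tmp/autoscript.table
# incrontab /tmp/autoscript.table
```

Because IN_CREATE also fires when a subfolder is created, the triggered script itself is a reasonable place to rerun the generator.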

The options

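The example above used some options that haven’t been explained yet (the IN_* names). They tell incron which events should trigger the command: IN_MODIFY fires when a file is modified, IN_ATTRIB when its metadata changes, IN_CREATE when a file or folder is created in a watched folder, and IN_DELETE when one is deleted from it. There isn’t a need to list them all here, since the incrontab(5) man page and plenty of sites detail the full set.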

Wrapping up

The final step is turning it all on. This is probably the easiest part, though the exact commands depend on your Linux flavor and version:

$ service incrond start
$ chkconfig incrond on

If chkconfig isn’t installed or used on your system (it’s an older way of doing things), you can check out some other options listed on StackOverflow. Notably, newer Linux distributions use systemd to control daemons.
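On a systemd machine the equivalent is a pair of `systemctl` calls. The sketch below only prints the commands that fit the current system and runs nothing. `incron_start_commands` is a made-up helper, and the unit name `incrond` is taken from the commands above; it may differ on your distribution.

```shell
# Print the commands that start incrond now and enable it at boot.
# Nothing is executed; run the printed lines as root.
# incron_start_commands is a hypothetical helper; the service name is an assumption.
incron_start_commands() {
    service_name="${1:-incrond}"
    if command -v systemctl >/dev/null 2>&1; then
        # systemd-based distributions
        printf 'systemctl start %s\n' "$service_name"
        printf 'systemctl enable %s\n' "$service_name"
    else
        # older SysV-style tooling
        printf 'service %s start\n' "$service_name"
        printf 'chkconfig %s on\n' "$service_name"
    fi
}

incron_start_commands
```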

Copying Files to a Vagrant VM from the Host

I ran into a case where I wanted a file on my local machine to also exist on a Vagrant VM. I didn’t need it to be shared through a shared directory, and I couldn’t move the file into the Vagrantfile path and rely on the default mounted directory. Instead, I needed a way to use `scp` to push a file from the host to the guest. Vagrant provides an API that is key to making `scp` work. With some piping and a few tools, providing the right credentials to `scp` is easy.
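One way to do this, assuming the piece of Vagrant referred to above is the `vagrant ssh-config` command, is to save its output to a temporary file and hand that file to `scp` with `-F`. The `vagrant_scp` name is hypothetical, and the function has to run from a directory where Vagrant can find the Vagrantfile.

```shell
# Copy a file from the host into a running Vagrant guest with scp.
# vagrant_scp is a hypothetical helper; "default" is the machine name Vagrant
# uses when the Vagrantfile does not define one.
vagrant_scp() (
    src="$1"
    dest="$2"
    machine="${3:-default}"
    conf="$(mktemp)" || exit 1
    # vagrant ssh-config prints an OpenSSH config block (host, port, key file)
    vagrant ssh-config "$machine" > "$conf" || { rm -f "$conf"; exit 1; }
    # -F points scp at that config, so "$machine" resolves to the guest
    scp -F "$conf" "$src" "$machine:$dest"
    status=$?
    rm -f "$conf"
    exit "$status"
)

# Usage: vagrant_scp ./settings.yml /home/vagrant/settings.yml
```

The body runs in a subshell (note the parentheses), so the `exit` calls end only the function and the temporary config is removed either way.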

I also commented on StackOverflow

Quickly Remove a Known-Host Entry

I’ve had plenty of scenarios in development where I rebuilt a development machine and its host key changed, causing SSH errors. This happens frequently with Vagrant as well. Luckily, it’s simple and quick to remove the old known signature.

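# Remove all keys belonging to a hostname from known_hosts:
# ssh-keygen -R hostname [-f known_hosts_file]
# Not to be confused with lowercase -r, which prints SSHFP DNS records and removes nothing:
# ssh-keygen -r hostname [-f input_keyfile] [-g]

# e.g.
ssh-keygen -R 192.168.99.99
ssh-keygen -R mytestdomain.vagrant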
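If you would rather see what is about to be removed, `ssh-keygen -F` looks a host up without changing anything. The wrapper below is a sketch built on those two flags; `forget_host` is a made-up name and the default file path is the usual per-user location.

```shell
# Remove a host's stale key only when an entry actually exists, and show it first.
# forget_host is a hypothetical helper built on two ssh-keygen flags:
#   -F hostname  search known_hosts for the host
#   -R hostname  remove all keys belonging to the host
forget_host() {
    host="$1"
    hosts_file="${2:-$HOME/.ssh/known_hosts}"
    entry="$(ssh-keygen -F "$host" -f "$hosts_file" 2>/dev/null)"
    if [ -z "$entry" ]; then
        echo "no entry for $host in $hosts_file"
        return 1
    fi
    printf 'removing:\n%s\n' "$entry"
    ssh-keygen -R "$host" -f "$hosts_file" >/dev/null 2>&1
}

# Usage: forget_host 192.168.99.99
```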

When updating your known_hosts, make sure that you understand what you are doing. The known_hosts entries are there to protect you by cryptographically validating that a server really is who it says it is. In other words, don’t delete an entry unless you know why the host key changed. This is the warning SSH shows when a key no longer matches:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is

The Case for Critical Assets

When it comes to first impressions, a website is made or broken by them. A modern website that relies heavily on scripting is likely inflicting a performance hit on itself. Ironically, this hit can be caused by the generic advice given for speeding up a website. Many web speed tests give an immediate failing grade for not using a content delivery network (CDN) for every single one of your assets. But in the case of critical scripts and styles, this can be exactly the wrong advice.

Critical scripts and styles are the assets that must load before content can be loaded effectively. For instance, a site might use Modernizr to sniff for features before loading further content or scripts. It might use jQuery to create DOM elements and place them on the page. It might also use @media queries to alter, import or change the layout of the page. In any of these situations, a new or returning user has to wait for that file to load before being able to view any content. I’m not going to argue whether a site *needs* to be doing those things first, but if it is, it’s going to cause a ‘hiccup’ on first render. A cached asset may negate this on later loads, which is why caching and cache control are good things to use. But this does nothing to help a first impression.

DNS lookups are killing time


JavaScript Frame Busting or Proper Apache Headers

I’m a fan of David Walsh, who recently posted a snippet of JavaScript that breaks a page out of an iFrame.

if (top.location != self.location) {
    top.location = self.location.href;
}

But you can and should invest in a better solution. As one of the answers on Stack Overflow points out, you can send the `X-Frame-Options` header with the value `SAMEORIGIN` at the server level. It works and works well. You can even allow certain pages while blocking others.
A quick Apache solution looks like this:

<VirtualHost *:80>
  # ...
  <IfModule mod_headers.c>
    # Whitelisting: URLs containing one of these strings may be framed, all others are blocked
    <LocationMatch ^((?!(firstUrlAllowed|secondUrlAllowed)).)*$>
      # Stop other sites from loading the page in an iframe.
      Header always append X-Frame-Options SAMEORIGIN
    </LocationMatch>
  </IfModule>
</VirtualHost>

This technique works in all modern browsers and can’t be turned off by disabling JavaScript, which makes it more secure.

Also note the Apache parameters for the whitelisting. If you want to block your entire site from iFrames, you do not need the LocationMatch. Otherwise, any URL that contains one of the strings you put in the regex will not have iFrames blocked. This is useful if you do not want to block a page whose purpose is to be in a frame (like a bookmarklet script).
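To confirm the header is actually being sent, you can inspect the response headers from the command line. The sketch below reads raw headers on stdin; `frame_policy` is a made-up helper and example.com stands in for your own site.

```shell
# Report whether a set of HTTP response headers restricts framing.
# frame_policy is a hypothetical helper; it reads raw headers on stdin.
frame_policy() {
    # Header names are case-insensitive, so lowercase everything before matching
    value="$(tr -d '\r' | tr 'A-Z' 'a-z' | sed -n 's/^x-frame-options:[ ]*//p' | head -n 1)"
    case "$value" in
        *sameorigin*|*deny*) echo "protected: $value" ;;
        "")                  echo "unprotected: no X-Frame-Options header" ;;
        *)                   echo "unknown value: $value" ;;
    esac
}

# Usage (example.com is a placeholder):
# curl -sI https://example.com/ | frame_policy
```

Running it against both a whitelisted URL and a regular page is a quick way to check that the LocationMatch regex does what you expect.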