Elastic Beanstalk Periodic Worker Tasks

2015-03-05

Recently, AWS announced support for "periodic worker tasks", a.k.a. cron jobs, a.k.a. scheduled tasks, in Elastic Beanstalk worker tiers. This is a feature I’ve been missing in Elastic Beanstalk, and I was unjustifiably excited when I read the announcement. This post is a record of my thrilling journey of setting up and using a periodic worker task. I had three problems:

  1. Periodic Worker Tasks are only supported on the Worker tier of Elastic Beanstalk
  2. cron.yaml is not documented by AWS (yet)
  3. Exactly how my application code would be invoked by the task was not clear

Worker Tier?

For starters, I was puzzled about why this feature is supported only for the worker tier. It appears the feature relies on the sqsd application, which the worker tier uses to relay messages from SQS queues to your application as HTTP POSTs. I don’t appreciate the restriction, but that’s the way it is. There are certainly many workarounds for creating your own cron job on web tiers.

cron.yaml

Next, I did not find any AWS documentation about the format of the cron.yaml file that drives periodic worker tasks. Their original blog post says to put cron.yaml in the root of your code, says the schedule format comes from cron, and links to Wikipedia’s cron page. Last time I checked, crontab is not YAML, however, so some documentation of how this format gap should be crossed seems warranted. But AWS isn’t the first company to come up with cron.yaml, and Google was kind enough to provide documentation.

Google’s documentation seems close to the AWS format, but I got an error from Elastic Beanstalk about the version. The following cron.yaml was derived from Google’s example, an AWS forum post, and a run through YAML Lint:

---
cron:
  -
    name: heartbeat
    schedule: "* * * * *"
    url: /schedule
    description: "heartbeat for logging"

I don’t know if this format is truly correct, but it works well enough to result in a message once per minute.

Placing the file at the root of your project means the root of the deployed Elastic Beanstalk package, not the .ebextensions folder with the other customization files.
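Since the file's location matters, a quick check before deploying can save a round trip. Here is a minimal Python sketch, assuming your package is a zip bundle; the function name cron_yaml_at_root is my own invention and not part of any AWS tooling:

```python
import zipfile


def cron_yaml_at_root(bundle_path):
    """Return True if the zip bundle has cron.yaml as a top-level entry."""
    with zipfile.ZipFile(bundle_path) as bundle:
        # Entries inside folders appear as "folder/cron.yaml",
        # so an exact match means the file sits at the root.
        return "cron.yaml" in bundle.namelist()
```

Calling cron_yaml_at_root on the bundle you are about to upload returns False if cron.yaml ended up inside a subfolder.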

After deploying the worker app with my cron.yaml, I see the text "Successfully loaded 1 scheduled tasks from cron.yaml" in the log stream on the Elastic Beanstalk dashboard, a possible sign of victory.

Where Did My Task Go?

I was not exactly sure how my worker app would receive the scheduled message. It turns out the message is POSTed to the url given in the task's cron.yaml entry, not to your environment's regular message queue processing URL. So you can either handle scheduled messages on a separate HTTP POST route, or point url at your normal route and distinguish the two kinds of message there. sqsd adds some headers to the HTTP POST that match your cron.yaml settings, as follows:

cron.yaml field    HTTP header
name               x-aws-sqsd-taskname
url                x-aws-sqsd-path

Key takeaway: you can distinguish between scheduled messages and regular messages either by HTTP route or by looking for the x-aws-sqsd-taskname header.
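To make the header approach concrete, here is a minimal Python sketch using only the standard library. The names scheduled_task_name and WorkerHandler are hypothetical, and I am assuming that a 200 response is what tells sqsd the message was handled; treat it as an illustration, not as how your framework of choice would do it:

```python
from http.server import BaseHTTPRequestHandler

# Header that sqsd sets from the "name" field in cron.yaml.
TASK_HEADER = "x-aws-sqsd-taskname"


def scheduled_task_name(headers):
    """Return the cron.yaml task name for a scheduled message, else None."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if key.lower() == TASK_HEADER:
            return value
    return None


class WorkerHandler(BaseHTTPRequestHandler):
    """Handles both regular queue messages and scheduled task messages."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        task = scheduled_task_name(self.headers)
        if task is not None:
            print(f"scheduled task {task!r} on {self.path}")
        else:
            print(f"regular message of {len(body)} bytes on {self.path}")
        # Assumption: a 200 response signals success to sqsd.
        self.send_response(200)
        self.end_headers()
```

With the cron.yaml above, scheduled_task_name would return "heartbeat" for the once-per-minute POST to /schedule and None for ordinary queue messages. To try it locally, pass WorkerHandler to http.server.HTTPServer on a port of your choosing.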