Learning Assemble, Step 9 - Sitemap

2014-11-03

Our static site needs a sitemap to help search engines find our awesome content. Sitemaps define where content exists in your site, with some metadata about when it last changed, how frequently it is likely to change, and the relative priority of each page.

I previously tried to generate a sitemap in an Assemble project using Handlebars templates, but gave up when it became apparent how difficult XML output is in Assemble. Today, we're going to cheat by installing and configuring the appropriately named assemble-middleware-sitemap, an Assemble middleware that generates the XML sitemap for us.

Sitemaps are fairly simple XML documents that look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.batchiq.com/</loc>
        <lastmod>2014-10-15</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://www.batchiq.com/features.html</loc>
        <lastmod>2014-10-15</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>
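To make that structure concrete, here is a minimal JavaScript sketch that renders a list of page entries into the same XML shape. The escapeXml and buildSitemap functions are hypothetical names made up for this illustration; this is not how assemble-middleware-sitemap is implemented, just a rough picture of what any sitemap generator has to produce.

```javascript
// Hypothetical helper: escape the characters that are special in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: render an array of page entries as a sitemap document.
function buildSitemap(entries) {
  const urls = entries.map(function (entry) {
    return [
      '    <url>',
      '        <loc>' + escapeXml(entry.loc) + '</loc>',
      '        <lastmod>' + escapeXml(entry.lastmod) + '</lastmod>',
      '        <changefreq>' + escapeXml(entry.changefreq) + '</changefreq>',
      '        <priority>' + escapeXml(entry.priority) + '</priority>',
      '    </url>'
    ].join('\n');
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ].concat(urls, ['</urlset>']).join('\n');
}

// The two entries from the example sitemap above.
const xml = buildSitemap([
  { loc: 'http://www.batchiq.com/', lastmod: '2014-10-15', changefreq: 'daily', priority: 1 },
  { loc: 'http://www.batchiq.com/features.html', lastmod: '2014-10-15', changefreq: 'weekly', priority: 0.8 }
]);

console.log(xml);
```

Escaping matters mostly for loc values, since URLs with query strings contain ampersands that would otherwise produce invalid XML.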

Building a Sitemap with assemble-middleware-sitemap

We are going to use assemble-middleware-sitemap to build our sitemap. This is a plugin that will run as part of the Assemble task. Following the instructions, we install it:

npm install assemble-middleware-sitemap --save-dev

And then update our Gruntfile to include the middleware (a.k.a. "plugin"):


        assemble: {
            hello: {
                options: {
                    data: 'source/data/*.{json,yml}',
                    layoutdir: 'source/templates/layouts',
                    layout: 'default.hbs',
                    partials: 'source/templates/partials/**/*.hbs',
                    collections: [
                        { name: 'navTags', inflection: 'navTag' }
                    ],
                    helpers: ['source/helpers/**/*.js'],
                    plugins: ['assemble-middleware-sitemap']
                },
                files: [
                    { expand: true, cwd: 'source/templates/pages', src: '**/*.{hbs,md}', dest: 'output/' }
                ]
            }
        },

Now when we run the Assemble task, it generates a sitemap file at output/sitemap.xml. It's a very basic sitemap, because we haven't configured anything yet (note the "undefined" prefix on every URL, which we will fix shortly), but it shows how easily you can get started with assemble-middleware-sitemap.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>undefined/output/about.html</loc>
    <lastmod>2014-11-01T00:29:57.826Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>undefined/output/blog.html</loc>
    <lastmod>2014-11-01T00:29:57.827Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
...

Customizing the Sitemap

To get more than just the basic sitemap, we have to do some actual work. We will want to customize our sitemap in the following ways:

  1. Fix the URLs so they correctly point to our deployment location
  2. Exclude some pages from the sitemap
  3. Customize the sitemap data fields for some pages

assemble-middleware-sitemap makes all of this fairly straightforward through configuration in our Gruntfile and the individual page YAML Front-Matter.

Fixing the URLs

The URLs should point to the deployed location of our site, where a search engine would look for them. We'll use a fake name, http://awesome-site.bogus, for now, and set it in the Gruntfile:


        assemble: {
            hello: {
                options: {
                    data: 'source/data/*.{json,yml}',
                    layoutdir: 'source/templates/layouts',
                    layout: 'default.hbs',
                    partials: 'source/templates/partials/**/*.hbs',
                    collections: [
                        { name: 'navTags', inflection: 'navTag' }
                    ],
                    helpers: ['source/helpers/**/*.js'],
                    plugins: ['assemble-middleware-sitemap'],
                    sitemap: {
                        homepage: "http://awesome-site.bogus"
                    }
                },
                files: [
                    { expand: true, cwd: 'source/templates/pages', src: '**/*.{hbs,md}', dest: 'output/' }
                ]
            }
        },

We've added a sitemap configuration section for the middleware and, for now, specified a single option, homepage. This is the base URL that is prefixed to every page path in the sitemap; write it without a trailing slash. Run Assemble again and the URLs change. Here is a partial listing:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://awesome-site.bogus/output/about.html</loc>
...
  <url>
    <loc>http://awesome-site.bogus/output/index.html</loc>
...
</urlset>

That's not quite right. The URLs now have our bogus domain name, but they also include the 'output' folder (http://awesome-site.bogus/output/index.html). That's not what we want; the URLs should be relative to the output folder, treating it as the site root. Fortunately, there is another configuration option for this, relativedest:


        assemble: {
            hello: {
                options: {
                    data: 'source/data/*.{json,yml}',
                    layoutdir: 'source/templates/layouts',
                    layout: 'default.hbs',
                    partials: 'source/templates/partials/**/*.hbs',
                    collections: [
                        { name: 'navTags', inflection: 'navTag' }
                    ],
                    helpers: ['source/helpers/**/*.js'],
                    plugins: ['assemble-middleware-sitemap'],
                    sitemap: {
                        homepage: "http://awesome-site.bogus",
                        relativedest: true
                    }
                },
                files: [
                    { expand: true, cwd: 'source/templates/pages', src: '**/*.{hbs,md}', dest: 'output/' }
                ]
            }
        },

And now we get correct URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://awesome-site.bogus/about.html</loc>
...
  <url>
    <loc>http://awesome-site.bogus/index.html</loc>
...
</urlset>
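The three outputs we have seen so far can be reproduced with a small sketch. This is only a guess at the general idea, written to match the observed listings; buildLoc and its outputDir argument are hypothetical names, and the middleware's real implementation may well differ.

```javascript
// Hypothetical sketch of how a <loc> value could be assembled.
// buildLoc and outputDir are illustrative assumptions, not the middleware's API.
function buildLoc(sitemapOptions, destPath, outputDir) {
  let pagePath = destPath;

  // With relativedest enabled, drop the output folder so the path
  // is relative to the site root.
  if (sitemapOptions.relativedest && destPath.indexOf(outputDir) === 0) {
    pagePath = destPath.slice(outputDir.length);
  }

  // String concatenation turns a missing homepage into the text "undefined".
  return sitemapOptions.homepage + '/' + pagePath;
}

const dest = 'output/about.html';

// No sitemap options at all.
const unconfigured = buildLoc({}, dest, 'output/');

// Only the homepage option.
const withHomepage = buildLoc({ homepage: 'http://awesome-site.bogus' }, dest, 'output/');

// Both homepage and relativedest.
const withRelativeDest = buildLoc(
  { homepage: 'http://awesome-site.bogus', relativedest: true },
  dest,
  'output/'
);

console.log(unconfigured);      // undefined/output/about.html
console.log(withHomepage);      // http://awesome-site.bogus/output/about.html
console.log(withRelativeDest);  // http://awesome-site.bogus/about.html
```

The sketch also shows why the homepage value should not end in a slash: the separator is added when the URL is joined, so a trailing slash would give a double slash.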

Excluding Pages

You might not want to include all pages in the sitemap, so assemble-middleware-sitemap includes an option to exclude certain pages. We'll exclude our Diagnostics page, just to demonstrate how this works in the Gruntfile settings:


        assemble: {
            hello: {
                options: {
                    data: 'source/data/*.{json,yml}',
                    layoutdir: 'source/templates/layouts',
                    layout: 'default.hbs',
                    partials: 'source/templates/partials/**/*.hbs',
                    collections: [
                        { name: 'navTags', inflection: 'navTag' }
                    ],
                    helpers: ['source/helpers/**/*.js'],
                    plugins: ['assemble-middleware-sitemap'],
                    sitemap: {
                        homepage: "http://awesome-site.bogus",
                        relativedest: true,
                        exclude: ['diagnostics']
                    }
                },
                files: [
                    { expand: true, cwd: 'source/templates/pages', src: '**/*.{hbs,md}', dest: 'output/' }
                ]
            }
        },

After running Assemble, the Diagnostics page no longer appears in the sitemap.
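Conceptually, exclusion is just a filter over the list of generated pages. The sketch below assumes that entries in exclude are compared against each page's base name without its extension, which matches how 'diagnostics' is used above; filterExcluded is a hypothetical helper, not a function exported by the middleware.

```javascript
const path = require('path');

// Hypothetical helper: drop pages whose base name appears in the exclude list.
// Assumes exclude entries are page names without directory or extension.
function filterExcluded(destPaths, exclude) {
  return destPaths.filter(function (dest) {
    const pageName = path.basename(dest, path.extname(dest));
    return exclude.indexOf(pageName) === -1;
  });
}

const pages = [
  'output/about.html',
  'output/diagnostics.html',
  'output/index.html'
];

const included = filterExcluded(pages, ['diagnostics']);
console.log(included); // [ 'output/about.html', 'output/index.html' ]
```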

Customizing the Data Fields

Last, we should customize the sitemap data fields to match our site. At the very least, I recommend changing the priorities, but you might also want to maintain the modified date and change frequency. In any event, we want to see how this works. The sitemap middleware will use values from three sources in order of most preferred to least preferred:

  1. Individual page YAML Front-Matter
  2. Gruntfile sitemap options
  3. Middleware default

For now, we are getting the middleware default for all priority, lastmod, and changefreq values. Let's set a new default changefreq for our site by specifying this in the Gruntfile:


        assemble: {
            hello: {
                options: {
                    data: 'source/data/*.{json,yml}',
                    layoutdir: 'source/templates/layouts',
                    layout: 'default.hbs',
                    partials: 'source/templates/partials/**/*.hbs',
                    collections: [
                        { name: 'navTags', inflection: 'navTag' }
                    ],
                    helpers: ['source/helpers/**/*.js'],
                    plugins: ['assemble-middleware-sitemap'],
                    sitemap: {
                        homepage: "http://awesome-site.bogus",
                        relativedest: true,
                        exclude: ['diagnostics'],
                        changefreq: 'monthly'
                    }
                },
                files: [
                    { expand: true, cwd: 'source/templates/pages', src: '**/*.{hbs,md}', dest: 'output/' }
                ]
            }
        },

But we definitely want to customize our homepage, index.hbs, because it is so, so special. We add changefreq, priority, and updated values to its YAML Front-Matter:

---
title: "Home"
message: We have lots of cool stuff
promotions:
 one:
  title: "Promotion 1"
 two:
  title: "Promotion 2"
 three:
  title: "Promotion 3"
navTags:
 - header
 - footer
navLabel: Home
navSort: 0
changefreq: daily
priority: 1
updated: 2014-10-30
---
<h1>{{title}}</h1>
<p>{{message}}</p>
{{log promotions}}
<ul>
    {{#each promotions}}
        <li>{{title}}</li>
    {{/each}}
</ul>

The sitemap now looks pretty good.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://awesome-site.bogus/about.html</loc>
    <lastmod>2014-11-01T00:46:21.430Z</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
...
  <url>
    <loc>http://awesome-site.bogus/index.html</loc>
    <lastmod>2014-10-30T00:00:00.000Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
...
</urlset>

In this partial listing, we can see the benefits of our configuration. The About page has a 'monthly' changefreq, using our new site default. But our Home page is 'daily', and has a priority of 1. Excellent.
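The precedence rule (page Front-Matter first, then Gruntfile sitemap options, then the middleware default) can be sketched in a few lines. The function names here are hypothetical, and the default values are simply the ones observed in the unconfigured sitemap earlier in this post (weekly and 0.5); treat this as an illustration of the lookup order rather than the middleware's actual code.

```javascript
// Defaults as observed in the unconfigured output earlier in this post.
const MIDDLEWARE_DEFAULTS = { changefreq: 'weekly', priority: 0.5 };

// Hypothetical helper: pick each field from the most preferred source that defines it.
function resolveSitemapFields(frontMatter, gruntOptions) {
  const fields = {};
  Object.keys(MIDDLEWARE_DEFAULTS).forEach(function (key) {
    if (frontMatter[key] !== undefined) {
      fields[key] = frontMatter[key];          // 1. page YAML Front-Matter
    } else if (gruntOptions[key] !== undefined) {
      fields[key] = gruntOptions[key];         // 2. Gruntfile sitemap options
    } else {
      fields[key] = MIDDLEWARE_DEFAULTS[key];  // 3. middleware default
    }
  });
  return fields;
}

// Hypothetical helper: a date-only value such as 2014-10-30 is read as
// midnight UTC, which gives the full timestamp seen in the lastmod element.
function toLastmod(updated) {
  return new Date(updated).toISOString();
}

const gruntSitemapOptions = { changefreq: 'monthly' };

const aboutFields = resolveSitemapFields({}, gruntSitemapOptions);
const homeFields = resolveSitemapFields({ changefreq: 'daily', priority: 1 }, gruntSitemapOptions);
const homeLastmod = toLastmod('2014-10-30');

console.log(aboutFields); // { changefreq: 'monthly', priority: 0.5 }
console.log(homeFields);  // { changefreq: 'daily', priority: 1 }
console.log(homeLastmod); // 2014-10-30T00:00:00.000Z
```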

For the code, please see the learning-assemble repository on GitHub. The branch step09-sitemap contains the completed code from this post.

Next: Learning Assemble, Step 10 - Pages from Data