Query CloudTrail Events with Athena


I was recently building a NiFi Flow for CloudTrail events that enriched the events with IP geolocation data, then wrote them to an S3 bucket to query with Athena. But I wondered, is it possible to use Athena to query CloudTrail records directly from S3 without reprocessing them?

The answer is yes, as long as some tortured SQL syntax doesn't bother you.


Hidden Fields in Apache NiFi FlowFiles


NiFi users quickly learn that FlowFiles have built-in fields like uuid and filename, because these are plainly visible in the UI and referenced in many Expression Language examples.

Only after I read Mark Payne's answer to a StackOverflow question about the lineageStartDate field did I appreciate that there might be ways to reference internal fields from Expression Language that are not obvious from the UI. So I hunted through some code to find out which fields they were and where they came from.

While you can do a lot in NiFi without knowing these fields, an exploration of where they come from develops a better understanding of how the fields visible in the UI match to NiFi internals.


First Look at Apache NiFi


I recently checked out Apache NiFi for the first time to run through a "big data" processing demo. NiFi is an environment for running flow-based data processing programs. Although it is new to the Apache Software Foundation, it lived a previous life as the NSA's "Niagara Files" project. I didn't know the NSA shared, but apparently they do.

NiFi is interesting, different, and surprisingly... fun. Drag and drop a flow-based program! But as with all technologies, understanding the NiFi mindset is more important than a strict analysis of its current capabilities. And the NiFi mindset is both distinctive and challenging. Distinctive in that NiFi emphasizes developing, monitoring, managing, and troubleshooting a running system in production. Challenging in that most current systems emphasize the dev -> test -> prod promotion pattern that NiFi seems to ignore.


Monitoring Advice Distilled


After being asked for my quick list of top monitoring advice more than once recently, I thought I would just write it down. I've written more about troubleshooting than monitoring, and it's been interesting to consider how the two are related, yet different. The really short version is:

  • Define the standards you are monitoring to
  • Focus on humans and organizational measures before worrying about tools
  • Prefer monitoring transactional data over logs
  • Prefer monitoring success rate over error rate
  • Make sure you can separate happy quiet from scary silence
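
One way to picture the last two points, as a minimal sketch rather than a prescription: count successes and failures per time window, and treat "no traffic" differently depending on whether the source checked in at all. The `WindowStats` type, the heartbeat flag, and the 99% threshold are all assumptions invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counts observed for one monitoring window (hypothetical structure)."""
    successes: int
    failures: int
    heartbeat_seen: bool  # assumption: the source emits a heartbeat when alive


def assess(stats: WindowStats, min_success_rate: float = 0.99) -> str:
    """Classify a window by success rate, separating quiet from silence."""
    if not stats.heartbeat_seen:
        # Nothing reported in at all: we cannot tell idle from broken.
        return "scary silence"
    total = stats.successes + stats.failures
    if total == 0:
        # The source is alive but had no work to do.
        return "happy quiet"
    success_rate = stats.successes / total
    return "ok" if success_rate >= min_success_rate else "degraded"


if __name__ == "__main__":
    print(assess(WindowStats(successes=0, failures=0, heartbeat_seen=True)))
    print(assess(WindowStats(successes=0, failures=0, heartbeat_seen=False)))
```

Measuring the success rate means a window with zero transactions shows up as its own case instead of looking like a perfect zero-error day.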

Using CodeCommit and GitHub Credential Helpers


I have git repos on both GitHub and AWS CodeCommit, but I found CodeCommit's HTTPS credential management to be a bit problematic. CodeCommit's credential helper does not follow the typical name/password pattern, and the default git credential helpers installed for both Windows and OSX do not naturally play nice side-by-side with CodeCommit.

  1. I followed Amazon's documentation for setting up CodeCommit's credential helper and overwrote my GitHub-compliant credential helper configuration.
  2. I tried to configure them side-by-side with CodeCommit's helper namespaced to the CodeCommit HTTPS domain. I learned about git credential helper configuration settings while figuring out the earlier problem of getting CodeCommit to work with EC2 Role Credentials. So I felt really smart, and I was proud of myself for figuring out domain scoping of credential helpers for a few minutes -- until this stopped working because credential helpers are executed in a cascading chain.
  3. Finally, I configured both helpers by HTTPS URL scope, so they play nicely side-by-side.
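
As a rough sketch of what step 3 could look like, the snippet below only builds the `git config` commands and prints them; it does not run them. The us-east-1 CodeCommit endpoint, the choice of `osxkeychain` as the GitHub helper, and the function name are assumptions for illustration, and the sketch assumes no unscoped `credential.helper` entry is left in the config.

```python
import shlex
from typing import List

# Assumed endpoints; adjust the region to wherever your repos live.
CODECOMMIT_URL = "https://git-codecommit.us-east-1.amazonaws.com"
GITHUB_URL = "https://github.com"


def scoped_helper_commands(github_helper: str = "osxkeychain") -> List[List[str]]:
    """Return git config commands that scope each helper to one HTTPS URL."""
    base = ["git", "config", "--global"]
    return [
        # CodeCommit: use the AWS CLI helper, only for the CodeCommit host.
        base + [f"credential.{CODECOMMIT_URL}.helper",
                "!aws codecommit credential-helper $@"],
        base + [f"credential.{CODECOMMIT_URL}.UseHttpPath", "true"],
        # GitHub: use the OS keychain helper, only for github.com.
        base + [f"credential.{GITHUB_URL}.helper", github_helper],
    ]


if __name__ == "__main__":
    for command in scoped_helper_commands():
        print(shlex.join(command))
```

On Windows, the GitHub helper name would be different (for example `manager`), so pass in whichever helper is already installed.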



API Gateway Permissions Model is Special


Amazon's new API Gateway service has great potential, probably an avenue for future expansion of AWS, and certainly something I'm trying to get up to speed on. But API Gateway definitely has some quirks.

Today, I ran into a strange aspect of the permissions model for API Gateway while answering a StackOverflow question about why the documented permissions didn't work in the IAM Policy Simulator, and it's still bothering me. Amazon made an interesting choice with the permissions model that seems confusing to developers and fateful with respect to future services. In short, I would call it "special", for not getting with the same program every other AWS service uses to define its permissions.

The bottom line is that API Gateway has its client and admin/management permissions broken out under different service names. When you look at the list of services for permissions, you see:

  • API Gateway - Permissions for clients, currently the only action is `execute-api:invoke`.
  • Manage - API Gateway - Admin permissions for configuring the API Gateway, which has CRUD actions fitting the `apigateway:*` spec.
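
To make the split concrete, here is a rough sketch of two IAM policy documents built as Python dictionaries, one per service name. The region, account ID, and API ID are placeholders, the action is spelled `execute-api:Invoke` as in AWS's documentation, and the resource ARN layouts are assumptions to check against your own setup.

```python
import json

# Placeholder values for illustration only.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
API_ID = "a1b2c3d4e5"

# Client permission: lives under the "execute-api" service prefix.
client_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "execute-api:Invoke",
        "Resource": f"arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{API_ID}/*",
    }],
}

# Admin permission: lives under the separate "apigateway" service prefix.
admin_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "apigateway:*",
        "Resource": f"arn:aws:apigateway:{REGION}::/restapis/{API_ID}/*",
    }],
}


def service_prefix(policy: dict) -> str:
    """Return the service prefix of the first statement's action."""
    return policy["Statement"][0]["Action"].split(":")[0]


if __name__ == "__main__":
    for name, policy in (("client", client_policy), ("admin", admin_policy)):
        print(name, service_prefix(policy))
        print(json.dumps(policy, indent=2))
```

The two prefixes differ, which is why a policy written for one service name does nothing for the other.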

CodeCommit with EC2 Role Credentials


I had trouble getting AWS CodeCommit to work using an EC2 instance's role/instance profile credentials. My goal was to use a cloud-init script to pull down source from CodeCommit at instance launch time, then kick off some processing. I spent more time than I wanted to struggling with CodeCommit credentials, and I'd like to share the results in the hope of saving someone else the anguish.

My initial reason for using CodeCommit rather than GitHub was that it uses IAM credentials. An EC2 instance with credentials from an IAM Role seems like a pretty vanilla use case to me, but it is not covered in the CodeCommit docs for setting up a git client. The documentation focuses on interactive or desktop use rather than automated agent access.


There were three configuration challenges and one obvious task:

  1. I ran into errors running git as root from the cloud-init script. I used git system-wide settings as a workaround.
  2. The AWS CodeCommit credential helper for git must be configured with the "default" named profile to use the instance's role/instance profile credentials.
  3. The "default" named profile must be set to use the us-east-1 region by default, to match CodeCommit.
  4. The EC2 role needs to have permissions to the CodeCommit repo.

Below, I'll explain the setup in more detail and show a sample cloud-init script that puts it in action.


Elastic Beanstalk Periodic Worker Tasks


Recently, AWS announced support for "periodic worker tasks", a.k.a. cron jobs, a.k.a. scheduled tasks, in Elastic Beanstalk worker tiers. This is a feature I've been missing in Elastic Beanstalk, and I was unjustifiably excited when I read the announcement. This post is a record of my thrilling journey to setting up and using a periodic worker task. I had three problems:

  1. Periodic Worker Tasks are only supported on the Worker tier of Elastic Beanstalk
  2. cron.yaml is not documented by AWS (yet)
  3. Exactly how my application code would be invoked by the task was not clear

Why Static Web Sites?


I was a bit puzzled when I first heard the phrase "static web site". It sounded like a silly new label for something old, a marketing staple in the tech industry that usually provokes eye rolling from me. But after having a chance to try it out, and try alternatives, I have become a believer in the term, the fundamentals behind what makes it new, and its potential for the future.

My working definition of the term "static web site" is a web site that has no server-side code when it is deployed. The deployed site consists only of static assets like HTML pages, CSS, images, JavaScript files, etc. There can be lots of client-side code, if that's what you're into. There is also code to generate the static site, but this generator code is not deployed.

The promise of static web sites is 100% control of the code, good and cheap hosting, and total control over the tooling that makes it happen. If you are a web developer, this might sound awesome. If you are not a web developer, static web sites probably are not for you.


Learning Assemble, Step 12 - Navigation Helper


As part of my recent struggle building a list of 5 recent blog posts, I decided that I should just write a custom Handlebars Helper that did what I wanted. After I wrote the helper, I figured out that there was actually a built-in way to accomplish the task, such that my custom code became an unnecessary and possibly buggy addition. I was very disappointed.

But I felt obligated to flesh out my #withPages helper and share it. It is based on the built-in #withSort helper code, with some basic lodash action on the pages array, and a light topping of error messages. But it could be handy.


Learning Assemble, Step 11 - More Navigation


Back to navigation. In earlier posts in this series, we've looked at some page navigation and blog list navigation issues, both of which centered on the challenge of selecting a subset of pages and getting them formatted in the right sort order. I found several ugly ways of approaching these problems, but not a single, elegant way.


Learning Assemble, Step 10 - Pages from Data


Assemble is typically configured to build output pages from Handlebars templates or markdown files, where one source file maps to one output page. However, it is possible to dynamically generate pages from data rather than from individual source files. This technique is demonstrated in the assemble-blog-theme template. I found it quite mind-blowing at first; it challenged my understanding of Assemble and how I thought it should work. Experimenting with the technique helps build a better understanding of Assemble, and we'll try it here.

  • Understanding how Assemble pages may be dynamically generated from data
  • Implementing basic dynamic page generation

Learning Assemble, Step 9 - Sitemap


Our static site needs a sitemap to help search engines find our awesome content. Sitemaps define where content exists in your site, with some metadata about when it last changed, how frequently it is likely to change, and the relative priority of each page.

I have previously tried to generate a sitemap in an Assemble project using Handlebars templates, but had to give up when it became apparent how difficult XML output is in Assemble. Today, we're going to cheat by installing and configuring the appropriately named assemble-middleware-sitemap, Assemble middleware that will generate our XML sitemap for us.


Learning Assemble, Step 8 - Debugging Assemble


Assemble has its own debugging pattern, which can be a bit tough when you first pick it up. In this post, we'll look at some ways to help you troubleshoot problems and understand what is going on in Assemble.

  • Logging and debugging with built-in Handlebars helpers
  • Creating diagnostic pages to view Assemble's data model

Learning Assemble, Step 7 - Custom Handlebars Helpers


If you don't like what Assemble does, you can probably change it with custom Handlebars Helpers. The best part is that there are already many good open-source helpers around, either for direct use or for study and modification. We'll look at how to use and author Handlebars helpers in this post.

  • About Handlebars Helpers
  • Writing a Simple Helper
  • Writing a Block Helper for Navigation Links

Learning Assemble, Step 6 - Blogging with Markdown


Every static site needs a blog that we can pump full of content marketing filler. In this post, we'll author blog posts using Markdown and figure out how to build an index page for our blog posts.

  • Markdown Blog Posts
  • A Blog Index
  • Gruntfile Modifications

Learning Assemble, Step 5 - Page Navigation


We used hard-coded navigation links earlier, and I promised that we would come back and build navigation links dynamically from the Assemble page data. Let's do that now. Some of the topics we'll cover include:

  • Simple page links in Assemble
  • Solutions for navigation link problems
    • Custom page collections with custom YAML Front-Matter
    • Custom nav data
    • Back to hard-coding?

Learning Assemble, Step 4 - Data


In this post we'll look at custom data used by Assemble and Handlebars templates.

  • Page data (YAML Front Matter)
  • Site data
  • Built-in data

Learning Assemble, Step 3 - Grunt Workflow


Our static web site project needs a lot more than just Assemble, and we're going to take a quick break to set up some non-Assemble tasks in Grunt. We need Grunt to set up a useful workflow, including some way of previewing and debugging our static site. In this post, we'll go through the configuration of some additional Grunt tasks for a basic static site workflow.

  • Watch
  • Connect
  • Combined Build and Server Tasks
  • Livereload
  • Clean

Learning Assemble, Step 2 - Basic Templating


Our first Assemble example was nicely minimal, but sadly not very useful or realistic. In this post, we'll flesh out our Assemble templating a bit more to convert from "Hello, Assemble!" to something just slightly more realistic. This will include:

  • Designing a simple folder structure
  • Adding some more pages
  • Using layouts and partial Handlebars templates