Hidden Fields in Apache NiFi FlowFiles

2016-08-26

NiFi users quickly learn that FlowFiles have built-in fields like uuid and filename, because these are plainly visible in the UI and referenced in many Expression Language examples.

Only after I read Mark Payne's answer to a StackOverflow question about the lineageStartDate field did I appreciate that there might be ways to reference internal fields from Expression Language that are not obvious from the UI. So I hunted through some code to find out which fields they were and where they came from.

While you can do a lot in NiFi without knowing these fields, exploring where they come from gives a better understanding of how the fields visible in the UI map to NiFi internals.

Values in Expression Language

First, I tried to find out how internal values of FlowFiles become available in Expression Language, and why others remain unavailable. The place to look is the ValueLookup class, which is where FlowFile data is exposed to EL. In ValueLookup (see line 66), you can see several sources of EL values:

  • Attributes - Some attributes are added by default to the same user-modifiable attribute set that holds user-defined attributes. uuid and filename are probably the best examples.
  • Properties - FlowFiles also have intrinsic properties, a select number of which are exposed through Expression Language. The names of these variables are set when they are added in ValueLookup, and they may differ both from the class variable names in code and from the descriptive names given in the UI.
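To make the two sources concrete, here is a minimal sketch of the idea in Java. It is not NiFi's actual ValueLookup code: the SimpleFlowFile class, the buildLookup method, and the choice of which source wins on a name clash are all assumptions made for illustration. Only the Expression Language names (flowFileId, fileSize, and so on) come from NiFi.

```java
import java.util.HashMap;
import java.util.Map;

class FlowFileLookupSketch {

    // Hypothetical stand-in for a FlowFile: user-modifiable attributes
    // plus a few intrinsic values held in ordinary fields.
    static final class SimpleFlowFile {
        final Map<String, String> attributes = new HashMap<>();
        long id;
        long size;
        long entryDate;
        long lineageStartDate;
    }

    // Build the name -> value view that an expression evaluator could consult.
    static Map<String, String> buildLookup(SimpleFlowFile ff) {
        Map<String, String> lookup = new HashMap<>();
        // Intrinsic properties are published under Expression Language names,
        // which need not match the field names in the class.
        lookup.put("flowFileId", String.valueOf(ff.id));
        lookup.put("fileSize", String.valueOf(ff.size));
        lookup.put("entryDate", String.valueOf(ff.entryDate));
        lookup.put("lineageStartDate", String.valueOf(ff.lineageStartDate));
        // Assumption of this sketch: attributes are added last, so they win
        // if a user attribute happens to share a name with a property.
        lookup.putAll(ff.attributes);
        return lookup;
    }

    public static void main(String[] args) {
        SimpleFlowFile ff = new SimpleFlowFile();
        ff.size = 2048L;
        ff.attributes.put("filename", "example.txt");
        Map<String, String> lookup = buildLookup(ff);
        System.out.println(lookup.get("fileSize") + " " + lookup.get("filename"));
    }
}
```

The point of the sketch is only that both kinds of values end up as strings behind a single name lookup, which is why an expression can reference fileSize the same way it references filename.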

I followed both default attributes and properties a bit more.

Core Attributes

Attributes are mostly left to the user to define, use, and abuse however they choose. But there are a small number of "Core Attributes" that are defined by the framework's CoreAttributes enumeration. An even smaller subset of these attributes is initialized with default values in the StandardProcessSession.create() method.

Core Attribute        Description
uuid                  Read-only. Initialized to UUID.randomUUID()
filename              Set by default to System.nanoTime()
path                  Relative directory portion of path, excluding the leaf file name. Default is ./
absolute.path         Absolute directory portion of path, excluding the leaf file name. No default.
priority              Used by PriorityAttributePrioritizer. No default.
mime.type             Widely used by processors. No default.
discard.reason        No default.
alternate.identifier  No default.

uuid is probably the most special, in that it is also protected from being overwritten. All attributes are of type String, even if they store numbers. And all are displayed in the UI when present, just like any user-specified attributes.
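The defaults described above can be sketched in a few lines of Java. This is an illustration under assumptions, not the code in StandardProcessSession.create(): the class and method names are hypothetical, and silently ignoring a write to uuid is my assumption about how the protection behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class DefaultAttributesSketch {

    // Hypothetical helper: attributes a newly created FlowFile starts with,
    // following the defaults described for the core attributes.
    static Map<String, String> newFlowFileAttributes() {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("uuid", UUID.randomUUID().toString());
        // Stored as a String even though the value is numeric.
        attrs.put("filename", String.valueOf(System.nanoTime()));
        attrs.put("path", "./");
        return attrs;
    }

    // Hypothetical guard: uuid cannot be overwritten. Ignoring the write
    // (rather than throwing) is an assumption of this sketch.
    static void putAttribute(Map<String, String> attrs, String key, String value) {
        if ("uuid".equals(key)) {
            return;
        }
        attrs.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = newFlowFileAttributes();
        putAttribute(attrs, "mime.type", "text/plain");
        putAttribute(attrs, "uuid", "not-allowed");
        System.out.println(attrs);
    }
}
```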

[Image: NiFi FlowFile Attributes]

Properties

But wait! These attributes are in plain sight. Didn't I promise you "hidden" stuff? Yes, and that brings us to the FlowFile "properties" referenced in the Expression Language's ValueLookup class since NiFi 1.0.0. Properties are intrinsic data values of FlowFiles, and some of them are made available in Expression Language:

Name in Expression Language  Field Name (UI)                     Description
flowFileId                   (not shown)                         Internal identifier
fileSize                     File Size                           Size of content, in bytes
entryDate                    File Size                           Timestamp (milliseconds) when the FlowFile entered this NiFi
lineageStartDate             Used to calculate Lineage Duration  Timestamp, in milliseconds
lastQueueDate                Used to calculate Queue Duration    Timestamp when last placed in queue
queueDateIndex               Queue Position                      Offset in queue

In contrast to attributes, properties are less visible in that they are not echoed verbatim in the UI, or at least not under their Expression Language names.

[Image: NiFi FlowFile Properties]

Examples

There is no substitute for trying it out, so here are some examples of Expression Language that use these fields.

Using lineageStartDate to capture the length of time a FlowFile has been in NiFi:

${now():toNumber():minus(${lineageStartDate}):format("HH:mm:ss")}
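For comparison, here is the same calculation sketched in plain Java. One thing worth checking in the expression above: as far as I can tell, format treats its subject as a point in time rather than a duration, so the result may shift with the server's time zone, and an HH pattern wraps after 24 hours. The sketch below formats the elapsed milliseconds as a duration instead. The class and method names are hypothetical.

```java
import java.time.Duration;

class LineageDurationSketch {

    // Format elapsed time as HH:mm:ss without going through a calendar date,
    // so the result does not depend on any time zone and hours can exceed 23.
    static String formatElapsed(long nowMillis, long lineageStartMillis) {
        Duration elapsed = Duration.ofMillis(nowMillis - lineageStartMillis);
        long hours = elapsed.toHours();
        long minutes = elapsed.toMinutes() % 60;
        long seconds = elapsed.getSeconds() % 60;
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        // Pretend the FlowFile's lineage started a little over an hour ago.
        long lineageStart = System.currentTimeMillis() - 3_725_000L;
        System.out.println(formatElapsed(System.currentTimeMillis(), lineageStart));
    }
}
```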

Size of a FlowFile in KB, by integer division:

${fileSize:toNumber():divide(1024)}
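The effect of integer division is easy to see in a small Java sketch (the class and method names are hypothetical): anything short of a full 1024 bytes is dropped, so a 1536-byte FlowFile reports 1 KB and a 1023-byte FlowFile reports 0 KB.

```java
class FileSizeSketch {

    // Whole kilobytes, truncating any remainder (integer division).
    static long wholeKilobytes(long fileSizeBytes) {
        return fileSizeBytes / 1024;
    }

    // Fractional kilobytes, for comparison with the truncated value.
    static double fractionalKilobytes(long fileSizeBytes) {
        return fileSizeBytes / 1024.0;
    }

    public static void main(String[] args) {
        System.out.println(wholeKilobytes(1536));      // prints 1
        System.out.println(fractionalKilobytes(1536)); // prints 1.5
    }
}
```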

I've saved a NiFi template as a Gist that uses Expression Language to evaluate all of the fields and the examples above.