Cultivating Attributes
Each FlowFile is created with several Attributes, and these Attributes change over the lifetime of the FlowFile. The FlowFile concept is powerful and provides three main benefits. First, it allows users to make routing decisions within a flow so that FlowFiles that meet certain criteria can be handled differently than other FlowFiles. This is done using RouteOnAttribute and similar Processors.
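The routing idea can be illustrated with a minimal sketch in plain Python. This is not NiFi's API: the attributes are modeled as a simple dictionary, and the `route` function, the rule names `csv` and `large`, and the fallback name `unmatched` are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, List

# A FlowFile's attributes modeled as a plain string-to-string mapping
# (an assumption for this sketch, not a NiFi class).
Attributes = Dict[str, str]


def route(attributes: Attributes,
          rules: Dict[str, Callable[[Attributes], bool]]) -> List[str]:
    """Return the name of every rule the attributes satisfy, or ['unmatched']."""
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]


# Hypothetical rules: the names and conditions are illustrative only.
rules = {
    "csv": lambda a: a.get("filename", "").endswith(".csv"),
    "large": lambda a: int(a.get("fileSize", "0")) > 1_000_000,
}

small_csv = {"filename": "orders.csv", "fileSize": "2048"}
big_log = {"filename": "app.log", "fileSize": "5000000"}
other = {"filename": "notes.txt", "fileSize": "10"}

for attrs in (small_csv, big_log, other):
    print(attrs["filename"], "->", route(attrs, rules))
```

Each FlowFile is sent wherever its own attribute values say it belongs, which is the same decision a routing Processor makes inside a flow.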
Second, Attributes are used to configure a Processor so that its configuration depends on the data itself. For example, the PutFile Processor can use Attributes to determine where to store each FlowFile, and the directory and filename Attributes may differ for each FlowFile.
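As a rough illustration of data-dependent configuration, the sketch below derives a storage location from each FlowFile's own attributes. It is plain Python rather than NiFi code: the function `target_path`, the base directory `/data/out`, and the sample attribute values are assumptions made for this example, and the `path` attribute is used as the directory.

```python
from pathlib import PurePosixPath
from typing import Dict


def target_path(attributes: Dict[str, str], base_dir: str = "/data/out") -> PurePosixPath:
    """Build the destination path from the 'path' and 'filename' attributes."""
    directory = attributes["path"]      # relative directory carried by the FlowFile
    filename = attributes["filename"]   # file name carried by the FlowFile
    return PurePosixPath(base_dir) / directory / filename


invoice = {"path": "invoices/2024", "filename": "a.pdf"}
report = {"path": "reports", "filename": "q1.csv"}

print(target_path(invoice))
print(target_path(report))
```

The function itself never changes; only the attribute values do, so one configuration serves FlowFiles that end up in different places.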
Lastly, Attributes provide valuable context about the data. This is useful when reviewing the Provenance data for a FlowFile: the user can search for Provenance data that matches certain criteria and can view this context when examining the details of a Provenance Event. Users can then gain insight into why data was processed one way or another, simply by glancing at the context that is carried along with the content.
Each FlowFile has a minimum set of Attributes:
- filename: A file name that can be used to save data to a local or remote file system.
- path: The name of the directory that can be used to save data to a local or remote file system.
- uuid: A Universally Unique Identifier (UUID) that distinguishes a FlowFile from other FlowFiles in the system.
- entryDate: The date and time the FlowFile entered the system (that is, it was created). The value of this attribute is a number that represents the number of milliseconds since midnight, January 1, 1970 (UTC).
- lineageStartDate: Whenever a FlowFile is cloned, merged, or split, a "child" FlowFile is created. As those children are in turn cloned, merged, or split, a chain of ancestors is built. This value represents the date and time that the oldest ancestor entered the system. Another way to think about this is that the attribute can be used to measure the latency of the FlowFile through the system. The value is a number representing the number of milliseconds since midnight, January 1, 1970 (UTC).
- fileSize: This attribute represents the number of bytes taken up by the FlowFile Content.
Note that the uuid, entryDate, lineageStartDate, and fileSize attributes are system-generated and cannot be changed.
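Because entryDate and lineageStartDate are stored as milliseconds since midnight, January 1, 1970 (UTC), they can be converted to readable timestamps or subtracted to estimate how long data has been in the system. The following is a minimal sketch in plain Python; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert milliseconds since the epoch to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def lineage_age_millis(lineage_start_millis: int, now_millis: int) -> int:
    """Milliseconds elapsed since the oldest ancestor entered the system."""
    return now_millis - lineage_start_millis


# Sample attribute values (illustrative only).
lineage_start = 1_700_000_000_000
now = 1_700_000_004_500

print(epoch_millis_to_utc(lineage_start).isoformat())  # 2023-11-14T22:13:20+00:00
print(lineage_age_millis(lineage_start, now))          # 4500
```

Passing the current time as an argument, rather than reading the clock inside the function, keeps the calculation easy to check against known values.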