Connecting Processors
Connecting Processors
Each Processor has a defined set of "Relationships" that can be used to send data. When the Processor finishes handling the FlowFile, it transfers it to one of these Relationships. Allows users to configure how to run FlowFiles based on Processing results. For example, many Processors define two Relationships: success and failure. The user can then configure the data to be routed through a one-way stream if the Processor successfully processes the data and routes the data through the stream in a completely different way if the Processor cannot process the data for some reason. Or, depending on the use case, it might just redirect both relationships to the same route through the stream.
Now that we have added and configured the GetFile processor and applied the configuration, we can see in the upper left corner of the Processor an Alert icon () which indicates that the Processor is not in a valid state. Hovering over this icon, we see that a successful relationship has not yet been determined. This means that we have not told what to do with the data transferred by the Processor to a successful connection.
We can now send the output from the GetFile Processor to the PutFile Processor.
Hover over GetFile Processor with a mouse, and Connection Icon () will appear in the middle of the Processor. We can drag this icon from GetFile Processor to PutFile Processor. This gives us a dialogue to choose which relationship we want to include for this connection. Since GetFile has only one association, success, it is automatically selected. Clicking on the Settings tab provides several options to configure
How should this connection behave:
We can name it Connection if we want. Otherwise, the Connection name will be based on the selected relationship. We can also set an expiration for the data. By default, it is set to "0 seconds", which indicates that data should not be expired. However, we can change the value so that when the data in this Connection reaches a certain age, it will be automatically deleted (and the appropriate EXPIRE Provenance event will be created).
The backpressure threshold allows us to determine how full the queue is allowed before the source Processor is no longer scheduled to run. This will enable us to handle cases where one Processor can generate data faster than the next Processor capable of consuming that data. Suppose backpressure is configured for every connection along the way, the Processor that brought data into the system will eventually experience backpressure and stop bringing in new data so the system can recover.
Finally, we have priorities on the right side. This allows us to control how the data in this queue is sorted. We can drag Priority from the "Available prioritizers" list to the "Selected prioritizer" list to enable priority. If multiple priorities are enabled, they will be evaluated in such a way that the first listed Priority will be evaluated first and if two FlowFiles are defined equal according to that Priority, the second Priority will be used.
We can simply click Add to add Connections to our graph. We will now see that the Alert icon has changed to a Stop icon (). However, the PutFile listener is now invalid because the success and failure relationship is not yet connected anywhere. Let's address this by signaling that the data routed for success by PutFile should be "Automatically Terminated Relationship," meaning that NiFi should consider FlowFile processing complete and "discard" the data. To do this, we configure the PutFile Processor. On the Settings tab, on the right, we can check the boxes next to success and failure Relationship to Auto Terminate data. Clicking OK will close the dialogue and show that both Processors are now stopped.