Adding Processors
Adding Processors
Processors are the basic block of data flow creation. Each processor has its own functionality, which contributes to the creation of the final flowfile.
The data flow below illustrates the data flow for retrieving files from one directory using the GetFile processor and storing them in another directory using the PutFile processor.
Let's start creating the data flow by adding a Processor to the canvas. To do this, drag the Processor icon from the top left of the screen to the center of the canvas (like a graph paper background) and drop it there. This will give us a dialogue that will allow us to select the processor we want to add:
GetFile
GetFile is used to fetch files of a specific format from a particular directory. There are also other options to control better the process of pulling files out.
GetFile Settings
The following are the different settings of the GetFile processor:
Name
In the Name setting, the user can specify any name for the processor either according to the project or create a more meaningful name.
Enable
The user can enable or disable the processor using this setting.
Penalty Duration
In a flow file failure, this setting allows the user to add a penalty time duration.
Yield Duration
This setting is used to determine the result time for the processor. Within this duration, the process is no longer scheduled.
Bulletin Level
This setting is used to determine the log level of the processor.
Automatically Terminate Relationships
It has a list of all available relationships of a given process. By checking the box, the user can program the processor to stop the flow file on that event and not send it further down the stream.
GetFile Scheduling
These are the following scheduling options offered by the GetFile processor:
Schedule Strategy
You can schedule a process based on time by selecting time-driven or a specific CRON string by selecting the CRON driver option.
Concurrent Tasks
This option is used to specify the number of concurrently running tasks for this processor.
Execution
The user can specify whether to run the processor on all nodes or only on the primary node by using this option.
Run Schedule
It is used to determine a time-driven strategy or the CRON expression for a CRON-driven strategy.
etFile Properties
GetFile offers several properties, as shown in the image below, ranging from mandatory properties such as Input directory and file filters to optional properties such as Path Filter and Maximum File Size. A user can manage the file retrieval process using this property.
GetFile Comments
This section is used to specify any information about the processor.
PutFile
The PutFile processor saves files from the data stream to a specific location.
PutFile Configuration
The PutFile processor has the following settings :
Name
In the Name setting, the user can specify any name for the processor either according to the project or create a more meaningful name.
Enable
The user can enable or disable the processor using this setting.
Penalty Duration
This setting allows the user to add a penalty time duration in a flow file failure.
Yield Duration
This setting is used to determine the result time for the processor. Within this duration, the process is no longer scheduled.
Bulletin Level
This setting is used to determine the log level of the processor.
Automatically Terminate Relationships
It has a list of all available relationships of a given process. By checking the box, the user can program the processor to stop the flow file on that event and not send it further down the stream.
PutFile Scheduling
These are the following scheduling options offered by the GetFile processor:
Schedule Strategy
You can schedule a process based on time by selecting time-driven or a specific CRON string by selecting the CRON driver option.
Concurrent Tasks
This option is used to specify the number of concurrently running tasks for this processor.
Execution
The user can specify whether to run the processor on all nodes or only on the primary node by using this option.
Run Schedule
It is used to determine a time-driven strategy or the CRON expression for a CRON-driven strategy.
PutFile Properties
The PutFile processor provides properties such as Directory to specify the output directory for file transfer purposes and others to manage transfers as shown in the screenshot below.
PutFile Comments
This section is used to specify any information about the processor.