Available Processors
To create an effective dataflow, users must understand what types of Processors are available to them. NiFi ships with many different Processors. These Processors can ingest data from numerous systems; route, transform, process, split, and aggregate data; and distribute data to many systems. The number of Processors available increases with almost every NiFi release. Therefore, we will not try to name every available Processor, but we will highlight some of the most frequently used ones, categorized by function.
Data Transformation
- CompressContent: Compress or Decompress Content
- ConvertCharacterSet: Change the character set used to encode content from one character set to another
- EncryptContent: Encrypt or Decrypt Content
- ReplaceText: Use Regular Expressions to modify text
- TransformXml: Apply an XSLT transform to XML Content
- JoltTransformJSON: Use the JOLT specification to transform JSON Content
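To make the idea of content transformation concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API; the function names `compress_content`, `decompress_content`, and `replace_text` are hypothetical and only mirror what CompressContent and ReplaceText do conceptually.

```python
import gzip
import re


def compress_content(content: bytes) -> bytes:
    """Gzip-compress a payload (loosely analogous to CompressContent in compress mode)."""
    return gzip.compress(content)


def decompress_content(content: bytes) -> bytes:
    """Reverse of compress_content (analogous to the decompress mode)."""
    return gzip.decompress(content)


def replace_text(content: str, pattern: str, replacement: str) -> str:
    """Regex-based substitution, the basic idea behind ReplaceText."""
    return re.sub(pattern, replacement, content)


if __name__ == "__main__":
    original = b"user=alice;user=bob"
    # Round trip: compressing and then decompressing returns the same bytes.
    assert decompress_content(compress_content(original)) == original
    # Rewrite every "user=<name>" pair, keeping the captured name.
    print(replace_text(original.decode("utf-8"), r"user=(\w+)", r"account=\1"))
```

In NiFi itself these operations are configured through Processor properties rather than written as code; the sketch only shows the kind of transformation each Processor applies to FlowFile Content.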
Routing and Mediation
- ControlRate: Throttle the rate at which data can flow through one part of the flow
- DetectDuplicate: Monitor for duplicate FlowFiles, based on some user-defined criteria. Often used in conjunction with HashContent
- DistributeLoad: Load balance or sample data by distributing only part of the data to each user-defined Relationship
- MonitorActivity: Sends a notification when a user-defined period of time elapses without any data passing through a particular point in the flow. Optionally sends a notification when the dataflow resumes.
- RouteOnAttribute: Routes a FlowFile based on the attributes it contains.
- ScanAttribute: Scans the user-defined set of Attributes on a FlowFile, checking whether any of the Attributes match the terms found in a user-defined dictionary.
- RouteOnContent: Searches the Content of a FlowFile to see if it matches any user-defined Regular Expression. If so, the FlowFile is routed to the configured Relationship.
- ScanContent: Searches the Content of a FlowFile for terms that are present in a user-defined dictionary and routes based on the presence or absence of those terms. The dictionary can consist of either textual entries or binary entries.
- ValidateXml: Validates XML Content against an XML Schema; routes the FlowFile based on whether or not its Content is valid according to the user-defined XML Schema.
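Attribute-based routing, as described for RouteOnAttribute above, can be sketched in plain Python as follows. This is an illustration under assumptions, not NiFi's implementation: the `route_on_attribute` function, the rule names, and the attribute values are all hypothetical, and real routing rules are written in the NiFi Expression Language rather than as Python callables.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, str]
Rule = Callable[[Attributes], bool]


def route_on_attribute(attributes: Attributes, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all rules that match, or ["unmatched"] if none do.

    Each rule name plays the role of a Relationship; this simplified sketch
    returns every matching name instead of moving a real FlowFile.
    """
    matched = [name for name, rule in rules.items() if rule(attributes)]
    return matched or ["unmatched"]


if __name__ == "__main__":
    # Hypothetical rules keyed by the Relationship they would route to.
    rules: Dict[str, Rule] = {
        "large": lambda a: int(a.get("fileSize", "0")) > 1024,
        "csv": lambda a: a.get("filename", "").endswith(".csv"),
    }
    print(route_on_attribute({"filename": "a.csv", "fileSize": "10"}, rules))
    print(route_on_attribute({"filename": "a.txt", "fileSize": "10"}, rules))
```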
Database Access
- ConvertJSONToSQL: Convert a JSON document into a SQL INSERT or UPDATE command which can then be passed to the PutSQL Processor
- ExecuteSQL: Executes a user-defined SQL SELECT command, writing the result to a FlowFile in Avro format
- PutSQL: Updates a database by executing the SQL DML statement defined by the contents of the FlowFile
- SelectHiveQL: Executes the user-defined HiveQL SELECT command against the Apache Hive database, writing the result to a FlowFile in Avro or CSV format
- PutHiveQL: Updates a Hive database by executing the HiveQL DML statement defined by the contents of the FlowFile
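The ConvertJSONToSQL, PutSQL, and ExecuteSQL chain described above can be approximated with Python's standard library. This is only a sketch under assumptions: `json_to_insert` is a hypothetical helper, the `users` table is made up for illustration, and an in-memory SQLite database stands in for whatever database a real flow would target.

```python
import json
import sqlite3
from typing import Any, List, Tuple


def json_to_insert(table: str, document: str) -> Tuple[str, List[Any]]:
    """Turn a flat JSON object into a parameterized INSERT statement.

    Table and column names are interpolated directly into the SQL text, so
    this sketch is only safe with trusted names; the values themselves are
    passed separately as parameters.
    """
    record = json.loads(document)
    columns = list(record.keys())
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [record[column] for column in columns]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

    # Step 1 (ConvertJSONToSQL analogue): build the statement from JSON.
    sql, args = json_to_insert("users", '{"id": 1, "name": "alice"}')
    # Step 2 (PutSQL analogue): execute the DML statement.
    conn.execute(sql, args)
    # Step 3 (ExecuteSQL analogue): read the data back with a SELECT.
    print(conn.execute("SELECT id, name FROM users").fetchall())
    conn.close()
```

Keeping the values as parameters rather than embedding them in the SQL text avoids quoting problems and SQL injection.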
Attribute Extraction
- EvaluateJsonPath: The user supplies a JSONPath Expression (similar to XPath, which is used for XML parsing/extraction), and this Expression is then evaluated against the JSON Content to either replace the FlowFile Content or extract the value into a user-named Attribute.
- EvaluateXPath: The user supplies an XPath Expression, and this Expression is then evaluated against the XML Content to replace the FlowFile Content or extract the values into a user-named Attribute
- EvaluateXQuery: The user supplies an XQuery query, and this query is then evaluated against the XML Content to replace the FlowFile Content or extract the values into the user-named Attributes
- ExtractText: The user supplies one or more Regular Expressions, which are then evaluated against the textual content of the FlowFile, and the extracted values are then added as user-named Attributes.
- HashAttribute: Performs a hashing function against the concatenation of a user-defined list of existing Attributes.
- HashContent: Performs a hashing function on the contents of the FlowFile and adds the hash value as an Attribute.
- IdentifyMimeType: Evaluates the contents of a FlowFile to determine what type of file the FlowFile encapsulates. This processor is capable of detecting various MIME Types, such as images, word processing documents, text, and compressed formats.
- UpdateAttribute: Adds or updates any number of user-defined Attributes on a FlowFile. This is useful for adding statically configured values, as well as for computing Attribute values dynamically by using the Expression Language. The Processor also provides an "Advanced User Interface," which allows users to update Attributes conditionally, based on user-supplied rules.
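As a rough illustration of attribute extraction, the following plain-Python sketch mimics what HashContent and ExtractText do conceptually. It is not NiFi code: the helper names, the default attribute name `content.hash`, the choice of SHA-256, and the sample pattern are all assumptions made for the example.

```python
import hashlib
import re
from typing import Dict


def hash_content(content: bytes, attributes: Dict[str, str],
                 attribute_name: str = "content.hash") -> Dict[str, str]:
    """Return a copy of the attributes with a SHA-256 hash of the content added."""
    updated = dict(attributes)
    updated[attribute_name] = hashlib.sha256(content).hexdigest()
    return updated


def extract_text(content: str, patterns: Dict[str, str],
                 attributes: Dict[str, str]) -> Dict[str, str]:
    """Evaluate each regex against the content and store the first match.

    The dictionary key is the attribute name; the stored value is the first
    capture group if the pattern defines one, otherwise the whole match.
    """
    updated = dict(attributes)
    for name, pattern in patterns.items():
        match = re.search(pattern, content)
        if match is None:
            continue
        value = match.group(1) if match.re.groups else match.group(0)
        if value is not None:
            updated[name] = value
    return updated


if __name__ == "__main__":
    attrs = extract_text("order=42 status=ok", {"order.id": r"order=(\d+)"}, {})
    attrs = hash_content(b"order=42 status=ok", attrs)
    print(attrs["order.id"], len(attrs["content.hash"]))
```

Both helpers return a new dictionary instead of mutating their input, which keeps the example easy to reason about.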
System Interaction
- ExecuteProcess: Runs a user-defined Operating System command. The process's StdOut is redirected so that content written to StdOut becomes the content of the outbound FlowFile. This Processor is a Source Processor: its output is expected to generate a new FlowFile, and the system call is expected to receive no input. To provide input to the process, use the ExecuteStreamCommand Processor.
- ExecuteStreamCommand: Runs a user-defined Operating System command. The contents of the FlowFile are optionally streamed to the StdIn of the process. Content written to StdOut becomes the content of the outbound FlowFile. This Processor cannot be used as a Source Processor: it must be fed incoming FlowFiles in order to do its work. To perform the same type of functionality with a Source Processor, see the ExecuteProcess Processor.
For a complete list and detailed descriptions of the existing Processors, see the Help submenu of the Global Menu.
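The difference between ExecuteProcess and ExecuteStreamCommand comes down to whether the command receives input. A minimal sketch with Python's `subprocess` module, assuming hypothetical helper names and using the Python interpreter itself as the external command so the example is portable:

```python
import subprocess
import sys
from typing import List


def execute_process(command: List[str]) -> bytes:
    """Source-style execution: no input is supplied; StdOut becomes the new content."""
    completed = subprocess.run(
        command, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE, check=True
    )
    return completed.stdout


def execute_stream_command(command: List[str], flowfile_content: bytes) -> bytes:
    """Stream existing content to the command's StdIn; StdOut becomes the outgoing content."""
    completed = subprocess.run(
        command, input=flowfile_content, stdout=subprocess.PIPE, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    # A command that produces output without reading any input.
    generate = [sys.executable, "-c", "print('generated')"]
    # A command that upper-cases whatever arrives on StdIn.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]

    print(execute_process(generate).strip())
    print(execute_stream_command(upper, b"hello"))
```

With `check=True`, a non-zero exit code raises `subprocess.CalledProcessError`; how a real flow should handle a failing command is a separate design decision that this sketch does not cover.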
Data Ingestion
- GetFile
- GetFTP
- GetSFTP
- GetJMSQueue
- GetJMSTopic
- GetHTTP
- ListenHTTP
- ListenUDP
- GetHDFS
- ListHDFS / FetchHDFS
- FetchS3Object
- GetKafka
- GetMongo
- GetTwitter
Data Egress / Sending Data
- PutEmail
- PutFile
- PutFTP
- PutSFTP
- PutJMS
- PutSQL
- PutKafka
- PutMongo
Splitting and Aggregation
- SplitText
- SplitJson
- SplitXml
- UnpackContent
- MergeContent
- SegmentContent
- SplitContent
HTTP
- GetHTTP
- ListenHTTP
- InvokeHTTP
- PostHTTP
- HandleHttpRequest / HandleHttpResponse
Amazon Web Services
- FetchS3Object
- PutS3Object
- PutSNS
- GetSQS
- PutSQS
- DeleteSQS