Using NiFi
Available in Classic
NiFi is a dataflow engine: a stable system for moving and processing data between different systems. It is an open-source ETL tool that implements the flow-based programming (FBP) concept, collecting and processing large volumes of data from distributed environments after which the data can be loaded elsewhere.
NiFi is well suited to real-time processing, with the advantage of transferring data without loss.
NiFi components
NiFi is composed of FlowFiles, Processors, Connections, and the Flow Controller. The following describes each component.
- FlowFile: the unit of data that NiFi recognizes
  - Content: the data itself
  - Attribute: metadata about the data, expressed as key/value pairs
- Processor: a component that collects, transforms, and saves FlowFiles
  - A processor can create new FlowFiles after it finishes processing data
  - Processors can run multiple tasks in parallel
- Connection: links one processor to another and forwards FlowFiles
  - It acts as a queue of FlowFiles
  - Backpressure and prioritization can be set on it to control load
- Flow Controller: connects the processors and manages the FlowFiles exchanged between them
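The relationship between these components can be sketched in a few lines of Python. This is a hypothetical, simplified model for illustration only (real NiFi components are Java classes managed by the Flow Controller); the class and parameter names are not NiFi APIs.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class FlowFile:
    content: bytes                                   # Content: the data itself
    attributes: dict = field(default_factory=dict)   # Attribute: key/value metadata

class Connection:
    """A FIFO queue of FlowFiles with a backpressure threshold."""
    def __init__(self, max_queued=10000):
        self.queue = deque()
        self.max_queued = max_queued   # backpressure: upstream pauses when full

    def offer(self, flowfile):
        if len(self.queue) >= self.max_queued:
            return False               # queue full -> backpressure applied
        self.queue.append(flowfile)
        return True

# A FlowFile carrying a file's bytes plus filename metadata
ff = FlowFile(b"hello", {"filename": "test1.txt", "path": "/tmp/nifi-test"})
conn = Connection(max_queued=1)
first = conn.offer(ff)    # accepted
second = conn.offer(ff)   # rejected: backpressure kicks in
```

Setting a small `max_queued` shows how backpressure lets a slow downstream processor throttle a fast upstream one instead of growing the queue without bound.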
Use NiFi
The following describes how to create a data flow that transfers a local file to HDFS. The steps below show, in order, what is needed to transfer a file created in a local directory to HDFS through a NiFi data flow.
1. Create GetFile processor
2. Create local file
3. Create HDFS processor
4. Connect processor
5. Check for running and result
6. Troubleshooting
1. Create GetFile processor
The following describes how to create a GetFile processor in NiFi.
- Create a nifi-test directory in the /tmp directory from the local environment as shown below.
mkdir /tmp/nifi-test
chown nifi /tmp/nifi-test
- Drag the processor icon from the component toolbar in the NiFi web GUI onto the canvas.
- Create the GetFile processor in the NiFi web GUI and configure it.
- Right-click the GetFile processor and click the [Configure] button.
- Click the [Properties] tab.
- Enter the location of the nifi-test directory in the Input Directory property and click the [Apply] button.
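The behavior the GetFile processor is being configured for can be sketched as follows. This is not NiFi's implementation, just a hypothetical illustration of the semantics: each file in the input directory is picked up as a FlowFile and the source file is removed (GetFile deletes the source by default).

```python
import os
import tempfile

def get_file(input_directory):
    """Sketch of GetFile semantics: emit each file's content as a
    FlowFile-like dict, then remove the source file."""
    flowfiles = []
    for name in sorted(os.listdir(input_directory)):
        path = os.path.join(input_directory, name)
        with open(path, "rb") as f:
            flowfiles.append({"content": f.read(),
                              "attributes": {"filename": name}})
        os.remove(path)   # the local file disappears after pickup
    return flowfiles

# Usage: drop a file into a temporary "input directory" and run one pickup pass
input_dir = tempfile.mkdtemp()
with open(os.path.join(input_dir, "test1.txt"), "w") as f:
    f.write("abc")
picked_up = get_file(input_dir)
```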
2. Create local file
The following describes how to create a temporary file in a local environment. Create a test1.txt file using vi as shown below.
[irteamsu@dev-nch271-ncl nifi-test]$ vi test1.txt
3. Create HDFS processor
The following describes how to create a PutHDFS processor in NiFi.
- Create the HDFS directory where the data will be saved, as shown below.
sudo -u hdfs hdfs dfs -mkdir -p /user/nifi
- Create a PutHDFS processor on the canvas in the NiFi web GUI.
- Right-click the PutHDFS processor and click the [Configure] button.
- Click the [Properties] tab and enter the property as shown below.
- Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
- Directory : /user/nifi
- Click the [Relationships] tab and check Terminate for both the failure and success relationships.
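The relationship setting above determines where a FlowFile goes after PutHDFS runs. A hypothetical sketch of that routing (`run_put_hdfs` and `write_to_hdfs` are illustrative names, not NiFi APIs): the FlowFile is routed to "success" if the write succeeds and to "failure" otherwise, and checking Terminate on a relationship drops the FlowFile there instead of forwarding it to another connection.

```python
def run_put_hdfs(flowfile, write_to_hdfs):
    """Route the FlowFile to 'success' or 'failure' based on the write."""
    try:
        write_to_hdfs(flowfile)
        return "success"
    except OSError:
        return "failure"

ok = run_put_hdfs({"content": b"x"}, lambda ff: None)   # write succeeds

def failing_write(ff):
    raise OSError("datanode unreachable")               # simulated HDFS failure

bad = run_put_hdfs({"content": b"x"}, failing_write)
```

Since PutHDFS is the last processor in this flow, terminating both relationships is appropriate; in a larger flow, the failure relationship is often connected to a retry or alerting branch instead.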
4. Connect processor
The following describes how to connect the GetFile processor to the PutHDFS processor in NiFi.
- Hover the mouse over the GetFile processor in the NiFi web GUI and drag the connection icon onto the PutHDFS processor.
- The complete Data Flow is shown below.
5. Check for running and result
The following describes how to check the NiFi Data Flow running in a local environment and the result.
- Access the NiFi web GUI.
- Right-click the GetFile processor and click the [Start] button.
- Right-click the PutHDFS processor and click the [Start] button.
To check the /tmp/nifi-test directory, enter the following in the local environment.
[irteamsu@dev-nch271-ncl nifi-test]$ pwd
/tmp/nifi-test
[irteamsu@dev-nch271-ncl nifi-test]$ ls
- Because the file was transferred to HDFS via the NiFi data flow, it no longer exists in the /tmp/nifi-test directory.
To check the transferred file, enter the following in the local environment.
[irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -ls /user/nifi
Found 1 items
-rw-r--r--   2 root hdfs          4 2023-08-29 17:33 /user/nifi/test1.txt
[irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -cat /user/nifi/test1.txt
- The file from the local environment was transferred to HDFS via NiFi Data Flow.
Whenever a file is created in the /tmp/nifi-test directory of the local environment, it is automatically transferred to HDFS and the local copy is deleted.
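The end-to-end behavior just verified can be simulated in a few lines. This is a hypothetical sketch with a local directory standing in for HDFS: every file created in the source directory is moved out, so it ends up only at the destination.

```python
import os
import shutil
import tempfile

def run_flow(source_dir, hdfs_dir):
    """Move every file from the source directory to the 'HDFS' directory."""
    for name in os.listdir(source_dir):
        shutil.move(os.path.join(source_dir, name),
                    os.path.join(hdfs_dir, name))   # file leaves the source dir

src = tempfile.mkdtemp()   # stands in for /tmp/nifi-test
dst = tempfile.mkdtemp()   # stands in for /user/nifi on HDFS
with open(os.path.join(src, "test1.txt"), "w") as f:
    f.write("data")
run_flow(src, dst)
```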
6. Troubleshooting
When a problem occurs in the data flow, you can troubleshoot it by checking Data Provenance. The following describes how to check Data Provenance.
- In the NiFi web GUI, right-click the processor you want to inspect.
- Click the [View data provenance] button.
- You can see the Data Provenance information as below:
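A hypothetical sketch of the kind of record Data Provenance keeps (the field names below are illustrative, not NiFi's exact schema): each processor that handles a FlowFile emits an event, and filtering the events by FlowFile UUID reconstructs that file's path through the data flow.

```python
provenance = []

def record(event_type, component, flowfile_uuid):
    """Append one provenance-style event for a FlowFile."""
    provenance.append({"eventType": event_type,
                       "component": component,
                       "flowFileUuid": flowfile_uuid})

record("RECEIVE", "GetFile", "ff-1")   # file picked up from the input directory
record("SEND", "PutHDFS", "ff-1")      # file written out to HDFS

# Filtering by UUID reconstructs the file's trail through the flow
trail = [(e["component"], e["eventType"])
         for e in provenance if e["flowFileUuid"] == "ff-1"]
```

When troubleshooting, a trail that stops after the GetFile event, for example, points at the connection or the PutHDFS processor as the place to investigate.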