Using NiFi


Available in VPC

NiFi is a dataflow engine: a stable system for moving and processing data between different systems. It is an ETL tool that collects, processes, and loads data, and an open-source project that implements the flow-based programming (FBP) concept to collect and process large volumes of data from distributed environments.
NiFi is well suited to real-time processing and has the advantage of transferring data without loss.

NiFi components

NiFi is composed of the FlowFile, Processor, Connector, and Flow Controller. The following describes each component:
(Screenshot: chadoop-nifi-1-1_ko)

  • FlowFile is the data unit recognized by NiFi.
    • Content: the data itself
    • Attribute: information related to the data and expressed as a key and value pair
  • Processor is a component that collects, transforms, and stores FlowFiles.
    • A new FlowFile can be created after completing data processing.
    • Multiple processors can operate in parallel.
  • Connector connects processors to each other and transfers FlowFile.
    • Represents the queue of FlowFile.
    • Controls load by setting priorities and backpressure.
  • Flow Controller connects each process and manages the FlowFile that passes between them.

Using NiFi

The following describes how to create a Data Flow that transfers local files to HDFS, with a step-by-step explanation of how files created in a local directory are moved to HDFS through the NiFi Data Flow:

1. Create GetFile processor
2. Create local file
3. Create HDFS processor
4. Connect processor
5. Run and view result
6. Troubleshoot

1. Create GetFile processor

To create a GetFile processor in NiFi, follow these steps:

  1. Create a nifi-test directory in the master node's /tmp directory, as shown below.
mkdir /tmp/nifi-test
chown nifi /tmp/nifi-test
  2. Drag the Processor icon from the component toolbar in the NiFi Web GUI onto the canvas.
    (Screenshot: chadoop-nifi-1-2-3_ko)

  3. Create the GetFile processor in the NiFi Web GUI.
    (Screenshot: chadoop-nifi-1-2-4_ko)

  4. Right-click the GetFile processor and click the [Configure] button.

  5. Click the [PROPERTIES] tab.

  6. Enter the /tmp/nifi-test directory path in the Input Directory property, and click the [APPLY] button.
    (Screenshot: chadoop-nifi-1-2_ko)
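Before pointing GetFile at the input directory, it can help to confirm the directory exists and is writable. A minimal shell check (a scratch path is used here for illustration, so it can run anywhere; on the cluster the path would be /tmp/nifi-test owned by the nifi user):

```shell
# Sketch: verify an input directory is present and writable before
# configuring GetFile. A scratch path stands in for /tmp/nifi-test.
dir="$(mktemp -d)"

if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "input directory ready"
else
    echo "input directory missing or not writable" >&2
fi

rm -rf "$dir"
```

If the check fails on the cluster, re-run the mkdir and chown commands from step 1.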

2. Create local file

To create a temporary file in the local environment, create a test1.txt file using vi as shown below.

[irteamsu@dev-nch271-ncl nifi-test]$ vi test1.txt
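If an interactive editor is not convenient, the same file can be created non-interactively. A sketch, with a scratch directory standing in for /tmp/nifi-test and arbitrary example contents:

```shell
# Create test1.txt without an interactive editor. On the cluster the
# target directory would be /tmp/nifi-test, so GetFile picks the file up.
dir="$(mktemp -d)"
printf 'test\n' > "$dir/test1.txt"
cat "$dir/test1.txt"   # prints "test"
rm -rf "$dir"
```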

3. Create HDFS processor

To create an HDFS processor in NiFi, follow these steps:

  1. In the local environment, create the HDFS directory where the data will be saved, as shown below.
sudo -u hdfs hdfs dfs -mkdir -p /user/nifi
sudo -u hdfs hdfs dfs -chown nifi /user/nifi
  2. In the NiFi Web GUI, create a PutHDFS processor on the canvas.
  3. Right-click the PutHDFS processor and click the [Configure] button.
  4. Click the [PROPERTIES] tab and enter the properties as shown below.
  • Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  • Directory: /user/nifi
  5. Click the [RELATIONSHIPS] tab and select the Terminate checkbox for both the failure and success relationships.
    (Screenshot: chadoop-nifi-3)
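The Hadoop Configuration Resources property must point at config files that are readable on the node where NiFi runs. A quick check over the two paths used above (it only reports readability; it does not validate the XML):

```shell
# Sketch: confirm the Hadoop client configs referenced by PutHDFS
# are readable on this node. Prints one status line per file.
for f in /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml; do
    if [ -r "$f" ]; then
        echo "ok: $f"
    else
        echo "missing: $f"
    fi
done
```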

4. Connect processor

To connect the PutHDFS processor and the GetFile processor in NiFi, follow these steps:

  1. Hover over the GetFile processor in the NiFi Web GUI and drag the connection icon to the PutHDFS processor.
  2. The completed Data Flow is shown below:
    (Screenshot: chadoop-nifi-1-3_ko)

5. Run and view result

To check the NiFi Data Flow running in a local environment and the result, follow these steps:

  1. Access the NiFi Web GUI.

  2. Right-click the GetFile processor, and click the [Start] button.

  3. Right-click the PutHDFS processor and click the [Start] button.

  4. To check the /tmp/nifi-test directory, enter the following in the local environment:

    [irteamsu@dev-nch271-ncl nifi-test]$ pwd
     /tmp/nifi-test
    [irteamsu@dev-nch271-ncl nifi-test]$ ls
    
    • Because the file was transferred to HDFS through the NiFi Data Flow, it no longer exists in the /tmp/nifi-test directory.
  5. To check the files transferred to HDFS, enter the following commands in your local environment:

    [irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -ls /user/nifi
    Found 1 items
    -rw-r--r--   2 nifi hdfs          4 2023-08-29 17:33 /user/nifi/test1.txt
    [irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -cat /user/nifi/test1.txt   
    
    • The file from the local environment was transferred to HDFS through NiFi Data Flow.
Note

When a file is created in the /tmp/nifi-test directory of the local environment, it is automatically transferred to HDFS and the file in the local environment is deleted.
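The behavior described above (the file is consumed from the source directory and appears at the destination) is essentially a move. A local analogue, with scratch directories standing in for /tmp/nifi-test and the HDFS /user/nifi directory:

```shell
# Local analogue of the GetFile -> PutHDFS flow: the file disappears
# from the input directory and appears at the destination.
src="$(mktemp -d)"
dst="$(mktemp -d)"
printf 'test\n' > "$src/test1.txt"

mv "$src/test1.txt" "$dst/test1.txt"   # NiFi does this via FlowFiles

ls "$src"              # prints nothing: the source file was consumed
ls "$dst"              # prints "test1.txt"
rm -rf "$src" "$dst"
```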

6. Troubleshoot

When a problem occurs in Data Flow, you can troubleshoot by checking Data Provenance. To check Data Provenance, follow these steps:

  1. In the NiFi Web GUI, right-click the processor you want to inspect.
  2. Click [View data provenance].
  3. The Data Provenance information is displayed as shown below.
    (Screenshot: chadoop-nifi-1-4_ko)

(Screenshot: chadoop-nifi-1-5_ko)