Using NiFi

    Available in Classic

    NiFi is a dataflow engine: a stable system for moving and processing data between different systems. It is an open-source ETL tool that implements the flow-based programming (FBP) concept, collecting and processing large volumes of data from distributed environments.
    NiFi is well suited to real-time processing, and its key advantage is transferring data without loss.

    NiFi components

    NiFi consists of FlowFiles, Processors, Connectors, and a Flow Controller. Each component is described below.
    chadoop-nifi-1-1_ko

    • FlowFile: the unit of data that NiFi recognizes
      • Content: the data itself
      • Attribute: metadata about the data, expressed as key/value pairs
    • Processor: the component that collects, transforms, and stores FlowFiles
      • A processor can create new FlowFiles after processing data
      • Multiple processors can run in parallel
    • Connector: links one processor to another and forwards FlowFiles
      • Acts as a queue of FlowFiles
      • Backpressure and prioritization can be configured to manage load
    • Flow Controller: connects the processors and manages the FlowFiles exchanged between them
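    The FlowFile structure described above (content plus key/value attributes) can be sketched with a toy shell snippet. This is only an illustration of the data model; NiFi manages FlowFiles internally and does not expose them this way.

```shell
#!/usr/bin/env bash
# Toy illustration of a FlowFile: content plus key/value attributes.
# The attribute names below (filename, path) mirror common NiFi attributes,
# but this is a conceptual sketch only.
content="hello from nifi-test"          # Content: the data itself
declare -A attributes=(                 # Attributes: key/value metadata
  [filename]="test1.txt"
  [path]="/tmp/nifi-test"
)
echo "content: $content"
for key in "${!attributes[@]}"; do
  echo "attribute: $key=${attributes[$key]}"
done
```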

    Use NiFi

    The following describes how to create a data flow that transfers a local file to HDFS. The steps below show, in order, how a file created in a local directory is transferred to HDFS through the NiFi data flow.

    1. Create GetFile processor
    2. Create local file
    3. Create HDFS processor
    4. Connect processor
    5. Check for running and result
    6. Troubleshooting

    1. Create GetFile processor

    The following describes how to create a GetFile processor in NiFi.

    1. Create a nifi-test directory under /tmp in the local environment, as shown below.
    mkdir /tmp/nifi-test
    chown nifi /tmp/nifi-test
    
    2. Drag the processor icon from the NiFi web GUI component toolbar onto the canvas.
      chadoop-nifi-1-2-3_ko
    3. Create the GetFile processor in the NiFi web GUI.
      chadoop-nifi-1-2-4_ko
    4. Right-click the GetFile processor and click the [Configure] button.
    5. Click the [Properties] tab.
    6. Enter /tmp/nifi-test in the Input Directory property and click the [Apply] button.
      chadoop-nifi-1-2_ko
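    Before starting the flow, it can help to sanity-check the input directory created in step 1. The snippet below is a minimal check, assuming the /tmp/nifi-test path from above:

```shell
# Confirm the GetFile input directory exists and is writable
mkdir -p /tmp/nifi-test                 # no-op if it already exists
ls -ld /tmp/nifi-test                   # inspect ownership and permissions
touch /tmp/nifi-test/.probe \
  && rm /tmp/nifi-test/.probe \
  && echo "directory is writable"       # GetFile must be able to read and delete here
```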

    2. Create local file

    The following describes how to create a temporary file in a local environment. Create a test1.txt file using vi as shown below.

    [irteamsu@dev-nch271-ncl nifi-test]$ vi test1.txt
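    If you prefer a non-interactive command over vi, the same file can be created with echo (assuming the /tmp/nifi-test directory from the previous step; the file content here is arbitrary test data):

```shell
mkdir -p /tmp/nifi-test                    # ensure the watched directory exists
echo "test" > /tmp/nifi-test/test1.txt     # create the file GetFile will pick up
cat /tmp/nifi-test/test1.txt               # confirm the content was written
```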
    

    3. Create HDFS processor

    The following describes how to create an HDFS processor in NiFi.

    1. Create the HDFS directory where the data will be saved by running the following in the local environment.
    sudo -u hdfs hdfs dfs -mkdir -p /user/nifi
    
    2. Create a PutHDFS processor on the canvas in the NiFi web GUI.
    3. Right-click the PutHDFS processor and click the [Configure] button.
    4. Click the [Properties] tab and enter the properties as shown below.
    • Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
    • Directory : /user/nifi
    5. Click the [Relationships] tab and select the Terminate checkbox for both the failure and success relationships.
      chadoop-nifi-3
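    The Hadoop Configuration Resources property points at local Hadoop client configuration files. A quick check on the NiFi host (using the same paths as above) confirms they are readable:

```shell
# Verify the Hadoop client configs referenced by the PutHDFS processor
for f in /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml; do
  if [ -r "$f" ]; then
    echo "OK: $f"
  else
    echo "MISSING: $f"   # fix the property path or install the Hadoop client configs
  fi
done
```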

    4. Connect processor

    The following describes how to connect the GetFile processor to the PutHDFS processor in NiFi.

    1. In the NiFi web GUI, hover over the GetFile processor and drag the connection icon onto the PutHDFS processor.
    2. The completed data flow is shown below.
      chadoop-nifi-1-3_ko

    5. Check for running and result

    The following describes how to run the NiFi data flow and check the result in the local environment.

    1. Access NiFi web GUI.

    2. Right-click the GetFile processor, and click the [Start] button.

    3. Right-click the PutHDFS processor, and click the [Start] button.

    4. To check the /tmp/nifi-test directory, enter the following in the local environment.

      [irteamsu@dev-nch271-ncl nifi-test]$ pwd
       /tmp/nifi-test
      [irteamsu@dev-nch271-ncl nifi-test]$ ls
      
      • Because the file was transferred to HDFS through the NiFi data flow, it no longer exists in the /tmp/nifi-test directory.
    5. To check the transferred file, enter the following in the local environment.

      [irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -ls /user/nifi
      Found 1 items
      -rw-r--r--   2 root hdfs          4 2023-08-29 17:33 /user/nifi/test1.txt
      [irteamsu@dev-nch271-ncl nifi-test]$ sudo -u hdfs hdfs dfs -cat /user/nifi/test1.txt   
      
      • The file from the local environment was transferred to HDFS via NiFi Data Flow.
    Note

    When a file is created in the /tmp/nifi-test directory of the local environment, it is automatically transferred to HDFS and then deleted from the local environment.
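    The note above can be observed by dropping another file into the watched directory while the flow is running. The snippet below only creates the file locally; with GetFile and PutHDFS started, NiFi picks it up within seconds (if the flow is not running, the file simply stays put):

```shell
mkdir -p /tmp/nifi-test                       # ensure the watched directory exists
echo "second test" > /tmp/nifi-test/test2.txt # new file for the flow to pick up
ls /tmp/nifi-test                             # with the flow running, test2.txt disappears
                                              # here and appears under /user/nifi in HDFS
```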

    6. Troubleshooting

    When a problem occurs in Data Flow, you can troubleshoot by checking Data Provenance. The following describes how to check Data Provenance.

    1. In the NiFi web GUI, right-click the processor you want to inspect.
    2. Click the [View data provenance] button.
    3. The Data Provenance information is displayed as shown below:
      chadoop-nifi-1-4_ko

    chadoop-nifi-1-5_ko

