Data Flow concept

    Available in VPC

    Before learning about full scenarios that use Data Flow, this article explains the concepts of Data Flow with examples.

    Note

    To aid understanding of the Data Flow concepts, see the glossary.

    Job and data pipeline

    The structure of a data pipeline that can be configured with Data Flow is shown in the following diagram (a minimal code sketch of the same structure follows the note below):

    [Figure: Data Flow conceptual diagram]

    • A data pipeline is composed of a data source (source node), data collection or conversion (convert node), and data storage (target node).
    • A job (ETL job) extracts data from the data source, converts it, and then saves it to the target node.
    • A workflow connects multiple jobs sequentially, schedules jobs, and manages events.
    • A trigger defines the job execution schedule.
    • Multiple jobs and workflows can be created, and the Data Flow dashboard is used to monitor their running status.
    • NAVER Cloud Platform's Object Storage and Data Catalog can be used as source and target nodes.
    • The execution script and history of a job are saved in Object Storage.
    • The schema and detailed information of the data read from a Data Catalog table can be used.
    • Data uploaded to Object Storage buckets can be converted in bulk and saved.

    Note

    The January 2024 release has the following service limitations:

    • Object Storage, Cloud DB for MySQL, and Data Catalog are supported as data sources and targets. Integration with other NAVER Cloud Platform Cloud DB services and customers' on-premises databases is planned for a future release.
    • Workflows are designed to combine and configure many triggers and jobs. In the current release, however, a workflow can be configured with only 1 trigger and 1 job. Support for multiple triggers and jobs, as well as event node configuration, will be added in a future release.
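
    The following is a minimal, illustrative sketch of the pipeline structure described above, not the Data Flow API. The function names, the record format, and the trivial conversion are assumptions used only to show how an ETL job ties a source node, a convert node, and a target node together.

    ```python
    # Illustrative sketch only: these names are not part of the Data Flow service.
    # They model the structure above: a job extracts records from a source node,
    # runs them through a convert node, and saves the result to a target node.
    import csv
    import io
    import json
    from typing import Callable, Dict, List

    Record = Dict[str, object]


    def source_node(json_text: str) -> List[Record]:
        """Source node: extract records from a JSON array of objects."""
        return json.loads(json_text)


    def convert_node(records: List[Record],
                     transform: Callable[[Record], Record]) -> List[Record]:
        """Convert node: apply a conversion to every record."""
        return [transform(record) for record in records]


    def target_node(records: List[Record]) -> str:
        """Target node: save the converted records as CSV text."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()


    def run_job(json_text: str, transform: Callable[[Record], Record]) -> str:
        """Job (ETL): extract from the source, convert, then save to the target."""
        return target_node(convert_node(source_node(json_text), transform))


    if __name__ == "__main__":
        raw = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
        # A trivial conversion: rename the "name" column to "label".
        print(run_job(raw, lambda r: {"id": r["id"], "label": r["name"]}))
    ```

    In the managed service, the workflow and its trigger decide when such a job runs, so scheduling is left out of the sketch.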

    Application examples

    Data Flow use scenarios can be configured in various ways. The following example scenarios help you understand Data Flow.

    Column merge use scenario

    Open 2 JSON files, merge their columns, and save the result as a CSV file. A local sketch of this conversion follows the steps below.

    1. In Object Storage, create 1 bucket to store the target data and 2 buckets to store the JSON files
    2. Upload one JSON file to each of the 2 buckets
    3. In Data Flow, create a job that merges the columns of the 2 data sets
      1. Create 2 source nodes and specify the 2 JSON files
      2. Set the column merge conversion
      3. In the target node, set the bucket and set the data type to CSV
    4. Create and execute a workflow by setting the job and trigger
    5. When the workflow is executed according to the trigger, check the CSV file saved in the Object Storage bucket
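
    The sketch below is a local equivalent of the column merge conversion, not the Data Flow implementation. It assumes that both JSON files contain an array of objects sharing an "id" key and that the file names stand in for the 2 source buckets and the target bucket.

    ```python
    # Local equivalent of the column merge job, for illustration only.
    # Assumption: both JSON files contain an array of objects sharing an "id" key;
    # the file names below stand in for the 2 source buckets and the target bucket.
    import csv
    import json
    from typing import Dict, List


    def load_records(path: str) -> Dict[object, dict]:
        """Read a JSON array of objects and index the records by their "id"."""
        with open(path, encoding="utf-8") as f:
            return {record["id"]: record for record in json.load(f)}


    def merge_columns(left: Dict[object, dict], right: Dict[object, dict]) -> List[dict]:
        """Merge the columns of records that share the same "id"."""
        merged = []
        for key, record in left.items():
            row = dict(record)
            row.update(right.get(key, {}))  # append the second file's columns
            merged.append(row)
        return merged


    def save_csv(rows: List[dict], path: str) -> None:
        """Save the merged records as a CSV file (the target node's data type)."""
        fieldnames = sorted({column for row in rows for column in row})
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


    if __name__ == "__main__":
        merged = merge_columns(load_records("source_a.json"), load_records("source_b.json"))
        save_csv(merged, "merged.csv")
    ```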

    Property definition use scenario

    Open a data table and define the schema of the target node. A local sketch of this conversion follows the steps below.

    1. Create a data table in Data Catalog
    2. In Object Storage, create 1 bucket to store the target data
    3. In Data Flow, create a job that applies the property definition conversion to the table
      1. Set the source node to the table and select the schema version
      2. Set the property definition conversion and the schema mapping for each target node
      3. In the target node, set the data type and the bucket
    4. Create and execute a workflow by setting the job and trigger
    5. When the workflow is executed according to the trigger, check the schema file saved in the Object Storage bucket
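
    The sketch below illustrates the idea of the property definition conversion locally; it is not the Data Flow implementation. The schema format, column names, and mapping are assumptions; in the scenario above, the schema is read from the Data Catalog table and the result is saved to the Object Storage target bucket.

    ```python
    # Local sketch of the property definition conversion, for illustration only.
    # The schema format and the column mapping below are assumptions; in the
    # scenario above the schema comes from the Data Catalog table and the result
    # is saved to the Object Storage target bucket.
    import json

    # Example source schema: (column name, data type) pairs read from the table.
    SOURCE_SCHEMA = [
        {"name": "user_id", "type": "bigint"},
        {"name": "signup_ts", "type": "string"},
        {"name": "country", "type": "string"},
    ]

    # Property definitions: how each source column maps onto the target schema.
    SCHEMA_MAPPING = {
        "user_id": {"name": "id", "type": "bigint"},
        "signup_ts": {"name": "signed_up_at", "type": "timestamp"},
        "country": {"name": "country_code", "type": "string"},
    }


    def apply_mapping(source_schema: list, mapping: dict) -> list:
        """Build the target schema by applying the property definitions."""
        return [mapping[column["name"]]
                for column in source_schema if column["name"] in mapping]


    if __name__ == "__main__":
        target_schema = apply_mapping(SOURCE_SCHEMA, SCHEMA_MAPPING)
        # Stand-in for saving the schema file to the target bucket.
        with open("target_schema.json", "w", encoding="utf-8") as f:
            json.dump(target_schema, f, indent=2)
    ```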
