Data Flow concept

    Available in VPC

    Before learning about full scenarios that use Data Flow, this article explains the concepts of Data Flow with examples.

    Note

    To aid understanding of the Data Flow concepts, see the glossary.

    Job and data pipeline

    The structure of a data pipeline that can be configured with Data Flow is shown in the following diagram (a minimal code sketch of the same structure follows the note below):

    [Figure: Data Flow conceptual diagram]

    • A data pipeline is composed of a data source (source node), data collection or conversion (convert node), and data storage (target node).
    • A job (ETL job) extracts data from the data source, converts it, and then saves it to the target node.
    • A workflow connects multiple jobs sequentially, schedules jobs, and manages events.
    • A trigger defines the job execution schedule.
    • Multiple jobs and workflows can be created, and the Data Flow dashboard is used to monitor their running status.
    • NAVER Cloud Platform's Object Storage and Data Catalog can be used as source and target nodes.
    • The execution script and history of a job are saved in Object Storage.
    • The schema and detailed information of the data read from a Data Catalog table can be used.
    • Data uploaded to Object Storage buckets can be converted in bulk and saved.

    Note

    The January 2024 release has the following service limitations:

    • Object Storage, Cloud DB for MySQL, and Data Catalog are supported as data sources and targets. Integration with other NAVER Cloud Platform Cloud DB services and customers' on-premises databases is planned for a future release.
    • Workflows are designed to combine and configure many triggers and jobs. In the current release, however, a workflow can be configured with only 1 trigger and 1 job. Support for multiple triggers and jobs, as well as event node configuration, will be added in a future release.
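
    The following is a minimal, illustrative sketch of the pipeline structure described above, not the Data Flow API. The function names, the record format, and the trivial conversion are assumptions used only to show how an ETL job ties a source node, a convert node, and a target node together.

    ```python
    # Illustrative sketch only: these names are not part of the Data Flow service.
    # They model the structure above: a job extracts records from a source node,
    # runs them through a convert node, and saves the result to a target node.
    import csv
    import io
    import json
    from typing import Callable, Dict, List

    Record = Dict[str, object]


    def source_node(json_text: str) -> List[Record]:
        """Source node: extract records from a JSON array of objects."""
        return json.loads(json_text)


    def convert_node(records: List[Record],
                     transform: Callable[[Record], Record]) -> List[Record]:
        """Convert node: apply a conversion to every record."""
        return [transform(record) for record in records]


    def target_node(records: List[Record]) -> str:
        """Target node: save the converted records as CSV text."""
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()


    def run_job(json_text: str, transform: Callable[[Record], Record]) -> str:
        """Job (ETL): extract from the source, convert, then save to the target."""
        return target_node(convert_node(source_node(json_text), transform))


    if __name__ == "__main__":
        raw = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
        # A trivial conversion: rename the "name" column to "label".
        print(run_job(raw, lambda r: {"id": r["id"], "label": r["name"]}))
    ```

    In the managed service, the workflow and its trigger decide when such a job runs, so scheduling is left out of the sketch.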

    Application examples

    Data Flow use scenarios can be configured in various ways. The following example scenarios help you understand Data Flow.

    Column merge use scenario

    Open 2 JSON files, merge their columns, and save the result as a CSV file. A local sketch of this conversion follows the steps below.

    1. In Object Storage, create 1 bucket to store the target data and 2 buckets to store the JSON files
    2. Upload one JSON file to each of the 2 buckets
    3. In Data Flow, create a job that merges the columns of the 2 data sets
      1. Create 2 source nodes and specify the 2 JSON files
      2. Set the column merge conversion
      3. In the target node, set the bucket and set the data type to CSV
    4. Create and execute a workflow by setting the job and trigger
    5. When the workflow is executed according to the trigger, check the CSV file saved in the Object Storage bucket
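
    The sketch below is a local equivalent of the column merge conversion, not the Data Flow implementation. It assumes that both JSON files contain an array of objects sharing an "id" key and that the file names stand in for the 2 source buckets and the target bucket.

    ```python
    # Local equivalent of the column merge job, for illustration only.
    # Assumption: both JSON files contain an array of objects sharing an "id" key;
    # the file names below stand in for the 2 source buckets and the target bucket.
    import csv
    import json
    from typing import Dict, List


    def load_records(path: str) -> Dict[object, dict]:
        """Read a JSON array of objects and index the records by their "id"."""
        with open(path, encoding="utf-8") as f:
            return {record["id"]: record for record in json.load(f)}


    def merge_columns(left: Dict[object, dict], right: Dict[object, dict]) -> List[dict]:
        """Merge the columns of records that share the same "id"."""
        merged = []
        for key, record in left.items():
            row = dict(record)
            row.update(right.get(key, {}))  # append the second file's columns
            merged.append(row)
        return merged


    def save_csv(rows: List[dict], path: str) -> None:
        """Save the merged records as a CSV file (the target node's data type)."""
        fieldnames = sorted({column for row in rows for column in row})
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


    if __name__ == "__main__":
        merged = merge_columns(load_records("source_a.json"), load_records("source_b.json"))
        save_csv(merged, "merged.csv")
    ```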

    Property definition use scenario

    Open a data table and define the schema of the target node. A local sketch of this conversion follows the steps below.

    1. Create a data table in Data Catalog
    2. In Object Storage, create 1 bucket to store the target data
    3. In Data Flow, create a job that applies the property definition conversion to the table
      1. Set the source node to the table and select the schema version
      2. Set the property definition conversion and the schema mapping for each target node
      3. In the target node, set the data type and the bucket
    4. Create and execute a workflow by setting the job and trigger
    5. When the workflow is executed according to the trigger, check the schema file saved in the Object Storage bucket
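
    The sketch below illustrates the idea of the property definition conversion locally; it is not the Data Flow implementation. The schema format, column names, and mapping are assumptions; in the scenario above, the schema is read from the Data Catalog table and the result is saved to the Object Storage target bucket.

    ```python
    # Local sketch of the property definition conversion, for illustration only.
    # The schema format and the column mapping below are assumptions; in the
    # scenario above the schema comes from the Data Catalog table and the result
    # is saved to the Object Storage target bucket.
    import json

    # Example source schema: (column name, data type) pairs read from the table.
    SOURCE_SCHEMA = [
        {"name": "user_id", "type": "bigint"},
        {"name": "signup_ts", "type": "string"},
        {"name": "country", "type": "string"},
    ]

    # Property definitions: how each source column maps onto the target schema.
    SCHEMA_MAPPING = {
        "user_id": {"name": "id", "type": "bigint"},
        "signup_ts": {"name": "signed_up_at", "type": "timestamp"},
        "country": {"name": "country_code", "type": "string"},
    }


    def apply_mapping(source_schema: list, mapping: dict) -> list:
        """Build the target schema by applying the property definitions."""
        return [mapping[column["name"]]
                for column in source_schema if column["name"] in mapping]


    if __name__ == "__main__":
        target_schema = apply_mapping(SOURCE_SCHEMA, SCHEMA_MAPPING)
        # Stand-in for saving the schema file to the target bucket.
        with open("target_schema.json", "w", encoding="utf-8") as f:
            json.dump(target_schema, f, indent=2)
    ```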
