    Job

    Available in VPC

    Describes the Job menu screen layout, the job editor screen layout, how to create a job, and how to set job execution options.

    A job is a data processing task that extracts, transforms, and loads large volumes of data.
    The data conversions supported by Data Flow are attribute definition, select attribute, merge column, filter, merge row, count, edit attribute name, delete replica, and fill in empty values.
    The source node and target node can specify NAVER Cloud Platform's Object Storage, Data Catalog, and Cloud DB for MySQL. Integration with other NAVER Cloud Platform Cloud DB services and customers' on-premises databases is planned for the future.
    The job editor is a GUI that lets you configure an ETL job without writing code. It arranges source nodes, convert nodes, and target nodes in a diagram.
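
    For intuition, a job built in the editor is a source, convert, and target pipeline. The following pandas sketch is purely illustrative: the file names and columns are hypothetical, and it does not depict Data Flow's actual execution engine.

        # Illustrative ETL pipeline: extract -> transform -> load.
        # All names (files, columns) are hypothetical examples.
        import pandas as pd

        # Source node: read the original data (e.g., a CSV object)
        source = pd.read_csv("sales_2024.csv")

        # Convert nodes: e.g., a filter followed by a count (aggregation)
        filtered = source[source["amount"] > 0]
        result = filtered.groupby("region", as_index=False)["amount"].sum()

        # Target node: write the converted data in the chosen format
        result.to_parquet("sales_by_region.parquet")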

    Job screen

    The job page is laid out as follows:
    [Screenshot: Job screen layout]

    ① Menu name: name of the menu currently being viewed
    ② Basic features: features displayed when first entering the Job menu
    • [Create job] button: click to create a job
    • [Learn more about the product] button: click to go to the Data Flow page
    • [Refresh] button: click to refresh the page
    ③ Post-creation features: features provided after a job is created
    • [Execute] button: run the selected job on demand. You can set the execution options before the job runs.
    • [Delete] button: delete the selected jobs
    ④ Job list: list of created jobs. Click the [Details] button of a job to go to the job editor screen.
    ⑤ Search bar: search created jobs by job name

    Check job information

    To check the created job information, follow these steps:

    1. Click the environment you are using in the Region menu and the Platform menu on the NAVER Cloud Platform console.
    2. Click Services > Big Data & Analytics > Data Flow in order.
    3. Click the Jobs menu.
    4. When the job list appears, check the summarized information.
      • Job name: unique job name entered by the user when creating the job
      • Latest executed date and time: the date and time the job last ran, whether on demand or on a schedule via a trigger
      • Status: the job's execution status
        • Complete: the job run finished
        • In progress: the job run is in progress
        • Temporary save: job editing is incomplete. Click the [Temporary save] button on the editor screen to save a draft.
      • Update date and time: the date and time the job was last updated, i.e., when the job's components were last edited in the job editor
      • [Details] button: view the details of the job
    5. Click the [Details] button to view the detailed information of the job configuration.

    Create job

    You can configure a job by adding and setting up source nodes, convert nodes, and target nodes.

    Note

    To specify a source node or target node, Data Catalog and Object Storage must be in use. If they are not, request a subscription to those services first.

    To create new jobs, follow these steps:

    1. Click the environment you are using in the Region menu and the Platform menu on the NAVER Cloud Platform console.
    2. Click Services > Big Data & Analytics > Data Flow in order.
    3. Click the Jobs menu.
    4. Click the [Create job] button.
    5. When the job editor screen is displayed, add the source node, convert node, and target node in the [Job configuration] tab to configure the job.
    6. Click the [Source] button on the job editor screen and select Object Storage, Data Catalog, or Cloud DB for MySQL from the menu that appears.
      • Object Storage: specify a NAVER Cloud Platform Object Storage bucket as the data source
      • Data Catalog: specify NAVER Cloud Platform's Data Catalog as the data source
      • Cloud DB for MySQL: specify NAVER Cloud Platform's Cloud DB for MySQL as the data source
    7. Select the source node added in step 6, then enter the Attribute information and Detailed settings of the source node on the right.
    8. Click the [Convert] button on the job editor screen and select the convert job from the menu that appears.
      • Attribute definition: defines the schema of the target data using the source data. For more information on the settings, see Attribute definition.
      • Select attribute: selects the property keys of the source data set that make up the target data. For more information on the settings, see Select attribute.
      • Merge column: merges the columns of 2 data sets. For more information on the settings, see Merge column.
      • Filter: filters the input data set and creates a new data set. For more information on the settings, see Filter.
      • Merge row: merges the rows of 2 or more data sets with identical schemas. For more information on the settings, see Merge row.
      • Count: calculates the average, total, maximum, or minimum of the selected field over grouped rows and creates a new field with the result value. For more information on the settings, see Count.
      • Edit attribute name: edits the name of a specific property key in the data. For more information on the settings, see Edit attribute name.
      • Delete replica: deletes duplicate rows from the source data. For more information on the settings, see Delete replica.
      • Fill in empty values: fills in omitted column values in the data with a set value. For more information on the settings, see Fill in empty values.
    9. Select the convert node added in step 8; on the right side of the screen, fill in Attribute information and Detailed settings.
    10. On the job editor screen, click [Target]. In the menu that appears, select Object Storage, Data Catalog, or Cloud DB for MySQL.
      • Object Storage: specify a NAVER Cloud Platform Object Storage bucket as the data store
      • Data Catalog: specify NAVER Cloud Platform's Data Catalog as the data store
      • Cloud DB for MySQL: specify NAVER Cloud Platform's Cloud DB for MySQL as the data store
    11. Select the target node added in step 10; on the right side of the screen, fill in Attribute information.
    12. Click the [Complete] button on the job editor screen.
      • When job creation is complete, the screen returns to the job list.
      • The created job will be added to the job list.
      • The created job will be registered as a NAVER Cloud Platform resource. For more information, see Resource Manager concept.
    Note
    • Select a job from the job list and click the [Execute] button, or click [Details] > [Execute], to run the job on demand.
    • To run jobs on a schedule, create a workflow and connect it to a trigger. For more information on workflow creation, see Create workflow.
    • A bucket is automatically created in Object Storage when a job is created. The job's execution log files and script files are saved in that bucket.

    Job editor screen configuration

    The job editor screen appears when you click the [Create job] button or a job's [Details] button in the job list. It is laid out as follows:
    [Screenshot: job editor screen layout]

    ① Basic information: enter the job name
    ② Function tab: select the feature to use
    ③ Show node field: add source, convert, and target nodes. Each node is drawn as a box, and boxes joined by a connecting line depict a parent node and its sub node.
    ④ Settings field: set the attributes of each node, with detailed settings where required. For more information on node settings, see Source node configuration, Convert node configuration, and Target node configuration.
    ⑤ Toggle button: toggles between the [Temporary save] button and the [Execute] button depending on the edit status
    • [Temporary save] button: temporarily saves the job being edited
    • [Execute] button: runs the completed job on demand

    In the job editor's [Job configuration] tab, add the job components (source, convert, or target) in the show node field (③), then enter each component's attributes and detailed settings in the settings field (④).
    The [Complete] button becomes active once at least 1 source node, convert node, and target node have each been added. The number of source nodes to add depends on the type of convert node.

    Source node configuration

    The source node configuration specifies the origin of the data to be converted.
    On the job editor, add a [Source] node. Then, on the right side of the screen, fill in Attribute information and Detailed settings.

    Note

    Selectable source nodes are Object Storage, Data Catalog, and Cloud DB for MySQL. (As of January 2024)
    Integration between NAVER Cloud Platform's Cloud DB and client's on-premise database is to be supported in the future.

    Source node attribute information

    The Attribute information fields differ by source node type.

    • When the source node is Object Storage
      • Name: enter the name of the source node.
      • Data store: Object Storage is selected. Changing the data store changes the entry fields.
      • Bucket: select the Object Storage bucket that contains the original data to work on.
      • Prefix: specify a path within the Object Storage bucket. Data under the specified path is extracted; if no prefix is entered, data from all sub-paths of the bucket is extracted.
      • Data type: select the format of the original data: JSON, CSV, or Parquet.
    • When the source node is Data Catalog
      • Name: enter the name of the source node.
      • Data store: Data Catalog is selected.
      • Database: select a database. A database is a set of tables defined by metadata.
      • Select table: select a table. A table provides the metadata that defines the schema of the data.
      • Schema version: select the schema version.
    • When the source node is Cloud DB for MySQL
      • Name: enter the name of the source node.
      • Data store: Cloud DB for MySQL is selected by default. Changing the data store changes the entry fields.
      • Connection: select a connection registered in Data Catalog.
      • Table: enter the name of the DB table.

    Source node detailed settings

    The Detailed settings fields differ by source node type.

    • When the source node is Object Storage, configure the schema table for the source data.
      • Click the [Add] button to add a field, and specify the data type and field name.
      • For more information on data types, see Schema data type.
    • When the source node is Data Catalog, the schema table read from Data Catalog is shown.
      • Fields cannot be added to or edited in the schema table; you can only delete specific property keys.
    • When the source node is Cloud DB for MySQL, configure the schema table for the source data.
      • Click the [Add] button to add a field, and specify the data type and field name.
      • For more information on data types, see Schema data type.

    Convert node configuration

    On the job editor, add a [Convert] node. Then, on the right side of the screen, fill in Attribute information and Detailed settings to define the data conversion job.
    The conversion settings differ by conversion type. The following sections describe the settings for each type.

    Attribute definition

    Defines the schema of the target data based on the source data.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • [Detailed settings] tab: map the schemas of the source node and target node, as in the sketch below.
      • The property keys of the source node shown in the Parent node field are mapped to the property keys shown in the Sub node field.
      • The Sub node field is enabled only when a target node has been added. If no target node is added, no selectable values appear.
      • The Data type can be edited, so the source node's data type can be changed in the target node.
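
    Functionally, attribute definition maps source property keys onto the target schema and optionally casts their data types. A minimal pandas sketch, using hypothetical key names:

        import pandas as pd

        src = pd.DataFrame({"id": ["1", "2"], "price": ["9.9", "3.5"]})

        # Map each source key to a target key and cast its data type
        target = pd.DataFrame()
        target["product_id"] = src["id"].astype("int64")    # rename + cast to int
        target["price"] = src["price"].astype("float64")    # cast string -> double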

    Select attribute

    Selects the property keys of the source data that make up the target data. Property keys that are not selected are excluded from the target data.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • [Detailed settings] tab: from the parent node's property keys, select at least 1 property key to send to the sub node.
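
    The effect is a simple column projection: only the selected keys reach the sub node. A pandas sketch with hypothetical keys:

        import pandas as pd

        src = pd.DataFrame({"name": ["a", "b"], "age": [1, 2], "tmp": [0, 0]})

        # Keep only the selected property keys; "tmp" is excluded from the target
        target = src[["name", "age"]]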

    Merge column

    Merges the columns of 2 data sets. You can select up to 2 parent nodes.
    The schema of the data changes after merging.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 2 nodes to connect to the convert node. The 2 source nodes must be created in advance.
    • [Detailed settings] tab: set the column merge rules. The four types correspond to the standard relational joins, as in the sketch below.
      • Type: select 1 column merging type: inner join, left join, right join, or outer join.
        • Inner join: merges the columns of the 2 data sets for rows that satisfy the merging condition. Rows that do not satisfy the condition are not merged. If no condition is added, columns are merged for all rows of the 2 data sets.
        • Left join: merges columns based on the rows of the left data set. All rows of the left data set are kept, and rows of the right data set that satisfy the merging condition are joined to them.
        • Right join: merges columns based on the rows of the right data set. All rows of the right data set are kept, and rows of the left data set that satisfy the merging condition are joined to them.
        • Outer join: merges columns while keeping all rows of both data sets
      • Condition: select the property keys to compare between the data sets. A condition is optional.
        • Click the [Add] button to add a Left node field, comparison operator, and Right node field
        • In the Left node field, select a property key of the left data set
        • In the Right node field, select a property key of the right data set
        • Rows whose left node field value and right node field value match have their columns merged
      • Prefix: because the left and right node fields cannot have duplicate names, a prefix is automatically added to right node field names. You can change the prefix.
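
    A pandas sketch of the join types, using hypothetical data sets (pandas expresses the name-collision prefix as a suffix, which plays the same role):

        import pandas as pd

        left = pd.DataFrame({"key": [1, 2], "val": ["a", "b"]})
        right = pd.DataFrame({"key": [2, 3], "val": ["x", "y"]})

        # Condition: left.key == right.key; type: inner join
        merged = left.merge(right, on="key", how="inner", suffixes=("", "_r"))

        # how="left" keeps all left rows, how="right" all right rows,
        # and how="outer" keeps all rows of both data sets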

    Filter

    Filters the source data to create the target data. Rows that do not satisfy the filter conditions are excluded from the target data.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • [Detailed settings] tab: set the filtering conditions, which behave like the sketch below.
      • Filter type: select AND or OR. If there are multiple conditions, they are combined with the selected operator.
      • Condition: set the filtering condition.
        • Click [Add] to add a Field, Condition, and Value entry
        • Example: value == 0.7: if the value field is numeric and equals 0.7, the row is included in the target data
        • Example: value > Car: if the value field is character type and compares greater than "Car" by ASCII code (starting from the first letter, "C"), the row is included in the target data
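
    In pandas terms, each condition is a boolean row mask, and the filter type combines the masks. A sketch using the examples above with hypothetical data:

        import pandas as pd

        df = pd.DataFrame({"value": [0.7, 0.7, 1.2],
                           "name": ["Dog", "Apple", "Egg"]})

        # AND combination: value == 0.7 AND name > "Car"
        mask = (df["value"] == 0.7) & (df["name"] > "Car")
        # An OR filter type would use | instead of &
        target = df[mask]  # keeps only the "Dog" row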

    Merge rows

    Merges 2 source data sets with identical schemas. Check that the schema components of the 2 source data sets are the same before merging rows.
    Because the schemas are identical, the merged data has the same columns as before the merge, and rows are added.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 2 nodes to connect to the convert node. Selecting Data lets you choose source nodes; selecting Process lets you choose convert nodes.
    • Detailed settings > Type: set the rule for row merging, as in the sketch below.
      • Merge all: combines all rows without excluding duplicate rows
      • Merging after excluding replicas: combines all rows after excluding duplicate rows. Uppercase and lowercase are distinguished when determining whether rows are duplicates.
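
    Row merging is a union of same-schema data sets; the two options differ only in duplicate handling. A pandas sketch with hypothetical data:

        import pandas as pd

        a = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
        b = pd.DataFrame({"id": [2, 3], "name": ["y", "z"]})  # same schema as a

        merge_all = pd.concat([a, b], ignore_index=True)        # keeps duplicates
        deduped = merge_all.drop_duplicates(ignore_index=True)  # excludes duplicates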

    Count

    Calculates the average, total, maximum, or minimum of the selected field over grouped rows of the source data and creates a new field with the result value.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • [Detailed settings] tab: select the data field to count, then set the counting function and result field applied to the grouped rows, as in the sketch below.
      • Grouping standard: specify the field by which rows are grouped for counting. Example: count the value field per AAA value.
      • Counting condition: set the counting function and the result field
        • Click [Add] to add a Field, Condition, and Result field entry
        • Field: select the property key of the source data that the counting applies to
        • Condition: select the counting function applied to the data of the selected group: AVG, SUM, MAX, or MIN
        • Result field: specify the name of the new field in which to save the counting result
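
    This is a group-by aggregation: rows are grouped by the grouping standard field, and each counting condition produces one result field. A pandas sketch with hypothetical fields:

        import pandas as pd

        df = pd.DataFrame({"group": ["AAA", "AAA", "BBB"], "value": [1, 2, 3]})

        # Grouping standard: "group"; conditions: AVG and SUM of "value"
        result = df.groupby("group", as_index=False).agg(
            value_avg=("value", "mean"),  # result field for AVG
            value_sum=("value", "sum"),   # result field for SUM
        )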

    Edit attribute name

    Edits the name of a specific property key in the data.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • [Detailed settings] tab: in the Current key name / Edited key name table read from the source node schema, edit the Edited key name of the desired property key.
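
    The conversion is a plain key rename. A pandas sketch with a hypothetical key:

        import pandas as pd

        df = pd.DataFrame({"usr_nm": ["a", "b"]})

        # Current key name "usr_nm" -> edited key name "user_name"
        df = df.rename(columns={"usr_nm": "user_name"})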

    Delete replica

    Deletes duplicate rows from the source data. Uppercase and lowercase are distinguished when checking for duplicates. Because only rows are deleted, this conversion does not change the schema.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • Detailed settings > Replica type: select the duplicate deletion option, as in the sketch below.
      • Delete if all rows are identical: deletes a row when all of its field values are identical to another row's. Uppercase and lowercase are distinguished when determining whether rows are duplicates.
      • Delete if specific fields are identical: deletes rows whose values are identical only in the specified fields, regardless of field order.
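
    Both options map onto duplicate-row removal, over all fields or over a chosen subset. A pandas sketch with hypothetical fields (string comparison is case-sensitive, matching the behavior above):

        import pandas as pd

        df = pd.DataFrame({"id": [1, 1, 2], "name": ["a", "a", "A"]})

        all_identical = df.drop_duplicates()             # all field values identical
        by_fields = df.drop_duplicates(subset=["name"])  # specific fields identical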

    Fill in empty values

    Fills in the values of omitted columns in the data with a set value.

    • [Attribute information] tab: defines the attributes of the conversion job.
      • Name: enter the convert node's name.
      • Convert: the selected conversion type. Changing it changes the entry fields.
      • Parent node: specify 1 node to connect to the convert node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • Detailed settings: specify the property keys that have omitted data and set a replacement value, as in the sketch below.
      • Target key of omitted data: delete all property keys except those with omitted data, leaving only the keys to fill in
      • Replacement value: enter the replacement value for the omitted data
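
    The conversion substitutes a fixed replacement value for missing entries in the chosen keys. A pandas sketch with hypothetical data:

        import pandas as pd

        df = pd.DataFrame({"name": ["a", None, "c"], "age": [1, 2, 3]})

        # Target key of omitted data: "name"; replacement value: "unknown"
        df["name"] = df["name"].fillna("unknown")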

    Target node configuration

    The target node configuration specifies where the converted data is saved.
    On the job editor, add a [Target] node. Then, on the right side of the screen, fill in Attribute information and Detailed settings.

    Note

    Selectable target nodes are Object Storage, Data Catalog, and Cloud DB for MySQL. (As of January 2024)
    Integration between NAVER Cloud Platform's Cloud DB and client's on-premise database is to be supported in the future.

    Target node attribute information

    The Attribute information fields differ by target node type.

    • When the target node is Object Storage
      • Name: enter the target node's name.
      • Data store: Object Storage is selected. Changing the data store changes the entry fields.
      • Bucket: select the Object Storage bucket in which to save the converted data.
      • Prefix: specify a path within the Object Storage bucket. The result data is saved under the specified path.
      • Data format: select the format of the target data: JSON, CSV, or Parquet.
      • Parent node: specify 1 node to connect to the target node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • When the target node is Data Catalog
      • Name: enter the target node's name.
      • Data store: Data Catalog is selected. Changing the data store changes the entry fields.
      • Database: select a database. A database is a set of tables defined by metadata.
      • Select table: select the table in which to save the schema edited through the convert node.
      • Schema version: select the schema version.
      • Update option: select whether to update the entire table, add new columns, or not update.
      • Parent node: specify 1 node to connect to the target node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.
    • When the target node is Cloud DB for MySQL
      • Name: enter the target node's name.
      • Data store: Cloud DB for MySQL is selected by default. Changing the data store changes the entry fields.
      • Connection: select a connection registered in Data Catalog.
      • Table: select the table in which to save the schema edited through the convert node.
      • Parent node: specify 1 node to connect to the target node. Selecting Data lets you choose one of the source nodes; selecting Process lets you choose one of the convert nodes.

    Preview column

    Preview the schema of data to be saved in the target node.

    Note

    The supported types of the source and target are as follows: (As of April 2024)
    Void, Boolean, Tinyint, Smallint, Int, Bigint, Float, Double, String, Char, Varchar, Date, Datetime, Timestamp, Decimal, Binary, Array, Map, Struct, and Uniontype

    When some types are converted into MySQL, they are converted into the following fixed types:
    • Varchar -> varchar(250)
    • Char -> char(64)
    • Array, Map, Struct, String -> mediumtext

    Set the job execution option

    Set the job execution option after you create a job. To set a job execution option, follow these steps:

    1. Click the environment you are using in the Region menu and the Platform menu on the NAVER Cloud Platform console.
    2. Click Services > Big Data & Analytics > Data Flow in order.
    3. Click the Jobs menu.
    4. Select a specific job from the job list and click the [Execute] button.
    5. When the execution option popup window appears, set the execution option.
      • Execute container: set how many containers to use for the distributed job
      • Number of retries: set the maximum number of retries upon job failure
      • Timeout: set how long to wait for the job result per execution
      • Script path: path where the job command script is saved. A sub-path of the Object Storage bucket automatically created at job creation is specified automatically.
      • Execution log: path where the job execution history is saved. A sub-path of the Object Storage bucket automatically created at job creation is specified automatically.
      • Role name: the Sub Account role used in job execution.
    6. Click the [Execute] or the [Save option without execute] button.
      • If you click [Execute], the job's status is changed to Running on the job list.
    Note

    If you use Cloud DB as the source or target node, check that the DB server's network environment and user settings permit access from the following Data Flow access IP ranges:
    10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16

    • In Server > ACG > ACG settings, add to inbound rules
    • In VPC > Network ACL > ACL Rule > Rule settings, add to inbound rules
    • In Cloud DB for MySQL > Manage DB > Manage DB user, add DB user
      • 10.%, 172.%, 192.168.%

    View job execution list

    To view the job execution history, follow these steps:

    1. Click the environment you are using in the Region menu and the Platform menu on the NAVER Cloud Platform console.
    2. Click Services > Big Data & Analytics > Data Flow in order.
    3. Click the Jobs menu.
    4. Click the [Details] button for the specific job from the job list.
    5. When the job editor screen appears, click the [Execution list] tab.
      • You can check the job execution list for the most recent month. The job execution list is kept for 90 days.
      • The items you can view in the execution list are as follows:
        • Job name (ID): unique job name (job ID) entered by the user when creating the job
        • Execute status: execution result of the job: success, failure, executing, or standby
        • Execute log: click the [Details] button to go to the location of the job execution history file
        • Container: number of containers set in the job execution options
        • Trigger: shown when the job is connected to a trigger (schedule)
        • Execution start date and time: the date and time the job execution started, whether run on demand or by a trigger
        • Execution end date and time: the date and time the job execution ended, whether run on demand or by a trigger
        • Execute preparation time: time taken to prepare the job for execution
        • Execution time: total time taken for the job execution
        • Number of retries: number of job execution retries made
    Note
    • If a job is executed only on demand, without a workflow configuration, its execution history can be viewed only in the execution list on the job screen.
    • Jobs configured in a workflow can be viewed in the execution list on the workflow screen as well as the execution list on the job screen.

    Delete job

    To delete a job, follow these steps:

    1. Click the environment you are using in the Region menu and the Platform menu on the NAVER Cloud Platform console.
    2. Click Services > Big Data & Analytics > Data Flow in order.
    3. Click the Jobs menu.
    4. Select the specific job from the job list and click the [Delete] button.
      • The job is deleted from the job list.
      • A workflow that includes the deleted job will not run even if it is scheduled by a trigger.
