Available in VPC
This section describes how to add data sources for queries and manage registered data.
Add data source
Data sources for queries can be added through connections.
To add a data source, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Click [Add data source] at the top of the interface.
- Alternatively, click [Manage data source], then click [Add data source] in the popup.
- Enter the information for the data source you want to register.
- Name: Enter up to 50 characters using English uppercase and lowercase letters, numbers, and underscores (_). The first character must be an English letter.
- Source type: Select JDBC, the only source type currently supported (more source types will be added later).
- Connection: Select a connection created for the data you want to use from those registered in Data Catalog.
- Click [Create].
- The added data source is displayed in the Data source dropdown menu and in the list shown when you click [Manage data source].
- By selecting an item from the Data source drop-down menu, you can view the data imported through the connection in a tree format.
- You are now ready to run queries using the database of the added data source. For more information on how to run queries, see Run and manage query.
The data source is created by referencing the connection information in the Data Catalog service. If the connection information in the Data Catalog service changes after you add a data source, delete the data source and add it again.
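For a quick check, once the data source appears in the dropdown you can run a simple query in the Query Editor. The data source, database, and table names below are hypothetical, and the source.database.table qualification style is an assumption:

```sql
-- Hypothetical example: confirm the new data source responds to queries.
SELECT *
FROM my_jdbc_source.sales_db.orders
LIMIT 10;
```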
Data source details
Connection and usage methods may vary depending on the data source type.
Data Catalog
- Data Query integrates schema information collected from the Data Catalog service for direct use in queries. Data Catalog is automatically added upon subscription, without a separate data source registration process.
The following tables collected in Data Catalog are not available for viewing data in Data Query:
- Tables with their location specified as individual files
- If a table's location is specified as a file, as in the following example, Data Query cannot retrieve data from the table.
  - Location where data can be viewed: s3a://test-bucket/database-name/table-name/
  - Location where data cannot be viewed: s3a://test-bucket/database-name/table-name/data.csv
- In Data Catalog, tables scanned in this manner typically contain multiple data structures within a single directory. If you separate the data into individual directories in Object Storage and then run the scanner, the table location is correctly recognized as a directory.
- Tables with their location set to an internal path within the service
- CLOUD_DB_FOR_MYSQL
- CLOUD_DB_FOR_MSSQL
- CLOUD_DB_FOR_POSTGRESQL
- CLOUD_DB_FOR_MONGODB
- JDBC
- Tables with their location set to HDFS of Cloud Hadoop
- Among these table types, data from MySQL, MSSQL, and PostgreSQL can be viewed after creating and connecting a JDBC data source. (A JDBC connection can be created in Data Catalog using a public IP address.)
Public Data
- The Data Query service provides public data sources that can be used for queries by default.
- Select public_data from the data source list to view it. The specific list of databases included in the source is subject to change.
| Database name | Table name | Data description |
|---|---|---|
| data_naver_cloud_service | vpc_flowlog | NAVER Cloud service |
| incheon_airport | passenger_flight_schedule_summer_arrival<br>passenger_flight_schedule_summer_departure<br>passenger_flight_schedule_winter_arrival<br>passenger_flight_schedule_winter_departure | Incheon International Airport Corporation |
| incheon_airport | cargo_flight_schedule_summer_arrival<br>cargo_flight_schedule_summer_departure<br>cargo_flight_schedule_winter_arrival<br>cargo_flight_schedule_winter_departure | Incheon International Airport Corporation |
| korea_national_railway | subway_busan<br>subway_seoul_capital_area | Korea National Railway |
| korea_trade_insurance | exchange_rate<br>guaranteed_exchange_rate | Korea Trade Insurance Corporation |
| ministry_economy_finance | foreign_exchange_reserves | Ministry of Economy and Finance |
| ministry_land_infra_transport | public_land_value<br>nationwide_bus_stop_location | Ministry of Land, Infrastructure, and Transport |
| national_health_insurance_service | health_screening<br>emergency_room_visits<br>giving_birth_business_size | National Health Insurance Service |
| national_pension_service | pension_enrolled_business_establishment | National Pension Service |
| national_tax_service | business_status_age_group<br>business_status_gender<br>business_status_years_of_establishment<br>business_status_top_100_essential | National Tax Service |
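As an illustration, a public table can be previewed with a query like the one below. The database and table names come from the list above, but the source.database.table qualification style is an assumption:

```sql
-- Hypothetical example: preview one of the public data tables.
SELECT *
FROM public_data.incheon_airport.passenger_flight_schedule_summer_arrival
LIMIT 10;
```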
JDBC
- The Data Query service supports JDBC connections, allowing you to easily connect to databases. This enables a variety of database operations, such as viewing, editing, and deleting data.
- Databases supported for connection with Data Query JDBC
- MySQL (supports version compatibility with Cloud DB for MySQL)
- MSSQL (supports version compatibility with Cloud DB for MSSQL)
- PostgreSQL (supports version compatibility with Cloud DB for PostgreSQL)
- MongoDB (supports version compatibility with Cloud DB for MongoDB)
For JDBC connections, you must allow access from the following IP addresses in your relational database and network environment.
- Data Query access IP: 223.130.128.167
  - To allow Data Query to access metadata and data in your DB, follow these steps:
    - For example, in Server > ACG > ACG settings, add [TCP, 223.130.128.167, your DB port number] to the inbound rules.
    - For example, in Cloud DB for MySQL > Manage DB > Manage DB user, add a DB user with the access IP 223.130.128.167.
- Data Catalog access IP: 110.165.25.5
  - When you create a JDBC connection, you must also add the 110.165.25.5 IP address, as shown in the Data Catalog connection creation guide.
  - To allow Data Catalog to verify the connection to your DB, add access permission: add 110.165.25.5 to the ACG and to the DB user access IP in the same way as for Data Query above.
For JDBC MongoDB integrations, you need READ_WRITE permissions. For more information, see Cloud DB for MongoDB.
Delete data source
You can stop the integration of data sources that are no longer used for queries by deleting them.
To delete a data source, follow these steps:
Deleting a data source disconnects the integration, making it unavailable only in the Data Query service, without affecting the original data or the Data Catalog service.
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Click [Manage data source] at the top of the interface.
- In the data source popup, select the checkbox for the item you want to delete and click [Delete].
- In the notification popup, click [OK].
- The integration is terminated, and the data source disappears from all data source lists.
Manage table
Registering a data source allows you to view its internal table fields in a tree format. You can easily manage tables using the additional features provided for each table. The provided features are as follows:
- Preview table: Automatically enters a query in the query window to preview the table content.
- Create table DDL: Analyzes the table and automatically enters a DDL statement in the query window to create the table. Can be used for copying or editing tables.
- Delete table: Automatically enters a query statement in the query window to delete the table.
- View catalog: Go to the catalog in Data Catalog where the table is registered.
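Roughly speaking (the exact statements Data Query auto-enters may differ, and the names below are hypothetical), the query-generating features correspond to statements such as:

```sql
-- Preview table: a limited SELECT over the table.
SELECT * FROM my_source.my_database.my_table LIMIT 10;

-- Delete table: a DROP statement for the same table.
DROP TABLE my_source.my_database.my_table;
```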
To run the additional features for table management, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Hover the cursor over the options menu next to the table you want in the data source tree component.
- Tables are distinguished by icon marks in the tree.
- Select and click the desired feature.
- If you selected a feature that automatically enters a query statement, you can run the query by clicking [Run].
- If you have selected View catalog, the Table menu page of the Data Catalog service will be displayed in a new window.
Add table manually
You can register a data file as a table by running a table creation SQL statement in Data Query or by using the table creation feature of Data Catalog. To add a table, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- In the data source component, click the "+" button next to the search bar.
Create tables with SQL statements
CREATE TABLE
- Creates a new table by defining the schema directly and specifying the table properties and the data location.
- To create a table in Iceberg table format, see the CREATE TABLE (ICEBERG) syntax.
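A minimal CREATE TABLE sketch is shown below. The table name, columns, Object Storage path, and property names are all hypothetical; the exact properties supported by Data Query may differ:

```sql
-- Hypothetical example: define a schema directly and point the table
-- at an Object Storage directory. Property names are assumptions.
CREATE TABLE my_database.flight_delays (
    flight_id   VARCHAR,
    departure   TIMESTAMP,
    delay_min   INTEGER
)
WITH (
    external_location = 's3a://test-bucket/my_database/flight_delays/',
    format = 'CSV'
);
```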
CREATE TABLE AS SELECT
- Creates a new table using existing tables and data.
- You can create a table with the same data and schema as the original table.
- Alternatively, you can create a table by selecting only the columns you need with a SELECT statement, or by transforming the data along the way.
- To create a table in Iceberg table format, see the CREATE TABLE AS SELECT (ICEBERG) syntax.
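A minimal CREATE TABLE AS SELECT sketch (all names are hypothetical):

```sql
-- Hypothetical example: keep only the columns you need and
-- transform the data while creating the new table.
CREATE TABLE my_database.delayed_flights AS
SELECT flight_id,
       delay_min / 60.0 AS delay_hours
FROM my_database.flight_delays
WHERE delay_min > 0;
```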
CREATE VIEW
- Creates a new VIEW for the SELECT query you entered.
- You can create a complex SELECT query as a view and easily reference it later.
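A minimal CREATE VIEW sketch (all names are hypothetical):

```sql
-- Hypothetical example: wrap a complex SELECT as a reusable view.
CREATE VIEW my_database.avg_delay_per_day AS
SELECT date(departure) AS dep_date,
       avg(delay_min)  AS avg_delay
FROM my_database.flight_delays
GROUP BY date(departure);
```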
Create tables in Data Catalog
Specify the Object Storage path
You can manually register data files as a table using Data Catalog's Scanner. Enter the information required for scanning, then run the Scanner.
- Database: Select a database for the table created by the scanner.
- Data type
- Catalog Default: The default Hive table type provided by Data Catalog.
- Apache Iceberg: An open table format for large analytical data sets. It supports ACID transactions, schema evolution, and time travel queries, and enables safe, concurrent work across Spark, Trino, and Hive.
- Path: Enter the path of the source data to scan.
- The scan also covers sub-paths of the path you entered.
- Click [+Settings] to specify a detailed path within the bucket or a sub-bucket.
- Scanning method
- Create new scanner: Create a new scanner and run the scanner.
- Select existing scanner: Run an existing scanner in Data Catalog with the same [data type], [database], and [path].
- Scan range: Specify the number of files to scan in Object Storage. Files are read in file name order.
- Configurable from 1 to 100. The default is 10.
- Scans the specified number of files for each leaf directory in the specified path.
- Available only when [Data type] is set to Catalog Default.
- Pattern: Configure whether to include or exclude metadata collection for specific data.
- Enter the pattern in Glob pattern format.
- Exclude settings take precedence over Include settings.
- Available only when [Data type] is set to Catalog Default.
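For illustration, hypothetical Glob patterns might look like the following; because exclude settings take precedence, temporary files are skipped even if they also match the include pattern:

```
Include: **/*.csv      # collect metadata only from CSV files
Exclude: **/tmp/**     # skip anything under a tmp directory
```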
The [Create tables in Data Catalog>Specify the Object Storage path] feature is supported only for DATA_CATALOG data sources.
Create tables in Data Catalog
You can create a table with a manually defined schema in the Data Catalog console, or by adjusting detailed scanner options.
For more information, see Create tables in Data Catalog.