Available in VPC
This section describes how to add data sources for queries and manage registered data.
Add data source
Data sources for queries can be added through connections.
To add a data source, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Click [Add data source] at the top of the interface.
- Alternatively, click [Manage data source], then click [Add data source] in the popup.
- Enter the information for the data source you want to register.
- Name: Enter up to 50 characters using English uppercase and lowercase letters, numbers, and underscores (_). The first character must be an English letter.
- Source type: Select JDBC, the only source type currently supported (more source types will be added later).
- Connection: Select a connection created for the data you want to use from those registered in Data Catalog.
- Click [Create].
- The added data source is displayed in the Data source dropdown menu and in the list shown when you click [Manage data source].
- By selecting an item from the Data source drop-down menu, you can view the data imported through the connection in a tree format.
- You are now ready to run queries using the database of the added data source. For more information on how to run queries, see Run and manage query.
The data source is created by referencing the connection information in the Data Catalog service. If the connection information in the Data Catalog service changes after you add a data source, delete the data source and add it again.
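For a quick check, once the data source appears in the dropdown you can run a simple query in the Query Editor. The data source, database, and table names below are hypothetical, and the source.database.table qualification style is an assumption:

```sql
-- Hypothetical example: confirm the new data source responds to queries.
SELECT *
FROM my_jdbc_source.sales_db.orders
LIMIT 10;
```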
Data source details
Connection and usage methods may vary depending on the data source type.
Data Catalog
- Data Query integrates schema information collected from the Data Catalog service for direct use in queries. Data Catalog is automatically added upon subscription, without a separate data source registration process.
The following tables collected in Data Catalog are not available for viewing data in Data Query:
- Tables with their location specified as individual files
- If a table's location is specified as a file, as in the following example, Data Query cannot retrieve data from the table.
  - Location where data can be viewed: s3a://test-bucket/database-name/table-name/
  - Location where data cannot be viewed: s3a://test-bucket/database-name/table-name/data.csv
- In Data Catalog, tables scanned in this manner typically contain multiple data structures within a single directory. If you separate the data into individual directories in Object Storage and then run the scanner, the table location is correctly recognized as a directory.
- Tables with their location set to an internal path within the service
- CLOUD_DB_FOR_MYSQL
- CLOUD_DB_FOR_MSSQL
- CLOUD_DB_FOR_POSTGRESQL
- CLOUD_DB_FOR_MONGODB
- JDBC
- Tables with their location set to HDFS of Cloud Hadoop
- Among these table types, data from MySQL, MSSQL, and PostgreSQL can be viewed after creating and connecting a JDBC data source. (A JDBC connection can be created in Data Catalog using a public IP address.)
Public Data
- The Data Query service provides public data sources that can be used for queries by default.
- Select public_data from the data source list to view it. The specific list of databases included in the source is subject to change.
| Database name | Table name | Data description |
|---|---|---|
| data_naver_cloud_service | vpc_flowlog | NAVER Cloud service |
| incheon_airport | passenger_flight_schedule_summer_arrival<br>passenger_flight_schedule_summer_departure<br>passenger_flight_schedule_winter_arrival<br>passenger_flight_schedule_winter_departure | Incheon International Airport Corporation |
| incheon_airport | cargo_flight_schedule_summer_arrival<br>cargo_flight_schedule_summer_departure<br>cargo_flight_schedule_winter_arrival<br>cargo_flight_schedule_winter_departure | Incheon International Airport Corporation |
| korea_national_railway | subway_busan<br>subway_seoul_capital_area | Korea National Railway |
| korea_trade_insurance | exchange_rate<br>guaranteed_exchange_rate | Korea Trade Insurance Corporation |
| ministry_economy_finance | foreign_exchange_reserves | Ministry of Economy and Finance |
| ministry_land_infra_transport | public_land_value<br>nationwide_bus_stop_location | Ministry of Land, Infrastructure, and Transport |
| national_health_insurance_service | health_screening<br>emergency_room_visits<br>giving_birth_business_size | National Health Insurance Service |
| national_pension_service | pension_enrolled_business_establishment | National Pension Service |
| national_tax_service | business_status_age_group<br>business_status_gender<br>business_status_years_of_establishment<br>business_status_top_100_essential | National Tax Service |
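As an illustration, a public table can be previewed with a query like the one below. The database and table names come from the list above, but the source.database.table qualification style is an assumption:

```sql
-- Hypothetical example: preview one of the public data tables.
SELECT *
FROM public_data.incheon_airport.passenger_flight_schedule_summer_arrival
LIMIT 10;
```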
JDBC
- The Data Query service supports JDBC connections, allowing you to easily connect to databases. This enables a variety of database operations, such as viewing, editing, and deleting data.
- Databases supported for connection with Data Query JDBC
- MySQL (supports version compatibility with Cloud DB for MySQL)
- MSSQL (supports version compatibility with Cloud DB for MSSQL)
- PostgreSQL (supports version compatibility with Cloud DB for PostgreSQL)
- MongoDB (supports version compatibility with Cloud DB for MongoDB)
For JDBC connections, you must allow access from the following IP addresses in your relational database and network environment.
- Data Query access IP: 223.130.128.167
  - To allow Data Query to access metadata and data in your DB, follow these steps:
    - For example, in Server > ACG > ACG settings, add [TCP, 223.130.128.167, your DB port number] to the inbound rules.
    - For example, in Cloud DB for MySQL > Manage DB > Manage DB user, add a DB user with the access IP 223.130.128.167.
- Data Catalog access IP: 110.165.25.5
  - When you create a JDBC connection, you must also add the 110.165.25.5 IP address, as shown in the Data Catalog connection creation guide.
  - To allow Data Catalog to verify the connection to your DB, add access permission: add 110.165.25.5 to the ACG and to the DB user access IP in the same way as for Data Query above.
For JDBC MongoDB integrations, you need READ_WRITE permissions. For more information, see Cloud DB for MongoDB.
Delete data source
You can stop the integration of data sources that are no longer used for queries by deleting them.
To delete a data source, follow these steps:
Deleting a data source disconnects the integration, making it unavailable only in the Data Query service, without affecting the original data or the Data Catalog service.
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Click [Manage data source] at the top of the interface.
- In the data source popup, select the checkbox for the item you want to delete and click [Delete].
- In the notification popup, click [OK].
- The integration is terminated, and the data source disappears from all data source lists.
Manage table
Registering a data source allows you to view its internal table fields in a tree format. You can easily manage tables using the additional features provided for each table. The provided features are as follows:
- Preview table: Automatically enters a query in the query window to preview the table content.
- Create table DDL: Analyzes the table and automatically enters a DDL statement in the query window to create the table. Can be used for copying or editing tables.
- Delete table: Automatically enters a query statement in the query window to delete the table.
- View catalog: Go to the catalog in Data Catalog where the table is registered.
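Roughly speaking (the exact statements Data Query auto-enters may differ, and the names below are hypothetical), the query-generating features correspond to statements such as:

```sql
-- Preview table: a limited SELECT over the table.
SELECT * FROM my_source.my_database.my_table LIMIT 10;

-- Delete table: a DROP statement for the same table.
DROP TABLE my_source.my_database.my_table;
```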
To run the additional features for table management, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- Hover the cursor over the options menu next to the table you want in the data source tree component.
- Tables are distinguished by icon marks in the tree.
- Select and click the desired feature.
- If you selected a feature that automatically enters a query statement, you can run the query by clicking [Run].
- If you have selected View catalog, the Table menu page of the Data Catalog service will be displayed in a new window.
Add table manually
You can register a data file as a table by running a table creation SQL statement in Data Query or by using the table creation feature of Data Catalog. To add a table, follow these steps:
- In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Query.
- Click the Query Editor menu.
- In the data source component, click the "+" button next to the search bar.
Create tables with SQL statements
CREATE TABLE
- Creates a new table by defining the schema directly and specifying the table properties and the data location.
- To create a table in Iceberg table format, see the CREATE TABLE (ICEBERG) syntax.
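A minimal CREATE TABLE sketch is shown below. The table name, columns, Object Storage path, and property names are all hypothetical; the exact properties supported by Data Query may differ:

```sql
-- Hypothetical example: define a schema directly and point the table
-- at an Object Storage directory. Property names are assumptions.
CREATE TABLE my_database.flight_delays (
    flight_id   VARCHAR,
    departure   TIMESTAMP,
    delay_min   INTEGER
)
WITH (
    external_location = 's3a://test-bucket/my_database/flight_delays/',
    format = 'CSV'
);
```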
CREATE TABLE AS SELECT
- Creates a new table using existing tables and data.
- You can create a table with the same data and schema as the original table.
- Alternatively, you can create a table by selecting only the columns you need with a SELECT statement, or by transforming the data along the way.
- To create a table in Iceberg table format, see the CREATE TABLE AS SELECT (ICEBERG) syntax.
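A minimal CREATE TABLE AS SELECT sketch (all names are hypothetical):

```sql
-- Hypothetical example: keep only the columns you need and
-- transform the data while creating the new table.
CREATE TABLE my_database.delayed_flights AS
SELECT flight_id,
       delay_min / 60.0 AS delay_hours
FROM my_database.flight_delays
WHERE delay_min > 0;
```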
CREATE VIEW
- Creates a new VIEW for the SELECT query you entered.
- You can create a complex SELECT query as a view and easily reference it later.
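A minimal CREATE VIEW sketch (all names are hypothetical):

```sql
-- Hypothetical example: wrap a complex SELECT as a reusable view.
CREATE VIEW my_database.avg_delay_per_day AS
SELECT date(departure) AS dep_date,
       avg(delay_min)  AS avg_delay
FROM my_database.flight_delays
GROUP BY date(departure);
```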
Create tables in Data Catalog
Specify the Object Storage path
You can manually register data files as a table using Data Catalog's Scanner. Enter the information required for scanning, then run the Scanner.
- Database: Select a database for the table created by the scanner.
- Data type
- Catalog Default: The default Hive table type provided by Data Catalog.
- Apache Iceberg: An open table format for large analytical data sets. It supports ACID transactions, schema evolution, and time travel queries, and enables safe, concurrent work across Spark, Trino, and Hive.
- Path: Enter the path of the source data to scan.
- The scan also covers sub-paths of the path you entered.
- Click [+Settings] to specify a detailed path within the bucket or a sub-bucket.
- Scanning method
- Create new scanner: Create a new scanner and run the scanner.
- Select existing scanner: Run an existing scanner in Data Catalog with the same [data type], [database], and [path].
- Scan range: Specify the number of files to scan in Object Storage. Files are read in file name order.
- Configurable from 1 to 100. The default is 10.
- Scans the specified number of files for each leaf directory in the specified path.
- Available only when [Data type] is set to Catalog Default.
- Pattern: Configure whether to include or exclude metadata collection for specific data.
- Enter the pattern in Glob pattern format.
- Exclude settings take precedence over Include settings.
- Available only when [Data type] is set to Catalog Default.
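For illustration, hypothetical Glob patterns might look like the following; because exclude settings take precedence, temporary files are skipped even if they also match the include pattern:

```
Include: **/*.csv      # collect metadata only from CSV files
Exclude: **/tmp/**     # skip anything under a tmp directory
```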
The [Create tables in Data Catalog>Specify the Object Storage path] feature is supported only for DATA_CATALOG data sources.
Create tables in Data Catalog
You can create a table with a manually defined schema in the Data Catalog console, or by adjusting detailed scanner options.
For more information, see Create tables in Data Catalog.