Run and manage query

Prev Next

Available in VPC

This section describes how to enter and run queries and how to save and utilize the queries you enter.

Simulate query

When you start a query simulation, the query statement you enter is not formally processed; rather, the amount of data scanned when processing that query statement is measured and displayed in the result window. You can predict how much you will be charged by measuring the amount of data to be scanned.
To simulate a query:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Select a data source and enter a query statement in the query input window.
    • If you are using a template query, you do not have to select a data source as template queries have their data source defined in the statements. Go to the [Query] tab, click [Template query], and select the desired query statement to have it entered automatically.
  4. Click [Simulate].
  5. Check the results of the simulation in the lower section of the page.
Note
  • There is no charge for simulating queries.
  • For the query that includes a storage catalog, the size of the storage catalog data is not included in the simulation results. For more information, see the Storage catalog feature.

Run query

To run a query:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Select a data source and enter a query statement in the query input window.
    • Select the database name from the Database dropdown menu to enter the table name easily without having to enter the full source path.
      dataquery-query-vpc_database_ko
    • For template queries, their data source is already defined as a database belonging to "public_data." Go to the [Query] tab, click [Template query], and select the desired ready-to-run query to have it entered automatically.
    • If you are re-running the same query, consider enabling the result reuse feature for efficient processing. For more information, see Enable query result reuse.
  4. Click [Run].
    • If you want to measure the amount of data scanned during the query's processing in advance, click [Simulate] to check the measurement result before clicking [Run].
  5. Check the query result in the lower section of the page.
    • You can view a variety of information in addition to the query result in the result area. For more information on viewing results, see View query results.
    • The query statement you enter disappears when you leave the page. Save the query statement if you need to. For more information on saving query statements, see Save query statement.
Note
  • The [Undo] button is enabled when a query is running. Click it to cancel the current query run.
  • A query can be executed for each SQL statement.
    • The system perceives the entire strings in the Editor as an executable SQL statement and runs the query.
    • If the strings in the Editor consist of 2 or more SQL statements, select the target strings to have only the strings used to execute a query.
Caution

Query execution may not be possible for certain unsupported file formats or tables.

  • A table when a file that does not match its schema or file format is in its location.
  • A table that includes a data file with an unsupported encoding format.
    • Only UTF-8 encoding format is supported for data files.

Iceberg table

You can execute a query on an Iceberg table. Apache Iceberg is a high-performance table format for analyzing a vast amount of data. The Apache Iceberg table versions 1 and 2 are supported.

To Create Iceberg table, use the CREATE TABLE USING ICEBERG statement described on the relevant page.

CREATE TABLE TABLENAME (col_name data_type) USING ICEBERG

The supported tasks for Iceberg tables are as follows:

  • SELECT, DESCRIBE, SHOW STATS, SHOW CREATE TABLE
  • INSERT, UPDATE, MERGE, DELETE
  • ALTER TABLE, DROP TABLE, COMMENT
Caution

When you delete an Iceberg table, actual data is deleted as well. In order not to delete actual data, use the following settings when creating an Iceberg table:

  • data_location
  • object_store_layout_enabled = true
  • gc.enabled = false

For more information on the settings, see the Trino documentation.

CREATE TABLE TABLENAME (col_name data_type)
with (
  "object_store_layout_enabled" = true, 
  "data_location" = 'data save path', 
  extra_properties = MAP(ARRAY['gc.enabled'], ARRAY['false'])
) 
USING ICEBERG;
Note

Data Query's Iceberg query is provided based on Trino. For more information beyond the list above, see the Trino documentation.

Enable query result reuse

Query result reuse refers to the reuse of results from queries processed a certain amount of time ago. You can use it to save query processing time when running queries with the same content repeatedly.
Result reuse must be enabled before running the completed query statement. To enable result reuse:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Enter the query statement in the query input window.
  4. Find the Reuse results item in the upper right corner of the query window and click Do not reuse.
    dataquery-query-vpc_reuse_ko
  5. Find the dropdown menu and select the time frame you want to allow query result reuse.
    • You can select from 1 hour ago, 12 hours ago, or 1 day ago. Alternately, click Enter manually and select a time between 1 minute and 7 days.
    • Click [Run] to execute the query after completing the configuration. If a valid relevant result exists, it will be reused.

Execute query using parameter

You compose a query including variables and bind the variables to the execution time for dynamic query execution. The following is a query execution example using parameters:

  • Query text to execute
    select * 
    from public_data.incheon_airport.cargo_flight_schedule_summer_arrival
    where TRY_CAST(year AS INTEGER) >= ?
      and TRY_CAST(year AS INTEGER) <  ?
      and airline = ?
    
  • Parameters
    • Parameter 1: 2015
    • Parameter 2: YEAR(NOW())
    • Parameter 3: KOREAN AIR
Note
  • If you want to use number input values (integer or decimal) as the string type, enclose the value with single quotation marks (') and enter. For example, if you want to enter an input value of 123 as the string type, enter '123' for the parameter value.
  • As for a table for which the table data format is CSV, all values are processed as string type. Therefore, you must enter the input value as string type.

Storage catalog

You can enter the path of the file stored in Object Storage to run the query directly without creating a table.
The following is an example of running a query on the file stored in Object Storage using the storage catalog:

  • Use the TABLE(object_storage.file.read('TYPE', 'PATH')) syntax in the table location.
    • TYPE: File type (supported formats: csv, tsv, json, avro, parquet, and orc)
    • PATH: The path of the file stored in Object Storage
select * from TABLE(object_storage.file.read('csv', 's3a://test-bucket/path/file.csv')) ;
Note
  • For the following file types, only the specified delimiters are supported. Other delimiters will be supported in the future:
    • For csv, use comma (,) as the delimiter.
    • For tsv, use tab (\t) as the delimiter.
Caution
  • Queries are only available for the files stored in My Object Storage.
  • The storage catalog is provided for viewing small files under 10 GB. (For files larger than 10 GB, queries may not be run properly due to timeout.)
  • For the query that includes a storage catalog, the size of the storage catalog data is not included in the simulation results.
  • You can view the simulation results by running the following query, if necessary:
    • Example: EXPLAIN ANALYZE SELECT * FROM TABLE(object_storage.file.read('TYPE', 'PATH')) ;

Provided functions

The following is the list of functions provided by Data Query:

Function Return Type Argument Types Description Example
convert_range_long varchar integer/varchar, varchar The input is returned after being converted into a user-defined range to determine if it is within the desired range.

Description of the first parameter: (Required) The data to be converted into the range.
Description of the second parameter: (Required) The range values. Each range must be sorted in ascending order, and the values must be separated by commas (,).
select convert_range_long('10', '1,10,20,30')
day_of_week Integer date/varchar Returns the number assigned to the day of the week for the date entered.

Monday (1), Tuesday (2), Wednesday (3), Thursday (4), Friday (5), Saturday (6), and Sunday (7)
select day_of_week('2024-06-20')
decode_url varchar varchar, varchar Converts an encoded URL to a decoded value.

Description of the first parameter: (Required) The encoded URL.
Description of the second parameter: (Optional) The character format. (Only formats supported by JAVA are allowed.)
select decode_url
match boolean varchar, varchar, varchar Check if the column described in the first parameter matches the list of keywords entered in the second parameter. The third parameter is the matching method.

Description of the first parameter: (Required) The name of the column to match.
Description of the second parameter: (Required) The list of keywords to match. You can enter multiple keywords divided by commas (,). If you select regex for the third parameter, you can only enter JAVA regex syntax.
Description of the third parameter: (Required) exact|prefix|postfix|contain|regex
* Value descriptions - exact: exact matching (default), prefix: prefix matching, postfix: postfix matching, contain: partial matching, regex: regular expression matching
select match ("JAVA," "NAVER, Cloud, Data, JAVA, query," "exact")

select match ("NEVER," "\bNA\b," "regex")
normalize_option varchar varchar, varchar, varchar Returns original search keywords replaced with normalized search keywords in the form of the set flags.

Description of the first parameter: (Required) The search keywords.
Description of the second parameter: (Optional) Normalization option flag values. "i": Change the invisible value to an empty value. "s": Change a breaking white space to a space. "n": Change a breaking white space to an empty value. "l": Change uppercase letters to lowercase. "b": Change \ to \. It is the same as the "inl" option when there is no flag value.
Description of the third parameter: (Optional) Language locale required when applying normalization (e.g. ko_KR, en_US)
select normalize_option('NAVER', 'l')
parse_agent map(varchar,varchar) varchar The Agent field of the Apache Log is returned in units of Device, OperationSystem, or Browser. select parse_agent('Mozilla/5.0 (Linux; Android 9; LG-F800S Build/PKQ1.190522.001) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/69.0.3497.128 Whale/1.0.0.0 Crosswalk/23.69.600.0 Mobile Safari/537.36 NAVER(inapp; search; 690; 10.16.2)')
to_period varchar date/timestamp/varchar, varchar Converts data in a "date/timestamp/varchar" format to a string with a specific format.

Description of the first parameter: (Required) Date/times.
Description of the second parameter: (Required) The string format to be converted into. all/year/half/quarter/month/week/week-monday/day/12hour/6hour/3hour/hour/min/5min/10min/20min/30min
select to_period('2024-06-20','week-monday');
rank map(varchar,integer) array(varchar), integer/varchar, integer Returns the ranking of the target area.

Description of the first parameter: (Required) The target area.
Description of the second parameter: (Optional) The values of the areas separated by commas (,).
Description of the third parameter: (Optional) The exposure rank value for the unexposed. If the value is negative and unexposed, ignore it. (Default = 99)
select rank(array['aaa','bbb','shp','fff'], 'fff,ddd,eee,aaa', 50)
Note

The functions provided by Data Query are based on Trino. For more information beyond the list above, see the Trino documentation.

Manage query

You can save your query statement as My query so that it can be reused. Once saved, the query statement can be edited, copied, or deleted as needed. You can also run the saved query statement at the desired time by using the scheduled execution feature.

Save query statements

You can save a manually written query statement or template-based query statement as My query. To save query statements:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Enter the query statement in the query input window.
    • If you want to use a template query, go to the [Query] tab, click [Template query], and select the desired query statement to have it entered automatically.
  4. Hover the cursor over the options menu on the right side of the query tab in the upper section of the query window.
    dataquery-query-vpc_save_ko
  5. Click Save or Save as.
  6. Enter a name for the query and click [Save] in the popup window.
    • The query is saved to My query list. Click the [Query] tab > [My query] on the left side of the page to see the list of your saved queries.
      dataquery-query-vpc_myquery_ko

Edit saved query

You can edit a saved query's name or content, save it under a different name, or delete it. To edit a saved query:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Click the [Query] tab > [My query] on the left side of the page.
  4. Click the query you want to edit in My query list.
    • That prompts the selected query's content to be entered in the query window.
  5. Hover the cursor over the options menu on the right side of the query tab in the upper section of the query window.
    dataquery-query-vpc_save_ko
  6. Select and click the desired menu.
    • Edit name: Click to update the saved query's name. If you edit the query name while you have edited the content in the query window, the content is updated too.
    • Save: Run the edited statement in the query window to overwrite the saved query.
    • Save as: Copy the query's content and save it under a different name.
    • Delete: Delete the query from My query list.

Schedule saved query's execution

You can run a saved query at the desired time by adding an execution schedule to the saved query statement. To set an execution schedule:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Query.
  2. Click Query Editor.
  3. Click the [Query] tab > [My query] on the left side of the page.
  4. Find the query you want to edit in My query list. Hover the cursor over the options menu on the right side of the query.
  5. Click Set execution schedule.
    • Set execution schedule: Set execution schedule details so that the saved query statement is executed at the desired time/cycle.
    • Delete execution schedule: Delete the set execution schedule information.
Note

You can use the Execute query using parameter feature in the Schedule query execution section. When you set an execution schedule for queries including variables, enter the parameter information to be used for the execution along with the execution scheduling information.

Caution
  • Make sure that the saved query statement runs properly before setting an execution schedule. If certain errors such as query parsing errors occur, the execution attempt will not create execution history. Therefore, their execution records cannot be viewed in the Query history menu.
  • Query execution scheduling is conducted using the permissions granted to the Main Account. Bear it in mind when granting the permission to schedule query execution to an account of Sub Account.