Table


Available in VPC

A table is a metadata definition containing the details and schema of the data. You can create a table through a scanner or by defining your own schema. In the Table menu, you can create and manage tables and view the collected metadata.

Table list interface

A basic description of the Table menu for using Data Catalog is as follows:
data_catalog_table_ko_iceberg.png

Component Description
① Menu name Current menu name and the number of tables being viewed.
② Basic features Features displayed when you enter the Table menu for the first time.
  • [Create table]: Create a table (see Create table).
  • [Learn more]: Go to the Data Catalog overview page.
  • [Refresh]: Reload the current table.
③ Search bar You can search by database name, table name, location, table type, data format, and tag, and you can also sort by order.
④ Table list Displays the list of tables being viewed.
⑤ Table name Click to go to the Table details interface.
⑥ Database name Click to go to the Database details interface.
⑦ Location Click to go to the location of the corresponding file in Object Storage.

Table details interface

A basic description of the Table details interface is as follows:
data_catalog_table_ko_iceberg.png

Component Description
① Table name Name of the selected table.
② Basic information component Displays the name of the database where the table belongs, table description, table location, update date and time, name of the created scanner, creation date and time, table type, and data format information.
③ Details tab component Consists of the table's schema, schema version, partition, tag, property information, and analytics tabs; you can view details for each item. See Search for tables and view information.
④ Delete button Delete a table.
⑤ Edit basic information button Edit the basic information of the table.
⑥ Edit schema button Edit the schema information. You can edit the tag information on the Tag tab.
⑦ View data button Go to the Data Query service and view the data in the table.

Create table

You can create a table in either of two ways: by defining the schema manually or through a scanner.

Create table with manual schema definition

You can create tables by setting up your own database and schema.

To create a table with manual schema definition:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click [Create table].
  4. Click Create table with manual schema definition, and then click [Next].
  5. Enter basic information.
    • Database: Click the dropdown menu to select a database to connect to the table.
    • Table name: Enter a table name.
    • Location: Enter the location where the table's data exists.
    • Description: Enter a table description.
    • Table type
      • Catalog Default: The default Hive table type provided by Data Catalog.
      • Apache Iceberg: An open table format for large analytic datasets. It supports ACID transactions, schema evolution, and time travel queries, and enables safe, concurrent work in Spark, Trino, and Hive.
  6. Select a data format.
    • If you select the Apache Iceberg table type, you do not select a data format.
    • When you select CSV, you can select or enter the delimiter, data recognition symbol, and character to delete. Also, you can enter the number of header lines to exclude.
    • When you select XML, you can enter Row Tag.
  7. Click [Add] and enter the schema information to add a user-defined schema.
    • For more information about Data type, see Schema data type.
    • To delete an added schema, select its checkbox and then click [Delete].
    • If you do not add a user-defined schema, a schema with the field name "default" will be added automatically.
    • Spaces are allowed in field names.
  8. If you need to enter the partition key, click the Partition component and add a partition key.
    • To add a partition key, click [Add] and enter the partition key name in the input field.
    • To delete a partition key, select its checkbox and then click [Delete].
    • Spaces are allowed in partition key names.
    • You do not need to enter a partition for the Apache Iceberg table type.
  9. If a tag is necessary, click the Set tag component to add tags.
    • To add a tag, click [Add] and enter the tag information in the input field.
    • To delete a tag, select its checkbox and then click [Delete].
    • Click [Load tag template] to display the popup for loading tag templates.
      • Select a tag template and click [Add] to add the tags from that template.
      • For more information about tag templates, see Tag template.
  10. Click [Create].
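As a point of reference for step 6, the CSV options (delimiter, data recognition symbol, character to delete, and number of header lines to exclude) correspond to standard CSV parsing behavior. A minimal Python sketch; the sample data and option values here are illustrative, not values from the console:

```python
import csv
import io

# Illustrative CSV settings mirroring the console's CSV options:
# delimiter, quote character ("data recognition symbol"),
# and the number of header lines to exclude.
raw = 'col_a;col_b\n"hello;world";42\n'
skip_header_lines = 1

reader = csv.reader(io.StringIO(raw), delimiter=';', quotechar='"')
rows = list(reader)[skip_header_lines:]  # drop the header line(s)
print(rows)  # the quoted field keeps its embedded delimiter
```

A field wrapped in the data recognition symbol can safely contain the delimiter itself, which is why the quoted value above survives as a single field.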

Schema data type

The data types in the schema that can be defined manually and a description of each type are as follows:

Data type Description Supported by Catalog Default Supported by Apache Iceberg
tinyint Integer data (1 byte). Y N
smallint Integer data (2 bytes). Y N
int Integer data (4 bytes). Y Y
bigint Integer data (8 bytes). Y N
long Integer data (8 bytes). N Y
float Floating decimal data (4 bytes). Y Y
double Floating decimal data (8 bytes). Y Y
decimal Fixed decimal data.
  • Enter the length (1 to 38 bytes) in the input field.
Y Y
string String data. Y Y
char Fixed-length character type data.
  • Enter the length (1 to 255 bytes) in the input field.
Y N
varchar Variable length character type data.
  • Enter the length (1 to 65,535 bytes) in the input field.
Y N
boolean Data with true or false values. Y Y
binary Binary data in char format. Y Y
timestamp Date and time representation data (timestamp). Y Y
time Time representation data. N Y
datetime Date and time representation data (YYYY-MM-DD HH:MM:SS). Y N
date Date representation data (YYYY-MM-DD). Y Y
fixed Fixed-length byte array. N Y
uuid Uniqueness-guaranteed ID (Universally Unique IDentifier). N Y
list Collection of data of the same type.
  • Click [Details] and enter detailed settings.
N Y
array Collection of data of the same type.
  • Click [Details] and enter detailed settings.
Y N
map Data made of key-value pairs.
  • Click [Details] and enter detailed settings.
Y Y
struct Data including various types of data and the related schema.
  • Click [Details] and enter detailed settings.
Y Y
uniontype Type for storing various structured data types.
  • Click [Details] and enter detailed settings.
Y N

Examples of entering detailed settings for each data type are as follows:

  • Example: Detailed settings of array type
    ARRAY <
       STRUCT <
          place: STRING,
          start_year: INT
       >
    >
    
  • Example: Detailed settings of map type
    MAP <
       STRING,
       ARRAY<STRING>
    >
    
  • Example: Detailed settings of struct type
    STRUCT <
       place: STRING,
       start_year: INT
    >
    
  • Example: Detailed settings of uniontype type
    UNIONTYPE <
       INT,
       DOUBLE,
       ARRAY<STRING>,
       STRUCT<a:INT,b:STRING>
    >
    
  • Example: Detailed settings of list type
    LIST <
       STRUCT <
          place: STRING,
          start_year: INT
       >
    >
    
Note
  • When you create an Iceberg table from the Data Catalog console, the table is stored in Object Storage with the field types you selected. However, the console displays the information stored in the Metastore, so data types not supported by Hive are displayed as converted types.
    • Converted types: list -> array, long -> bigint, time -> string, fixed -> binary, and uuid -> string.
  • Partition update
    • When you create a table with a partition key, the partition information does not exist yet, so you must update it yourself. You can do so as follows:
      • Data Query: Run the call data_catalog.system.sync_partition_metadata('{database name}','{table name}','ADD') statement.
      • Cloud Hadoop Hive: Run msck repair table {table name}.
    • This feature is coming soon to Data Catalog, with direct use planned for the second half of 2025.
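The type conversion and partition-sync commands in the note above can be captured as a lookup table and statement templates. This is a sketch; the function and variable names are illustrative, and only the mapping and the two statements come from this document:

```python
# Iceberg field types that Hive Metastore does not support are shown
# as converted types in the Data Catalog console (per the note above).
ICEBERG_TO_HIVE = {
    "list": "array",
    "long": "bigint",
    "time": "string",
    "fixed": "binary",
    "uuid": "string",
}

def partition_sync_statements(database: str, table: str) -> dict:
    """Render the partition-update statements quoted in the note."""
    return {
        "data_query": (
            "call data_catalog.system.sync_partition_metadata("
            f"'{database}','{table}','ADD')"
        ),
        "cloud_hadoop_hive": f"msck repair table {table}",
    }

stmts = partition_sync_statements("my_db", "my_table")
print(ICEBERG_TO_HIVE["long"])     # bigint
print(stmts["cloud_hadoop_hive"])  # msck repair table my_table
```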

Create table through scanner

To create a table by automatically defining the schema through the scanner:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click [Create table].
  4. Click Create tables through scanner, and then click [Next].
    • Go to the Scanner creation interface.
  5. Tables are created automatically when you create and run the scanner.
    • The table name is automatically set based on the name of the source data.
    • For more information about how to create and run a scanner, see Scanner.
Caution
  • Data files are supported only in the UTF-8 encoding format.
  • If you use other encoding formats, data scanning and querying may not work properly.

Search for tables and view information

To search for the created tables and view the information:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Enter the search conditions you want and click i-datacatalog-search to search for the table.
  4. Click the table to view the information.
    • Database: Name of the database where the table belongs.
    • Table: Table name.
    • Location: The Object Storage location where the data of the table exists.
    • Table type: Type of the table (Catalog Default and Apache Iceberg).
    • Data format: Format of the scanned data (CSV, XML, JSON, Parquet, ORC, AVRO, MySQL, MongoDB, MSSQL, and PostgreSQL).
    • Creation date and time: The date and time when a table was first created.
    • Update date and time: The most recent date and time when you edited a table's information.
    • [Schema]: View the schema registered to the table.
      • For more information about Data type, see Schema data type.
      • You can edit the schema by clicking the Edit button. You cannot edit the schema of views or Apache Iceberg tables.
    • [Schema version]: Click to view the schema version list, and then click a version to view the schema of that version.
    • [Partition]: Click to check the partition key and value registered to the table.
      • Partition update feature
        • The partition value update feature is provided only for tables whose data format is CSV, XML, JSON, Parquet, AVRO, or ORC.
        • Update is available only for the Hive partition type, not for the Directory partition type. Also, if you run All synchronization or Delete-only synchronization, partition values may be deleted.
        • Only partition values are updated; partition keys are not added. To add a partition key, you must scan the table again.
        • Options: All synchronization (update all added/deleted partition values), Add-only synchronization (update added partition values only), and Delete-only synchronization (update deleted partition values only).
    • [Tag]: Click to view the tags registered to the table.
      • Click the Settings button to add or delete tags.
    • [Property information]: Click to view the property information about table and source data.
    • [Analytics]: Click to view the analytics information in the field/partition unit.
      • If you subscribe to Data Catalog, you can run the analytics feature to extract and view analytics data, such as minimum, maximum, and average values, per field.
      • Supported data formats: Parquet, AVRO, ORC, CSV, and JSON.
      • The data you can view is from the most recent successful run.
      • If you extract all analytics and then extract analytics for specific columns, those columns are updated, but the other columns remain from the previous run history.
        Caution
        • Unique value estimates the approximate number of data points within an average error range of 5%.
        • For CSV files, you cannot estimate the number of null values or true/false entries.
    • [Optimization]: Perform Iceberg file optimization (displayed only for the Iceberg table format).
      • Merge files: A feature that combines data files that have been divided into multiple files for more efficient file management and improved performance. Only files smaller than the merge threshold are selected for merging. (The default merge threshold is 100 MB, and its unit is MB.)
        Caution
        • If the merge threshold is too large, files are merged every time the merge operation runs, causing snapshots to be created even when no file changes have occurred.
        • It is recommended to set the merge threshold to an appropriate value in proportion to the data file size.
      • Manage snapshots: Delete snapshots you don't need to use. Snapshots within the maximum retention period are retained, while those beyond the retention period are deleted. (The minimum value for the maximum retention period is 7 days.)
      • Manage orphan files: Organize unused files, such as merged files or incorrectly written files. Orphan files within the maximum retention period are retained, while those beyond the retention period are deleted. (The minimum value for the maximum retention period is 7 days.)
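The merge-threshold rule described above (only files smaller than the threshold, 100 MB by default, are merged) can be illustrated with a short sketch; the file sizes and helper function are hypothetical:

```python
DEFAULT_MERGE_THRESHOLD_MB = 100  # default merge threshold from the console

def files_to_merge(file_sizes_mb, threshold_mb=DEFAULT_MERGE_THRESHOLD_MB):
    """Select data files smaller than the merge threshold (sketch)."""
    return [size for size in file_sizes_mb if size < threshold_mb]

# Hypothetical Iceberg data file sizes in MB:
sizes = [12, 250, 99, 100, 3]
print(files_to_merge(sizes))  # 100 is not smaller than the threshold, so it stays
```

This also shows why an oversized threshold is risky: with a very large threshold, nearly every file qualifies on every run, so merges (and their snapshots) happen even when nothing meaningful changed.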

Property information

If you click the [Property information] tab in the table details component, you can view the property information of the table and source data. The information items and description for each item are as follows:

Property key Description
EXTERNAL External storage
clusterNo Cluster number of the scanned Cloud database product
connectionId Scanner connection ID that created a table
connectionName Connection name used to scan the data
created_time Unix time display of the table creation date and time
dataFormat Format of the data source
dataType Type of the data source
delimiter Delimiter if the source data is a CSV file
inputFormat Format for reading files from Object Storage
isDirectory TRUE if the scan target is a directory
last_modified_time Unix time display of the table update date and time
numFiles Total number of files scanned when the scan target is a directory
objectstorageContentLength Sum of ContentLength for files within the scanned Object Storage directory
objectstorageContentType Common ContentType for the scanned Object Storage directory
objectstorageLastModified Edited time of the most recently edited file in the scanned Object Storage directory
outputFormat Format for writing files to Object Storage
rowTag XML tag defining a row
scannerId Scanner ID that created a table
scannerName Scanner name that created a table
serializationLib Serializer and Deserializer Library
serde.separatorChar Delimiter to determine a schema of the data
serde.quoteChar Symbol to recognize a string as data
serde.escapeChar Character to delete the character included in the string value recognized as data
skip.header.line.count Number of header lines to exclude
totalSize Total amount of data scanned when the scan target is a directory
transient_lastDdlTime Unix time display of the table DDL's last change date and time
mysqlCollation String sort settings of a MySQL table
mysqlDataSize Data size of a MySQL table
mysqlIndexSize Index size of a MySQL table
mysqlIndexes Number of indexes of a MySQL table
mysqlRows Number of saved rows (records) of a MySQL table
mysqlTableSize Total size of a MySQL table
mssqlCollation String sort settings of a MSSQL table
mssqlDataSize Data size of a MSSQL table
mssqlIndexSize Index size of a MSSQL table
mssqlIndexes Number of indexes of a MSSQL table
mssqlRows Number of saved rows (records) of a MSSQL table
mssqlTableSize Total size of a MSSQL table
postgresqlCollation String sort settings of a PostgreSQL table
postgresqlDataSize Data size of a PostgreSQL table
postgresqlIndexSize Index size of a PostgreSQL table
postgresqlIndexes Number of indexes of a PostgreSQL table
postgresqlRows Number of saved rows (records) of a PostgreSQL table
postgresqlTableSize Total size of a PostgreSQL table
mongodbAvgObjSize Average document size of a MongoDB collection
mongodbFreeStorageSize Size of available storage space in a MongoDB database
mongodbIndexSize Index size of a MongoDB collection
mongodbIndexes Number of indexes of a MongoDB collection
mongodbRowCount Number of saved documents (records) of a MongoDB collection
mongodbSize Size of a MongoDB database
mongodbStorageSize Storage size of a MongoDB database
mongodbTotalSize Total size of a MongoDB database
compressionType Zipped file extension when the scanned file is a zipped file
metadata_location Directory of the metadata file added when using an Iceberg table
Note

For more information about other property information of an Iceberg table beyond the list above, see the Apache Iceberg documentation.
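Property values such as created_time, last_modified_time, and transient_lastDdlTime are displayed as Unix time. To read them, convert seconds since the epoch to a date and time; for example, in Python:

```python
from datetime import datetime, timezone

def unix_to_utc(ts: int) -> str:
    """Convert a Unix-time property value (seconds) to a UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Example: the epoch itself.
print(unix_to_utc(0))  # 1970-01-01 00:00:00
```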

Edit table

To edit the information of the created table or to select the schema version:

Note

The table name and the database to which the table belongs cannot be edited.

Caution

If the table is an Apache Iceberg table or a view, you cannot edit the schema.

Edit basic information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the basic information component.
  5. Edit the information of the table in the Basic information edit popup.
    • You can edit the location of the source data, table description, and source data format.
  6. Click [Save].

Edit schema information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the schema tab.
  5. When the Edit schema interface appears, you can edit the field name, data type, and description. You can also edit them manually using the Edit JSON button.
    • Click the Version dropdown menu in the Schema component to select the schema version you want to edit.
    • When you edit JSON, the name, type, typeValue, and description items must exist as follows:
    [
      {
        "name": "col_name",
        "type": "decimal",
        "typeValue": "(10,2)",
        "description": "catalog decimal"
      }
    ]
    
  6. Click [Save].
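Before saving an Edit JSON change, it can help to confirm that every entry contains the four required items (name, type, typeValue, and description). A small validation sketch; the helper name is illustrative:

```python
import json

# The four keys each schema entry must contain when editing via Edit JSON.
REQUIRED_KEYS = {"name", "type", "typeValue", "description"}

def validate_schema_json(text: str) -> bool:
    """Check that the text is a JSON list whose entries all have the required keys."""
    entries = json.loads(text)
    return isinstance(entries, list) and all(
        REQUIRED_KEYS <= entry.keys() for entry in entries
    )

schema = '[{"name": "col_name", "type": "decimal", "typeValue": "(10,2)", "description": "catalog decimal"}]'
print(validate_schema_json(schema))  # True
```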

Edit property information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the property information tab.
  5. You can edit inputFormat, outputFormat, and serializationLib to use in the table.
    • Note that if you edit it to a library that is not compatible with the data format, you will not be able to run queries in Data Query, Hive, or Spark.
  6. Click [Save].

Delete table

To delete the created table:

Caution
  • When you click Delete, all related meta information, such as the table's version information, tags, and properties, is deleted.
  • If the property information does not contain EXTERNAL=true (that is, if the table is a managed table), the actual data in Object Storage may be deleted.
  • You cannot recover deleted tables and data.
Note

If you delete an Iceberg type table, the actual Object Storage data is not deleted.

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the name of the table you want to delete to go to the Table details interface.
  4. Click [Delete].
  5. When the notification popup appears, read the cautions and click [Delete].