Table


Available in VPC

A table is a metadata definition containing the details and schema of the data. You can create a table through a scanner or by defining your own schema. In the Table menu, you can create and manage tables and view the collected metadata.

Table list interface

A basic description of the Table menu for using Data Catalog is as follows:
data_catalog_table_ko_iceberg.png

Component Description
① Menu name Current menu name and the number of tables being viewed.
② Basic features Features displayed when you enter the Table menu for the first time.
  • [Create table]: Create a table (see Create table).
  • [Learn more]: Go to the Data Catalog overview page.
  • [Refresh]: Reload the current table.
③ Search bar You can search by database name, table name, location, table type, data format, and tag, and you can also sort by order.
④ Table list Displays the list of tables being viewed.
⑤ Table name Click to go to the Table details interface.
⑥ Database name Click to go to the Database details interface.
⑦ Location Click to go to the location of the corresponding file in Object Storage.

Table details interface

A basic description of the Table details interface is as follows:
data_catalog_table_ko_iceberg.png

Component Description
① Table name Name of the selected table.
② Basic information component Displays the name of the database where the table belongs, table description, table location, update date and time, name of the created scanner, creation date and time, table type, and data format information.
③ Details tab component Consists of the table's schema, schema version, partition, tag, property information, and analytics tabs; you can view details for each item. See Search for tables and view information.
④ Delete button Delete a table.
⑤ Edit basic information button Edit the basic information of the table.
⑥ Edit schema button Edit the schema information. You can edit the tag information on the Tag tab.
⑦ View data button Go to the Data Query service and view the data in the table.

Create table

You can create a table in either of two ways: by defining the schema manually or through a scanner.

Create table with manual schema definition

You can create tables by setting up your own database and schema.

To create a table with manual schema definition:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click [Create table].
  4. Click Create table with manual schema definition, and then click [Next].
  5. Enter basic information.
    • Database: Click the dropdown menu to select a database to connect to the table.
    • Table name: Enter a table name.
    • Location: Enter the location where the table's data exists.
    • Description: Enter a table description.
    • Table type
      • Catalog Default: The default Hive table type provided by Data Catalog.
      • Apache Iceberg: An open table format for large analytic datasets. It supports ACID transactions, schema evolution, and time travel queries, and enables safe, concurrent work in Spark, Trino, and Hive.
  6. Select a data format.
    • If you select the Apache Iceberg table type, you do not select a data format.
    • When you select CSV, you can select or enter the delimiter, data recognition symbol, and character to delete. Also, you can enter the number of header lines to exclude.
    • When you select XML, you can enter Row Tag.
  7. Click [Add] and enter the schema information to add a user-defined schema.
    • For more information about Data type, see Schema data type.
    • To delete an added schema, select its checkbox and then click [Delete].
    • If you do not add a user-defined schema, a schema with the field name "default" will be added automatically.
    • Spaces are allowed in field names.
  8. If you need to enter the partition key, click the Partition component and add a partition key.
    • To add a partition key, click [Add] and enter the partition key name in the input field.
    • To delete a partition key, select its checkbox and then click [Delete].
    • Spaces are allowed in partition key names.
    • You do not need to enter a partition for the Apache Iceberg table type.
  9. If a tag is necessary, click the Set tag component to add tags.
    • To add a tag, click [Add] and enter the tag information in the input field.
    • To delete a tag, select its checkbox and then click [Delete].
    • Click [Load tag template] to display the popup for loading tag templates.
      • Select a tag template and click [Add] to add the tags from that template.
      • For more information about tag templates, see Tag template.
  10. Click [Create].
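As a point of reference for step 6, the CSV options (delimiter, data recognition symbol, character to delete, and number of header lines to exclude) correspond to standard CSV parsing behavior. A minimal Python sketch; the sample data and option values here are illustrative, not values from the console:

```python
import csv
import io

# Illustrative CSV settings mirroring the console's CSV options:
# delimiter, quote character ("data recognition symbol"),
# and the number of header lines to exclude.
raw = 'col_a;col_b\n"hello;world";42\n'
skip_header_lines = 1

reader = csv.reader(io.StringIO(raw), delimiter=';', quotechar='"')
rows = list(reader)[skip_header_lines:]  # drop the header line(s)
print(rows)  # the quoted field keeps its embedded delimiter
```

A field wrapped in the data recognition symbol can safely contain the delimiter itself, which is why the quoted value above survives as a single field.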

Schema data type

The data types in the schema that can be defined manually and a description of each type are as follows:

Data type Description Supported by Catalog Default Supported by Apache Iceberg
tinyint Integer data (1 byte). Y N
smallint Integer data (2 bytes). Y N
int Integer data (4 bytes). Y Y
bigint Integer data (8 bytes). Y N
long Integer data (8 bytes). N Y
float Floating decimal data (4 bytes). Y Y
double Floating decimal data (8 bytes). Y Y
decimal Fixed decimal data.
  • Enter the length (1 to 38 bytes) in the input field.
Y Y
string String data. Y Y
char Fixed-length character type data.
  • Enter the length (1 to 255 bytes) in the input field.
Y N
varchar Variable length character type data.
  • Enter the length (1 to 65,535 bytes) in the input field.
Y N
boolean Data with true or false values. Y Y
binary Binary data in char format. Y Y
timestamp Date and time representation data (timestamp). Y Y
time Time representation data. N Y
datetime Date and time representation data (YYYY-MM-DD HH:MM:SS). Y N
date Date representation data (YYYY-MM-DD). Y Y
fixed Fixed-length byte array. N Y
uuid Uniqueness-guaranteed ID (Universally Unique IDentifier). N Y
list Collection of data of the same type.
  • Click [Details] and enter detailed settings.
N Y
array Collection of data of the same type.
  • Click [Details] and enter detailed settings.
Y N
map Data made of key-value pairs.
  • Click [Details] and enter detailed settings.
Y Y
struct Data including various types of data and the related schema.
  • Click [Details] and enter detailed settings.
Y Y
uniontype Type for storing various structured data types.
  • Click [Details] and enter detailed settings.
Y N

Examples of entering detailed settings for each data type are as follows:

  • Example: Detailed settings of array type
    ARRAY <
       STRUCT <
          place: STRING,
          start_year: INT
       >
    >
    
  • Example: Detailed settings of map type
    MAP <
       STRING,
       ARRAY<STRING>
    >
    
  • Example: Detailed settings of struct type
    STRUCT <
       place: STRING,
       start_year: INT
    >
    
  • Example: Detailed settings of uniontype type
    UNIONTYPE <
       INT,
       DOUBLE,
       ARRAY<STRING>,
       STRUCT<a:INT,b:STRING>
    >
    
  • Example: Detailed settings of list type
    LIST <
       STRUCT <
          place: STRING,
          start_year: INT
       >
    >
    
Note
  • When you create an Iceberg table from the Data Catalog console, the table is stored in Object Storage with the field types you selected. However, the console displays the information stored in the Metastore, so data types not supported by Hive are displayed as converted types.
    • Converted types: list -> array, long -> bigint, time -> string, fixed -> binary, and uuid -> string.
  • Partition update
    • When you create a table with a partition key, the partition information does not exist yet, so you must update it yourself. You can do so as follows:
      • Data Query: Run the call data_catalog.system.sync_partition_metadata('{database name}','{table name}','ADD') statement.
      • Cloud Hadoop Hive: Run msck repair table {table name}.
    • This feature is coming soon to Data Catalog, with direct use planned for the second half of 2025.
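The type conversion and partition-sync commands in the note above can be captured as a lookup table and statement templates. This is a sketch; the function and variable names are illustrative, and only the mapping and the two statements come from this document:

```python
# Iceberg field types that Hive Metastore does not support are shown
# as converted types in the Data Catalog console (per the note above).
ICEBERG_TO_HIVE = {
    "list": "array",
    "long": "bigint",
    "time": "string",
    "fixed": "binary",
    "uuid": "string",
}

def partition_sync_statements(database: str, table: str) -> dict:
    """Render the partition-update statements quoted in the note."""
    return {
        "data_query": (
            "call data_catalog.system.sync_partition_metadata("
            f"'{database}','{table}','ADD')"
        ),
        "cloud_hadoop_hive": f"msck repair table {table}",
    }

stmts = partition_sync_statements("my_db", "my_table")
print(ICEBERG_TO_HIVE["long"])     # bigint
print(stmts["cloud_hadoop_hive"])  # msck repair table my_table
```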

Create table through scanner

To create a table by automatically defining the schema through the scanner:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click [Create table].
  4. Click Create tables through scanner, and then click [Next].
    • Go to the Scanner creation interface.
  5. Tables are created automatically when you create and run the scanner.
    • The table name is automatically set based on the name of the source data.
    • For more information about how to create and run a scanner, see Scanner.
Caution
  • Data files are supported only in the UTF-8 encoding format.
  • If you use other encoding formats, data scanning and querying may not work properly.

Search for tables and view information

To search for the created tables and view the information:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Enter the search conditions you want and click i-datacatalog-search to search for the table.
  4. Click the table to view the information.
    • Database: Name of the database where the table belongs.
    • Table: Table name.
    • Location: The Object Storage location where the data of the table exists.
    • Table type: Type of the table (Catalog Default and Apache Iceberg).
    • Data format: Format of the scanned data (CSV, XML, JSON, Parquet, ORC, AVRO, MySQL, MongoDB, MSSQL, and PostgreSQL).
    • Creation date and time: The date and time when a table was first created.
    • Update date and time: The most recent date and time when you edited a table's information.
    • [Schema]: View the schema registered to the table.
      • For more information about Data type, see Schema data type.
      • You can edit the schema by clicking the Edit button. You cannot edit the schema of views or Apache Iceberg tables.
    • [Schema version]: Click to view the schema version list, and then click a version to view the schema of that version.
    • [Partition]: Click to check the partition key and value registered to the table.
      • Partition update feature
        • The partition value update feature is provided only for tables whose data format is CSV, XML, JSON, Parquet, AVRO, or ORC.
        • Update is available only for the Hive partition type, not for the Directory partition type. Also, if you run All synchronization or Delete-only synchronization, partition values may be deleted.
        • Only partition values are updated; partition keys are not added. To add a partition key, you must scan the table again.
        • Options: All synchronization (update all added/deleted partition values), Add-only synchronization (update added partition values only), and Delete-only synchronization (update deleted partition values only).
    • [Tag]: Click to view the tags registered to the table.
      • Click the Settings button to add or delete tags.
    • [Property information]: Click to view the property information about table and source data.
    • [Analytics]: Click to view the analytics information in the field/partition unit.
      • If you subscribe to Data Catalog, you can run the analytics feature to extract and view analytics data, such as minimum, maximum, and average values, per field.
      • Supported data formats: Parquet, AVRO, ORC, CSV, and JSON.
      • The data you can view is from the most recent successful run.
      • If you extract all analytics and then extract analytics for specific columns, those columns are updated, but the other columns remain from the previous run history.
        Caution
        • Unique value estimates the approximate number of data points within an average error range of 5%.
        • For CSV files, you cannot estimate the number of null values or true/false entries.
    • [Optimization]: Perform Iceberg file optimization (displayed only for the Iceberg table format).
      • Merge files: A feature that combines data files that have been divided into multiple files for more efficient file management and improved performance. Only files smaller than the merge threshold are selected for merging. (The default merge threshold is 100 MB, and its unit is MB.)
        Caution
        • If the merge threshold is too large, files are merged every time the merge operation runs, causing snapshots to be created even when no file changes have occurred.
        • It is recommended to set the merge threshold to an appropriate value in proportion to the data file size.
      • Manage snapshots: Delete snapshots you don't need to use. Snapshots within the maximum retention period are retained, while those beyond the retention period are deleted. (The minimum value for the maximum retention period is 7 days.)
      • Manage orphan files: Organize unused files, such as merged files or incorrectly written files. Orphan files within the maximum retention period are retained, while those beyond the retention period are deleted. (The minimum value for the maximum retention period is 7 days.)
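The merge-threshold rule described above (only files smaller than the threshold, 100 MB by default, are merged) can be illustrated with a short sketch; the file sizes and helper function are hypothetical:

```python
DEFAULT_MERGE_THRESHOLD_MB = 100  # default merge threshold from the console

def files_to_merge(file_sizes_mb, threshold_mb=DEFAULT_MERGE_THRESHOLD_MB):
    """Select data files smaller than the merge threshold (sketch)."""
    return [size for size in file_sizes_mb if size < threshold_mb]

# Hypothetical Iceberg data file sizes in MB:
sizes = [12, 250, 99, 100, 3]
print(files_to_merge(sizes))  # 100 is not smaller than the threshold, so it stays
```

This also shows why an oversized threshold is risky: with a very large threshold, nearly every file qualifies on every run, so merges (and their snapshots) happen even when nothing meaningful changed.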

Property information

If you click the [Property information] tab in the table details component, you can view the property information of the table and source data. The information items and description for each item are as follows:

Property key Description
EXTERNAL External storage
clusterNo Cluster number of the scanned Cloud database product
connectionId Scanner connection ID that created a table
connectionName Connection name used to scan the data
created_time Unix time display of the table creation date and time
dataFormat Format of the data source
dataType Type of the data source
delimiter Delimiter if the source data is a CSV file
inputFormat Format for reading files from Object Storage
isDirectory TRUE if the scan target is a directory
last_modified_time Unix time display of the table update date and time
numFiles Total number of files scanned when the scan target is a directory
objectstorageContentLength Sum of ContentLength for files within the scanned Object Storage directory
objectstorageContentType Common ContentType for the scanned Object Storage directory
objectstorageLastModified Edited time of the most recently edited file in the scanned Object Storage directory
outputFormat Format for writing files to Object Storage
rowTag XML tag defining a row
scannerId Scanner ID that created a table
scannerName Scanner name that created a table
serializationLib Serializer and Deserializer Library
serde.separatorChar Delimiter to determine a schema of the data
serde.quoteChar Symbol to recognize a string as data
serde.escapeChar Character to delete the character included in the string value recognized as data
skip.header.line.count Number of header lines to exclude
totalSize Total amount of data scanned when the scan target is a directory
transient_lastDdlTime Unix time display of the table DDL's last change date and time
mysqlCollation String sort settings of a MySQL table
mysqlDataSize Data size of a MySQL table
mysqlIndexSize Index size of a MySQL table
mysqlIndexes Number of indexes of a MySQL table
mysqlRows Number of saved rows (records) of a MySQL table
mysqlTableSize Total size of a MySQL table
mssqlCollation String sort settings of a MSSQL table
mssqlDataSize Data size of a MSSQL table
mssqlIndexSize Index size of a MSSQL table
mssqlIndexes Number of indexes of a MSSQL table
mssqlRows Number of saved rows (records) of a MSSQL table
mssqlTableSize Total size of a MSSQL table
postgresqlCollation String sort settings of a PostgreSQL table
postgresqlDataSize Data size of a PostgreSQL table
postgresqlIndexSize Index size of a PostgreSQL table
postgresqlIndexes Number of indexes of a PostgreSQL table
postgresqlRows Number of saved rows (records) of a PostgreSQL table
postgresqlTableSize Total size of a PostgreSQL table
mongodbAvgObjSize Average document size of a MongoDB collection
mongodbFreeStorageSize Size of available storage space in a MongoDB database
mongodbIndexSize Index size of a MongoDB collection
mongodbIndexes Number of indexes of a MongoDB collection
mongodbRowCount Number of saved documents (records) of a MongoDB collection
mongodbSize Size of a MongoDB database
mongodbStorageSize Storage size of a MongoDB database
mongodbTotalSize Total size of a MongoDB database
compressionType Zipped file extension when the scanned file is a zipped file
metadata_location Directory of the metadata file added when using an Iceberg table
Note

For more information about other property information of an Iceberg table beyond the list above, see the Apache Iceberg documentation.
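Property values such as created_time, last_modified_time, and transient_lastDdlTime are displayed as Unix time. To read them, convert seconds since the epoch to a date and time; for example, in Python:

```python
from datetime import datetime, timezone

def unix_to_utc(ts: int) -> str:
    """Convert a Unix-time property value (seconds) to a UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Example: the epoch itself.
print(unix_to_utc(0))  # 1970-01-01 00:00:00
```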

Edit table

To edit the information of the created table or to select the schema version:

Note

The table name and the database to which the table belongs cannot be edited.

Caution

If the table is an Apache Iceberg table or a view, you cannot edit the schema.

Edit basic information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the basic information component.
  5. Edit the information of the table in the Basic information edit popup.
    • You can edit the location of the source data, table description, and source data format.
  6. Click [Save].

Edit schema information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the schema tab.
  5. When the Edit schema interface appears, you can edit the field name, data type, and description. You can also edit them manually using the Edit JSON button.
    • Click the Version dropdown menu in the Schema component to select the schema version you want to edit.
    • When you edit JSON, the name, type, typeValue, and description items must exist as follows:
    [
      {
        "name": "col_name",
        "type": "decimal",
        "typeValue": "(10,2)",
        "description": "catalog decimal"
      }
    ]
    
  6. Click [Save].
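Before saving an Edit JSON change, it can help to confirm that every entry contains the four required items (name, type, typeValue, and description). A small validation sketch; the helper name is illustrative:

```python
import json

# The four keys each schema entry must contain when editing via Edit JSON.
REQUIRED_KEYS = {"name", "type", "typeValue", "description"}

def validate_schema_json(text: str) -> bool:
    """Check that the text is a JSON list whose entries all have the required keys."""
    entries = json.loads(text)
    return isinstance(entries, list) and all(
        REQUIRED_KEYS <= entry.keys() for entry in entries
    )

schema = '[{"name": "col_name", "type": "decimal", "typeValue": "(10,2)", "description": "catalog decimal"}]'
print(validate_schema_json(schema))  # True
```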

Edit property information

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the table name to go to the Table details interface.
  4. Click [Edit] on the property information tab.
  5. You can edit inputFormat, outputFormat, and serializationLib to use in the table.
    • Note that if you edit it to a library that is not compatible with the data format, you will not be able to run queries in Data Query, Hive, or Spark.
  6. Click [Save].

Delete table

To delete the created table:

Caution
  • When you click Delete, all related meta information, such as the table's version information, tags, and properties, is deleted.
  • If the property information does not contain EXTERNAL=true (that is, if the table is a managed table), the actual data in Object Storage may be deleted.
  • You cannot recover deleted tables and data.
Note

If you delete an Iceberg type table, the actual Object Storage data is not deleted.

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to i_menu > Services > Big Data & Analytics > Data Catalog.
  2. Click the Table menu.
  3. Click the name of the table you want to delete to go to the Table details interface.
  4. Click [Delete].
  5. When the notification popup appears, read the cautions and click [Delete].