Data Manager

Available in VPC

This guide describes the Data Manager interface. Data Manager allows you to view the list and details of datasets within your Workspace.

Note
  • Datasets uploaded to Data Manager can be referenced across different projects within the Workspace.
  • In the Data Manager interface, you can only view the dataset list and dataset details.
  • To perform tasks such as uploading or deleting datasets, or creating tags and branches, use the ML expert Platform SDK.

View Data Manager list

The list of your datasets includes the following information:

  • Dataset Title: Name set when uploading the dataset.
  • Creation date: Initial creation date and time.
  • Operation: Click [Dataset detail] to go to the details interface.

View Data Manager details

You can view details for the selected dataset. The information is organized into tabs.

Overview

View metadata for the selected dataset.

Files and Versions

You can view the file list for each directory in the selected dataset.

Use Data Manager SDKs

Data Manager SDKs support the Python-based Hugging Face Dataset interface.
You can upload and download datasets using the SDK as follows:

Install the SDK

You can install the SDK by running the following command:

pip install "ncloud-mlx[data-manager]" # double quotes are required

Prerequisites

To use the SDK, create an API Key and log in with it. Specify the MLX endpoint either through the MLX_ENDPOINT_URL environment variable or as the second argument to login.

from mlx.sdk.data import login

login("{ API Key }") # MLXP API Key
login("{ API Key }", "{MLX endpoint}") # Specify the endpoint URL at login instead of using an environment variable
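
To keep the key out of source files, the login call can take its arguments from the environment. The following is a minimal sketch, not part of the SDK: the variable name MLX_API_KEY and both helper functions are hypothetical names chosen for this example. Only MLX_ENDPOINT_URL and login come from this guide.

```python
import os


def resolve_credentials(env=None):
    """Return (api_key, endpoint) read from an environment-style mapping."""
    env = os.environ if env is None else env
    # MLX_API_KEY is a hypothetical variable name used only in this sketch.
    api_key = env.get("MLX_API_KEY")
    if not api_key:
        raise RuntimeError("Set MLX_API_KEY before logging in")
    # MLX_ENDPOINT_URL is the variable described above; it may be unset.
    endpoint = env.get("MLX_ENDPOINT_URL")
    return api_key, endpoint


def login_from_env(env=None):
    """Log in with credentials taken from the environment."""
    # Imported lazily so resolve_credentials() can be used without the SDK installed.
    from mlx.sdk.data import login

    api_key, endpoint = resolve_credentials(env)
    if endpoint:
        login(api_key, endpoint)
    else:
        login(api_key)
```

Calling login_from_env() at the start of a training script then replaces the hard-coded login line.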

Read datasets

To use a dataset in training code, load it as a Dataset object. For details, see the official Hugging Face Python SDK documentation.

To load a local dataset:

from mlx.sdk.data import load_dataset
ds = load_dataset(
    "{ path to local data }" #  local data path e.g. "path/to/folder/*"
)

To load a dataset managed in Data Manager:

from mlx.sdk.data import load_dataset
ds = load_dataset(
    "{ Workspace name }/{ dataset name }" # Dataset location e.g. "workspaceA/datasetA"
)
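
Because a dataset location is always written as "{ Workspace name }/{ dataset name }", a small helper can build and validate that string before it is passed to load_dataset. This is an illustrative sketch; dataset_repo_id is a hypothetical helper, not an SDK function, and the check it applies (no empty parts, no extra slashes) is an assumption about what a valid location looks like.

```python
def dataset_repo_id(workspace: str, dataset: str) -> str:
    """Build a "workspace/dataset" location string from its two parts."""
    workspace, dataset = workspace.strip(), dataset.strip()
    if not workspace or not dataset:
        raise ValueError("workspace and dataset must both be non-empty")
    if "/" in workspace or "/" in dataset:
        raise ValueError("workspace and dataset must not contain '/'")
    return f"{workspace}/{dataset}"
```

For example, load_dataset(dataset_repo_id("workspaceA", "datasetA")) loads the same dataset as the call above.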

Upload a dataset

You can upload datasets using the same methods as the Hugging Face Dataset interface. For details, see the official Hugging Face Python SDK documentation.

Note

Running create_repo requires Workspace Admin privileges.

Common upload methods are as follows:

push_to_hub

...
ds.push_to_hub(
    repo_id="{ Workspace name }/{ dataset name }"
)
...

upload_file

from huggingface_hub import create_repo, upload_file

path = "{ Workspace name }/{ dataset name }" # Location of the dataset to upload
create_repo(repo_id=path, repo_type="dataset")
upload_file(
    repo_id=path,
    path_or_fileobj="{ local file path }", # Path to the local file to upload
    path_in_repo="path/to/folder/foo.csv", # Remote file path in the dataset
    repo_type="dataset",
)

upload_folder

from huggingface_hub import create_repo, upload_folder

path = "{ Workspace name }/{ dataset name }" # Location of the dataset to upload
create_repo(repo_id=path, repo_type="dataset")
upload_folder(
    repo_id=path,
    folder_path="{ local directory path }", # Path to the local directory to upload
    path_in_repo="path/to/folder", # Remote directory path in the dataset
    repo_type="dataset",
)

Download a dataset

To download a dataset to a local disk:

from huggingface_hub import snapshot_download

path = "{ Workspace name }/{ dataset name }" # Location of the dataset to download
snapshot_download(
    repo_id=path,
    repo_type="dataset",
    local_dir="path/to/folder", # Path to the directory to download to
    local_dir_use_symlinks="auto" # Whether to use symlinks with cache_dir (deprecated and ignored in recent huggingface_hub releases)
)

Create tags and branches

When you create a dataset, a unique commit ID is assigned. You can use this commit ID to read the dataset at a specific revision or to attach a tag that records additional information.

To create a tag:

from huggingface_hub import create_tag

path = "{ Workspace name }/{ dataset name }" 
create_tag(
    repo_id=path,
    repo_type="dataset",
    tag="{ tag name to create}",
    revision="{ revision }",  # Base version. The default is main
    tag_message="{ tag message }"
)

A tag's metadata, such as its message, is immutable. To change it, delete the tag and create it again. To delete a tag:

from huggingface_hub import delete_tag

path = "{ Workspace name }/{ dataset name }" 
delete_tag(
    repo_id=path,
    repo_type="dataset",
    tag="{ tag name to delete}"
)
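
Since a tag can only be changed by deleting and recreating it, the two calls can be wrapped together. This is a minimal sketch; replace_tag and its hub parameter are hypothetical and exist only so the helper can be exercised without a live endpoint. The delete_tag and create_tag calls are the same ones shown in this section.

```python
def replace_tag(repo_id, tag, tag_message, revision="main", hub=None):
    """Delete `tag` and recreate it at `revision` with a new message."""
    if hub is None:
        # Imported lazily; any object exposing delete_tag and create_tag also works.
        import huggingface_hub as hub

    hub.delete_tag(repo_id=repo_id, repo_type="dataset", tag=tag)
    hub.create_tag(
        repo_id=repo_id,
        repo_type="dataset",
        tag=tag,
        revision=revision,
        tag_message=tag_message,
    )
```

The tag does not exist between the two calls, and it is recreated at whatever revision is passed. If the original tag pointed to a commit other than main, pass that commit ID explicitly so the tag does not move.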

To create a branch:

from huggingface_hub import create_branch

path = "{ Workspace name }/{ dataset name }" 
create_branch(
    repo_id=path,
    repo_type="dataset",
    branch="{ branch name to create}",
    revision="{ revision }"
)
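
A commit ID, tag, or branch created this way can then be used to read the dataset at that revision. The following sketch uses the revision parameter of snapshot_download from huggingface_hub; download_revision and its download parameter are hypothetical names, and it is an assumption that Data Manager resolves revisions the same way the Hugging Face Hub does.

```python
def download_revision(repo_id, local_dir, revision, download=None):
    """Download the dataset as it exists at a given commit ID, tag, or branch."""
    if download is None:
        # Imported lazily; a stand-in callable can be passed for testing.
        from huggingface_hub import snapshot_download as download

    return download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        revision=revision,  # commit ID, tag name, or branch name
    )
```

Pinning a training run to a tag or commit ID rather than main keeps its input unchanged when the dataset is updated later.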

Delete a dataset

Caution

Use caution when deleting a dataset. This action cannot be undone.

from huggingface_hub import delete_repo

path = "{ Workspace name }/{ dataset name }" # dataset to delete
delete_repo(
    repo_id=path,
    repo_type="dataset"
)
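
Because deletion cannot be undone, scripts may want an explicit confirmation step before calling delete_repo. This is one possible guard, not an SDK feature: delete_dataset and its confirm and delete parameters are hypothetical names used for illustration.

```python
def delete_dataset(repo_id, confirm, delete=None):
    """Delete a dataset only if `confirm` repeats its repo_id exactly."""
    if confirm != repo_id:
        raise ValueError("Confirmation does not match repo_id; nothing was deleted")
    if delete is None:
        # Imported lazily; a stand-in callable can be passed for testing.
        from huggingface_hub import delete_repo as delete

    delete(repo_id=repo_id, repo_type="dataset")
```

A caller would pass the dataset location twice, for example once from configuration and once typed by the operator, so that a mistyped or stale location does not trigger a deletion.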