Utilize tools

Available in Classic and VPC

This section describes how to use the tools provided in the Explorer menu. CLOVA Studio currently provides a batch creation tool and a data expansion tool.

Instruction

The following models support the batch creation tool and the data expansion tool:

  • HCX-003, HCX-DASH-001 (including tuning models)

Data expansion

The data expansion tool expands an uploaded data sample to the number of rows you specify. When you upload a seed dataset, the language model analyzes the patterns in it and generates similar data until the requested number of rows is reached.

To use the data expansion tool, follow these steps:

  1. From the NAVER Cloud Platform console, click i_menu > Services > AI Services > CLOVA Studio in order.
  2. Click the my product menu > [Go to CLOVA Studio] button.
  3. Click the Explorer menu.
  4. On the Tool tab, click the [Start] button of Data expansion.
  5. Select the default model to use for expanding the data you upload.
  6. After selecting a model, enter the number of rows of data you want to get.
    • You can enter a minimum of 20 rows and a maximum of 50,000 rows (one row is one data item).
    • Be sure to enter a value greater than the number of rows in the seed dataset you upload.
  7. Upload a seed dataset.
    • The tool analyzes the patterns in the uploaded dataset and expands it into similar data.
    • Only the CSV and JSONL extensions are supported for seed data. The seed dataset must be encoded in UTF-8.
    • The seed dataset must contain at least 10 rows, with no more than 1,000 characters per row, including spaces.
    • If the dataset contains "#" symbols, performance may be degraded.
  8. Click the [Run] button.
    • The Task confirmation popup window appears.
  9. Click the [OK] button to start the task.
    • You are taken to the My task menu, where you can view and download your task history.
    • Click the [Stop] button to stop the task and return to the previous screen.
    • To view and download task results, see Manage tasks.
Note

If you upload a seed dataset of 10 rows and enter 20 as the desired number of rows, the result contains the 10 uploaded rows and 10 newly created rows.
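
The limits in the steps above and the row arithmetic in this note can be checked locally before uploading. The following is a minimal sketch, not part of CLOVA Studio: the function names are hypothetical, each row is treated as a single string, and `len()` is assumed to approximate how the service counts characters.

```python
from typing import List

MIN_SEED_ROWS = 10        # minimum seed rows stated above
MAX_ROW_CHARS = 1000      # per-row character limit, spaces included
MIN_TARGET_ROWS = 20      # minimum number of rows you can request
MAX_TARGET_ROWS = 50_000  # maximum number of rows you can request


def check_expansion_request(seed_rows: List[str], target_rows: int) -> List[str]:
    """Return a list of problems; an empty list means no documented limit is violated."""
    problems: List[str] = []
    if len(seed_rows) < MIN_SEED_ROWS:
        problems.append(f"seed has {len(seed_rows)} rows; at least {MIN_SEED_ROWS} are required")
    for i, row in enumerate(seed_rows, start=1):
        # Assumption: characters are counted the way len() counts them.
        if len(row) > MAX_ROW_CHARS:
            problems.append(f"row {i} has {len(row)} characters; the limit is {MAX_ROW_CHARS}")
        if "#" in row:
            problems.append(f"row {i} contains '#', which may degrade performance")
    if not MIN_TARGET_ROWS <= target_rows <= MAX_TARGET_ROWS:
        problems.append(f"target must be between {MIN_TARGET_ROWS} and {MAX_TARGET_ROWS} rows")
    if target_rows <= len(seed_rows):
        problems.append("target must be greater than the number of seed rows")
    return problems


def new_rows_created(seed_count: int, target_rows: int) -> int:
    """Rows the tool generates; the result also contains the uploaded seed rows."""
    return target_rows - seed_count


seed = [f"sample sentence {i}" for i in range(10)]
print(check_expansion_request(seed, 20))  # []
print(new_rows_created(len(seed), 20))    # 10
```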

Caution
  • Only 1 data expansion task can run at a time per account.
  • A data expansion task takes about 10 seconds on average to create one row of data. This time may vary depending on your system environment.
  • If you cancel a task after it has started, you are charged based on the progress of the task.
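
The 10-second average above gives a rough way to estimate how long a task will run. This is only a sketch: the helper name is hypothetical, and it assumes that only newly created rows take time, which this page does not state explicitly.

```python
from datetime import timedelta

SECONDS_PER_ROW = 10  # average stated above; actual time may vary


def estimate_duration(rows_to_create: int, seconds_per_row: float = SECONDS_PER_ROW) -> timedelta:
    """Estimate task time as (rows to create) x (average seconds per row)."""
    if rows_to_create < 0:
        raise ValueError("rows_to_create must not be negative")
    return timedelta(seconds=rows_to_create * seconds_per_row)


# Expanding a 100-row seed to 1,000 rows creates 900 new rows.
print(estimate_duration(1000 - 100))  # 2:30:00
```

Because the figure is an average, treat the estimate as an order of magnitude rather than a guarantee.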

Batch creation

Batch creation is a tool that processes a large number of uploaded inputs in a single batch and returns the results.

To use the batch creation tool, follow these steps:

  1. From the NAVER Cloud Platform console, click i_menu > Services > AI Services > CLOVA Studio in order.
  2. Click the my product menu > [Go to CLOVA Studio] button.
  3. Click the Explorer menu.
  4. Click the Tool tab menu, and then click the [Start] button of Batch creation.
  5. When the batch creation screen appears, select a model.
    • If you selected the default model, fill in the prompt template.
      • Filling in the prompt template is similar to writing a prompt in the playground.
      • Your prompt template should contain at least 3 sets of examples, with ### separating each set from the next.
      • Be sure to end the prompt template with {text}.
    • If you want to use a model that you trained yourself through tuning, see Tuning.
  6. Upload a seed dataset.
    • The tool analyzes the patterns in the uploaded dataset and expands it into similar data.
    • Only the CSV and JSONL extensions are supported for seed data. The seed dataset must be encoded in UTF-8.
    • The seed dataset must contain at least 10 rows of data, with no more than 1,000 characters per row, including spaces.
    • If you select a tuning model, the operation type of the seed dataset must match that of the tuning model.
    • If the dataset contains "#" symbols, performance may be degraded.
  7. Click the [Run] button.
    • The Task confirmation window appears.
  8. Click the [OK] button to start the task.
    • You are taken to the My task menu, where you can view and download your task history.
    • Click the [Stop] button to stop the task and return to the previous screen.
    • To view and download task results, see Manage tasks.
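
The prompt template rules in step 5 can be checked before you run a task. The sketch below is illustrative only: the function name is hypothetical, and it assumes that everything before the last ### is a set of examples and that the final part holds the {text} slot.

```python
from typing import List

SEPARATOR = "###"       # separator between sets of examples
PLACEHOLDER = "{text}"  # the template must end with this
MIN_EXAMPLES = 3        # minimum number of example sets


def check_prompt_template(template: str) -> List[str]:
    """Return a list of problems; an empty list means the rules above are met."""
    problems: List[str] = []
    stripped = template.rstrip()
    if not stripped.endswith(PLACEHOLDER):
        problems.append(f"template must end with {PLACEHOLDER}")
    # Assumption: the parts before the last separator are the example sets.
    parts = stripped.split(SEPARATOR)
    examples = [p for p in parts[:-1] if p.strip()]
    if len(examples) < MIN_EXAMPLES:
        problems.append(f"found {len(examples)} example sets; at least {MIN_EXAMPLES} are required")
    return problems


template = "\n###\n".join([
    "Situation: first snow\nPhrase: (example phrase 1)",
    "Situation: office party\nPhrase: (example phrase 2)",
    "Situation: family dinner\nPhrase: (example phrase 3)",
    "Situation: {text}",
])
print(check_prompt_template(template))  # []
```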
Caution
  • Only 1 batch creation task can run at a time per account.
  • A batch creation task takes about 10 seconds on average to create one row of data. This time may vary depending on your system environment.
  • Note that if you cancel a task after it has started, you may be charged depending on the progress of the task.
Note

Because batch data creation is based on a seed dataset, results can vary significantly depending on the data in the seed dataset. To predict the results, try creating and testing different prompts in the playground. Here's an example of a seed dataset and the resulting output.
clovastudio-explorer_augbatch_seed01_ko.png

clovastudio-explorer_augbatch_seed02_ko.png

Use cases

Expand CareCall conversation dataset

Data expansion is better suited to creative work that produces new sentences than to response-type work with a predetermined answer (completion). For example, tuning requires a dataset of at least 1,000 rows. The data expansion tool saves you from having to write thousands of rows by hand.

To expand a CareCall conversation dataset, follow these steps:

  1. Prepare a seed dataset for data expansion. In this example, 100 dialog turns are written as the CareCall seed dataset.
  2. Set the number of rows to 1,000, the minimum required for tuning, and run data expansion.
  3. The 100 dialog turns are expanded to 1,000 and returned as the result.
  4. Validate the data (check for errors) to obtain the 1,000 rows for tuning.
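
Step 4 does not specify what the validation involves. One possible first pass, sketched here under that assumption, drops blank and duplicate rows and reports how many rows are still missing before the 1,000-row minimum is met. The function name is hypothetical, and real validation would normally also include a manual review of the content.

```python
from typing import List, Tuple

MIN_TUNING_ROWS = 1000  # minimum number of rows for tuning stated above


def basic_validation(rows: List[str], minimum: int = MIN_TUNING_ROWS) -> Tuple[List[str], int]:
    """Drop blank and duplicate rows; return the kept rows and the shortfall."""
    seen = set()
    kept: List[str] = []
    for row in rows:
        text = row.strip()
        if not text or text in seen:
            continue  # skip empty rows and exact repeats
        seen.add(text)
        kept.append(text)
    shortfall = max(0, minimum - len(kept))
    return kept, shortfall


kept, shortfall = basic_validation(["a", "b", "a", " ", "c "], minimum=5)
print(kept, shortfall)  # ['a', 'b', 'c'] 2
```

A non-zero shortfall means the dataset needs more rows before it can be used for tuning.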

Performance test with batch creation

When you test the performance of a tuned model, you normally need to enter inputs (text) one at a time, each producing a single output (completion). With the batch creation tool, you can submit multiple inputs at once and get all the results together. This section describes how to use the batch creation tool to test the performance of a tuned model.

To test performance after training on the CareCall conversation dataset, follow these steps:

  1. Train a tuning model on the 1,000 rows produced by data expansion.
  2. In batch creation, select the trained tuning model as the model.
  3. Prepare a seed dataset that contains only input (text) values for performance testing.
  4. Upload the seed dataset and run batch creation.
    • For each input (text), a suitable output (completion) is generated and returned in the result.
  5. Check the performance of the tuning model by running a validation test to see if it produces the desired results.
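
An input-only seed dataset for step 3 can be produced with a few lines of code. This is a sketch under stated assumptions: the function name is hypothetical, and this page does not specify whether a header row is expected or what the column should be called, so the `Text` header used here is a placeholder to adjust to the format the console accepts.

```python
import csv
import io
from typing import Iterable


def inputs_to_csv(inputs: Iterable[str], column: str = "Text") -> bytes:
    """Serialize inputs as a one-column CSV and encode it as UTF-8."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow([column])  # assumption: header name is a placeholder
    for text in inputs:
        writer.writerow([text])  # csv handles quoting of commas and quotes
    return buffer.getvalue().encode("utf-8")


data = inputs_to_csv(["How did you sleep last night?", "Did you eat, and what?"])
with open("performance_test_seed.csv", "wb") as f:
    f.write(data)
```

Writing the bytes in binary mode keeps the UTF-8 encoding required for seed datasets regardless of the platform default.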

Expand your data with the batch creation tool

The batch creation tool is suitable for tasks that generate different outputs (completions) from repeated inputs (text). You can expand your data by generating multiple outputs from a small number of inputs. This section describes how to expand your data with the batch creation tool.

To create contextual Christmas phrases, follow these steps:

  1. On the batch creation screen, fill in a prompt template for creating contextual Christmas phrases.
    clovastudio-explorer02_ex03-01.png
  2. To build the seed dataset, write 5 situations (Input_text) and copy each situation 20 times, for a total of 100 seed rows.
    clovastudio-explorer_dataset2_ko.png
  3. Check the results.
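
The copy-and-paste in step 2 can also be scripted. The following is a minimal sketch; the function name and the five situations are hypothetical placeholders, not the ones shown in the screenshots.

```python
from typing import List


def repeat_inputs(situations: List[str], copies: int) -> List[str]:
    """Repeat each situation `copies` times, keeping the copies adjacent."""
    return [s for s in situations for _ in range(copies)]


# Hypothetical situations used only for illustration.
situations = [
    "a greeting for coworkers",
    "a message to parents",
    "a card for a close friend",
    "a notice for customers",
    "a note for a teacher",
]
seed_rows = repeat_inputs(situations, 20)
print(len(seed_rows))  # 100
```

Each repeated input can yield a different output, which is how 5 situations produce 100 varied phrases.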