Using tools

release/20240425

Available in Classic and VPC

Using tools describes how to use the tools provided in the Explorer menu. CLOVA Studio JP currently provides two tools: a batch creation tool (Create in bulk) and a data expansion tool (Expand data).

Create in bulk
Expand data
Expand data & tips for creating in bulk
Create in bulk: Tips for creating prompt templates

Create in bulk

Create in bulk (batch creation) is a tool for processing large amounts of user-uploaded data in a single batch task.

How to use

The following describes how to create in bulk.

In the NAVER Cloud Platform console, click the Services > AI Services > CLOVA Studio JP menus, in that order.
Click the My Product menu.
Click the [Go to CLOVA Studio JP] button.
Click the Explorer menu.
- You can also click the [Browse Explorer] button in the function introduction area on the CLOVA Studio JP home screen.
Go to the Tool tab menu, and then click the [Start] button for Create in bulk.
Select a model engine, the base model used to process the uploaded data.
- If you select a base model provided by CLOVA Studio JP:
  - You must write a prompt template from which the model learns the pattern.
  - Write the prompt template the same way you would write a prompt in the playground (see the playground creation tips).
  - The prompt template must contain at least 3 example sets, with ### between each set.
  - The prompt template must end with {text}.
- If you select a model created through tuning:
  - See Tuning for how to create a tuned model.
Upload the seed dataset to process.
- The patterns in the uploaded dataset are analyzed and expanded into data of a similar type.
- Seed datasets must be .csv or .jsonl files.
- Upload at least 10 sets of seed data; each set must be shorter than 1,000 characters.
- If you select a tuned model as the model engine, the task type of the seed dataset must match the task type used for tuning.
Click the [Run] button.
- [Run] button: opens a pop-up window asking you to confirm the task.
- [OK] button: starts the task and takes you to the My Task tab menu, where you can check the task details and download the results.
- [Stop] button: cancels the operation and returns you to the previous screen.
- Only one batch creation task can run at a time per account.
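The prompt template rules in the model engine step (at least 3 example sets separated by ###, ending with {text}) can be checked locally before a task is started. The sketch below is hypothetical: `check_template`, `render`, and the sample template are illustrations, not part of CLOVA Studio JP, and it assumes the input slot is the segment after the last ###.

```python
# A minimal sketch (not part of CLOVA Studio JP): checks a prompt template
# against the documented rules for Create in bulk with a base model.
# Assumption: the segment after the last "###" holds the {text} placeholder.

SEPARATOR = "###"
PLACEHOLDER = "{text}"


def check_template(template: str) -> list:
    """Return a list of rule violations; an empty list means the checks pass."""
    problems = []
    stripped = template.rstrip()
    if not stripped.endswith(PLACEHOLDER):
        problems.append("template must end with {text}")
    # Every non-empty segment before the last separator counts as one example set.
    segments = stripped.split(SEPARATOR)
    examples = [seg for seg in segments[:-1] if seg.strip()]
    if len(examples) < 3:
        problems.append(f"need at least 3 example sets, found {len(examples)}")
    return problems


def render(template: str, text: str) -> str:
    """Replace the trailing {text} placeholder with one uploaded input."""
    stripped = template.rstrip()
    if not stripped.endswith(PLACEHOLDER):
        raise ValueError("template must end with {text}")
    return stripped[: -len(PLACEHOLDER)] + text


# Hypothetical sentiment-labeling template with 3 example sets.
TEMPLATE = """Review: The room was clean and quiet.
Sentiment: positive
###
Review: The staff ignored our requests.
Sentiment: negative
###
Review: Breakfast was fresh and varied.
Sentiment: positive
###
Review: {text}"""

print(check_template(TEMPLATE))  # []
print(render(TEMPLATE, "The bed was too hard."))
```

Only the trailing placeholder is substituted, so braces elsewhere in the examples are left untouched (unlike `str.format`, which would fail on them).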

Caution

A batch task takes about 10 seconds per generated data item; the actual time may vary depending on the system environment.
Once a task starts, it cannot be stopped, and you will be charged for using the service, so proceed with caution.
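Because a task cannot be stopped once it starts, it can help to estimate its duration first by multiplying the number of items by the documented rate. The helper below is a hypothetical sketch; the 10-second figure is only the approximate rate stated above.

```python
from datetime import timedelta

# Documented approximate rate: about 10 seconds per generated data item.
# Actual time varies with the system environment, so treat this as a rough guide.
SECONDS_PER_ITEM = 10


def estimate_duration(num_items: int, seconds_per_item: float = SECONDS_PER_ITEM) -> timedelta:
    """Rough wall-clock estimate for a task that generates num_items items."""
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    return timedelta(seconds=num_items * seconds_per_item)


print(estimate_duration(100))   # 0:16:40
print(estimate_duration(1000))  # 2:46:40
```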

View task result and download

The following describes how to check the results of batch creation and download the results.

In the NAVER Cloud Platform console, click the Services > AI Services > CLOVA Studio JP menus, in that order.
Click the My Product menu.
Click the [Go to CLOVA Studio JP] button.
Click the User Account menu in the upper right corner of the screen.
Go to the My Task tab menu, then click the Explorer tab.
Click the [Create in bulk] button.
Check the task result and download it if necessary.
- Download: the task is complete, and the result can be downloaded.
- Requesting: the task is in progress.
- Stop: the task has been suspended.
- Period Expired: the download period has expired (results can be downloaded for 7 days after the task is completed).
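The 7-day download period can be tracked from the task completion time. This is a hypothetical sketch; in particular, it assumes the period is exactly 7 × 24 hours from completion, which the guide does not specify.

```python
from datetime import datetime, timedelta

# Results stay downloadable for 7 days after the task completes.
# Assumption: the window is measured as exactly 7 x 24 hours from completion.
DOWNLOAD_WINDOW = timedelta(days=7)


def download_deadline(completed_at: datetime) -> datetime:
    """Moment at which the download period ends (under the assumption above)."""
    return completed_at + DOWNLOAD_WINDOW


def is_period_expired(completed_at: datetime, now: datetime) -> bool:
    """True when the status would read Period Expired."""
    return now >= download_deadline(completed_at)


done = datetime(2024, 4, 25, 10, 0)
print(download_deadline(done))                              # 2024-05-02 10:00:00
print(is_period_expired(done, datetime(2024, 5, 1, 9, 0)))  # False
```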

Expand data

Expand data is a tool for expanding user-uploaded data samples to the desired amount and managing the results.

How to use

The following describes how to expand data.

In the NAVER Cloud Platform console, click the Services > AI Services > CLOVA Studio JP menus, in that order.
Click the My Product menu.
Click the [Go to CLOVA Studio JP] button.
Click the Explorer menu.
- You can also click the [Browse Explorer] button in the function introduction area on the CLOVA Studio JP home screen.
Go to the Tool tab menu, then click the [Start] button for Expand data.
Select a model engine, the base model used to expand the uploaded data.
Enter the total number of data items you want to receive at the end of the task.
- You can enter a value from 20 to 50,000 (each unit is one data item).
- Enter a value greater than the number of uploaded seed data items.
Note
If you upload 10 data items and enter 20 as the desired number, you receive the 10 uploaded items plus 10 newly created items.
Upload a seed dataset, the base material from which the type of data you want to expand is inferred.
- The patterns in the uploaded dataset are analyzed and expanded into data of a similar type.
- Seed datasets must be .csv or .jsonl files.
- Upload at least 10 rows of seed data. Each row must be shorter than 1,000 characters, including spaces.
Note
If you upload a dataset of 10 rows for extracting strengths and weaknesses by keyword and enter 20 as the desired number of data items, the result contains the 10 uploaded rows and 10 newly created rows.
Click the [Run] button.
- [Run] button: opens a pop-up window asking you to confirm the task.
- [OK] button: starts the task and takes you to the My Task tab menu, where you can check the task details and download the results.
- [Stop] button: cancels the operation and returns you to the previous screen.
- Only one data expansion task can run at a time per account.
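The seed dataset and target-count rules in the steps above can be checked before a task is run. The sketch below is hypothetical (`seed_row_texts` and `check_expand_request` are illustrative names) and makes a few labeled assumptions: CSV files have a header row, each JSONL line is a JSON object, and a row's length is the combined length of its values. It also returns the number of newly created items, which follows the note above (target minus uploaded rows).

```python
import csv
import io
import json

# Limits stated in the Expand data steps.
MIN_SEED_ROWS = 10
MAX_ROW_CHARS = 1000          # each row must be shorter than this
MIN_TARGET, MAX_TARGET = 20, 50_000


def seed_row_texts(filename: str, content: str) -> list:
    """Return the text of each seed row from .csv or .jsonl content.

    Assumptions: a CSV file has a header row, and each JSONL line is a
    JSON object; a row's length is the combined length of its values.
    """
    name = filename.lower()
    if name.endswith(".csv"):
        rows = list(csv.reader(io.StringIO(content)))[1:]  # skip header
        return ["".join(row) for row in rows if any(cell.strip() for cell in row)]
    if name.endswith(".jsonl"):
        texts = []
        for line in content.splitlines():
            if line.strip():
                record = json.loads(line)  # raises on a malformed line
                texts.append("".join(str(v) for v in record.values()))
        return texts
    raise ValueError("seed dataset must be a .csv or .jsonl file")


def check_expand_request(filename: str, content: str, target: int):
    """Return (problems, number of newly created items) for a planned task."""
    rows = seed_row_texts(filename, content)
    problems = []
    if len(rows) < MIN_SEED_ROWS:
        problems.append(f"need at least {MIN_SEED_ROWS} seed rows, found {len(rows)}")
    long_rows = [i for i, text in enumerate(rows, start=1) if len(text) >= MAX_ROW_CHARS]
    if long_rows:
        problems.append(f"rows with {MAX_ROW_CHARS}+ characters: {long_rows}")
    if not MIN_TARGET <= target <= MAX_TARGET:
        problems.append(f"target must be between {MIN_TARGET} and {MAX_TARGET}")
    if target <= len(rows):
        problems.append("target must be greater than the number of seed rows")
    # The result holds the uploaded rows plus the newly created ones.
    return problems, max(target - len(rows), 0)


seed = "text,completion\n" + "".join(f"input {i},output {i}\n" for i in range(10))
print(check_expand_request("seed.csv", seed, 20))  # ([], 10)
```

The same row-count and length checks also match the seed rules listed for Create in bulk.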

Caution

The task takes about 10 seconds per generated data item; the actual time may vary depending on the system environment.
Once a task starts, it cannot be stopped, and you will be charged for using the service, so proceed with caution.

View task result and download

The following describes how to check the results of data expansion and download the results.

In the NAVER Cloud Platform console, click the Services > AI Services > CLOVA Studio JP menus, in that order.
Click the My Product menu.
Click the [Go to CLOVA Studio JP] button.
Click the User Account menu in the upper right corner of the screen.
Go to the My Task tab menu, then click the Explorer tab.
Click the [Expand data] button.
Check the task result and download it if necessary.
- Download: the task is complete, and the result can be downloaded.
- Requesting: the task is in progress.
- Stop: the task has been suspended.
- Period Expired: the download period has expired (results can be downloaded for 7 days after the task is completed).

Expand data & tips for creating in bulk

Create a training dataset for tuning with Expand data
- Tips
  - Tuning requires at least 1,000 data items.
  - Data expansion saves you from having to write 1,000 data items one by one.
  - Data expansion is better suited to creative tasks that generate new sentences than to answer-type tasks with a fixed output (completion).
- Application example: CareCall dialog dataset expansion
  - Prepare a seed dataset to use for data expansion. For the CareCall dialog set, 100 dialog turns were written.
  - Expand the data to 1,000 items, the minimum required for tuning (model engine: Choco).
  - The 100 dialog turns are expanded to 1,000 in the result.
  - Validate the result for errors to obtain 1,000 data items for tuning.
- Download example file: Expand CareCall dialog dataset
  1-1. 케어콜_데이터증강_시드데이터.csv (CareCall data augmentation seed data)
  1-2. 케어콜_데이터증강_결과물.csv (CareCall data augmentation results)
Test the performance of the tuned engine with batch creation
- Tips
  - Run tuning with the 1,000 data items obtained through data expansion.
  - Run an inference test to check the performance of the tuned model.
  - An inference test takes one input (text) at a time and returns one output (completion), whereas batch creation lets you submit many inputs (text) and run them all at once.
- Application example: Performance test after tuning on the CareCall dialog dataset
  - Run tuning on the 1,000 expanded items above (dialog tuning, Choco_LoRA).
  - In batch creation, select the trained tuning model as the model engine.
  - Prepare a seed dataset containing only input (text) values for the performance test.
  - Upload the seed dataset and run batch creation.
  - The result contains an output (completion) generated for each input (text).
  - Validate the outputs to check whether the tuned model produces the desired results.
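- Download example file: CareCall dialog dataset tuning and batch creation
  2-1. 케어콜_튜닝학습_데이터셋.csv (CareCall tuning training dataset)
  2-2. 케어콜_일괄생성_시드데이터.csv (CareCall batch creation seed data)
  2-3. 케어콜_일괄생성_결과물.csv (CareCall batch creation results)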
Batch creation is also well suited to generating varied outputs (completion) from repeated inputs (text).
- Tips <Data expansion (augmentation) through batch creation>
  - Expand data by generating varied outputs (completion) from a small number of inputs (text).
- Usage example: creating Christmas phrases that fit a situation
  - Create a prompt template for situation-specific Christmas phrases on the batch creation screen (see Tips for writing prompt templates).
  - To build the seed dataset, write 5 situations (Input_text) and copy and paste each one 20 times, for a total of 100 seed data rows.
  - Because a different output (completion) is generated each of the 20 times for each of the 5 input (text) values, you obtain a total of 100 new data items.
- Download example file: Expand a Christmas phrase generation dataset through batch creation
  3-1. 크리스마스문구_일괄생성_시드데이터.csv (Christmas phrases batch creation seed data)
  3-2. 크리스마스문구_일괄생성_결과물.csv (Christmas phrases batch creation results)
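Building the repeated seed file by hand is tedious; the sketch below generates it by writing each situation 20 times. The five situations are hypothetical placeholders (the example file 3-1 uses its own), and the single input column name `text` is an assumption; match it to the header the service expects.

```python
import csv
import io

# Hypothetical situations; the example file 3-1 uses its own five.
SITUATIONS = [
    "A card for a coworker who moved to another team",
    "A message for grandparents who live far away",
    "A note to put in a gift for a child",
    "A greeting for customers of a small bakery",
    "A text to a friend you have not seen in years",
]
COPIES_PER_SITUATION = 20


def build_seed_csv(situations, copies, column="text"):
    """Write each situation `copies` times under one header column.

    Assumption: the seed file takes the input in a single column named
    `column`; change it to match the header the service expects.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([column])
    for situation in situations:
        for _ in range(copies):
            writer.writerow([situation])
    return buf.getvalue()


content = build_seed_csv(SITUATIONS, COPIES_PER_SITUATION)
rows = list(csv.reader(io.StringIO(content)))
print(len(rows) - 1)  # 100
```

To save the result, write `content` to a file opened with `newline=""` and `encoding="utf-8"` so the CSV line endings are preserved as written.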

Create in bulk: tips for creating prompt templates

Expand data builds a prompt from the uploaded seed data and returns the output generated for it, the same way the playground does.
The seed dataset can have a significant impact on the results: the more diverse the uploaded data, the more varied the prompts that are constructed from it.
Before running a task, write and test various prompts in the playground to anticipate what the output will look like.
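The effect of seed diversity described above can be illustrated with a toy model. This is an assumption for illustration only, not the service's actual implementation: it supposes a prompt is built by sampling a few seed rows at random and joining them with ###, the separator used in prompt templates. `build_prompt` and `count_distinct_prompts` are hypothetical helpers.

```python
import random


def build_prompt(seed_rows, k, rng, separator="\n###\n"):
    """Join k randomly chosen seed rows into a few-shot prompt (toy model)."""
    if k > len(seed_rows):
        raise ValueError("k cannot exceed the number of seed rows")
    return separator.join(rng.sample(seed_rows, k)) + separator


def count_distinct_prompts(seed_rows, k=3, trials=200, seed=0):
    """How many different prompts appear over repeated random draws."""
    rng = random.Random(seed)
    return len({build_prompt(seed_rows, k, rng) for _ in range(trials)})


repetitive = ["Q: What time is check-in? A: 3 p.m."] * 10
diverse = [f"Q: sample question {i}? A: sample answer {i}." for i in range(10)]
print(count_distinct_prompts(repetitive))  # 1
print(count_distinct_prompts(diverse))     # > 1
```

With identical seed rows every sampled prompt is the same, while distinct rows yield many different prompts, which is one way to see why the seed dataset shapes the output.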
