Utilize APIs
The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in Classic and VPC
This page describes the APIs available from the Explorer menu. Click the [Get Started] button for each API to see the details of that API.
Tokenizer APIs
The Tokenizer APIs count the number of tokens in an input sentence. You can use the tokenizer to find the optimal number of tokens or to create efficient prompts.
The tokenizer (HCX) API counts the number of tokens in a given sentence for the HyperCLOVA X model.
The tokenizer (embedding v2) API counts the number of tokens in an input sentence for the bge-m3 model used by the embedding v2 API.
Sliding window APIs
The Sliding window APIs help keep conversations flowing in chat mode by trimming prompts and results to fit within the maximum number of tokens the HyperCLOVA X language model can handle.
In chat mode, if the conversation between the user and the assistant exceeds the maximum number of tokens the HyperCLOVA X language model can process, no new conversation can be generated. The Sliding window APIs prevent this by deleting the oldest conversation turns from the conversation history of the user and the assistant. Deletion starts with the turn immediately following the system directive, i.e., the earliest entered conversation turn.
- The Sliding window APIs only work for models in chat mode (Chat completions API).
- Configure the call order so that the results of the Sliding window APIs are passed to the Chat completions APIs as is.
- Other settings, such as modelName and maxTokens, should be set to the same values as the Chat completions APIs settings you are using.
- Because the chat history between the user and the assistant is deleted sequentially from the beginning of the conversation, newly created chats may not reflect previous chats.
- In particular, if the maximum number of tokens to generate (the maxTokens value in the API) is set to a large value, the conversation history is deleted in proportion to that number, so newly created conversations may not fully reflect previous conversations.
How the Sliding window works
In chat mode, if the sum of the total number of tokens in the entered conversation (A) and the maximum number of tokens for the new response (B = maxTokens) exceeds the maximum number of tokens the model can handle (X), that is, A + B > X, the Chat completions APIs cannot generate any more conversation. To work around this, the Sliding window APIs delete conversation turns from the existing conversation based on the number of excess tokens (A + B - X). Deletion happens on a whole-turn basis, so a conversation turn is never partially deleted; the minimum number of turns needed to cover the excess is removed.
For example, if the number of excess tokens is 200, as shown in the figure below, using the Sliding window APIs will delete the two oldest conversation turns (100 and 200 tokens) of the existing conversation history. If the number of excess tokens is 100 tokens or fewer, only the oldest conversation turn in the existing conversation history will be deleted. This means that individual conversation turns are deleted, not pairs of conversations between the user and the assistant.
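The deletion rule above can be sketched in Python. This is a simplified illustration, not the actual API implementation; per-turn token counts are assumed to be precomputed:

```python
def slide_window(turns, max_tokens, max_completion_tokens):
    """Drop the oldest non-system turns until the prompt fits.

    turns: list of (role, token_count) tuples, oldest first;
    a leading "system" turn (the system directive) is always kept.
    """
    total = sum(count for _, count in turns)
    # Keep the system directive; delete whole turns from the front after it.
    start = 1 if turns and turns[0][0] == "system" else 0
    kept = list(turns)
    while total + max_completion_tokens > max_tokens and len(kept) > start:
        _, count = kept.pop(start)  # remove the oldest conversation turn
        total -= count
    return kept

# Excess of 200 tokens (A + B - X = 500 + 200 - 500): the two oldest
# turns (100 and 200 tokens) are removed, as in the example above.
history = [("system", 50), ("user", 100), ("assistant", 200), ("user", 150)]
trimmed = slide_window(history, max_tokens=500, max_completion_tokens=200)
```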
Sliding window APIs workflow
By using the Sliding window APIs, you can use the Chat completions APIs continuously without having to separately adjust the number of tokens for the entire conversation.
The following describes how the Sliding window APIs work.
- Before using the Chat completions APIs, first call the Sliding window APIs and provide the prompt you want to enter (conversation content; body > messages) in the request.
- Enter the result in the Sliding window APIs response (result > messages) into the Chat completions APIs request as is.
- The model name and maximum number of tokens should be the same for the Chat completions APIs and the Sliding window APIs.
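The call order above can be sketched as follows. The function bodies here are illustrative stand-ins for the actual HTTP calls, and the model name is an example value; the key point is that both requests share the same modelName and maxTokens, and the trimmed messages are passed through unchanged:

```python
def call_sliding_window(request):
    # Stand-in for the Sliding window API call:
    # returns the conversation history trimmed to fit the token limit.
    return {"result": {"messages": request["messages"]}}

def call_chat_completions(request):
    # Stand-in for the Chat completions API call.
    return {"result": {"message": {"role": "assistant", "content": "..."}}}

model_name, max_tokens = "HCX-003", 256  # example values; must match across both calls
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]

# 1. Trim the history with the Sliding window APIs first.
sw_response = call_sliding_window(
    {"modelName": model_name, "maxTokens": max_tokens, "messages": messages}
)
# 2. Pass the response messages (result > messages) to Chat completions as is.
chat_request = {
    "modelName": model_name,
    "maxTokens": max_tokens,
    "messages": sw_response["result"]["messages"],
}
response = call_chat_completions(chat_request)
```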
Segmentation APIs
The Segmentation APIs separate text into paragraphs by topic by calculating the similarity between sentences. You can specify the number of tokens that can fit in a paragraph. The Segmentation APIs can also split text by context, even when there are no blank lines or the paragraph breaks are unclear. The number of paragraphs is adjusted with the segCount value: set it to -1 for automatic segmentation, or to 0 or greater to split the text into that number of paragraphs.
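A request body along these lines controls the paragraph count. This is a hedged sketch: the field names follow the description above, but the exact schema should be checked against the API reference:

```python
def build_segmentation_request(text, seg_count=-1):
    """Assemble an illustrative Segmentation request body.

    seg_count=-1 lets the API choose the number of paragraphs
    automatically; a value of 0 or greater splits the text into
    that number of paragraphs.
    """
    if seg_count < -1:
        raise ValueError("seg_count must be -1 (automatic) or >= 0")
    return {"text": text, "segCount": seg_count}

auto_request = build_segmentation_request("Long document text ...")
fixed_request = build_segmentation_request("Long document text ...", seg_count=3)
```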
Summarization APIs
The Summarization APIs can split a given set of sentences into paragraphs and then summarize each of the paragraphs.
It breaks long documents into contextualized paragraphs and summarizes each paragraph, reducing the length of the text by removing unnecessary parts while retaining important information. You can also control the length of the summary by using the segmentation parameters segMaxSize and segMinSize to limit the number of characters in a paragraph.
The Summarization APIs can be utilized as follows:
- Summarize long meeting minutes and understand the content of the minutes to generate key takeaways.
- Summarize a long email to make it easier to understand what is important.
- Summarize a report or script. Summarize paragraphs in context, making it easy to create a table of contents.
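The summary length controls described above can be sketched as a request body. The field names segMaxSize and segMinSize come from the text; the rest of the schema is an assumption and should be checked against the API reference:

```python
def build_summarization_request(texts, seg_max_size=1000, seg_min_size=300):
    """Assemble an illustrative Summarization request body.

    segMaxSize / segMinSize bound the number of characters per
    paragraph, and thereby the granularity and length of the summary.
    """
    if seg_min_size > seg_max_size:
        raise ValueError("seg_min_size must not exceed seg_max_size")
    return {
        "texts": texts,
        "segMaxSize": seg_max_size,
        "segMinSize": seg_min_size,
    }

request = build_summarization_request(["Long meeting minutes ..."], 800, 200)
```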
Summarization may not work well on text written for web publication.
Embedding APIs
The Embedding APIs convert input text to a vector of numeric values. You can choose among three models depending on the task and purpose. Each model gives different similarity results for the same pair of sentences.
| Tool | Model | Maximum number of tokens | Dimension of vector space | Recommended distance metric | Note |
| --- | --- | --- | --- | --- | --- |
| Embedding | clir-emb-dolphin | 500 tokens | 1024 | IP (inner/dot/scalar product) | |
| Embedding | clir-sts-dolphin | 500 tokens | 1024 | Cosine similarity | |
| Embedding v2 | bge-m3 | 8192 tokens | 1024 | Cosine similarity | Open-source model* |
Each embedding API model has the characteristics shown above. To obtain the same output from embedding v2 as from the open-source bge-m3 model, note the following:
- Embedding v2 returns the dense output among the three methods (sparse, dense, multi-vector/ColBERT) of the bge-m3 model.
- Embedding v2 does not apply FP16 or normalization.
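The two recommended distance metrics from the table can be computed as follows. This is a pure-Python sketch with toy 3-dimensional vectors standing in for the 1024-dimensional embeddings the APIs return:

```python
import math

def dot(a, b):
    # Inner (dot/scalar) product: recommended for clir-emb-dolphin.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine similarity: recommended for clir-sts-dolphin and bge-m3.
    # Normalizes both vectors, so only their direction matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

v1 = [1.0, 2.0, 2.0]  # toy stand-ins for 1024-dimensional embeddings
v2 = [2.0, 4.0, 4.0]

print(dot(v1, v2))                # 18.0
print(cosine_similarity(v1, v2))  # 1.0 (same direction, different norm)
```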
Utilize embedding
The embedding API can be used for the following tasks.
- Compute vector similarity between sentences to improve search performance. For example, you can measure the vector similarity of documents to a search keyword entered by a user and return the most relevant documents.
- Compute the similarity between two sentences to determine the similarity of related documents, or compare the semantic similarity between sentences.
- Cluster documents with similar characteristics.
- Categorize documents. Vectorized text data can be used in trained models to perform a variety of classification tasks, such as classifying text based on topic or sentiment.
The embedding workflow consists of preparing the data, performing the embedding, saving the vector output, developing the API, and calling the output. Through this process, you can save the embedded output and use it in your database.
Embedding workflow
:::info
- Convert the file to text for embedding.
- Depending on the type of text and its purpose, break up the text appropriately using the Segmentation or Summarization APIs.
- After selecting an appropriate embedding model, perform the embedding operation by converting the text to a vector.
- Store the embedded result and the original text together in a vector DB.
- Create an API: convert the user's input query to a vector, compare its similarity with the vectors stored in the DB to find the best-matching vector, and call the mapped original text to generate the final result.
- You can use the Chat APIs to output the final result by putting the API result into a prompt and generating a response in the appropriate format that the user wants.
:::
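The retrieval steps of the workflow above can be sketched as a similarity lookup. The in-memory list below is a stand-in for a vector DB, and the query vector is a stand-in for the output of an actual embedding API call:

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / norms

# Vector "DB": embedded output stored together with the original text.
vector_db = [
    ([0.9, 0.1, 0.0], "How to reset your password"),
    ([0.0, 0.2, 0.9], "Monthly billing overview"),
]

def search(query_vector, db):
    # Compare the query vector against the stored vectors and return the
    # original text mapped to the most similar one.
    return max(db, key=lambda item: cosine(query_vector, item[0]))[1]

query = [0.8, 0.2, 0.1]  # stand-in for embedding "forgot my password"
best = search(query, vector_db)
```

A real deployment would replace the list with a vector database and feed `best` into a Chat completions prompt to generate the final answer.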
Process long text
The embedding APIs can process up to 500 tokens (clir-emb-dolphin, clir-sts-dolphin) or 8192 tokens (embedding v2, bge-m3) at a time. If the text exceeds the token limit, we recommend chunking to break up long texts appropriately. When chunking, it is important to split the text into semantic units so that the correct information can be extracted. Chunking is the process of breaking text into smaller pieces, and includes the Sliding window, Segmentation, and Summarization operations.
Here are the types of chunking methods and the advantages and disadvantages of each of them.
| Method | Description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Sliding window | Splits text into units of constant length | Easily extracts the correct answer to a query by breaking the text into smaller pieces | Because text is divided into units of length rather than units of meaning, the beginning and end of the text are handled poorly, and it is difficult to understand the meaning of the entire text |
| Segmentation | Separates text into meaningful paragraphs that make sense in context | Text can be grouped into meaningful units for better embedding performance | Long paragraphs make it difficult to identify where the query is answered |
| Summarization | Summarizes a long piece of text into a shorter version, focusing on the main points | Summarizes longer contextual text than Segmentation, making it easier to embed on a per-document basis | Long paragraphs make it difficult to identify where the query is answered |
CLOVA Studio provides the tokenizer (embedding v2) APIs, segmentation APIs, and summarization APIs. For more information, see Tokenizer APIs, Segmentation APIs, and Summarization APIs.
See the citation information for the bge-m3 model below.
@misc{bge-m3,
title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
year={2024},
eprint={2402.03216},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Create test app
We provide API guides and app creation tools for integrating the services provided by CLOVA Studio. After creating a test app, you can use curl and Python code to call the APIs provided in the Explorer.
- After clicking the [Create test app] button, you can call the API using curl and Python code.
- The information about the created test apps can be found on the Test apps tab of the App application status.
- For more information on how to issue a test app, see Utilize samples and manage tasks.