Utilize APIs

    Available in Classic and VPC

    This page describes the APIs available from the Explorer menu. Click the [Get Started] button for each API to see the details of that API.

    Tokenizer API

    The Tokenizer API counts the number of tokens in a sentence that you enter. You can use it to find the optimal number of tokens or to create efficient prompts.
    The tokenizer (HCX) API counts the number of tokens in a given sentence for the HyperCLOVA X model.
    The tokenizer (embedding v2) API counts the number of tokens in a sentence for the bge-m3 model of the embedding v2 API.
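    The pattern the tokenizer enables is simple: count tokens, then check the count against a budget before sending a prompt. The sketch below uses whitespace splitting as a stand-in for the real HCX and bge-m3 tokenizers, which use subword vocabularies and will produce different counts; the function names are illustrative, not the service API.

    ```python
    # Naive stand-in for the Tokenizer API. Real HCX / bge-m3 tokenizers
    # use subword vocabularies, so actual counts will differ; whitespace
    # splitting only illustrates the budgeting pattern.
    def count_tokens(text: str) -> int:
        return len(text.split())

    def fits_budget(prompt: str, max_tokens: int) -> bool:
        """Check whether a prompt fits within a model's token budget."""
        return count_tokens(prompt) <= max_tokens

    prompt = "Summarize the quarterly sales report in three bullet points"
    n = count_tokens(prompt)          # 9 with this naive tokenizer
    ok = fits_budget(prompt, 4096)    # True: well within the budget
    ```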

    Sliding window API

    The Sliding window API helps keep conversations flowing in chat mode by fitting prompts and results within the maximum number of tokens the HyperCLOVA X language model can handle.

    In chat mode, if the user's conversation with the assistant exceeds the maximum number of tokens that the HyperCLOVA X language model can process, no new conversation can be generated. The Sliding window API prevents this by deleting the oldest conversation turns from the conversation history of the user and the assistant. Deletion starts with the conversation immediately following the system directive, i.e., the earliest entered conversation turn.

    • The Sliding window API only works for models in chat mode (Chat completions API).
    • Set up the call order so that the results of the Sliding window API are passed to the Chat completions API as is.
    • Other settings, such as modelName and maxTokens, should be set to the same values as in the Chat completions API request you are using.
    • Because the chat history between the user and the assistant is deleted sequentially from the beginning of the conversation, newly created chats may not reflect earlier chats.
    • In particular, if the maximum number of generated tokens (the maxTokens value in the API) is set to a large number, the conversation history is deleted in proportion to that number, so newly created conversations may not fully reflect previous conversations.

    How the Sliding window works

    In chat mode, if the sum of the total number of tokens in the entered conversations (A) and the maximum number of tokens in the new conversation (B=maxTokens) is greater than the maximum number of tokens the model can handle (X) (i.e., A+B>X), the Chat completions API will not generate any more conversations. To work around this, the Sliding window API deletes conversation turns from existing conversations based on the number of excess tokens (A+B-X). It deletes on a conversation turn basis to avoid deleting only part of a conversation turn (deleting the minimum number of conversation turns based on the number of excess tokens).
    For example, if the number of excess tokens is 200, as shown in the figure below, using the Sliding window API will delete the two oldest conversation turns (100 and 200 tokens) of the existing conversation history. If the number of excess tokens is 100 tokens or fewer, only the oldest conversation turn in the existing conversation history will be deleted. This means that individual conversation turns are deleted, not pairs of conversations between the user and the assistant.
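    The deletion rule above can be sketched in a few lines. This is a simplified simulation assuming the token count of each turn is known, not the actual service implementation; the function name and data layout are illustrative.

    ```python
    def slide_window(history, max_tokens_new, model_limit):
        """Drop the oldest conversation turns (after the system directive)
        until existing tokens + max_tokens_new fit within model_limit.
        `history` is a list of (role, token_count) turns; index 0 is the
        system directive, which is never deleted."""
        system, turns = history[0], list(history[1:])
        total = sum(t for _, t in history)
        excess = total + max_tokens_new - model_limit
        while excess > 0 and turns:
            _, dropped = turns.pop(0)     # delete the earliest turn first
            excess -= dropped
        return [system] + turns

    history = [("system", 50), ("user", 100), ("assistant", 200),
               ("user", 300), ("assistant", 350)]
    # Total is 1000 tokens; with maxTokens = 200 and a 1000-token model
    # limit, the excess is 200, so the two oldest turns (100 and 200
    # tokens) are deleted, matching the example above.
    trimmed = slide_window(history, max_tokens_new=200, model_limit=1000)
    ```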

    Sliding window API workflow

    By using the Sliding window API, you can use the Chat completions API continuously without having to separately adjust the number of tokens for the entire conversation.

    The following describes how the Sliding window API works.

    1. Before using the Chat completions API, first call the Sliding window API and provide the prompt you want to enter (conversation content; body > messages) in the request.
    2. Pass the messages in the Sliding window API response (result > messages) to the Chat completions API request as is.
    3. The model name and maximum number of tokens should be the same for the Chat completions API and the Sliding window API.
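    The call order above can be sketched with stubbed functions. The stubs, the model name, and the response shapes here are assumptions for illustration; consult the API reference for the real request format and authentication.

    ```python
    # Stubbed sketch of the Sliding window -> Chat completions call order.
    MODEL_NAME = "HCX-003"   # hypothetical model name
    MAX_TOKENS = 256

    def call_sliding_window(messages):
        # Stub: the real API would trim `messages` to fit the token
        # budget; here we simply keep the last four turns.
        return {"result": {"messages": messages[-4:]}}

    def build_chat_completions_request(messages):
        # modelName and maxTokens must match the Sliding window request.
        return {"modelName": MODEL_NAME, "maxTokens": MAX_TOKENS,
                "messages": messages}

    conversation = [{"role": "system", "content": "You are a helpful bot."},
                    {"role": "user", "content": "Hello"}]
    # Step 1: call the Sliding window API first.
    trimmed = call_sliding_window(conversation)["result"]["messages"]
    # Step 2: pass the trimmed messages to Chat completions unchanged.
    request_body = build_chat_completions_request(trimmed)
    ```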

    Segmentation API

    The Segmentation API separates text into paragraphs by topic by calculating the similarity between sentences. You can specify the number of tokens that can fit in a paragraph. The Segmentation API can also split paragraphs based on context, even if the paragraph contains no blank lines or the break point is not obvious. The number of paragraphs can be adjusted with the SegCount value. For automatic segmentation, set SegCount to -1; if you enter a value of 0 or greater, the text is split into that number of paragraphs.
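    The core idea, splitting where similarity between neighboring sentences drops, can be illustrated with a naive stand-in. The real Segmentation API uses learned sentence similarity; word-overlap (Jaccard) similarity and the threshold below are purely illustrative.

    ```python
    # Naive illustration of context-based paragraph splitting.
    def jaccard(a: str, b: str) -> float:
        """Word-overlap similarity: a crude stand-in for the learned
        sentence similarity the real API computes."""
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb)

    def segment(sentences, threshold=0.1):
        paragraphs = [[sentences[0]]]
        for prev, cur in zip(sentences, sentences[1:]):
            if jaccard(prev, cur) >= threshold:
                paragraphs[-1].append(cur)   # similar: same paragraph
            else:
                paragraphs.append([cur])     # topic shift: new paragraph
        return paragraphs

    sents = ["Cats are small furry mammals", "Cats like to sleep",
             "Python is a programming language"]
    paragraphs = segment(sents)
    # The two cat sentences are grouped; the Python sentence starts a
    # new paragraph.
    ```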

    The following describes how the Segmentation API works.


    Summarization API

    The Summarization API can split a given set of sentences into paragraphs and then summarize each of the paragraphs.

    It breaks long documents into contextualized paragraphs and summarizes each paragraph. You can reduce the length of your text by removing unnecessary parts while retaining important information. You can also control the size of the summary by using the segMaxSize and segMinSize segmentation parameters to limit the number of characters in a paragraph.


    The Summarization API can be utilized as follows:

    1. Summarize long meeting minutes and understand the content of the minutes to generate key takeaways.
    2. Summarize a long email to make it easier to understand what is important.
    3. Summarize a report or script. Summarize paragraphs in context, making it easy to create a table of contents.
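    How segMaxSize bounds paragraph length can be shown with a minimal sketch. Fixed-size character chunks and first-sentence extraction are naive stand-ins for the API's context-aware segmentation and model-generated summaries; the function names are illustrative.

    ```python
    # Naive stand-ins for segmentation (character-bounded chunks) and
    # summarization (first sentence of each paragraph).
    def chunk(text: str, seg_max_size: int) -> list:
        """Split text into pieces of at most seg_max_size characters."""
        return [text[i:i + seg_max_size]
                for i in range(0, len(text), seg_max_size)]

    def naive_summary(paragraph: str) -> str:
        # Crude "summary": keep only the first sentence.
        return paragraph.split(". ")[0]

    doc = "A" * 250
    parts = chunk(doc, seg_max_size=100)   # 100 + 100 + 50 characters
    summary = naive_summary("First sentence. Second sentence.")
    ```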

    Summarization may not work well for text published on the web.

    Embedding API

    The Embedding API converts input text to a vector of numeric values. You can select one of three models depending on your task and purpose. Each model gives different similarity results for the same pair of sentences.

    Tool | Model | Number of tokens | Dimension of vector space | Recommended distance metric | Note
    Embedding | clir-emb-dolphin | 500 tokens | 1024 | IP (inner/dot/scalar product) |
    Embedding | clir-sts-dolphin | 500 tokens | 1024 | Cosine similarity |
    Embedding v2 | bge-m3 | 8192 tokens | 1024 | Cosine similarity | Open-source model*

    Each embedding API model has the following characteristics.


    Consider the following to obtain the same output from embedding v2 as from the open-source bge-m3 model.

    • Embedding v2 returns the "dense" output out of the three methods (sparse, dense, multi-dense/colbert) of the bge-m3 model.
    • Embedding v2 does not apply FP16 or normalization.
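    Because the returned vectors are not normalized, raw dot products depend on vector magnitude; cosine similarity normalizes implicitly, which is why it is the recommended metric for bge-m3. A minimal sketch:

    ```python
    import math

    def cosine(a, b):
        """Cosine similarity: dot product divided by both norms, so the
        result is independent of vector magnitude."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    v1, v2 = [3.0, 4.0], [6.0, 8.0]   # v2 = 2 * v1: same direction
    sim = cosine(v1, v2)              # 1.0: scaling does not change it
    ```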

    Utilize embedding

    The embedding API can be used for the following tasks.

    • Compute vector similarity between sentences to improve search performance. For example, you can measure the vector similarity of documents to a search keyword entered by a user and return the most relevant documents.
    • Compute the similarity between two sentences to determine the similarity of related documents, or compare the semantic similarity between sentences.
    • Cluster documents with similar characteristics.
    • Categorize documents. Vectorized text data can be used in trained models to perform a variety of classification tasks, such as classifying text based on topic or sentiment.

    The embedding workflow consists of preparing the data, performing the embedding, saving the vector output, developing the API, and calling the output. Through this process, you can save the embedded output and use it in your database.

    Embedding workflow

    1. Convert the file to text for embedding.
    2. Depending on the type of text and its purpose, break up the text appropriately using the Segmentation or Summarization API.
    3. After selecting an appropriate embedding model, perform the embedding operation by converting the text to a vector.
    4. Store the embedded result and the original text together in a vector DB.
    5. Create an API that converts the user's input query to a vector, compares it with the vectors stored in the DB to find the most similar one, and retrieves the mapped original text to generate the final result.
    6. You can use the Chat API to output the final result by putting the API result into a prompt and generating a response in the appropriate format that the user wants.
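    The steps above can be sketched end to end in miniature. A bag-of-words vector stands in for the real embedding model, a plain list stands in for the vector DB, and all names are illustrative.

    ```python
    # Toy retrieval pipeline: embed documents, store vectors with their
    # original text, then answer a query by cosine similarity.
    import math

    VOCAB = ["cat", "dog", "car", "engine", "sleep"]

    def embed(text: str) -> list:
        """Bag-of-words 'embedding': counts of known vocabulary words."""
        words = text.lower().split()
        return [float(words.count(w)) for w in VOCAB]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(y * y for y in b)) or 1.0
        return dot / (na * nb)

    docs = ["the cat likes to sleep", "the car has a new engine"]
    index = [(embed(d), d) for d in docs]         # step 4: store vectors

    def search(query: str) -> str:                # step 5: query -> vector
        qv = embed(query)
        return max(index, key=lambda entry: cosine(qv, entry[0]))[1]

    best = search("sleepy cat")   # retrieves the cat document
    ```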


    Process long text

    The embedding API can process up to 500 tokens (clir-emb-dolphin, clir-sts-dolphin) or 8192 tokens (embedding v2, bge-m3) at a time. If embedding is difficult because of the token limit, we recommend chunking to break up long texts appropriately. When chunking, it is important to break the text into semantic units so that the correct information can be extracted. Chunking is the process of breaking text into smaller pieces, and includes the Sliding window, Segmentation, and Summarization operations.

    Here are the types of chunking methods and the advantages and disadvantages of each of them.

    Method | Description | Advantages | Disadvantages
    Sliding window | Splits text into units of constant length | Easily extracts the correct answer to a query by breaking the text into smaller pieces | Because text is divided by length rather than by meaning, the beginning and end of the text are handled poorly, and it is difficult to understand the meaning of the entire text
    Segmentation | Separates text into meaningful paragraphs that make sense in context | Text can be grouped into meaningful units for better embedding performance | Long paragraphs make it difficult to identify where the query is answered
    Summarization | Summarizes a long piece of text into a shorter version, focusing on the main points | Summarizes longer contextual text than Segmentation, making it easier to embed on a per-document basis | Long paragraphs make it difficult to identify where the query is answered
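    A minimal sliding-window chunker illustrates the first method: fixed-length windows with overlap so that sentences straddling a boundary appear in two chunks. The window size and stride below are illustrative choices.

    ```python
    # Fixed-length chunking with overlap (sliding window over tokens).
    def sliding_chunks(tokens, size, stride):
        """Return windows of `size` tokens, starting every `stride`
        tokens, so consecutive chunks overlap by size - stride."""
        chunks = []
        for i in range(0, len(tokens), stride):
            chunks.append(tokens[i:i + size])
            if i + size >= len(tokens):   # last window reached the end
                break
        return chunks

    tokens = "the quick brown fox jumps over".split()   # 6 tokens
    chunks = sliding_chunks(tokens, size=4, stride=2)
    # -> [['the', 'quick', 'brown', 'fox'],
    #     ['brown', 'fox', 'jumps', 'over']]
    ```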

    CLOVA Studio provides the tokenizer (embedding v2) API, segmentation API, and summarization API. For more information, see Tokenizer API, Segmentation API, and Summarization API.

    See the citation information for the bge-m3 model below.

        title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
        author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},

    Create test app

    We provide API guides and application creation tools for integrating the services provided by CLOVA Studio. After creating a test app, you can use curl and Python code to call the APIs provided by the Explorer.

    • After clicking the [Create test app] button, you can call the API using curl and Python code.
    • The information about the created test apps can be found on the Test apps tab of the App application status.
    • For more information on how to issue a test app, see Utilize samples and manage tasks.
