Using APIs


Available in Classic and VPC

This section describes the APIs provided in the Explorer menu.

Reranker API

The Reranker API evaluates the relevance between retrieved documents and a query, selects the most relevant documents, and summarizes and compresses them to generate a RAG answer. You can use it to refine search results, and because only the selected documents (rather than every retrieved document) are used to generate the answer, resources are used more efficiently.
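The selection step can be pictured with the sketch below. This is a minimal, hypothetical illustration: the scoring itself is done by the Reranker API, and the function name, score format, and threshold here are illustrative assumptions, not the service's actual interface.

```python
def select_relevant(docs, scores, threshold=0.5):
    """Keep only documents scored above a threshold, highest first,
    so that answer generation sees a smaller, more relevant set."""
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked if score >= threshold]

docs = ["refund policy", "shipping times", "company history"]
scores = [0.91, 0.42, 0.77]  # hypothetical relevance scores from a reranker
selected = select_relevant(docs, scores)  # only the two most relevant documents
```

Passing only `selected` (not all of `docs`) to answer generation is what makes the resource usage more efficient.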

RAG Reasoning API

RAG answers include elements that improve reliability, such as citation sources and indexed citation markers. By using the RAG Reasoning model, which is trained to produce these answer types, you can deliver evidence-based RAG answers to users. RAG Reasoning invokes its engine through a function-calling format. You can specify one or multiple RAG functions, and the LLM autonomously selects the optimal function for the situation to perform retrieval-augmented generation.

The RAG Reasoning API can be used alone for retrieval-augmented generation. However, when chained with the Reranker API, it provides a more stable RAG answering system that supports multi-turn and multi-query requests.
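A retrieval function definition might look like the sketch below. The field names follow the common function-calling (JSON Schema) convention and are assumptions for illustration; the actual RAG Reasoning request schema may differ.

```python
# Hypothetical tool definition the LLM can choose to call for retrieval.
rag_search_tool = {
    "name": "search_documents",
    "description": "Retrieve documents relevant to the user's question.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query text"},
            "topK": {"type": "integer", "description": "Number of documents to return"},
        },
        "required": ["query"],
    },
}
# Several such functions can be registered; the model picks the best one per request.
```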

[Figure: chaining the Reranker and RAG Reasoning APIs]

Token Calculator API

The Token Calculator API counts the number of tokens in an input sentence. You can use it to determine the optimal token count or to build more efficient prompts.
Token Calculator (Chat) and Token Calculator (Chat v3) count the tokens in sentences submitted to the Chat Completions and Chat Completions v3 APIs, which support HyperCLOVA X models.
Token Calculator (Embedding v2) calculates token counts for sentences submitted to the bge-m3 model in the Embedding v2 API.

Sliding Window API

The Sliding Window API ensures that conversations continue smoothly by adjusting the prompt and output so that the total stays within the maximum number of tokens the HyperCLOVA X language model can process.

In chat mode, if the total number of tokens in the conversation between the user and the assistant exceeds the model's maximum token limit, new messages can no longer be generated. The Sliding Window API prevents this by deleting the oldest turns in the conversation history, removing them in order starting with the first turn entered after the system instruction.

Caution
  • Because turns are deleted in order from the beginning, newly generated messages may not fully reflect earlier conversation context.
  • If the maximum output token count (the API's maxTokens value) is set very high, more conversation history will be deleted in proportion, causing new messages to reflect even less of the previous dialogue.
Note
  • The Sliding Window API works only with models supported by the Chat Completions API. It does not work with models supported by the Chat Completions v3 API.
  • The output of the Sliding Window API must be passed directly to the Chat Completions API in the correct order.
  • Settings such as model name, maxTokens, and others must match the settings used in the Chat Completions API.

How the Sliding Window Works

If the sum of the tokens in the input conversation history (A) and the maximum number of tokens reserved for newly generated content (B, the maxTokens value) exceeds the model's maximum token limit (X), that is, if A + B > X, the Chat Completions API cannot generate a new message. To resolve this, the Sliding Window API deletes old conversation turns until at least the excess (A + B − X) tokens have been removed. Deletion always occurs in whole-turn units; a turn is never partially deleted, and the minimum number of turns needed to cover the excess is removed.
For example, if the excess is 200 tokens and the two oldest turns contain 100 and 200 tokens, both turns are deleted. If the excess were 100 tokens or less, only the oldest turn would be deleted. Deletion occurs per turn, not per user/assistant message pair.
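The deletion rule can be sketched in a few lines. This is an illustrative reimplementation, not the service code, and the whitespace token counter is a stand-in for the real tokenizer (the Token Calculator API gives exact counts).

```python
def apply_sliding_window(messages, max_tokens, model_limit, count_tokens):
    """Drop the oldest non-system turns, one whole turn at a time,
    until history tokens (A) + reserved output (B) fit the limit (X)."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def history_tokens():
        return sum(count_tokens(m["content"]) for m in system + turns)

    while turns and history_tokens() + max_tokens > model_limit:
        turns.pop(0)  # delete the oldest turn after the system instruction
    return system + turns

count = lambda text: len(text.split())  # stand-in tokenizer
history = [
    {"role": "user", "content": "tok " * 100},       # oldest turn: 100 tokens
    {"role": "assistant", "content": "tok " * 200},  # 200 tokens
    {"role": "user", "content": "tok " * 600},       # 600 tokens
]
# A = 900, B = 300, X = 1000: the excess is 200, so the two oldest turns go.
trimmed = apply_sliding_window(history, max_tokens=300,
                               model_limit=1000, count_tokens=count)
```

This mirrors the example above: a 200-token excess removes both the 100-token and the 200-token turn, since turns are never split.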
[Figure: how the Sliding Window API deletes conversation turns]

Sliding Window Processing Flow

By applying the Sliding Window API, you can keep using the Chat Completions API without manually managing the total token count of the conversation.

The Sliding Window API workflow is as follows:

  1. Before calling the Chat Completions API, first call the Sliding Window API and include the prompt you want to input (conversation history; messages) in the request.
  2. Insert the messages value returned in the Sliding Window API response, as-is, into the Chat Completions API request.
  3. Make sure that the model name and the maximum token count are set identically in both the Chat Completions API and the Sliding Window API.
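Step 3 can be enforced by building both request bodies from one shared settings object, so the two calls cannot drift apart. The maxTokens parameter comes from the text above; the modelName value and the overall request shape are illustrative assumptions.

```python
# Shared settings used by both calls; the modelName value is a placeholder.
shared = {"modelName": "HCX-003", "maxTokens": 256}

def build_sliding_window_request(messages):
    return {"messages": messages, **shared}

def build_chat_request(trimmed_messages):
    # trimmed_messages is the messages array returned by the Sliding Window API.
    return {"messages": trimmed_messages, **shared}

sw_req = build_sliding_window_request([{"role": "user", "content": "Hello"}])
chat_req = build_chat_request(sw_req["messages"])  # pass the response messages as-is
```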

Summarization API

The Summarization API divides a given text into paragraphs and generates summaries for each paragraph.

It can split long documents into contextually meaningful sections and summarize each section. This reduces overall text length by removing unnecessary content while preserving key information. By using the paragraph-splitting options (segMaxSize and segMinSize), you can constrain the number of characters per paragraph and adjust the length of the summaries.
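The paragraph-splitting options might be supplied as in this sketch. The segMaxSize and segMinSize parameter names come from the text above; the surrounding request shape and values are assumptions for illustration.

```python
# Hypothetical Summarization API request body with paragraph-splitting options.
summarization_request = {
    "texts": ["<long document text>"],
    "segMaxSize": 1000,  # upper bound on characters per paragraph
    "segMinSize": 300,   # lower bound on characters per paragraph
}
```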

[Figure: Summarization API processing]

You can use the Summarization API in the following ways:

  • Summarize long meeting transcripts and extract key points.
  • Condense important information from long emails for quicker understanding.
  • Create concise summaries of reports or scripts. Because it divides content into contextually appropriate paragraphs, it can also help generate a table of contents.

Embedding API

The Embedding API converts input text into numerical vector values. You can choose from three models depending on your task and objectives. Each model returns a different similarity score for the same sentence pair. If you are new to embeddings, it is recommended that you use Embedding v2, which supports the Token Calculator API.

| Tool name | Model name | Max tokens | Vector dimension | Recommended distance metric | Notes |
|---|---|---|---|---|---|
| Embedding | clir-emb-dolphin | 500 tokens | 1024 | IP (inner product; also called dot or scalar product) | Token calculator not supported |
| Embedding | clir-sts-dolphin | 500 tokens | 1024 | Cosine similarity | Token calculator not supported |
| Embedding v2 | bge-m3 | 8,192 tokens | 1024 | Cosine similarity | Token calculator supported; open-source model |

The characteristics of each model supported by the Embedding API are as follows.
[Figure: characteristics of each embedding model]

Note

To obtain output identical to the open-source bge-m3 model when using Embedding v2, keep the following in mind:

  • Embedding v2 returns the dense output type among the three bge-m3 output formats (dense, sparse, and multi-vector/ColBERT).
  • Embedding v2 does not apply FP16 or normalization.
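Because Embedding v2 does not normalize its output, you can apply L2 normalization yourself when you want cosine similarity to reduce to a simple dot product. A minimal sketch:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; after this, cosine similarity
    between two normalized vectors equals their dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

unit = l2_normalize([3.0, 4.0])  # [0.6, 0.8]
```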

Embedding API Processing Flow

A typical embedding workflow consists of preparing data, generating embeddings, storing the resulting vectors, and serving retrieval results through an API. Through this process, you can store embedded vectors in a database or another retrieval system and query them later.

The embedding workflow is as follows:
[Figure: embedding workflow]

  1. Convert the file you want to embed into text.
  2. Depending on the type and purpose of the text, divide it appropriately using paragraph splitting or the Summarization API.
  3. Select an appropriate embedding model and generate vector embeddings for the text.
  4. Store both the embedded vectors and the original text together in a vector database.
  5. Convert the user's query into a vector, compare it with the vectors stored in the database to find the most similar ones, then retrieve the mapped original text and generate the final result.
  6. Insert the API result into a prompt and use the Chat Completions API to generate the final output in the desired format.
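Steps 4 and 5 can be sketched with a toy in-memory store. A real system would use a vector database, and the two-dimensional vectors here are hypothetical stand-ins for actual embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Step 4: store each embedded vector together with its original text.
store = [
    ([0.9, 0.1], "Refunds are processed within 7 days."),
    ([0.1, 0.9], "Our office is closed on weekends."),
]

# Step 5: embed the query, then return the text mapped to the most similar vector.
def retrieve(query_vec, store, top_k=1):
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

best = retrieve([1.0, 0.0], store)  # the refund document is the closest match
```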

Embedding Use Cases

You can use the Embedding API in the following ways:

  • Improve search performance by calculating vector similarity between sentences. For example, you can measure similarity between a user's query vector and document vectors to return the most relevant documents.
  • Calculate similarity between two sentences to determine document relatedness or compare semantic similarity.
  • Group documents with similar characteristics into clusters.
  • Classify documents. You can perform various classification tasks (such as topic or sentiment classification) by using vectorized text data with a trained model.

Processing Long Text

The maximum text length that the Embedding API can process in a single request is 500 tokens for clir-emb-dolphin and clir-sts-dolphin, or 8,192 tokens for Embedding v2 (bge-m3). Chunking, which divides text into smaller segments, is recommended when a text exceeds these token limits; it can be done through basic text splitting, paragraph segmentation, or summarization. When chunking, it is important to divide the text into meaningful units so that information can be extracted accurately.

The types of chunking and their advantages and disadvantages are as follows:

| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Basic text splitting | Divides the text into fixed-length segments. | Allows fine-grained segmentation, making it easier to extract the exact answer to a query. | Because text is cut by length rather than meaning, segments may begin or end mid-thought, making the overall context harder to follow. |
| Paragraph splitting | Splits the text into meaningful paragraphs based on context. | Groups text into semantically coherent units, which can improve embedding performance. | If paragraphs are long, it may be difficult to pinpoint the exact passage containing the answer to a query. |
| Summarization | Produces concise summaries of long text by focusing on key content. | Can cover longer stretches of context than paragraph splitting, making it suitable for embedding entire documents. | Because the text is condensed, fine-grained details may be omitted. |
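Basic text splitting from the table above can be sketched as fixed-length chunking. Whitespace tokens are used here as a rough stand-in for real model tokens (the Token Calculator API gives exact counts), and the overlap parameter is an illustrative addition that softens the hard cut at each boundary.

```python
def chunk_text(text, max_tokens=500, overlap=50):
    """Split text into fixed-length chunks of at most max_tokens words,
    with a small overlap so context is not cut dead at each boundary."""
    tokens = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

doc = "word " * 1200          # a 1,200-token document
pieces = chunk_text(doc)      # three chunks: 500, 500, and 300 tokens
```

The 500-token default matches the clir-emb-dolphin and clir-sts-dolphin limits noted above; for Embedding v2, max_tokens could be raised toward 8,192.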
Note

CLOVA Studio provides the Token Calculator (Embedding v2) API, the Paragraph Splitting API, and the Summarization API. For more details, see the Token Calculator API, Paragraph Splitting API, and Summarization API.

Citation information for the bge-m3 model is as follows.

@misc{bge-m3,
    title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
    author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
    year={2024},
    eprint={2402.03216},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Paragraph Splitting API

The Paragraph Splitting API divides text into topic-based segments by calculating similarity between sentences. You can specify the maximum number of tokens allowed per paragraph, and even if the original text contains no blank lines or has unclear boundaries, the API can determine the appropriate segmentation based on context. You can control the number of paragraphs using the segCnt parameter. To perform automatic paragraph splitting, set the segCnt parameter to -1. If the value is 0 or higher, the API splits the text into exactly the number of paragraphs specified.
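The segCnt behavior described above might look like this in a request body. The segCnt parameter name and its -1 convention come from the text; the rest of the request shape is an assumption for illustration.

```python
def build_split_request(text, seg_count=-1):
    """segCnt = -1 requests automatic paragraph splitting;
    a non-negative value requests exactly that many paragraphs."""
    return {"text": text, "segCnt": seg_count}

auto_req = build_split_request("A long article about embeddings.")                 # automatic splitting
fixed_req = build_split_request("A long article about embeddings.", seg_count=5)   # exactly 5 paragraphs
```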

The workflow for the Paragraph Splitting API is as follows:

[Figure: Paragraph Splitting API workflow]