Run evaluation


Available in VPC

You can verify the accuracy of the data search service and perform evaluations using the evaluation model.

The evaluation procedures are as follows:

1. Create evaluation set

To create an evaluation set, follow these steps:

If the evaluation set type is automatic:

  1. In the Evaluations menu, check the progress status in the evaluation list.
  2. If automatic was selected under 1. Evaluation settings > evaluation metrics and dataset > dataset settings when the evaluation was created, the evaluation dataset is generated automatically and the progress status is marked as creating evaluation set.
  3. When the evaluation set is completed, the [Create evaluation completed] button appears in the progress status.
  4. Click the [Create evaluation completed] button.
  5. Download the automatically generated evaluation set.
Caution

Additional charges apply for automatic evaluation set creation.

If the evaluation set type is manual:

  1. In the Evaluations menu, check the progress status in the evaluation list.
  2. If manual was selected under 1. Evaluation settings > evaluation metrics and dataset > dataset settings when the evaluation was created, the progress status is marked as waiting for upload.
  3. Click the [Upload] button in the evaluation set column of the list.
  4. Click Download template in the evaluation data upload section.
    • Template download: downloads a template file with sample evaluation set data
    • Supported formats: CSV, XLSX
    • File size: up to 200 MB
  5. Refer to the template file and create the evaluation dataset file manually.
    • query: the question to be evaluated
    • llm_answer: the LLM-generated answer
    • contexts: the search result context
Note

For automatic evaluation dataset types, after dataset creation, you can upload the evaluation dataset file directly using the [Upload] button.
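A manual evaluation dataset file can also be generated programmatically instead of edited by hand. The sketch below writes a CSV with the three template columns described above; the example row and the file name `evaluation_set.csv` are hypothetical.

```python
import csv

# Minimal sketch: write a manual evaluation dataset file.
# Column names (query, llm_answer, contexts) follow the template
# described above; the sample row and file name are placeholders.
rows = [
    {
        "query": "What is the refund policy?",
        "llm_answer": "Refunds are available within 30 days of purchase.",
        "contexts": "Policy document: returns are accepted within 30 days.",
    },
]

with open("evaluation_set.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["query", "llm_answer", "contexts"])
    writer.writeheader()
    writer.writerows(rows)
```

Keep the file within the supported formats (CSV, XLSX) and under the 200 MB limit before uploading.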

Stop evaluation set creation

You can stop evaluation set creation if it is being created automatically. To stop evaluation set creation, do the following:

  1. Click [Create evaluation] in the Evaluations menu.
  2. Select automatic in the dataset settings when creating the evaluation.
  3. Complete the creation of the evaluation.
  4. Check that the progress status shows creating evaluation set in the evaluation list.
  5. Click the rag_common_button1.png button for the evaluation whose evaluation set creation you want to stop.
  6. Click the [Stop] button in the progress status section of the evaluation information.
  7. Under stop evaluation, click the [Apply] button.
  8. The progress status changes to dataset creation stopped, and evaluation set creation is stopped.
Caution
  • Charges may apply for progress made before stopping evaluation sets.
  • Stopped evaluation set creation cannot be restarted. You must create the evaluation again.

2. Upload evaluation set

To upload an evaluation set file, follow these steps:

  1. In the Evaluations menu, check the evaluation set in the evaluation list.
  2. Click the [Upload] button.
  3. Upload the evaluation dataset file in the evaluation data upload section.
    • Template download: downloads a template file with sample evaluation set data
    • Supported formats: CSV, XLSX
    • File size: up to 200 MB
  4. Click the [Apply] button.

Stop evaluation progress

To stop the evaluation progress for an evaluation dataset file, follow these steps:

  1. Upload the evaluation set file in the evaluation list under the Evaluations menu.
  2. The progress status changes to evaluating.
  3. Click the rag_common_button1.png button for the evaluation progress you want to stop.
  4. Click the [Stop] button in the progress status section of the evaluation information.
  5. Under stop evaluation, click the [Apply] button.
  6. The progress status changes to stopped, and evaluation progress is stopped.
Caution
  • Charges may apply for progress made before stopping.
  • Stopped evaluation progress cannot be restarted. You must create the evaluation again.

3. Check evaluation results

To run an evaluation, you must upload data that the model can treat as the correct reference. The system collects documents with additional information, along with user-created questions (queries) and LLM-generated answers produced via the RAG system. Based on the collected data, the accuracy of the LLM answers is quantified.

To check the evaluation result, follow these steps:

  1. In the Evaluations menu, check the progress status in the evaluation list.
  2. Confirm that the status changes to evaluating immediately after uploading the evaluation set file.
  3. When evaluation is complete, the progress status changes to evaluation completed.
  4. Click the [Download] button for evaluation results in the evaluation list.
  5. Download the evaluation results in CSV format.
    • query: evaluation target question
    • llm_answer: LLM-generated answer
    • retrieval_context: collected search results
    • result: evaluation result value. For more information, see evaluation criteria.
    • success: true/false judgment on the result
Note

If the progress status does not change after some time, refresh the browser.
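Once downloaded, the results file can be summarized programmatically. The sketch below computes a pass rate from the success column; the in-memory sample data is hypothetical, and in practice you would open the downloaded CSV instead.

```python
import csv
import io

# Sketch: compute the pass rate from an evaluation results CSV.
# Column names follow the results description above; the sample
# rows below are placeholders standing in for a downloaded file.
sample = """query,llm_answer,retrieval_context,result,success
What is the refund policy?,Refunds within 30 days.,Policy doc excerpt,0.92,true
How do I reset my password?,Use the settings page.,FAQ doc excerpt,0.41,false
"""

results = list(csv.DictReader(io.StringIO(sample)))
passed = sum(1 for r in results if r["success"].lower() == "true")
print(f"pass rate: {passed}/{len(results)}")
```

The same loop can be extended to group failures by query for review.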

Evaluation criteria

RAG service evaluates based on the following criteria:

  • Groundedness: evaluates how much the generated answer is based on the search results. If the answer includes inappropriate information without referencing collected documents, the groundedness score decreases. Groundedness does not measure answer correctness. Even if errors exist in indexed data, if the answer is based on collected data, groundedness scores are high.
    • Score range: 0.0 to 1.0
  • Context Relevancy: uses a judge model to evaluate the relevance between the retrieved sentences and the question. Shorter, more relevant retrieved sentences score higher; longer or less relevant sentences score lower.
    • Score range: 0.0 to 1.0
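When comparing evaluation runs, it can help to aggregate the per-row scores into averages. The sketch below assumes per-row metric values in the 0.0 to 1.0 range described above; the metric keys and sample values are hypothetical.

```python
# Sketch: average per-row metric scores (0.0 to 1.0) across an
# evaluation run. The keys and values below are placeholders; real
# scores come from the downloaded evaluation results.
scores = [
    {"groundedness": 0.95, "context_relevancy": 0.80},
    {"groundedness": 0.60, "context_relevancy": 0.70},
]

avg = {
    metric: sum(s[metric] for s in scores) / len(scores)
    for metric in scores[0]
}
print(avg)
```

Comparing these averages before and after an indexing change shows whether answers became more grounded or retrieval became more relevant.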