OI RAG Evaluator: A Comprehensive Evaluation Framework for Retrieval-Augmented Generation Systems

June 10, 2025

3 Minutes Read

1. Introduction

The OI RAG Evaluator is a framework for assessing the performance and quality of Retrieval-Augmented Generation (RAG) systems. It provides a holistic assessment that covers the quality of the overall response as well as the retriever and the generator individually.

2. RAG System Overview

Before diving into the evaluator, let’s briefly review the components of a RAG system:

[Figure: components of a RAG system]
  1. Retriever: Finds relevant documents from a knowledge base based on the user’s query.
  2. Generator: Uses the retrieved documents and the query to generate a response.
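To make the two roles concrete, here is a minimal sketch of a retriever and a generator. Everything in it is an illustrative assumption: the in-memory KNOWLEDGE_BASE, the word-overlap ranking, and the generate stub that simply echoes the retrieved text stand in for a real vector store and a real language model.

```python
from typing import List, Set

# Hypothetical in-memory knowledge base; a real system would use a document store.
KNOWLEDGE_BASE = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Nile is a river in Africa.",
]


def _tokens(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, k: int = 1) -> List[str]:
    """Retriever stand-in: rank documents by word overlap with the query."""
    query_tokens = _tokens(query)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_tokens & _tokens(doc)),
        reverse=True,
    )
    return ranked[:k]


def generate(query: str, docs: List[str]) -> str:
    """Generator stand-in: echoes the retrieved context instead of calling an LLM."""
    return " ".join(docs)


def rag_answer(query: str) -> str:
    """Run the retriever, then pass its output to the generator."""
    return generate(query, retrieve(query))
```

Calling rag_answer("What is the capital of France?") retrieves the sentence about Paris and returns it as the response. The query, the retrieved documents, and the response are exactly the artifacts the evaluator later inspects.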

3. OI RAG Evaluator Components

Our evaluator consists of three key components:

[Figure: OI RAG Evaluator components]
  • Claim Extractor: Identifies key claims from the generated response and ground truth.
  • Claim Checker: Verifies the extracted claims against the retrieved context and ground truth.
  • Metric Calculator: Computes various metrics based on the extracted and checked claims.

This diagram illustrates the relationship between the ground truth, model response, retrieved chunks, and the various metrics we use to evaluate the RAG system’s performance.

[Figure: relationship between ground truth, model response, retrieved chunks, and the evaluation metrics]
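The first two components can be sketched with deliberately naive stand-ins. This is an assumption-heavy illustration, not how the OI RAG Evaluator is implemented: the sentence-splitting extract_claims, the token-overlap is_supported, and the 0.8 threshold are all hypothetical, and a production extractor or checker would more likely rely on a language model.

```python
import re
from typing import Dict, List, Set


def extract_claims(text: str) -> List[str]:
    """Naive Claim Extractor: treat every sentence as one claim."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim: str, reference: str, threshold: float = 0.8) -> bool:
    """Naive Claim Checker: supported when most claim tokens occur in the reference.

    The threshold is an arbitrary illustrative value.
    """
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & _tokens(reference)) / len(claim_tokens)
    return overlap >= threshold


def check_claims(claims: List[str], reference: str) -> Dict[str, bool]:
    """Check each claim against one reference text (context or ground truth)."""
    return {claim: is_supported(claim, reference) for claim in claims}
```

The same check_claims call can be pointed at the retrieved context or at the ground truth, which is what lets the Metric Calculator separate retriever problems from generator problems.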

4. Evaluation Metrics

The OI RAG Evaluator uses a comprehensive set of metrics to assess different aspects of the RAG system’s performance. These metrics are grouped into three categories:

Category | Metrics | Description
Overall Metrics | Precision, Recall, F1 | Measure the overall accuracy of the generated response
Retriever Metrics | Claim Recall, Context Precision | Assess the quality of the retrieved documents
Generator Metrics | Context Utilization, Noise Sensitivity, Hallucination, Self-Knowledge, Faithfulness | Evaluate the generator’s performance and behavior

4.1 Metric Descriptions

  1. Overall Metrics
    • Precision: Proportion of claims in the generated response that are correct
    • Recall: Proportion of ground truth claims covered in the response
    • F1: Harmonic mean of precision and recall
  2. Retriever Metrics
    • Claim Recall: Proportion of ground truth claims supported by retrieved documents
    • Context Precision: Relevance of retrieved documents to the query
  3. Generator Metrics
    • Context Utilization: How well the generator uses the retrieved context
    • Noise Sensitivity: Generator’s resilience to irrelevant information
    • Hallucination: Tendency to generate unfounded information
    • Self-Knowledge: Ability to rely on its own knowledge when appropriate
    • Faithfulness: Consistency of the response with the retrieved context
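As a rough illustration of how such metrics can be derived from checked claims, the sketch below computes a subset of them from lists of booleans. The function name compute_metrics, its parameters, and in particular the formulas for faithfulness and hallucination are assumptions made for this example; the descriptions above do not pin down exact formulas.

```python
from typing import Dict, List


def _ratio(flags: List[bool]) -> float:
    """Share of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def compute_metrics(
    resp_correct: List[bool],     # per response claim: correct w.r.t. ground truth?
    resp_in_context: List[bool],  # per response claim: supported by retrieved context?
    gt_in_response: List[bool],   # per ground truth claim: covered by the response?
    gt_in_context: List[bool],    # per ground truth claim: supported by retrieved docs?
) -> Dict[str, float]:
    if len(resp_correct) != len(resp_in_context):
        raise ValueError("both response-claim lists must describe the same claims")

    precision = _ratio(resp_correct)
    recall = _ratio(gt_in_response)
    # Harmonic mean of precision and recall, guarded against division by zero.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Assumption: a hallucinated claim is neither correct nor backed by the context.
    hallucinated = [not c and not s for c, s in zip(resp_correct, resp_in_context)]

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "claim_recall": _ratio(gt_in_context),
        "faithfulness": _ratio(resp_in_context),  # assumed definition
        "hallucination": _ratio(hallucinated),    # assumed definition
    }
```

For a response with four claims of which two are correct, and a ground truth with three claims of which two are covered, this yields a precision of 0.5, a recall of 2/3, and an F1 of 4/7.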

5. Evaluation Process

The OI RAG Evaluator follows a systematic process to assess RAG system performance:

  1. The user submits RAG results, including queries, retrieved documents, generated responses, and ground truth.
  2. The Claim Extractor identifies key claims from the generated responses and ground truth.
  3. The Claim Checker verifies these claims against the retrieved context and ground truth.
  4. The Metric Calculator computes various metrics based on the extracted and checked claims.
  5. The Evaluator aggregates the results and provides a comprehensive evaluation report.
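The first and last steps of this process can be sketched as a small input record plus an aggregation function. The RagResult fields mirror the inputs listed in step 1, but the class itself, the score_sample callback (a placeholder for steps 2 to 4), and the choice of a plain per-metric average are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class RagResult:
    """One submitted sample (hypothetical schema mirroring step 1)."""
    query: str
    retrieved_documents: List[str]
    response: str
    ground_truth: str


def aggregate(per_sample: List[Dict[str, float]]) -> Dict[str, float]:
    """Average every metric across samples; assumes all samples share the same keys."""
    if not per_sample:
        return {}
    return {name: mean(s[name] for s in per_sample) for name in per_sample[0]}


def evaluate(
    results: List[RagResult],
    score_sample: Callable[[RagResult], Dict[str, float]],
) -> Dict[str, float]:
    """Score each sample with the supplied callback, then aggregate into a report."""
    return aggregate([score_sample(r) for r in results])
```

Keeping the per-sample scoring behind a callback means the aggregation logic stays the same regardless of how claims are extracted and checked.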

6. Future Directions

  1. Domain-Specific Metrics: Develop metrics tailored to specific domains or use cases.
  2. Integration with GENAI Studio: Develop plugins for our GENAI Studio platform so the evaluator fits directly into development workflows.

 

Islam Almersawi
