OI RAG Evaluator: A Comprehensive Evaluation Framework for Retrieval-Augmented Generation Systems

1. Introduction

The OI RAG Evaluator is a framework designed to assess the performance and quality of Retrieval-Augmented Generation (RAG) systems. Our evaluator aims to provide a holistic assessment of RAG systems, covering various aspects of their performance.

2. RAG System Overview

Before diving into the evaluator, let’s briefly review the components of a RAG […]
OI Performance Benchmark Technical Review

To evaluate the inference capabilities of a large language model (LLM), we focus on two key metrics: latency and throughput.

Latency

Latency measures the time it takes for an LLM to generate a response to a user’s prompt. It is a critical indicator of a language model’s speed and significantly impacts a user’s perception of […]
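The two metrics can be measured with a simple timing wrapper. This is a minimal sketch, not the benchmark's actual harness: `generate` is a hypothetical stand-in for a real LLM call, and throughput here is computed as generated tokens divided by wall-clock time.

```python
import time

def generate(prompt: str) -> list[str]:
    """Hypothetical stand-in for an LLM generation call.

    In a real benchmark this would invoke the model; here we just
    split the prompt so the timing code has something to measure.
    """
    return prompt.split()

def measure(prompt: str) -> tuple[float, float]:
    """Return (latency in seconds, throughput in tokens per second)."""
    start = time.perf_counter()
    tokens = generate(prompt)
    latency = time.perf_counter() - start
    throughput = len(tokens) / latency if latency > 0 else float("inf")
    return latency, throughput

latency, throughput = measure("measure how fast this model responds")
print(f"latency={latency:.6f}s throughput={throughput:.1f} tok/s")
```

In practice you would also distinguish time-to-first-token (what the user feels) from total generation time, and average over many requests rather than a single call.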
Decoding LLM Inference Math: Your Step-by-Step Guide

Understanding the math behind LLM inference is crucial knowledge for anyone working in LLMOps. The high prices of the GPUs used for LLM inference put GPU utilization optimization at the top of our priority list, so I’ll walk through memory utilization for LLM […]
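The core of that memory math is two terms: the model weights (parameters × bytes per parameter) and the KV cache (2 tensors per layer × heads × head dimension × sequence length × batch size × bytes per element). A minimal sketch, assuming fp16 (2 bytes) and a Llama-7B-like shape for the example numbers:

```python
def model_weights_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory: parameter count times bytes per parameter (fp16 = 2 bytes)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_el: int = 2) -> float:
    """KV cache memory: 2 (K and V) x layers x heads x head_dim x seq_len x batch."""
    return 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per_el / 1e9

# Example with assumed Llama-7B-like dimensions: 32 layers, 32 heads, head_dim 128.
weights = model_weights_gb(7)                  # 14.0 GB of fp16 weights
kv = kv_cache_gb(32, 32, 128, 2048, 1)         # ~1.07 GB for one 2048-token sequence
print(f"weights={weights:.2f} GB, kv_cache={kv:.2f} GB")
```

The KV cache term is the one that grows with sequence length and batch size, which is why it usually dominates GPU-utilization planning at high concurrency.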