vLLM Judge

A lightweight library for LLM-as-a-Judge evaluations using vLLM-hosted (or any OpenAI-API-compatible) models. Evaluate LLM inputs & outputs at scale with just a few lines of code. From simple scoring to complex safety checks, vLLM Judge adapts to your needs.

Features

  • 🚀 Simple Interface: Single evaluate() method that adapts to any use case

  • 🔧 Template Support: Dynamic evaluations with template variables

  • 🎯 Pre-built Metrics: 20+ ready-to-use evaluation metrics

  • 💬 Conversation Support: Evaluate entire multi-turn conversations

  • 🛡️ Model-Specific Support: Seamlessly works with specialized models like Llama Guard 3 & Granite Guardian 3.2 without breaking their trained formats

  • ⚡ High Performance: Async-first design enables high-throughput evaluations

  • 🌐 API Mode: Run as a REST API service
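The template-variable feature above amounts to substituting runtime values into a criteria string before it reaches the judge model. A minimal stdlib-only sketch of the idea (illustrative only; `render_criteria` and the brace placeholder syntax are assumptions, not the library's actual API):

```python
# Hypothetical illustration of template-variable substitution for dynamic
# evaluation criteria; not vllm-judge's actual implementation.

def render_criteria(template: str, **variables: str) -> str:
    """Fill {placeholders} in a criteria template with runtime values."""
    return template.format(**variables)

template = "appropriateness of tone for a {audience} audience in a {domain} context"
criteria = render_criteria(template, audience="non-technical", domain="healthcare")
print(criteria)
# -> appropriateness of tone for a non-technical audience in a healthcare context
```

The same rendered string can then be passed as the `criteria` argument of an evaluation call, so one template serves many evaluation scenarios.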

📦 Source Code: GitHub Repository | 🐛 Issues: Report Bugs | 📖 PyPI: Package

Getting Started

Installation

Basic installation with pip:

pip install vllm-judge

For detailed installation instructions, prerequisites, and environment setup, see Installation Guide.

Your First Evaluation

import asyncio

from vllm_judge import Judge

async def main():
    # Initialize with vLLM server URL
    judge = Judge.from_url("http://vllm-server:8000")

    # Simple evaluation
    result = await judge.evaluate(
        content="The Earth orbits around the Sun.",
        criteria="scientific accuracy"
    )
    print(f"Decision: {result.decision}")
    print(f"Reasoning: {result.reasoning}")

# evaluate() is a coroutine, so it must run inside an event loop
asyncio.run(main())
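Because the judge talks to an OpenAI-compatible endpoint, an evaluation like the one above ultimately boils down to a single chat-completions request whose prompt carries the criteria. A rough stdlib-only sketch of such a payload (the prompt wording, model name, and response schema here are assumptions for illustration, not what vLLM Judge actually sends):

```python
import json

# Hypothetical judge request against an OpenAI-compatible
# /v1/chat/completions endpoint; the prompt text is illustrative,
# not vllm-judge's actual template.
def build_judge_request(model: str, content: str, criteria: str) -> dict:
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    f"You are an impartial judge. Evaluate the user's text "
                    f"for {criteria}. Reply with JSON: "
                    f'{{"decision": ..., "reasoning": ...}}'
                ),
            },
            {"role": "user", "content": content},
        ],
        "temperature": 0.0,  # deterministic judgments
    }

payload = build_judge_request(
    model="my-judge-model",  # placeholder model name
    content="The Earth orbits around the Sun.",
    criteria="scientific accuracy",
)
print(json.dumps(payload, indent=2))
```

The library abstracts this request/response cycle away, parsing the model's reply into structured fields such as `decision` and `reasoning`.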

What’s Next?

📚 Learn the Basics

🔧 Advanced Usage

Ready to get started? Head to the Installation Guide or jump straight into the Quick Start Guide!