vLLM Judge
A lightweight library for LLM-as-a-Judge evaluations using vLLM-hosted (or any OpenAI API-compatible) models. Evaluate LLM inputs and outputs at scale with just a few lines of code. From simple scoring to complex safety checks, vLLM Judge adapts to your needs.
Features
- 🚀 Simple Interface: Single `evaluate()` method that adapts to any use case
- 🔧 Template Support: Dynamic evaluations with template variables
- 🎯 Pre-built Metrics: 20+ ready-to-use evaluation metrics (see the sketch after this list)
- 💬 Conversation Support: Evaluate entire conversations with multi-turn dialog
- 🛡️ Model-Specific Support: Works seamlessly with specialized models like Llama Guard 3 and Granite Guardian 3.2 without breaking their trained formats
- ⚡ High Performance: Async-first design enables high-throughput evaluations
- 🌐 API Mode: Run as a REST API service
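As a quick taste of the pre-built metrics, a named metric can stand in for free-form criteria. The snippet below is only a sketch: the `CODE_QUALITY` name and the `metric=` parameter are assumptions, so consult the Using Metrics guide linked below for the actual metric names and call signature.

```python
# Sketch only: CODE_QUALITY and the metric= parameter are assumed names;
# see the Using Metrics guide for the documented interface.
import asyncio
from vllm_judge import Judge, CODE_QUALITY

async def main():
    judge = Judge.from_url("http://vllm-server:8000")
    result = await judge.evaluate(
        content="def add(a, b): return a + b",
        metric=CODE_QUALITY,  # a named metric instead of free-form criteria
    )
    print(result.decision, result.reasoning)

asyncio.run(main())
```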
📦 Source Code: GitHub Repository | 🐛 Issues: Report Bugs | 📖 PyPI: Package
Getting Started
Installation
Basic installation with pip:
```bash
pip install vllm-judge
```
For detailed installation instructions, prerequisites, and environment setup, see Installation Guide.
Your First Evaluation
```python
import asyncio

from vllm_judge import Judge

async def main():
    # Initialize with the vLLM server URL
    judge = Judge.from_url("http://vllm-server:8000")

    # Simple evaluation
    result = await judge.evaluate(
        content="The Earth orbits around the Sun.",
        criteria="scientific accuracy"
    )
    print(f"Decision: {result.decision}")
    print(f"Reasoning: {result.reasoning}")

asyncio.run(main())
```
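Template variables make the criteria dynamic. As a sketch only, assuming a `template_vars` parameter and `{placeholder}` syntax (the Template Variables guide below documents the actual interface), the call inside `main()` above could become:

```python
# Sketch: template_vars and the {placeholder} syntax are assumptions;
# see the Template Variables guide for the documented interface.
result = await judge.evaluate(
    content="Quantum computers use qubits instead of bits.",
    criteria="scientific accuracy and clarity for a {audience} audience",
    template_vars={"audience": "high-school"},
)
```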
What’s Next?
📚 Learn the Basics
- Installation Guide - Detailed setup instructions and prerequisites
- Quick Start Guide - Get up and running with comprehensive examples in 5 minutes

🔧 Advanced Usage
- Basic Evaluation Guide - Deep dive into evaluation options and patterns
- Using Metrics - Explore all 20+ pre-built metrics
- Template Variables - Advanced templating features for dynamic evaluations
Ready to get started? Head to the Installation Guide or jump straight into the Quick Start Guide!