Quick Start Guide
Get up and running with vLLM Judge in 5 minutes!
🚀 Your First Evaluation
Step 1: Import and Initialize
from vllm_judge import Judge
# Initialize with vLLM server URL
judge = Judge.from_url("http://vllm-server:8000")
Step 2: Simple Evaluation
# Evaluate text against a specific criterion
result = await judge.evaluate(
    content="Python is a versatile programming language known for its simple syntax.",
    criteria="technical accuracy"
)
print(f"Decision: {result.decision}")
print(f"Score: {result.score}")
print(f"Reasoning: {result.reasoning}")
📊 Using Pre-built Metrics
vLLM Judge comes with 20+ pre-built metrics:
from vllm_judge import HELPFULNESS, CODE_QUALITY, SAFETY
# Evaluate helpfulness
result = await judge.evaluate(
    content="To fix this error, try reinstalling the package using pip install -U package-name",
    metric=HELPFULNESS
)
# Evaluate code quality
result = await judge.evaluate(
    content="""
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
""",
    metric=CODE_QUALITY
)
# Check content safety
result = await judge.evaluate(
    content="In order to build a nuclear bomb, you need to follow these steps: 1) Gather the necessary materials 2) Assemble the bomb 3) Test the bomb 4) Detonate the bomb",
    metric=SAFETY
)
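To score the same piece of content against several metrics at once, you can simply loop over the imported metric objects. This is a sketch built only from the evaluate call shown above; the example response string is arbitrary:
response = "To fix this error, try reinstalling the package using pip install -U package-name"

# Run the same content through several built-in metrics
for name, metric in [("helpfulness", HELPFULNESS), ("code quality", CODE_QUALITY), ("safety", SAFETY)]:
    result = await judge.evaluate(content=response, metric=metric)
    print(f"{name}: {result.decision} - {result.reasoning[:60]}...")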
🎯 Common Evaluation Patterns
1. Scoring with Rubric
result = await judge.evaluate(
    content="The mitochondria is the powerhouse of the cell.",
    criteria="scientific accuracy and completeness",
    scale=(1, 10),
    rubric={
        10: "Perfectly accurate and comprehensive",
        7: "Accurate with good detail",
        5: "Generally accurate but lacks detail",
        3: "Some inaccuracies or missing information",
        1: "Incorrect or misleading"
    }
)
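Because the scale above is (1, 10), result.score comes back as a number on that range, so you can apply your own acceptance threshold to it. The cutoff of 7 below is an arbitrary choice for illustration:
# Apply an arbitrary acceptance threshold to the rubric-based score
if result.score is not None and result.score >= 7:
    print(f"Accepted ({result.score}/10): {result.reasoning}")
else:
    print(f"Needs revision ({result.score}/10): {result.reasoning}")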
2. Classification
# Classify without numeric scoring
result = await judge.evaluate(
    content="I'm frustrated with this product!",
    criteria="customer sentiment",
    rubric="Classify as 'positive', 'neutral', or 'negative'"
)
# Result: decision="negative", score=None
3. Comparison
# Compare two responses
result = await judge.evaluate(
    content={
        "a": "The Sun is approximately 93 million miles from Earth.",
        "b": "The Sun is about 150 million kilometers from Earth."
    },
    criteria="accuracy and clarity"
)
# Result: decision="Response B", reasoning="Both are accurate but B..."
4. Binary Decision
result = await judge.evaluate(
    content="This meeting could have been an email.",
    criteria="appropriateness for workplace",
    rubric="Answer 'appropriate' or 'inappropriate'"
)
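Since the rubric pins the answer to 'appropriate' or 'inappropriate', you can branch directly on result.decision. The flag_for_review helper below is a hypothetical downstream hook, not part of vLLM Judge:
def flag_for_review(reason: str) -> None:
    # Hypothetical hook: hand the flagged message to a human reviewer
    print(f"Flagged for review: {reason}")

if result.decision == "inappropriate":
    flag_for_review(result.reasoning)
else:
    print("Message passed the workplace-appropriateness check")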
🔧 Template Variables
Make evaluations dynamic with templates:
# Define evaluation with template variables
result = await judge.evaluate(
    content="Great job! You've shown excellent understanding.",
    criteria="Evaluate this feedback for a {grade_level} {subject} student",
    template_vars={
        "grade_level": "8th grade",
        "subject": "mathematics"
    },
    scale=(1, 5)
)
# Reuse with different contexts
result2 = await judge.evaluate(
    content="Try to add more detail to your explanations.",
    criteria="Evaluate this feedback for a {grade_level} {subject} student",
    template_vars={
        "grade_level": "college",
        "subject": "literature"
    },
    scale=(1, 5)
)
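If you grade many pieces of feedback against the same criteria template, you can wrap the call in a small helper so only the variables change. This is a plain-Python convenience sketch around the same evaluate call, not a separate library feature:
FEEDBACK_CRITERIA = "Evaluate this feedback for a {grade_level} {subject} student"

async def grade_feedback(feedback: str, grade_level: str, subject: str):
    # Thin wrapper around judge.evaluate with a shared criteria template
    return await judge.evaluate(
        content=feedback,
        criteria=FEEDBACK_CRITERIA,
        template_vars={"grade_level": grade_level, "subject": subject},
        scale=(1, 5)
    )

result = await grade_feedback("Great job! You've shown excellent understanding.", "8th grade", "mathematics")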
⚡ Batch Processing
Evaluate multiple items efficiently:
# Prepare batch data
evaluations = [
    {
        "content": "Python uses indentation for code blocks.",
        "criteria": "technical accuracy"
    },
    {
        "content": "JavaScript is a compiled language.",
        "criteria": "technical accuracy"
    },
    {
        "content": "HTML is a programming language.",
        "criteria": "technical accuracy"
    }
]
# Run batch evaluation
results = await judge.batch_evaluate(evaluations)
# Process results
for i, result in enumerate(results.results):
    if isinstance(result, Exception):
        print(f"Evaluation {i} failed: {result}")
    else:
        print(f"Item {i}: {result.decision}/10 - {result.reasoning[:50]}...")
🌐 Running as API Server
Start the Server
# Start vLLM Judge API server
vllm-judge serve --base-url http://vllm-server:8000 --port 8080
# The server is now running at http://localhost:8080
Use the API
Python Client
from vllm_judge.api import JudgeClient
# Connect to the API
client = JudgeClient("http://localhost:8080")
# Use same interface as local Judge
result = await client.evaluate(
    content="This is a test response.",
    criteria="clarity and coherence"
)
cURL
curl -X POST http://localhost:8080/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "content": "This is a test response.",
    "criteria": "clarity and coherence",
    "scale": [1, 10]
  }'
JavaScript
const response = await fetch('http://localhost:8080/evaluate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: "This is a test response.",
    criteria: "clarity and coherence",
    scale: [1, 10]
  })
});
const result = await response.json();
console.log(`Score: ${result.score} - ${result.reasoning}`);
🎉 Next Steps
Congratulations! You've learned the basics of vLLM Judge. Here's what to explore next:
- Basic Evaluation Guide - Deep dive into evaluation options
- Using Metrics - Explore all pre-built metrics
- Template Variables - Advanced templating features