Community Benchmarks
LLMxRay ships with 5 built-in benchmark suites (ARC, GSM8K, HellaSwag, MMLU-Pro, TruthfulQA). Community members can contribute additional suites to test models on specialized topics.
How to Contribute
- Fork the LLMxRay repository
- Create a JSON file in the community-benchmarks/ directory
- Follow the schema defined in SCHEMA.md
- Submit a Pull Request
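As a starting point for the JSON file, the following is a minimal sketch that generates a suite skeleton with placeholder questions to fill in by hand. The helper names `build_suite_skeleton` and `write_suite`, the `<suite id>.json` file name, and the zero-padded question ID pattern are assumptions for illustration, inferred from the example in the JSON Format section. SCHEMA.md remains the authoritative reference.

```python
import json
from pathlib import Path


def build_suite_skeleton(suite_id: str, name: str, description: str,
                         n_questions: int = 20) -> dict:
    """Return a suite dict whose questions are placeholders to edit by hand."""
    questions = []
    for i in range(1, n_questions + 1):
        questions.append({
            # ID pattern "<suite id>_001" is assumed from the README example.
            "id": f"{suite_id}_{i:03d}",
            "category": suite_id,
            "subcategory": "topic",
            "question": "TODO: question text?",
            "choices": ["A) TODO", "B) TODO", "C) TODO", "D) TODO"],
            "correctAnswer": "A",  # placeholder; replace with the real answer
            "difficulty": "medium",
        })
    return {
        "id": suite_id,
        "name": name,
        "description": description,
        "builtIn": False,  # serialized as JSON false
        "questions": questions,
    }


def write_suite(suite: dict, directory: str = "community-benchmarks") -> Path:
    """Write the suite as JSON; the "<id>.json" file name is an assumption."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{suite['id']}.json"
    path.write_text(json.dumps(suite, indent=2, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return path
```

Calling `write_suite(build_suite_skeleton("my-suite", "My Custom Suite", "What this suite tests"))` from the repository root would produce a file to edit before opening the Pull Request.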
Requirements
- Minimum 20 questions per suite
- All questions must have verifiable correct answers
- Exactly 4 answer choices per question (A, B, C, D)
- Set "builtIn": false
- Include a mix of difficulty levels when applicable
JSON Format
```json
{
  "id": "my-suite",
  "name": "My Custom Suite",
  "description": "What this suite tests",
  "builtIn": false,
  "questions": [
    {
      "id": "my-suite_001",
      "category": "my-suite",
      "subcategory": "topic",
      "question": "The question text?",
      "choices": [
        "A) First option",
        "B) Second option",
        "C) Third option",
        "D) Fourth option"
      ],
      "correctAnswer": "B",
      "difficulty": "medium"
    }
  ]
}
```
Suite Ideas
Looking for inspiration? Here are some areas not covered by the built-in suites:
- Programming — Code comprehension and debugging questions
- Logic puzzles — Formal logic and deductive reasoning
- Language understanding — Idioms, ambiguity, pragmatics
- Domain-specific — Medical, legal, financial, or engineering knowledge
- Multilingual — Questions in languages other than English
Community Suites
No community suites submitted yet. Be the first!
See the example suite for a working reference.