Creating Custom Benchmark Suites

Create your own benchmark suites to test models on topics that matter to you — no JSON editing required. The Benchmark Suite Builder lets you craft questions manually or use a local AI model to generate them automatically.

Sidebar item: Benchmark Builder
Route: /benchmark (Builder tab)

Manual Question Entry

To build a suite entirely by hand:

1. Open the Benchmark page and click the Builder tab.
2. Enter a Suite name and an optional Description.
3. Click Add Question to create a new entry.
4. Fill in the question text.
5. Add the answer choices (A, B, C, D). You can add more choices if needed.
6. Select the correct answer from the dropdown.
7. Optionally assign a category to group questions by subject.
8. Repeat steps 3-7 for each question you want to add.
9. Click Save Suite when you are done.

Your custom suite now appears alongside the built-in suites and is ready to run.
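
As a rough mental model, the fields collected in the steps above map onto one small record per question. The sketch below is illustrative only: the class and field names (`Question`, `Suite`, `correct_index`) and the 0-based index are assumptions made for the example, not the Builder's actual internal schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    """One multiple-choice entry (hypothetical field names)."""
    text: str
    choices: List[str]
    correct_index: int                # assumed 0-based position of the correct choice
    category: Optional[str] = None    # optional, used to group questions by subject

    def __post_init__(self) -> None:
        # Reject entries that could never be scored correctly.
        if not self.text.strip():
            raise ValueError("question text is empty")
        if len(self.choices) < 2:
            raise ValueError("a question needs at least two choices")
        if not 0 <= self.correct_index < len(self.choices):
            raise ValueError("correct_index does not point at a choice")


@dataclass
class Suite:
    """A named collection of questions with an optional description."""
    name: str
    description: str = ""
    questions: List[Question] = field(default_factory=list)

    def add_question(self, question: Question) -> None:
        self.questions.append(question)


suite = Suite(name="Networking basics", description="Hand-written TCP/IP questions")
suite.add_question(
    Question(
        text="Which transport protocol provides reliable, ordered delivery?",
        choices=["UDP", "TCP", "ICMP", "ARP"],
        correct_index=1,
        category="transport",
    )
)
```

Validating at construction time mirrors what the form does implicitly: a question cannot be saved with a correct answer that is not one of its choices.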

AI-Assisted Generation

Let a local model do the heavy lifting:

1. In the Builder tab, switch to the AI Generate section.
2. Enter a topic, for example "French Revolution", "TCP/IP networking", or "organic chemistry".
3. Choose the number of questions to generate.
4. Select a difficulty level (easy, medium, hard).
5. Pick the model that will generate the questions from the dropdown.
6. Click Generate. The model streams questions in real time.

Once generation completes:

1. Review each question for accuracy.
2. Edit any question text, choices, or correct answers that need fixing.
3. Deselect questions that are poorly formed or incorrect. Deselected questions are not added to the suite.
4. Click Add to Suite to merge the approved questions into your suite.
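
Conceptually, the review step above is a filter followed by an append. The following is a minimal sketch of that idea, assuming each generated question carries a hypothetical `selected` flag that defaults to true. The function name and dictionary keys are invented for illustration and do not describe the app's real code.

```python
from typing import Dict, List


def merge_approved(suite_questions: List[Dict], generated: List[Dict]) -> List[Dict]:
    """Return the suite's questions plus every generated question still selected."""
    approved = [
        {key: value for key, value in q.items() if key != "selected"}  # drop the review flag
        for q in generated
        if q.get("selected", True)  # questions count as selected unless deselected
    ]
    # Build a new list so the original suite is left untouched.
    return suite_questions + approved


existing = [
    {
        "text": "In which year did the French Revolution begin?",
        "choices": ["1776", "1789", "1804", "1815"],
        "correct_index": 1,
    }
]

generated = [
    {
        "text": "Which Paris fortress was stormed on 14 July 1789?",
        "choices": ["The Bastille", "The Louvre", "Les Invalides", "The Conciergerie"],
        "correct_index": 0,
        "selected": True,
    },
    {
        # Ambiguous wording: deselected during review, so it is left out.
        "text": "Who was the king?",
        "choices": ["Louis XVI", "Louis XIV"],
        "correct_index": 0,
        "selected": False,
    },
]

merged = merge_approved(existing, generated)
print(len(merged))
```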

Editing Suites

Click the Edit button on any custom suite to modify it. You can:

Rename the suite or update its description
Add, remove, or reorder questions
Edit individual question text, choices, and correct answers
Change question categories

Built-in suites (ARC, GSM8K, etc.) cannot be edited, but you can duplicate them to create a modified version.

Exporting

Click the Export button on a custom suite to download it as a JSON file. The exported file contains:

Suite name and description
All questions with their choices, correct answer index, and category
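
This page does not show the exported schema, so the example below is only a guess at its shape based on the fields listed above. The key names (`name`, `description`, `questions`, `text`, `choices`, `correct_index`, `category`) and the 0-based index are assumptions; open a real export to confirm them. The sketch loads a suite and reports structural problems before the file is shared.

```python
import json
from typing import List

# Hypothetical export; real files may use different key names.
exported = """
{
  "name": "Organic chemistry basics",
  "description": "Hand-reviewed starter set",
  "questions": [
    {
      "text": "Which functional group defines an alcohol?",
      "choices": ["Carbonyl", "Hydroxyl", "Amino", "Carboxyl"],
      "correct_index": 1,
      "category": "functional groups"
    }
  ]
}
"""


def find_problems(suite: dict) -> List[str]:
    """Return human-readable descriptions of structural problems in a suite."""
    problems = []
    if not suite.get("name"):
        problems.append("suite has no name")
    for i, q in enumerate(suite.get("questions", [])):
        choices = q.get("choices", [])
        index = q.get("correct_index")
        # The correct answer must be an integer pointing at an existing choice.
        if not isinstance(index, int) or not 0 <= index < len(choices):
            problems.append(f"question {i}: correct index {index!r} is out of range")
        # Two identical choices make the question ambiguous.
        if len(set(choices)) != len(choices):
            problems.append(f"question {i}: duplicate choices")
    return problems


suite = json.loads(exported)
problems = find_problems(suite)
print(problems or "no structural problems found")
```

A check like this only catches structural mistakes. It cannot tell whether the answer marked correct is actually right, so manual review is still needed.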

Share exported JSON files with others or contribute them to the Community Benchmarks collection.

Tips

Use specific topics: "Mitochondrial electron transport chain" generates better questions than "biology". The more focused the topic, the more accurate the AI output tends to be.
Mix manual and AI: generate a batch with AI, then add a few hand-crafted edge-case questions to round out coverage.
Always validate answers: AI-generated correct answers can be wrong. Double-check each one before saving.
Larger models tend to produce better questions: a 14B-parameter model typically generates more accurate and nuanced questions than a 3B model. Use the best model you have available.
Categorize consistently: use the same category names across questions so the results breakdown is meaningful.
Start small: generate 5-10 questions first to check quality before committing to a larger batch.
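
The advice on consistent categories can be checked mechanically on an exported suite. The sketch below groups category spellings that differ only by letter case or surrounding whitespace. It assumes each question is a dictionary with an optional `category` key, which is a hypothetical name rather than a confirmed part of the export format.

```python
from collections import defaultdict
from typing import Dict, List, Set


def inconsistent_categories(questions: List[dict]) -> Dict[str, Set[str]]:
    """Group category spellings that differ only by case or surrounding whitespace."""
    variants: Dict[str, Set[str]] = defaultdict(set)
    for q in questions:
        category = q.get("category")
        if category:
            # Normalize for comparison, but remember the spelling actually used.
            variants[category.strip().lower()].add(category)
    # Keep only the normalized names that appear under more than one spelling.
    return {key: names for key, names in variants.items() if len(names) > 1}


questions = [
    {"text": "Q1", "category": "Networking"},
    {"text": "Q2", "category": "networking "},
    {"text": "Q3", "category": "Security"},
    {"text": "Q4", "category": "Security"},
    {"text": "Q5"},  # uncategorized questions are ignored
]

report = inconsistent_categories(questions)
for key, names in report.items():
    print(key, sorted(names))
```

Here "Networking" and "networking " would show up as two separate rows in a per-category breakdown, which is the kind of split the tip is meant to prevent.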
