Added a filter toggle for comparing LLM outputs. Added DeepSeek results for comparison.

4 jobs from improved-chat-prompt in 1 minute 53 seconds (queued for 3 seconds)
Status   Job ID   Name            Labels                               Duration   Coverage
Build
passed   #41418   build_mod       minecraft                            01:53
Test
manual   #41419   gpt-3.5-turbo   minecraft, allowed to fail, manual
manual   #41420   gpt-4o          minecraft, allowed to fail, manual
manual   #41421   llama3-8b       minecraft, allowed to fail, manual