Adding protect behavior to LLM tests

4 jobs from protect-goal-v2 in 2 minutes 22 seconds (queued for 1 second)
Status  Job ID  Name           Labels                              Duration  Coverage
Build
passed  #38973  build_mod      minecraft                           02:22
Test
manual  #38974  gpt-3.5-turbo  minecraft, allowed to fail, manual
manual  #38975  gpt-4o         minecraft, allowed to fail, manual
manual  #38976  llama3-8b      minecraft, allowed to fail, manual