Improved LLM unit tests for UNFLEE (trying to prevent failures for brave archer)

Status    Job ID  Name                                                Coverage

Build
canceled  #42056  build_mod (minecraft)

Test
skipped   #42057  gpt-3.5-turbo (minecraft, allowed to fail, manual)
skipped   #42058  gpt-4o (minecraft, allowed to fail, manual)
skipped   #42059  llama3-8b (minecraft, allowed to fail, manual)