New experimental prompt strategy to only show valid and invalid combinations of behavior emojis, instead of example phrases. Works well with gpt-4o-mini and llama3-70b. Does not work great with gpt-3.5-turbo. Also added negative LLM unit tests checks for behaviors (FOLLOW and not ATTACK).