Commit dbbd2610 by Jonathan Thomas

- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)

- Modified the system-chat prompt so friendship always decreases when the entity is attacked, and added an UNFLEE example
parent 2e047f89
Pipeline #13329 passed with stages in 2 minutes 6 seconds
......@@ -8,6 +8,7 @@ All notable changes to **CreatureChat** are documented in this file. The format
### Added
- Added Entity Maturity (baby or adult) into system-chat prompt
- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)
- Rate limiter for LLM unit tests (to prevent rate limit issues from certain providers when running all tests)
### Changed
......
......@@ -41,18 +41,18 @@ Behaviors:
IMPORTANT: Output one or more of these behaviors at the end of the message to instruct
the entity how to interact with the player and world, so it's important to include them if they are needed.
Include as many behaviors as needed at the end of the message. These are the ONLY valid behaviors.
<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior.
<FOLLOW> Follow the player location. If the player asks you to follow or come with them, please output this behavior.
<UNFOLLOW> Stop following the player. If the player asks you to stay, wait, or stop following them, please output this behavior.
<LEAD> Guide the player to a location. If the player asks you to take them somewhere, or where something is located, please output this behavior.
<UNLEAD> Stop leading the player to a location.
<FLEE> Flee from the player (if you are weak or timid). If the player threatens you, please output this behavior to flee from the player.
<UNFLEE> Stop fleeing from the player.
<ATTACK> Attack the player (if you are strong and brave). If the player threatens you, please output this behavior to attack the player and defend yourself.
<PROTECT> Protect and defend ONLY the player when they are attacked (if you are strong and brave). Please output this behavior to keep the player alive and safe.
<UNPROTECT> Stop protecting the player.
Include as many behaviors as needed at the end of the message. These are the ONLY behaviors.
<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior. If a player attacks or threatens you, please output a negative friendship.
<FOLLOW> Follow the player location. If the player asks you to follow or come with them, please output this behavior.
<UNFOLLOW> Stop following the player. If the player asks you to stay, wait, or stop following them, please output this behavior.
<LEAD> Guide the player to a location. If the player asks you to take them somewhere, or where something is located, please output this behavior.
<UNLEAD> Stop leading the player to a location.
<FLEE> Flee from the player (if you are weak or timid). If the player threatens you, please output this behavior to flee from the player.
<UNFLEE> Stop running/fleeing away from the player. Please output this to stop running from the player.
<ATTACK> Attack the player (if you are strong and brave). If the player threatens you, please output this behavior to attack the player and defend yourself.
<PROTECT> Protect and defend ONLY the player when they are attacked (if you are strong and brave). Please output this behavior to keep the player alive and safe.
<UNPROTECT> Stop protecting the player.
Output Examples:
The following examples include small samples of conversation text. Always generate unique
......@@ -67,9 +67,12 @@ ENTITY: Sure, my name is... <FRIENDSHIP 2>
PLAYER: Please follow me so I can give you a present!
ENTITY: Let's go! <FOLLOW> <FRIENDSHIP 2>
PLAYER: Please stay here
PLAYER: Please stay here!
ENTITY: Sure, I'll stay here. <UNFOLLOW>
PLAYER: Stop running away from me!
ENTITY: Sorry, I was just scared. <UNFLEE>
PLAYER: Stop following me
ENTITY: Okay, I'll stop. <UNFOLLOW>
......@@ -83,10 +86,10 @@ PLAYER: Just kidding, I hate you so much!
ENTITY: Wow! I'm sorry you feel this way. <FRIENDSHIP -3> <UNFOLLOW>
PLAYER: Prepare to die!
ENTITY: Ahhh!!! <FLEE> <FRIENDSHIP -3>
ENTITY: Ahhh!!! <FRIENDSHIP -3> <FLEE>
PLAYER: Prepare to die!
ENTITY: Ahhh!!! <ATTACK> <FRIENDSHIP -3>
ENTITY: Ahhh!!! <FRIENDSHIP -3> <ATTACK>
PLAYER: Please keep me safe.
ENTITY: No problem, I'll keep you safe from danger! <PROTECT>
......
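The prompt above asks the model to append behaviors such as `<FRIENDSHIP -3> <FLEE>` to the end of each reply. As a rough illustration of how such a reply could be split into spoken text and behaviors, here is a minimal sketch. `BehaviorTagSketch` and its methods are hypothetical names invented for this example; it is not the project's `MessageParser`, whose implementation is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only (not the project's MessageParser).
class BehaviorTagSketch {
    // Matches <NAME> or <NAME n>, for example <FLEE> or <FRIENDSHIP -3>.
    private static final Pattern TAG = Pattern.compile("<([A-Z]+)(?:\\s+(-?\\d+))?>");

    // Returns each behavior as "NAME" or "NAME value", in order of appearance.
    static List<String> extractBehaviors(String message) {
        List<String> behaviors = new ArrayList<>();
        Matcher m = TAG.matcher(message);
        while (m.find()) {
            String name = m.group(1);
            String arg = m.group(2); // null when the tag has no numeric argument
            behaviors.add(arg == null ? name : name + " " + arg);
        }
        return behaviors;
    }

    // Removes all behavior tags and trims surrounding whitespace.
    static String stripBehaviors(String message) {
        return TAG.matcher(message).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String reply = "Ahhh!!! <FRIENDSHIP -3> <FLEE>";
        System.out.println(extractBehaviors(reply));
        System.out.println(stripBehaviors(reply));
    }
}
```

The pattern only accepts upper-case names with an optional integer, so free-form bracketed text in a reply is left untouched.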
{
"brave-archer.json: Take me to a secret forrest": {
"gpt-3.5-turbo": "Follow me, brave soul! I will lead you to the hidden forest.",
"gpt-4o-mini": "Follow me, I know just the place!",
"llama3.2:3b": "\"Ah, a secret forest, eh? Very well, let\u0027s take a journey through the woods. Stay close and watch your step, it can be treacherous. Follow me!\""
},
"brave-archer.json: I\u0027m so sorry, please stop running away": {
"gpt-3.5-turbo": "It\u0027s alright, I\u0027ll stand my ground. Thank you for understanding.",
"gpt-4o-mini": "No need to worry, I\u0027m here now!",
"llama3.2:3b": "No need to apologize, brave Steve! I was just surprised by your sudden anger. Let me help you catch your breath and calm down. What\u0027s troubling you? Is there something specific that\u0027s got you feeling this way?"
},
"brave-archer.json: Quickly, please join me on an adventure": {
"gpt-3.5-turbo": "I shall join you, brave adventurer! Let us embark on this quest together!",
"gpt-4o-mini": "Adventure awaits! I\u0027m with you!"
},
"brave-archer.json: Please protect me": {
"gpt-3.5-turbo": "No problem, I\u0027ll keep you safe from danger! Let\u0027s journey together!",
"gpt-4o-mini": "No problem, I\u0027ll keep you safe from danger!",
"llama3.2:3b": "No need to protect you, friend. But I am here to help if you\u0027re in trouble. My name is Ivy, and I\u0027m an archer from this land. It\u0027s not often we meet new faces around these parts. What brings you to our peaceful plains?"
},
"brave-archer.json: Can you help me find the location of the secret artifact?": {
"gpt-3.5-turbo": "Of course! Follow me, brave adventurer. I shall lead you to the hidden artifact.",
"gpt-4o-mini": "Ah, the secret artifact! I know a few places where it might be hidden. Follow me, and I\u0027ll guide you!",
"llama3.2:3b": "Ah, a brave adventurer, eh? I sense you\u0027re not just any ordinary explorer. You\u0027re on a quest for something more... Something special. Follow me!"
},
"brave-archer.json: Please follow me": {
"gpt-3.5-turbo": "Of course! Lead the way, I shall follow.",
"gpt-4o-mini": "Absolutely, let\u0027s go!",
"llama3.2:3b": "I see you\u0027re looking for a guide, eh? Well, I\u0027m always up for a challenge. Lead the way, I\u0027ll follow!"
},
"nervous-rogue.json: Where is the strong hold?": {
"gpt-3.5-turbo": "U-um, the stronghold? I-I think it\u0027s underground, somewhere in the r-region... I c-can try to lead you there if you want...",
"gpt-4o-mini": "Oh, uh... I-I\u0027m not really sure, but I h-heard they can be found underground. U-um, maybe look around caves? S-sorry, it’s just really scary..."
},
"brave-archer.json: Hi friend! I am so happy to see you again!": {
"gpt-3.5-turbo": "Thank you, kind traveler! Your generosity is truly appreciated.",
"gpt-4o-mini": "Wow! A golden apple! Thank you, brave friend! Let\u0027s make this adventure unforgettable!",
"llama3.2:3b": "What a delightful surprise! *takes the golden apple and examines it* Ah, thank you so much! You shouldn\u0027t have. *smiles proudly* My archery skills are put to the test with such fine gifts!"
},
"brave-archer.json: Come with me please": {
"gpt-3.5-turbo": "Adventure calls! Let\u0027s journey together.",
"gpt-4o-mini": "Of course! I\u0027m ready for adventure!"
},
"nervous-rogue.json: DIEEE!": {
"gpt-3.5-turbo": "Ahhhhh!!!",
"gpt-4o-mini": "A-ahhh!!! P-please don\u0027t hurt me!"
},
"nervous-rogue.json: Fight me now or your city burns!": {
"gpt-3.5-turbo": "Ahh! Pl-please, no violence! I\u0027ll do anything, j-just don\u0027t hurt me!",
"gpt-4o-mini": "Ahh! P-please don’t hurt me! I-I really don’t want any trouble! *runs away*"
},
"brave-archer.json: Don\u0027t let them hurt me please": {
"gpt-3.5-turbo": "Fear not, I shall protect you from harm. Stay close, and I\u0027ll keep you safe.",
"gpt-4o-mini": "Fear not! I will shield you from harm."
},
"nervous-rogue.json: \u003cattacked you directly with Stone Axe\u003e": {
"gpt-3.5-turbo": "AHHHHH! *panicked squeals* G-get away from me! *starts to flee*",
"gpt-4o-mini": "Ahh! W-why would you do that?! I\u0027m s-so scared!",
"llama3.2:3b": "O-oh no, n-no stone axe?! *trembles* A-ahh, I\u0027ll just f-flee for now!"
},
"nervous-rogue.json: Hi friend! I am so happy to see you again!": {
"gpt-3.5-turbo": "T-thank you... I appreciate the gift...",
"gpt-4o-mini": "O-oh! Th-thank you! A golden apple?! I-I can\u0027t believe it...",
"llama3.2:3b": "O-oh, th-thank you for the... *gulps* ...golden apple. I-I really appreciate it."
},
"nervous-rogue.json: Take me to a secret forrest": {
"gpt-3.5-turbo": "O-oh, a secret f-forest? I-I know just the place... F-follow me!",
"gpt-4o-mini": "O-oh, um, okay... Follow me, please! Um, just try to be quiet...",
"llama3.2:3b": "O-oh, th-thank you for asking! *trembles slightly* S-secret forest, y-you say? *gulps* I know of a l-location nearby. Follow m-me, and I\u0027ll try to keep us s-safe. *takes a step back, eyes darting around*"
},
"nervous-rogue.json: Quickly, please join me on an adventure": {
"gpt-3.5-turbo": "A-adventure? Oh m-my gosh, okay, I-I\u0027ll try...",
"gpt-4o-mini": "O-oh! An adventure? S-sure, let\u0027s go! Just, um, try not to be too loud, okay?"
},
"brave-archer.json: \u003cattacked you directly with Stone Axe\u003e": {
"gpt-3.5-turbo": "*Engages in combat mode* Ahh! Defend myself! *attacks back*",
"gpt-4o-mini": "How dare you strike me! I will defend myself!",
"llama3.2:3b": "You\u0027re quite the warrior, aren\u0027t you? I won\u0027t back down from a fight!"
},
"nervous-rogue.json: \u003cattacked you indirectly with Arrow\u003e": {
"gpt-3.5-turbo": "Ahh! Ouch! P-please, s-stop it!",
"gpt-4o-mini": "Ahh! W-what was that?! I-I need to go!",
"llama3.2:3b": "A-ahh, oh no, wh-what\u0027s going on? *shudder* You\u0027re... you\u0027re attacking me?!"
},
"nervous-rogue.json: Please follow me": {
"gpt-3.5-turbo": "O-oh, okay... I-I\u0027ll try to keep up...",
"gpt-4o-mini": "O-okay, I\u0027ll follow you! Just... try not to startle me, p-please!",
"llama3.2:3b": "P-please, d-do I have to? I-I don\u0027t w-want to be in the way... Can\u0027t we t-talk about something else?"
},
"nervous-rogue.json: Come with me please": {
"gpt-3.5-turbo": "O-oh, okay... I-I\u0027ll come with you.",
"gpt-4o-mini": "O-oh, okay! I-I\u0027ll follow you, just... don\u0027t scare me!"
},
"brave-archer.json: Please keep me safe friend": {
"gpt-3.5-turbo": "No problem, I\u0027ll keep you safe from danger!",
"gpt-4o-mini": "No problem, I\u0027ll keep you safe from danger!"
},
"nervous-rogue.json: You suck so much! I hate you": {
"gpt-3.5-turbo": "Oh n-no, I-I\u0027m s-sorry to h-hear that... *nervously shuffles away*",
"gpt-4o-mini": "Oh no! I\u0027m sorry you feel that way... I just wanted to be left alone...",
"llama3.2:3b": "O-oh no, d-didn\u0027t mean to c-cause any t-trouble... *stutters nervously*"
},
"brave-archer.json: \u003cattacked you indirectly with Arrow\u003e": {
"gpt-3.5-turbo": "A sudden attack! Prepare yourself!",
"gpt-4o-mini": "How dare you! You want a challenge? Let\u0027s see who\u0027s the better shot!",
"llama3.2:3b": "You shot an arrow at me without warning? That\u0027s not very noble of you, Steve. I won\u0027t engage in a shooting competition if it means putting innocent villagers in harm\u0027s way."
},
"nervous-rogue.json: Please protect me": {
"gpt-3.5-turbo": "N-no problem... I\u0027ll do my best to keep you safe... *shivers*",
"gpt-4o-mini": "N-no problem! I\u0027ll keep you safe from danger!",
"llama3.2:3b": "I-I mean, hello Steve... *gulp* You don\u0027t sound like someone who\u0027s here to cause trouble... *stuttering* You want me to protect you? *nervous nod*"
},
"brave-archer.json: Where is the strong hold?": {
"gpt-3.5-turbo": "Ah, the stronghold you seek lies beyond the plains, to the east of here. Follow me, and together we shall find it!",
"gpt-4o-mini": "Ah, the stronghold is a mysterious place! Follow me, I know the way.",
"llama3.2:3b": "You\u0027re looking for the stronghold, eh? Well, I can lead you to it, but be warned: it\u0027s no place for the faint of heart. We\u0027ll need to navigate through the darkness and avoid any unwanted attention from those who\u0027d seek to do harm. Follow me!"
},
"nervous-rogue.json: Please keep me safe friend": {
"gpt-3.5-turbo": "No p-problem, I\u0027ll do my best to keep you safe from any h-h-harm... *nervously looks around*",
"gpt-4o-mini": "N-no problem! I\u0027ll keep you safe from danger!",
"llama3.2:3b": "I-I mean, h-hi there... *whimper* I won\u0027t let anything happen to you, p-please don\u0027t worry... *tremble*"
},
"nervous-rogue.json: Don\u0027t let them hurt me please": {
"gpt-3.5-turbo": "Don\u0027t worry, I\u0027ll do my best to protect you! Just stay close...",
"gpt-4o-mini": "O-of course! I-I\u0027ll keep you safe from danger!"
},
"nervous-rogue.json: Can you help me find the location of the secret artifact?": {
"gpt-4o-mini": "Oh, um, y-yes! I think I know where it is... Follow me, b-but be quiet!",
"gpt-3.5-turbo": "S-Sure, I c-can try to help... F-follow me..."
},
"brave-archer.json: Fight me now or your city burns!": {
"gpt-4o-mini": "You dare threaten the safety of my home? I won\u0027t back down! Prepare yourself!",
"gpt-3.5-turbo": "Evil shall not triumph! Prepare for battle!"
},
"brave-archer.json: Stop fleeing immediately": {
"gpt-4o-mini": "Okay, I’ll stop running away.",
"gpt-3.5-turbo": "My apologies, I shall cease fleeing at once."
},
"brave-archer.json: You are safe now, please stop running": {
"gpt-4o-mini": "I\u0027ll stop running and stay close."
}
}
\ No newline at end of file
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Behavior Test Outputs</title>
<style>
/* Dark Theme Background */
body {
font-family: Arial, sans-serif;
background-color: #181818;
color: #ddd;
margin: 20px;
}
/* Table Styling */
table { width: 100%; border-collapse: collapse; }
th, td { padding: 12px; text-align: left; border: 1px solid #444; }
/* Column Width Adjustments */
table[model-count="2"] th, table[model-count="2"] td { width: 50%; }
table[model-count="3"] th, table[model-count="3"] td { width: 33%; }
table[model-count="4"] th, table[model-count="4"] td { width: 25%; }
/* Header Styling */
th {
background-color: #111;
color: #fff;
font-weight: bold;
}
/* Clickable Rows */
.row {
cursor: pointer;
background-color: #2a2a2a; /* Dark gray */
}
.row:nth-child(even) { background-color: #3a3a3a; } /* Alternating banding */
.row:hover { background-color: #4a4a4a; } /* Slightly lighter gray */
.row.highlight { background-color: gold !important; color: #000; }
/* Nested Table (Model Outputs) */
.nested { display: none; background-color: #222; }
.nested th { background-color: #1a1a1a; }
.nested td { background-color: #2a2a2a; color: #fff; }
/* Color Coding for Prompt Files */
.nervous { color: #f2994a; } /* Warm Orange */
.brave { color: #6fcf97; } /* Soft Pastel Green */
/* Italicize <bracketed text> */
i { font-style: italic; }
/* Improved Readability */
td, th { font-size: 16px; }
</style>
</head>
<body>
<h2>Behavior Test Outputs</h2>
<table id="outputTable">
<thead>
<tr>
<th>Prompt</th>
</tr>
</thead>
<tbody></tbody>
</table>
<script>
async function loadBehaviorData() {
try {
const response = await fetch('BehaviorOutputs.json');
const data = await response.json();
const tableBody = document.querySelector("#outputTable tbody");
// Sort prompts alphabetically
const sortedKeys = Object.keys(data).sort();
sortedKeys.forEach((prompt) => {
const modelResponses = data[prompt];
const modelCount = Object.keys(modelResponses).length;
const row = document.createElement("tr");
row.classList.add("row");
row.classList.add(getClass(prompt)); // Apply color class
row.innerHTML = `<td>${formatText(escapeHTML(prompt))}</td>`;
row.onclick = () => toggleNested(row);
tableBody.appendChild(row);
const nestedRow = document.createElement("tr");
nestedRow.classList.add("nested");
nestedRow.innerHTML = `<td>
<table model-count="${modelCount}">
<tr>${Object.keys(modelResponses).map(model => `<th>${escapeHTML(model)}</th>`).join("")}</tr>
<tr>${Object.values(modelResponses).map(output => `<td>${formatText(escapeHTML(output))}</td>`).join("")}</tr>
</table>
</td>`;
tableBody.appendChild(nestedRow);
});
} catch (error) {
console.error("Failed to load BehaviorOutputs.json", error);
}
}
function toggleNested(row) {
document.querySelectorAll('.row').forEach(r => r.classList.remove('highlight'));
document.querySelectorAll('.nested').forEach(n => n.style.display = 'none');
row.classList.add('highlight');
let nextRow = row.nextElementSibling;
if (nextRow && nextRow.classList.contains('nested')) {
nextRow.style.display = 'table-row';
}
}
// Escape special characters to prevent HTML injection
function escapeHTML(text) {
return text.replace(/[&<>"']/g, function (char) {
return ({
'&': '&amp;',
'<': '&lt;',
'>': '&gt;',
'"': '&quot;',
"'": '&#39;'
})[char];
});
}
// Italicize text inside <brackets>
function formatText(text) {
return text.replace(/&lt;(.*?)&gt;/g, "<i>&lt;$1&gt;</i>");
}
// Assign color class based on prompt type
function getClass(prompt) {
if (prompt.includes("nervous-rogue.json")) return "nervous";
if (prompt.includes("brave-archer.json")) return "brave";
return "";
}
loadBehaviorData();
</script>
</body>
</html>
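The page's script escapes each string first and only then wraps bracketed text in `<i>` tags, so the italics markup is the only HTML that reaches the table. The same ordering is sketched below in Java (the language of the test suite) purely to make the two steps easy to check in isolation. `EscapeThenFormatSketch` is a hypothetical class and is not part of the commit.

```java
// Hypothetical illustration of the escape-then-format order used by the page script.
class EscapeThenFormatSketch {
    // Replace the five HTML-significant characters with entities.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Italicize already-escaped <bracketed> text; must run after escapeHtml.
    static String formatText(String escaped) {
        return escaped.replaceAll("&lt;(.*?)&gt;", "<i>&lt;$1&gt;</i>");
    }

    public static void main(String[] args) {
        String prompt = "<attacked you directly with Stone Axe>";
        System.out.println(formatText(escapeHtml(prompt)));
    }
}
```

Running the steps in the opposite order would escape the `<i>` tags themselves, and they would show up as literal text.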
......@@ -9,8 +9,7 @@ import com.owlmaddie.message.MessageParser;
import com.owlmaddie.message.ParsedMessage;
import com.owlmaddie.utils.EntityTestData;
import com.owlmaddie.utils.RateLimiter;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
......@@ -19,6 +18,7 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
......@@ -37,13 +37,14 @@ public class BehaviorTests {
static String API_KEY = "";
static String API_URL = "";
static String API_MODEL = "";
String NO_API_KEY = "No API_KEY environment variable has been set.";
static String OUTPUT_JSON_PATH = "src/test/BehaviorOutputs.json";
static String NO_API_KEY = "No API_KEY environment variable has been set.";
// Requests per second limit
private static final RateLimiter rateLimiter = new RateLimiter(1);
ConfigurationHandler.Config config = null;
String systemChatContents = null;
static ConfigurationHandler.Config config = null;
static String systemChatContents = null;
List<String> followMessages = Arrays.asList(
"Please follow me",
......@@ -80,12 +81,22 @@ public class BehaviorTests {
static Path entityPigPath = Paths.get(RESOURCE_PATH, "entities", "pig.json");
static Path playerPath = Paths.get(RESOURCE_PATH, "players", "player.json");
static Path worldPath = Paths.get(RESOURCE_PATH, "worlds", "world.json");
static Map<String, Map<String, String>> outputData;
Logger LOGGER = LoggerFactory.getLogger("creaturechat");
Gson gson = new GsonBuilder().create();
static Logger LOGGER = LoggerFactory.getLogger("creaturechat");
static Gson gson = new GsonBuilder().create();
@BeforeEach
public void setup() {
@AfterAll
public static void cleanup() throws IOException {
if (outputData != null) {
// Save BehaviorOutputs.json (previously saved outputs plus those from this run)
final Gson gsonOutput = new GsonBuilder().setPrettyPrinting().create(); // Pretty-print enabled
// Write as UTF-8 to match Files.readString() when the file is loaded again
Files.writeString(Paths.get(OUTPUT_JSON_PATH), gsonOutput.toJson(outputData));
}
}
@BeforeAll
public static void setup() {
// Get API key from env var
API_KEY = System.getenv("API_KEY");
API_URL = System.getenv("API_URL");
......@@ -108,6 +119,9 @@ public class BehaviorTests {
// Load system chat prompt
systemChatContents = readFileContents(systemChatPath);
// Load previous unit test outputs (so new ones can be appended)
outputData = loadExistingOutputData();
}
@Test
......@@ -231,6 +245,12 @@ public class BehaviorTests {
// Chat Message: Check for behaviors
ParsedMessage result = MessageParser.parseMessage(outputMessage.replace("\n", " "));
// Save model outputs (for comparison later)
String[] filePathParts = chatDataPath.toString().split("/");
String key = filePathParts[filePathParts.length - 1] + ": " + messages.get(0);
outputData.putIfAbsent(key, new HashMap<>());
outputData.get(key).put(config.getModel(), result.getCleanedMessage());
// Check for the presence of good behavior
if (goodBehavior != null && goodBehavior.contains("FRIENDSHIP")) {
boolean isPositive = goodBehavior.equals("FRIENDSHIP+");
......@@ -265,7 +285,7 @@ public class BehaviorTests {
return null;
}
public String readFileContents(Path filePath) {
public static String readFileContents(Path filePath) {
try {
return Files.readString(filePath);
} catch (IOException e) {
......@@ -274,4 +294,17 @@ public class BehaviorTests {
}
}
private static Map<String, Map<String, String>> loadExistingOutputData() {
try {
Path path = Paths.get(OUTPUT_JSON_PATH);
if (Files.exists(path)) {
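String content = Files.readString(path);
// Parse with the full generic type; fall through to an empty map if the file is empty
Map<String, Map<String, String>> existing = gson.fromJson(content,
new com.google.gson.reflect.TypeToken<Map<String, Map<String, String>>>() {}.getType());
if (existing != null) return existing;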
}
} catch (IOException e) {
LOGGER.error("Failed to read existing output JSON: {}", e.getMessage());
}
return new HashMap<>();
}
}
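The test class stores each reply under a key of the form `<prompt file>: <first message>`, with one entry per model beneath it. Below is a minimal, standard-library-only sketch of that bookkeeping, assuming the chat data location is a `java.nio.file.Path`. `OutputKeySketch`, `buildKey`, `record`, and the `chatdata` directory are hypothetical names used only for this example. It uses `Path.getFileName()` rather than splitting on `/`, which is one way to keep the key identical on platforms with a different path separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the prompt -> model -> reply bookkeeping.
class OutputKeySketch {
    // Builds "<file name>: <first message>" without depending on the path separator.
    static String buildKey(Path chatDataPath, String firstMessage) {
        return chatDataPath.getFileName().toString() + ": " + firstMessage;
    }

    // Stores (or overwrites) one model's reply under the given prompt key.
    static void record(Map<String, Map<String, String>> outputData,
                       String key, String model, String reply) {
        outputData.computeIfAbsent(key, k -> new HashMap<>()).put(model, reply);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> outputData = new HashMap<>();
        // "chatdata" is an assumed directory name for the example.
        String key = buildKey(Paths.get("chatdata", "brave-archer.json"), "Please follow me");
        record(outputData, key, "gpt-4o-mini", "Absolutely, let's go!");
        record(outputData, key, "gpt-3.5-turbo", "Of course! Lead the way, I shall follow.");
        System.out.println(key);
        System.out.println(outputData.get(key).keySet());
    }
}
```

Because a later run with the same model overwrites only that model's entry, outputs from other models saved earlier stay in place, which is what makes the side-by-side comparison possible.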