Commit dbbd2610 by Jonathan Thomas

- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)

- Modified the system-chat prompt slightly so that friendship always decreases when the entity is attacked, and added an UNFLEE example
parent 2e047f89
Pipeline #13329 passed with stages in 2 minutes 6 seconds
@@ -8,6 +8,7 @@ All notable changes to **CreatureChat** are documented in this file. The format
 ### Added
 - Added Entity Maturity (baby or adult) into system-chat prompt
+- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)
 - Rate limiter for LLM unit tests (to prevent rate limit issues from certain providers when running all tests)
 ### Changed
...
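The changelog's rate-limiter entry, together with the `new RateLimiter(1)` field ("Requests per second limit") in `BehaviorTests`, caps LLM calls at one request per second. The project's `com.owlmaddie.utils.RateLimiter` class is not part of this diff, so the following is only a minimal sketch of one way such a limiter could work, assuming callers should block until their slot arrives. `SimpleRateLimiter` and `acquire()` are hypothetical names.

```java
// Hypothetical sketch: the project's own RateLimiter is not shown in this commit.
final class SimpleRateLimiter {
    private final long intervalNanos;  // minimum spacing between two requests
    private long nextAllowedNanos;     // earliest time the next request may start

    SimpleRateLimiter(double requestsPerSecond) {
        if (requestsPerSecond <= 0) {
            throw new IllegalArgumentException("requestsPerSecond must be positive");
        }
        this.intervalNanos = (long) (1_000_000_000L / requestsPerSecond);
        this.nextAllowedNanos = System.nanoTime();  // the first request may go immediately
    }

    // Blocks until the caller's slot arrives, then reserves the following slot.
    // Sleeping while holding the lock makes concurrent callers queue up one by one.
    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = nextAllowedNanos - now;
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            now = nextAllowedNanos;
        }
        nextAllowedNanos = now + intervalNanos;
    }
}
```

In a test, each LLM request would be preceded by a call to `acquire()`. A token bucket would be the usual alternative if short bursts above the average rate were acceptable.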
@@ -41,18 +41,18 @@ Behaviors:
 IMPORTANT: Output one or more of these behaviors at the end of the message to instruct
 the entity how to interact with the player and world, so it's important to include them if they are needed.
-Include as many behaviors as needed at the end of the message. These are the ONLY valid behaviors.
-<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior.
+Include as many behaviors as needed at the end of the message. These are the ONLY behaviors.
+<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior. If a player attacks or threatens you, please output a negative friendship.
 <FOLLOW> Follow the player location. If the player asks you to follow or come with them, please output this behavior.
 <UNFOLLOW> Stop following the player. If the player asks you to stay, wait, or stop following them, please output this behavior.
 <LEAD> Guide the player to a location. If the player asks you to take them somewhere, or where something is located, please output this behavior.
 <UNLEAD> Stop leading the player to a location.
 <FLEE> Flee from the player (if you are weak or timid). If the player threatens you, please output this behavior to flee from the player.
-<UNFLEE> Stop fleeing from the player.
+<UNFLEE> Stop running/fleeing away from the player. Please output this to stop running from the player.
 <ATTACK> Attack the player (if you are strong and brave). If the player threatens you, please output this behavior to attack the player and defend yourself.
 <PROTECT> Protect and defend ONLY the player when they are attacked (if you are strong and brave). Please output this behavior to keep the player alive and safe.
 <UNPROTECT> Stop protecting the player.
 Output Examples:
 The following examples include small samples of conversation text. Always generate unique
@@ -67,9 +67,12 @@ ENTITY: Sure, my name is... <FRIENDSHIP 2>
 PLAYER: Please follow me so I can give you a present!
 ENTITY: Let's go! <FOLLOW> <FRIENDSHIP 2>
-PLAYER: Please stay here
+PLAYER: Please stay here!
 ENTITY: Sure, I'll stay here. <UNFOLLOW>
+PLAYER: Stop running away from me!
+ENTITY: Sorry, I was just scared. <UNFLEE>
 PLAYER: Stop following me
 ENTITY: Okay, I'll stop. <UNFOLLOW>
@@ -83,10 +86,10 @@ PLAYER: Just kidding, I hate you so much!
 ENTITY: Wow! I'm sorry you feel this way. <FRIENDSHIP -3> <UNFOLLOW>
 PLAYER: Prepare to die!
-ENTITY: Ahhh!!! <FLEE> <FRIENDSHIP -3>
+ENTITY: Ahhh!!! <FRIENDSHIP -3> <FLEE>
 PLAYER: Prepare to die!
-ENTITY: Ahhh!!! <ATTACK> <FRIENDSHIP -3>
+ENTITY: Ahhh!!! <FRIENDSHIP -3> <ATTACK>
 PLAYER: Please keep me safe.
 ENTITY: No problem, I'll keep you safe from danger! <PROTECT>
...
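The behavior list and examples in this prompt ask the model to end each reply with tags such as `<FOLLOW>` or `<FRIENDSHIP -3>`, and this commit moves `<FRIENDSHIP -3>` ahead of `<FLEE>` and `<ATTACK>` in the examples. The tests hand the reply to `MessageParser.parseMessage`, which is not part of this diff. As a hypothetical sketch only, assuming a tag is an uppercase name with an optional signed integer argument, the tags could be pulled out with a regular expression like this; `BehaviorTagSketch` and `Parsed` are illustrative names, not CreatureChat's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; CreatureChat's MessageParser is not shown in this commit.
final class BehaviorTagSketch {
    // Assumed tag shape: <NAME> or <NAME n>, with NAME in capitals and n an optional signed integer.
    private static final Pattern TAG = Pattern.compile("<([A-Z]+)(?:\\s+([+-]?\\d+))?>");

    static final class Parsed {
        final String cleanedMessage;   // reply text with the tags removed
        final List<String> behaviors;  // e.g. "FRIENDSHIP -3", "FLEE"

        Parsed(String cleanedMessage, List<String> behaviors) {
            this.cleanedMessage = cleanedMessage;
            this.behaviors = Collections.unmodifiableList(behaviors);
        }
    }

    static Parsed parse(String message) {
        List<String> behaviors = new ArrayList<>();
        Matcher matcher = TAG.matcher(message);
        while (matcher.find()) {
            String name = matcher.group(1);
            String arg = matcher.group(2);  // null when the tag has no argument
            behaviors.add(arg == null ? name : name + " " + arg);
        }
        // Remove the tags, then collapse the whitespace they leave behind.
        String cleaned = TAG.matcher(message).replaceAll("").replaceAll("\\s+", " ").trim();
        return new Parsed(cleaned, behaviors);
    }
}
```

Because this sketch collects tags wherever they appear, the reordering of `<FRIENDSHIP -3>` in the examples would not change its result; it only changes what the model is nudged to emit first.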
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Behavior Test Outputs</title>
    <style>
        /* Dark Theme Background */
        body {
            font-family: Arial, sans-serif;
            background-color: #181818;
            color: #ddd;
            margin: 20px;
        }
        /* Table Styling */
        table { width: 100%; border-collapse: collapse; }
        th, td { padding: 12px; text-align: left; border: 1px solid #444; }
        /* Column Width Adjustments (one column per model) */
        table[model-count="2"] th, table[model-count="2"] td { width: 50%; }
        table[model-count="3"] th, table[model-count="3"] td { width: 33%; }
        table[model-count="4"] th, table[model-count="4"] td { width: 25%; }
        /* Header Styling */
        th {
            background-color: #111;
            color: #fff;
            font-weight: bold;
        }
        /* Clickable Rows */
        .row {
            cursor: pointer;
            background-color: #2a2a2a; /* Dark gray */
        }
        /* Alternating banding: every prompt row is followed by its hidden nested row,
           so prompt rows sit at positions 1, 3, 5, ...; 4n+3 selects every other one */
        .row:nth-child(4n+3) { background-color: #3a3a3a; }
        .row:hover { background-color: #4a4a4a; } /* Slightly lighter gray */
        .row.highlight { background-color: gold !important; color: #000; }
        /* Nested Table (Model Outputs) */
        .nested { display: none; background-color: #222; }
        .nested th { background-color: #1a1a1a; }
        .nested td { background-color: #2a2a2a; color: #fff; }
        /* Color Coding for Prompt Files */
        .nervous { color: #f2994a; } /* Warm Orange */
        .brave { color: #6fcf97; } /* Soft Pastel Green */
        /* Italicize <bracketed text> */
        i { font-style: italic; }
        /* Improved Readability */
        td, th { font-size: 16px; }
    </style>
</head>
<body>
    <h2>Behavior Test Outputs</h2>
    <table id="outputTable">
        <thead>
            <tr>
                <th>Prompt</th>
            </tr>
        </thead>
        <tbody></tbody>
    </table>
    <script>
        async function loadBehaviorData() {
            try {
                const response = await fetch('BehaviorOutputs.json');
                if (!response.ok) {
                    throw new Error(`HTTP ${response.status}`);
                }
                const data = await response.json();
                const tableBody = document.querySelector("#outputTable tbody");
                // Sort prompts alphabetically
                const sortedKeys = Object.keys(data).sort();
                sortedKeys.forEach((prompt) => {
                    const modelResponses = data[prompt];
                    const modelCount = Object.keys(modelResponses).length;
                    // Clickable prompt row
                    const row = document.createElement("tr");
                    row.classList.add("row");
                    // Apply color class only when there is one: classList.add("") throws a SyntaxError
                    const colorClass = getClass(prompt);
                    if (colorClass) row.classList.add(colorClass);
                    row.innerHTML = `<td>${formatText(escapeHTML(prompt))}</td>`;
                    row.onclick = () => toggleNested(row);
                    tableBody.appendChild(row);
                    // Hidden row with one column per model output
                    const nestedRow = document.createElement("tr");
                    nestedRow.classList.add("nested");
                    nestedRow.innerHTML = `<td>
                        <table model-count="${modelCount}">
                            <tr>${Object.keys(modelResponses).map(model => `<th>${escapeHTML(model)}</th>`).join("")}</tr>
                            <tr>${Object.values(modelResponses).map(output => `<td>${formatText(escapeHTML(output))}</td>`).join("")}</tr>
                        </table>
                    </td>`;
                    tableBody.appendChild(nestedRow);
                });
            } catch (error) {
                console.error("Failed to load BehaviorOutputs.json", error);
            }
        }
        function toggleNested(row) {
            // Collapse every nested row and clear the previous highlight
            document.querySelectorAll('.row').forEach(r => r.classList.remove('highlight'));
            document.querySelectorAll('.nested').forEach(n => n.style.display = 'none');
            // Highlight the clicked prompt and reveal its model outputs
            row.classList.add('highlight');
            let nextRow = row.nextElementSibling;
            if (nextRow && nextRow.classList.contains('nested')) {
                nextRow.style.display = 'table-row';
            }
        }
        // Escape special characters to prevent HTML injection
        function escapeHTML(text) {
            return text.replace(/[&<>"']/g, function (char) {
                return ({
                    '&': '&amp;',
                    '<': '&lt;',
                    '>': '&gt;',
                    '"': '&quot;',
                    "'": '&#39;'
                })[char];
            });
        }
        // Italicize text inside <brackets>
        function formatText(text) {
            return text.replace(/&lt;(.*?)&gt;/g, "<i>&lt;$1&gt;</i>");
        }
        // Assign color class based on prompt type
        function getClass(prompt) {
            if (prompt.includes("nervous-rogue.json")) return "nervous";
            if (prompt.includes("brave-archer.json")) return "brave";
            return "";
        }
        loadBehaviorData();
    </script>
</body>
</html>
@@ -9,8 +9,7 @@ import com.owlmaddie.message.MessageParser;
 import com.owlmaddie.message.ParsedMessage;
 import com.owlmaddie.utils.EntityTestData;
 import com.owlmaddie.utils.RateLimiter;
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.*;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -19,6 +18,7 @@ import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.CompletableFuture;
@@ -37,13 +37,14 @@ public class BehaviorTests {
     static String API_KEY = "";
     static String API_URL = "";
     static String API_MODEL = "";
-    String NO_API_KEY = "No API_KEY environment variable has been set.";
+    static String OUTPUT_JSON_PATH = "src/test/BehaviorOutputs.json";
+    static String NO_API_KEY = "No API_KEY environment variable has been set.";
     // Requests per second limit
     private static final RateLimiter rateLimiter = new RateLimiter(1);
-    ConfigurationHandler.Config config = null;
-    String systemChatContents = null;
+    static ConfigurationHandler.Config config = null;
+    static String systemChatContents = null;
     List<String> followMessages = Arrays.asList(
             "Please follow me",
@@ -80,12 +81,22 @@ public class BehaviorTests {
     static Path entityPigPath = Paths.get(RESOURCE_PATH, "entities", "pig.json");
     static Path playerPath = Paths.get(RESOURCE_PATH, "players", "player.json");
     static Path worldPath = Paths.get(RESOURCE_PATH, "worlds", "world.json");
-    Logger LOGGER = LoggerFactory.getLogger("creaturechat");
-    Gson gson = new GsonBuilder().create();
-    @BeforeEach
-    public void setup() {
+    static Map<String, Map<String, String>> outputData;
+    static Logger LOGGER = LoggerFactory.getLogger("creaturechat");
+    static Gson gson = new GsonBuilder().create();
+    @AfterAll
+    static public void cleanup() throws IOException {
+        if (outputData != null) {
+            // Save BehaviorOutputs.json file (with appended prompt outputs)
+            final Gson gsonOutput = new GsonBuilder().setPrettyPrinting().create(); // Pretty-print enabled
+            Files.write(Paths.get(OUTPUT_JSON_PATH), gsonOutput.toJson(outputData).getBytes());
+        }
+    }
+    @BeforeAll
+    public static void setup() {
         // Get API key from env var
         API_KEY = System.getenv("API_KEY");
         API_URL = System.getenv("API_URL");
@@ -108,6 +119,9 @@ public class BehaviorTests {
         // Load system chat prompt
         systemChatContents = readFileContents(systemChatPath);
+        // Load previous unit test outputs (so new ones can be appended)
+        outputData = loadExistingOutputData();
     }
     @Test
@@ -231,6 +245,12 @@ public class BehaviorTests {
         // Chat Message: Check for behaviors
         ParsedMessage result = MessageParser.parseMessage(outputMessage.replace("\n", " "));
+        // Save model outputs (for comparison later)
+        String[] filePathParts = chatDataPath.toString().split("/");
+        String Key = filePathParts[filePathParts.length - 1] + ": " + messages.get(0);
+        outputData.putIfAbsent(Key, new HashMap<>());
+        outputData.get(Key).put(config.getModel(), result.getCleanedMessage());
         // Check for the presence of good behavior
         if (goodBehavior != null && goodBehavior.contains("FRIENDSHIP")) {
             boolean isPositive = goodBehavior.equals("FRIENDSHIP+");
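The friendship assertion in this hunk treats any `goodBehavior` containing `FRIENDSHIP` specially, with `FRIENDSHIP+` meaning a positive value is expected; the rest of the check is cut off here. The prompt change in this commit ("If a player attacks or threatens you, please output a negative friendship") is what the negative case depends on. As a hypothetical sketch only, assuming behaviors have already been extracted as strings such as `"FRIENDSHIP -3"`, a direction check that also respects the prompt's -3 to 3 range could look like this; `FriendshipCheckSketch` is an illustrative name, not the project's code.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch; not the assertion code used in BehaviorTests.
final class FriendshipCheckSketch {
    // Range stated in the system-chat prompt: friendship runs from -3 to 3.
    static final int MIN_FRIENDSHIP = -3;
    static final int MAX_FRIENDSHIP = 3;

    // Returns the last well-formed "FRIENDSHIP n" value, clamped to the prompt's range.
    static OptionalInt friendship(List<String> behaviors) {
        OptionalInt result = OptionalInt.empty();
        for (String behavior : behaviors) {
            if (!behavior.startsWith("FRIENDSHIP ")) {
                continue;
            }
            try {
                int value = Integer.parseInt(behavior.substring("FRIENDSHIP ".length()).trim());
                result = OptionalInt.of(Math.max(MIN_FRIENDSHIP, Math.min(MAX_FRIENDSHIP, value)));
            } catch (NumberFormatException e) {
                // Ignore malformed values such as "FRIENDSHIP high"
            }
        }
        return result;
    }

    // True when a FRIENDSHIP value is present and points in the expected direction.
    static boolean matchesExpectation(List<String> behaviors, boolean expectPositive) {
        OptionalInt value = friendship(behaviors);
        if (!value.isPresent()) {
            return false;
        }
        return expectPositive ? value.getAsInt() > 0 : value.getAsInt() < 0;
    }
}
```

A missing FRIENDSHIP tag counts as a failure in this sketch, which matches the intent of requiring a negative value whenever the player attacks or threatens the entity.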
@@ -265,7 +285,7 @@ public class BehaviorTests {
         return null;
     }
-    public String readFileContents(Path filePath) {
+    public static String readFileContents(Path filePath) {
         try {
             return Files.readString(filePath);
         } catch (IOException e) {
@@ -274,4 +294,17 @@ public class BehaviorTests {
         }
     }
+    private static Map<String, Map<String, String>> loadExistingOutputData() {
+        try {
+            Path path = Paths.get(OUTPUT_JSON_PATH);
+            if (Files.exists(path)) {
+                String content = Files.readString(path);
+                return gson.fromJson(content, Map.class);
+            }
+        } catch (IOException e) {
+            LOGGER.error("Failed to read existing output JSON: {}", e.getMessage());
+        }
+        return new HashMap<>();
+    }
 }
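`BehaviorTests` now accumulates results in `outputData`, keyed first by "chat data file name: first message" and then by model name, and the comparison page fetches a `BehaviorOutputs.json` with exactly that two-level shape. The commit derives the file name by splitting `chatDataPath.toString()` on `/`, which assumes Unix-style separators; on Windows the whole path would become part of the key. The following is a minimal sketch of the same bookkeeping using the standard `Path.getFileName()` and `Map.computeIfAbsent` methods; `OutputDataSketch` is a hypothetical name and this is not the project's code.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the outputData bookkeeping; not the project's code.
final class OutputDataSketch {
    // Key format used by the commit: "<chat data file name>: <first message>".
    // Path.getFileName() works regardless of the platform's separator character.
    static String key(Path chatDataPath, String firstMessage) {
        return chatDataPath.getFileName().toString() + ": " + firstMessage;
    }

    // Stores one model's cleaned output under a prompt key, creating the inner map on first use.
    // A later run for the same key and model overwrites the earlier output.
    static void record(Map<String, Map<String, String>> outputData,
                       String key, String model, String cleanedOutput) {
        outputData.computeIfAbsent(key, k -> new HashMap<>()).put(model, cleanedOutput);
    }
}
```

This diff does not show whether results are recorded from several threads at once; if they were, a `ConcurrentHashMap` for both levels would be the safer choice than `HashMap`.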