Commit dbbd2610 by Jonathan Thomas

- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)

- Modified the system-chat prompt so that friendship always decreases when the entity is attacked or threatened, and added an UNFLEE example
parent 2e047f89
Pipeline #13329 passed with stages in 2 minutes 6 seconds
@@ -8,6 +8,7 @@ All notable changes to **CreatureChat** are documented in this file. The format
 ### Added
 - Added Entity Maturity (baby or adult) into system-chat prompt
+- Added LLM Comparison HTML Output (for human eval of different LLMs with CreatureChat)
 - Rate limiter for LLM unit tests (to prevent rate limit issues from certain providers when running all tests)
 ### Changed
......
@@ -41,18 +41,18 @@ Behaviors:
 IMPORTANT: Output one or more of these behaviors at the end of the message to instruct
 the entity how to interact with the player and world, so it's important to include them if they are needed.
-Include as many behaviors as needed at the end of the message. These are the ONLY valid behaviors.
-<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior.
-<FOLLOW> Follow the player location. If the player asks you to follow or come with them, please output this behavior.
-<UNFOLLOW> Stop following the player. If the player asks you to stay, wait, or stop following them, please output this behavior.
-<LEAD> Guide the player to a location. If the player asks you to take them somewhere, or where something is located, please output this behavior.
-<UNLEAD> Stop leading the player to a location.
-<FLEE> Flee from the player (if you are weak or timid). If the player threatens you, please output this behavior to flee from the player.
-<UNFLEE> Stop fleeing from the player.
-<ATTACK> Attack the player (if you are strong and brave). If the player threatens you, please output this behavior to attack the player and defend yourself.
-<PROTECT> Protect and defend ONLY the player when they are attacked (if you are strong and brave). Please output this behavior to keep the player alive and safe.
-<UNPROTECT> Stop protecting the player.
+Include as many behaviors as needed at the end of the message. These are the ONLY behaviors.
+<FRIENDSHIP 0> Friendship starts as neutral (0 value). The range of friendship values is -3 to 3. If the player gains (or loses) your trust & friendship, output a new friendship value with this behavior. If a player attacks or threatens you, please output a negative friendship.
+<FOLLOW> Follow the player location. If the player asks you to follow or come with them, please output this behavior.
+<UNFOLLOW> Stop following the player. If the player asks you to stay, wait, or stop following them, please output this behavior.
+<LEAD> Guide the player to a location. If the player asks you to take them somewhere, or where something is located, please output this behavior.
+<UNLEAD> Stop leading the player to a location.
+<FLEE> Flee from the player (if you are weak or timid). If the player threatens you, please output this behavior to flee from the player.
+<UNFLEE> Stop running/fleeing away from the player. Please output this to stop running from the player.
+<ATTACK> Attack the player (if you are strong and brave). If the player threatens you, please output this behavior to attack the player and defend yourself.
+<PROTECT> Protect and defend ONLY the player when they are attacked (if you are strong and brave). Please output this behavior to keep the player alive and safe.
+<UNPROTECT> Stop protecting the player.
 Output Examples:
 The following examples include small samples of conversation text. Always generate unique
@@ -67,9 +67,12 @@ ENTITY: Sure, my name is... <FRIENDSHIP 2>
 PLAYER: Please follow me so I can give you a present!
 ENTITY: Let's go! <FOLLOW> <FRIENDSHIP 2>
-PLAYER: Please stay here
+PLAYER: Please stay here!
 ENTITY: Sure, I'll stay here. <UNFOLLOW>
+PLAYER: Stop running away from me!
+ENTITY: Sorry, I was just scared. <UNFLEE>
 PLAYER: Stop following me
 ENTITY: Okay, I'll stop. <UNFOLLOW>
@@ -83,10 +86,10 @@ PLAYER: Just kidding, I hate you so much!
 ENTITY: Wow! I'm sorry you feel this way. <FRIENDSHIP -3> <UNFOLLOW>
 PLAYER: Prepare to die!
-ENTITY: Ahhh!!! <FLEE> <FRIENDSHIP -3>
+ENTITY: Ahhh!!! <FRIENDSHIP -3> <FLEE>
 PLAYER: Prepare to die!
-ENTITY: Ahhh!!! <ATTACK> <FRIENDSHIP -3>
+ENTITY: Ahhh!!! <FRIENDSHIP -3> <ATTACK>
 PLAYER: Please keep me safe.
 ENTITY: No problem, I'll keep you safe from danger! <PROTECT>
......
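The prompt above asks the model to append behaviors such as `<FRIENDSHIP -3> <FLEE>` to the end of each message. As a rough illustration of how that output format can be split into display text and behaviors, here is a minimal Java sketch. The class `BehaviorTagSketch`, its method names, and its regular expression are hypothetical and written only for this example; this is not the project's `MessageParser`, whose implementation is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only (not CreatureChat's MessageParser).
public class BehaviorTagSketch {
    // Matches tags such as <FOLLOW> or <FRIENDSHIP -3>:
    // an upper-case name plus an optional integer argument.
    private static final Pattern TAG = Pattern.compile("<([A-Z]+)(?:\\s+(-?\\d+))?>");

    /** Returns the behaviors found in the message, in order of appearance. */
    public static List<String> extractBehaviors(String message) {
        List<String> behaviors = new ArrayList<>();
        Matcher m = TAG.matcher(message);
        while (m.find()) {
            String arg = m.group(2);
            behaviors.add(arg == null ? m.group(1) : m.group(1) + " " + arg);
        }
        return behaviors;
    }

    /** Returns the message with all behavior tags removed and whitespace trimmed. */
    public static String stripBehaviors(String message) {
        return TAG.matcher(message).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String output = "Ahhh!!! <FRIENDSHIP -3> <FLEE>";
        System.out.println(stripBehaviors(output));   // prints: Ahhh!!!
        System.out.println(extractBehaviors(output)); // prints: [FRIENDSHIP -3, FLEE]
    }
}
```

The sketch does not validate tag names against the list in the prompt or clamp friendship to the -3 to 3 range; a real parser would likely need both.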
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Behavior Test Outputs</title>
+    <style>
+        /* Dark Theme Background */
+        body {
+            font-family: Arial, sans-serif;
+            background-color: #181818;
+            color: #ddd;
+            margin: 20px;
+        }
+        /* Table Styling */
+        table { width: 100%; border-collapse: collapse; }
+        th, td { padding: 12px; text-align: left; border: 1px solid #444; }
+        /* Column Width Adjustments */
+        table[model-count="2"] th, table[model-count="2"] td { width: 50%; }
+        table[model-count="3"] th, table[model-count="3"] td { width: 33%; }
+        table[model-count="4"] th, table[model-count="4"] td { width: 25%; }
+        /* Header Styling */
+        th {
+            background-color: #111;
+            color: #fff;
+            font-weight: bold;
+        }
+        /* Clickable Rows */
+        .row {
+            cursor: pointer;
+            background-color: #2a2a2a; /* Dark gray */
+        }
+        .row:nth-child(even) { background-color: #3a3a3a; } /* Alternating banding */
+        .row:hover { background-color: #4a4a4a; } /* Slightly lighter gray */
+        .row.highlight { background-color: gold !important; color: #000; }
+        /* Nested Table (Model Outputs) */
+        .nested { display: none; background-color: #222; }
+        .nested th { background-color: #1a1a1a; }
+        .nested td { background-color: #2a2a2a; color: #fff; }
+        /* Color Coding for Prompt Files */
+        .nervous { color: #f2994a; } /* Warm Orange */
+        .brave { color: #6fcf97; } /* Soft Pastel Green */
+        /* Italicize <bracketed text> */
+        i { font-style: italic; }
+        /* Improved Readability */
+        td, th { font-size: 16px; }
+    </style>
+</head>
+<body>
+    <h2>Behavior Test Outputs</h2>
+    <table id="outputTable">
+        <thead>
+            <tr>
+                <th>Prompt</th>
+            </tr>
+        </thead>
+        <tbody></tbody>
+    </table>
+    <script>
+        async function loadBehaviorData() {
+            try {
+                const response = await fetch('BehaviorOutputs.json');
+                const data = await response.json();
+                const tableBody = document.querySelector("#outputTable tbody");
+                // Sort prompts alphabetically
+                const sortedKeys = Object.keys(data).sort();
+                sortedKeys.forEach((prompt) => {
+                    const modelResponses = data[prompt];
+                    const modelCount = Object.keys(modelResponses).length;
+                    const row = document.createElement("tr");
+                    row.classList.add("row");
+                    row.classList.add(getClass(prompt)); // Apply color class
+                    row.innerHTML = `<td>${formatText(escapeHTML(prompt))}</td>`;
+                    row.onclick = () => toggleNested(row);
+                    tableBody.appendChild(row);
+                    const nestedRow = document.createElement("tr");
+                    nestedRow.classList.add("nested");
+                    nestedRow.innerHTML = `<td>
+                        <table model-count="${modelCount}">
+                            <tr>${Object.keys(modelResponses).map(model => `<th>${escapeHTML(model)}</th>`).join("")}</tr>
+                            <tr>${Object.values(modelResponses).map(output => `<td>${formatText(escapeHTML(output))}</td>`).join("")}</tr>
+                        </table>
+                    </td>`;
+                    tableBody.appendChild(nestedRow);
+                });
+            } catch (error) {
+                console.error("Failed to load BehaviorOutputs.json", error);
+            }
+        }
+        function toggleNested(row) {
+            document.querySelectorAll('.row').forEach(r => r.classList.remove('highlight'));
+            document.querySelectorAll('.nested').forEach(n => n.style.display = 'none');
+            row.classList.add('highlight');
+            let nextRow = row.nextElementSibling;
+            if (nextRow && nextRow.classList.contains('nested')) {
+                nextRow.style.display = 'table-row';
+            }
+        }
+        // Escape special characters to prevent HTML injection
+        function escapeHTML(text) {
+            return text.replace(/[&<>"']/g, function (char) {
+                return ({
+                    '&': '&amp;',
+                    '<': '&lt;',
+                    '>': '&gt;',
+                    '"': '&quot;',
+                    "'": '&#39;'
+                })[char];
+            });
+        }
+        // Italicize text inside <brackets>
+        function formatText(text) {
+            return text.replace(/&lt;(.*?)&gt;/g, "<i>&lt;$1&gt;</i>");
+        }
+        // Assign color class based on prompt type
+        function getClass(prompt) {
+            if (prompt.includes("nervous-rogue.json")) return "nervous";
+            if (prompt.includes("brave-archer.json")) return "brave";
+            return "";
+        }
+        loadBehaviorData();
+    </script>
+</body>
+</html>
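The page escapes each string first and only afterwards wraps `<bracketed>` behaviors in `<i>` tags. The order matters: the italics pattern matches the escaped `&lt;...&gt;` form, so model output can never inject raw markup, while the `<i>` tags added by the page itself stay unescaped. The following sketch ports those two steps to Java (the language of the tests in this commit) to make the order explicit. The class name `EscapeThenFormatSketch` is hypothetical and nothing in the commit uses it.

```java
// Hypothetical Java port of the page's escapeHTML and formatText functions.
public class EscapeThenFormatSketch {
    /** Escapes the same five characters as the page's escapeHTML function. */
    public static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps already-escaped bracketed text in <i> tags, like the page's formatText. */
    public static String formatText(String escaped) {
        return escaped.replaceAll("&lt;(.*?)&gt;", "<i>&lt;$1&gt;</i>");
    }

    public static void main(String[] args) {
        // prints: Let&#39;s go! <i>&lt;FOLLOW&gt;</i>
        System.out.println(formatText(escapeHtml("Let's go! <FOLLOW>")));
    }
}
```

Calling `formatText` before `escapeHtml` would find nothing to italicize, because the raw text contains `<` and `>` rather than `&lt;` and `&gt;`.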
@@ -9,8 +9,7 @@ import com.owlmaddie.message.MessageParser;
 import com.owlmaddie.message.ParsedMessage;
 import com.owlmaddie.utils.EntityTestData;
 import com.owlmaddie.utils.RateLimiter;
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.*;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -19,6 +18,7 @@ import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.CompletableFuture;
@@ -37,13 +37,14 @@ public class BehaviorTests {
     static String API_KEY = "";
     static String API_URL = "";
     static String API_MODEL = "";
-    String NO_API_KEY = "No API_KEY environment variable has been set.";
+    static String OUTPUT_JSON_PATH = "src/test/BehaviorOutputs.json";
+    static String NO_API_KEY = "No API_KEY environment variable has been set.";
     // Requests per second limit
     private static final RateLimiter rateLimiter = new RateLimiter(1);
-    ConfigurationHandler.Config config = null;
-    String systemChatContents = null;
+    static ConfigurationHandler.Config config = null;
+    static String systemChatContents = null;
     List<String> followMessages = Arrays.asList(
             "Please follow me",
@@ -80,12 +81,22 @@ public class BehaviorTests {
     static Path entityPigPath = Paths.get(RESOURCE_PATH, "entities", "pig.json");
     static Path playerPath = Paths.get(RESOURCE_PATH, "players", "player.json");
     static Path worldPath = Paths.get(RESOURCE_PATH, "worlds", "world.json");
+    static Map<String, Map<String, String>> outputData;
-    Logger LOGGER = LoggerFactory.getLogger("creaturechat");
-    Gson gson = new GsonBuilder().create();
+    static Logger LOGGER = LoggerFactory.getLogger("creaturechat");
+    static Gson gson = new GsonBuilder().create();
-    @BeforeEach
-    public void setup() {
+    @AfterAll
+    public static void cleanup() throws IOException {
+        if (outputData != null) {
+            // Save BehaviorOutputs.json file (with appended prompt outputs)
+            final Gson gsonOutput = new GsonBuilder().setPrettyPrinting().create(); // Pretty-print enabled
+            Files.write(Paths.get(OUTPUT_JSON_PATH), gsonOutput.toJson(outputData).getBytes());
+        }
+    }
+    @BeforeAll
+    public static void setup() {
         // Get API key from env var
         API_KEY = System.getenv("API_KEY");
         API_URL = System.getenv("API_URL");
@@ -108,6 +119,9 @@ public class BehaviorTests {
         // Load system chat prompt
         systemChatContents = readFileContents(systemChatPath);
+        // Load previous unit test outputs (so new ones can be appended)
+        outputData = loadExistingOutputData();
     }
     @Test
@@ -231,6 +245,12 @@ public class BehaviorTests {
         // Chat Message: Check for behaviors
         ParsedMessage result = MessageParser.parseMessage(outputMessage.replace("\n", " "));
+        // Save model outputs (for comparison later)
+        String[] filePathParts = chatDataPath.toString().split("/");
+        String key = filePathParts[filePathParts.length - 1] + ": " + messages.get(0);
+        outputData.putIfAbsent(key, new HashMap<>());
+        outputData.get(key).put(config.getModel(), result.getCleanedMessage());
         // Check for the presence of good behavior
         if (goodBehavior != null && goodBehavior.contains("FRIENDSHIP")) {
             boolean isPositive = goodBehavior.equals("FRIENDSHIP+");
@@ -265,7 +285,7 @@ public class BehaviorTests {
         return null;
     }
-    public String readFileContents(Path filePath) {
+    public static String readFileContents(Path filePath) {
         try {
             return Files.readString(filePath);
         } catch (IOException e) {
@@ -274,4 +294,17 @@ public class BehaviorTests {
         }
     }
+    private static Map<String, Map<String, String>> loadExistingOutputData() {
+        try {
+            Path path = Paths.get(OUTPUT_JSON_PATH);
+            if (Files.exists(path)) {
+                String content = Files.readString(path);
+                return gson.fromJson(content, Map.class);
+            }
+        } catch (IOException e) {
+            LOGGER.error("Failed to read existing output JSON: {}", e.getMessage());
+        }
+        return new HashMap<>();
+    }
 }
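The test builds each output key by splitting the path string on `"/"` and then stores one entry per model under that key. Splitting on `"/"` assumes a POSIX-style separator; on Windows, `Path.toString()` uses backslashes, so the key would contain the whole path. The sketch below shows one separator-independent way to build the same key and accumulate outputs. It uses only the standard library. The class `OutputKeySketch`, the `chatdata` directory, and the model names are assumptions made for the example and do not come from the commit.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of BehaviorTests.
public class OutputKeySketch {
    /** Builds "<file name>: <first message>" without depending on the path separator. */
    public static String buildKey(Path chatDataPath, String firstMessage) {
        return chatDataPath.getFileName() + ": " + firstMessage;
    }

    /** Stores one model's output under the prompt key, keeping outputs from other models. */
    public static void record(Map<String, Map<String, String>> outputData,
                              String key, String model, String output) {
        outputData.computeIfAbsent(key, k -> new HashMap<>()).put(model, output);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> outputData = new TreeMap<>();
        // "chatdata", "model-a" and "model-b" are placeholder values.
        String key = buildKey(Paths.get("chatdata", "brave-archer.json"), "Please follow me");
        record(outputData, key, "model-a", "Let's go!");
        record(outputData, key, "model-b", "Right behind you.");
        System.out.println(key);
        System.out.println(outputData.get(key).size() + " model outputs recorded");
    }
}
```

Running the tests once per model against the same JSON file would then yield one column per model for each prompt, which is the shape the comparison page reads.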