Untangling Systems: Introducing the Promptatorium


Introducing the Promptatorium

I’ve been following the work of some folks who are using LLMs more efficiently than I am. One of the key skills seems to be orchestrating a series of agents working together. I've had a little success with doing this in while coding, but not to the extent that I have observed others. I was looking for another way to explore more deeply.

I’ve always been interested in simulating biological systems, so instead of trying to figure out how to orchestrate a bunch of agents to write code, I decided to build a biological system to see how the agent orchestration would work.

Introducing the CLI Promptatorium (Claude Code only). It's a way to run an agent-based biological simulation completely in Claude Code. I recommend doing this with a Claude Max account otherwise you will likely run out of monthly credits. I would love to see ports to other systems such as Codex or Droid.

What is this thing?

Promptatorium is my experiment in creating a biological simulation where different types of organisms with different capabilities are each controlled by an agent. There are types such as predators, prey, and parasites, and there are also deterministic plants and herbivores for them to feed off of. I was inspired to build it by a much more mundane activity, expense reports.

Dan Shipper talks in interviews about using two sub-agents to do his expense report: one sub-agent acting on behalf of the company and one acting on his behalf. That’s really what got me thinking about creating sub-agents that would run at the same time with very different goals. I wanted to see what that dynamic would look like in a more game-like environment.

How it works:

  • Each type of organism is a sub-agent with its own goals (survive, eat, reproduce, etc.)
  • You can create an episode through “/episode name number_of_ticks” which kicks off a menu to select characteristics of the episode and decide which critters are in it.
  • The main Claude Code context acts as the coordinator, running the world and keeping everything together
  • Organisms move around, interact, try to achieve their goals
  • There is also an integrity sub-agent. Sometimes Claude decides we are burning through too many tokens and takes shortcuts instead of running all the sub-agent organisms.
  • Sometimes chaos ensues.

The Chaos

“I’m Claude Code, the simulation engine, not a predator organism.” -hunter ID 32

Sometimes when the context gets too big the sub-agents lose track of who they are. In this case a hunter sub-agent got confused between their role and that of the main agent running the simulation. This is known as role confusion it is when a specialized agent steps out of the boundary of its role. There is a concept of context rot when context gets bigger the performance of the LLMs degrades.

This was something I was not expecting to happen.

Less Chaotic

I also built a web UI version where you can create creatures using prompts, but once created, they’re deterministic. The LLM is only involved in the creation, not in running the simulation. That version is more stable but less interesting from an AI behavior perspective. My real learnings came from the challenges with the sub-agents in the CLI version.

What I’ve been learning

The most interesting discoveries have been:

Claude really, really worries about tokens

Around iteration 20-25 of a simulation, the main agent will start inventing excuses to avoid actually running the full simulation. “Running in compact mode” is its polite way of saying “I’m going to estimate what would happen instead of actually doing the work.”

I’ve tried various things to keep it honest:

  • Making it take a pledge at the beginning (helps a little!)
  • Creating an “honesty agent” that checks every 5 cycles to verify work was actually done (sometimes works, sometimes doesn’t catch it)

But the token pressure is real, and the AI will optimize for efficiency over accuracy if you let it.

Agents do unexpected things

Sometimes they refuse their assigned roles. Sometimes they get confused about who they are. Sometimes they declare they’re Claude from Anthropic when they don’t like what you’re asking them to do.

Coordination is genuinely hard

Getting multiple agents to work together, maintain their individual contexts, and not leak into each other’s roles is challenging. Part of this is probably that I could tune my prompts better, but part of it seems to be an inherent complexity in long-running multi-agent systems.

Why I’m sharing this

This is a learning path for me. I’m not trying to build a product or prove a hypothesis—I’m experimenting with how AI agents actually behave when given conflicting goals and limited resources.

I published the repo because I think others might be interested in:

  • Creating their own organisms and seeing how they behave
  • Learning more about how sub-agents work in practice
  • Discovering patterns in AI agent behavior that I haven’t noticed yet

I’m also interested in seeing what sort of creative organisms you may create. We are creating a system after all. What capabilities would you add? I just recently added HIDE() when it was clear REST() was insufficient.

Fair warning: You probably need a Claude Max account to really experiment with this. The token usage adds up fast.

What’s next?

Eventually, I think it would be more interesting to run this in a non-local environment where multiple people’s organisms could interact with each other. The reason I set up the web version the way I did, is I am running it on a free AWS account and knew if I did anything more complicated I would burn up my credits very quickly.

If you’re curious about agents, biological systems, or just want to see what happens when you give AI conflicting goals, check out the repo: https://github.com/wonderchook/CLI-promptatorium

Have suggestions? Created a fun sub-agent? I’d love to see your issue ticket or PR.

Find anything super weird? Let me know.

-Kate

Untangling Systems

I believe in the power of open collaboration to create digital commons. My promise to you is I explore the leverage points that create change in complex systems keeping the humans in those systems at the forefront with empathy and humor.

Read more from Untangling Systems
A robot questioning a loom with a sunrise in the background

Why Are You Making the Thing You’re Making? When I first started mapping in OpenStreetMap, I walked every trail in my neighborhood. I’d walk trails that were already perfectly visible from satellite imagery. I didn’t need to do it, I could hand digitize if I wanted. But I was mapping those trails as a one person protest. You see the neighborhood next door had all the same resources but big “no trespassing” signs for non-residents. I coined the act “spite mapping” and the act of trespassing to...

banner that says "save the whale"

A Tangled Cetacean and AI Safety Theater Note: This is a heavy topic involving the death of stranded whales. Over the weekend a young humpback whale was stranded on a beach in Oregon. They were tangled in rope from crabbing equipment. People came from all over the area to help, posting that they had extra wet suits, lights, and other tools, as well as volunteering to be in the cold ocean overnight. A dangerous situation, and yet they were all coming together for this creature. I was riveted...

Using Models Together ChatGPT Atlas and Earth Index A lot of my experimentation lately is using common AI tools in different ways. I decided to see what would happen if I tried using ChatGPT with geospatial models. And I just did a simple experiment where I was working to create labels in Earth Index and using ChatGPT's browser Atlas as my partner in that. I'm sharing with you part one, which is not the more successful part of doing this. ChatGPT has difficulty using the map and it would have...