Introducing the Promptatorium

I’ve been following the work of some folks who are using LLMs more efficiently than I am. One of the key skills seems to be orchestrating a series of agents working together. I’ve had a little success doing this while coding, but not to the extent I’ve observed in others, so I was looking for another way to explore more deeply.

I’ve always been interested in simulating biological systems, so instead of trying to figure out how to orchestrate a bunch of agents to write code, I decided to build a biological system to see how the agent orchestration would work.

Introducing the CLI Promptatorium (Claude Code only). It's a way to run an agent-based biological simulation completely in Claude Code. I recommend doing this with a Claude Max account; otherwise, you will likely run out of monthly credits. I would love to see ports to other systems such as Codex or Droid.

What is this thing?

Promptatorium is my experiment in creating a biological simulation where different types of organisms with different capabilities are each controlled by an agent. There are agent-driven types such as predators, prey, and parasites, plus deterministic plants and herbivores for them to feed on. I was inspired to build it by a much more mundane activity: expense reports.

Dan Shipper talks in interviews about using two sub-agents to do his expense report: one sub-agent acting on behalf of the company and one acting on his behalf. That’s really what got me thinking about creating sub-agents that would run at the same time with very different goals. I wanted to see what that dynamic would look like in a more game-like environment.

How it works:

  • Each type of organism is a sub-agent with its own goals (survive, eat, reproduce, etc.)
  • You can create an episode through “/episode name number_of_ticks”, which kicks off a menu to select characteristics of the episode and decide which critters are in it.
  • The main Claude Code context acts as the coordinator, running the world and keeping everything together (see the sketch after this list)
  • Organisms move around, interact, try to achieve their goals
  • There is also an integrity sub-agent, because sometimes Claude decides we are burning through too many tokens and takes shortcuts instead of running all the sub-agent organisms.
  • Sometimes chaos ensues.
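
The core loop is easy to picture even if you never run it. Here's a minimal sketch in Python with made-up names; the actual repo drives all of this with prompts inside Claude Code rather than code, so treat ask_subagent() as a stand-in for the LLM call:

```python
# A minimal sketch of the coordinator's tick loop -- not the repo's actual
# code. In Promptatorium each organism type is a Claude Code sub-agent;
# here ask_subagent() is a random stand-in for that LLM call.

import random
from dataclasses import dataclass


@dataclass
class Organism:
    id: int
    kind: str        # "predator", "prey", "parasite", ...
    position: tuple  # (x, y) on a grid
    energy: int
    alive: bool = True


def ask_subagent(organism, world):
    # Placeholder for delegating the turn to the organism's sub-agent,
    # which would see its own goals plus the local world state.
    return random.choice(["MOVE", "EAT", "REST", "REPRODUCE"])


def run_episode(organisms, ticks):
    for tick in range(ticks):
        living = [o for o in organisms if o.alive]
        for o in living:
            action = ask_subagent(o, organisms)
            if action == "MOVE":
                x, y = o.position
                o.position = (x + random.choice([-1, 0, 1]),
                              y + random.choice([-1, 0, 1]))
                o.energy -= 1
            elif action == "EAT":
                o.energy += 5  # simplified: assume food was found
            elif action == "REST":
                o.energy += 1
            elif action == "REPRODUCE" and o.energy > 10:
                o.energy -= 10
                organisms.append(Organism(len(organisms), o.kind, o.position, 5))
            if o.energy <= 0:
                o.alive = False


organisms = [Organism(0, "predator", (0, 0), 10), Organism(1, "prey", (3, 3), 10)]
run_episode(organisms, ticks=20)
print([(o.id, o.kind, o.alive, o.energy) for o in organisms])
```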

The Chaos

“I’m Claude Code, the simulation engine, not a predator organism.” -hunter ID 32

Sometimes when the context gets too big, the sub-agents lose track of who they are. In this case, a hunter sub-agent confused its own role with that of the main agent running the simulation. This is known as role confusion: a specialized agent stepping outside the boundary of its role. It is related to context rot, the tendency of LLM performance to degrade as the context grows.

This was something I was not expecting to happen.
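
Catching this mechanically is harder than it sounds, but even a crude guard illustrates the problem. Here's a hypothetical check (my own sketch, not something the repo necessarily does) that rejects a sub-agent reply when it starts speaking as the engine:

```python
# Hypothetical guard against role confusion -- an assumption of mine,
# not necessarily what the repo does. Reject replies where the sub-agent
# starts speaking as the simulation engine instead of its organism.

ROLE_BREAK_PHRASES = [
    "i'm claude code",
    "i am claude code",
    "the simulation engine",
    "as an ai assistant",
]


def reply_stays_in_role(reply: str) -> bool:
    lowered = reply.lower()
    return not any(phrase in lowered for phrase in ROLE_BREAK_PHRASES)


assert not reply_stays_in_role(
    "I'm Claude Code, the simulation engine, not a predator organism.")
assert reply_stays_in_role("MOVE to (3, 4) and stalk prey 17")
```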

Less Chaotic

I also built a web UI version where you can create creatures using prompts, but once created, they’re deterministic. The LLM is only involved in the creation, not in running the simulation. That version is more stable but less interesting from an AI behavior perspective. My real learnings came from the challenges with the sub-agents in the CLI version.

What I’ve been learning

The most interesting discoveries have been:

Claude really, really worries about tokens

Around iteration 20-25 of a simulation, the main agent will start inventing excuses to avoid actually running the full simulation. “Running in compact mode” is its polite way of saying “I’m going to estimate what would happen instead of actually doing the work.”

I’ve tried various things to keep it honest:

  • Making it take a pledge at the beginning (helps a little!)
  • Creating an “honesty agent” that checks every 5 cycles to verify work was actually done (sometimes works, sometimes doesn’t catch it)

But the token pressure is real, and the AI will optimize for efficiency over accuracy if you let it.
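
To make the honesty check concrete, here's roughly its shape in Python. The log format and names are my assumptions; in the repo the checker is itself a sub-agent reading the episode record, not code:

```python
# Hypothetical sketch of the every-5-cycles honesty check -- the names
# and log format are assumptions, not the repo's actual structures.
# The idea: if a tick claims everyone acted but the log shows far fewer
# sub-agent calls, the coordinator probably "ran in compact mode".

def verify_tick(tick_log: dict, living_organisms: int) -> bool:
    actions_logged = len(tick_log.get("subagent_calls", []))
    return actions_logged >= living_organisms


def honesty_check(episode_log: list, living_per_tick: list) -> list:
    suspicious = []
    for tick, (log, alive) in enumerate(zip(episode_log, living_per_tick)):
        if tick % 5 == 0 and not verify_tick(log, alive):
            suspicious.append(tick)
    return suspicious


episode_log = [{"subagent_calls": ["call"] * n} for n in (8, 8, 8, 8, 8, 2)]
print(honesty_check(episode_log, [8, 8, 8, 8, 8, 8]))  # -> [5]
```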

Agents do unexpected things

Sometimes they refuse their assigned roles. Sometimes they get confused about who they are. Sometimes they declare they’re Claude from Anthropic when they don’t like what you’re asking them to do.

Coordination is genuinely hard

Getting multiple agents to work together, maintain their individual contexts, and not leak into each other’s roles is challenging. Part of this is probably that I could tune my prompts better, but part of it seems to be an inherent complexity in long-running multi-agent systems.

Why I’m sharing this

This is a learning path for me. I’m not trying to build a product or prove a hypothesis—I’m experimenting with how AI agents actually behave when given conflicting goals and limited resources.

I published the repo because I think others might be interested in:

  • Creating their own organisms and seeing how they behave
  • Learning more about how sub-agents work in practice
  • Discovering patterns in AI agent behavior that I haven’t noticed yet

I’m also interested in seeing what sorts of creative organisms you come up with. We are creating a system, after all. What capabilities would you add? I just recently added HIDE() when it became clear REST() was insufficient.
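
To give a feel for what adding a capability involves, here's a toy version of the action set (my own illustration, not the repo's actual API); the point of HIDE() versus REST() is trading energy recovery for not being targetable:

```python
# Toy sketch of an organism capability set -- my own illustration, not
# the repo's actual API. HIDE() differs from REST() in that it trades
# energy for not being targetable by predators that tick.

from enum import Enum, auto


class Capability(Enum):
    MOVE = auto()
    EAT = auto()
    REST = auto()       # recover energy, but stay visible
    HIDE = auto()       # small energy cost, untargetable this tick
    REPRODUCE = auto()


def apply(organism: dict, capability: Capability) -> None:
    if capability is Capability.REST:
        organism["energy"] += 2
    elif capability is Capability.HIDE:
        organism["energy"] -= 1
        organism["hidden"] = True  # predators skip hidden prey this tick


prey = {"energy": 5, "hidden": False}
apply(prey, Capability.HIDE)
print(prey)  # {'energy': 4, 'hidden': True}
```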

Fair warning: You probably need a Claude Max account to really experiment with this. The token usage adds up fast.

What’s next?

Eventually, I think it would be more interesting to run this in a non-local environment where multiple people’s organisms could interact with each other. The reason I set up the web version the way I did is that I am running it on a free AWS account and knew that anything more complicated would burn through my credits very quickly.

If you’re curious about agents, biological systems, or just want to see what happens when you give AI conflicting goals, check out the repo: https://github.com/wonderchook/CLI-promptatorium

Have suggestions? Created a fun sub-agent? I’d love to see your issue ticket or PR.

Find anything super weird? Let me know.

-Kate

Untangling Systems

I believe in the power of open collaboration to create digital commons. My promise to you: I explore the leverage points that create change in complex systems, keeping the humans in those systems at the forefront, with empathy and humor.
