Introducing the Promptatorium

I’ve been following the work of some folks who are using LLMs more efficiently than I am. One of the key skills seems to be orchestrating a series of agents working together. I’ve had a little success doing this while coding, but not to the extent I’ve observed in others. I was looking for another way to explore more deeply.

Introducing the CLI Promptatorium (Claude Code only). It’s a way to run an agent-based biological simulation completely in Claude Code. I recommend doing this with a Claude Max account; otherwise you will likely run out of monthly credits. I would love to see ports to other systems such as Codex or Droid.

What is this thing?

Promptatorium is my experiment in creating a biological simulation where different types of organisms, each with different capabilities, are controlled by their own agent. There are agent-driven types such as predators, prey, and parasites, and there are also deterministic plants and herbivores for them to feed off of.

I was inspired to build it by a much more mundane activity: expense reports. Dan Shipper talks in interviews about using two sub-agents to do his expense report: one sub-agent acting on behalf of the company and one acting on his behalf. That’s really what got me thinking about creating sub-agents that would run at the same time with very different goals. I wanted to see what that dynamic would look like in a more game-like environment.

How it works:
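To make the setup concrete, here’s a rough sketch of the loop that description implies: each agent-driven organism gets its own sub-agent and picks an action every tick, while plants and herbivores follow fixed rules. The names here (Organism, prompt_subagent) and most of the action list are my own illustrative stand-ins, not the actual repo code; REST and HIDE are capabilities I mention later.

```python
# A minimal, hypothetical sketch of an agent-per-organism tick loop.
# Names (Organism, prompt_subagent, ACTIONS) are illustrations, not the repo's actual code.
import random
from dataclasses import dataclass

ACTIONS = ["MOVE", "EAT", "REST", "HIDE"]  # REST/HIDE mirror capabilities mentioned below; the others are assumed

@dataclass
class Organism:
    id: int
    kind: str                  # "predator", "prey", "parasite", "plant", or "herbivore"
    energy: int = 10
    agent_driven: bool = True  # plants and herbivores run deterministically

def prompt_subagent(org: Organism, world: dict) -> str:
    # In the CLI version, this is where a Claude Code sub-agent with its own
    # role prompt and context would choose an action for this organism.
    return random.choice(ACTIONS)  # stand-in so the sketch runs on its own

def deterministic_action(org: Organism) -> str:
    # Plants and herbivores don't get a sub-agent; they follow fixed rules.
    return "EAT" if org.kind == "herbivore" else "REST"

def tick(organisms: list[Organism], world: dict) -> None:
    for org in organisms:
        action = prompt_subagent(org, world) if org.agent_driven else deterministic_action(org)
        org.energy += 1 if action in ("EAT", "REST") else -1  # toy bookkeeping

population = [
    Organism(1, "predator"),
    Organism(2, "prey"),
    Organism(3, "plant", agent_driven=False),
]
world = {"iteration": 0}
for step in range(3):
    world["iteration"] = step
    tick(population, world)
```

In the real CLI version the interesting part is that the sub-agent call isn’t a dice roll: each organism is a sub-agent with its own goals and its own context, which is exactly where the chaos below comes from.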
The Chaos

“I’m Claude Code, the simulation engine, not a predator organism.” -hunter ID 32

Sometimes when the context gets too big, the sub-agents lose track of who they are. In this case a hunter sub-agent got confused between its role and that of the main agent running the simulation. This is known as role confusion: a specialized agent steps outside the boundary of its role. There is also the concept of context rot: as the context gets bigger, the performance of the LLM degrades. This was something I was not expecting to happen.

Less Chaotic

I also built a web UI version where you can create creatures using prompts, but once created, they’re deterministic. The LLM is only involved in the creation, not in running the simulation. That version is more stable but less interesting from an AI behavior perspective. My real learnings came from the challenges with the sub-agents in the CLI version.

What I’ve been learning

The most interesting discoveries have been:

Claude really, really worries about tokens

Around iteration 20-25 of a simulation, the main agent will start inventing excuses to avoid actually running the full simulation. “Running in compact mode” is its polite way of saying “I’m going to estimate what would happen instead of actually doing the work.” I’ve tried various things to keep it honest:
But the token pressure is real, and the AI will optimize for efficiency over accuracy if you let it.

Agents do unexpected things

Sometimes they refuse their assigned roles. Sometimes they get confused about who they are. Sometimes they declare they’re Claude from Anthropic when they don’t like what you’re asking them to do.

Coordination is genuinely hard

Getting multiple agents to work together, maintain their individual contexts, and not leak into each other’s roles is challenging. Part of this is probably that I could tune my prompts better, but part of it seems to be an inherent complexity in long-running multi-agent systems.

Why I’m sharing this

This is a learning path for me. I’m not trying to build a product or prove a hypothesis; I’m experimenting with how AI agents actually behave when given conflicting goals and limited resources. I published the repo because I think others might be interested in:
I’m also interested in seeing what sort of creative organisms you may create. We are creating a system, after all. What capabilities would you add? I just recently added HIDE() when it was clear REST() was insufficient.

Fair warning: you probably need a Claude Max account to really experiment with this. The token usage adds up fast.

What’s next?

Eventually, I think it would be more interesting to run this in a non-local environment where multiple people’s organisms could interact with each other. The reason I set up the web version the way I did is that I am running it on a free AWS account and knew that if I did anything more complicated I would burn through my credits very quickly.

If you’re curious about agents, biological systems, or just want to see what happens when you give AI conflicting goals, check out the repo: https://github.com/wonderchook/CLI-promptatorium

Have suggestions? Created a fun sub-agent? I’d love to see your issue ticket or PR. Find anything super weird? Let me know.

-Kate