At Eve, we built our own background coding agent and named it WALL-E.
The name fits. Like its namesake, our WALL-E is persistent, resourceful, and happiest when left alone with a messy problem and a clear mission. It doesn’t just write code. It keeps going: booting a full application stack, testing changes in a real browser, checking data in a live database, and opening merge requests — all from a Slack message.
An engineer types @wall-e add a Resend Invite button to the user management page in Slack. A few minutes later, WALL-E posts back with a merge request link and a URL to a live sandbox where you can view the real change. No one provisioned a server. No one wrote a test plan. No one opened an IDE.
WALL-E started as a hackathon project. Two weeks later, it opened its first real merge request. Right now, WALL-E authors 22% of all merge requests in our monorepo, and 53% of those are one-shot — a single Slack message in, a production-grade MR out, with no human iteration needed.
It is now part of how we ship software at Eve — and the story of how we built it is inseparable from why it works.
Every major AI coding tool — Copilot, Cursor, Claude Code — is designed around the same model: a human sits at a computer, the agent assists in real time. That’s valuable, but it doesn’t solve our actual bottleneck.
Our problem wasn’t that engineers coded too slowly. It was that we had more tasks than engineers. Bug fixes sat in the backlog for weeks. Small improvements never got prioritized. Feature requests took forever. And our developers were spread thin across frontend, backend, platform, and infrastructure.
The work was well-defined and straightforward — it just needed someone to do it. We needed a teammate that could work in the background, around the clock, across every layer of the stack, without waiting for a human to hold its hand.
The experience is simple: you talk to WALL-E in Slack like you’d talk to a colleague.
@wall-e The "Copy Conversation" feature is broken. When users click the copy button, they get a 500 error. Can you fix it?
WALL-E spins up its own isolated environment — a full copy of our application with a real database, a real backend, and a real frontend. It investigates the bug, writes a fix, runs the test suite, opens the app in a browser to verify the UI, and posts a merge request back to the thread. You get a link to the live sandbox so you can test the fix yourself before approving.
Each Slack thread is a separate workspace. Follow up with “now also add a success toast notification” and WALL-E picks up right where it left off — same branch, same database state, same context. Start a new thread, and it gets a fresh environment for a fresh task.
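One way to picture the thread-to-workspace mapping is a registry keyed by the Slack thread timestamp. This is a minimal sketch under our own assumptions — `SandboxSession` and the in-memory `SessionRegistry` are hypothetical stand-ins, not WALL-E's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class SandboxSession:
    """Hypothetical record of one WALL-E workspace, keyed by Slack thread."""
    thread_ts: str                              # Slack thread timestamp = session key
    branch: str                                 # git branch the session works on
    history: list[str] = field(default_factory=list)  # tasks handled so far


class SessionRegistry:
    """Maps each Slack thread to its own persistent sandbox session."""

    def __init__(self) -> None:
        self._sessions: dict[str, SandboxSession] = {}

    def get_or_create(self, thread_ts: str) -> SandboxSession:
        if thread_ts not in self._sessions:
            # New thread -> fresh environment on a fresh branch.
            self._sessions[thread_ts] = SandboxSession(
                thread_ts=thread_ts,
                branch=f"wall-e/{thread_ts}",
            )
        return self._sessions[thread_ts]


registry = SessionRegistry()
first = registry.get_or_create("1718000000.000100")
first.history.append("fix copy-conversation 500")
# A follow-up message in the same thread resumes the same session:
follow_up = registry.get_or_create("1718000000.000100")
# follow_up is the same object: same branch, same history.
```

In the real system the session would also carry the sandbox handle and database state; the point here is only that the thread timestamp is a natural session key.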
The key insight is that a coding agent is only as good as its environment. An agent that can edit files but can’t run them is guessing. An agent that can run tests but can’t open a browser is blind. So we gave WALL-E everything a human engineer gets: a running application stack, a real database, a browser, and the full test suite.
Within its sandbox, WALL-E has the same permissions as any engineer on the team. The sandbox is completely isolated from production, so the blast radius is zero — but within that boundary, maximum agency produces the best results. Every restriction we tried adding (limiting bash, gating git operations) made the output worse.
WALL-E is built on Modal for on-demand sandboxes, OpenCode as the coding agent inside each sandbox, and the Model Context Protocol (MCP) to connect to external tools like Linear, Sentry, Figma, and Playwright.
Before WALL-E, our backlog was a graveyard of P2 bugs and “nice-to-have” improvements that never got prioritized. Now those tasks have somewhere to go. An engineer can triage a bug, decide it’s straightforward, toss it to WALL-E, and move on to the hard architectural problem they’ve been putting off. The backlog shrinks not because we hired more people, but because the straightforward work no longer competes with the complex work for the same human attention.
WALL-E is bad at architectural decisions, product intuition, and ambiguous requirements. It’s great at well-scoped implementation: “add this endpoint,” “fix this bug,” “build this UI component to match the Figma.” This isn’t a limitation — it’s a feature. It means our human engineers spend less time on mechanical work and more time on design, code review, and the creative problem-solving that actually requires human judgment.
WALL-E runs 24/7. Drop a task into Slack at any time, and the merge request will be ready in about 15 minutes — faster than most humans can context-switch into a problem, let alone solve it. And unlike a human engineer, WALL-E has unlimited concurrency. Five tasks at once? Five sandboxes, five merge requests, no bottleneck.
Because WALL-E works through Slack and produces live sandbox URLs, you don’t need a laptop to participate. Review the merge request diff on your phone. Click the sandbox link to test the running app. Approve and merge. Engineering doesn’t have to happen at a desk.
Most coding agents are stateless: they finish a task, forget what happened, and repeat the same mistakes next time. WALL-E improves with every session.
When a sandbox is about to shut down, WALL-E runs a quick retrospective over the thread: what it tried, what failed, and what finally worked. If there’s a reusable lesson, it commits it to the nearest AGENTS.md so future sessions start smarter. For example:
“Run `yarn migration:run` after modifying any entity file. Tests will fail with cryptic column-not-found errors otherwise.”

These are the kind of lessons that usually live in one engineer’s head, passed around in code review comments that no one rereads. WALL-E instead writes them into files that every future session — human or agent — benefits from.
The effect compounds. Early sessions were rough: wrong test commands, missed build steps, unfamiliar conventions. Now a fresh session starts smarter because previous work distilled the quirks into AGENTS.md.
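The mechanics of “commit it to the nearest AGENTS.md” can be sketched as walking up from the directory the lesson applies to until a file is found, creating one at the repo root if none exists. This is an illustrative sketch, not WALL-E's actual retrospective code; `record_lesson` and its signature are our own invention:

```python
from pathlib import Path


def record_lesson(start: Path, lesson: str, repo_root: Path) -> Path:
    """Append a lesson to the nearest AGENTS.md at or above `start`.

    Walks upward from `start`, stopping at `repo_root`; falls back to
    creating AGENTS.md at the root. Assumes `start` lives under `repo_root`.
    """
    candidate = repo_root / "AGENTS.md"  # default if no file is found
    for directory in [start, *start.parents]:
        if (directory / "AGENTS.md").exists():
            candidate = directory / "AGENTS.md"
            break
        if directory == repo_root:
            break  # don't walk above the repository
    with candidate.open("a", encoding="utf-8") as f:
        f.write(f"\n- {lesson}\n")
    return candidate
```

Because the lesson lands in a plain file next to the code, it rides along in the merge request and is reviewable like any other diff.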
That creates a flywheel most teams don’t have. Human knowledge walks out the door when people leave. WALL-E’s learnings are materialized in the repo: mistakes documented once, conventions written once, and institutional knowledge versioned alongside the code.
We think this is the most underexplored idea in agentic coding: the agent as a contributor to its own instruction set. Not fine-tuning a model, not RAG over documentation — just a plain markdown file that the agent reads, uses, and improves. It’s simple, auditable (it’s a git diff), and it works.
Invest in the environment, not the prompt. We spent weeks tuning system prompts. The real breakthrough was giving the agent a working test runner and a browser. When the agent can see that its code doesn’t work, it fixes it. The environment is a better teacher than any prompt.
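The “environment as teacher” loop boils down to: run the real check, hand the real failure output back to the agent, and retry. A minimal sketch — `propose_fix` is a hypothetical callable standing in for the LLM step, not part of any real framework:

```python
import subprocess


def run_and_report(cmd: list[str]) -> tuple[bool, str]:
    """Run a check (tests, build, lint) and capture its output for the agent."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def agent_loop(propose_fix, cmd: list[str], max_iters: int = 3) -> bool:
    """Feedback loop: the agent sees real failures and edits until green."""
    for _ in range(max_iters):
        ok, output = run_and_report(cmd)
        if ok:
            return True
        propose_fix(output)  # agent edits the code based on the actual error text
    return run_and_report(cmd)[0]  # final verdict after the last attempt
```

No prompt tuning is involved: the loop works because the failure text the agent sees is ground truth from the real toolchain, not a guess about what might be wrong.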
Design for multi-turn from day one. Our v1 sandboxes were ephemeral — every follow-up message started from scratch. Adding session persistence (preserving database state, branches, and conversation context across messages) transformed WALL-E from a single-shot tool into something people actually kept using.
Meet engineers where they already work. We considered building a custom web UI. Slack was the right call. Zero adoption friction. Everyone already knows how to use it. The task-per-thread model maps naturally to how engineers think.
Give the agent maximum agency within an isolation boundary. Don’t restrict tools inside the sandbox. Restrict the sandbox’s access to production. Within the boundary, let the agent do whatever a human engineer would do.
The codebase is the best long-term memory. Capture knowledge in the repo with AGENTS.md, “why”-focused comments, and instructions next to the code so it stays current, versioned, and accurate.
Interactive planning. Today WALL-E works best on well-scoped tasks. For larger features, we’re building a mode where the agent proposes a plan with tradeoffs, and an engineer refines it before any code is written.
Expanding the agent’s skills. WALL-E already has access to Linear, Sentry, Figma, and Playwright. Next we’re adding deeper issue tracker integration, CI/CD log access, and more MCP servers to reduce how often a human needs to step in.
Automated code review. WALL-E already opens merge requests, and now it will review them too. It will scan diffs for bugs, style issues, and convention mismatches as a first pass before human review.
Read-only production access. We’re planning to give WALL-E read-only access to production data. This unlocks tasks like investigating live bugs, verifying data integrity, and monitoring feature rollouts.
Opening WALL-E to non-engineers. We want to make WALL-E accessible to designers, PMs, and other non-engineering teammates with appropriate guardrails. The goal is to let non-engineers file straightforward fixes through Slack and verify results in a sandbox without pulling an engineer into the loop.
Eve is a legal technology company, and our engineering challenges span platform, AI, and product development. We’re building agentic systems that operate in real environments and AI-native development workflows.
We’re a small team in San Mateo. WALL-E isn’t a demo — it’s how we work. If you want to build the systems that make background agents reliable, we’d like to talk.
Reach out at eve.legal/careers.