Open Rhapsody

We're designing bottari's brain. Here's how, so far.

May 22, 2026

Two questions show up in our standup almost every week, like clockwork:

"So… what should we even be working on next?"
"Did that last release actually move the numbers, or did we just ship?"

Nobody loves being the person who has to stop and answer them. It means digging through Slack, the analytics, and the support inbox, then stitching it all together in your own head before you can even start. So this week we started designing the thing that should answer them instead — the brain inside bottari.

bottari is the AI-native workspace we're building. The dream is simple: not a place you go to work, but a workspace that proposes the next move your product needs before you've even sat down. For that to be real, one thing has to be true first — it has to hold all the context about your product. Not just the code. Analytics, user feedback emails, the team conversations buried in Slack. Every signal that says where the product actually stands right now.

Most AI tools today do the opposite. Every time you hand them a task, you spoon-feed the context first: "here's our situation, and last week someone said this in Slack…" You burn your energy being the connective tissue between the tools. We want to stop doing that.

Where Grip came from

Here's the thing — we already have a piece of this. It's called Grip.

Grip didn't start as a grand vision. Our engineers built it internally because we wanted to get E2E testing right, and to do that it had to understand our codebase deeply — where everything lived, what was wired to what. (The name isn't deep either: it had such a tight grip on the codebase that "Grip" pretty much named itself. We're builders, not poets.) It's been running against our own code ever since. Smart — but it only ever looked at one thing: the code.

We're expanding Grip into the workspace's brain

Now we're growing Grip from a code-understanding module into the brain of the whole workspace — one that sees not just the code, but the product's entire context. Same Grip, expanded.

On top of the code, Grip takes in everything it used to be blind to:

Slack conversations
Notion docs
the real values in the database (signups, payments, …)
product usage data (which features get used, and how much)
store reviews and ratings

Because it reads the code and this data together, Grip runs in two modes off the same engine.

Advisor — it answers when asked. Anyone, engineer or PM or whoever, asks about the product and gets an answer grounded in the code and the data. "Why did checkout conversion drop?" "Is anyone actually using this feature?"

Propose — it speaks up before you ask. Without anyone prompting it, Grip watches the data and surfaces "here's what the product needs to grow." That lands in front of the team as a single unit of work — a story — and the team just decides whether it goes into the sprint.

The point is that it isn't one-and-done. It's a closed loop: spot something in the data → propose → the team decides whether it makes the sprint → build and ship → results flow back in → analyze again. And because Grip remembers whether its past proposals actually worked, every lap should make the next one sharper. Not just a loop that spins — a loop that learns.

What it learns doesn't evaporate, either. A new teammate joins, or you pick up something you dropped months ago — ask, and the context is still there. No more re-explaining "wait, what was the background on this?" to every person, every time.

And those two questions from the top of this post? That's the whole point — they should just stop coming up.

So, how are we designing it?

This is the part we're actually in the middle of right now. "It sees all the context" is easy to say, but build it naively and it dies one of two deaths: you dump raw data into the context window and it gets slow and expensive, or you hand-draw and hand-maintain a graph until the team burns out. Here are the three bets we're making to avoid both.

Bet 1: Layer the data by abstraction — never park raw data inside the LLM

We don't pour all the data into the context at once. We split it into four layers by abstraction:

Raw Snapshot — normalized source data. Cold storage that piles up as a time series. The thing you search and aggregate over.
Signal — metrics, deltas, and anomalies computed from raw data in code. E.g. "checkout conversion, down 8% week-over-week."
Use-driven Graph — links between code symbols and product entities.
Insight / Memory — narrative the LLM writes. Small, and the thing the team actually reads. The real body of long-term memory.

Two principles hold it together. The LLM only keeps the lightweight Signal and Insight layers warm; the heavy raw data gets pulled in with a targeted query only when a question needs row-level detail. And metrics are computed deterministically in code, not by the LLM — the model only handles the "why" and the "so what." We're not leaving the numbers for it to eyeball.
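To make the second principle concrete, here is a minimal Python sketch of a Signal computed in plain code. The names (Signal, week_over_week, describe) and the sample conversion rates are hypothetical, not Grip's actual API. The only point is that the number is fixed before the model ever sees it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A Signal-layer record: a metric plus its deterministic delta (hypothetical shape)."""
    metric: str
    current: float
    previous: float
    relative_change: float  # -0.08 means "down 8%"


def week_over_week(metric: str, current: float, previous: float) -> Signal:
    """Compute the relative week-over-week change in code, not in the LLM."""
    if previous == 0:
        raise ValueError("previous value is zero; relative change is undefined")
    return Signal(metric, current, previous, (current - previous) / previous)


def describe(signal: Signal) -> str:
    """Render the Signal as the short line the model is handed as context."""
    direction = "down" if signal.relative_change < 0 else "up"
    return f"{signal.metric}, {direction} {abs(signal.relative_change):.0%} week-over-week"


# Assumed sample values: conversion went from 25% to 23%.
signal = week_over_week("checkout conversion", current=0.23, previous=0.25)
print(describe(signal))
```

In this sketch the model would only receive the rendered line and be asked for the "why" and the "so what"; the arithmetic never passes through it.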

Bet 2: Don't pre-draw the graph — let it grow from use

An explicitly maintained entity graph is a trap. Approving every edge by hand is impossible, and trusting auto-inferred edges wholesale invites hallucination. So we redefined the graph as a retrieval requirement, not a stored artifact.

The spine is the one code graph — the only structure Grip already maintains automatically and accurately.
Every other source is a searchable document, tagged with timestamps and identifiers.
Edges aren't stored. They're computed at query time, building a local subgraph on the spot, just for that question.
Of the edges found that way, only the ones validated by outcomes get written permanently into the Insight layer; unvalidated edges evaporate.

The graph becomes a byproduct of use, not a maintenance project. The hallucination risk is contained by one rule: nothing is stored permanently until it's been validated.
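A rough Python sketch of that idea, under simplifying assumptions: documents carry a timestamp and a set of identifiers, an edge is just a (code symbol, document source) pair, and "validated" is a set supplied by someone else. Document, local_subgraph, and persist_validated are hypothetical names for illustration, not how Grip is built.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Document:
    """A non-code source (Slack message, review, ...) kept as a searchable document."""
    source: str
    timestamp: datetime
    identifiers: frozenset  # names mentioned in the document, e.g. code symbols


def local_subgraph(question_symbols, documents, since):
    """Compute (symbol, source) edges for one question. Nothing is stored."""
    edges = set()
    for doc in documents:
        if doc.timestamp < since:
            continue  # outside the time window this question cares about
        for symbol in question_symbols & doc.identifiers:
            edges.add((symbol, doc.source))
    return edges


def persist_validated(edges, validated, insight_layer):
    """Write only outcome-validated edges to the Insight layer; the rest are dropped."""
    kept = edges & validated
    insight_layer.update(kept)
    return kept


docs = [
    Document("slack:1", datetime(2026, 5, 20), frozenset({"CheckoutForm"})),
    Document("review:9", datetime(2026, 1, 1), frozenset({"CheckoutForm"})),
]
edges = local_subgraph({"CheckoutForm"}, docs, since=datetime(2026, 5, 1))
insight_layer = set()
persist_validated(edges, validated=set(), insight_layer=insight_layer)
```

Here the query finds one edge, but because nothing has validated it yet, the Insight layer stays empty and the edge disappears with the query.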

Bet 3: Let it grade its own proposals — provenance

For a closed loop to actually learn, structure isn't enough. So we tie signal → proposal → story → sprint → ship → results into one thread — a provenance thread. That thread lets Grip adjust its own confidence area by area, by watching adoption rates and whether the metric actually moved.
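One way to picture a provenance thread is as a single record that follows a proposal from signal to result. The Python sketch below is an assumption about the shape, not a schema we've settled on: ProvenanceThread, its fields, and the two rates in confidence_by_area are all illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProvenanceThread:
    """One signal-to-results thread (hypothetical fields)."""
    area: str                           # product area, e.g. "checkout"
    signal: str                         # what was spotted in the data
    proposal: str                       # what was proposed
    story_id: Optional[str] = None      # set once the team adopts it as a story
    shipped: bool = False
    metric_moved: Optional[bool] = None # known only after results flow back


def confidence_by_area(threads):
    """Per area: share of proposals adopted, and share of shipped ones that moved the metric."""
    stats = defaultdict(lambda: {"proposed": 0, "adopted": 0, "shipped": 0, "moved": 0})
    for t in threads:
        s = stats[t.area]
        s["proposed"] += 1
        if t.story_id is not None:
            s["adopted"] += 1
        if t.shipped:
            s["shipped"] += 1
            if t.metric_moved:
                s["moved"] += 1
    return {
        area: {
            "adoption_rate": s["adopted"] / s["proposed"],
            # No shipped proposals yet means there is no evidence either way.
            "hit_rate": s["moved"] / s["shipped"] if s["shipped"] else None,
        }
        for area, s in stats.items()
    }


threads = [
    ProvenanceThread("checkout", "conversion down", "simplify form", "STORY-1", True, True),
    ProvenanceThread("checkout", "cart abandons up", "add reminder email"),
    ProvenanceThread("onboarding", "low activation", "shorten tour"),
]
print(confidence_by_area(threads))
```

How those two rates would actually feed into a confidence number is still open; the sketch only shows that both fall out of the thread once every step is tied together.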

The part we like most: a proposal getting cut from the sprint is itself a signal. A rejection is feedback on proposal quality. Explicit adoption (it becomes a story, someone thumbs it up) scores high; implicit signals (PRs drifting in the direction we proposed) score low. Add up those scores, and once a proposal's total crosses a threshold it gets promoted from personal memory to shared team memory.
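As a toy version of that scoring in Python: the event names, the weights, and the threshold below are all made-up placeholders (including the choice to make a rejection count negatively), and the real values would need tuning against actual adoption data.

```python
# Hypothetical weights: explicit adoption scores high, implicit signals score low,
# and a cut from the sprint counts against the proposal.
WEIGHTS = {
    "became_story": 3.0,
    "thumbs_up": 2.0,
    "pr_drift": 0.5,
    "cut_from_sprint": -2.0,
}
PROMOTION_THRESHOLD = 4.0  # assumed value, for illustration only


def proposal_score(events):
    """Sum the weights of every feedback event recorded for one proposal."""
    return sum(WEIGHTS[event] for event in events)


def should_promote(events):
    """Promote to shared team memory once the summed score reaches the threshold."""
    return proposal_score(events) >= PROMOTION_THRESHOLD


adopted = ["became_story", "thumbs_up"]
drifting = ["pr_drift", "pr_drift"]
print(proposal_score(adopted), should_promote(adopted))
print(proposal_score(drifting), should_promote(drifting))
```

With these placeholder numbers, a proposal that became a story and got a thumbs-up clears the bar, while two PRs drifting in the proposed direction do not.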

What we're still chewing on

We haven't figured all of this out — and honestly, the open questions are where we'd love a second brain. The big ones right now:

What kinds of proposals should Grip make first? The whole design is meant to be planned top-down from the decisions we want it to support — so picking that first list matters a lot, and we're not settled on it.
What do we ingest first? Phase 0 can't take in everything at once. Slack? Analytics? Support? We're still arguing about the order.
Extend Grip, or split off a separate service? Grip is a tidy package today. Bolting the whole brain onto it versus standing up something new alongside it is a real fork we haven't committed to.

Where we are right now

It's not built yet. Phase 0 is just the retrieval substrate + Advisor + a minimal provenance stub. Propose and the full loop come later as thin layers on top, because retrieval is the common dependency for everything, and Advisor Q&A is the cheapest way to validate and dogfood it.

So that's the design, as of this week. If you've built something like this — or you'd argue any of these three bets differently — I'd genuinely love to hear it.