MatzekMedia/Human-authored-judgment

Every tool building AI on top of your knowledge is making the same move: give the AI more of you. More memory, more retrieval, more of your notes in context at the right moment. All of it points at one goal, an AI that works from everything you know. And all of it skips the part that would make the work yours. The AI can hold your entire vault and still not know where you stand: which note you settled and which you abandoned, what a project is actually for, which of two decisions still binds after you overruled the other last month. Access was never the constraint. Judgment was.

That judgment, authored over your own knowledge one decision at a time, is governance. The narrow version is the layer that tells an AI which of your notes to trust. The wider one is your standing sense of how the work should go, what it is for, and what good looks like to you. No tool ships it to you, because it is not a feature. It is the residue of decisions only you can make.

What a system can do is make that authoring deliberate: catch each decision as you make it and load it back the next time you work, so the judgment compounds instead of evaporating. Settle a question once, and every session after starts further ahead. How that turns a single call into a standing principle, and why no competitor can clone your judgment, is the rest of this piece.

The calls the problem can't answer

Start with what the AI is genuinely good for. We have computers that make judgment calls now, and a whole class of them they make well: the best tool for the job, the working code for an idea you've spelled out. What those share is a right answer that does not depend on you. Given the problem, the answer is checkable, and the AI reaches it about as well as you would, faster.

The calls it cannot make are the ones the problem cannot answer. Picture two notes on one client's buttons. One is a brand guideline you wrote: square, because the brand reads as structured and you want it distinctive. The other is a result from the client's own test: rounded converted better. Both are current, both are yours, and they contradict. You have to ship one. The AI cannot make this call, and the reason is not access. It has both notes, weighted equally, and the tiebreak is written in neither. It is a call you already made about this brand: distinctiveness outranks a conversion bump, or it does not. There is nothing to measure that against except what you are building toward, and only you can see that. Give the model perfect recall of both notes and it still has nothing to rank them by, because the ranking lives in a judgment you never wrote down as data. Governance is the accumulated record of that second kind of call, kept so the AI can reason from it.

A fair objection: if the AI makes calls, maybe deciding what to trust is just one more call it makes, and governance lives in the AI alongside everything else. It half does, and the half is worth being exact about. The AI does real work here, reading back each conversation and filing what got decided into the registers that hold your state of mind. Routing that content is genuine labor, and the machine does it. Filing is not deciding. What it files are the calls you already made, where you pushed back and redirected and kept what held. The deciding happened upstream, by you.

So governance lives in neither the data nor the harness layer. The AI surfaces it, the data holds it, the harness loads and routes it, and not one of the three authors it. That part is the human, threaded through all of them. The clearest way to see it is to watch the two layers you can actually buy each reach for the decision and miss.

Both data and harness layers reach for it and miss

The two things you can buy are the data layer and the harness. The data layer is typed notes, schemas, templates, frontmatter the AI can read with confidence. The harness is the runtime that loads your context and executes across sessions without losing the thread. One stores a fact, the other fetches it. Governance is neither storing nor fetching. It is deciding. Watch each reach for the decision and stop a step short.

Storage:

The case for typed structure starts by naming the failure it fixes: a vault that absorbs exclusively AI-generated text and ideas becomes a knowledge base with no knower, full of content and emptied of the person whose judgment it was built to serve. Typed structure closes part of that gap. Schema-validated frontmatter, strict templates, entity definitions, defined relations: these make notes parseable and catch structural drift. Strong typing is genuinely valuable.

What it cannot reach is which notes matter. A schema can guarantee every meeting note has a date and a decisions block. It cannot say which meeting's decisions still bind. So try pushing authority into the data itself: mark a note's standing in its frontmatter, flag it authoritative, let a routine surface everything flagged current. You can, and doing it shows exactly what governance is. The flag does nothing on its own. Two things have to exist for it to mean anything: a routine that reads the flag and acts on it, and your decision to put the flag there. The first is machinery, and machinery copies. The second, the decision to place the flag, is the governance. The data layer can hold the decision, but holding is not deciding.
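To make the split concrete, here is a minimal sketch of the machinery half, under stated assumptions: the `standing` frontmatter key, its `authoritative` value, and the helper names are all hypothetical, not taken from any particular tool. The routine is a few lines anyone could copy. The only part that carries judgment is the one word a person typed into the note's header.

```python
from dataclasses import dataclass


@dataclass
class Note:
    title: str
    meta: dict
    body: str


def parse_note(title: str, text: str) -> Note:
    """Split a note into frontmatter (plain `key: value` lines) and body."""
    meta = {}
    body = text
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---\n")
        if sep:  # treat it as frontmatter only if the closing fence exists
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            body = rest
    return Note(title, meta, body.strip())


def surface_authoritative(notes):
    """The machinery: return only the notes a human flagged as standing."""
    return [n for n in notes if n.meta.get("standing") == "authoritative"]


# Hypothetical notes. The flag values are the authored part.
brand = parse_note(
    "brand-guideline",
    "---\nstanding: authoritative\n---\nButtons are square.",
)
test_result = parse_note(
    "ab-test",
    "---\nstanding: reference\n---\nRounded converted better.",
)

trusted = surface_authoritative([brand, test_result])
print([n.title for n in trusted])
```

Swap the two flag values and the same routine surfaces the other note. Nothing in the code changed. The decision did.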

Retrieval:

The harness reaches for that same decision from the other side, acting on the flag the data layer could only store. GBrain by Garry Tan is the harness done well: hybrid retrieval that fuses vector and keyword search over a typed entity graph, plus a job queue that keeps deterministic tasks alive through crashes. The pitch is honest: "Search gives you raw pages. GBrain gives you the answer."

What it cannot settle is which of two live claims should win. Take the two button notes back to it. Both sit in the same project, both are current, both are yours, and retrieval surfaces them at equal weight. GBrain can supersede by recency, marking a belief inactive or letting a fact expire, but recency is not a standing decision about which note outranks which. Even per-user scoping, the feature that most resembles authority, is only access control: who sees what, not what to trust. It cannot rank the two, because the ranking turns on the call you already made about this brand, and no measurement recovers a call. Better search makes it worse: higher recall surfaces more contradictions with nothing to weigh them by. This is the whole memory cohort's ceiling, not GBrain's alone. They optimize what the agent can recall, and the best add temporal supersession. None weigh trust, because retrieval ranks by what is measurable and trust is not.
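A small sketch shows the difference between superseding by recency and ranking by an authored call. Everything here is an assumption made for illustration: the `Hit` type, the scores, the dates, and the `RULINGS` table are hypothetical and do not describe GBrain's API or any real retrieval system. The ruling also assumes you decided that distinctiveness wins, and it could as easily go the other way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    note: str
    claim: str
    score: float   # retrieval relevance (hypothetical values)
    written: date  # hypothetical dates


# Two live, contradicting notes retrieved at equal weight.
hits = [
    Hit("brand-guideline", "square buttons", 0.91, date(2024, 3, 1)),
    Hit("ab-test-result", "rounded buttons", 0.91, date(2024, 5, 12)),
]


def by_recency(hits):
    """Temporal supersession: the newest note wins, whatever it says."""
    return max(hits, key=lambda h: h.written)


# The authored part: a call recorded by the person, keyed by the
# pair of notes in tension. The code cannot derive this entry.
RULINGS = {
    frozenset({"brand-guideline", "ab-test-result"}): "brand-guideline",
}


def by_ruling(hits, rulings):
    """Return the note the human ruled for, or None if no call exists."""
    winner = rulings.get(frozenset(h.note for h in hits))
    if winner is None:
        return None  # the machinery declines to decide
    return next(h for h in hits if h.note == winner)


print(by_recency(hits).claim)
print(by_ruling(hits, RULINGS).claim)
print(by_ruling(hits, {}))
```

Recency picks the test result only because it was written later. With an empty rulings table the function returns nothing, which is the honest answer when the call was never authored.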

Both layers reach the decision and stop. One can hold it, the other can act on it, neither can make it. The data layer trusts everything its schema admits, the harness ranks everything its retrieval surfaces, and a vault with no governance is a well-formatted, instantly-retrievable version of the same problem. What the layers cannot do, you can. The interesting part is what happens the moment you do.

What your decision does next

Here is the important move: you make the call once, and it does not stay where you left it. The decision is recorded with its reasoning and loaded into the next session before you ask for it, so when a parallel fork appears (a different client, the same tension between distinctiveness and a conversion bump), the call is already half-settled. The judgment generalizes. One concrete decision about one client's buttons becomes a principle the AI reasons from across every situation shaped like it.

The mechanism is real, not a picture of how it ought to work. At the close of a conversation, the decision and the reasoning behind it are filed into a register. At the start of the next, that register loads into context, and the work that follows is read against it. Nothing is forced. The recorded judgment is simply present, the way your own settled opinions are present when you sit down to work, so the AI reasons from where you already stand instead of relitigating it. The mechanics of that capture and load are the subject of Congruence: An Architecture for AI Augmented Knowledge Work.
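The capture-and-load cycle can be sketched in a few lines. This is a toy under stated assumptions, not the implementation described in the Congruence piece: the JSON file, the `file_decision` and `load_register` names, and the preamble wording are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def file_decision(register: Path, decision: str, reasoning: str) -> None:
    """Close of a conversation: append the call and its reasoning."""
    entries = json.loads(register.read_text()) if register.exists() else []
    entries.append({"decision": decision, "reasoning": reasoning})
    register.write_text(json.dumps(entries, indent=2))


def load_register(register: Path) -> str:
    """Start of the next session: render the register as context text."""
    if not register.exists():
        return ""
    entries = json.loads(register.read_text())
    lines = [f"- {e['decision']} (because: {e['reasoning']})" for e in entries]
    return "Standing decisions:\n" + "\n".join(lines)


# A throwaway register file for the example.
register = Path(tempfile.mkdtemp()) / "decisions.json"

file_decision(
    register,
    "Distinctiveness outranks a conversion bump",
    "the brand reads as structured and should stay distinctive",
)

# The preamble would be placed ahead of the session's first prompt.
preamble = load_register(register)
print(preamble)
```

Note what the sketch does not do: it never blocks or overrides anything. The recorded call is simply placed in front of the work, which matches "nothing is forced" above.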

The result has a name. When the judgment you authored loads ahead of the work and the work is read against it, that is congruence, the similarity between the state of mind of the user and the state of "mind" of the model. The judgment never has to be a box you can point to in the code. The machinery and the judgment stay cleanly apart: give two people the identical system, every script and hook the same, and you get two different shapes of governance. The system is the same in every pair of hands, but the decisions it holds are unique.

The one layer you cannot buy

Every discipline misses this layer in its own vocabulary. From inside the data layer the gap looks like missing structure, so the fix looks like more types and stricter templates. From inside the harness it looks like missing recall, so the fix looks like better memory. Both are chasing a symptom. The cause is upstream of either: the deciding was never authored, and authoring is the one thing engineering cannot route around, because the mechanical half only carries out the calls the operator already made.

Enterprise tools have built a real version of this, and it is worth conceding plainly. An admin marks a source verified, and the AI is told to trust it. Companies sell this as a governed layer for enterprise AI, where the model is blocked from anything a subject-matter expert has not signed off. That authority is assigned, handed down to a team by a knowledge manager, and assigned authority is copyable by design. That is the feature. It carries a ceiling of its own: standing principles that are safe when the author audits his own output become invisible errors that propagate the moment the author and the affected party are different people. I watched one such system get cut back to a narrow safety floor for exactly that reason. Authored authority works for one operator because the author is the auditor.

The version that matters for one person works the other way. You author the authority yourself, decision by decision, and it compounds the way the last section described: every call you settle loads into the next session and pre-decides the forks shaped like it. The engine is the judgment still running, and it runs on your own work.

This is why the gap widens as the models improve. A better model settles more of the right-answer calls and makes the rest sound more convincing, and it surfaces more of your notes at the right moment. None of that decides which of them you would stand behind. The better the AI gets at producing plausible answers, the more it matters that something you authored settles which one is right.

The cognitive registers that hold that judgment, loaded at the start of every session, are the piece nobody else has built. Every other layer on the stack can be bought, cloned, or rented. This one you author, one decision at a time, and it grows more yours with every session that loads it.

Long-term, I see this as one of the cycles that creates differentiation between entities. Anyone who can make higher-quality decisions faster gets a stronger system. Any broad training set includes too much counterproductive information to augment work optimally, so for now, at the strength of the models we see today, proprietary data injected into prompts is what improves performance.

Grae.