Toolkit · Piece 3 canonical kit

Operational-Gap Audit

Companion toolkit to Chat Plus SaaS Is No Longer Enough. What an AI-First Operator Substrate Actually Looks Like. · The ratified worked exemplar of the seven-block prompt-kit contract · Markdown source, full seven-prompt toolkit

These kits are designed to help your thinking and focus. LLM outputs vary depending on the model, the inputs, and the context. Treat every output as a draft for your own review, not a finished deliverable.

How to use this toolkit

This is a set of prompts. You paste them into Claude, ChatGPT, or Codex CLI, you answer the agent's clarifying questions truthfully, and you read what comes back. The toolkit does two specific jobs. It surfaces the operational gaps in a one-to-three-person UK academic spinout in its first 18 months post-licence, and it produces a first concrete action per gap. It does not build the substrate for you. The substrate, as Piece 3 sets out, is four layers of named workflows, knowledge, evaluation, and trust. The toolkit is the audit and first-action layer that sits in front of that work, so you can see what to build and in what order.

The prompts assume you are early. They assume you are running on chat-plus-SaaS today, your board pack is assembled by hand, and nobody has yet written down a procedure for any of it. Calibrate accordingly if you are further along.

Three voice rules apply throughout. The agent drafts, you ship; every output is a draft you review before any external use, which is why the review gate is named explicitly in each prompt. The agent's diagnostic only works on real context, so when it asks for clarifying detail you answer truthfully rather than aspirationally. And the prompts are conversational scaffolds, not magic; your judgement is what makes the outputs useful. A founder who pastes a prompt and accepts the output verbatim is doing chat-plus-SaaS dressed up in heavier vocabulary. A founder who pastes the prompt, reads the output critically, edits, and runs the procedure twice is starting to build a substrate.

Once you have built the substrate, these prompts evolve from one-shot copy-paste into agent skills with persistent context and named tools. That is later. The toolkit is the on-ramp.

The overall operational-gap audit

The failure modes that end seed-stage spinouts in the first 18 months cluster predictably. Board pack inconsistency before the first institutional cheque. A financial model that cannot model the next round without a rebuild. Customer-discovery cycles that decay after the ICURe cohort closes. An IP register that drifts as invention disclosures, paper drafts, and contractor agreements accumulate. Hiring stalled at the role most needed because the JD does not exist yet. Investor update cadence that collapses by month four. Most founders I speak to recognise three of those six without prompting. The point of this first prompt is to name the cluster, walk you through your actual state on each, and produce a ranked diagnosis of which gap is hurting you most right now.

You will get more from this prompt if you have your last board pack, your most recent financial model, and your last two investor updates open in another tab when you run it. The agent will ask for specifics. Answering "we send investor updates monthly" tells the agent nothing. Answering "we sent updates in months one, two, and three, missed month four, sent two paragraphs at midnight in month five, and have not sent month six" tells the agent everything.

Prompt, copy into Claude, ChatGPT, or Codex CLI

You are an experienced UK seed-stage spinout operator. Your job is to audit my
operational substrate gaps and produce a ranked diagnosis with one specific
next action per gap.

Context to assume:
- I am a founder of a UK academic spinout, first 18 months post-licence.
- Team is one to three people, mostly academic.
- We run on chat-plus-SaaS today; no procedure is written down.

Walk me through my current state on each of the six failure modes. Ask
concrete questions until you have real evidence to rate severity. Do not
accept generic answers, if I say "monthly", ask when the last instance
was, who produced it, how long it took.

The six failure modes:
1. Board pack inconsistency, last pack date; KPI definition continuity
   vs prior quarter; time to produce; cash bridge ties to which bank
   statement; risk register diff against prior pack.
2. Financial model that cannot model the next round, last rebuild vs
   roll-forward; can it answer "18-month plan to £6m Series A on three
   named hires" without rebuild; reproducible sensitivities.
3. Customer-discovery decay, interviews in last 30 days; where notes
   are stored; when synthesis was last updated; top three objections in
   writing right now.
4. IP register drift, patent families on licence; invention disclosures
   since licence; contractor assignment clauses; last TTO contact.
5. Hiring stalled, most overdue role; JD in writing yes/no; compensation
   band; interview rubric; time-to-JD.
6. Investor update cadence, updates in last six months and dates; named
   metrics covered; word count of most recent.

When you have enough evidence per mode, produce:

## Operational-gap diagnosis

For each gap, ranked most-severe to least:

### Gap N. [Name]
- Severity: Critical / High / Medium / Low. One-sentence justification,
  quoting the founder-stated evidence that drove the rating.
- Evidence: bullet list of the founder-stated facts justifying severity.
- One specific next action this week: named precisely. Not "improve
  investor updates" but "draft an investor-update procedure with named
  inputs (KPI deltas, hiring status, IP diffs, customer notes) and a
  fixed send-date of the 5th of each month".

You must not: fabricate evidence I did not give; assume I am more
technical than stated; produce a diagnosis before asking enough
questions; recommend tools or vendors; auto-execute anything.

Review gate: the diagnosis is a draft. I review every gap, severity,
and next action before treating any of it as accurate. If you are
uncertain about a rating, say so.

Begin by asking me about failure mode 1.

Read every severity rating against the answer you gave the agent and challenge any rating that does not feel grounded in what you actually said. The "next action" per gap is the most useful part of the output, because it converts an abstract gap into something you can do this week. Pick the top one or two gaps and run the per-workflow prompt for whichever workflows address them. Do not try to fix all six at once. The honest founder reading the diagnosis will recognise that some of the gaps have been visible for months; the diagnosis is not new information, but naming the rank and the next action is the part that moves you.

The six workflow prompts and the prioritisation prompt

The full seven-prompt toolkit, board pack assembly (the worked exemplar of the seven-block contract), investor update, financial-model maintenance, customer-discovery synthesis, IP register, hiring pipeline, and the prioritisation prompt that sequences which workflow to build first, lives in the markdown source. Each carries the same six-section procedural output (Inputs, Steps, Outputs, Review gate, Eval checks, Failure modes this procedure protects against) and the same seven-block contract (agent role, founder context, clarifying questions, typed output, must-not list, review gate, eval check).

For the complete fenced prompt text for every workflow, open the markdown source. The prompts are designed to be pasted into Claude, ChatGPT, or Codex CLI; each one runs in a single conversation and produces a draft procedure the founder edits before treating it as standing. The markdown is the canonical version; this HTML page is the entry point. Run the overall gap audit above first; the prioritisation prompt at the foot of the markdown then sequences which one or two workflow prompts to run next.

What to do once you have run the toolkit

Build one procedure. Run it twice. Evaluate against the eval checks the procedure itself produced. Iterate. That is the discipline the toolkit is designed to put in place, and it is the discipline that compounds. The temptation will be to run all the prompts in one sitting, generate six procedures, and treat the resulting stack as a substrate. It is not a substrate. Six untested procedures are six aspirational documents. One procedure that has run twice and passed its own eval checks twice is the start of a substrate.

The tooling follows the procedure. Once the procedure is in writing, the question of which model, which orchestrator, which knowledge layer becomes tractable in a way it was not before. Piece 3 of this series describes what the standing substrate looks like once the procedures are in place. Pieces 1 and 2 describe the funding architecture and the agentic bottleneck that make this work load-bearing in 2026 rather than optional. The toolkit is the on-ramp. The codification act itself is the thing that matters; the tooling is downstream of it.

The same founding team can run a larger company than the headcount admits to. That is not a slogan, it is the consequence of writing down what you used to keep in your head, running it twice, and noticing it now runs without you.

Related reading

Companion kits

← Back to SpinUp Forge