White Paper

What this is

Next Shell is not a boilerplate. It is a reusable engineering foundation designed to be cloned as the base for many future products. Every convention, quality gate, and AI governance rule ships with the repository and activates automatically — no setup, no prompting, no onboarding.

What makes it distinct: the AI layer is not a hints file stapled onto a starter template. It is a structured governance system with defined layers, one-directional information flow, invariant protections, cascade termination guarantees, and a self-evaluation loop that audits its own correctness. The system is designed to grow autonomously with minimal to no human intervention while maintaining logical coherence at scale.

The autonomous AI governance system

Architecture

The AI layer is organised into four logical layers. Each layer has a distinct activation model, a defined scope, and a strict dependency direction.

Layer	What it contains	When it activates
Always-on instructions	Workspace-level coding rules, conventions, constraints	Injected into every AI interaction automatically
Contextual instructions	Specialised rules scoped to file types and directories	Loaded only when the active file matches — context budget stays lean
On-demand workflows	Evaluation workflows, skills, and agent modes	Loaded explicitly when invoked — zero always-on overhead
Reference documentation	Authoritative descriptions of what exists	Describes the system; audited by evaluation workflows

Information flows in one direction only: the instruction layers update the reference documentation, never the reverse. This unidirectional design is the foundation of the system's scalability — it eliminates the class of bugs where updating documentation triggers instruction changes which trigger documentation changes, ad infinitum.

Figure: four-layer architecture. Always-on (every interaction), contextual (current work only), on-demand (explicit invocation), and reference (audited by the layers above). Each layer activates at a different scope, and information flows strictly downward.

Contextual instructions keep guidance scoped to the task at hand. When an AI agent works on a component, it receives component rules. When it works on tests, it receives testing rules. The active context stays proportional to the task, not the total rule count — keeping interactions precise and scalable as the governance layer grows.

Governance reference documents

Three reference documents must always be mutually consistent:

AI file registry — authoritative description of every AI infrastructure file
Capability index — enumeration of every active governance capability
Evaluation reading list — the mandatory file set every audit workflow must load

When any AI infrastructure file is added, removed, or renamed, all three are updated in the same change. Any divergence between them is caught the next time an evaluation workflow runs.

Single update authority

One dedicated location is the only place in the entire system that enumerates update targets. When any instruction, prompt, skill, or agent is added, the AI has exactly one location to consult for which files need updating. No scattered rules. No redundant target lists that inevitably drift. When the reference documentation changes structure, only this one location needs modification, and every downstream rule stays correct.
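
As a rough illustration, the single authority can be pictured as one lookup table that every rule consults. The sketch below is hypothetical: the file paths, the `InfraKind` type, and `updateTargetsFor` are invented for illustration and are not taken from the repository.

```typescript
// Hypothetical sketch: one table is the only place that lists update targets.
// All paths and names below are illustrative assumptions, not the shell's real files.
type InfraKind = "instruction" | "prompt" | "skill" | "agent";

const REFERENCE_DOCS = [
  "docs/ai-file-registry.md",
  "docs/capability-index.md",
  "docs/evaluation-reading-list.md",
] as const;

// The single authority: every infrastructure kind maps to the files to update.
const UPDATE_TARGETS: Record<InfraKind, readonly string[]> = {
  instruction: REFERENCE_DOCS,
  prompt: REFERENCE_DOCS,
  skill: REFERENCE_DOCS,
  agent: REFERENCE_DOCS,
};

// Every rule asks this function instead of carrying its own target list.
function updateTargetsFor(kind: InfraKind): readonly string[] {
  return UPDATE_TARGETS[kind];
}
```

If the reference documentation were restructured, only `REFERENCE_DOCS` would change in this sketch; callers of `updateTargetsFor` would stay as they are.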

Unidirectional documentation flow

Updates flow in one direction only: the reference documentation receives updates but never sends instructions back to the instruction layer. This is not a convention; it is a structural guarantee. Mutual dependency between the documentation and instruction layers would create an unresolvable loop, and making the guarantee structural rules out that entire class of failure.

Bounded update propagation

When a file changes, every related file is checked for consistency and updated if needed. Propagation is bounded by design — it covers direct dependencies and terminates, never cascading into further rounds of updates. The agent resolves the immediate update targets, synchronises them, and halts. This guarantees that every change converges quickly and no update cycle can run unbounded.
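
A minimal sketch of single-hop propagation, assuming the update rules can be reduced to a map from a file to its direct targets. The map format, the file names, and `resolveDirectTargets` are hypothetical; the source does not describe how the rules are actually stored.

```typescript
// Hypothetical sketch of single-hop propagation. The rule map and file names
// are invented for illustration.
type UpdateRules = Record<string, readonly string[]>;

// Returns the files to synchronise for one change: direct targets only.
// Targets are never fed back in, so a bidirectional rule cannot loop.
function resolveDirectTargets(changedFile: string, rules: UpdateRules): string[] {
  const direct = rules[changedFile] ?? [];
  const seen = new Set<string>();
  const targets: string[] = [];
  for (const target of direct) {
    // Skip self-references and duplicates, then stop: there is no second round.
    if (target !== changedFile && !seen.has(target)) {
      seen.add(target);
      targets.push(target);
    }
  }
  return targets;
}

const exampleRules: UpdateRules = {
  "a.md": ["b.md"],
  "b.md": ["a.md", "c.md"], // bidirectional with a.md
  "c.md": ["a.md"],
};
```

Calling `resolveDirectTargets("a.md", exampleRules)` yields only `b.md`, even though `b.md` points back at `a.md`: the function has no recursive step, so termination does not depend on the shape of the rules.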

Self-evaluation loop

Evaluation workflows autonomously audit every layer of the shell. They operate as isolated diagnostic systems — assessing correctness without ever modifying codebase files. Each workflow publishes structured findings independently, and the evaluation report always shows the current state of every layer.

The evaluation workflows cover:

Scope	What it audits
Project layer	Full shell: architecture, conventions, AI setup, documentation accuracy, module system, TypeScript, security, testing, growth readiness
AI layer	Instruction files, prompts, skills, sync drift between reference docs, stale cross-references, contradictions, coverage gaps
Infrastructure layer	CI/CD, env config, release automation, deployment configuration, logging setup
End-to-end	The evaluation infrastructure itself — reading list completeness, update coverage, unregistered file detection, infrastructure directory coverage
Governance logic	Update rule self-reference, multi-hop update chain detection, bounded propagation enforcement, unidirectional flow adherence, authority hierarchy, forward scalability coherence

Figure: five autonomous audit scopes (Project, AI, Infra, Governance, End-to-end). Each covers a distinct layer, including the evaluation infrastructure itself, so no layer audits itself without being audited in turn.

Loop safety: no evaluation workflow can invoke another. No circular execution is possible. The end-to-end audit reads the other workflows as data — it cannot cause them to run.
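
One way to picture this property is to model each workflow as a function from a read-only snapshot to a list of findings. The sketch below is an assumption made for illustration: `RepoSnapshot`, `EvaluationWorkflow`, and the workflow names are invented, and the real workflows are prompts rather than TypeScript functions.

```typescript
// Hypothetical sketch: types and workflow names are invented for illustration.
interface Finding {
  scope: string;
  message: string;
}

// What a workflow may see: a read-only view of files and of the other
// workflows' reading lists. Nothing in this view can be invoked.
interface RepoSnapshot {
  readonly files: readonly string[];
  readonly workflowReadingLists: Readonly<Record<string, readonly string[]>>;
}

type EvaluationWorkflow = (snapshot: RepoSnapshot) => Finding[];

// The end-to-end audit inspects other workflows' reading lists as plain data.
const endToEndAudit: EvaluationWorkflow = (snapshot) => {
  const findings: Finding[] = [];
  for (const workflow of Object.keys(snapshot.workflowReadingLists)) {
    const readingList = snapshot.workflowReadingLists[workflow] ?? [];
    for (const file of readingList) {
      if (snapshot.files.indexOf(file) === -1) {
        findings.push({
          scope: "end-to-end",
          message: `${workflow} lists missing file ${file}`,
        });
      }
    }
  }
  return findings;
};
```

Because the snapshot holds reading lists and not the workflows themselves, the audit in this sketch has nothing it could call, which mirrors the claim that workflows are read as data.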

Unregistered file detection

Any AI infrastructure file not registered in the reference documentation (a "zombie" file) has no clear ownership and is invisible to evaluation workflows. It can silently give wrong instructions to any agent that loads it. The end-to-end audit scans all AI infrastructure directories and detects unregistered files before they can accumulate into governance debt.
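
The check itself amounts to a two-way set difference between what is on disk and what the registry lists. The sketch below assumes both can be reduced to lists of paths; `auditRegistration` and the report shape are hypothetical, and how the real audit scans directories or parses the registry is not described here.

```typescript
// Hypothetical sketch: compare files found on disk with the registry.
interface RegistrationReport {
  unregistered: string[]; // on disk, missing from the registry
  missingOnDisk: string[]; // registered, but no such file exists
}

function auditRegistration(
  filesOnDisk: readonly string[],
  registeredFiles: readonly string[],
): RegistrationReport {
  const onDisk = new Set(filesOnDisk);
  const registered = new Set(registeredFiles);
  return {
    unregistered: filesOnDisk.filter((file) => !registered.has(file)),
    missingOnDisk: registeredFiles.filter((file) => !onDisk.has(file)),
  };
}
```

The second list covers the reverse direction: a registry entry that points at a file which no longer exists.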

Protection mechanisms

The system's integrity rests on three architectural protections that work together:

Figure: three structural protections (unidirectional flow, bounded propagation, self-auditing loop). Each eliminates a distinct class of system failure; combined, they provide structural integrity without runtime overhead.

1. Unidirectional flow — Updates flow in one direction only. Reference documents receive updates from the instruction layer but never send instructions back. This eliminates the entire class of circular dependency bugs at the structural level.

2. Bounded update propagation — Even where update rules contain bidirectional relationships, each update covers its direct dependencies and stops. Re-entry is structurally prevented, so A → B → C → A chains cannot become infinite loops.

3. Self-auditing loop — The evaluation workflows operate as isolated diagnostic systems — non-mutating and architecturally unable to trigger each other. They cover every layer of the shell including the evaluation infrastructure itself, ensuring complete coverage and that no AI file exists without being registered in the reference documentation.

Verified system health

Property	Status	What it means
Unidirectional flow	Healthy	Reference docs receive updates from the instruction layer but never cascade instructions back
Single update authority	Healthy	Exactly one location enumerates which files need updating when AI infrastructure changes
Bounded propagation	Healthy	Every bidirectional relationship carries explicit stop language
Evaluation loop safety	Healthy	Isolated diagnostic systems, independent publishing, no circular triggers
Reference documentation consistency	Healthy	All three reference documents stay mutually consistent
Registration patterns	Healthy	Complete checklists for each AI infrastructure file type
Sync pair	Healthy	A dedicated sync rule enforces identical content across both workspace-level instruction files
applyTo accuracy	Healthy	All contextual instruction globs match their intended file sets
Zombie detection coverage	Healthy	All AI infrastructure directory types are scanned
Multi-hop chain detection	Healthy	Governance logic audit traces the full update graph for cycles
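
Tracing an update graph for cycles, as the last row describes, can be sketched as a depth-first search. This is a generic technique chosen for illustration, not the audit's documented method; the graph format and `findUpdateCycle` are assumptions.

```typescript
// Hypothetical sketch: find one cycle in a directed update graph with DFS.
type UpdateGraph = Record<string, readonly string[]>;

function findUpdateCycle(graph: UpdateGraph): string[] | null {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const done = new Set<string>(); // nodes fully explored and known cycle-free
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (done.has(node)) return null;
    if (visiting.has(node)) {
      // Back edge: the cycle runs from the first visit of `node` to here.
      return path.slice(path.indexOf(node)).concat(node);
    }
    visiting.add(node);
    path.push(node);
    for (const next of graph[node] ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const start of Object.keys(graph)) {
    const cycle = visit(start);
    if (cycle) return cycle;
  }
  return null;
}
```

For the graph `{ a: ["b"], b: ["c"], c: ["a"] }` the function returns `["a", "b", "c", "a"]`; for an acyclic graph it returns `null`.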

Cognitive load management

The system is designed to scale without overwhelming the AI's context window or the contributor's mental model.

Figure: context tiers (always-on, contextual, on-demand). Context budget stays proportional to the task: only relevant rules are loaded at each tier, so total context scales with task scope, not total rule count.

Always-on context stays lean. The always-on workspace instructions carry the core coding rules — the essentials that apply everywhere. Everything else is scoped. A soft ceiling is enforced: if the always-on rule count grows without consolidation, a contextual instruction activates to flag it.

Contextual instructions activate only when relevant. Each .instructions.md file declares an applyTo glob. The AI loads it only when the active file matches. Specialised rules for components, pages, tests, SEO, environment variables, and contributing discipline activate at the right moment rather than inflating every interaction.
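
The selection step can be sketched as a glob match between the active file and each `applyTo` value. The matcher below is a deliberately small assumption that understands only `**`, `*`, and literal characters; real tooling uses a full glob implementation, and the instruction file names and paths are invented.

```typescript
// Hypothetical sketch of applyTo matching with a minimal glob converter.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        pattern += "(?:.*/)?"; // `**/` matches any number of directories, including none
        i += 2;
      } else {
        pattern += ".*";
        i += 1;
      }
    } else if (ch === "*") {
      pattern += "[^/]*"; // `*` stays within one path segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

interface InstructionFile {
  path: string;
  applyTo: string;
}

// Returns the instruction files whose glob matches the active file.
function instructionsFor(activeFile: string, files: readonly InstructionFile[]): string[] {
  return files
    .filter((file) => globToRegExp(file.applyTo).test(activeFile))
    .map((file) => file.path);
}

const exampleInstructions: InstructionFile[] = [
  { path: "components.instructions.md", applyTo: "src/components/**/*.tsx" },
  { path: "tests.instructions.md", applyTo: "**/*.test.ts" },
];
```

With these example entries, a component file selects only the component rules and a test file selects only the testing rules, so the loaded context follows the active file.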

On-demand workflows add zero always-on overhead. Workflows, skills, and agent modes are loaded only when explicitly invoked. The evaluation workflows, project sync, shell change proposals, PR reviews, and module scaffolding workflows exist in the repository but consume no context budget until called.

Pruning is a first-class concern. When a category of rules grows large enough to warrant its own context, it is extracted into a dedicated contextual instruction file with an appropriate applyTo glob, and the corresponding rules are removed from the always-on file. The system gets more organised as it grows, not more bloated.

Extensibility and growth

Adding AI infrastructure is mechanical, not creative

Every type of AI infrastructure file — instruction, prompt, skill, agent — has a complete registration pattern documented in the AI infrastructure guide. The pattern is a checklist: add the file, update the three reference documents, update the propagation rules if applicable. No judgment calls about where things should be registered. No risk of forgetting a location. An AI agent can follow the pattern without ambiguity.

The module system scales without coupling

Optional features are gated behind boolean flags in a single configuration file using constant literal types. The bundler constant-folds these at build time. Combined with explicit side-effect declarations, disabled module code is fully eliminated from the production bundle — zero bytes shipped, no manual cleanup.

Every file that participates in a module checks its flag with a static guard at the top. No dynamic imports, no conditional requires, no runtime overhead. Adding a new module follows the same mechanical pattern: add a flag, create the files, gate every participant, verify build exclusion.
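
A minimal sketch of the flag-and-guard pattern, assuming a flag object declared with `as const`. The module names, route paths, and function names are hypothetical, and whether a given bundler folds the guard depends on its configuration.

```typescript
// Hypothetical sketch: module names and routes are invented for illustration.
// `as const` narrows each flag to a literal type (`false`, not `boolean`).
const MODULE_FLAGS = {
  newsletter: true,
  analytics: false,
} as const;

// Static guard at the top of a participating unit: a plain property check,
// with no dynamic import and no runtime configuration lookup.
function analyticsRoutes(): string[] {
  if (!MODULE_FLAGS.analytics) return [];
  return ["/api/analytics/collect"];
}

function newsletterRoutes(): string[] {
  if (!MODULE_FLAGS.newsletter) return [];
  return ["/api/newsletter/subscribe"];
}
```

Adding a module in this sketch means adding one key to `MODULE_FLAGS` and repeating the same guard in each participating file.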

The self-evaluation loop catches its own degradation

When a new evaluation workflow is added, the reference update that runs in the same change registers it in the end-to-end audit's reading list. If that update is missed, the end-to-end audit detects the gap on its next run. If a new AI infrastructure directory type is introduced, the infrastructure directory coverage dimension flags it.

The system does not rely on contributors remembering to update documentation. It relies on structural guarantees that surface violations mechanically.

Forward scalability

The governance logic audit includes a forward scalability dimension that evaluates whether the governance logic remains coherent as the system grows:

Does the update rule set scale linearly without increasing the risk of undetected multi-hop chains?
Are registration patterns mechanical enough for AI agents to follow without ambiguity?
Is the reference documentation set explicitly bounded, preventing uncontrolled growth?
Would the self-evaluation loop detect its own degradation?

These are not retrospective checks. They are forward-looking structural audits that ensure the system's growth path is sound.

Governance rules guarding key audit concerns

The self-evaluation loop explicitly covers the structural threats that matter most as the system scales:

Concern	Where it is audited	What it prevents
Bidirectional / multi-hop update chains	Governance logic audit — chain detection	A → B → C → A infinite loops in the update rules
Bounded propagation enforcement	Governance logic audit — termination verification	Update rules that lack explicit stop language at loop entry points
Zombie blind spots	End-to-end audit — infrastructure directory coverage	AI infrastructure directories not covered by the registration scan
System coherence at scale	Governance logic audit — forward scalability	Registration patterns that require judgment, unbounded reference documentation growth
Unidirectional flow	Governance logic audit — unidirectional flow	Reference documents cascading back to the instruction layer

Production readiness

The system is production-ready for plug-and-play use:

No unresolvable loops. Bounded update propagation prevents infinite chains. Unidirectional flow eliminates circular dependencies between layers. Multi-hop chain detection traces the full update graph for cycles.
No blind spots. Unregistered file detection covers all AI infrastructure directories. Every registered file is verified against its actual existence on disk. Every evaluation workflow's reading list is checked for completeness.
Unidirectional by design. Information flows from instruction layer to reference documentation, never the reverse. Single-hop updates with explicit termination at every bidirectional relationship.
Autonomously self-auditing. Evaluation workflows cover every layer of the shell — project, AI, infrastructure, the evaluation loop itself, and governance logic correctness. The end-to-end audit covers the auditors.
Extensible. Registration patterns make adding new AI infrastructure files mechanical. The module system makes adding new features a flag flip. Skills automate multi-step scaffolding workflows.
Agent-proof. All rules use outcome language, not intent language. Every update rule has an explicit stop condition. Every evaluation dimension produces concrete, verifiable findings — not subjective assessments.

The engineering foundation

The AI governance system sits on top of a production-ready Next.js engineering foundation that provides the full development lifecycle out of the box.

Stack

Layer	Choice
Framework	Next.js 15 (App Router)
Language	TypeScript (strict mode + additional safety flags)
Styling	Tailwind CSS v4
Validation	Zod (environment variables, API boundaries, external data)
Unit / integration tests	Vitest + Testing Library
E2E tests	Playwright
Linting	ESLint v9 (flat config) with accessibility rules
Formatting	Prettier with Tailwind class sorting
Commit enforcement	Commitlint + conventional commits
Git hooks	Husky + lint-staged
Package manager	pnpm (engine-strict)
CI	GitHub Actions
Releases	release-please (automated versioning and changelogs)
Dependency management	Dependabot (weekly, grouped)

Production hardening

The following security and reliability concerns are addressed before the first line of product code:

Security headers — CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy applied to every route via next.config.ts
Environment validation — all env vars Zod-validated at startup; direct process.env access blocked by ESLint across all source files
Error handling — a single safe serialisation path for API error responses; returning raw error messages or internal details to clients is structurally prevented
Logging seam — all logging goes through a central logger module, which is the designated swap point for any vendor SDK per project
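
The error-handling item can be illustrated with a single serialisation function that every API route would call. This is a sketch under assumptions: `PublicError`, `serializeError`, and the response shape are invented here and are not the shell's actual API.

```typescript
// Hypothetical sketch of a single serialisation path for API errors.
class PublicError extends Error {
  readonly code: string;
  readonly status: number;

  constructor(code: string, status: number, publicMessage: string) {
    super(publicMessage);
    this.code = code;
    this.status = status;
  }
}

interface ErrorBody {
  error: { code: string; message: string };
}

// Only errors explicitly marked as public expose their message.
// Anything else collapses to a generic 500 with no internal details.
function serializeError(err: unknown): { status: number; body: ErrorBody } {
  if (err instanceof PublicError) {
    return {
      status: err.status,
      body: { error: { code: err.code, message: err.message } },
    };
  }
  return {
    status: 500,
    body: { error: { code: "internal_error", message: "Something went wrong." } },
  };
}
```

The design choice is an allow-list: a message reaches the client only when the throwing code opted in by using `PublicError`, so an unexpected exception cannot leak its text by default.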

CI pipeline

Four CI jobs plus PR title validation as a separate workflow — nothing merges without passing every gate:

Job	What it enforces
Quality	typecheck → lint → format check → debug statement and secret pattern scan
Unit Tests	Vitest unit test suite with coverage
Build	Production build (depends on quality + unit tests passing)
E2E	Playwright smoke tests against the production build
PR title	Validates conventional commit format (separate workflow)

Code quality

TypeScript strict mode with noUncheckedIndexedAccess, noImplicitOverride, and forceConsistentCasingInFileNames. ESLint enforces no-explicit-any, no-non-null-assertion, consistent-type-imports, import/order, and accessibility rules at error level. Prettier auto-formats on save. Pre-commit hooks run lint and format checks on staged files. Commit messages are validated against conventional commit format.

SEO baseline

Metadata, OpenGraph, Twitter cards, robots.txt, and sitemap.xml are wired from day one — all driven by a single environment variable (NEXT_PUBLIC_APP_URL). A contextual instruction enforces a per-page metadata checklist on every AI interaction in the app directory.
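
Deriving every absolute URL from one base value can be sketched as follows. The base would come from NEXT_PUBLIC_APP_URL, but it is passed in as a parameter here; `buildSitemap`, the entry shape, and the example paths are assumptions for illustration.

```typescript
// Hypothetical sketch: builds absolute sitemap URLs from one base URL.
interface SitemapEntry {
  url: string;
}

function buildSitemap(appUrl: string, paths: readonly string[]): SitemapEntry[] {
  // `new URL(path, base)` resolves each path against the base
  // and throws if the base is not a valid absolute URL.
  return paths.map((path) => ({ url: new URL(path, appUrl).toString() }));
}
```

Because every entry is resolved against the same base, changing the environment variable would change all generated URLs together.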

Optional module system

Pre-built features controlled by boolean flags. When a module is off, its code is statically eliminated from the production bundle — zero bytes shipped, no runtime overhead. When it is on, everything wires up automatically. The pattern scales to any number of modules with no additional build configuration per feature.

How to use it

1. Clone the shell into a new project

2. Write a product brief describing the project scope and goals

3. Run project sync: it reads the brief, interviews for missing sections, and generates a phased execution plan

4. Enable modules as needed; disabled modules ship zero bytes

5. Build: infrastructure, quality gates, AI governance, and CI are already in place

Figure: five steps from clone to a fully equipped, production-ready engineering foundation (clone, write brief, project sync, enable modules, build).

On subsequent runs, the project sync reconciles the plan against an updated brief. It is the single lifecycle prompt for taking a project from blank brief to evolving execution plan.

Beyond a copilot

Most AI-assisted development today follows the same model: a general-purpose assistant sitting in a chat window, answering questions and generating code on demand. The developer prompts, the AI responds, the developer reviews and pastes. Every session starts from scratch. Every convention must be re-explained. Every architectural decision must be re-stated. The AI has no memory of what this project is, how it is structured, or what rules it operates under.

That is a copilot. It is useful. It is also fundamentally stateless.

Next Shell is a different model entirely.

The problem with general-purpose assistants

A general-purpose AI assistant knows how to write React components. It does not know that your components must live in named folders with a specific suffix. It does not know that your error responses must pass through a dedicated serialisation function. It does not know that your environment variables are validated through Zod and cannot be accessed directly. It does not know that your imports follow a strict dependency direction, that your tests follow a specific philosophy, or that your SEO metadata has a per-page checklist.

Every time a developer opens a chat and asks for help, they must either:

Re-explain all of this — wasting time, introducing inconsistency, hoping they remember every rule
Skip the explanation — and get code that compiles but violates the project's conventions

Both outcomes degrade quality. The first is slow. The second is silent technical debt.

What structured governance changes

Next Shell eliminates this problem by embedding the rules in the repository itself. The AI does not need to be told — it reads the rules automatically when it opens a file.

Capability	General-purpose copilot	Structured governance
Convention awareness	None — starts from generic best practices	Full — every rule is loaded contextually
Rule activation	Manual — developer must prompt every rule	Automatic — rules activate based on the file being edited
Context efficiency	Flat — all rules or no rules	Layered — only relevant rules are loaded
Drift detection	None — conventions erode silently	Built-in — evaluation workflows catch drift
Onboarding cost	High — every session requires re-prompting	Zero — the AI is configured by the repository
Consistency across agents	Impossible — each developer prompts differently	Guaranteed — rules are structural, not conversational
Self-correction	None — errors compound until caught	Systematic — audit workflows verify correctness

Figure: session lifecycle, stateless copilot vs. repository-embedded governance.

General-purpose copilot: developer opens chat → re-explain conventions, rules, patterns → AI generates code → manual review for convention drift → session ends, context lost.

Structured governance: developer opens file → rules load automatically from repository → AI generates convention-compliant code → evaluation workflows verify correctness → rules persist in repo, zero re-prompting.

The structural advantage

The difference is not incremental. A copilot with better prompts is still a copilot — it still depends on the developer to provide context, enforce rules, and catch drift. Structured governance removes the developer from that loop entirely. The rules are not suggestions in a README. They are executable constraints that the AI picks up, follows, and is audited against.

This is the foundation that makes everything else possible. When the AI reliably understands and follows the project's engineering standards without being told, it stops being an assistant and starts being a team member — one that never forgets a convention, never drifts from the architecture, and never needs onboarding.

Summary

Next Shell delivers a production-ready engineering foundation with an embedded AI governance system that is:

Self-maintaining — autonomous evaluation workflows audit every layer, including the evaluation infrastructure itself
Loop-free — unidirectional flow, bounded update propagation, and single-hop design eliminate circular dependencies at the structural level
Scalable — contextual instructions keep the context budget lean; registration patterns make growth mechanical; forward scalability audits verify coherence as the system expands
Agent-proof — outcome language, explicit stop conditions, and verifiable evaluation dimensions ensure any AI agent can operate within the system correctly
Production-ready — security headers, env validation, error handling, logging seams, CI gates, and automated releases are in place before the first line of product code

The system is designed to grow. Every addition follows a defined registration pattern. Every structural guarantee is audited. Every potential failure mode — zombie files, update loops, blind spots, stale documentation — has a detection mechanism. The result is an engineering foundation that maintains its own coherence as it scales, with minimal human intervention.

