White Paper

Next Shell

An autonomous, self-maintaining AI governance system embedded in a production-ready Next.js engineering foundation.

Table of contents

  1. What this is
  2. The autonomous AI governance system
  3. Protection mechanisms
  4. Cognitive load management
  5. Extensibility and growth
  6. Production readiness
  7. The engineering foundation
  8. How to use it
  9. Beyond a copilot
  10. Summary

What this is

Next Shell is not a boilerplate. It is a reusable engineering foundation designed to be cloned as the base for many future products. Every convention, quality gate, and AI governance rule ships with the repository and activates automatically — no setup, no prompting, no onboarding.

What makes it distinct: the AI layer is not a hints file stapled onto a starter template. It is a structured governance system with defined layers, one-directional information flow, invariant protections, cascade termination guarantees, and a self-evaluation loop that audits its own correctness. The system is designed to grow autonomously with minimal to no human intervention while maintaining logical coherence at scale.

The autonomous AI governance system

Architecture

The AI layer is organised into four logical layers. Each layer has a distinct activation model, a defined scope, and a strict dependency direction.

Layer | What it contains | When it activates
Always-on instructions | Workspace-level coding rules, conventions, constraints | Injected into every AI interaction automatically
Contextual instructions | Specialised rules scoped to file types and directories | Loaded only when the active file matches, so the context budget stays lean
On-demand workflows | Evaluation workflows, skills, and agent modes | Loaded explicitly when invoked, with zero always-on overhead
Reference documentation | Authoritative descriptions of what exists | Describes the system; audited by evaluation workflows

Information flows in one direction only: the instruction layers update the reference documentation, never the reverse. This unidirectional design is the foundation of the system's scalability — it eliminates the class of bugs where updating documentation triggers instruction changes which trigger documentation changes, ad infinitum.

Figure: Four-layer architecture. Each layer activates at a different scope (always-on: every interaction; contextual: current work only; on-demand: explicit invocation; reference: audited by the layer above), and information flows strictly downward.

Contextual instructions keep guidance scoped to the task at hand. When an AI agent works on a component, it receives component rules. When it works on tests, it receives testing rules. The active context stays proportional to the task, not the total rule count — keeping interactions precise and scalable as the governance layer grows.

Governance reference documents

Three reference documents must always be mutually consistent:

  • AI file registry — authoritative description of every AI infrastructure file
  • Capability index — enumeration of every active governance capability
  • Evaluation reading list — the mandatory file set every audit workflow must load

When any AI infrastructure file is added, removed, or renamed, all three are updated in the same change. Any divergence between them is caught the next time an evaluation workflow runs.
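The consistency requirement above can be illustrated with a small check. This is a minimal sketch, assuming each of the three reference documents can be reduced to the list of file paths it mentions; the source does not say how the documents are parsed, and the names `ReferenceDocs` and `findDivergence` are hypothetical.

```typescript
// Hypothetical shape: each reference document reduced to the file paths it mentions.
type ReferenceDocs = {
  registry: string[]; // AI file registry
  capabilityIndex: string[]; // capability index
  readingList: string[]; // evaluation reading list
};

type Divergence = { path: string; missingFrom: string[] };

// Report every path that appears in at least one document but not in all three.
function findDivergence(docs: ReferenceDocs): Divergence[] {
  const named: Array<[string, string[]]> = [
    ["registry", docs.registry],
    ["capabilityIndex", docs.capabilityIndex],
    ["readingList", docs.readingList],
  ];

  // Union of every path mentioned anywhere.
  const allPaths = new Set<string>();
  for (const [, paths] of named) {
    for (const path of paths) allPaths.add(path);
  }

  const result: Divergence[] = [];
  for (const path of Array.from(allPaths).sort()) {
    const missingFrom = named
      .filter(([, paths]) => !paths.includes(path))
      .map(([docName]) => docName);
    if (missingFrom.length > 0) result.push({ path, missingFrom });
  }
  return result;
}
```

An empty result means the three documents agree; any entry names the path and the documents that still need it.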

Single update authority

One dedicated location is the only place in the entire system that enumerates update targets. When any instruction, prompt, skill, or agent is added, the AI has exactly one location to consult for which files need updating. No scattered rules. No redundant target lists that inevitably drift. When the reference documentation changes structure, only this one location needs modification, and every downstream rule stays correct.

Unidirectional documentation flow

Updates flow in one direction only: the reference documentation receives updates but never sends instructions back to the instruction layer. This is not a convention — it is a structural guarantee. Mutual dependency between documentation and instruction layers would create an unresolvable loop. Making this guarantee structural prevents the entire class of this failure.
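One way to picture an audit of this guarantee is a check over the update rules. The sketch below assumes update rules can be modelled as source/target pairs and that every file has a known layer; both the model and the names (`UpdateRule`, `findReverseFlows`) are hypothetical, not the shell's actual representation.

```typescript
type Layer = "always-on" | "contextual" | "on-demand" | "reference";

// Hypothetical update rule: when `source` changes, `target` must be updated.
type UpdateRule = { source: string; target: string };

// Return every rule that lets a reference document drive a change in an
// instruction layer. Under the unidirectional design this list must be empty.
function findReverseFlows(
  rules: UpdateRule[],
  layerOf: Map<string, Layer>,
): UpdateRule[] {
  return rules.filter((rule) => {
    const sourceLayer = layerOf.get(rule.source);
    const targetLayer = layerOf.get(rule.target);
    return (
      sourceLayer === "reference" &&
      targetLayer !== undefined &&
      targetLayer !== "reference"
    );
  });
}
```

A rule from an instruction file to a reference document passes; the same rule with source and target swapped is reported as a violation.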

Bounded update propagation

When a file changes, every related file is checked for consistency and updated if needed. Propagation is bounded by design — it covers direct dependencies and terminates, never cascading into further rounds of updates. The agent resolves the immediate update targets, synchronises them, and halts. This guarantees that every change converges quickly and no update cycle can run unbounded.
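The single-hop behaviour described above can be sketched as a lookup that never recurses. This assumes a hypothetical dependency map from a file to the files that must be checked when it changes; `UpdateTargets` and `resolveUpdateTargets` are illustrative names only.

```typescript
// Hypothetical dependency map: file -> files to check when that file changes.
type UpdateTargets = Map<string, string[]>;

// Resolve the direct targets of one change and stop. Targets of targets are
// deliberately not followed, so the work is bounded by the size of one entry.
function resolveUpdateTargets(changed: string, targets: UpdateTargets): string[] {
  const direct = targets.get(changed) ?? [];
  // De-duplicate and drop any self-reference so the changed file is never re-entered.
  return Array.from(new Set(direct)).filter((file) => file !== changed);
}
```

Even if the map contains A → B → C → A, a change to A resolves to B alone. The cycle exists in the data but is never walked.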

Self-evaluation loop

Evaluation workflows autonomously audit every layer of the shell. They operate as isolated diagnostic systems — assessing correctness without ever modifying codebase files. Each workflow publishes structured findings independently, and the evaluation report always shows the current state of every layer.

The evaluation workflows cover:

Scope | What it audits
Project layer | Full shell: architecture, conventions, AI setup, documentation accuracy, module system, TypeScript, security, testing, growth readiness
AI layer | Instruction files, prompts, skills, sync drift between reference docs, stale cross-references, contradictions, coverage gaps
Infrastructure layer | CI/CD, env config, release automation, deployment configuration, logging setup
End-to-end | The evaluation infrastructure itself: reading list completeness, update coverage, unregistered file detection, infrastructure directory coverage
Governance logic | Update rule self-reference, multi-hop update chain detection, bounded propagation enforcement, unidirectional flow adherence, authority hierarchy, forward scalability coherence
Figure: Five autonomous audit scopes (Project, AI, Infra, Governance, End-to-end). Each covers a distinct layer, including the evaluation infrastructure itself, so no layer audits itself without being audited in turn.

Loop safety: no evaluation workflow can invoke another. No circular execution is possible. The end-to-end audit reads the other workflows as data — it cannot cause them to run.

Unregistered file detection

Any AI infrastructure file not registered in the reference documentation (a "zombie" file) has no clear ownership and is invisible to evaluation workflows. It can silently give wrong instructions to any agent that loads it. The end-to-end audit scans all AI infrastructure directories and detects unregistered files before they can accumulate into governance debt.
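At its core, this detection is a set comparison between what exists on disk and what the registry lists. The sketch below assumes both have already been collected into arrays of paths (directory scanning is left out to keep it self-contained); `auditRegistry` and the sample paths are hypothetical.

```typescript
type RegistryAudit = {
  unregistered: string[]; // present on disk, absent from the registry
  missing: string[]; // listed in the registry, absent from disk
};

// Compare the scanned AI infrastructure files with the registered ones.
function auditRegistry(filesOnDisk: string[], registered: string[]): RegistryAudit {
  const disk = new Set(filesOnDisk);
  const registry = new Set(registered);
  return {
    unregistered: filesOnDisk.filter((file) => !registry.has(file)).sort(),
    missing: registered.filter((file) => !disk.has(file)).sort(),
  };
}

// Example with hypothetical paths:
// auditRegistry(
//   ["ai/instructions/tests.instructions.md", "ai/prompts/draft.prompt.md"],
//   ["ai/instructions/tests.instructions.md", "ai/skills/scaffold.md"],
// )
// reports "ai/prompts/draft.prompt.md" as unregistered
// and "ai/skills/scaffold.md" as missing.
```

The `missing` list covers the complementary case of a registered file that no longer exists on disk.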

Protection mechanisms

The system's integrity rests on three architectural protections that work together:

Figure: Three structural protections (unidirectional flow, bounded propagation, self-auditing loop). Each eliminates a distinct class of system failure; combined, they provide structural integrity without runtime overhead.

1. Unidirectional flow — Updates flow in one direction only. Reference documents receive updates from the instruction layer but never send instructions back. This eliminates the entire class of circular dependency bugs at the structural level.

2. Bounded update propagation — Even where update rules contain bidirectional relationships, each update covers its direct dependencies and stops. Re-entry is structurally prevented, so A → B → C → A chains cannot become infinite loops.

3. Self-auditing loop — The evaluation workflows operate as isolated diagnostic systems — non-mutating and architecturally unable to trigger each other. They cover every layer of the shell including the evaluation infrastructure itself, ensuring complete coverage and that no AI file exists without being registered in the reference documentation.

Verified system health

Property | Status | What it means
Unidirectional flow | Healthy | Reference docs receive updates from the instruction layer but never cascade instructions back
Single update authority | Healthy | Exactly one location enumerates which files need updating when AI infrastructure changes
Bounded propagation | Healthy | Every bidirectional relationship carries explicit stop language
Evaluation loop safety | Healthy | Isolated diagnostic systems, independent publishing, no circular triggers
Reference documentation consistency | Healthy | All three reference documents stay mutually consistent
Registration patterns | Healthy | Complete checklists for each AI infrastructure file type
Sync pair | Healthy | A dedicated sync rule enforces identical content across both workspace-level instruction files
applyTo accuracy | Healthy | All contextual instruction globs match their intended file sets
Zombie detection coverage | Healthy | All AI infrastructure directory types are scanned
Multi-hop chain detection | Healthy | Structural integrity audit traces the full update graph for cycles
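Tracing the update graph for cycles, as the last row describes, is a standard graph problem. The following is a sketch using depth-first search, assuming the update rules are available as an adjacency map; the source does not describe how the audit actually performs the trace, and `findUpdateCycle` is a hypothetical name.

```typescript
// Hypothetical adjacency map: file -> files its update rule points to.
type UpdateGraph = Map<string, string[]>;

// Depth-first search that tracks the current path. Returns the first cycle
// found as a path that starts and ends on the same node, or null if acyclic.
function findUpdateCycle(graph: UpdateGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    state.set(node, "visiting");
    stack.push(node);
    for (const next of graph.get(node) ?? []) {
      if (state.get(next) === "visiting") {
        // `next` is already on the current path: close the loop and report it.
        return [...stack.slice(stack.indexOf(next)), next];
      }
      if (state.get(next) === undefined) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of Array.from(graph.keys())) {
    if (state.get(node) === undefined) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}
```

For a graph with rules A → B, B → C, and C → A, the function returns the path A, B, C, A. Removing the C → A rule makes it return null.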

Cognitive load management

The system is designed to scale without overwhelming the AI's context window or the contributor's mental model.

Figure: Three context tiers (always-on: core coding rules loaded in every interaction; contextual: file-specific rules that activate based on the active file; on-demand: workflows and skills loaded only when explicitly invoked). Total context scales with task scope, not total rule count.

Always-on context stays lean. The always-on workspace instructions carry the core coding rules — the essentials that apply everywhere. Everything else is scoped. A soft ceiling is enforced: if the always-on rule count grows without consolidation, a contextual instruction activates to flag it.

Contextual instructions activate only when relevant. Each .instructions.md file declares an applyTo glob. The AI loads it only when the active file matches. Specialised rules for components, pages, tests, SEO, environment variables, and contributing discipline activate at the right moment rather than inflating every interaction.
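To make the activation model concrete, the sketch below selects instruction files whose applyTo glob matches the active file. In practice the matching is performed by the AI tooling, not by project code. The tiny glob matcher, the `InstructionFile` shape, and the sample paths are assumptions for illustration, and the matcher supports only `**`, `*`, and `?`.

```typescript
type InstructionFile = { path: string; applyTo: string };

// Convert a simple glob to a RegExp. Supports "**" (any depth), "*" (within
// one path segment) and "?" (one character). Real glob libraries handle more.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        pattern += "(?:.*/)?"; // "**/" matches zero or more directories
        i += 2;
      } else {
        pattern += ".*"; // bare "**" matches anything
        i += 1;
      }
    } else if (ch === "*") {
      pattern += "[^/]*";
    } else if (ch === "?") {
      pattern += "[^/]";
    } else {
      // Escape characters that are special in regular expressions.
      pattern += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Return the instruction files that apply to the file being edited.
function selectInstructions(activeFile: string, files: InstructionFile[]): string[] {
  return files
    .filter((file) => globToRegExp(file.applyTo).test(activeFile))
    .map((file) => file.path);
}
```

With one file scoped to `src/components/**/*.tsx` and another to `**/*.test.ts`, editing a component loads only the first and editing a test loads only the second, which is the proportionality the section describes.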

On-demand workflows add zero always-on overhead. Workflows, skills, and agent modes are loaded only when explicitly invoked. The evaluation workflows, project sync, shell change proposals, PR reviews, and module scaffolding workflows exist in the repository but consume no context budget until called.

Pruning is a first-class concern. When a category of rules grows large enough to warrant its own context, it is extracted into a dedicated contextual instruction file with an appropriate applyTo glob, and the corresponding rules are removed from the always-on file. The system gets more organised as it grows, not more bloated.

Extensibility and growth

Adding AI infrastructure is mechanical, not creative

Every type of AI infrastructure file — instruction, prompt, skill, agent — has a complete registration pattern documented in the AI infrastructure guide. The pattern is a checklist: add the file, update the three reference documents, update the propagation rules if applicable. No judgment calls about where things should be registered. No risk of forgetting a location. An AI agent can follow the pattern without ambiguity.

The module system scales without coupling

Optional features are gated behind boolean flags in a single configuration file using constant literal types. The bundler constant-folds these at build time. Combined with explicit side-effect declarations, disabled module code is fully eliminated from the production bundle — zero bytes shipped, no manual cleanup.

Every file that participates in a module checks its flag with a static guard at the top. No dynamic imports, no conditional requires, no runtime overhead. Adding a new module follows the same mechanical pattern: add a flag, create the files, gate every participant, verify build exclusion.
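The flag-and-guard pattern might look roughly like the following. This is a sketch only: the module names, `moduleFlags`, and the helper functions are hypothetical, and in the shell the guard is described as sitting at the top of each participating file, not inside a function.

```typescript
// Hypothetical module configuration. `as const` gives each flag a literal
// type (`true` or `false`) instead of the wider `boolean`.
const moduleFlags = {
  blog: true,
  newsletter: false,
} as const;

type ModuleName = keyof typeof moduleFlags;

// Static guard for code that belongs to the "newsletter" module. Because the
// flag is a constant literal, a bundler can fold the condition at build time
// and drop the dead branch when the flag is false.
function newsletterSignupLabel(): string | null {
  if (!moduleFlags.newsletter) return null;
  return "Subscribe to the newsletter";
}

// List the modules that are currently switched on.
function enabledModules(): ModuleName[] {
  return (Object.keys(moduleFlags) as ModuleName[]).filter(
    (moduleName) => moduleFlags[moduleName],
  );
}
```

Enabling a module is then a one-word change in the configuration object. No file that participates in the module needs to be edited.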

The self-evaluation loop catches its own degradation

If a new evaluation workflow is added but not registered in the end-to-end audit's reading list, the reference update — which runs on the same change — adds it. If the update is missed, the end-to-end audit detects the gap on its next run. If a new AI infrastructure directory type is introduced, the infrastructure directory coverage dimension flags it.

The system does not rely on contributors remembering to update documentation. It relies on structural guarantees that surface violations mechanically.

Forward scalability

The governance logic audit includes a forward scalability dimension that evaluates whether the governance logic remains coherent as the system grows:

  • Does the update rule set scale linearly without increasing the risk of undetected multi-hop chains?
  • Are registration patterns mechanical enough for AI agents to follow without ambiguity?
  • Is the reference documentation set explicitly bounded, preventing uncontrolled growth?
  • Would the self-evaluation loop detect its own degradation?

These are not retrospective checks. They are forward-looking structural audits that ensure the system's growth path is sound.

Governance rules guarding key audit concerns

The self-evaluation loop explicitly covers the structural threats that matter most as the system scales:

Concern | Where it is audited | What it prevents
Bidirectional / multi-hop update chains | Governance logic audit: chain detection | A → B → C → A infinite loops in the update rules
Bounded propagation enforcement | Governance logic audit: termination verification | Update rules that lack explicit stop language at loop entry points
Zombie blind spots | End-to-end audit: infrastructure directory coverage | AI infrastructure directories not covered by the registration scan
System coherence at scale | Governance logic audit: forward scalability | Registration patterns that require judgment, unbounded reference documentation growth
Unidirectional flow | Governance logic audit: unidirectional flow | Reference documents cascading back to the instruction layer

Production readiness

The system is production-ready for plug-and-play use:

  • No unresolvable loops. Bounded update propagation prevents infinite chains. Unidirectional flow eliminates circular dependencies between layers. Multi-hop chain detection traces the full update graph for cycles.
  • No blind spots. Unregistered file detection covers all AI infrastructure directories. Every registered file is verified against its actual existence on disk. Every evaluation workflow's reading list is checked for completeness.
  • Unidirectional by design. Information flows from instruction layer to reference documentation, never the reverse. Single-hop updates with explicit termination at every bidirectional relationship.
  • Autonomously self-auditing. Evaluation workflows cover every layer of the shell — project, AI, infrastructure, the evaluation loop itself, and governance logic correctness. The end-to-end audit covers the auditors.
  • Extensible. Registration patterns make adding new AI infrastructure files mechanical. The module system makes adding new features a flag flip. Skills automate multi-step scaffolding workflows.
  • Agent-proof. All rules use outcome language, not intent language. Every update rule has an explicit stop condition. Every evaluation dimension produces concrete, verifiable findings — not subjective assessments.

The engineering foundation

The AI governance system sits on top of a production-ready Next.js engineering foundation that provides the full development lifecycle out of the box.

Stack

Layer | Choice
Framework | Next.js 15 (App Router)
Language | TypeScript (strict mode + additional safety flags)
Styling | Tailwind CSS v4
Validation | Zod (environment variables, API boundaries, external data)
Unit / integration tests | Vitest + Testing Library
E2E tests | Playwright
Linting | ESLint v9 (flat config) with accessibility rules
Formatting | Prettier with Tailwind class sorting
Commit enforcement | Commitlint + conventional commits
Git hooks | Husky + lint-staged
Package manager | pnpm (engine-strict)
CI | GitHub Actions
Releases | release-please (automated versioning and changelogs)
Dependency management | Dependabot (weekly, grouped)

Production hardening

Every security and reliability concern is addressed before the first line of product code:

  • Security headers — CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy applied to every route via next.config.ts
  • Environment validation — all env vars Zod-validated at startup; direct process.env access blocked by ESLint across all source files
  • Error handling — a single safe serialisation path for API error responses; returning raw error messages or internal details to clients is structurally prevented
  • Logging seam — all logging goes through a central logger module, which is the designated swap point for any vendor SDK per project
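The error handling bullet above describes a single safe serialisation path. A minimal sketch of that idea follows, assuming a dedicated error class marks failures that are safe to describe to clients; `PublicError`, `toSafeErrorResponse`, and the response shape are hypothetical, not the shell's actual API.

```typescript
// Hypothetical error type for failures that are safe to describe to clients.
class PublicError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(status: number, code: string, message: string) {
    super(message);
    // Keeps `instanceof` working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, PublicError.prototype);
    this.name = "PublicError";
    this.status = status;
    this.code = code;
  }
}

type SafeErrorBody = { error: { code: string; message: string } };

// The one place where an unknown failure becomes a client-facing payload.
// Anything that is not a PublicError collapses to a generic 500 body, so
// stack traces and internal messages never reach the response.
function toSafeErrorResponse(err: unknown): { status: number; body: SafeErrorBody } {
  if (err instanceof PublicError) {
    return {
      status: err.status,
      body: { error: { code: err.code, message: err.message } },
    };
  }
  return {
    status: 500,
    body: { error: { code: "internal_error", message: "Something went wrong." } },
  };
}
```

Route handlers would pass every caught value through this function. The design choice is an allow-list: details reach the client only when the thrown value was explicitly created as public.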

CI pipeline

Four CI jobs plus PR title validation as a separate workflow — nothing merges without passing every gate:

Job | What it enforces
Quality | typecheck → lint → format check → debug statement and secret pattern scan
Unit Tests | Vitest unit test suite with coverage
Build | Production build (depends on quality + unit tests passing)
E2E | Playwright smoke tests against the production build
PR title | Validates conventional commit format (separate workflow)

Code quality

TypeScript strict mode with noUncheckedIndexedAccess, noImplicitOverride, and forceConsistentCasingInFileNames. ESLint enforces no-explicit-any, no-non-null-assertion, consistent-type-imports, import/order, and accessibility rules at error level. Prettier auto-formats on save. Pre-commit hooks run lint and format checks on staged files. Commit messages are validated against conventional commit format.

SEO baseline

Metadata, OpenGraph, Twitter cards, robots.txt, and sitemap.xml are wired from day one — all driven by a single environment variable (NEXT_PUBLIC_APP_URL). A contextual instruction enforces a per-page metadata checklist on every AI interaction in the app directory.
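As an illustration of metadata driven by one base URL, the sketch below derives canonical, OpenGraph, and Twitter fields for a page. It uses a local type mirroring a subset of the Next.js `Metadata` fields so the block stays self-contained. `buildPageMetadata` and `PageInfo` are hypothetical names, and the `appUrl` argument is assumed to carry the validated NEXT_PUBLIC_APP_URL value.

```typescript
// Local stand-in for the subset of Next.js `Metadata` fields used here.
// In a real page the return type would be `Metadata` imported from "next".
type PageMetadata = {
  title: string;
  description: string;
  alternates: { canonical: string };
  openGraph: { title: string; description: string; url: string };
  twitter: { card: "summary_large_image"; title: string; description: string };
};

type PageInfo = { path: string; title: string; description: string };

// Build per-page metadata from a single base URL.
function buildPageMetadata(appUrl: string, page: PageInfo): PageMetadata {
  const base = appUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = page.path.startsWith("/") ? page.path : `/${page.path}`;
  const url = `${base}${path}`;
  return {
    title: page.title,
    description: page.description,
    alternates: { canonical: url },
    openGraph: { title: page.title, description: page.description, url },
    twitter: {
      card: "summary_large_image",
      title: page.title,
      description: page.description,
    },
  };
}
```

Because every absolute URL is derived from the one argument, changing the deployment domain requires no edits to individual pages.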

Optional module system

Pre-built features controlled by boolean flags. When a module is off, its code is statically eliminated from the production bundle — zero bytes shipped, no runtime overhead. When it is on, everything wires up automatically. The pattern scales to any number of modules with no additional build configuration per feature.

How to use it

  1. Clone the shell into a new project
  2. Write a product brief describing the project scope and goals
  3. Run project sync: it reads the brief, interviews for missing sections, and generates a phased execution plan
  4. Enable modules as needed; disabled modules ship zero bytes
  5. Build: infrastructure, quality gates, AI governance, and CI are already in place
Figure: Five steps from clone to a fully equipped, production-ready engineering foundation (Clone → Write brief → Project sync → Enable modules → Build).

On subsequent runs, the project sync reconciles the plan against an updated brief. It is the single lifecycle prompt for taking a project from blank brief to evolving execution plan.

Beyond a copilot

Most AI-assisted development today follows the same model: a general-purpose assistant sitting in a chat window, answering questions and generating code on demand. The developer prompts, the AI responds, the developer reviews and pastes. Every session starts from scratch. Every convention must be re-explained. Every architectural decision must be re-stated. The AI has no memory of what this project is, how it is structured, or what rules it operates under.

That is a copilot. It is useful. It is also fundamentally stateless.

Next Shell is a different model entirely.

The problem with general-purpose assistants

A general-purpose AI assistant knows how to write React components. It does not know that your components must live in named folders with a specific suffix. It does not know that your error responses must pass through a dedicated serialisation function. It does not know that your environment variables are validated through Zod and cannot be accessed directly. It does not know that your imports follow a strict dependency direction, that your tests follow a specific philosophy, or that your SEO metadata has a per-page checklist.

Every time a developer opens a chat and asks for help, they must either:

  • Re-explain all of this — wasting time, introducing inconsistency, hoping they remember every rule
  • Skip the explanation — and get code that compiles but violates the project's conventions

Both outcomes degrade quality. The first is slow. The second is silent technical debt.

What structured governance changes

Next Shell eliminates this problem by embedding the rules in the repository itself. The AI does not need to be told — it reads the rules automatically when it opens a file.

Capability | General-purpose copilot | Structured governance
Convention awareness | None: starts from generic best practices | Full: every rule is loaded contextually
Rule activation | Manual: developer must prompt every rule | Automatic: rules activate based on the file being edited
Context efficiency | Flat: all rules or no rules | Layered: only relevant rules are loaded
Drift detection | None: conventions erode silently | Built-in: evaluation workflows catch drift
Onboarding cost | High: every session requires re-prompting | Zero: the AI is configured by the repository
Consistency across agents | Impossible: each developer prompts differently | Guaranteed: rules are structural, not conversational
Self-correction | None: errors compound until caught | Systematic: audit workflows verify correctness

General-purpose copilot: Developer opens chat → Re-explain conventions, rules, patterns → AI generates code → Manual review for convention drift → Session ends, context lost

Structured governance: Developer opens file → Rules load automatically from repository → AI generates convention-compliant code → Evaluation workflows verify correctness → Rules persist in repo, zero re-prompting

Figure: Session lifecycle, stateless copilot vs. repository-embedded governance

The structural advantage

The difference is not incremental. A copilot with better prompts is still a copilot — it still depends on the developer to provide context, enforce rules, and catch drift. Structured governance removes the developer from that loop entirely. The rules are not suggestions in a README. They are executable constraints that the AI picks up, follows, and is audited against.

This is the foundation that makes everything else possible. When the AI reliably understands and follows the project's engineering standards without being told, it stops being an assistant and starts being a team member — one that never forgets a convention, never drifts from the architecture, and never needs onboarding.

Summary

Next Shell delivers a production-ready engineering foundation with an embedded AI governance system that is:

  • Self-maintaining — autonomous evaluation workflows audit every layer, including the evaluation infrastructure itself
  • Loop-free — unidirectional flow, bounded update propagation, and single-hop design eliminate circular dependencies at the structural level
  • Scalable — contextual instructions keep the context budget lean; registration patterns make growth mechanical; forward scalability audits verify coherence as the system expands
  • Agent-proof — outcome language, explicit stop conditions, and verifiable evaluation dimensions ensure any AI agent can operate within the system correctly
  • Production-ready — security headers, env validation, error handling, logging seams, CI gates, and automated releases are in place before the first line of product code

The system is designed to grow. Every addition follows a defined registration pattern. Every structural guarantee is audited. Every potential failure mode — zombie files, update loops, blind spots, stale documentation — has a detection mechanism. The result is an engineering foundation that maintains its own coherence as it scales, with minimal human intervention.
