White Paper

Next Shell

An autonomous, self-maintaining AI governance system embedded in a production-ready Next.js engineering foundation.

Table of contents

  1. What this is
  2. The autonomous AI governance system
  3. Protection mechanisms
  4. Cognitive load management
  5. Extensibility and growth
  6. Production readiness
  7. The engineering foundation
  8. How to use it
  9. Beyond a copilot
  10. Summary

What this is

Next Shell is not a boilerplate. It is a reusable engineering foundation designed to be cloned as the base for many future products. Every convention, quality gate, and AI governance rule ships with the repository and activates automatically — no setup, no prompting, no onboarding.

What makes it distinct: the AI layer is not a hints file stapled onto a starter template. It is a structured governance system with defined layers, one-directional information flow, invariant protections, cascade termination guarantees, and a self-evaluation loop that audits its own correctness. The system is designed to grow autonomously with minimal to no human intervention while maintaining logical coherence at scale.

The autonomous AI governance system

Architecture

The AI layer is organised into four logical layers. Each layer has a distinct activation model, a defined scope, and a strict dependency direction.

| Layer | What it contains | When it activates |
| --- | --- | --- |
| Always-on instructions | Workspace-level coding rules, conventions, constraints | Injected into every AI interaction automatically |
| Contextual instructions | Specialised rules scoped to file types and directories | Loaded only when the active file matches — context budget stays lean |
| On-demand workflows | Evaluation workflows, skills, and agent modes | Loaded explicitly when invoked — zero always-on overhead |
| Reference documentation | Authoritative descriptions of what exists | Describes the system; audited by evaluation workflows |

Information flows in one direction only: the instruction layers update the reference documentation, never the reverse. This unidirectional design is the foundation of the system's scalability — it eliminates the class of bugs where updating documentation triggers instruction changes which trigger documentation changes, ad infinitum.

Governance reference documents

Three reference documents must always be mutually consistent:

  • AI file registry — authoritative description of every AI infrastructure file
  • Capability index — enumeration of every active governance capability
  • Evaluation reading list — the mandatory file set every audit workflow must load

When any AI infrastructure file is added, removed, or renamed, all three are updated in the same change. Any divergence between them is caught the next time an evaluation workflow runs.

Single update authority

One dedicated location is the only place in the entire system that enumerates update targets. When any instruction, prompt, skill, or agent is added, the AI has exactly one location to consult for which files need updating. No scattered rules. No redundant target lists that inevitably drift. When the reference documentation changes structure, only this one location needs modification, and every downstream rule stays correct.

Unidirectional documentation flow

Updates flow in one direction only: the reference documentation receives updates but never sends instructions back to the instruction layer. This is not a convention — it is a structural guarantee. Mutual dependency between documentation and instruction layers would create an unresolvable loop. Making the guarantee structural prevents that entire class of failure.

Bounded update propagation

When a file changes, every related file is checked for consistency and updated if needed. Propagation is bounded by design — it covers direct dependencies and terminates, never cascading into further rounds of updates. This guarantees that every change converges quickly and no update cycle can run unbounded.
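The single-hop rule described above can be sketched in a few lines. This is an illustrative model only — the file names and the `ruleTargets` map are assumptions, not the shell's actual identifiers:

```typescript
// Hypothetical update-rule map: a changed file points at the files that
// must be checked in the same change. Names are illustrative only.
const ruleTargets: Record<string, string[]> = {
  "instructions/core.md": ["docs/registry.md", "docs/capability-index.md"],
  "docs/registry.md": ["docs/reading-list.md"],
};

// Returns the files to update for one change. Propagation is single-hop
// by design: the targets' own targets are NOT followed, so a chain like
// A -> B -> C can never loop back into A.
function propagate(changed: string): string[] {
  return ruleTargets[changed] ?? [];
}

// Changing core.md touches its two direct targets and stops there —
// registry.md's own target (reading-list.md) is not re-entered.
const direct = propagate("instructions/core.md");
```

Because `propagate` never recurses, termination is a property of the code shape, not of agent discipline.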

Self-evaluation loop

Evaluation workflows autonomously audit every layer of the shell. They operate as isolated diagnostic systems — assessing correctness across defined dimensions without ever modifying codebase files. Each workflow publishes structured diagnostic findings to a dedicated, owned section of the evaluation report: a workflow replaces only its own section, so the report always shows when each layer was last evaluated.

The evaluation workflows cover:

| Scope | What it audits |
| --- | --- |
| Project layer | Full shell: architecture, conventions, AI setup, documentation accuracy, module system, TypeScript, security, testing, growth readiness |
| AI layer | Instruction files, prompts, skills, sync drift between reference docs, stale cross-references, contradictions, coverage gaps |
| Infrastructure layer | CI/CD, env config, release automation, deployment configuration, logging setup |
| End-to-end | The evaluation infrastructure itself — reading list completeness, update coverage, unregistered file detection, infrastructure directory coverage |
| Governance logic | Update rule self-reference, multi-hop update chain detection, bounded propagation enforcement, unidirectional flow adherence, authority hierarchy, forward scalability coherence |

Loop safety: no evaluation workflow can invoke another. No circular execution is possible. The end-to-end audit reads the other workflows as data — it cannot cause them to run.

Unregistered file detection

Any AI infrastructure file not registered in the reference documentation has no clear ownership and is invisible to evaluation workflows. It can silently give wrong instructions to any agent that loads it. The end-to-end audit scans all AI infrastructure directories and detects unregistered files before they can accumulate into governance debt.

Protection mechanisms

The system's integrity rests on three architectural protections that work together:

1. Unidirectional flow — Updates flow in one direction only. Reference documents receive updates from the instruction layer but never send instructions back. This eliminates the entire class of circular dependency bugs at the structural level.

2. Bounded update propagation — Even where update rules contain bidirectional relationships, updates are single-hop by design. An agent follows the rule for the file it changed, updates the listed targets, and stops. It never re-enters the update rules from a target it just updated. This prevents A → B → C → A chains from becoming infinite loops.

3. Self-auditing loop — The evaluation workflows operate as isolated diagnostic systems — section-owned, non-mutating, and architecturally unable to trigger each other. They cover every layer of the shell including the evaluation infrastructure itself. The end-to-end audit covers the auditors — verifying that every evaluation workflow covers every file relevant to its scope and that no AI file exists without being registered in the reference documentation.

Verified system health

| Property | Status | What it means |
| --- | --- | --- |
| Unidirectional flow | Healthy | Reference docs receive updates from the instruction layer but never cascade instructions back |
| Single update authority | Healthy | Exactly one location enumerates which files need updating when AI infrastructure changes |
| Bounded propagation | Healthy | Every bidirectional relationship carries explicit stop language |
| Evaluation loop safety | Healthy | Isolated diagnostic systems, section ownership, no circular triggers |
| Reference documentation consistency | Healthy | All three reference documents stay mutually consistent |
| Registration patterns | Healthy | Complete checklists for each AI infrastructure file type |
| Sync pair | Healthy | A dedicated sync rule enforces identical content across both workspace-level instruction files |
| applyTo accuracy | Healthy | All contextual instruction globs match their intended file sets |
| Zombie detection coverage | Healthy | All AI infrastructure directory types are scanned |
| Multi-hop chain detection | Healthy | Structural integrity audit traces the full update graph for cycles |

Cognitive load management

The system is designed to scale without overwhelming the AI's context window or the contributor's mental model.

Always-on context stays lean. The always-on workspace instructions carry the core coding rules — the essentials that apply everywhere. Everything else is scoped. A soft ceiling is enforced: if the always-on rule count grows without consolidation, a contextual instruction activates to flag it.

Contextual instructions activate only when relevant. Each .instructions.md file declares an applyTo glob. The AI loads it only when the active file matches. Specialised rules for components, pages, tests, SEO, environment variables, and contributing discipline activate at the right moment rather than inflating every interaction.
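As a concrete illustration, a contextual instruction file typically pairs an `applyTo` glob with the rules it scopes. The glob and rule text below are hypothetical examples, not the shell's actual files:

```markdown
---
applyTo: "src/components/**/*.tsx"
---

# Component rules (illustrative)

- Components live in named folders; one component per folder.
- No direct `process.env` access; read config through the validated env module.
- Every interactive element carries an accessible name.
```

The AI loads this file only while editing a matching `.tsx` component — the rules cost nothing in any other context.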

On-demand workflows add zero always-on overhead. Workflows, skills, and agent modes are loaded only when explicitly invoked. The evaluation workflows, project sync, shell change proposals, PR reviews, and module scaffolding workflows exist in the repository but consume no context budget until called.

Pruning is a first-class concern. When a category of rules grows large enough to warrant its own context, it is extracted into a dedicated contextual instruction file with an appropriate applyTo glob, and the corresponding rules are removed from the always-on file. The system gets more organised as it grows, not more bloated.

Extensibility and growth

Adding AI infrastructure is mechanical, not creative

Every type of AI infrastructure file — instruction, prompt, skill, agent — has a complete registration pattern documented in the AI infrastructure guide. The pattern is a checklist: add the file, update the three reference documents, update the propagation rules if applicable. No judgment calls about where things should be registered. No risk of forgetting a location. An AI agent can follow the pattern without ambiguity.

The module system scales without coupling

Optional features are gated behind boolean flags in a single configuration file using constant literal types. The bundler constant-folds these at build time. Combined with explicit side-effect declarations, disabled module code is fully eliminated from the production bundle — zero bytes shipped, no manual cleanup.

Every file that participates in a module checks its flag with a static guard at the top. No dynamic imports, no conditional requires, no runtime overhead. Adding a new module follows the same mechanical pattern: add a flag, create the files, gate every participant, verify build exclusion.
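A minimal sketch of the flag-plus-static-guard pattern, with assumed flag and function names (not the shell's actual identifiers):

```typescript
// Illustrative module flag config. `as const` narrows each flag to a
// literal type, which is what lets a bundler constant-fold the checks.
export const modules = {
  analytics: false,
  newsletter: true,
} as const;

// Every participating file guards itself with a static check at the top.
// With `modules.analytics` fixed to the literal `false`, the body below
// is provably dead code and can be dropped from the production bundle.
export function trackPageView(path: string): void {
  if (!modules.analytics) return; // static guard: module disabled
  console.log(`tracked ${path}`); // vendor call would live here
}
```

Flipping `analytics` to `true` re-enables the body with no other change — the guard is the entire wiring.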

The self-evaluation loop catches its own degradation

If a new evaluation workflow is added but not registered in the end-to-end audit's reading list, the reference update — which runs on the same change — adds it. If the update is missed, the end-to-end audit detects the gap on its next run. If a new AI infrastructure directory type is introduced, the infrastructure directory coverage dimension flags it.

The system does not rely on contributors remembering to update documentation. It relies on structural guarantees that surface violations mechanically.

Forward scalability

The governance logic audit includes a forward scalability dimension that evaluates whether the governance logic remains coherent as the system grows:

  • Does the update rule set scale linearly without increasing the risk of undetected multi-hop chains?
  • Are registration patterns mechanical enough for AI agents to follow without ambiguity?
  • Is the reference documentation set explicitly bounded, preventing uncontrolled growth?
  • Would the self-evaluation loop detect its own degradation?

These are not retrospective checks. They are forward-looking structural audits that ensure the system's growth path is sound.

Governance rules guarding key audit concerns

The self-evaluation loop explicitly covers the structural threats that matter most as the system scales:

| Concern | Where it is audited | What it prevents |
| --- | --- | --- |
| Bidirectional / multi-hop update chains | Governance logic audit — chain detection | A → B → C → A infinite loops in the update rules |
| Bounded propagation enforcement | Governance logic audit — termination verification | Update rules that lack explicit stop language at loop entry points |
| Zombie blind spots | End-to-end audit — infrastructure directory coverage | AI infrastructure directories not covered by the registration scan |
| System coherence at scale | Governance logic audit — forward scalability | Registration patterns that require judgment, unbounded reference documentation growth |
| Unidirectional flow | Governance logic audit — unidirectional flow | Reference documents cascading back to the instruction layer |

Production readiness

The system is production-ready and built for plug-and-play use:

  • No unresolvable loops. Bounded update propagation prevents infinite chains. Unidirectional flow eliminates circular dependencies between layers. Multi-hop chain detection traces the full update graph for cycles.
  • No blind spots. Unregistered file detection covers all AI infrastructure directories. Every registered file is verified against its actual existence on disk. Every evaluation workflow's reading list is checked for completeness.
  • Unidirectional by design. Information flows from instruction layer to reference documentation, never the reverse. Single-hop updates with explicit termination at every bidirectional relationship.
  • Autonomously self-auditing. Evaluation workflows cover every layer of the shell — project, AI, infrastructure, the evaluation loop itself, and governance logic correctness. The end-to-end audit covers the auditors.
  • Extensible. Registration patterns make adding new AI infrastructure files mechanical. The module system makes adding new features a flag flip. Skills automate multi-step scaffolding workflows.
  • Agent-proof. All rules use outcome language, not intent language. Every update rule has an explicit stop condition. Every evaluation dimension produces concrete, verifiable findings — not subjective assessments.

The engineering foundation

The AI governance system sits on top of a production-ready Next.js engineering foundation that provides the full development lifecycle out of the box.

Stack

| Layer | Choice |
| --- | --- |
| Framework | Next.js 15 (App Router) |
| Language | TypeScript (strict mode + additional safety flags) |
| Styling | Tailwind CSS v4 |
| Validation | Zod (environment variables, API boundaries, external data) |
| Unit / integration tests | Vitest + Testing Library |
| E2E tests | Playwright |
| Linting | ESLint v9 (flat config) with accessibility rules |
| Formatting | Prettier with Tailwind class sorting |
| Commit enforcement | Commitlint + conventional commits |
| Git hooks | Husky + lint-staged |
| Package manager | pnpm (engine-strict) |
| CI | GitHub Actions |
| Releases | release-please (automated versioning and changelogs) |
| Dependency management | Dependabot (weekly, grouped) |

Production hardening

Every security and reliability concern is addressed before the first line of product code:

  • Security headers — CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy applied to every route via next.config.ts
  • Environment validation — all env vars Zod-validated at startup; direct process.env access blocked by ESLint across all source files
  • Error handling — a single safe serialisation path for API error responses; returning raw error messages or internal details to clients is structurally prevented
  • Logging seam — all logging goes through a central logger module, which is the designated per-project swap point for any vendor logging SDK
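The error-handling bullet above can be sketched as a single serialisation function. This is a hypothetical illustration of the pattern — the type and function names are assumptions, not the shell's actual API:

```typescript
// Hypothetical single safe serialisation path for API errors.
// Whatever was thrown, the client only ever sees this fixed shape.
type SafeErrorBody = { error: { code: string; message: string } };

function toSafeError(err: unknown): SafeErrorBody {
  // Internal details (stack traces, raw messages) are logged server-side
  // but never forwarded to the client.
  if (err instanceof Error) {
    console.error("internal error:", err.message);
  }
  return {
    error: { code: "INTERNAL_ERROR", message: "Something went wrong." },
  };
}
```

Because every handler funnels errors through this one function, leaking a raw message requires bypassing the seam — which lint rules and review make structurally visible.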

CI pipeline

Four CI jobs plus PR title validation as a separate workflow — nothing merges without passing every gate:

| Job | What it enforces |
| --- | --- |
| Quality | typecheck → lint → format check → debug statement and secret pattern scan |
| Unit Tests | Vitest unit test suite with coverage |
| Build | Production build (depends on quality + unit tests passing) |
| E2E | Playwright smoke tests against the production build |
| PR title | Validates conventional commit format (separate workflow) |
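A pipeline of this shape might look like the following GitHub Actions sketch. The job names and pnpm script names are assumptions — only the gate ordering reflects the table above:

```yaml
# Illustrative excerpt only — script names are assumed, not the shell's.
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm typecheck
      - run: pnpm lint
      - run: pnpm format:check
  build:
    needs: [quality, unit-tests] # build only runs once both gates pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm build
```

The `needs` keys encode the dependency stated in the table: a production build is never attempted against code that failed quality or unit-test gates.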

Code quality

TypeScript strict mode with noUncheckedIndexedAccess, noImplicitOverride, and forceConsistentCasingInFileNames. ESLint enforces no-explicit-any, no-non-null-assertion, consistent-type-imports, import/order, and accessibility rules at error level. Prettier auto-formats on save. Pre-commit hooks run lint and format checks on staged files. Commit messages are validated against conventional commit format.
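The compiler flags named above correspond to a tsconfig excerpt like this (an illustrative fragment, not the shell's full config):

```json
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true,
    "forceConsistentCasingInFileNames": true
  }
}
```

`noUncheckedIndexedAccess` is the notable addition beyond `strict`: every indexed read is typed as possibly `undefined`, forcing a check before use.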

SEO baseline

Metadata, OpenGraph, Twitter cards, robots.txt, and sitemap.xml are wired from day one — all driven by a single environment variable (NEXT_PUBLIC_APP_URL). A contextual instruction enforces a per-page metadata checklist on every AI interaction in the app directory.
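A sketch of how one environment variable can drive the metadata baseline, following the shape of the Next.js Metadata API. The titles and fallback URL are illustrative, and in the shell itself the value would flow through the validated env module rather than a direct `process.env` read (which ESLint blocks in source files):

```typescript
// Illustrative root metadata object; values are assumptions.
const appUrl = process.env.NEXT_PUBLIC_APP_URL ?? "http://localhost:3000";

export const metadata = {
  // OpenGraph, Twitter and sitemap URLs all resolve against this base,
  // so one env var configures the whole SEO surface.
  metadataBase: new URL(appUrl),
  title: { default: "Next Shell", template: "%s | Next Shell" },
  openGraph: { url: "/", siteName: "Next Shell" },
};
```

With `metadataBase` set once at the root, per-page metadata can use relative URLs and still produce absolute links in the rendered tags.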

Optional module system

Pre-built features controlled by boolean flags. When a module is off, it is automatically excluded from the production bundle — zero bytes shipped. When it is on, everything wires up automatically. The pattern scales to any number of modules with no additional configuration per feature.

How to use it

  1. Clone the shell into a new project
  2. Fill PROJECT.md with the product brief
  3. Run /helper.sync-project — it reads the brief, interviews for missing sections, and generates a phased execution plan
  4. Enable modules as needed — disabled modules ship zero bytes
  5. Build — infrastructure, quality gates, AI governance, and CI are already in place

On subsequent runs, /helper.sync-project reconciles the plan against an updated brief. It is the single lifecycle prompt for taking a project from blank brief to evolving execution plan.

Beyond a copilot

Most AI-assisted development today follows the same model: a general-purpose assistant sitting in a chat window, answering questions and generating code on demand. The developer prompts, the AI responds, the developer reviews and pastes. Every session starts from scratch. Every convention must be re-explained. Every architectural decision must be re-stated. The AI has no memory of what this project is, how it is structured, or what rules it operates under.

That is a copilot. It is useful. It is also fundamentally stateless.

Next Shell is a different model entirely.

The problem with general-purpose assistants

A general-purpose AI assistant knows how to write React components. It does not know that your components must live in named folders with a specific suffix. It does not know that your error responses must pass through a dedicated serialisation function. It does not know that your environment variables are validated through Zod and cannot be accessed directly. It does not know that your imports follow a strict dependency direction, that your tests follow a specific philosophy, or that your SEO metadata has a per-page checklist.

Every time a developer opens a chat and asks for help, they must either:

  • Re-explain all of this — wasting time, introducing inconsistency, hoping they remember every rule
  • Skip the explanation — and get code that compiles but violates the project's conventions

Both outcomes degrade quality. The first is slow. The second is silent technical debt.

What structured governance changes

Next Shell eliminates this problem by embedding the rules in the repository itself. The AI does not need to be told — it reads the rules automatically when it opens a file.

| Capability | General-purpose copilot | Structured governance |
| --- | --- | --- |
| Convention awareness | None — starts from generic best practices | Full — every rule is loaded contextually |
| Rule activation | Manual — developer must prompt every rule | Automatic — rules activate based on the file being edited |
| Context efficiency | Flat — all rules or no rules | Layered — only relevant rules are loaded |
| Drift detection | None — conventions erode silently | Built-in — evaluation workflows catch drift |
| Onboarding cost | High — every session requires re-prompting | Zero — the AI is configured by the repository |
| Consistency across agents | Impossible — each developer prompts differently | Guaranteed — rules are structural, not conversational |
| Self-correction | None — errors compound until caught | Systematic — audit workflows verify correctness |

The structural advantage

The difference is not incremental. A copilot with better prompts is still a copilot — it still depends on the developer to provide context, enforce rules, and catch drift. Structured governance removes the developer from that loop entirely. The rules are not suggestions in a README. They are executable constraints that the AI picks up, follows, and is audited against.

This is the foundation that makes everything else possible. When the AI reliably understands and follows the project's engineering standards without being told, it stops being an assistant and starts being a team member — one that never forgets a convention, never drifts from the architecture, and never needs onboarding.

Summary

Next Shell delivers a production-ready engineering foundation with an embedded AI governance system that is:

  • Self-maintaining — autonomous evaluation workflows audit every layer, including the evaluation infrastructure itself
  • Loop-free — unidirectional flow, bounded update propagation, and single-hop design eliminate circular dependencies at the structural level
  • Scalable — contextual instructions keep the context budget lean; registration patterns make growth mechanical; forward scalability audits verify coherence as the system expands
  • Agent-proof — outcome language, explicit stop conditions, and verifiable evaluation dimensions ensure any AI agent can operate within the system correctly
  • Production-ready — security headers, env validation, error handling, logging seams, CI gates, and automated releases are in place before the first line of product code

The system is designed to grow. Every addition follows a defined registration pattern. Every structural guarantee is audited. Every potential failure mode — zombie files, update loops, blind spots, stale documentation — has a detection mechanism. The result is an engineering foundation that maintains its own coherence as it scales, with minimal human intervention.

Curious where this is heading?

Read the roadmap →