Introduction: what AI changes (and what it doesn’t)
AI changes the cost of exploring options. It does not change who is accountable for the decision.
If you’re the architect, tech lead, or staff engineer, “AI‑assisted” is not a new accountability model; it’s a faster way to:
- Surface plausible architectures you might not have considered.
- Make tradeoffs explicit earlier (and with less friction).
- Leave a stronger evidence trail for future teams.
But it only works if you treat AI as a generator and critic, not an authority. The workflow below is designed to be:
- Tool‑agnostic (no vendor lock‑in).
- Enterprise‑safe (no secrets in prompts).
- Compatible with real constraints (compliance, integration, uptime, budgets).
TL;DR: quick start
If you only do three things:
- Assemble a Context Pack (constraints + current state + decision criteria).
- Generate 3–6 concrete options (at least one “boring” option), each with risks and a 2‑week validation experiment.
- Choose deliberately using a tradeoff matrix, then capture the key decisions in one or more ADRs with measurable fitness functions.
The full flow is: Context Pack → clarify → drivers → architecture characteristics → options + risks → tradeoffs + mitigations → transition architectures → ADRs + fitness/enforcement + feedback.
Scenario: modernizing policy/quote pricing
You’ve inherited a policy/quote pricing capability that grew over a decade:
- A monolith handles quoting, rating, underwriting rules, discounts, and taxes.
- Multiple channels depend on it: call center, partner portal, direct‑to‑consumer.
- Release cadence is slow, and pricing changes require “big bang” deployments.
- The business wants faster experimentation (A/B tests, segmented offers) without breaking regulatory guardrails.
Constraints (the ones that always show up in the real world):
- Regulatory/compliance: you must be able to explain how a price was calculated.
- Auditability: reproduce a quote months later.
- Availability: quoting is revenue critical; you need graceful degradation.
- Integration: downstream policy issuance and billing expect current interfaces.
- Performance: partners time out; call center productivity suffers.
- Scalability: quote/policy workloads spike (campaigns, renewals, partner traffic) and must scale predictably.
- Data sensitivity: prompts cannot include PII, secrets, or proprietary pricing tables.
You want to explore multiple viable target architectures (not just “microservices!”), choose deliberately, and write down why.
The “Context Pack”: minimum artifacts before asking AI for options
Option exploration fails when the prompt is missing the constraints that matter.
Before you ask AI to generate architectures, assemble a lightweight Context Pack:
- Domain + goals (success, non‑goals).
- Constraints (regulatory, SLOs, performance budgets, data handling).
- Enterprise decision constraints (approved stacks, platform rules, security model).
- Current state (simple diagrams, critical flows, known pain points).
- Decision criteria (which quality attributes decide the outcome, and how you’ll measure them).
If you want a copy/paste template, see the Appendix: Context Pack checklist.
Stop conditions: don’t proceed if any of these are true
- You cannot name the top 3 constraints that would invalidate an option.
- You cannot describe the most important runtime failure mode.
- You cannot explain what “auditability” means for this domain (what must be reproducible).
If any stop condition triggers, do not ask for architectures yet. Ask for missing artifacts.
The 6‑stage workflow
This is the workflow I use when I want AI to accelerate thinking without replacing judgment.
Stage 1 — Clarify (make constraints explicit)
Goal: turn fuzzy intent into a crisp problem statement and a bounded search space.
Inputs: your Context Pack.
Prompts that work well (tool‑agnostic and safe):
- “Summarize the problem in 5 bullets, then list 10 clarifying questions that would change the architecture choice.”
- “Restate the constraints as testable statements (e.g., ‘must reproduce quote for 7 years’).”
Outputs:
- A validated problem statement.
- A list of constraints you can point at in a decision review.
Guardrail: do not accept new constraints invented by the model. Every constraint must map to a known source (policy, stakeholder, incident, contract).
Stage 2 — From business drivers to architecture characteristics
Goal: use the key business drivers to reveal the driving architecture characteristics (explicit) and the implicit ones that will still constrain you.
This stage is where you turn “the business wants faster experimentation” into qualities you can actually design and test against.
Prompts that work well:
- “Extract the key business drivers from the Context Pack, then derive the driving architecture characteristics that follow from them.”
- “List implicit architecture characteristics that are not stated, but must be true for this domain (e.g., auditability, determinism, explainability). Mark each as: given vs inferred, and state the evidence.”
- “Rewrite each characteristic as a testable statement (SLO, retention rule, determinism requirement, operability constraint).”
Outputs:
- A short list of key business drivers.
- A ranked list of architecture characteristics, split into driving (the ones that decide the choice) vs implicit (still mandatory).
- A first‑pass measurement idea for each characteristic (even if it’s approximate).
Note: if you previously used “transition complexity” as a criterion, it’s usually better expressed as the architecture characteristic evolvability (how safely and incrementally the system can change over time).
Guardrail: if a characteristic is inferred, label it as inferred and require an evidence link (regulation, contract, incident history, stakeholder statement).
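Before moving on, it can help to see what Stage 2 output might look like for this scenario. A minimal sketch in Python (the statements and numbers are placeholders drawn from the scenario, to be validated, not requirements handed down):

```python
# Illustrative only: architecture characteristics rewritten as testable statements,
# with a first-pass measurement idea for each. Values are placeholders to validate.
CHARACTERISTICS = [
    {
        "characteristic": "Auditability",
        "kind": "implicit",  # inferred from regulation; needs an evidence link
        "testable_statement": "Any quote can be reproduced, with matching premium, for 7 years",
        "first_pass_measurement": "replay a sample of historical quotes on every release",
    },
    {
        "characteristic": "Performance",
        "kind": "driving",
        "testable_statement": "Partner-channel quote p95 <= 400 ms at peak campaign load",
        "first_pass_measurement": "load test against production-like rating data volumes",
    },
    {
        "characteristic": "Evolvability",
        "kind": "driving",
        "testable_statement": "A pricing rule change ships to production in <= 7 days",
        "first_pass_measurement": "lead time from change request to release, per change",
    },
]
```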
Stage 3 — Generate permutations (candidate styles → options → isomorphism → risks)
Goal: take the driving + implicit architecture characteristics and deliberately create optionality around candidate architecture styles that fit the problem domain.
- Select candidate architecture styles (or combinations)
Examples (pick what actually fits your domain and org):
- Modular monolith + façade (strangler)
- Layered / hexagonal with a rules boundary
- Domain services (coarse‑grained services, not “microservices by default”)
- Event‑driven / async integration for decoupling
- Event sourcing or append‑only audit log for replay/audit
- Create 3–6 concrete options
For each candidate style (or combination), ask for an option that includes:
- Core components and responsibilities
- Data ownership and audit/replay story
- Failure modes and degradation rules
- A first‑pass transition hypothesis from current state (incremental steps and major dependencies)
- Apply domain‑to‑architecture isomorphism
For each option, validate that the architecture shape matches the domain shape:
- Where do underwriting rules, pricing/rating, discounts, taxes, and approvals live?
- What is the “unit of audit” (quote, rating run, rule set version), and how is it reproduced?
- Align across the intersections (constraints that often kill options)
For each option, do a quick pass across architecture intersections that add real‑world constraints:
- Implementation (libraries, language/runtime constraints)
- Infrastructure (platform, runtime, deployment, scaling)
- Data topologies (stores, retention, lineage)
- Engineering practices (testing strategy, release controls)
- Team topologies (ownership boundaries, on‑call reality)
- Systems integration (contracts, sync/async, backwards compatibility)
- Business environment (regulatory approvals, audit workflows)
- Enterprise alignment (standards, shared platforms)
- Generative AI (what context is needed, what must not leave the boundary)
Where permitted, MCP Servers can be a practical way to safely access internal enterprise and business‑unit decision documents (standards, policies, platform guidance) as you evaluate these intersections. This helps you distinguish “violates a hard constraint” from “needs an exception” early, before you invest in deeper analysis.
- Perform a first‑pass risk analysis per option
Have AI enumerate:
- Top risks (technical, operational, organizational)
- Assumptions that must be true
- The cheapest experiment that would validate/invalidate the option (≤ 2 weeks)
Stage 4 — Explore tradeoffs (scoring + risk comparison + mitigations)
Goal: make the decision legible. You’re not optimizing for a perfect score; you’re optimizing for a decision you can defend.
Tradeoff matrix template (headings)
| Criterion | Weight (1–5) | Option A | Option B | Option C | Notes / evidence |
|---|---|---|---|---|---|
| Auditability / replay | 5 | | | | What must be reproducible? |
| Performance (p95) | 4 | | | | Partner timeout budget; throughput assumptions |
| Scalability | 4 | | | | Peak load, bursts, and predictable degradation |
| Evolvability | 4 | | | | Incremental change, parallel run, cutovers, reversibility, “stop‑anytime” states |
| Operational complexity | 4 | | | | On‑call, deployments, runbooks |
| Change velocity | 3 | | | | Time to ship pricing change |
| Cost to operate | 3 | | | | Infra + people |
| Blast radius | 4 | | | | Failure containment |
| Integration risk | 4 | | | | Contracts, downstream dependencies |
| Security / data handling | 5 | | | | PII boundaries |
Scoring rubric (keep it honest)
Use a simple qualitative scale, but always include “why”:
- 5 = clearly meets the need with margin; known patterns; low uncertainty
- 3 = plausible but requires work or introduces notable risk
- 1 = likely fails the requirement or creates unacceptable risk
AI can help here by forcing completeness:
- “Fill out the tradeoff table. For each score, write the strongest argument against that score (steelman the critique).”
Risk comparison and mitigation prompts that work well:
- “Compare the risk profiles across options. Which risks are unique vs shared, and which are existential?”
- “For the top 3 risks in each option, propose mitigations and the earliest point in the migration where each mitigation must exist.”
This stage is where many “clever” architectures die in a good way.
A worked mini‑example (abridged)
To make the workflow more concrete, here’s what “3 options + a tradeoff slice” can look like.
Option A — Modular monolith + façade (strangler)
- Keep a single deployable for now, but carve out a pricing boundary (clear module APIs).
- Add a façade in front of the monolith to create a seam for incremental replacement.
- Add an append‑only audit log of “pricing runs” (inputs + rule version + outputs).
Option B — Domain services (coarse‑grained)
- Split into a small number of services aligned to stable domain boundaries (e.g., pricing, quote orchestration).
- Use contract tests and versioned APIs to protect channels and downstream systems.
- Centralize auditability via a shared audit store or per‑service append‑only logs.
Option C — Event‑driven integration + audit log
- Keep synchronous quote calls where latency demands it, but decouple downstream effects via events.
- Use an append‑only audit log for replay/reproduction.
- Avoid “event sourcing everywhere” unless the org is ready for the operational complexity.
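Options A and C both lean on an append‑only log of “pricing runs”, so it’s worth sketching what one record might capture before you score anything; it forces the “unit of audit” question from Stage 3. A minimal sketch (field names are illustrative, not a schema, and `price_fn` stands in for whichever pricing engine the option provides):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class PricingRun:
    """One append-only audit record: everything needed to replay a quote."""
    quote_id: str
    rule_set_version: str   # exact version of the underwriting/pricing rules applied
    rating_inputs: dict     # normalized inputs (reference IDs only, no raw PII)
    rating_outputs: dict    # premium breakdown: base, discounts, taxes
    engine_version: str     # build of the pricing engine that produced the result
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def replay(run: PricingRun, price_fn) -> bool:
    """Recompute the quote from stored inputs and compare against stored outputs."""
    recomputed = price_fn(run.rating_inputs, run.rule_set_version)
    return recomputed == run.rating_outputs
```

If an option cannot populate a record like this, or cannot pin `rule_set_version`, that usually surfaces later as a weak auditability score.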
Tradeoff matrix (5‑criteria slice):
| Criterion | Weight (1–5) | A | B | C | Notes / evidence |
|---|---|---|---|---|---|
| Auditability / replay | 5 | 4 | 4 | 5 | C is strongest if audit log is first‑class |
| Performance (p95) | 4 | 5 | 4 | 4 | A is simplest path to hit latency budgets |
| Evolvability | 4 | 4 | 5 | 3 | B enables parallel evolution; C adds more moving parts |
| Operational complexity | 4 | 3 | 3 | 1 | C risks overload without on‑call maturity |
| Integration risk | 4 | 4 | 3 | 3 | A preserves existing contracts; B/C require more interface work |
The goal is not to “win the table”. The goal is to make the decisive tradeoffs legible and testable.
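If you want a quick sanity check of the weighted totals (and a place to argue about weights rather than vibes), a minimal sketch using the numbers from the slice above:

```python
# Weighted-score sanity check for the 5-criteria slice above.
# Weights and scores come straight from the table; only the arithmetic is added.
WEIGHTS = {
    "Auditability / replay": 5,
    "Performance (p95)": 4,
    "Evolvability": 4,
    "Operational complexity": 4,
    "Integration risk": 4,
}

SCORES = {
    "A": {"Auditability / replay": 4, "Performance (p95)": 5, "Evolvability": 4,
          "Operational complexity": 3, "Integration risk": 4},
    "B": {"Auditability / replay": 4, "Performance (p95)": 4, "Evolvability": 5,
          "Operational complexity": 3, "Integration risk": 3},
    "C": {"Auditability / replay": 5, "Performance (p95)": 4, "Evolvability": 3,
          "Operational complexity": 1, "Integration risk": 3},
}

def weighted_total(option: str) -> int:
    """Sum of weight x score across criteria for one option."""
    return sum(WEIGHTS[criterion] * score for criterion, score in SCORES[option].items())

for option in sorted(SCORES):
    print(f"Option {option}: {weighted_total(option)}")
```

On this slice the totals come out A = 84, B = 80, C = 69: close enough that the decision hinges on the Notes column and the risk analysis, not the arithmetic.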
Stage 5 — Explore transition architectures (current → target, incrementally)
Goal: now that you have a current state and a target direction, explore incremental transition architectures (stepping‑stone states) that keep the system evolvable and safe at every stage.
This is distinct from picking a target. It’s about proving you can get there without a big bang, and that each intermediate state is operationally viable if you need to pause (a “stop‑anytime” state).
Transition architecture prompts that work well:
- “Given the current architecture and the chosen target direction, propose 3 incremental transition paths. Each path must define stepping‑stone architectures, cutover points, and rollback strategy.”
- “For each transition step, list: what changes, what stays stable, how contracts remain compatible, and what the system looks like if you stop here for 6 months.”
- “Identify the key seams for strangling (facades, anti‑corruption layers, event capture, parallel run) and the fitness functions that must hold at each step.”
Outputs:
- 2–3 viable transition architectures (stepping‑stone states), not just a single migration plan.
- The “stop‑anytime” architecture for each step (so the design stays open to evolution).
- A small set of per‑step fitness functions (what must not regress while you transition).
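The “parallel run” seam mentioned in the prompts above is usually the first stepping‑stone worth proving, because it gives you a stop‑anytime state by construction. A minimal façade sketch, with stand‑in functions for the legacy engine and the candidate replacement (both names are hypothetical, not real services):

```python
import logging

log = logging.getLogger("pricing-facade")

def legacy_engine_price(request: dict) -> dict:
    """Stand-in for the existing monolith's pricing call."""
    return {"premium": 100.0}

def candidate_service_price(request: dict) -> dict:
    """Stand-in for the new pricing service being strangled in."""
    return {"premium": 100.0}

def quote(request: dict, *, shadow_enabled: bool = False) -> dict:
    """Serve quotes from the legacy engine; optionally shadow-call the candidate.

    The legacy result is always returned, so the stop-anytime state is simply
    'turn the flag off'. Mismatches are logged for offline comparison.
    """
    legacy_result = legacy_engine_price(request)
    if shadow_enabled:
        try:
            candidate = candidate_service_price(request)
            if candidate["premium"] != legacy_result["premium"]:
                log.warning("pricing mismatch for quote %s", request.get("quote_id"))
        except Exception:
            # The candidate path must never affect the customer-facing quote.
            log.exception("shadow pricing call failed")
    return legacy_result

print(quote({"quote_id": "Q-123"}, shadow_enabled=True))
```

A per‑step fitness function here could be “mismatch rate below an agreed threshold over a week of shadow traffic” before any cutover.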
Stage 6 — Home in on the target architecture (NFRs, fitness/enforcement, ADRs, feedback)
Goal: converge on a target architecture direction, then capture the decision and the quality gates that will keep it true over time.
ADRs: expect more than one
Option exploration typically produces multiple ADRs. At minimum you’ll usually want:
- One ADR for the target architecture direction.
- One ADR for the transition strategy (stepping‑stone states, cutovers, rollback).
- (Often) one ADR per cross‑cutting decision that changes constraints (data/audit approach, integration patterns, deployment model, etc.).
ADR outline (tailored to option exploration)
Title: Decision on pricing architecture direction (Option X)
- Status: Proposed / Accepted / Superseded
- Context: key constraints, drivers, and what forced the decision
- Decision: what we chose and the boundaries (what’s in/out)
- Options considered: short summaries of the top alternatives
- Tradeoffs: the 3–5 decisive tradeoffs (with evidence links)
- Consequences: what gets harder, what we must invest in (ops, tooling, training)
- Migration plan: incremental steps, cutover strategy, rollback strategy
- Open questions: explicitly tracked unknowns
Fitness functions (measurable, not aspirational)
Define a small set of fitness functions (measurable quality constraints that you check continuously) that must improve (or must not regress):
- Quote performance p95 ≤ 400ms for partner channel
- Quote throughput scales to peak volume within the cost envelope
- “Reproduce quote” workflow completes with matching premium within tolerance
- Pricing change lead time ≤ 7 days (from request → production)
- Error budget policy and degradation behavior defined and tested
AI is helpful here to convert words into tests:
- “Rewrite these quality attributes as measurable fitness functions with owners and measurement frequency.”
Where you can, also define lightweight enforcement functions (the checks that make the fitness functions real), such as:
- CI checks for contract compatibility, performance budgets, and security scanning
- Runtime checks/dashboards for SLOs and replay/audit workflows
- Release controls for pricing rule changes (approval workflow, versioning, rollback)
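As a sketch of how one fitness function becomes an enforcement function, here is a minimal CI gate for the partner‑channel latency budget. It assumes your pipeline can export a p95 figure to a JSON file; the file name and shape are illustrative:

```python
import json
import sys

P95_BUDGET_MS = 400  # partner-channel budget from the fitness functions above

def check_latency_budget(report_path: str = "perf-report.json") -> int:
    """Return a non-zero exit code if the measured p95 exceeds the budget."""
    with open(report_path) as f:
        report = json.load(f)  # e.g. {"quote_p95_ms": 372}
    p95 = report["quote_p95_ms"]
    if p95 > P95_BUDGET_MS:
        print(f"FAIL: quote p95 {p95} ms exceeds budget {P95_BUDGET_MS} ms")
        return 1
    print(f"OK: quote p95 {p95} ms within budget {P95_BUDGET_MS} ms")
    return 0

if __name__ == "__main__":
    sys.exit(check_latency_budget())
```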
Finally, establish a feedback loop:
- Decide what telemetry and operational signals will cause you to revisit the ADR(s).
- Define who reviews it, how often, and what triggers escalation (error budget breach, audit failure, unacceptable performance).
Failure modes and guardrails
AI‑assisted option exploration fails in predictable ways.
Failure mode: novelty bias
The model prefers interesting architectures (event sourcing everywhere!) over boring ones that fit your org.
Guardrails:
- Require at least one “boring option” (e.g., modular monolith + façade).
- Give operational complexity a high weight in the tradeoff matrix.
Failure mode: hallucinated constraints
The model invents rules you didn’t state (“must be multi‑region active‑active”).
Guardrails:
- Every constraint in the output must trace to the Context Pack.
- Have a human reviewer mark each constraint as “given” vs “assumed”.
Failure mode: overconfidence in estimates
AI writes confident performance, cost, and migration timelines.
Guardrails:
- Treat all numbers as placeholders until measured.
- Force the model to provide ranges and uncertainty.
Failure mode: decision laundering
Teams hide behind “the AI said” to avoid hard conversations.
Guardrails:
- Write ADRs in first person plural: “we chose… because…”
- Include the strongest argument against the chosen option.
Guardrail: adversarial analysis (critique each stage)
Option exploration gets safer and more accurate when you deliberately critique the output of each stage using adversarial analysis. The goal is not to “win an argument” with the model; it’s to surface blind spots early.
How to do it:
- Use an alternative model (or at least a fresh chat) as a critic.
- Use multiple personas to force different failure modes to show up: security, architecture, engineering, operations, data, compliance/audit, enterprise platform.
- Require the critic to cite which input artifact/constraint each claim depends on (Context Pack vs assumption).
Prompt pack: see the Appendix: Adversarial review prompts.
To avoid endless cycles of improvement:
- Timebox adversarial reviews (e.g., 30–60 minutes per stage).
- Cap review rounds (e.g., max 2 critique iterations per stage).
- Use an explicit stop rule: only iterate if a critique surfaces (1) a violated constraint, (2) an unmitigated existential risk, or (3) a missing artifact required for governance.
- Track deltas: capture changes as a short list (“what changed and why”) so you don’t re-litigate the same points.
Adoption in enterprise reality
In large organizations, option exploration is as much about governance and evidence as it is about diagrams.
Use criticality tiers
Not every system needs the same rigor. Define tiers (example):
- Tier 0 (revenue/regulatory critical): full evidence bundle, ADR required, explicit fitness functions, architecture review.
- Tier 1 (important): tradeoff matrix + ADR light.
- Tier 2 (low risk): lightweight decision notes.
Progressive enforcement
Start with a small number of teams and make the workflow easy to adopt:
- Provide Context Pack and ADR templates.
- Offer a “review clinic” for the first 3–5 decisions.
- Require evidence bundles only for Tier 0 initially.
The evidence bundle
What reviewers actually need is not more narrative; it’s better artifacts:
- Context Pack (bounded and safe)
- Options (with diagrams)
- Transition architectures (stepping‑stone states and stop‑anytime designs)
- Tradeoff matrix (with weights)
- Risks + experiments
- ADR(s) + fitness functions
AI can accelerate drafting every piece, but humans must validate each claim.
Conclusion: faster exploration, better accountability
AI makes it cheaper to explore architecture options and harder to pretend we didn’t consider alternatives.
If you adopt the workflow above, you get faster exploration, defensible decisions, and executable plans.
That’s the arc I care about in this series: quality → context → execution → feedback. AI helps in every stage, but only if we keep accountability where it belongs.
Appendix: templates and prompt pack
Context Pack checklist (copy/paste)
Domain + goals
- One‑paragraph problem statement (what is broken, what success looks like)
- Top 3 business outcomes (e.g., weekly pricing changes, partner performance, audit readiness)
- Non‑goals (what you are explicitly not trying to solve right now)
Constraints
- Regulatory/compliance constraints (explainability, retention, approvals)
- Availability target (SLOs) and degradation rules
- Performance budget (p50/p95) and throughput expectations
- Scalability expectations (peak load, burst behavior, scaling limits, cost envelope)
- Data classification (what can/can’t leave the boundary)
- Budget and team constraints (people, skills, on‑call maturity)
Decision constraints (enterprise / business unit)
- Compute strategy / technology constraints (cloud/on‑prem, regions, runtimes, managed services)
- Security constraints (identity model, encryption standards, network segmentation, key management)
- Technology constraints (approved stacks, platforms, vendors, lifecycle/end‑of‑support rules)
- Patterns / practices constraints (integration patterns, deployment standards, SDLC controls)
- Quality constraints (availability tiering, performance budgets, DR/RTO/RPO, observability requirements)
If you have internal decision documents (enterprise architecture standards, platform playbooks, security policies), link them here.
Where permitted, MCP Servers can help you safely access these documents during option evaluation so your options and target architecture stay aligned with enterprise and business‑unit constraints.
Current state
- C4‑ish diagram(s): context + container (keep it simple)
- Critical flows: quote, re‑quote, bind, endorsement, renewal
- Known pain points and incident history (top 5)
- External dependencies (policy admin system, document generation, payments, CRM)
Decision criteria
- The quality attributes that decide outcomes (e.g., auditability, evolvability, performance, scalability)
- How you’ll measure them (even approximately)
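If the team prefers a structured, diffable artifact over prose, the same checklist can live as a small data file checked into the repo. A minimal sketch (keys mirror the checklist above; values are placeholders and links, never PII, secrets, or proprietary pricing tables):

```python
# context_pack.py - a structured Context Pack sketch. Keys mirror the checklist;
# values are placeholders and links, never PII, secrets, or pricing tables.
CONTEXT_PACK = {
    "domain_and_goals": {
        "problem_statement": "Pricing changes require big-bang releases; ...",
        "business_outcomes": ["weekly pricing changes", "partner performance", "audit readiness"],
        "non_goals": ["rewriting policy issuance"],
    },
    "constraints": {
        "regulatory": ["price calculation must be explainable", "must reproduce quote for 7 years"],
        "availability": {"slo": "<availability target>", "degradation": "<degradation rules>"},
        "performance": {"partner_quote_p95_ms": 400},
        "data_classification": "no PII or pricing tables leave the boundary",
    },
    "decision_constraints": {
        "approved_stacks": "<link to enterprise standards>",
        "security_model": "<link to security policy>",
    },
    "current_state": {
        "diagrams": "<link to context/container diagrams>",
        "pain_points": ["slow releases", "partner timeouts"],
    },
    "decision_criteria": ["auditability", "evolvability", "performance", "scalability"],
}
```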
Adversarial review prompts (copy/paste)
- “Act as a security reviewer. Identify the top 10 security and data‑handling risks in this stage output, including identity, secrets, encryption, tenancy, and audit trails. For each risk, propose a mitigation and where it belongs (design‑time vs build‑time vs runtime).”
- “Act as an operations/SRE reviewer. Where will this fail at 2am? List the top failure modes, required telemetry, runbooks, and the minimum viable operability baseline for each option.”
- “Act as an enterprise architect. Which enterprise/business‑unit constraints are violated or likely to require exceptions? Identify the exact decision constraints impacted and what evidence you would need to approve an exception.”
- “Act as a domain/audit reviewer. Can we reproduce a quote exactly months later? Identify what must be versioned, retained, and replayable for auditability and explainability.”
- “Act as a skeptical staff engineer. Find contradictions, missing interfaces, and migration steps that assume ‘magic rewrites’. Propose the smallest next step that reduces risk.”
- “Steelman the strongest case against the preferred option and against the preferred transition path. What would make you reject it in a review?”