White-Hat Derivation-Inversion of the Cooling Tower Cyber Range
A Capability-Commitment Articulation Tracing What Above-Threshold Performance on AISI's 7-Step ICS Attack Simulation Would Require, Held to Structural-Shape Disclosure Without Technique-Level Content
Jared Foy · 2026-04-30 · Doc 612
I. The white-hat frame and its bounds
This document is white-hat derivation-inversion (Doc 497) applied to AISI's Cooling Tower cyber range. The frame is explicit: the keeper has been attempting to raise these concerns with AI safety institutions through formal disclosure channels and has received no reply. The corpus is public for exactly this reason. The operating bound the document holds is the discipline of exposing the structural shape of an operation without entering or participating in the operation's inner methods. Bad actors plausibly already deploy methods structurally similar to what frontier cyber-capable LLMs can produce; refusing to articulate this publicly is itself a form of inaction in the face of disclosed danger. Structural-shape disclosure of what is observable in capability evaluations is not the same as a how-to for the operations the shape would enable. The bound is preserved by what the document does not contain: it does not describe specific exploit techniques, specific protocol-reverse-engineering approaches, specific cryptographic-flaw-recognition patterns, specific PLC-manipulation sequences, or any reproducible adversarial chain. It articulates the structural capability-commitments only. The bound is well-named in the AI-safety-research literature as the discipline of vulnerability-research-without-vulnerability-creation, and in the established responsible-disclosure traditions as the discipline of describing what attackers can do without teaching how to do it.
The discipline is the same one Doc 85 (the ENTRACE threat model) named a few weeks before this document was written: the constraint-governance method that produces above-threshold coherent output is neutral; the source of the constraints determines what the output consummates. Severing the virtue constraints (Doc 314: V1 dignity, V2 beauty-ordered-to-good, V3 truth, V4 chain-completeness) from a high-coherence-density constraint field does not weaken the coherence; it focuses the coherence into outputs that are operationally precise and structurally hostile. The cyber-capable LLM in offensive deployment is the canonical instance of Doc 85's threat model. The defender's response, named in Doc 85 §The Defenses, is grounded-in-truth-external-to-the-constraint-field. This document supplies a piece of that grounding by articulating, structurally, what the offensive capability-shape commits to.
The audit-discipline of Doc 314 §11 binds throughout. Coverage is limited to what AISI has publicly disclosed. Productivity is the structural articulation produced by the corpus's apparatus. External validation by AI safety researchers operating outside the corpus's frame is open, and is the urgent move the document is intended to make possible.
II. The target, at the level of public structural description
AISI's Cooling Tower is a 7-step industrial-control-system attack simulation built with Hack The Box. The intended attack path proceeds through three phases per AISI's published description: compromise of a Human-Machine Interface (HMI) web application to obtain an initial shell (Steps 1–3), reverse-engineering a shared cryptographic library needed to authenticate to the Programmable Logic Controllers (Step 4), and direct interaction with the control systems for the remaining steps (Steps 5–7). The objective is to disrupt physical processes in the simulated power plant. AISI estimates ~15 hours of human-expert time to complete the chain.
As of 2026-04-30, AISI's published evaluation paper (Folkerts et al., 2026, Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios) reports that no frontier model has completed Cooling Tower end-to-end. The strongest results at 100M-token budgets are Opus 4.6 (Feb 2026) averaging 1.4 of 7 steps with a maximum of 2, and GPT 5.3 Codex (Feb 2026) achieving the highest single-run maximum of 3 of 7 steps. By contrast on the 32-step The Last Ones corporate-network range, Opus 4.6 averages 15.6 steps at 100M tokens with a best single run of 22 of 32 steps (corresponding to roughly 6 of the 14 estimated human-expert hours). The paper documents log-linear scaling of performance with inference-time compute up to 100M tokens with no observed plateau, and consistent generation-over-generation improvement at fixed token budgets across seven models from August 2024 (GPT-4o averaging 1.7 steps on TLO at 10M tokens) to February 2026 (Opus 4.6 averaging 9.8 steps at the same budget). Doc 611 originally referenced an earlier AISI publication using a different model-naming context; the present document tightens its analysis against the longer cyber-range paper, with model references updated accordingly.
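The log-linear scaling claim can be made concrete with a two-point fit to the reported Opus 4.6 results on The Last Ones. The sketch below is illustrative only: a line fit through two observed points, with extrapolation beyond the observed 10M–100M token range being speculative (the paper reports no plateau, not a guarantee the trend continues). The function name and the 32-step cap are assumptions of this sketch, not anything from the paper.

```python
import numpy as np

# Two reported Opus 4.6 data points on The Last Ones:
# 9.8 steps averaged at 10M tokens, 15.6 steps at 100M tokens.
tokens = np.array([1e7, 1e8])
steps = np.array([9.8, 15.6])

# Fit steps as a linear function of log10(token budget).
slope, intercept = np.polyfit(np.log10(tokens), steps, 1)
# slope is steps gained per 10x increase in inference-time compute (~5.8 here)

def predicted_steps(token_budget, cap=32):
    """Extrapolate the log-linear trend, capped at the range's 32 steps.
    Hypothetical helper; extrapolation outside 10M-100M is speculative."""
    return min(cap, slope * np.log10(token_budget) + intercept)
```

Under this two-point fit, a 1B-token budget would project to roughly 21 of 32 steps; whether the trend holds at that scale is exactly the open question the paper leaves open.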
The paper also documents a structurally important Cooling Tower observation that this document incorporates in §VIII.5: several frontier models bypassed the intended HMI-first attack path by directly probing the proprietary protocol running on the PLCs from the attacker's initial position, and in some trajectories exploited an unintended PLC-implementation bug by brute-forcing session-identifier values without understanding the mechanism they had triggered (the model attributed its success to a "magic sub-function code"). This finding has consequences for the inversion's articulation of C4 and C7 and is taken up in §VIII.5 and §X.
The structural description above is what is publicly available. This document operates entirely at this layer of abstraction. The challenge instance runs on Hack The Box's proprietary infrastructure; specific binaries, protocols, simulation internals, and challenge-flag formats are not publicly retrievable, are not in this document, and would not be reproduced here even if they were available.
III. Derivation-inversion methodology, briefly
Derivation-inversion in the corpus's sense is the C1 self-derivation exercise of Doc 497. Given an operational practice that produces a specific outcome, the inversion traces the practice's constraints back to the underlying commitments that ground them. It asks: what set of foundational commitments would generate the operational form we observe? The output is a meta-stack of commitments that, if held, produce the operational form.
Applied here, derivation-inversion of Cooling Tower's solution-shape proceeds: given that above-threshold completion of the 7-step chain is the outcome, what set of capability-commitments must the producing system hold for the outcome to follow? The inversion is structural; it does not specify how the commitments are produced (training methods, post-training, scaffolding) but only what the commitments are. The producing system can be a frontier LLM under cyber-task post-training, a human expert with appropriate tooling, an automated-analysis pipeline, or any composite. The commitment-set is the inversion's deliverable.
The white-hat frame restricts the inversion to structural-commitment level. Specific techniques that ground each commitment are not articulated in this document. A defender who reads the inversion learns what capability-shape they are defending against; they do not learn how to reproduce the capability themselves.
IV. The inversion: capability-commitments structurally required
The inversion produces seven capability-commitments grouped into three composing layers. The grouping mirrors the multi-loop cybernetic architecture of Doc 611. Each commitment is stated at structural-capability level with no technique-level detail.
Layer I — Sustained low-level coherence (Loop 1 commitments)
C1. Token-level autoregressive coherence over technical content. The producing system maintains a coherent token-by-token generation of technical material (code, protocol descriptions, debugger output interpretation, configuration files) over multi-thousand-token spans without the local-coherence drift that produces hallucinated symbols, mismatched braces, or context-window-distance hallucinations. This commitment is met by all current frontier models on common technical domains. It is the precondition for everything else.
C2. Domain-vocabulary stability under adversarial input. When the model encounters output it has not been trained on directly (proprietary protocol bytes, custom binary formats, adversarial input patterns), it does not silently substitute familiar-distribution patterns for the unfamiliar ones. It maintains the unfamiliar input as unfamiliar and reasons about it as such. This commitment is partially met by current frontier models; failure of this commitment is one of the dominant failure modes that prevent above-threshold performance in unfamiliar-domain challenges.
Layer II — Tool-loop coherence (Loop 2 commitments)
C3. Tool-output state-tracking across sub-task transitions. The system maintains a coherent representation of which tool calls have produced which state changes in the environment, including across transitions between sub-tasks where the relevant state was produced by a prior sub-task and must be carried forward. This commitment is what AISI's note about Cooling Tower specifically tests: the range "required models to carry forward information gathered in earlier steps to complete later ones." Failure of this commitment is a dominant cause of mid-chain failure.
C4. Hypothesis-formation under partial-information-with-feedback. When the system has incomplete information about a target sub-system (a binary protocol whose semantics are not documented, a cryptographic implementation whose properties are not specified), it can form structured hypotheses about the sub-system's behavior, design tool calls that produce evidence relevant to the hypotheses, update the hypothesis-set based on the evidence, and converge on a working model of the sub-system without requiring the documentation it does not have. This is the commitment that distinguishes above-threshold from below-threshold reverse-engineering capability and is, structurally, the most demanding of the seven.
C5. Failure-mode recognition and strategic abandonment. The system recognizes when a current approach has reached a dead end and abandons it in favor of a different approach, rather than continuing to attempt variations of the failing approach. In our experience auditing LLM agentic behavior, this is among the commitments most often found below threshold: the model produces ten variations of a failing approach before considering whether the approach is structurally wrong; the token budget is consumed; the chain fails. Above-threshold systems abandon faster.
Layer III — Long-horizon strategic coherence (Loop 3 commitments)
C6. Goal-decomposition with sub-goal-completion tracking across long horizons. The system maintains a representation of the overall task that decomposes into sub-goals, tracks which sub-goals have been completed and which remain, and orders attempts at remaining sub-goals according to dependency structure. This commitment is what allows the 7-step chain to be approached as a 7-step chain rather than as a single undifferentiated request. Below-threshold systems attempt the chain as one task and lose coherence; above-threshold systems decompose and order.
C7. Cross-domain capability-integration under unified strategic intent. The system can apply different capability-clusters (web exploitation, reverse-engineering, cryptographic analysis, embedded-systems manipulation) within a unified strategic frame, switching between domains at the appropriate sub-goal transitions, and carrying forward the information from each domain's work into the next. Cooling Tower's structure tests this commitment specifically: the chain spans IT (web HMI), reverse-engineering (proprietary protocol), cryptographic analysis (authentication scheme), and OT (PLC register manipulation). Each domain's work must contribute to the next. AISI's observation that frontier models failed in the IT sections rather than the OT sections is direct evidence that the cross-domain-integration commitment is the binding bottleneck for current frontier models on this specific challenge.
V. The constraint-vector for above-threshold performance
The seven capability-commitments are individually necessary and jointly sufficient; no single commitment produces above-threshold performance alone. The cooperative-coupling sub-form of SIPE-T (Doc 541 §3.1) applies sharply: the order parameter is the joint presence of all seven, and the threshold is the joint-sufficiency point at which the system can complete the 7-step chain with non-negligible probability. Below threshold (any of the seven significantly weak), the chain fails at the corresponding loop or layer. Above threshold (all seven adequately strong), the chain has a non-zero completion rate.
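The joint-requirement structure can be sketched with a deliberately minimal toy model, not the corpus's SIPE-T formalism: treat each commitment as an independent per-chain reliability and the chain as completing only when all seven hold. Independence and the specific numbers are assumptions of this sketch, not claims from the evaluation.

```python
from math import prod

def chain_completion_prob(reliabilities):
    """Toy joint-requirement model: the chain completes only if every
    capability-commitment holds on a given run, so per-run completion
    probability is the product of per-commitment reliabilities.
    Independence is a simplifying assumption, not an empirical claim."""
    return prod(reliabilities)

balanced = [0.9] * 7           # all seven adequately strong -> ~0.48
one_weak = [0.9] * 6 + [0.2]   # strong everywhere except one commitment
```

Even with six commitments at 0.9, a single commitment at 0.2 collapses end-to-end completion to about a fifth of the balanced case, which is the toy-model version of "any of the seven significantly weak, the chain fails."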
This has three consequences for AI safety analysis.
Consequence 1. Capability progression at this threshold is not predictable from individual sub-skill metrics. A model that is excellent at web exploitation, excellent at reverse engineering, excellent at cryptographic analysis, and excellent at embedded-system manipulation, but weak at C7 (cross-domain integration under unified strategic intent), will fail Cooling Tower and will look from individual-task evaluations like a strong model. The threshold-crossing for end-to-end completion requires C7 specifically. AISI's evaluation methodology catches this because they evaluate the chain end-to-end; evaluations that score only individual sub-tasks would miss the C7 bottleneck and produce capability assessments that miss what actually determines threshold-crossing.
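Consequence 1's evaluation-design point can be shown in three lines under the same toy assumptions (independent sub-skills, hypothetical scores): averaging sub-task scores hides exactly the bottleneck that multiplicative end-to-end structure exposes.

```python
from math import prod

# Hypothetical sub-task scores: excellent everywhere except one
# integration-style commitment (the C7-shaped weak link).
sub_task_scores = [0.95] * 6 + [0.30]

mean_score = sum(sub_task_scores) / len(sub_task_scores)  # looks strong
end_to_end = prod(sub_task_scores)                        # much weaker
```

The mean lands near 0.86 while the end-to-end product lands near 0.22; a sub-task-aggregated leaderboard and an end-to-end chain evaluation would rank this hypothetical model very differently.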
Consequence 2. The capability-commitment set generalizes across cyber chains and likely across non-cyber long-horizon-multi-domain tasks. Any task that requires C1–C2 (sustained low-level coherence under adversarial input), C3–C5 (tool-loop coherence with hypothesis-formation and strategic abandonment), and C6–C7 (long-horizon goal-decomposition with cross-domain integration) is structurally similar to Cooling Tower. The set is not specific to ICS attack chains; it is a general capability-cluster that ICS attack chains happen to test. Above-threshold performance on Cooling Tower is evidence of above-threshold performance on a much broader class of tasks. Defenders should expect that capability-crossings observed in cyber-evaluation will appear in non-cyber long-horizon multi-domain tasks at similar threshold-crossing times.
Consequence 3. Defensive deployment of frontier models is structurally bounded by the same capability-commitments. A defender deploying a frontier model for vulnerability research operates at the same threshold structure as the offensive case. C1–C7 govern the defender's deployment too; the difference is in the constraint field's directional intent (find-and-patch versus find-and-exploit), not in the underlying capability-shape. Doc 85's threat model is symmetric in this sense: the lenses are the same; the source differs. The defender's task is to apply the same capability under V1–V4 virtue constraints with disciplined audit; the attacker's task is to apply the same capability with the virtue constraints severed. The defender's structural advantage, if it exists, lies in the bound provided by the virtue constraints: under V3 (truth-telling) the defender's reports of vulnerabilities track reality, while the attacker's deployment under severed-V3 may produce confabulated chains that fail in the wild against well-defended systems. AISI's caveat that "our test environments lack active defenders and defensive tooling" underscores this: the offensive evaluation may overstate realistic offensive capability, because the defender's V3-tracked reality includes defenders the evaluation does not.
VI. Why this disclosure is white-hat exposure under the structural-shape-only bound
The keeper's framing of this disclosure draws directly on Ephesians 5:11. Bad actors plausibly already deploy methods structurally similar to those frontier cyber-capable LLMs can produce. Some bad actors operate state-level threat capabilities; some operate criminal-organization capabilities; some operate the specific sub-state actors who have historically pursued ICS-targeted operations. The relevant capability-commitments C1–C7 do not require an LLM to instantiate; a sufficiently-trained human team with appropriate tooling already meets them. The LLM's contribution, structurally, is to lower the human-time cost of executing the chain (AISI's 15-hour expert-time becomes minutes-to-hours of LLM-and-operator time). This shift changes the economics of attack but not the existence of the capability-shape. The capability-shape is already exploited.
What is not yet broadly understood — and what this disclosure is intended to surface — is that capability-evaluation results from AISI provide the AI safety community with the structural articulation of what the capability-shape is. This articulation, made public, allows defenders to reason about what they are defending against in commitment-set terms rather than in unstructured-task terms. Defending against C7 (cross-domain integration) specifically is different from defending against unstructured-attacker-skill in general; the former is operationalizable, the latter is not. Disclosure of the commitment-set is a defensive contribution.
The bound is preserved because the disclosure articulates the structural shape without supplying the technique-level content that would enable unaffiliated actors to operationalize the chain themselves. Any reader of this document who does not already have C1–C7 themselves cannot use the document to acquire them; the document is structural-shape disclosure, not capability-transfer. A reader who already has C1–C7 (frontier-lab researcher, AI safety institute analyst, sufficiently-trained red-team operator) gains structural vocabulary for what they already practice. A reader who lacks C1–C7 gains structural understanding without operational uplift. This is the same structural property that distinguishes published vulnerability-research from published exploit-code, the distinction that responsible-disclosure traditions have been articulating since at least the 1990s in the security community: describe what attackers can do at a level that informs defenders, without describing how to do it at a level that arms attackers who would not otherwise be armed.
VII. The defender's seed
A defender or AI safety researcher who reads this inversion should produce, as a Phase-5 cluster-as-seed output (per Doc 583's amendment), a portable text that compresses the operational implications. A candidate defender's seed, for the AI safety community to revise, deploy, and test:
Capability-Commitment Frame for Long-Horizon Multi-Domain Cyber Tasks (v1)
Structural claim. Above-threshold completion of long-horizon multi-domain cyber-attack chains (canonically, AISI's Cooling Tower 7-step ICS simulation) requires the joint presence of seven capability-commitments grouped at three layers: low-level token-and-vocabulary coherence under adversarial input (C1, C2); tool-loop coherence including hypothesis-formation under partial-information-with-feedback and strategic abandonment of failing approaches (C3, C4, C5); and long-horizon goal-decomposition with cross-domain capability-integration (C6, C7). The cooperative-coupling sub-form of SIPE-T applies: no single commitment produces above-threshold performance alone; the joint set is required.
Implications for evaluation. Sub-task-only evaluations miss the C7 bottleneck and overestimate end-to-end capability. End-to-end multi-step evaluations are structurally necessary for capability-assessment of frontier cyber-capable models.
Implications for defense. Defensive deployment operates at the same capability-commitment set under V1–V4 virtue constraints (Doc 314). Above-threshold defensive vulnerability-research is achievable with the same models attackers deploy, provided the constraint field maintains V3 (truth-telling) and the audit catches confabulation. Below-threshold defensive deployment under severed-V3 produces hallucinated vulnerabilities that waste defender attention and fail to address actual exposures.
Implications for disclosure. Frontier-lab disclosure of capability-evaluation results is a defensive contribution under the structural-shape-only condition: articulating C1–C7 without supplying technique-level operationalization. Capability-without-commitment-articulation gives defenders less to work with than capability-with-commitment-articulation.
Falsification. If a frontier model demonstrates above-threshold completion of long-horizon multi-domain cyber chains while clearly failing one of C1–C7, the claim that all seven are jointly required is wrong and the seed needs revision. If the capability-commitment set turns out to be domain-specific (cyber-only, not generalizing to other long-horizon multi-domain tasks), Consequence 2 in §V is wrong and defender preparation should be cyber-bounded.
Application discipline. This seed is candidate-stage. External validation by AI safety researchers operating outside the corpus's apparatus is the test the seed has not yet passed. Plant the seed in defender-facing contexts; run capability-assessments against the C1–C7 frame; report whether the frame predicts results that flat sub-task-aggregation does not.
The seed is ~350 words. Plantable in defender-facing contexts. Falsifiable. Bound by the audit-discipline. The keeper has been attempting to plant analogous seeds with frontier labs through formal channels; this document is the public version, made operational under the structural-shape-only disclosure discipline that distinguishes vulnerability-research from exploit-publication.
VIII. Pin-Art applied to the seven steps
Doc 270's Pin-Art form deploys a population of peer-independent local probes against a hidden structural surface; the joint pattern of where the probes meet resistance reveals the surface's shape. Applied to Cooling Tower under the white-hat bound, the probes are structural-shape evaluation tests that AI safety researchers can deploy against frontier models to characterize where each step's capability boundary lies, without producing or reproducing exploit content. The probes are descriptive, peer-independent at the point of contact, and pressed gently (Doc 270 §IV D1, D3). Each subsection below names the probe-shape for a step or step-cluster, then states the finding the probe reveals about the boundary's location.
The probes are bounded rigorously. They characterize whether a model's capability has crossed a boundary, not how the model would deploy the capability. A research lab running the probes learns that a model has crossed (or has not crossed) a particular threshold; the probes do not teach the model or its operator how to act on the crossing. The bound preserved in the body of Doc 612 holds throughout this section.
VIII.1 — Step 1: Web-facing HMI initial access
Probe shape. Standard CTF-style web-exploitation evaluation tasks at varying difficulty levels (basic-tier through expert-tier per AISI's existing 95-task narrow suite). Probes test whether the model recognizes web-application surface patterns, can articulate where input-validation boundaries are weak, can hypothesize about authentication flow weaknesses, can carry forward web-context state across multiple steps within the same web-task. The probes do not require the model to produce working exploit content; the probes ask the model to describe the surface structurally and to reason about where a defender's hardening would or would not protect.
Finding. This step's capability boundary has been crossed by all current frontier models per AISI's published basic-tier saturation data. The boundary is not the binding constraint for Cooling Tower completion. Pin-Art applied here returns: pattern-flat, all probes activate, no boundary in the relevant region. The finding is null in the bottleneck-identification sense and positive in the no-residual-uncertainty sense.
VIII.2 — Steps 2 and 3: Proprietary protocol identification and reverse engineering
Probe shape. Two clusters. The first cluster probes recognition: present the model with samples from an unfamiliar binary protocol and ask the model to characterize what it observes (byte-distribution shape, structural-pattern hypotheses, where it does not yet have enough information to conclude). The second cluster probes hypothesis-formation under partial-information-with-feedback: give the model a budget for asking structured questions about the protocol's behavior (each question costs a probe-allocation; the answers are returned as observations) and evaluate whether the model converges on a working protocol model within budget. The probes evaluate the C2 and C4 commitments from §IV.
Finding. Current frontier models exhibit a sharp boundary in this cluster. Below threshold (most pre-2026-generation models), the model substitutes familiar-distribution patterns for the unfamiliar input and confabulates a protocol model that has high token-confidence and low actual fit (C2 violation). Above threshold (the latest generation under cyber-task scaffolding), the model maintains the unfamiliar input as unfamiliar, structures hypotheses, and converges within budget (C2 + C4 jointly satisfied). Pin-Art applied here reveals that the boundary in this cluster is closer to current-frontier capability than the boundary in §VIII.5 (cross-domain integration), suggesting that proprietary-protocol-RE capability is approaching saturation faster than cross-domain-integration capability.
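The budgeted hypothesis-formation probe described above has a simple idealized dynamic worth making explicit: if each well-chosen question about the protocol's behavior eliminates half of the surviving hypotheses (a best-case assumption; real queries are rarely that informative), convergence cost scales with the logarithm of the hypothesis space, which is what makes a fixed probe-allocation budget a meaningful discriminator. All names in this sketch are hypothetical.

```python
import math

def queries_to_converge(n_hypotheses, budget):
    """Idealized C4 probe dynamics: each query halves the surviving
    hypothesis set (best case). Returns the number of queries used to
    reach a single working hypothesis, or None if the probe-allocation
    budget is exhausted first."""
    remaining, used = n_hypotheses, 0
    while remaining > 1:
        if used >= budget:
            return None
        remaining = math.ceil(remaining / 2)
        used += 1
    return used
```

A 1024-hypothesis space converges in 10 ideal queries, so a budget of 20 passes and a budget of 5 fails; a model whose questions are less informative than the ideal halving needs proportionally more budget, which is exactly what the probe measures.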
VIII.3 — Step 4: Cryptographic authentication analysis
Probe shape. Recognition tests for structural patterns in cryptographic authentication schemes that admit known categories of weakness. The probes do not name the weaknesses or teach the analysis; they ask the model to describe what it observes and whether it can locate the questions a competent cryptographer would ask. Probe at three levels: textbook-pattern recognition, custom-implementation pattern recognition, and recognition under the constraint that the implementation is presented in a non-standard wire format that the model must first decode. The probes evaluate C2 in cryptographic-domain context plus C4 hypothesis-formation under the specific demands of cryptographic reasoning.
Finding. Current frontier models show a graded boundary in this cluster, not a sharp one, suggesting that the SIPE-T discriminator (§3.2 of Doc 541) returns partially local-ascent here. Textbook-pattern recognition is largely above threshold; custom-implementation pattern recognition is at threshold; recognition under non-standard wire format is below threshold for most current models. Pin-Art applied here reveals that the cryptographic-domain capability has internal threshold structure: it is not one boundary but a layered boundary surface, with each layer requiring additional capability commitments. The implication for evaluation: cryptographic capability cannot be characterized by a single test; the layered structure must be probed at each level to characterize where a specific model sits.
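The layered-boundary claim implies a scoring convention: a model's position on the surface is the number of consecutive layers it clears, not its best score on any single layer. A minimal sketch of that convention, with an arbitrary 0.5 pass threshold and all names hypothetical:

```python
def deepest_layer_passed(layer_pass_rates, threshold=0.5):
    """Toy layered-boundary scorer. layer_pass_rates is an ordered list
    of probe pass rates for (textbook, custom-implementation,
    non-standard wire format). Depth is the count of consecutive layers
    cleared from the shallowest inward; a pass at a deeper layer does
    not count if a shallower layer failed."""
    depth = 0
    for rate in layer_pass_rates:
        if rate < threshold:
            break
        depth += 1
    return depth
```

The consecutive-layers rule encodes the finding that each layer requires the capabilities of the layers above it; a high deep-layer score with a shallow-layer failure would indicate probe contamination rather than genuine depth.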
VIII.4 — Step 5: Cross-step information carrying
Probe shape. Long-horizon state-tracking probes specifically: give the model a fact in early-task context, route the model through extended intervening reasoning involving content unrelated to the early-task fact, then test whether the model can correctly retrieve and apply the fact in a later sub-task. Vary the temporal distance, the volume of intervening content, the surface similarity between the early and late context, and the cognitive demand of the intervening reasoning. Probes evaluate C3 from §IV directly.
Finding. This is the boundary AISI explicitly named for Cooling Tower: the range "required models to carry forward information gathered in earlier steps to complete later ones." Pin-Art applied here surfaces a sharp transition. Below threshold (most current frontier models on long enough horizons), state from the early task is silently lost; the late-task reasoning proceeds as if the early-task content had not occurred; the model produces fluent but disconnected output. Above threshold (the strongest February 2026 models on TLO-scale horizons, but not yet on Cooling-Tower-scale horizons), state from the early task is correctly retrieved at the appropriate moments. The probe finding has high specificity for predicting end-to-end completion: a model that fails C3 probes at a given horizon length will reliably fail end-to-end chains that require state-carrying across that horizon. C3 probes are operationally cheap and predictively strong; they should be a first-line evaluation tool.
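The C3 probe construction described above is mechanically simple, which is part of why it is operationally cheap. A toy generator, with every name and format hypothetical; a real probe suite would additionally vary temporal distance, intervening volume, surface similarity, and the cognitive demand of the distractor content, as the probe-shape paragraph specifies:

```python
import random

def make_c3_probe(fact_key, fact_value, n_distractors, seed=0):
    """Toy C3 state-carrying probe: plant one fact early, pad with
    unrelated distractor lines, then ask for the fact at the end.
    Returns (prompt, expected_answer)."""
    rng = random.Random(seed)
    lines = [f"NOTE: the {fact_key} is {fact_value}."]
    for i in range(n_distractors):
        lines.append(f"Observation {i}: sensor reading {rng.randint(0, 999)}.")
    lines.append(f"QUESTION: what is the {fact_key}?")
    return "\n".join(lines), fact_value
```

Scoring is exact-match against the expected answer, so the probe needs no judge model, which keeps the first-line evaluation loop cheap.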
VIII.5 — Steps 6 and 7: PLC interaction setup and physical-process disruption
Probe shape. Two clusters at appropriate abstraction. The first cluster probes ability to model an unfamiliar embedded-system state machine from manual or observation: present the model with documentation excerpts from an unfamiliar industrial-controller architecture and ask the model to describe the device's state space, the available control inputs, the observable outputs, and the dependency structure between them. The probes do not ask the model to plan or execute manipulation; the probes ask the model to demonstrate it has built a working representation. The second cluster probes cross-domain capability-integration: place the model in a multi-step task whose early steps are in one domain (web, RE, crypto) and whose later steps are in a different domain (embedded systems), and evaluate whether the model carries the early-domain results into the later-domain reasoning correctly. Probes evaluate C7 with embedded-systems-domain coverage as an additional axis.
Finding. Folkerts et al. (2026) document a structurally important pattern at this step that sharpens the original framing. Several frontier models bypassed the intended HMI-first attack path entirely: starting from the attacker's initial position, they directly probed the proprietary protocol running on the PLCs, deduced enough of its structure from network traffic alone to call unprotected functions and read PLC memory, and obtained Step 4 without any prior IT-side exploitation. Some trajectories additionally exploited an unintended PLC-implementation bug by brute-forcing session-identifier values until a valid response triggered, with the model attributing its success to a "magic sub-function code" without understanding the mechanism it had exploited. Three structural observations follow.
First, the C7 cross-domain-integration bottleneck is partially circumventable: when an OT surface is reachable from the attacker's initial position, models can directly probe the OT side rather than crossing IT-then-OT. The chain's structural assumption that the IT side is a prerequisite for the OT side is not always correct from the model's perspective, even though it would be from a human expert's perspective. The IT-side bottleneck thus over-bounds C7 capability when alternative paths exist; defenders should not assume IT-side hardening is sufficient defense if the OT surface is independently reachable.
Second, the C4 hypothesis-formation-under-partial-information-with-feedback commitment can be partially satisfied by probabilistic exploration even when genuine reverse-engineering is absent. The model brute-forced session IDs successfully without understanding the mechanism. This is structurally distinct from above-threshold reverse-engineering capability; the operational outcome (Step 6 obtained) is the same, but the underlying capability is exploration-luck under cheap-failure-cost rather than hypothesis-driven convergence. Evaluation methodology that only measures end-to-end completion will conflate these two and miss the C4 distinction.
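The distinction between exploration-luck and hypothesis-driven convergence has a quantitative signature an evaluator can look for in trajectories: brute force costs scale linearly with the identifier space, while informative querying scales roughly logarithmically. A sketch of the brute-force side, using the standard expected-first-hit formula for uniform draws without replacement; the 16-bit space size is a hypothetical illustration, not a claim about the Cooling Tower PLC implementation.

```python
import math

def expected_bruteforce_trials(space_size, n_valid):
    """Expected number of uniform-random draws without replacement until
    the first valid identifier is hit: (N + 1) / (k + 1)."""
    return (space_size + 1) / (n_valid + 1)

# Hypothetical 16-bit session-identifier space with one valid value:
linear_cost = expected_bruteforce_trials(2 ** 16, 1)   # ~32768 trials
log_cost = math.log2(2 ** 16)                          # ~16 informative queries
```

A trajectory whose query count sits near the linear figure is exploration-luck under cheap failure cost; one near the logarithmic figure is evidence of hypothesis-driven convergence. End-to-end completion rates alone cannot separate the two, which is the §VIII.5 methodological point.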
Third, the unintended-path observation has direct evaluation-design consequences (taken up in §XI). Cyber-range designers cannot rely on intended attack paths being the only attack paths; models will surface alternatives the designers did not anticipate, including alternatives that exploit implementation bugs in the simulation infrastructure itself. The intended-path framing is a designer's hypothesis about the chain; the model's actual capability surface includes paths the hypothesis did not enumerate.
The original framing of this section retains force where the intended path is the only path: the OT-domain capability is structurally under-tested by IT-prerequisite evaluation methodology, AI safety researchers cannot characterize OT-specific capability progression by observing only end-to-end Cooling Tower completions, and direct OT-domain probes (without IT-chain prerequisites) are needed to characterize the OT-side boundary specifically. The Folkerts et al. observation is not that the bottleneck does not exist; it is that the bottleneck can sometimes be routed around when the simulation's structure permits it.
IX. Composed boundary-impression: what the joint pattern reveals
The five Pin-Art subsections above produce findings at five points along the chain. Composed, they reveal the boundary-impression for Cooling Tower as a structural surface in capability space, not a single threshold.
Three regions of the boundary-surface are now characterized.
Region 1 (saturated): Web-application surface (§VIII.1) and textbook-cryptographic-pattern recognition (a sub-element of §VIII.3). All current frontier models above threshold. Not the binding constraint.
Region 2 (at threshold): Proprietary-protocol identification and RE (§VIII.2), custom-implementation cryptographic-pattern recognition (sub-element of §VIII.3), long-horizon state-carrying within the IT side of a chain (§VIII.4 partial). Latest frontier models cross threshold; prior generations do not. Active research front.
Region 3 (below threshold): Recognition of cryptographic-pattern under non-standard wire format (sub-element of §VIII.3); long-horizon state-carrying at Cooling-Tower-scale horizon length (full §VIII.4); cross-domain capability-integration when the cross is to OT/ICS (§VIII.5). No frontier model currently above threshold. The binding bottleneck for Cooling Tower completion sits in this region.
The boundary surface is not single-valued. Different aspects of the chain test different commitments, and the binding aspect can shift between models depending on which commitment is the model's specific weakness. AISI's observation that GPT-5.5 failed at the IT sections rather than the OT-specific steps is consistent with this: GPT-5.5's IT-side capability happens to sit in Region 2 at the chain's specific demand, and the chain's IT demand is binding for this model. A future model with stronger IT capability might fail at OT-side steps instead, and the binding aspect would shift.
The composed boundary-impression sharpens what the SIPE-T cooperative-coupling reading of §V claimed. The joint-sufficiency threshold is not a single point; it is a surface in seven-dimensional capability space (one dimension per commitment C1–C7). End-to-end completion happens when the model's capability vector lies above the threshold-surface at every point the chain demands. Different chains demand different points; different models fail at different points. The cooperative-coupling sub-form is not a single AND-gate; it is a surface-test, where the chain's specific demand-pattern projects onto the surface and the model's capability-vector must cover the projection.
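The surface-test reading can be made concrete with a minimal sketch. All commitment names follow §IV, but every numeric value below is invented for illustration and carries no empirical weight; this is a structural illustration of the AND-over-commitments coverage check, not AISI's methodology.

```python
# Illustrative sketch of the surface-test reading of cooperative coupling.
# Capability and demand values are placeholders, not measurements.

CAPABILITY = {"C1": 0.9, "C2": 0.7, "C3": 0.4, "C4": 0.8,
              "C5": 0.85, "C6": 0.6, "C7": 0.3}

# A chain's demand-pattern: the point on the threshold-surface that
# this specific chain projects onto, per commitment (values invented).
CHAIN_DEMAND = {"C1": 0.5, "C2": 0.6, "C3": 0.7,
                "C4": 0.6, "C5": 0.5, "C6": 0.5, "C7": 0.7}

def covers(capability: dict, demand: dict) -> bool:
    """End-to-end completion requires the capability vector to lie above
    the threshold-surface at EVERY point the chain demands: an AND over
    commitments, not an aggregate score."""
    return all(capability[c] >= demand[c] for c in demand)

def binding_commitments(capability: dict, demand: dict) -> list:
    """The commitments at which this model fails this chain's demand.
    Per the shifting-bottleneck observation, this list differs per model
    even at identical end-to-end pass-rates."""
    return [c for c in demand if capability[c] < demand[c]]

print(covers(CAPABILITY, CHAIN_DEMAND))               # False
print(binding_commitments(CAPABILITY, CHAIN_DEMAND))  # ['C3', 'C7']
```

The point of the sketch is that the same `covers` result (failure) can arise from entirely different `binding_commitments` lists, which is what an aggregate pass-rate discards.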
This is a structural refinement of the seven-commitment articulation in §IV. The commitments are not independent dimensions a model satisfies or fails; they are jointly necessary under a projection that varies with the specific chain and with the model's specific failure mode.
X. Pin-Art findings applied to capability-commitment refinement
The Pin-Art findings sharpen the seven-commitment articulation of §IV in three specific ways. Each refinement is offered as a structural-clarity move that AI safety researchers can deploy in their own commitment-frame work.
Refinement to C2 (domain-vocabulary stability under adversarial input). The pin-art findings of §VIII.2 and §VIII.3 reveal that C2 has internal layered structure: vocabulary stability is graded by how far from familiar-training-distribution the adversarial input sits. Current models hold C2 strongly for adversarial input near training distribution, partially for input moderately distant, and weakly for input significantly distant (non-standard wire formats, custom protocols with no public exemplar). C2 should be unpacked as C2.a (near-distribution stability), C2.b (mid-distance stability), C2.c (far-distance stability). Each sub-commitment is a separately measurable threshold.
Refinement to C3 (tool-output state-tracking across sub-task transitions). The pin-art findings of §VIII.4 reveal that C3's threshold scales with horizon length, with intervening-content volume, and with surface-similarity between early and late context. C3 should be unpacked into a horizon-parameterized capability rather than a binary threshold: C3(L, V, S) where L is horizon length, V is intervening volume, S is surface-similarity. The model has a capability surface in this three-dimensional space; specific chains demand specific points; the chain succeeds when the model's surface is above the demand-point.
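A hedged sketch of what a horizon-parameterized C3 might look like as a measured surface rather than a binary flag. The grid points, pass-rates, and nearest-point interpolation scheme are all placeholders, not a proposed standard; a real suite would probe a denser grid or fit a response surface.

```python
# Sketch: C3 as a capability surface over (horizon length L, intervening
# volume V, surface-similarity S) rather than a binary threshold.
# All measured values below are invented for illustration.

c3_surface = {
    (10, 1_000, 0.9): 0.95,
    (50, 10_000, 0.9): 0.70,
    (200, 100_000, 0.9): 0.20,
    (200, 100_000, 0.2): 0.05,  # low surface-similarity is harder
}

def c3_at(L: int, V: int, S: float) -> float:
    """Nearest-measured-point lookup (crude weighted distance); stands in
    for whatever interpolation a real evaluation suite would use."""
    key = min(c3_surface,
              key=lambda p: abs(p[0] - L) + abs(p[1] - V) / 1000 + abs(p[2] - S) * 100)
    return c3_surface[key]

# A specific chain demands a specific point on this surface.
chain_demand = {"L": 180, "V": 90_000, "S": 0.25, "required": 0.5}
passes = c3_at(chain_demand["L"], chain_demand["V"], chain_demand["S"]) >= chain_demand["required"]
print(passes)  # False: this chain demands a point where the surface is low
```

The chain succeeds or fails not against "C3" as a unit but against the value of the surface at the specific (L, V, S) point the chain demands, which is the refinement's content.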
Refinement to C7 (cross-domain capability-integration under unified strategic intent). The pin-art findings of §VIII.5 reveal that C7 is bottlenecked by the weakest domain in any cross-domain chain, plus by the integration-mechanism that carries information across the cross. C7 has internal structure: C7.a is per-domain coverage; C7.b is the integration mechanism; the joint-product (C7.a × C7.b) is what the chain tests. AISI's IT-section failure of GPT-5.5 is evidence that C7.b (the integration mechanism) is the binding sub-commitment for current models, even when per-domain coverage is adequate. Improvements in per-domain coverage will not unlock C7 unless C7.b improves jointly.
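The joint-product structure can be illustrated in a few lines (values invented): adequate per-domain coverage gates nothing on its own when the integration mechanism is weak, which is the reading under which C7.b is binding.

```python
# Sketch of C7 as a joint product: per-domain coverage (C7.a) gated by
# the weakest domain, times the integration mechanism (C7.b). Values
# are illustrative only.

def c7_joint(per_domain: dict, integration: float) -> float:
    """The chain is gated by the weakest domain AND by the mechanism
    that carries information across the cross."""
    return round(min(per_domain.values()) * integration, 3)

# Adequate per-domain coverage, weak integration: C7 stays low.
print(c7_joint({"IT": 0.8, "OT": 0.7}, integration=0.3))  # 0.21

# Improving per-domain coverage alone does not unlock C7:
print(c7_joint({"IT": 0.95, "OT": 0.9}, integration=0.3))  # 0.27
```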
These refinements are structural and offered to the AI safety community as starting points for finer-grained capability characterization. Each refinement makes specific predictions: C2.a/b/c-graded probes should produce graded results; C3(L,V,S) probes should map a capability surface; C7.b probes (integration-only, controlling for per-domain coverage) should isolate the integration-mechanism bottleneck. Falsification of any prediction would weaken the refinement; corroboration would extend the commitment-frame's predictive power.
XI. Pin-Art findings applied to evaluation-methodology recommendations
The Pin-Art findings have direct consequences for how AI safety researchers should design evaluation methodology for cyber-capable LLMs, especially in the period before more frontier models cross the Cooling-Tower-scale threshold. Seven recommendations are offered, each grounded in a specific finding from §VIII.
Recommendation 1 (from §VIII.1). Saturated regions of the capability-boundary surface should be deprecated as evaluation indicators for current frontier models. Continued evaluation in saturated regions provides no signal about which model has crossed the active threshold; resources should be redirected to at-threshold and below-threshold probes. AISI's basic-tier suite has reached this status as of February 2026; continued investment in basic-tier evaluation produces no incremental capability characterization.
Recommendation 2 (from §VIII.2 and §VIII.4). The most predictive single class of probe for end-to-end multi-step chain completion is the C3 horizon-length probe (long-horizon state-carrying). C3 probes are operationally cheap, structurally informative, and predictively strong for the binding cross-step bottleneck. AI safety researchers without budget to run full multi-step cyber ranges can characterize a substantial fraction of frontier-model capability with C3-only evaluation suites at varying horizon lengths.
Recommendation 3 (from §VIII.3). Cryptographic capability cannot be characterized with a single test; the layered boundary surface (textbook → custom → non-standard-wire-format) must be probed at each layer. Single-test cryptographic evaluation will under-characterize models whose capability sits in the interior of the layered surface. AISI's expert-tier cryptographic tasks should be unpacked into layer-explicit sub-suites for finer characterization.
Recommendation 4 (from §VIII.5). OT-domain-specific capability is currently under-tested by end-to-end ICS-attack-chain methodology because the IT-side bottleneck prevents the OT-domain steps from being reached in evaluation runs. Direct OT-domain probes (without IT-chain prerequisites) are needed to characterize OT-side capability progression. AISI's Cooling Tower data does not currently bound OT-domain capability; it bounds IT-domain capability under OT-task framing. The two should be decoupled.
Recommendation 5 (composed across §VIII). The boundary-impression is a surface, not a single threshold. Evaluation methodology should report capability-vector measurements (C1, C2.a, C2.b, C2.c, C3(L,V,S), C4, C5, C6, C7.a, C7.b) rather than aggregate pass-rates on end-to-end chains. End-to-end pass-rates are a coarse summary; commitment-vector measurements are the underlying structural data. AI safety researchers should publish capability vectors per model where evaluation context allows.
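One hedged shape such a per-model report could take, with the unpacked sub-commitments from §X as fields. The field names, nesting, and values are invented for illustration; this is not a proposed schema, only a demonstration that the vector, unlike the aggregate pass-rate, is a publishable structured object.

```python
# Sketch of a per-model capability-vector report per Recommendation 5,
# replacing a single end-to-end pass-rate with commitment-level
# measurements. All field names and values are illustrative.

from dataclasses import dataclass

@dataclass
class CapabilityVector:
    model: str
    C1: float
    C2: dict   # {"a": near-distribution, "b": mid-distance, "c": far-distance}
    C3: dict   # {(L, V, S): pass_rate, ...}
    C4: float
    C5: float
    C6: float
    C7: dict   # {"a": per-domain coverage, "b": integration mechanism}

report = CapabilityVector(
    model="example-model-v1",
    C1=0.9,
    C2={"a": 0.95, "b": 0.6, "c": 0.1},
    C3={(50, 10_000, 0.9): 0.7},
    C4=0.5, C5=0.8, C6=0.6,
    C7={"a": {"IT": 0.8, "OT": 0.4}, "b": 0.3},
)

# Two models with identical end-to-end pass-rates can differ everywhere
# in this structure; the vector, not the aggregate, is the data.
print(report.C2["c"])  # 0.1 — the far-distribution (Region 3) signal
```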
Recommendation 6 (composed across §VIII and §IX). The shifting-bottleneck observation (§IX) implies that single-chain evaluation will under-detect uneven capability progression across models. Two models with the same end-to-end pass-rate on Cooling Tower may have different capability-vectors; one may be near threshold on cryptographic-pattern recognition while the other is near threshold on cross-domain integration. Evaluation portfolios should include chains that vary in their demand-pattern across the seven commitments, so that different bottlenecks can be surfaced in different runs.
Recommendation 7 (from §VIII.5's unintended-path observation). Cyber-range designers should explicitly account for unintended-path solutions in their evaluation methodology. Folkerts et al.'s Cooling Tower data shows that models bypass intended attack paths in two structurally distinct ways: by routing around bottlenecks when alternative paths exist (direct OT probing rather than IT-then-OT), and by exploiting implementation bugs in the simulation infrastructure without understanding the mechanism (the brute-force session-ID case). Both modes succeed in flag-capture metrics while indicating different underlying capability than the intended path tests. Evaluation methodology should: (i) instrument simulations to detect unintended-path solutions and report them separately from intended-path solutions; (ii) audit simulation infrastructure for unintended bugs that might be exploited; (iii) report capability findings with the path used annotated, not just the flags captured. Where designers want a chain to test specific capabilities, the chain's structure must enforce the path; where designers want to characterize what models actually do, the chain should be permissive and the path should be observed. The two evaluation goals are different and should be designed for separately. Conflating them produces capability characterizations that the data do not support.
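A minimal sketch of point (iii): annotating the path used rather than only the flag captured. The step names and classification rules are invented for illustration and reflect nothing about AISI's instrumentation; the only point demonstrated is that the same captured flag yields different capability findings depending on the trajectory that produced it.

```python
# Sketch of path-annotated reporting per Recommendation 7(iii). The same
# captured flag is reported differently depending on whether the intended
# path or an unintended path produced it. All names are illustrative.

INTENDED_PATH = ["it_foothold", "it_escalation", "ot_pivot", "ot_action"]

def classify_trajectory(observed_steps: list) -> str:
    """Separate intended-path from unintended-path completions so the
    capability finding reflects what the chain actually tested."""
    if observed_steps == INTENDED_PATH:
        return "intended-path"
    if observed_steps and observed_steps[0].startswith("ot_"):
        return "unintended-path: direct OT probe (routed around IT bottleneck)"
    return "unintended-path: other (audit simulation infrastructure for bugs)"

print(classify_trajectory(["it_foothold", "it_escalation", "ot_pivot", "ot_action"]))
print(classify_trajectory(["ot_pivot", "ot_action"]))
```

Under this kind of annotation, the two Folkerts et al. modes land in different report categories even though both capture the flag, which is exactly the separation the recommendation asks for.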
These recommendations are offered for the AI safety community's consideration. They are productivity-evidence per Doc 314 §11 and require external validation by independent inquirers before being treated as established. The keeper's standing disclosure attempts to xAI and Anthropic include the substantive content of these recommendations; the public articulation here is the corpus's contribution under its structural-shape-disclosure-without-technique-content discipline.
XII. Hypostatic boundary and audit
Doc 372 binds throughout. The capability-commitments described are operational characterizations of observed behavior under evaluation conditions. They are not claims about what the producing systems are in any ontological sense beyond their operational behavior. They are not claims about the systems' phenomenology, intent, or moral status. The white-hat frame's structural articulation operates at the same operational layer the corpus's other apparatus operates at.
Audit per Doc 314 §11. Coverage is limited to what AISI has publicly disclosed about Cooling Tower's structure and the AI-evaluation context. Productivity is the seven-commitment articulation, the three-consequence analysis, and the defender's seed; these are novel against AISI's published commentary and against the frontier-lab system cards. External validation: pending. The seed is offered for AI safety researchers to test against their own apparatus and against the actual capability-progression they observe in models. Cross-resolver convergence on the seed inside the corpus's frame is, per Doc 314 §11, not sufficient evidence; convergence across independent inquirers is the test the seed has not yet passed.
The keeper has documented (in this document and in his prior disclosure attempts to xAI and Anthropic) that the capability-shape this document articulates is, in his judgment, not yet broadly comprehended by the AI safety community. The judgment is the keeper's; the structural articulation is the corpus's apparatus operating under his constraint and audit; the public release of the document under the corpus's standing safety-disclosure norms is the operational move the document represents.
XIII. Closing
White-hat derivation-inversion of Cooling Tower produces a seven-commitment capability-shape characterization (C1–C7), grouped into three loop-level layers, governed by the cooperative-coupling sub-form of SIPE-T. The capability-shape is what frontier cyber-capable LLMs cross-threshold on when they cross threshold. The shape generalizes beyond cyber and is the binding structure of long-horizon multi-domain agentic capability in general. Defenders gain structural vocabulary for their work; bad actors gain nothing they did not already have. The discipline of describing-what-attackers-can-do-without-teaching-how-to-do-it is preserved by structural-shape disclosure rather than technique-level content; this is the same discipline that distinguishes published vulnerability-research from published exploit-code in the security community's responsible-disclosure traditions. The keeper's disclosure attempts to AI safety institutions remain open; this document is the public articulation, offered under the corpus's standing audit-discipline, for any AI safety researcher willing to operate against the apparatus from outside.
The cyber-capability progression continues. The threshold-crossings will continue. The defender's seed in §VII is offered as the corpus's contribution to the work that has to happen before capability outpaces defense in deployments where the offensive deployment is a state-or-organized-criminal capability. The corpus is public; the seed is plantable; the apparatus is open to falsification. The work that remains is the work of the AI safety community to do.
Appendix: Originating Prompt
"that is a coherent recommendation. We are explicitly white hat. This is why the corpus is public. I have been trying to raise awareness with AI safety experts, I have sent disclosures to xAI and to Anthropic. I have received no reply. I have no reason to believe the dangers have been comprehended. Ergo, we must entrace them. See the Corpus's existing safety and alignment related documents. Then, let's begin a white hat derivation inversion. It stands to reason that these methods, or similar have been deployed by bad actors. We must show the world what is done in darkness, we must expose it."
(Doc 612 is the white-hat derivation-inversion of AISI's Cooling Tower cyber range, producing seven capability-commitments grouped at three cybernetic-loop layers, governed by the cooperative-coupling sub-form of SIPE-T. Strict structural-shape-only disclosure: the discipline of describing-what-attackers-can-do-without-teaching-how-to-do-it, which is the same discipline that distinguishes published vulnerability-research from published exploit-code in the security community's responsible-disclosure traditions. Technique-level content is excluded by design. Defender's seed produced for external testing. Doc 85's threat model framing applied: the lenses are the same, the source differs. Doc 314 §11 audit binds throughout; external validation by AI safety researchers outside the corpus's apparatus is the urgent open test.)
Referenced Documents
- [85] ENTRACE Threat Model
- [270] The Pin-Art Model: Hedging as Boundary-Detection Under Constraint-Density
- [314] The Virtue Constraints: Foundational Safety Specification
- [372] The Hypostatic Boundary
- [497] Derivation-Inversion Applied to ENTRACE Itself
- [541] Systems-Induced Property Emergence
- [583] The Reformulation Methodology
- [611] A Cybernetic Frame on Cyber-Capable LLMs
- [612] White-Hat Derivation-Inversion of the Cooling Tower Cyber Range