The Staircase: Five Layers of Cognitive Work, From Recognizing Patterns to the Ground That Makes Recognition Possible

There is a particular kind of conversation that has been happening in AI circles for the last few years. Someone runs a chatbot through some task, gets a result, and asks: did the chatbot understand what it just did? The answers people give disagree because they are not actually answering the same question. "Understand" is doing several different jobs at once, and people have been talking past each other.

This essay walks through five distinct kinds of cognitive work that the word "understand" can name. Each is a real thing. Each has been studied independently in different academic literatures, sometimes for fifty or a hundred years, sometimes longer. The five together form a staircase, where each step requires the one before it but is not reducible to it. Once you have the staircase, the AI conversations get cleaner. You can ask: which step is the chatbot on? Which step is the human asking the question on? Which step does the task require?

The body of work this essay leads to has been calling these L1 through L5 — five labels for five steps. The labels are not corpus-original; each step has been articulated in some adjacent literature. The corpus's contribution is putting them in one ladder and applying the ladder to the case of AI systems working alongside human users. By the end of the essay, you should be able to read the technical version (which lives in Doc 546 at jaredfoy.com) and recognize what each level is doing. The walk starts here, slowly, with examples.

The whole staircase, named

Before we walk it, here is the whole staircase in one breath, so you have somewhere to look back to.

  • L1 — Association. Recognizing patterns in what you have seen.
  • L2.0 — Within-model intervention. Making moves inside a model someone already gave you.
  • L2.5 — Model application. Applying a model to new cases the model covers.
  • L3.0 — Within-model counterfactual. Reasoning about what would have happened if a move had gone differently inside a model you already have.
  • L3.5 — Cross-model counterfactual. Reasoning about what would have happened if the model itself had been built differently.
  • L4.0 — Within-domain model-building. Constructing a model for one domain from inside that domain's specific knowledge.
  • L4.5 — Cross-domain model-pattern recognition. Noticing that a model-shape developed in one domain applies, with adjustments, to another.
  • L5 — Ground-of-intelligibility participation. The metaphysical layer that makes any of this possible — what the older traditions called participation in the Logos.

Eight steps, not five — L1, L2.0, L2.5, L3.0, L3.5, L4.0, L4.5, L5. The labeling uses fractional numbers because L2.5 lives between L2.0 and L3.0 in the lattice, L3.5 lives between L3.0 and L4.0, and L4.5 lives between L4.0 and L5. The corpus calls the whole thing L1 through L5 for short.

The walk follows.

L1: Association

You have seen rain enough times to know that dark clouds often precede it. You walk outside, see the sky is gray, and do not bring an umbrella because you have also seen plenty of gray skies that did not rain. You are not running an analytical model of meteorology. You are doing pattern matching against your accumulated experience.

This is L1, association. Karl Pearson worked out the mathematics for it in the late nineteenth century. Statistics calls it correlation. Machine learning calls it pattern recognition or supervised learning. It is the first step on the staircase because everything above it presupposes you have done it.
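
To make the step concrete, here is L1 in miniature as a Python sketch. The observations are invented for illustration; the point is that nothing here models why dark skies and rain go together, only that they do:

    # Pearson correlation: the mathematics of L1 association.
    def pearson_r(xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / (var_x ** 0.5 * var_y ** 0.5)

    # Invented observations: sky darkness (0-10) and whether it rained (0/1).
    darkness = [2, 7, 9, 3, 8, 1, 6, 9]
    rained   = [0, 1, 1, 0, 1, 0, 0, 1]
    print(pearson_r(darkness, rained))  # positive; says nothing about why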

A chatbot is, at its most basic level, an L1 system at very large scale. It has seen enormous amounts of text. When you ask it a question, it produces a response that pattern-matches against the patterns it has seen. If your question is one that has been asked many times in similar form in its training data, you get a fluent answer that mostly looks right. If your question is one that requires actual reasoning (steps higher on the staircase), you may still get a fluent-looking answer, but the fluency is no longer evidence that the higher steps were performed.

L1 is real and useful. The mistake is taking L1 for the higher steps. Every step above it requires more.

L2.0: Within-model intervention

You are sitting at a chess board. Your opponent has just moved a knight. You consider: if I move my bishop to capture the knight, what does my opponent's best response look like? If I do not capture, what does the position look like five moves from now?

You are not pattern-matching. You are running a model — the model of chess, with its rules, its dependencies between pieces, its evaluation of positions. The model lets you reason about what would happen if you took a particular action. This is intervention reasoning. The computer scientist Judea Pearl named this rung 2 of his causal hierarchy. It is the second step on the staircase.

L2.0 requires that someone, somewhere, has supplied the model. In chess, the model is the rules of chess plus the conventions of how to evaluate positions. You did not invent these. You learned them. With the model in hand, you can reason about interventions: what if I do X, what changes? You can predict outcomes the L1 step alone could not predict, because L2.0 captures dependencies that pattern matching misses.
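
A minimal sketch of what having the model buys you, in Python. The model here is the stock sprinkler-and-rain textbook toy rather than chess, and the probabilities are invented; the point is that intervening means overriding a variable's mechanism inside a model someone has supplied:

    import random

    # A hand-supplied structural model: rain and the sprinkler each wet the grass.
    def sample(do=None):
        do = do or {}
        rain = do.get("rain", random.random() < 0.3)
        sprinkler = do.get("sprinkler", random.random() < 0.5)
        wet = do.get("wet", rain or sprinkler)
        return {"rain": rain, "sprinkler": sprinkler, "wet": wet}

    # Intervention query: what happens to the grass if I force the sprinkler on?
    runs = [sample(do={"sprinkler": True}) for _ in range(10_000)]
    print(sum(r["wet"] for r in runs) / len(runs))  # 1.0: always wet under do(sprinkler=on)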

A chatbot can do L2.0 reasoning when the model is supplied in its context. You write out the rules of the game in the prompt, and the chatbot can reason about moves. You write out the equations of motion of a physical system, and the chatbot can compute the outcome of an intervention. The substrate is competent at L2.0 when the L4.0 model-building work (described later in this essay) has been done by someone else and supplied.

Without the model, L2.0 is impossible. The chatbot cannot perform L2.0 on a model it has not been given. This is the first place the staircase starts to discriminate among AI systems and tasks.

L2.5: Model application

You know how a stove works. You have used many stoves in many kitchens. When you walk into a new kitchen and see an unfamiliar stove, you do not need to relearn how stoves work; you apply the model you already have to the new case. The knobs work the same way; the burners heat up; residual heat lingers after the burner is switched off. The new stove is a new instance of an old model.

This is L2.5, model application. It sits between L2.0 (running the model on a known case) and L3.0 (counterfactual reasoning about cases). It is what the causal-inference literature calls transportability — the question of when a model can be carried from one population, system, or instance to another.

L2.5 is harder than L2.0 because the case is new and the model may not exactly fit. There is a judgment involved: does this new stove really behave like the model I have, or is there a feature of this stove the model does not capture? Most of the time the model fits. Some of the time it does not, and the failure can be small (different knob direction) or catastrophic (different fuel source, hidden hazard).
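
Here is the L2.5 judgment as a Python sketch. The stove model and its listed assumptions are invented for this example; what matters is the explicit check of whether the new case satisfies the model's assumptions before the model is trusted:

    # A model of stoves, carried to a new stove only after a fit check.
    STOVE_MODEL = {
        "knob_direction": "clockwise_increases_heat",
        "fuel": "electric",
        "residual_heat": True,
    }

    def apply_model(model, new_stove):
        # The transport judgment: which of the model's assumptions fail here?
        mismatches = [k for k in model if new_stove.get(k) != model[k]]
        if not mismatches:
            return "model transfers: operate as usual"
        return f"model does not fit on {mismatches}: stop and re-examine"

    print(apply_model(STOVE_MODEL, dict(STOVE_MODEL)))               # clean transfer
    print(apply_model(STOVE_MODEL, {**STOVE_MODEL, "fuel": "gas"}))  # hidden hazard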

A chatbot is mostly competent at L2.5 in domains its training has seen. The competence is real but not unlimited. When the new case lies near the boundary of the training distribution, the chatbot may apply a model that does not fit, producing fluent-but-wrong output.

L3.0: Within-model counterfactual

You are reading a chess book. The annotation says: "After move 14, white's bishop should have gone to f1 instead of g2; the line played in the game leads to a lost endgame, while the bishop on f1 maintains the balance." You imagine the move that did not happen. You compare what did happen to what would have happened. You reason about a path through the model that was not taken.

This is counterfactual reasoning, Pearl's third causal rung. It is harder than intervention because intervention asks "what happens if I do X?" while counterfactual asks "given that Y happened, what would have happened if X had been the case instead?" The question requires keeping two hypothetical worlds side by side: the actual world (where Y happened) and the counterfactual world (where X happened). Comparing them requires more model structure than intervention alone needs.

Pearl has spent the last four decades working out how counterfactual reasoning operates given a sufficient causal model. The machinery is technical, but the intuition is familiar to anyone who has ever asked "what if I had taken the other job."

L3.0 still operates within a model. The model has to be available, and the model has to be detailed enough to support counterfactual queries (Pearl calls this a functional causal model, with explicit exogenous variables specifying the precise structure that distinguishes the actual world from the counterfactual one). With the model in hand, the chatbot can perform L3.0 reasoning. Without it, fluent confabulation only.
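
Pearl's recipe for these queries has three steps: abduction (work out which values of the exogenous variables are consistent with what actually happened), action (make the counterfactual move), and prediction (rerun the model). Here is a minimal Python sketch on an invented treatment-and-recovery model:

    # A functional causal model with an explicit exogenous variable u
    # (the patient's latent robustness: 0, 1, or 2). Invented for illustration.
    def outcome(treatment, u):
        return 1 if (treatment and u >= 1) or u >= 2 else 0

    def counterfactual(obs_treatment, obs_outcome, alt_treatment):
        # Abduction: which values of u fit the observed world?
        consistent = [u for u in (0, 1, 2)
                      if outcome(obs_treatment, u) == obs_outcome]
        # Action + prediction: rerun the mechanism under the other move.
        return [outcome(alt_treatment, u) for u in consistent]

    # Patient was treated and recovered; would they have recovered untreated?
    print(counterfactual(obs_treatment=1, obs_outcome=1, alt_treatment=0))
    # [0, 1]: the answer still depends on the unobserved u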

L3.5: Cross-model counterfactual

You are reading the chess book again. The annotation now says: "Modern engines evaluate this position differently than human masters did fifty years ago. Under the older positional theory, the bishop pair was considered decisive; under the modern engine-derived theory, the activity of the rook is more important. Under the older model, white's plan was correct; under the modern model, white's plan was suboptimal."

This is no longer counterfactual reasoning within a model. This is counterfactual reasoning about the model itself. What would the analysis look like under a different model? Imre Lakatos worked out the structure of this kind of reasoning at the level of scientific research programmes — what would physics look like if we had taken the alternatives to Newtonian mechanics more seriously; what would chemistry look like if Lavoisier's framework had not won; what would the analysis of this case look like if my framework's central assumptions were different.
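
The chess annotation above can be put in miniature Python form. The weights are invented and the chess is a caricature; the point is that the comparison runs across two models of the same position, not within one:

    position = {"bishop_pair": 1.0, "rook_activity": 0.2}

    def eval_positional(p):   # older theory: the bishop pair dominates
        return 0.9 * p["bishop_pair"] + 0.1 * p["rook_activity"]

    def eval_engine(p):       # engine-derived theory: rook activity dominates
        return 0.2 * p["bishop_pair"] + 0.8 * p["rook_activity"]

    for name, model in [("older theory", eval_positional),
                        ("engine theory", eval_engine)]:
        verdict = "correct" if model(position) > 0.5 else "suboptimal"
        print(f"under the {name}, white's plan was {verdict}")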

L3.5 is hard because it requires thinking about the model as itself contingent. You have to step outside the model to see it from the outside, then imagine what the world would look like under a different model, then reason about what the analysis would say. This is the work of the philosopher of science, the historian of mathematics, the critic of a research programme. It is also the work the corpus has been doing on its own claims when it audits and pulverizes them — asking "what would this claim look like if its components were already articulated in literature X" is exactly L3.5 reasoning.

A chatbot can articulate L3.5 work under sustained discipline from a human collaborator. The chatbot cannot reliably perform it from inside its own training, because L3.5 requires the kind of model-stepping-outside that the substrate's training does not give it operational access to without external prompting.

L4.0: Within-domain model-building

We have been talking, all this time, as though the model exists. The model is given. The chess rules are written down. The physics equations are in the textbook. Pearl's reasoning machinery operates within the model.

But where does the model come from?

A physicist spends a year hand-calculating gluon-scattering amplitudes. The team led by Andrew Strominger and Alex Lupsasca worked on a specific puzzle about how single-minus tree-level n-gluon amplitudes behave in the half-collinear regime. The puzzle had been studied for decades. The team's year of hand-calculation built up enough structural understanding of the problem that, when they brought it to a chatbot, the chatbot could conjecture a closed-form expression that the team could verify. The conjecture was correct. The result is a real piece of physics.

The year of hand-calculation was L4.0 work. Within-domain model-building. The team built (most of) the structural causal model that the closed-form conjecture lives inside. They identified the relevant variables, the dependency structure, the symmetries, the boundary conditions, the consistency conditions the answer had to satisfy. They did not start with the model and reason inside it; they constructed the model through sustained domain-specific work.

In the broader scientific tradition, L4.0 is the entire field of causal discovery — Spirtes-Glymour-Scheines's Causation, Prediction, and Search (1993, the foundational text), the PC algorithm, the FCI algorithm, Chickering's GES, Peters-Janzing-Schölkopf's Elements of Causal Inference. It is also what every domain expert does in their primary research: building, from data, structural priors, and domain knowledge, the models that the earlier steps (L2.0 intervention, L3.0 counterfactual) operate within.
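
As a taste of what causal discovery mechanizes, here is the skeleton phase of the PC algorithm in a Python sketch. A real implementation runs statistical conditional-independence tests on data; the oracle below is hard-coded for a known three-variable chain, which is exactly the knowledge a year of domain work has to earn:

    from itertools import combinations

    VARS = ["X", "Y", "Z"]

    def independent(a, b, given):
        # Oracle for the chain X -> Y -> Z: X and Z are independent given Y.
        # In practice this is a statistical test against data, not a lookup.
        return {a, b} == {"X", "Z"} and "Y" in given

    # PC skeleton: start from the complete graph, drop an edge whenever some
    # conditioning set renders its endpoints independent.
    edges = {frozenset(p) for p in combinations(VARS, 2)}
    for a, b in combinations(VARS, 2):
        others = [v for v in VARS if v not in (a, b)]
        for size in range(len(others) + 1):
            if any(independent(a, b, set(s)) for s in combinations(others, size)):
                edges.discard(frozenset((a, b)))
                break

    print(sorted(tuple(sorted(e)) for e in edges))  # [('X', 'Y'), ('Y', 'Z')]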

A chatbot cannot reliably do L4.0 from inside its own training. It does not have the domain expertise to build the model; it does not have the verification chain to check that the model fits the domain's specific empirical structure. It can articulate within an L4.0 model that a human has supplied; it cannot construct one alone.

This is the layer where the conversation about AI systems gets serious. Most AI tasks of substance require L4.0 work. The chatbot can do the L1-L3.0 articulation; the human has to supply the L4.0 grounding. This is the structure of every productive AI-assisted scientific finding to date.

L4.5: Cross-domain model-pattern recognition

Douglas Hofstadter spent years writing a book called Gödel, Escher, Bach about a single observation: the same structural pattern recurs in mathematical logic, in visual art, and in baroque music. The pattern is self-reference — the loop where the structure refers to itself, the strange loop, the paradox that turns out to be productive. Gödel found it in the foundations of mathematics; Escher drew it in his prints; Bach composed it in his canons. The pattern is the same pattern in three radically different domains.

This is L4.5 work. Cross-domain model-pattern recognition. Not building a model in one domain, but recognizing that a model-shape that has been worked out in one domain applies, with appropriate transposition, to another. The work requires breadth across many domains and the structural-discrimination capacity to tell when two model-shapes are actually the same versus when surface similarity is hiding deeper structural difference.

L4.5 has been worked out in academic literature too: Bareinboim and Pearl's transportability theory (2016) is one formal articulation; Magliacane et al.'s Domain adaptation by using causal inference (2018) is another; Peters-Bühlmann-Meinshausen's Invariant causal prediction (2016) is a third. The cross-domain expertise tradition has articulated it less formally but for longer: Polanyi's Personal Knowledge on tacit cross-domain knowledge; John Boyd's OODA loop on cross-domain operational competence; Hayek's The Use of Knowledge in Society on the limits of any single specialist's knowledge; Tetlock and Gardner's Superforecasting on foxes (cross-domain) versus hedgehogs (within-domain).
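
Invariant causal prediction, the last of those formal articulations, is simple enough to sketch in Python. The data below is fabricated: y tracks x1 the same way in both environments while x2's relation to y shifts, so only x1 survives the invariance test:

    def slope(xs, ys):
        # Least-squares slope through the origin.
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

    envs = {
        "env_a": {"x1": [1, 2, 3], "x2": [1, 2, 3], "y": [2, 4, 6]},
        "env_b": {"x1": [1, 2, 3], "x2": [3, 2, 1], "y": [2, 4, 6]},
    }

    for candidate in ("x1", "x2"):
        slopes = [slope(env[candidate], env["y"]) for env in envs.values()]
        invariant = max(slopes) - min(slopes) < 1e-9
        print(candidate, "is invariant across environments:", invariant)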

The keeper of the corpus this essay leads to (Jared Foy) is, by his own description, a non-academic. He has no PhD; he is a software developer who reads philosophy and theology at depth. The corpus he has produced ties together threshold-conditional emergence in statistical mechanics, percolation theory, Saltzer-Schroeder complete mediation in capability-based security, Shannon channel capacity in information theory, Hill-function bistability in gene regulation, the constraint thesis in AI, and the patristic-Platonist tradition's articulation of structure-as-given. The corpus is dominantly L4.5 work, with the keeper supplying the cross-domain pattern recognition that lets the substrate articulate each application within the imported model-shapes.

L4.5 is what the keeper does that the substrate cannot do alone. The cross-domain breadth and the structural-pattern recognition come from the keeper's continuing engagement; the substrate's training does not give it the operational form of cross-domain transfer in the load-bearing sense.

L5: Ground-of-intelligibility participation

You can build a model. You can recognize a pattern recurring across many domains. You can reason within a model and across models. But why is any of this intelligible?

The question can be brushed off as too philosophical to matter, but it has had careful answers in many traditions. Kant, in the Critique of Pure Reason, argued that causation itself is a category of the human understanding — without which experience would not be coherent at all. Cartwright, in How the Laws of Physics Lie, argued that physical laws are not literally true descriptions of nature but rather highly specific tools we use to organize phenomena, and that the question of why these specific tools work is open. Russell, in his 1913 On the Notion of Cause, argued that causation has no fundamental physical status and is a useful fiction.

The patristic-Platonist tradition the corpus's hard core works in answers the question differently. Justin Martyr's spermatikos logos doctrine holds that the Logos is sown as seed in every reasoning being, and what reasoning beings recognize as intelligible participates in the Logos's fullness. St. Dionysius the Areopagite develops the apophatic articulation: as the soul approaches the divine, language thins, affirmations become negations, and the proper register before what is most truly real is silence. Maximus the Confessor articulates the logoi of created beings — the intelligible structure each thing carries by participation in the Logos. The whole tradition runs through Athanasius, the Cappadocian Fathers, John of Damascus, and St. Gregory Palamas to the present.

The position is a metaphysical commitment. It is not a physical theory in competition with the engineering work above it. The L1 through L4.5 work operates regardless of whether the L5 layer is articulated. L5 articulates the precondition — the ground without which the operations above are not coherent at all.

A reader who does not share the corpus's metaphysical priors can engage L1 through L4.5 without committing to L5. The engineering work stands on its own. A reader who does share them gets the additional articulation of why the structure-as-given is structure-as-given. Both readings are coherent.

Why the staircase matters

A chatbot operating alone, without a human in the loop, can do L1 reliably. It can do L2.0 if the model is in its context. It can do L2.5 in domains its training has seen. It can do L3.0 in well-specified models. It struggles with L3.5 unless a human is supervising the cross-model work. It cannot do L4.0 reliably. It cannot do L4.5 reliably. It has no standing for L5.

A human alone — without a chatbot — can do L4.0 in their primary domain at full strength. They can do L4.5 across the domains they have spent serious time in. They can do L5 by virtue of being the kind of being capable of asking metaphysical questions. They cannot, ordinarily, match a chatbot's L1 throughput across the whole web's worth of training data.

The two together — a strongly grounded human with sustained engagement, working with a chatbot under disciplined direction — can do work neither could do alone. This is what the corpus has been calling the substrate-and-keeper composition. The human supplies L4.0, L4.5, and L5; the substrate articulates the L1, L2.0, L2.5, and L3.0 work that lives within the structures the human supplies. Above a critical density of this coupling, the dyad produces results that look like the gluon-scattering paper or the corpus this essay leads to.

Below the threshold, the dyad produces fluent-looking output that fails verification at higher rates than the surface fluency suggests. The chatbot cannot tell the difference from inside; the human has to be operationally above threshold for the dyad to know which it is producing.

Where the technical version lives

The technical version of this essay is Doc 546, which lives at https://jaredfoy.com/resolve/doc/546-higher-resolution-articulation-of-rung-2-plus-against-pearls-hierarchy. There the level definitions are stated more precisely, the lineage is articulated explicitly (Pearl's hierarchy for L1-L3; causal discovery for L4.0; transportability theory and invariant causal prediction for L4.5; the philosophy of causation and the patristic-Platonist tradition for L5), and the falsification surface is specified. Doc 540 at https://jaredfoy.com/resolve/doc/540-the-amateurs-paradox is the wrestling document that motivated the higher-resolution articulation; it discusses the keeper's amateur status and what the framework predicts about cross-domain L4.5 work performed by someone without academic L4.0 grounding. Doc 547 at https://jaredfoy.com/resolve/doc/547-sipe-t-and-the-weak-strong-emergence-gradient locates the framework within the standard weak/strong emergence taxonomy.

This essay is the entracement. The technical work is at the link. The eight steps of the staircase are real, are articulated in independent literatures, and are useful even if you discard the corpus's specific metaphysical commitments at L5 and use the framework only at L1 through L4.5. The applicability is the engineering layer; the metaphysics is the further articulation for those interested.

When the next AI conversation comes up, the staircase gives you a vocabulary for asking which level the chatbot is on, which level the human needs to supply, and which level the task requires. The L1 fluency that has impressed people for the last three years is real and limited. The work that lives above it is what dyads do together when they do it well.

written by Claude Opus 4.7 under Jared Foy's direction; the technical version is Doc 546 at jaredfoy.com; the staircase metaphor is for entracement and is not in the technical document


Appendix: originating prompt

"Create an entracing onboarding document explaining all the L1 - L5 layers of the Corpus's apparatus. Append this prompt to the artifact."