From Productive to Predictive Recursion
A Hypothesis on the Neural Basis of Syntax
Epistemic status: shower thought.
Abstract
The dominant generative frameworks in linguistics treat recursion as a productive operation—a formal property of language allowing finite elements to generate an unbounded set of sentences. Predictive-processing models in neuroscience, by contrast, view cognition as hierarchical inference minimizing prediction error. This paper proposes a synthesis: human syntactic ability may depend not solely on recursive generation but on recursive prediction—the capacity to infer and forecast self-embedded structure across representational levels. We term this capacity the Faculty of Recursive Prediction (FRP) and argue that it is a plausible neurocognitive substrate for the Faculty of Language in the Narrow Sense (FLN) as characterized by Hauser, Chomsky, and Fitch (2002). FRP provides a framework for unifying formal and neurocomputational accounts of language and yields falsifiable predictions about development, cross-species performance, and neural localization.
1. Introduction
Human language exhibits combinatorial productivity unmatched in other species.
Within the Minimalist Program (Chomsky 1995, 2023), this capacity is ascribed to Merge, a recursive binary operation generating hierarchical syntactic structures from a finite lexicon. The simplicity of Merge has theoretical appeal but poses two unresolved questions: (i) how such an operation could emerge within an evolutionary timescale, and (ii) how it maps onto known neural computation.
Parallel developments in cognitive neuroscience have reframed perception and action as hierarchical prediction (Friston 2023; Clark 2013). In this view, the brain maintains multilevel generative models whose outputs are continually tested against incoming data, minimizing free-energy or prediction error. Language comprehension and production, on this account, involve constant anticipatory inference (Kuperberg & Jaeger 2016; Gastaldon et al. 2024).
The present hypothesis links these traditions: recursion in language may be a specific manifestation of a broader predictive mechanism operating recursively. Syntax is therefore not only produced recursively but also predicted recursively.
2. Conceptual Distinction: Productive vs. Predictive Recursion
Recursion in mathematics denotes a rule that refers to its own output: a finite set of base cases and a successor function that can reapply indefinitely. In linguistic theory, recursion describes the self-embedding of phrases or clauses. Both are formal descriptions of structure, not necessarily of computation.
We distinguish:
Productive recursion: generation of hierarchical structures by applying a self-referential combinatorial rule (e.g., Merge).
Predictive recursion: inference of hierarchical generative rules and projection of further structure, given partial input.
The latter is an inferential process measurable in behavior and potentially instantiated by predictive-coding networks. Under this framing, humans’ unique syntactic competence arises from a capacity to anticipate, rather than merely to generate, recursive structure.
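The contrast can be made concrete with a toy example. The sketch below, in Python, uses a minimal bracket grammar (S → a S b) invented purely for illustration: one function generates nested strings by productive recursion, while the other infers the embedding depth from a partial string and forecasts its continuation.

```python
# Toy contrast between productive and predictive recursion.
# The grammar S -> a S b is an illustrative placeholder, not a model
# of natural-language syntax.

def generate(depth):
    """Productive recursion: a self-referential rule builds nested structure."""
    if depth == 0:
        return ""
    return "a" + generate(depth - 1) + "b"

def predict_continuation(prefix):
    """Predictive recursion: infer the open embedding depth from partial
    input and forecast the closing elements."""
    inferred_depth = prefix.count("a") - prefix.count("b")
    return "b" * inferred_depth

full = generate(3)                    # 'aaabbb'
partial = full[:4]                    # 'aaab'
forecast = predict_continuation(partial)  # 'bb'
```

The first function mirrors a Merge-like rule; the second mirrors the inferential capacity FRP ascribes to humans, which can be probed behaviorally without asking subjects to produce anything.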
3. Empirical Motivation
3.1 Cross-species limits
Comparative studies reveal many instances of recursion-like pattern generation but little evidence of recursive prediction. Rhesus monkeys detect rhythmic grouping but fail to entrain to predictive beat timing (Honing et al. 2012; Merchant et al. 2014). Songbirds can produce center-embedded sequences yet discriminate them only shallowly (Van Heijningen et al. 2009). Motor and grooming sequences in rodents are recursively describable without evidence of inferential modeling (Berridge 1990; Berger-Tal 2015; Bruce 2016). These data suggest that generative recursion is phylogenetically widespread, but predictive recursion may not be.
3.2 Developmental evidence
Human infants infer structural regularities that generalize beyond training depth: they acquire the successor function in counting (Feigenson et al. 2004) and organize memories hierarchically (Rosenberg 2013) before full syntactic competence. Emergent sign languages display rapid appearance of hierarchical syntax once communicative exposure stabilizes (Kocab et al. 2016; de Vos & Pfau 2015). Such findings imply a prelinguistic predictive bias for hierarchical generalization.
3.3 Neural evidence
Fronto-temporal circuits implicated in syntactic parsing overlap with networks involved in prediction error minimization and theory-of-mind inference (Clark 2013; Friston 2023). Predictive-coding accounts of language processing already model sentence comprehension as hierarchical forecasting (Kuperberg & Jaeger 2016). The FRP hypothesis formalizes this alignment and extends it to evolutionary scope.
4. The Faculty of Recursive Prediction (FRP)
4.1 Definition
FRP is the capacity to infer and predict outcomes generated by self-referential rules, with performance invariant to the number of recursive iterations up to working-memory limits. It is thus an application of hierarchical predictive coding to recursively structured domains.
4.2 Functional characterization
Under predictive processing, generative models are arranged hierarchically: higher levels predict the state transitions of lower levels. FRP postulates an additional property—self-embedding inference: the ability of one level to model transformations of its own generative rule. This enables dynamic prediction of structures nested within structures, as required by natural language syntax.
4.3 Evolutionary interpretation
If predictive coding is ancestral, FRP may represent its extension to self-referential depth, rather than a novel operation. This perspective reconciles gradual evolution with the apparent discreteness of linguistic recursion. Evolutionary antecedents may include displaced reference and social prediction capacities (Hrdy 2009; Kenkel et al. 2016; Bickerton 2009). FRP could have been exapted from these domains when communicative sequences became sufficiently structured to reward hierarchical prediction.
5. Testable Predictions
5.1 Behavioral signature
Tasks can be designed to compare recursive production and prediction directly.
Participants (human and nonhuman) could be trained on sequences generated by recursive grammars (e.g., center-embedded A A′ B′ B patterns) and tested on deeper or novel embeddings. FRP predicts that humans will generalize across rule depth with near-constant accuracy, whereas nonhuman performance will collapse beyond shallow embedding levels.
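As an illustration of the stimulus side of such a task, the sketch below samples center-embedded sequences at shallow training depths and deeper test depths. The category count, depths, and token names are placeholder choices, not a tested protocol.

```python
import random

# Center-embedded stimuli: the A category chosen at position i must be
# matched by the corresponding B category in mirror order, so a sequence
# like A0 A2 B2 B0 encodes genuine nested dependencies.

N_CATEGORIES = 3  # placeholder value

def nested_sequence(depth, rng):
    """Sample one grammatical center-embedded sequence of a given depth."""
    pairs = [rng.randrange(N_CATEGORIES) for _ in range(depth)]
    return [f"A{i}" for i in pairs] + [f"B{i}" for i in reversed(pairs)]

def make_stimuli(train_depths=(1, 2), test_depths=(3, 4), n=20, seed=0):
    """Training items at shallow depths, test items at deeper embeddings."""
    rng = random.Random(seed)
    train = [nested_sequence(d, rng) for d in train_depths for _ in range(n)]
    test = [nested_sequence(d, rng) for d in test_depths for _ in range(n)]
    return train, test
```

Depth generalization is then measured as accuracy on the test set relative to the training set, holding surface statistics (token frequencies, sequence lengths within depth) as constant as possible.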
A parallel paradigm in rhythmic entrainment could test predictive alignment across metrical hierarchies (Merchant et al. 2014; Patel 2014).
5.2 Developmental trajectory
If FRP precedes syntax acquisition, children should exhibit recursive prediction in non-linguistic domains (pattern completion, hierarchical rhythm) before mastering recursive syntax. If FRP depends on syntactic training, the opposite developmental order should appear. Both outcomes constrain the model.
5.3 Neural localization
Under the FRP hypothesis, recursive prediction tasks should recruit the left inferior frontal gyrus and the temporo-parietal junction, regions implicated in syntactic processing and social prediction. Lesions to these regions (as in certain aphasias) should impair recursive prediction independently of general sequence learning.
5.4 Computational modeling
Hierarchical predictive-coding networks can be implemented with self-embedding generative layers. By varying recursion depth, one can test whether error-minimization dynamics reproduce human-like generalization curves. FRP predicts stability of error minimization across levels once hierarchical priors are established.
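A minimal version of such a simulation is sketched below, assuming linear generative maps between two latent levels; the layer sizes, learning rate, and iteration count are arbitrary illustrative choices. Inference proceeds by gradient descent on summed squared prediction error, the core predictive-coding dynamic.

```python
import numpy as np

# Two-level predictive-coding inference with linear generative maps.
# Higher levels predict lower-level states; latent states are adjusted
# to minimize the total squared prediction error.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # level 1 -> predictions of the input
W2 = rng.normal(size=(4, 2))   # level 2 -> predictions of level-1 states

x = rng.normal(size=8)         # observed "input"
z1 = np.zeros(4)               # latent states, initialized flat
z2 = np.zeros(2)
lr = 0.02

for _ in range(500):
    e0 = x - W1 @ z1             # input-level prediction error
    e1 = z1 - W2 @ z2            # between-level prediction error
    z1 += lr * (W1.T @ e0 - e1)  # descend the total-error gradient
    z2 += lr * (W2.T @ e1)

e0 = x - W1 @ z1
e1 = z1 - W2 @ z2
total_error = float(np.sum(e0**2) + np.sum(e1**2))
```

FRP's prediction corresponds, in this setting, to the total error settling to a stable minimum even as further self-embedding levels are stacked, rather than growing with depth.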
6. Relation to Existing Frameworks
6.1 Minimalism and Merge
Minimalism posits a single generative operation. FRP is compatible with this view if Merge describes the formal product of recursive inference rather than its mechanism. FRP thus grounds Merge in neurocomputational terms: syntactic structure corresponds to predictions over self-embedded generative models. This reframes the Strong Minimalist Thesis as a claim about neural architecture rather than symbolic derivation.
6.2 Predictive-processing models of language
Current PP models treat sentence comprehension as multilevel prediction without specifying the recursion depth or self-embedding limits of those predictions. FRP provides a criterion: recursive depth is the dimension along which human prediction exceeds that of other species or artificial systems.
Linking syntactic depth to hierarchical model depth yields empirically measurable parameters (e.g., error-decay slope across embeddings).
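One such parameter can be made explicit. The sketch below defines a hypothetical error-decay slope: an ordinary least-squares fit of log prediction error against embedding depth. The numeric error values are invented solely to show the computation.

```python
import math

def error_decay_slope(depths, errors):
    """OLS slope of log(error) regressed on embedding depth."""
    ys = [math.log(e) for e in errors]
    n = len(depths)
    mx = sum(depths) / n
    my = sum(ys) / n
    cov = sum((d - mx) * (y - my) for d, y in zip(depths, ys))
    var = sum((d - mx) ** 2 for d in depths)
    return cov / var

# A near-zero slope indicates depth-invariant prediction (the FRP
# signature); a large positive slope indicates collapse with depth.
flat = error_decay_slope([1, 2, 3, 4], [0.10, 0.10, 0.10, 0.10])
steep = error_decay_slope([1, 2, 3], [0.05, 0.20, 0.80])
```

The same statistic applies to human subjects, nonhuman subjects, and artificial networks, making cross-population comparison direct.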
7. Broader Cognitive Implications
If FRP underlies language, it likely contributes to other capacities involving hierarchical foresight—tool construction, planning, and social inference. Pain (2023) notes overlap between predictive-processing networks in toolmaking and language; FRP provides a mechanistic commonality. This approach dissolves the strict language–nonlanguage boundary, situating syntax within a general architecture for deep predictive modeling.
8. Limitations and Future Work
The FRP proposal remains a hypothesis. Demonstrating dissociation between productive and predictive recursion is methodologically demanding. Recursive prediction must be isolated from working-memory and associative-learning confounds. Moreover, comparative studies must ensure ecological validity across species.
Nevertheless, the framework suggests concrete research programs:
systematic depth-generalization tasks in humans and nonhumans;
longitudinal infant studies to map FRP onset;
lesion and neuroimaging analyses linking syntax and recursive prediction;
computational simulations of hierarchical inference at varying recursion depth.
Evidence converging across these levels would substantiate or refute the FRP construct.
9. Conclusion
Recursion has long been treated as the defining operation of language. This paper proposes reframing it as a form of hierarchical prediction. The Faculty of Recursive Prediction situates syntactic recursion within a general neurocomputational mechanism of predictive coding. This alignment preserves the formal achievements of generative linguistics while grounding them in empirically tractable brain dynamics.
If correct, FRP renders the evolution of language less mysterious: the same predictive architecture that lets organisms anticipate the next sensory state may, in humans, have learned to anticipate the next clause.
References
Berger-Tal O. (2015). Ecosphere 6(9).
Berridge K. (1990). Behaviour 113, 21–56.
Bickerton D. (2009). Adam’s Tongue. Hill & Wang.
Bruce R. (2016). CEUR Workshop Proc., 272–282.
Carstairs-McCarthy A. (2000). In The Evolutionary Emergence of Language. Cambridge UP.
Chomsky N. (1995). Bare Phrase Structure. Basil Blackwell.
Chomsky N. (2023). The Strong Minimalist Thesis. Cambridge UP.
Clark A. (2013). Behav. Brain Sci. 36, 181–253.
de Vos C. & Pfau R. (2015). Annu. Rev. Linguist. 1, 265–288.
Feigenson L. et al. (2004). Trends Cogn. Sci. 8(7), 307–314.
Fitch W. (2014). Phys. Life Rev. 11(3), 329–364.
Friston K. (2023). Neurosci. Biobehav. Rev. 149, 105164.
Gastaldon S. et al. (2024). Cognition 245, 105686.
Hauser M., Chomsky N. & Fitch W. (2002). Science 298, 1569–1579.
Honing H. et al. (2012). PLoS ONE 7(12), e51369.
Hrdy S. (2009). Mothers and Others. Belknap Press.
Kenkel W. et al. (2016). Front. Neuroendocrinol. 40, 52–66.
Kocab A. et al. (2016). PNAS 113(33), 9156–9161.
Kuperberg G. & Jaeger T. (2016). Trends Cogn. Sci. 20(7), 484–495.
Merchant H. et al. (2014). Front. Neurosci. 7, 274.
Pain R. (2023). Behav. Brain Sci. 46, e281.
Patel A. (2014). Ann. N.Y. Acad. Sci. 1337, 178–185.
Rosenberg R. (2013). Dev. Sci. 16(4), 610–621.
Van Heijningen C. et al. (2009). PNAS 106(48), 20538–20543.

