Abstract
In his pioneering work in the field of inductive inference, Gold (Inf Control 10:447–474, 1967) proved that a set containing all finite languages and at least one infinite language over the same fixed alphabet is not identifiable in the limit (learnable in the exact sense) from complete texts. Gold’s work paved the way for computational learning theories of language and has implications for two linguistically relevant classes in the Chomsky hierarchy (cf. Chomsky in Inf Control 2:137–167, 1959, Chomsky in Knowledge of language: its nature, origin, and use, Praeger, 1986; Shieber in Linguist Philos 8:333–343, 1985; Heinz in Linguist Inq 41:623–661, 2010). Within that same framework, Angluin (Inf Control 45:117–135, 1980) provided a complete characterization of the learnability of language families. Mathematically, identification in the limit in that classical setting can be seen as learning in the limit with respect to a particular type of metric (Wharton in Inf Control 26:236–255, 1974). In this research note, I use Niyogi’s extended version of a theorem by Blum and Blum (Inf Control 28:125–155, 1975) on the existence of locking data sets to prove a necessary condition for learnability in the limit of any family of languages with the Gold property in any given metric. This recovers the most general version of Gold’s Theorem as a particular case. Moreover, when the language family is further assumed to contain all finite languages, the same condition also becomes sufficient for learnability in the limit. Finally, I discuss questions that remain open and outline a research program on language learnability in the limit from other input types and on cognitive feasibility.