Urheimat (/ˈʊərhmɑːt/; German pronunciation: [ˈʔuːɐ̯ˌhaɪmaːt]; a German compound of Ur- "primitive, original" and Heimat "home, homeland") is a linguistic term that denotes the homeland of the speakers of a proto-language. A proto-language is a reconstruction of a hypothetical parent language in the tree model of language evolution. As the placement of branches is often uncertain, the time, location, and very existence of an urheimat are also often uncertain. However, it is possible to have considerable confidence regarding the location of an urheimat of a language or language family from multiple lines of linguistic, genetic and archaeological evidence, even when the precise contours of a proto-language are not firmly established.

Archaeological evidence is sometimes adduced to support the existence of an urheimat. In the 19th century and the first half of the 20th century, the prevailing belief was that languages could be reliably associated with archaeological cultures. This culture history theory, developed by Gustaf Kossinna, formalized the presumption that unified ethnicities, such as peoples or tribes, could be associated with archaeological cultures. One might point to a culture map and hazard a guess as to which language, typically a proto-language, was spoken in each culture.

In the latter part of the twentieth century, the link between archaeological cultures and language boundaries was weakened by the discovery of cases in which language shifts occurred with only minor differences in cultural artifacts. This article summarizes some of the leading, and sometimes competing, urheimat proposals for some of the larger or more carefully studied language families.


Language families predominantly found in Europe, North Asia and South Asia

Indo-European homelands

Proto-Indo-European homeland

Early efforts to identify the homeland of the Proto-Indo-European language speakers focused on the presence or absence of geographical indicator words. For example, such words as beech and salmon indicated a location within the range of those genera in the north temperate zone. The word for "ocean" was missing, suggesting an inland location. Words that did not fit this geographical location, such as lion, could be explained by more recent borrowings.

Many hypotheses for an Urheimat have been proposed. Mallory said,[1] "One does not ask 'where is the Indo-European homeland?' but rather 'where do they put it now?'" He also states that current discussion of the Indo-European homeland problem is largely confined to four basic models, with variations:[2]

  1. The Baltic-Pontic(-Caspian) region in the Mesolithic. The Funnel beaker culture, the Globular Amphora culture, and the Corded Ware culture are possible archaeological representatives of the proto-language speakers, in this theory as it is commonly expressed.
  2. Anatolia: Early Neolithic, 7000-6000 BC. Not only is there no supporting archaeology, but archaeology and word archaeology are to the contrary.
  3. Central Europe-Balkans: Early Neolithic, c. 5000 BC. At least part of the Linear Pottery Culture is within the range.
  4. Pontic-Caspian: Eneolithic, c. 4500-3000 BC. Typically the collection of similar cultures called the Kurgan culture is presented as supporting the reconstructed Indo-European customs.

Other, less accepted models select the Indian subcontinent:

  1. Indian Urheimat Theory
  2. Indigenous Aryans Theory

Some minor hypotheses are:

  1. The Armenian hypothesis, suggested by Soviet scholars in the 1980s.
  2. The Paleolithic Continuity Theory, suggested by Italian "paleolinguist" Mario Alinei in the 1990s.

Earlier Indo-European phylogenies featured an initial split into Centum and Satem languages, a distinction formally based on the word for the number one hundred in each group's supposed proto-language. Today, a single phonetic feature is considered hardly enough to define a proto-language. Furthermore, languages studied better or discovered subsequently (including Armenian, the extinct Anatolian languages such as Hittite and the extinct Tocharian language of the Tarim basin of Asia) were not compatible with any such genetic distinction. Instead, the former shared innovation became the Centum-Satem isogloss, which did not have to conform to language boundaries or represent any major change of language. It produced dialects instead.

Proto-Anatolian homeland

Proto-Anatolian was the parent language of the Anatolian languages, which are attested only by inscriptions found in Anatolia and a few exports. It is the only group to feature an explicit remnant of the laryngeals, sounds that disappeared in late Proto-Indo-European. It is therefore identified as the first branch, chronologically, which means that the ancestral Proto-Anatolians were first to become isolated from the Indo-European speech community.

Of the two ways separation could have occurred, the model of an entry into Anatolia from the north prevails. Indo-European culture featured horses, which were at first hunted and then domesticated on the plains of Asia, not in Anatolia. The alternative, that all the other Indo-Europeans left Anatolia and left a population behind, does not account for the presence of a Hattic interface in Anatolian but in none of the other branches.

That same Hattic interface suggests that Anatolia was not entirely the place where Proto-Anatolian formed, but rather the latter encountered the substrate on entering Anatolia and adjusted itself accordingly. The concept of Indo-Hittite fits a Proto-Anatolian outside of Anatolia, but it was used primarily to refer to an early stage of Proto-Indo-European, before the first separation. Anthony therefore narrows the meaning of Proto-Anatolian to "the language that was immediately ancestral to the three known daughter languages that entered Anatolia as Pre-Anatolian."[3] He defines the language phases between Proto-Indo-European and Proto-Anatolian as Pre-Anatolian.

Proto-Tocharian homeland

Proto-Italo-Celtic homeland

A likely candidate for the homeland of an Italo-Celtic proto-language or dialect continuum is the Urnfield culture and its predecessor, the Tumulus culture of Central Europe (1600 BC).

Proto-Italic homeland

Candidates for the first introduction of Proto-Italic speakers to Italy are the Terramare culture (1500 BC) or the Villanovan culture (1100 BC), although the latter is now usually identified with the non-Italic (indeed, non-Indo-European) Etruscan civilisation.

The Romance languages all derive from Latin, a member of this Indo-European language subfamily. Latin had its roots in the Italic dialect spoken in and around the capital, Rome, and was the common language of the Western Roman Empire until the empire collapsed in the 5th century CE.

Proto-Celtic homeland

The Proto-Celtic homeland is usually located in the Early Iron Age Hallstatt culture of northern Austria. There is a broad consensus that the center of the La Tène culture lay on the northwest edges of the Hallstatt culture. Pre-La Tène (6th to 5th century BC) Celtic expansions reached Great Britain and Ireland (Insular Celtic) and Gaul. La Tène groups expanded in the 4th century BC to Iberia, the Po Valley, the Balkans, and even as far as Galatia in Asia Minor, in the course of several major migrations.

Albanian homeland

The history of the Daco-Thracian/Thraco-Illyrian dialects of the Balkans is obscure, in part because the written record of these languages is fragmentary. One of these languages may have evolved into the modern Albanian language.

Proto-Germanic homeland

Pre-Germanic cultures were the bearers of the Nordic Bronze Age. Proto-Germanic proper is hypothesized by some to have developed in the Jastorf culture of the Pre-Roman Iron Age.[4]

Proto-Greek homeland

The Phrygian, Macedonian, and Greek proto-languages likely also originate in the Balkans.

Armenian homeland

Proto-Armenian may also derive from the Balkans (Greco-Phrygian), or at least have been strongly influenced by a Phrygian substrate. The Phrygian influence on [pre-]Proto-Armenian would date to about the 7th century BC, in the context of the declining kingdom of Urartu.

Proto-Balto-Slavic homeland

The Balto-Slavic homeland largely corresponds to the historical distribution of Baltic and Slavic.

Proto-Baltic homeland

Proto-Baltic likely emerged in the eastern parts of the Corded Ware horizon.

Proto-Slavic homeland

The Slavic languages experienced a major expansion starting around the 6th century CE, in some cases supplanting earlier Indo-European languages in the regions into which they expanded.

The Slavic homeland likely corresponds to the distribution of the oldest recognisably Slavic hydronyms, found in northern and western Ukraine and southern Belarus.

Proto-Indo-Iranian homeland

The Proto-Indo-Iranians are widely identified with the bearers of the Andronovo horizon of the late 3rd and early 2nd millennia BC, with the various languages of the Indo-Iranian language family starting to differentiate from Proto-Indo-Iranian around 2000 BCE.

There are three language families within the Indo-Iranian language family that derived from the Proto-Indo-Iranian language: the Indo-Aryan languages, such as Hindi, Urdu, Bengali, and other Indo-European languages of South Asia; the Iranian languages, e.g. Persian, Kurdish and Pashto of West Asia and Central Asia; and the Nuristani languages spoken in eastern Afghanistan.

The Indo-Aryan languages are all descendants of the Sanskrit language, which is at least as old as 1500 BCE, when Indo-Aryan linguistic features were historically attested by the Hittites in the Mittani language of Western Iran, and was a single Old Aryan language as recently as the 4th century BCE, when it was standardized in written form. Some scholars associate the Cemetery H culture of the Northern Indus River Valley (specifically Western Punjab) ca. 1900 BCE with the original Indo-Aryan population of South Asia. The community that originally spoke the Sanskrit language is also called the Vedic civilization, after the semi-legendary account of the community found in the Hindu scriptures called the Vedas, composed during the Vedic period from ca. 1700 BCE to ca. 320 BCE. The archaeological cultures in South Asia described as Black and Red Ware (10th century BCE) and the later Painted Gray Ware (starting ca. 900 BCE) and subsequently the Northern Black Polished Ware (ca. 500 BCE) are all commonly associated with the Sanskrit-speaking Indo-Aryans during the Vedic period.

The Iranian languages split into Eastern and Western branches in what are known as the Middle Iranian languages around the 4th century BCE. The Iranian Avestan language of Zoroastrian scripture is committed to writing at about this point but was in existence and historically attested long before a script was devised for it. The Median language was the language of the Median empire of western and central Iran (ca. 700–559 BC). The language of the Scythian people of Central Asia, whose interactions with the Greeks in 512 BCE were attested by Herodotus ca. 440 BCE, was also an Iranian language.

There is some dispute over whether the Dardic languages (spoken in northern Pakistan, eastern Afghanistan, and the Indian region of Jammu and Kashmir, most prominently the Kashmiri language) are Indo-Aryan, Iranian or part of the Nuristani languages. This issue of classification is clouded by the nationalistic implications of such a classification for the political affiliations of the contested Kashmir region of South Asia and by the fact that the Dardic languages are spoken in an area that borders the regions where each of the other Indo-Iranian language families is spoken.

Dravidian homeland

The Dravidian languages have been found mainly in South India since at least the second century BCE (inscriptions, ed. I. Mahadevan 2003). It is, however, a widely held hypothesis that Dravidian speakers may have been more widespread throughout India, including the northwest region,[5] before the arrival of Indo-European speakers. A map showing where Dravidian languages are spoken today appears to the right.

Historical records suggest that the South Dravidian language group had separated from a Proto-Dravidian language no later than 700 BCE; linguistic evidence suggests that they probably became distinctive around 1,100 BCE;[6] and some scholars using linguistic methods put the deepest divisions in the language group at roughly 3,000 BCE. Russian linguist M.S. Andronov puts the split between Tamil (a written Southern Dravidian language) and Telugu (a written Central Dravidian language) between 1,500 BCE and 1,000 BCE.[7]

Southworth identifies late Proto-Dravidian with the Southern Neolithic culture in the lower Godavari River basin of South Central India, which first appeared ca. 2,500 BCE, based upon its agricultural vocabulary, while noting that this "would not preclude the possibility that speakers of an earlier stage of Dravidian entered the subcontinent from western or central Asia, as has often been suggested."[8]

Speculations regarding the original homeland have centered on the Indus Valley Civilization or on Elam (whose Elamite language was spoken in the hills to the east of the ancient Sumerian civilization with whom the Indus Valley Civilization traded and shared domesticated species) in an Elamo-Dravidian hypothesis, but results have not been convincing. The possibility that the language family is indigenous to the Dravidian area and is a truly isolated genetic unit has also not been ruled out.

Prof. Asko Parpola (University of Helsinki), the Jesuit priest Father Heras in the 1930s and other scholars (such as Indian and early Tamil expert Iravatham Mahadevan and Prof. Walter A. Fairservis Jr.) conclude that the Indus sign system represented an ancient Dravidian language, a view that they assume is supported by Tamil artifacts discovered in 2006.[9] Thus, in Parpola's view, the urheimat of Dravidian would be in the Indus River Valley. However, Harvard Indologist Michael Witzel takes a view, which has received serious academic consideration (ca. 2004), that is critical both of an Indus Valley Civilization Dravidian homeland and of the widely held view that the inscriptions of the Indus Valley Civilization even constitute a written language.[10] In the essay "Substrate Languages in Old Indo-Aryan" (with RV in this context referring to Rigvedic, i.e. Indo-Aryan), Witzel says "As we can no longer reckon with Dravidian influence on the early RV, this means that the language of the pre-Rigvedic Indus civilization, at least in the Panjab, was of (Para-) Austroasiatic nature." There are no written examples of Austroasiatic languages being spoken further west than Central India during the recent historical era (i.e., in the era for which we have written records).

Recent studies of the distribution of alleles on the Y chromosome,[11] microsatellite DNA,[12] and mitochondrial DNA[13] in India have cast doubt on the existence of a biological Dravidian "race" distinct from non-Dravidians in the Indian subcontinent;[14] other recent genetic studies have found evidence of Aryan, Dravidian and pre-Dravidian (original Asian) strata in South Asian populations.[15] Geneticist Luigi Luca Cavalli-Sforza proposes that a Dravidian people were preceded in India by Austroasiatic people, and were present prior to the arrival of Indo-Aryan language speakers in India.[16]

Uralic homeland

The Uralic homeland is unknown. A possible locus is the Comb Ceramic Culture of ca. 4200 – ca. 2000 BC (shown on the map to the right). This is suggested by the high language diversity around the middle Volga River, where three highly distinct branches of the Uralic family, Mordvinic, Mari, and Permic, are located. Reconstructed plant and animal names (including spruce, Siberian pine, Siberian fir, Siberian larch, brittle willow, elm, and hedgehog) are consistent with this location. This is adjacent to the proposed homeland for Proto-Indo-European under the Kurgan hypothesis.

French anthropologist Bernard Sergent, in La Genèse de l'Inde (1997),[17] argued that Finno-Ugric (Uralic) may have a genetic source or have borrowed significantly from proto-Dravidian or a predecessor language of West African origins. Some linguists see Uralic (Hungarian, Finnish) as having a linguistic relationship to both Altaic (Turkic, Mongol) language groups[18] (as in the outdated Ural-Altaic hypothesis) and Dravidian languages. The theory that the Dravidian languages display similarities with the Uralic language group, suggesting a prolonged period of contact in the past,[19] is popular amongst Dravidian linguists and has been supported by a number of scholars, including Robert Caldwell,[20] Thomas Burrow,[21] Kamil Zvelebil,[22] and Mikhail Andronov.[23] This theory has, however, been rejected by some specialists in Uralic languages,[24] and has in recent times also been criticised by other Dravidian linguists like Bhadriraju Krishnamurti.[25]

As noted below, many notable linguists have proposed that the Eskimo-Aleut languages and Uralic languages have a common origin, although there is no consensus that this connection is genuine.

Turkic homeland

There is considerable dispute over the time and place of origin of the Turkic languages, but it is undisputed that they did not originate in or near the countries named after the language group, Turkey (i.e. Anatolia) and Turkmenistan. The people of Anatolia spoke Indo-European languages from at least the time of the Hittite Empire (whose expansion to most of Anatolia started ca. 2000 BCE) until the Persian Sassanid Empire collapsed in 651 CE. The Hittite record is the earliest historical attestation of Indo-European languages in the region, although some non-Indo-European languages were spoken in at least some parts of Anatolia for substantial periods of time prior to the Hittite Empire.

The Turkic languages are now spoken in Turkey, Central Asia and Siberia. The Turkic peoples originated in "the Far East including North China, especially Xinjiang Province and Inner Mongolia with parts of Mongolia and Siberia possibly as far west as Lake Baikal and the Altai Mountains. They may have been among the peoples of the multi-ethnic historical Saka known as early as the Greek writer Herodotus. Certainly identified Turkic tribes were known by the 6th century and, by the 10th century, most of Central Asia, formerly dominated by Iranian peoples, was settled by Turkic tribes. The Seljuk Turks from the 11th century invaded Anatolia, ultimately resulting in permanent Turkic settlement there and the establishment of the nation of Turkey."

The first possibly Turkic peoples to arrive in Europe were the Huns, who were at war with the Roman Empire in the 4th century CE. Confusingly, the Hungarian language is not a Turkic language (it is a Uralic language related to languages like the Finnish language and Estonian language) and was not spoken by the Huns.

Prior to the Turkic migration, Indo-European languages were spoken in Anatolia and Central Asia as far as the Tarim Basin.

The inferred population genetic contributions of Turkic populations show a cline from a high point in the East to a low point in the West.[26] In Turkey, the Turkic contribution to the local population genetic mix is about 6%.[27]

The origin of the Turkic languages is disputed, both in connection with other language families and in time and place. The lack of written records prior to the earliest Chinese accounts, and the fact that the early Turkic peoples were nomadic pastoralists, and hence mobile, make localizing and dating the earliest homeland of the Turkic languages difficult.


The Yeniseian language family has been recently tied by linguist Edward Vajda to the Native American Na-Dene languages of North America (e.g. Navajo),[28] in a proposal named Dene-Yeniseian. Several well-known linguists have reviewed the hypothesis favorably, although several linguists, such as Lyle Campbell, still reject it. This family of languages is sometimes described as Paleosiberian, a classification that rests on a belief that it represents a stratum of Siberian populations that preceded the speakers of the other modern languages of Siberia (mostly of the Indo-European and Altaic language families), possibly one that dates back to the Paleolithic era when North America was initially populated. However, Paleosiberian is usually considered a negatively defined collective term of convenience, not a genetic or even areal grouping, similar to Papuan. There is some evidence that the speakers of the Yeniseian languages (such as the Ket language, which is the only surviving member of the moribund language family) migrated to their current homeland along the Yenisei River in Central Siberia from an area south of the Altai Mountains in the general vicinity of Mongolia or Northwest China within the last 2500 years or so (although there is no evidence that the Yeniseian languages are linguistically related to the Altaic languages).[29][30][31] One sentence of the language of the Jie, a Xiongnu tribe who founded the Later Zhao state in Chinese history, appears consistent with being a Yeniseian language. Other linguists have suggested, with far less widespread acceptance in the linguistics community, that the Yeniseian languages have a genetic relationship to one or more of the Caucasian languages and the Sino-Tibetan languages (such as Chinese).[32][33]

Other groups

The only languages which are predominantly found in Europe, North Asia and South Asia and are not part of the language families above are the Basque language spoken in Northern Spain and Southwestern France, the three living language families of the Caucasus mountains (Northwest Caucasian, Northeast Caucasian and South Caucasian, with the first two sometimes proposed as members of a single North Caucasian language family), the Paleosiberian languages (the Yukaghir languages of Central Siberia (viewed by some linguists as a divergent branch of the Uralic languages),[34][35] and the Chukotko-Kamchatkan languages of Eastern Siberia, a grouping which sometimes includes the geographically adjacent Nivkh language, although it is sometimes treated as a language isolate, and Yeniseian), and a few South Asian linguistic isolates, such as Burushaski, spoken mostly in isolated pockets of Northern Pakistan, and the two indigenous language families of the Andamanese people (Great Andamanese and Ongan), and perhaps Nihali (spoken in West Central India).[36] In each of these cases, the languages are spoken in an area that is geographically compact, they were spoken in that area at the time that they were first attested historically, and there is no definitive evidence of an origin for the languages in question outside the area where they are spoken now.

Joseph Greenberg and Stephen Wurm have both noted lexical similarities between the Great Andamanese language and the West Papuan languages. Wurm noted that the lexical similarities "are quite striking and amount to virtual formal identity [...] in a number of instances." There is no agreement, even between these two linguists, on a narrative that gave rise to these similarities.

Michael Fortescue, a specialist in Eskimo–Aleut as well as in Chukotko-Kamchatkan, argues for a link between Uralic, Yukaghir, Chukotko-Kamchatkan, and Eskimo–Aleut in Language Relations Across Bering Strait (1998). He calls this proposed grouping Uralo-Siberian.

There have been determined efforts by multiple linguists from at least the 19th century to link these languages to other language families, particularly in the case of the Basque language, where numerous connections to language families living and dead have been proposed by linguists. Frequently, efforts to look for deeper linguistic origins of these languages will also attempt to integrate them into attested extinct languages of Europe, such as the Etruscan language of Northern Italy, the Ligurian language of Italy, the Lemnian language of the Aegean Island of Lemnos, the Minoan language aka Linear A of ancient Crete, the Sumerian language once spoken in Mesopotamia (which is the oldest attested written language), the language of the Indus River Valley civilization, the Elamite language of Iran, and the Hurrian language and Hattic language of Anatolia. None of these efforts has achieved wide support among linguists, although some have been viewed as sufficiently credible to receive serious consideration from multiple linguists.[36][37][38][39][40][41]

Language families predominantly found in Africa and Southwest Asia

Khoisan homeland

The Khoisan click languages of Africa do not form a language family and so do not, as a family, have a homeland. However, limited genetic evidence from some Khoisan-language speakers in southern Africa suggests an origin "along the African rift and a possible wider East African range."[42] Thus, the Bushmen of the Kalahari, who occupy the largest geographic region where click languages are spoken, are viewed as a relict population far removed from the place where click languages probably originated. The Khoe languages, Tuu languages, Kx'a languages, Hadza language and Sandawe language (the latter two being Tanzanian language isolates) are frequently grouped together in the catch-all Khoisan categorization, despite the lack of a definitive recent common origin of these languages in a common language family. However, for the Khoe-Kwadi group, a more recent origin by immigration from East Africa (around the beginning of the Christian Era) has been suggested by Tom Güldemann, based on his observation of similarities with Sandawe.

Afro-Asiatic homeland

The Afro-Asiatic languages include Arabic, Hebrew, Berber, and a variety of other languages now found mostly in Northeast Africa, although the exact boundaries of this language family are disputed in the case of a small number of languages spoken by small numbers of individuals in a few localized areas of Sudan and East Africa.

The limited area of the Afro-Asiatic Sprachraum (prior to its expansion to new areas in the historic era) has limited the potential areas where that family's Urheimat could be. Generally speaking, two proposals have been developed: that Afro-Asiatic arose in a Semitic Urheimat in the Middle East (i.e. Southwest Asia), or that Afro-Asiatic languages arose in northeast Africa (generally, either between Darfur and Tibesti or in Ethiopia and the other countries of the Horn of Africa). The African hypothesis is considered to be rather more likely at the present time, because of the greater diversity of languages with more distant relationships to each other there.

There have been serious linguistic proponents of almost every conceivable set of relationships of the Afro-Asiatic language subfamilies to each other, although there is reasonably broad consensus concerning the subfamily classification of all but a few of the Afro-Asiatic languages. Some of this difficulty in resolving the Afro-Asiatic family tree flows from the time depth of these languages. The Afro-Asiatic Egyptian language of ancient Egypt (whose latest stage is known as Coptic) is one of the two oldest written languages on Earth (the other being the Sumerian language, a language isolate), dating in written form to approximately 3000 BCE, and the Semitic Akkadian language was also attested in writing from a very early date (ca. 2000 BCE). A common Afro-Asiatic proto-language is necessarily older than these very old written languages, which belonged to language families that had already diverged from each other considerably by that point. There is also no single genetic profile that is uniform among Afro-Asiatic language speakers and clearly unites them. There are also competing theories on whether the Afro-Asiatic language family owes its expansion to the Neolithic revolution that originated in an area that includes the range of the Afro-Asiatic languages, or was already widespread in the Upper Paleolithic era. Notably, the Afro-Asiatic language family is spoken in most of the places that are leading candidates for the origins of the modern human species and most of the intermediate species between modern humans and the Great Apes in human evolution.

Semitic homeland

There has been speculation regarding the specific Semitic subfamily of Afro-Asiatic languages, again with the Horn of Africa and Southwest Asia—specifically the Levant—being the most common proposals. The large number of Semitic languages present in the Horn of Africa seems at first glance to support the hypothesis that the Semitic homeland lies there. However, the Semitic languages in the Horn of Africa all belong to the South Semitic subfamily and appear to all have relatively recent common origins in a single Ethio-Semitic proto-language, while the East and Central Semitic languages are native solely to Asia. These features, and the presence of certain common Semitic lexical items in all Ethio-Semitic languages referring to items that arrived in Africa from the Levant at a time after Semitic languages were known to have been spoken in the Levant, have lent weight to the Levantine proposal.

Hebrew is found in Europe due to the Jewish diaspora after the fall of the Second Temple in 70 CE that marked the beginning of Rabbinic Judaism. It is relatively closely related to the Arabic language even within the Semitic language family, being part of the same Central Semitic group.

The Maltese language, the only other Semitic language of Europe, is a derivative of the Arabic language as it was spoken in Sicily starting in the couple of centuries after the commencement of the Islamic empire in North Africa.

Nilo-Saharan homeland

Genetic studies of Nilo-Saharan-speaking populations are in general agreement with archaeological evidence and linguistic studies that argue for a Nilo-Saharan homeland in eastern Sudan before 6000 BCE, with subsequent migration events northward to the eastern Sahara, westward to the Chad Basin, and southeastward into Kenya and Tanzania.[43]

Linguist Roger Blench has suggested that the Nilo-Saharan languages and the Niger–Congo languages may be branches of the same macro-language family.[44][45] Earlier proposals along this line were made by linguist Edgar Gregersen in 1972.[46] These proposals have not reached a linguistic consensus, however, and this connection presupposes that all of the Nilo-Saharan languages are actually related in a single family, which has not been definitively established.

Razib Khan, based on analysis of the autosomal genetics of the Tutsi ethnic group of Africa, suggests that "the Tutsi were in all likelihood once a Nilotic speaking population, who switched to the language of the Bantus amongst whom they settled."[47][48]

Niger–Congo homeland

The homeland of the Niger–Congo languages, which has as its subfamily the Benue–Congo languages, which in turn includes the Bantu languages, is not known in time or place, beyond the fact that it probably originated in or near the area where these languages were spoken prior to Bantu expansion (i.e. West Africa or Central Africa) and probably predated the Bantu expansion of ca. 3000 BCE by many thousands of years.[49] Its expansion may have been associated with the expansion of Sahel agriculture in the African Neolithic period.[49]

According to linguist Roger Blench, as of 2004, all specialists in Niger–Congo languages believe the languages to have a common origin, rather than merely constituting a typological classification, for reasons including their shared noun-class system, their shared verbal extensions and their shared basic lexicon.[50][51] Similar classifications have been made ever since Diedrich Westermann in 1922.[52] Joseph Greenberg continued that tradition, making it the starting point for modern linguistic classification in Africa, with some of his most notable publications going to press starting in the 1960s.[53] However, there has been active debate for many decades over the appropriate subclassifications of the languages in that language family, which is a key tool used in localizing a language's place of origin.[50] No definitive "Proto-Niger–Congo" lexicon or grammar has been developed for the language family as a whole.

An important unresolved issue in determining the time and place where the Niger–Congo languages originated, and their range prior to recorded history, is this language family's relationship to the Kordofanian languages now spoken in the Nuba Mountains of Sudan, an area that is not contiguous with the remainder of the Niger–Congo-speaking region and lies at the northeasternmost extent of the current Niger–Congo linguistic region. The current prevailing linguistic view is that Kordofanian languages are part of the Niger–Congo language family, and that among the many languages still surviving in that region these may be the oldest.[54] The evidence is insufficient to determine whether this outlier group of Niger–Congo language speakers represents a prehistoric range of a Niger–Congo linguistic region that has since contracted as other languages have intruded, or whether it instead represents a group of Niger–Congo language speakers who migrated to the area at some point in prehistory and were an isolated linguistic community from the beginning.

The prehistoric range of the Niger–Congo languages has implications not just for the history of the Niger–Congo languages, but also for the origins of the Afro-Asiatic and Nilo-Saharan languages, whose homelands have been hypothesized by some to overlap with the Niger–Congo linguistic range prior to recorded history. If the consensus view regarding the origins of the Nilo-Saharan languages that came to East Africa is adopted, and a North African or Southwest Asian origin for Afro-Asiatic languages is assumed, the linguistic affiliation of East Africa prior to the arrival of Nilo-Saharan and Afro-Asiatic languages is left open. The overlap between the potential areas of origin for these languages in East Africa is particularly notable because it includes the regions from which the Proto-Eurasians who brought anatomically modern humans out of Africa, and presumably their original proto-language or languages, originated.

However, there is more agreement regarding the place of origin of the Benue–Congo subfamily of languages, which is the largest subfamily of the group, and the place of origin of the Bantu languages and the time at which they started to expand are known with great specificity.

Greenberg classified the relatively divergent Ubangian languages, which are centered in the Central African Republic, as part of the Niger–Congo language family in 1963, and subsequent scholars concurred.[55] This classification was called into question by linguist Gerrit Dimmendaal in a 2008 article.[56]

Benue–Congo homeland

Roger Blench, relying particularly on prior work by Professor Kay Williamson of the University of Port Harcourt and the linguist P. De Wolf, who each took the same position, has argued that a Benue–Congo linguistic subfamily of the Niger–Congo language family, which includes the Bantu languages and other related languages and would be the largest branch of Niger–Congo, is an empirically supported grouping that probably originated at the confluence of the Benue and Niger Rivers in Central Nigeria.[50][57][58][59][60][61] These estimates of the place of origin of the Benue–Congo language family do not fix a date for the start of its expansion, other than that it must have been sufficiently prior to the Bantu expansion to allow for the diversification of the languages within this family, which includes Bantu.

Bantu homeland

There is a widespread consensus among linguistic scholars that Bantu languages of the Niger–Congo family have a homeland near the coastal boundary of Nigeria and Cameroon, prior to a rapid expansion from that homeland starting about 3000 BCE.[43][49][62][63][64][65][66]

Linguistic, archaeological and genetic evidence also indicates that "independent waves of migration of western African and East African Bantu-speakers into southern Africa occurred."[43] In some places, genetic evidence suggests that Bantu language expansion was largely a result of substantial population replacement.[67] In other places, Bantu language expansion, like that of many other languages, has been documented with population genetic evidence to have occurred by means other than complete or predominant population replacement (e.g. via language shift and admixture of incoming and existing populations). For example, one study found this to be the case among Bantu language speakers who are African Pygmies or who live in Mozambique,[67] while another population genetic study found this to be the case among the Bantu-speaking Lemba of Zimbabwe.[68] Where Bantu was adopted via language shift by existing populations, earlier African languages had been spoken, probably from African language families that are now lost except as substrate influences on local Bantu languages (such as click sounds in local Bantu languages).

Malagasy language homeland

The Malagasy language of Madagascar is not related to nearby African languages, instead being the westernmost member of the Malayo-Polynesian branch of the Austronesian language family. The similarity between Malagasy and Malay and Javanese was noted as long ago as 1708 by the Dutch scholar Adriaan van Reeland.[69] Malagasy is related to the Malayo-Polynesian languages of Indonesia, Malaysia, and the Philippines, and more closely to the Southeast Barito group of languages spoken in Borneo, except for its Polynesian morphophonemics.[70] Malagasy shares much of its basic vocabulary with the Ma'anyan language, a language from the region of the Barito River in southern Borneo. This indicates that Madagascar was first settled by Austronesian people from the Malay Archipelago who had passed through Borneo. This happened between approximately 0 CE and 500 CE, prior to which the island of Madagascar lacked human inhabitants.[49] Later, the original Austronesian settlers must have mixed with Bantus and Arabs, amongst others.[71] The Malagasy language also includes some borrowings from Arabic and from Bantu languages (notably Swahili). Whole genome analysis of a limited sample of Malagasy individuals shows that the African component of the Malagasy genome is most similar to modern Bantu-speaking populations in the eastern African Great Lakes region.[72]

Language families predominantly found in East Asia, Southeast Asia and Oceania

Sino-Tibetan homeland

According to the Sino-Tibetan Etymological Dictionary and Thesaurus project of the University of California at Berkeley, the Proto-Sino-Tibetan (PST) homeland may have been "where the great rivers of East and Southeast Asia (including the Yellow, Yangtze, Mekong, Brahmaputra, Salween, and Irrawaddy) have their source. The time of hypothetical ST unity, when the Proto-Han (= Proto-Chinese) and Proto-Tibeto-Burman (PTB) peoples formed a relatively undifferentiated linguistic community, must have been at least as remote as the Proto-Indo-European period, perhaps around 4000 B.C."[73]

Some scholars place the Tibeto-Burman homeland in the area encompassing western Sichuan, northern Yunnan and eastern Tibet.[74]

Population genetic evidence favors an origin for Proto-Sino-Tibetan languages in the upper and middle Yellow River basin, with part of that source population branching off to settle in the Himalayas. On this account, the population that would give rise to the Chinese language split from the population that would give rise to the rest of the Sino-Tibetan language family during the East Asian Neolithic era:[75]

"[T]he closest relatives of the Tibetans are the Yi people, who live in the Hengduan Mountains and were originally formed through fusion with natives along their migration routes into the mountains. The Tibetan and Yi languages belong to the Tibeto-Bruman language group and their ancestries can be traced back to an ancient tribe, the Di-Qiang . . . After the ancestors of Sino-Tibetans reached the upper and middle Yellow River basin, they divided into two subgroups: Proto-Tibeto-Burman and Proto-Chinese. . . . The ancestral component which was dominant in Tibetan and Yi arose from the Proto-Tibeto-Burman subgroup, which marched on to south-west China and later, through one of its branches, became the ancestor of modern Tibetans. Proto-Tibeto-Burmans also spread over the Hengduan Mountains where the Yi have lived for hundreds of generations. Taking the optimal living condition and the easiest migration route into account, we favor the single-route hypothesis; it is more likely that their migration into the Tibetan Plateau through the Hengduan Mountain valleys occurred after Tibetan ancestors separated from the other Proto-Tibeto-Burman groups and diverged to form the modern Tibetan population."

One of the earliest Neolithic cultures of China in the upper to middle Yellow River basin was the Peiligang culture of 7000 BCE to 5000 BCE, so the population genetic reference in the quoted material is to a date on or after this time period. The Neolithic era concluded in the Yellow River around 1500 BCE. This is not inconsistent with the linguistically based estimate from the Sino-Tibetan Etymological Dictionary and Thesaurus project. By the early and middle Zhou Dynasty (1122 BCE–256 BCE), the language spoken in the Zhou court had become the standardized dialect for that kingdom.[76]

In contrast, four of the other main language families of East Asia and Southeast Asia outside the Sino-Tibetan language family, Austroasiatic, Austronesian, Hmong–Mien and Tai–Kadai, are generally believed to have had their origins, at some stage of their development, in Southern China.

Austroasiatic homeland

The homeland of the Austroasiatic languages (e.g. Vietnamese, Cambodian), which are found from Southeast Asia to India, is hypothesized to have been located in "the hills of southern Yunnan in China," between 4000 BCE and 2000 BCE,[77] with influences from Indo-Aryan and Dravidian languages at the western edge of its range in India, and influence from Chinese at the eastern edge of the regions where it is found. The disjoint distribution of Austroasiatic languages suggests that they were once spoken in most of the areas where the Tai–Kadai languages are now dominant.

However, Paul Sidwell has recently advocated a homeland in Southeast Asia instead,[78] preferring a late date of dispersal of about 2000 BCE.[79]

There is a strong correlation between the population genetic distribution of Y-chromosomal haplogroup O2a1-M95 and the distribution of Austroasiatic language speakers.[80]

Hmong–Mien homeland

The most likely homeland of the Hmong–Mien languages (also known as the Miao–Yao languages) is in Southern China between the Yangtze and Mekong rivers, but speakers of these languages may have migrated from Central China either as part of the Han Chinese expansion or as a result of exile from an original homeland by Han Chinese.[81] Migration of people speaking these languages from South China to Southeast Asia took place ca. 1600–1700 CE. Ancient DNA evidence suggests that the ancestors of the speakers of the Hmong–Mien languages were a population genetically distinct from that of the Tai–Kadai and Austronesian language source populations at a location on the Yangtze River.[82] Recent Y-DNA phylogeny evidence supports the proposition that people who speak the Hmong–Mien languages are descended from the population that now speaks Austroasiatic Mon-Khmer languages.[83]

Austronesian homeland

The homeland of the Austronesian languages is Taiwan. On this island the deepest divisions in Austronesian are found, among the families of the native Formosan languages. According to Blust (1999), the Formosan languages form nine of the ten primary branches of the Austronesian language family. Comrie (2001:28) noted this when he wrote:

... the internal diversity among the... Formosan languages... is greater than that in all the rest of Austronesian put together, so there is a major genetic split within Austronesian between Formosan and the rest... Indeed, the genetic diversity within Formosan is so great that it may well consist of several primary branches of the overall Austronesian family.

Archaeological evidence (e.g., Bellwood 1997) suggests that speakers of pre-Proto-Austronesian spread from the South Chinese mainland to Taiwan at some time around 6000 BCE. Evidence from historical linguistics suggests that it is from this island that seafaring peoples migrated, perhaps in distinct waves separated by millennia, to the entire region encompassed by the Austronesian languages (Diamond 2000). It is believed that this migration began around 4000 BCE (Blust 1999). However, evidence from historical linguistics cannot bridge the gap between those two periods.

It is possible that the ancient Taiwan aborigines were related to the ancient Minyue, derived in ancient times from the southeast coast of Mainland China, as suggested by linguists Li Jen-Kuei and Robert Blust. It is suggested that in the southeast coastal regions of China, there were many sea nomads during the Neolithic era and they may have spoken ancestral Austronesian languages, and were skilled seafarers.

The specific origins of the most far-flung member of this language family, the Malagasy language of Madagascar off the coast of Africa, are described above in the part of this article concerning African languages.

The Austro-Tai hypothesis suggests a common origin for the Austronesian languages and the Tai–Kadai languages whose hypothesized place of origin is geographically close to Taiwan.

Tai–Kadai homeland

Many scholars have addressed the question of the origins of the Tai–Kadai languages.[84][85][86][87][88]

There is a consensus that the Tai–Kadai languages have their origins in Southern China or on major nearby islands (such as Taiwan or Hainan).

The leading hypothesis is that the likely homeland of proto-Tai–Kadai was coastal Fujian or Guangdong as part of the Neolithic Longshan culture (of 3000 BCE – 2000 BCE). The spread of the Tai–Kadai peoples may have been aided by agriculture, but any who remained near the coast were eventually absorbed by the Chinese. Weera Ostapirat is one academic who articulates this position.[89]

Laurent Sagart, on the other hand, holds that Tai–Kadai is a branch of Austronesian which migrated back to the mainland from northeastern Formosa (i.e. Taiwan) long after Formosa was settled, but probably before the expansion of Malayo-Polynesian out of Formosa.[90][91][92] The language was then largely relexified from what he believes may have been an Austroasiatic language. Sagart suggests that Austro-Tai is ultimately related to the Sino-Tibetan languages and has its origin in the Neolithic communities of the coastal regions of prehistoric North China or East China.

Ostapirat, by contrast, sees connections with the Austroasiatic languages (in Austric), as has Benedict.[93][94][95] Reid notes that the two approaches are not incompatible, if Austric is valid and can be connected to Sino-Tibetan.[96]

Robert Blust (1999) suggests that proto-Tai–Kadai speakers originated in the northern Philippines and migrated from there to Hainan (hence the diversity of Tai–Kadai languages on that island), and that their languages were radically restructured following contact with Hmong–Mien and Sinitic. However, Ostapirat maintains that Tai–Kadai could not descend from Malayo-Polynesian in the Philippines, and likely not from the languages of eastern Formosa either. His evidence is in the Tai–Kadai sound correspondences, which reflect Austronesian distinctions that were lost in Malayo-Polynesian and even Eastern Formosan.

Genetic evidence corroborates evidence from Kadai-speaking people's oral traditions that puts a Kadai homeland on Hainan.[97] Ancient DNA evidence also shows a connection between Tai–Kadai-speaking populations and Austronesian-speaking populations,[82] and points to a genetically distinct population at a different location on the Yangtze River as a possible source of the Hmong–Mien languages.[82]

Mongolic homeland

The Mongols expanded into present-day Mongolia sometime after the demise of the Karasuk culture (1500–300 BC), an Indo-European and, according to ancient DNA, genetically Western Eurasian population.[98] Genghis Khan, starting around 1206 CE, waged a series of military campaigns that, together with campaigns by his successors, stretched from present-day Poland in the west to Korea in the east and from Siberia in the north to the Gulf of Oman and Vietnam in the south, after which the empire ultimately collapsed with little long-lasting linguistic impact outside the core Mongolian area.[99]

Japanese and Korean language homelands

Today, there is one Korean language spoken in Korea, and a small family of related languages called Japonic spoken in Japan. There is also an Ainu language spoken by an ethnic minority in Northern Japan.

There were multiple languages spoken in Manchuria and the Korean Peninsula prior to Korea's unification, and there is dispute over which of those languages gave rise to modern Korean sometime in the first millennium CE, and what relationship that proto-language may have had to the proposed family of Altaic languages. The core three populations in the Altaic classification show autosomal population genetic commonalities.[100] These core three populations also show lexical affinities in their languages.[101]

There is also dispute over the extent, if any, to which one of those multiple languages of the Korean peninsula prior to its unification gave rise to the Japanese language, and if so, which of those languages was the language of the Yayoi part of the founding group of modern Japan. The Yayoi may also have had linguistic influences from China. Japanese links to Altaic languages, if they exist, could have arisen via an Altaic source for a Korean peninsula language spoken by the Yayoi, and/or via Altaic influences on the Ainu languages via contacts between the Ainu people and Siberia.

The Ainu language or another extinct language of the indigenous people of Japan called the Jōmon may have also been a formative element in the Japanese language as the Yayoi people and the Jōmon people merged into a common Japanese ethnicity around 2300 years ago.

Both the Koreans and the Japanese make use of Chinese ideograms in their written language, whose Chinese origins are not disputed. However, neither of these spoken languages is closely related to the spoken Chinese language, and need not be because ideograms do not code phonetic versions of the ideas that they describe.


The Korean language is spoken in Korea and among emigrants from Korea. Conservative historical linguists tend to classify the Korean language as a language isolate, although others suggest a relationship to Altaic languages or to Japonic languages.

Old Korean is attested in Chinese histories, in the Three Kingdoms period of Korea (ca. 0 to 900 CE), when the Silla Kingdom (in Eastern Korea), Baekje Kingdom (in Southwestern Korea), and Goguryeo Kingdom (in Northern Korea) were simultaneously present on the Korean peninsula, although Korean was not a literary language until later; the hangul script of Korean was invented in the 15th century CE (the earlier Idu script dates to the 6th century CE).

There was a group of similar languages called the Buyeo languages in the northern Korean Peninsula and southern Manchuria, and possibly Japan, which included, according to Chinese records, the languages of Buyeo, Goguryeo, Baekje, Dongye, Okjeo, and possibly Gojoseon, but was different from ancient Manchu languages such as the Mohe language. Gojoseon was a kingdom in Northern Korea that is said by tradition to have been founded in 2333 BCE (archaeological evidence and Chinese histories support a cultural civilization from around 1500 BCE and a kingdom fused from a federation of smaller states around the 7th century BCE), that was conquered by Han Dynasty China in 108 BCE, and that re-emerged from Chinese rule as the Kingdom of Buyeo. The Three Kingdoms era kingdoms of Goguryeo and Baekje were successors to the Kingdom of Buyeo. Dongye was a vassal state of Goguryeo in Northeast Korea, founded in the 3rd century BCE, that was eventually absorbed by Goguryeo around the 5th century CE. Okjeo was a minor state in Northern Korea, to the north of Dongye, that was a subordinate unit of Gojoseon from the 3rd century BCE to 108 BCE, then came under Han rule, and then was a subordinate state of Goguryeo. None of these Buyeo language family kingdoms ever included the Kingdom of Silla, which was just a small kingdom on the southern coast of Korea until the Three Kingdoms period, during which it expanded and conquered the other two kingdoms.

Linguists including Christopher Beckwith argue for Japanese as a descendant of Goguryeo, and for Korean as a descendant of the Silla language, based on lexical similarities between Goguryeo and Japanese, and on Silla's ultimate triumph in the quest for political control of Korea. Other linguists, including Kim Banghan, Alexander Vovin, and J. Marshall Unger, argue that Japanese is related to the pre-Goguryeo language of the central and southern part of the Korean peninsula, including what would become the Kingdom of Silla, and that Old Korean is Goguryeo with a pre-Goguryeo Japonic substrate. They argue this in part because Japanese-like toponyms found in the historical homeland of Silla were also distributed in the southern part of the Korean peninsula, and are not found in the northern part of the Korean peninsula or in south-western Manchuria.[102] None of the extinct languages is attested in writing well enough to reach definitive conclusions resolving the debate.

Japanese and Ainu languages

Languages of the Japanese language family are spoken in Japan and among emigrants from Japan, and are attested in Japanese language writing from the 8th century CE and in imperfect Chinese transcriptions from the late 5th century CE. Conservative historical linguists tend to classify a small number of Japanese languages as a language family of their own. The Ainu languages are a barely surviving family of closely related languages or dialects that, at the time of the oldest extant historical records concerning those islands, were spoken by indigenous populations on the island of Hokkaidō in what is now northern Japan, as well as on the island of Sakhalin and the Kuril Archipelago in what is now the Russian Far East.

There are similarities between the Japanese language and the Korean language in lexicon and grammatical features, but there is dispute over whether these denote a common origin or mere linguistic borrowing within a sprachbund of neighboring languages. Samuel E. Martin, Roy Andrew Miller, and Sergei Starostin are linguists who have argued that they have common origins.[103][104][105][106][107] In contrast, Alexander Vovin has argued for a regional borrowing model to explain the linguistic similarities.[108]

One hypothesis proposes that Japanese is a relative of the extinct languages spoken by the Buyeo-Goguryeo cultures of Korea, southern Manchuria, and Liaodong, of which the best attested is the extinct language Goguryeo.[109][110][111] This proposal is attributed to Shinmura Izuru, who proposed it in 1916. Modern Korean, in contrast, according to proponents of this hypothesis, appears to have stronger connections to the Silla language, spoken in the ancient kingdom of Silla (57 BC – AD 935), one of the Three Kingdoms of Korea, whose similarity to the Goguryeo language is not clearly established.

The earliest Chinese historical records concerning the "Wa" in Japan indicate that they were fractured into many warring states. However, modern Japanese dialects show a common origin, rather than a "bushy" one. So, it is possible that there were many Yayoi dialects in the period before Old Japanese emerged, and that the dialect of the warring state that ended up prevailing politically as the Japanese state was unified superseded the other early Yayoi languages or dialects.[112]

After a new wave of immigration by the Yayoi people, probably from the Korean Peninsula some 2,300 years ago, the Jōmon were pushed into northern Japan. Genetic data suggest that modern Japanese are descended from both the Yayoi and the Jōmon. Tradition, as documented by the Nihon Shoki, a legendary account of Japan's history, puts the date of the Yayoi arrival in Japan at 660 BCE. Chinese historical records mention the existence of the Yayoi (called "Wa") starting in 57 BCE. The existing Japanese language has its origins at approximately this point in time, if not earlier (to the extent that Japanese derives primarily from either the language of the Bronze Age Yayoi people, as it existed prior to their arrival in Japan, or from a language of the Jōmon at that point in time, rather than being a creole of some sort). Skeletal remains suggest that the two cultures had fused into a group with a homogeneous physical appearance in Southern Japan by 250 CE.[112] It is possible that the Japanese language has roots related to the Ainu language or to the historical language of the Yayoi, whatever that may have been, or that it was a creole of both. It is also possible that Japanese has roots in a language spoken in Southern Japan that is lost and now unknown.[112]

The Ainu people are genetic descendants of the Jōmon, with some contribution from the Okhotsk people.[113] The Ainu languages are now spoken by Ainu minorities in Hokkaidō, and were formerly spoken in southern and central Sakhalin and the Kuril Islands (an area also known as Ezo), and perhaps in northern Honshū by the Emishi people (until approximately 1000 CE). They are associated with the founding Jōmon people of Japan, from more than 14,000 years ago, and with the Satsumon culture of Hokkaidō, although the Ainu also had contact with the Paleo-Siberian Okhotsk culture, whose modern descendants include the Nivkh people (whose original homeland was mostly occupied by the Tungusic people), which could have linguistically influenced the Ainu language.[114] Thus, as a result of this important outside cultural influence, it is impossible to know with certainty how similar the original language of the Jōmon people was to that spoken by the Ainu people today. Some linguists have suggested other language family connections for the Ainu language: Shafer has suggested a distant connection to the Austroasiatic languages.[115] Vovin viewed that suggestion as merely preliminary.[116] Japanese linguist Shichirō Murayama tried to link Ainu to the Austronesian languages, which include the languages of the Philippines, Taiwan, and Indonesia, through both vocabulary and cultural comparisons. There is no consensus, however, that the Ainu languages have sources in any other known language, and the unique population genetics of the Ainu people support the hypothesis that they were largely isolated from the rest of the world for many thousands of years.

The Yayoi people had strong physical, genetic and cultural similarities to the Chinese of the Han Dynasty period (202 BCE – 8 CE) in Jiangsu province on China's eastern coast.[117] The Yayoi also had strong cultural similarities to the Koreans of that time period.[118][112]

Some linguists, such as Turchin,[101] see a connection between Japanese and Korean and an Altaic language family or similar larger grouping of languages, with those speakers coming from an area north of Korea, based in part upon similarities in lexical roots. The statistical method used by Turchin, however, would not discriminate between Jōmon and Yayoi sources for any Altaic linguistic affinities. Turchin's analysis also did not look at the various proposed ancient predecessors of the Korean language in Korea or the relationship of those languages to any of the proto-Altaic languages, despite the fact that the hypothesis would require one of those ancient Korean peninsular languages to be intermediate between Japanese and one of the proto-Altaic languages. Old Japanese when first attested had eight vowels rather than the current five (the additional vowels were lost within a century of the oldest preserved writings), a system that was close to the vowel system seen in Uralic and Altaic languages.[119] Old Japanese also had more grammatical similarity to Altaic languages than modern Japanese has.

These classifications of the origins of the Japanese language ignore significant borrowing from other languages in recent times. Current estimates are that "wago" (i.e. words attributable to the original Yayoi language) make up 33.8% of the Japanese lexicon, that "kango" (i.e. words with roots borrowed from Chinese since the 5th century CE) make up 49.1% of Japanese words (in addition to the Chinese ideograms used in the Japanese written language), that foreign words called gairaigo make up 8.8% of Japanese words, and that 8.3% of Japanese words are konshugo that draw upon multiple languages.[120] This account attributes only a small number of words in modern Japanese to Ainu roots.

The six Ryukyuan languages, spoken in the islands to the south of Japan, are descended from Japanese but are not mutually intelligible with Japanese, with which they share about 72% of their words, or with each other. They started to diverge from Japanese around the 7th century CE. These islands were united in a Ryukyuan kingdom from 1429 CE (prior to that there were multiple divided kingdoms, which were tributary states of China after 1372 CE); the kingdom was a tributary state of China until 1609, when it became a vassal state of Japan, and it was annexed by Japan in 1879. These languages were then suppressed, and while they have about a million native speakers, there are relatively few native speakers under the age of twenty. They are effectively minority languages in their own home islands at this point.

Other groups

The only language isolates or language families predominantly spoken in Southeast Asia, East Asia and Oceania that do not belong to one of the language families above are the indigenous languages of Melanesia and the Australian Aboriginal languages. The Melanesian languages number more than eight hundred, in perhaps sixty language families, and are known as the Papuan languages, a geographic term that does not presume a genetic relationship between them. About one hundred and fifty Australian Aboriginal languages remain, in about ten language families, all of which, except the Pama–Nyungan languages, are largely confined to the central northern coast of Australia. No linguists have found a language family connection between indigenous Papuan and Australian Aboriginal languages and those of Asia, Africa, the Americas or any other part of the world. Indeed, no linguistic connection has been established between the indigenous languages of Melanesia and the indigenous languages of the Aboriginal Australians.[121] This is consistent with the mainstream view, supported by population genetics and archaeology, that Papua New Guinea and Australia, as well as some of the islands neighboring Papua New Guinea, were first inhabited by hominins (humans or otherwise) at least 40,000 years ago in migrations that were either separate or swiftly segregated, and that many of these populations had only limited contact with outside populations until the modern era. While there are plausible reasons to infer that the Melanesian languages and the Aboriginal Australian languages, respectively, have common origins in a small founding population with a single language, linguists have not been able to marshal lexical, phonetic and grammatical evidence from these languages in their current form to support these inferences.

Languages spoken predominantly in North and South America


Recently, linguist Edward Vajda has proposed a genetic link between the Na-Dene languages and the Yeniseian languages of the Ket people of central Siberia, suggesting a homeland in Siberia or a back migration of Na-Dene speakers from Beringia. Na-Dene languages are spoken by Native Alaskans and by some First Nations peoples of Western Canada and the Pacific Northwest, and also include the Southern Athabaskan languages spoken in the American Southwest (e.g. the Apache language and Navajo language). The proposal is still not fully accepted among linguists.


The Eskimo–Aleut languages are spoken by native peoples of the Arctic regions of Alaska, Canada and Greenland, generally to the north of Na-Dene linguistic areas (shown on the map on the left).

Current ancient and modern DNA scholarship and archaeology support a three-layer paradigm in which first the Saqqaq (Arctic Paleo-Eskimos), who were present in 2000 BCE, then the Dorset (second-wave Arctic Paleo-Eskimos), and finally the Thule (proto-Inuit), from ca. 500 CE – 1000 CE, successively swept Arctic North America while having little genetic impact on Native American populations further south. Those southern populations presumably have origins that date back to the initial colonization of the Americas by modern humans from Asia (who were the first hominins to live there). Ancient DNA shows genetic continuity from the Thule to modern Inuit (whose genetics are remarkably homogeneous), dominated by the A2a, A2b, and D3 mtDNA haplotypes, while "Haplotype D2 (3%), found among modern Aleut and Siberian Eskimos, was identified at a low frequency in the modern samples but not the ancient. This haplotype was recently identified in an ancient Paleo-Eskimo Saqqaq individual from western Greenland. . . . Whole genomic sequencing of the 4,000 year old PaleoEskimo, 'Inuk,' indicated that the Saqqaq sequences clustered with the Chukchi and Koryaks of Siberia-suggesting an earlier migration from Siberia along the northern slope of Alaska to Greenland."[122] Evidence such as bronze artifacts produced in East Asia from ca. 1000 CE further supports a proto-Eskimo-Aleut arrival in the polar regions of North America ca. 500 CE – 1000 CE.[123]

The proto-Eskimo-Aleut migration to North America, associated with the Thule expansion in North America ca. 500 CE, took place much more recently than the initial human settlement of North America, which occurred more than 14,000 years ago. In addition, modern Inuit populations are genetically distinct from other indigenous populations of the Americas. Thus, evidence from genetics and archaeology strongly supports an East Asian origin for Eskimo-Aleut languages sometime in the last 1500 years that is distinct from that of most other indigenous languages of the Americas. But there is no linguistic consensus on any particular languages of East Asia with which this family of North American languages is associated.[124] It is entirely possible that the Eastern Siberian languages most closely ancestral to Eskimo-Aleut are extinct. Many indigenous languages and cultures of this region have died out in the face of expanded Russian cultural and national influence starting in the 18th century.

In 1998 Michael Fortescue proposed a group of Uralo-Siberian languages, in which Uralic languages like Finnish are related to Eskimo-Aleut languages. The grouping is supported by lexical correspondences and grammatical similarities, and expands upon a proposal of Morris Swadesh in 1962 that itself reiterates similarities that have been noted since at least 1746.[125] Fortescue argues that the Uralo-Siberian proto-language (or a complex of related proto-languages) may have been spoken by Mesolithic hunting and fishing people in south-central Siberia (roughly, from the upper Yenisei river to Lake Baikal) between 8000 and 6000 BC, and that the proto-languages of the derived families may have been carried northward out of this homeland in several successive waves down to about 4000 BC, leaving the Samoyedic branch of Uralic in occupation of the Urheimat thereafter.

A 2005 proposal by Holst, also reiterating a proposal of Swadesh from 1962, suggests that the Wakashan languages (map on right), spoken in British Columbia on and around Vancouver Island, are part of the same language family as the Eskimo-Aleut languages.[126] This proposal, if accurate, would suggest that Na-Dene languages may have arrived in North America after (although not long after) Eskimo-Aleut languages.

Phonologically, the Eskimo–Aleut languages resemble other languages of northern North America and far eastern Siberia.


Some authorities on the history of the Uto-Aztecan language group place its homeland in the border region between the USA and Mexico, namely the upland regions of Arizona and New Mexico and the adjacent areas of the Mexican states of Sonora and Chihuahua, shown on the map (below left) and roughly corresponding to the Sonoran Desert. The proto-language would have been spoken by foragers about 5,000 years ago. Hill (2001) proposes instead a homeland further south, making the assumed speakers of Proto-Uto-Aztecan maize cultivators in Mesoamerica who were gradually pushed north, bringing maize cultivation with them, during the period of roughly 4,500 to 3,000 years ago, with the geographic diffusion of speakers corresponding to the breakup of linguistic unity.[127]

Other groups

Other than Dene-Yeniseian and a possible connection between the Eskimo-Aleut language family and the Uralic language family, no proposals of genetic relations between languages of North or South America and languages of Eurasia, Africa, or other parts of the world have been backed by credible evidence. There is not, for example, any indication that the Vikings, who had a brief presence in North America around 1000 CE, left any linguistic trace.

Population genetic evidence suggests that the non-circumpolar indigenous peoples of the Americas have origins in a small common founder population in the Upper Paleolithic era that arrived via a Beringian land bridge from Asia.[128][129][130][131] This population genetic evidence suggests the possibility that all languages of non-circumpolar indigenous Americans (i.e. neither Inuit-Aleut nor Na-Dene) have genetic origins in a single language of the founding population of the Americas, and hence, as controversially proposed by Greenberg, that they all ultimately belong to the same linguistic superfamily, which Greenberg called Amerind.[132] But there is no clear evidence of this from efforts to use traditional comparative linguistic methods to classify indigenous Native American languages. The process of identifying linguistic origins with traditional linguistic methods begins with classifying languages into families.

In general, more progress has been made in identifying language family relationships in North America, where the just under three hundred attested languages are grouped into twenty-nine language families and twenty-seven language isolates (some of which are simply incapable of being classified because they are extinct and were not sufficiently well attested to classify). Two (super-)family proposals that are gaining currency among linguists, Penutian and Hokan, both located generally along the Pacific coast of North America, would reduce the number of language families in North America to about fifteen. However, in large portions of the Southeast United States, where it is known that there was considerable pre-Columbian linguistic diversity, there are no attested indigenous languages, and the populations in question either left no survivors, or all remaining speakers of relocated tribes with diminished numbers underwent language shift as their ancestral languages became moribund.

Mesoamerica was home to one of the most developed successions of farming societies in the Americas in the pre-Columbian era. Mesoamerica's attested languages are likewise quite well systematized into six main language families and four other language isolates or small language families, as well as a few unclassified extinct languages, encompassing all of the languages in the region. Mesoamerica is also the only part of the Americas in which written languages were in use in the pre-Columbian era.

In South America there are about 350 living indigenous languages (in addition to many creoles) and an estimated one thousand or more extinct languages, grouped into more than 140 categories, only ten of which have more than five languages that have been demonstrated to belong to the same language family. This is about three times as much linguistic diversity at the language family/language isolate level as North America and Mesoamerica combined. The naïve expectation from population genetics would have been that there would be less linguistic diversity, because the entire indigenous population of South America appears to derive genetically from only a subset of an already small indigenous founder population of the Americas as a whole. This is illustrated, for example, by its lack of several of the less common genetic haplotypes found in indigenous America outside South America (although genetic diversity has accumulated in these populations over time through mutations distinguishing them from the founder population genomes). Some of the lack of classification of indigenous South American languages may be simply attributable to the small number of linguists devoted to the task and the limited amount of information available about many of the languages. But the languages of the region may also simply be particularly diverse due to separation by great time depth and geographic isolation. The only other place in the world with comparable linguistic diversity that has not been reduced to a small number of language families is Papua New Guinea, which also experienced many millennia of isolation from the rest of the world that ended only relatively recently.

Implications of current research

The Out of Africa theory of human origins marshals archeological, genetic, and ancient climate evidence to suggest a common origin for all modern humans in Africa about 70,000 years ago and an origin for farming and herding about 8,000 to 10,000 years ago.[133]

We also have some idea about the time depth of these languages. For example, the Urheimats of the proto-languages of the subfamilies of the Indo-European language family are necessarily more recent than the Urheimat of Proto-Indo-European itself. Similarly, a language superfamily's proto-language must have been spoken in an Urheimat no more recent than that of the oldest proto-language among its member families. The times and places of the Urheimats of the proto-languages of the families spoken by most people alive today are in many cases much more recent than either the Out of Africa date or the origin of farming and herding. The relatively young time depth of modern language families can arise from at least two factors: prior languages went extinct as other languages expanded,[49] and some language families may have deeper connections at a greater time depth.

It will probably never be possible to know with any great confidence what the linguistic landscape of the world looked like 18,000 years ago, and even determining what it looked like 8,000 years ago is a profound challenge and a highly controversial undertaking. It is unlikely that it is possible to reconstruct a historical Tower of Babel linguistic community in which all humans spoke a common language, or to gain very specific insight into the language that the original proto-Eurasians or the earliest modern humans spoke. We can, however, say with confidence that large stone edifices built by large organized communities of people, which date to the Neolithic era at the earliest, were not built by any culture on Earth until at least many tens of thousands of years after there was a hypothetical common language of all humans, or even of all Eurasians. Likewise, the lack of instances of writing more than about 5,500 years ago, despite the extensive recovery of earlier artifacts and art from prehistory, makes it unlikely that earlier humans had anything approaching a complete written language. Proto-linguistic markings used in trade are only a few thousand years older.

Evidence from pre-Columbian languages in the Americas and from places like Papua New Guinea and Australia that were isolated during periods of linguistic consolidation in the rest of the world suggests that societies before the Neolithic Revolution had a great many languages relative to their populations, most of which are now irrevocably lost. These regions presumably had at most one or two languages when first settled by modern humans, given the founding population sizes implied by population genetic evidence. Their great linguistic diversity today reinforces the impossibility of making any meaningful statements about the nature of a proto-language at a time depth of tens of thousands of years.

The expansion of particular major language families is frequently associated with the adoption of superior food production, military technologies or social organization by a particular group of people, which allowed them to expand and exert dominance over neighboring societies, either ruling them or replacing them. For example, the domestication of horses is frequently associated with the expansion of the Indo-European language family (other linguists see an earlier expansion date, which they attribute to the expansion of farming and herding), the expansion of the Chinese language is sometimes associated first with millet and later with rice farming, and the development of crops and domesticated animals that can thrive in tropical environments may have been one factor in the Bantu expansion. Some of these expansions, such as those of the Hungarian, Turkish, Arabic and Chinese languages, are historically documented. Other language replacement events are lost to history and must be inferred.

Limitations of the concept of Urheimat


The concept of an Urheimat applies only to populations speaking a proto-language defined by the tree model, and not every language fits that model. For example, creole languages are hybrids of languages that are sometimes unrelated. Similarities among creoles arise from the creole formation process, rather than from genetic descent.[134] A creole language may, for instance, lack significant inflectional morphology, lack tone on monosyllabic words, or lack semantically opaque word formation, even if these features are found in all of the parent languages from which the creole was formed.[135]


Some languages are language isolates. That is, they have no well-accepted language family connection, no nodes in a family tree, and therefore no known Urheimat. An example is the Basque language of Northern Spain. Nevertheless, it is a scientific fact that all languages evolve. An unknown Urheimat may still be hypothesized, such as that for a Proto-Basque, and may be defended by archaeological and historical evidence.

Sometimes relatives are found for a language originally believed to be an isolate. An example is the Etruscan language, which, even though only partially understood, was found to be related to the Raetic language and to the Lemnian language. A single family may also be an isolate. In the case of the non-Austronesian indigenous languages of Papua New Guinea and the indigenous languages of Australia, there is no published linguistic hypothesis supported by any evidence that these languages have links to any other families. Nevertheless, an unknown Urheimat is implied. The entire Indo-European family itself is a language isolate: no further connections are known. This lack of information does not prevent some professional linguists from formulating additional hypothetical nodes (Nostratic) and additional homelands for the speakers.

Shared urheimats

Other circumstances can also complicate the matter. For example, in places where language families meet, like the interface of the Nilo-Saharan and Afro-Asiatic language families in Western Ethiopia, the relationship between a group that speaks a language and the Urheimat for that language is complicated because "processes of migration, language shift and group absorption are documented by linguists and ethnographers" in groups that are themselves "transient and plastic."[136]

Also, over a sufficient period of time, in the absence of evidence of intermediary steps in the process, it may be impossible to observe linkages between languages that have a shared urheimat. This general concern is a manifestation of the larger issue of "time depth" in historical linguistics.[137] For example, while the evidence from genetics, archeology and historical climate change strongly points to a relatively small number of waves of migration in a fairly short time period from Asia to the Americas,[138] there continues to be intense controversy regarding the classification of the indigenous languages of the Americas, for which there is little direct evidence because all but a couple of those languages were not written in the pre-Columbian era. The same is true of the languages of Australia and New Guinea, whose history of human migration and contact is also well documented.[139] Given enough time, natural change in isolated languages can obliterate any meaningful linguistic evidence of a known common genetic source for the languages.

