Language.
Foremost is the intimate and influential relationship between language, cognition, and perception. For much of the 20th century, scholars debated whether our perceptions, and the types and forms of our thoughts about them, are dictated by the concepts available in a particular language. As scholarship in this area has grown, a consensus has arisen that variation in the concepts a language makes available on a particular topic influences, but does not absolutely determine, both the perceptions and experiences a person may have with respect to that topic and the thoughts expressed about those experiences. In other words, while our particular language influences both the specific thoughts we have about a given experience and the manner in which we express them, the particularities of our language do not preclude us from experiencing aspects of a topic for which we have no words.¹
Lexical Hypothesis.
In psychology, the lexical hypothesis has been used to compare the experiences of people within a given language group, and to compare individually varying traits such as cognitive tendencies, motivation, and personality. When the lexical hypothesis is applied to individuals, it is the commonality, centrality, or frequency of a particular individual’s experiences or traits, relative to those of the wider group of language speakers, that influences the type and frequency of the words that individual uses. In this way, the lexical hypothesis is extended to propose that variation in the frequency of word usage within a given group of language speakers reflects individual variation in both experiences and psychological traits.
The lexical hypothesis can, therefore, be utilized to identify the relative variation in word or concept use, according to groups, individuals, and experiences.
Language corpora.
While language corpora serve many functions, in Raven’s Eye, they serve primarily as a background pool of words against which to compare an acquired natural language sample. When combined with the lexical hypothesis, these corpora facilitate the identification of words and themes that are relatively essential to your acquired natural language sample (and, as an extension, to your study).
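To make the idea of a background pool concrete, the following is a minimal sketch of one way a sample could be compared against a background corpus. It is an illustrative assumption, not a description of how Raven's Eye works: the function names, the toy `background` counts, the simple English-only tokenizer, and the add-one smoothing are all hypothetical choices made for this example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (English-only assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overrepresented_words(sample_text, background_counts, top_n=5):
    """Rank sample words by their sample frequency relative to the background.

    background_counts is a hypothetical mapping of word -> count in the
    background corpus. Add-one smoothing keeps words that are missing from
    the background from causing a division by zero.
    """
    sample_freq = relative_frequencies(tokenize(sample_text))
    background_total = sum(background_counts.values())
    vocabulary_size = len(background_counts)
    ratios = {}
    for word, freq in sample_freq.items():
        smoothed = (background_counts.get(word, 0) + 1) / (
            background_total + vocabulary_size
        )
        ratios[word] = freq / smoothed
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy background counts, invented purely for illustration.
background = {"the": 5000, "and": 3000, "horse": 5, "saddle": 1, "is": 2000}
sample = "The horse and the saddle: the horse is calm."
ranked = overrepresented_words(sample, background)
```

In this toy run, words that are rare in the background but present in the sample ("calm", "saddle", "horse") rank above very common words such as "the", which is the sense in which a background corpus helps surface words that are relatively distinctive of a sample.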
Currently, Raven's Eye maintains corpora in 65 languages. These include:
Language | Unique Words | Total Words |
Afrikaans | 430,000 | 25,020,000 |
Amharic | 90,000 | 1,450,000 |
Arabic | 2,140,000 | 215,430,000 |
Assamese | 170,000 | 5,000,000 |
Azerbaijani | 770,000 | 40,250,000 |
Bengali | 610,000 | 29,100,000 |
Bulgarian | 1,370,000 | 146,950,000 |
Burmese | 420,000 | 6,540,000 |
Croatian | 1,270,000 | 75,360,000 |
Czech | 2,260,000 | 228,360,000 |
Dutch | 3,730,000 | 545,070,000 |
English | 16,780,000 | 4,812,300,000 |
Farsi | 2,010,000 | 188,060,000 |
Finnish | 2,600,000 | 155,040,000 |
French | 6,030,000 | 1,436,790,000 |
German | 9,670,000 | 1,441,420,000 |
Greek | 1,130,000 | 85,860,000 |
Gujarati | 230,000 | 8,910,000 |
Hebrew | 1,610,000 | 183,610,000 |
Hindi | 630,000 | 41,810,000 |
Hungarian | 2,910,000 | 246,590,000 |
Icelandic | 300,000 | 12,800,000 |
Indonesian | 1,540,000 | 140,250,000 |
Irish | 160,000 | 6,950,000 |
Italian | 4,150,000 | 934,100,000 |
Javanese | 290,000 | 11,530,000 |
Kannada | 640,000 | 17,350,000 |
Kazakh | 880,000 | 54,570,000 |
Korean | 3,250,000 | 145,340,000 |
Kurdish | 180,000 | 4,850,000 |
Luxembourgish | 370,000 | 14,150,000 |
Malagasy | 470,000 | 14,290,000 |
Malay | 820,000 | 74,620,000 |
Malayalam | 800,000 | 20,510,000 |
Marathi | 340,000 | 12,880,000 |
Marathi | 310,000 | 12,880,000 |
Nepali | 320,000 | 13,660,000 |
Norwegian | 660,000 | 38,690,000 |
Oriya | 190,000 | 5,930,000 |
Polish | 3,780,000 | 482,010,000 |
Portuguese | 2,980,000 | 503,400,000 |
Punjabi | 240,000 | 9,540,000 |
Pushto | 140,000 | 3,800,000 |
Romanian | 1,610,000 | 147,670,000 |
Russian | 7,900,000 | 999,930,000 |
Sindhi | 90,000 | 2,380,000 |
Slovenian | 1,070,000 | 72,380,000 |
Spanish | 4,620,000 | 1,021,060,000 |
Sundanese | 180,000 | 8,210,000 |
Swahili | 190,000 | 7,650,000 |
Swedish | 4,980,000 | 874,690,000 |
Tagalog | 290,000 | 13,090,000 |
Tajik | 260,000 | 14,530,000 |
Tamil | 880,000 | 32,090,000 |
Tatar | 600,000 | 24,940,000 |
Telugu | 780,000 | 30,610,000 |
Thai | 1,280,000 | 37,700,000 |
Tibetan | 60,000 | 440,000 |
Turkish | 190,000 | 3,890,000 |
Ukrainian | 3,890,000 | 353,510,000 |
Urdu | 590,000 | 38,440,000 |
Uzbek | 380,000 | 25,020,000 |
Vietnamese | 1,870,000 | 289,930,000 |
Yiddish | 90,000 | 3,680,000 |
Yoruba | 100,000 | 3,240,000 |
Notes.
2 For instance, languages from regions where horses have historically been present tend to have many words for horses and their various attributes. Languages from regions without a historic presence of horses often lack a similarly diverse set of such words, or have none at all.