Language.

Language is a structure of meanings systematically applied to particular sounds and symbols, which are then consistently related through an organized grammar. The structure and function of language have been studied and discussed in a number of scholarly disciplines. While we recommend that our users maintain a working knowledge of the scholarly literature on language, a few of its conclusions are of particular interest to the use of Raven’s Eye. Therefore, we explain them in these Technicals.

Foremost is the intimate and influential relationship between language, cognition, and perception. For much of the 20th century, scholars debated whether our perceptions, and the types and forms of our thoughts about them, are dictated by the concepts available in a particular language. As the scholarship in this area has grown, a consensus has arisen that, in general, variation in the concepts available within a given language on a particular topic influences (but does not absolutely determine) both general variation in the perceptions and experiences that a person may have with respect to that topic, and general variation in the thoughts expressed about those experiences. In other words, while our particular language influences both the specific thoughts we have about a given experience and the manner in which we express these thoughts, the particularities of our language do not preclude us from experiencing aspects of the topic for which we have no words (see Note 1).

Lexical Hypothesis.

The lexical hypothesis proposes that languages will contain words for objects, events, and ideas that are common to the experiences of their respective speakers. It further proposes a positive relation between the commonality, centrality, or frequency of an experience to the speakers of a language, and the number of extant words available to describe various aspects of that experience (see Note 2). In this way, the lexical hypothesis proposes that the words and concepts that make up a given language will be influenced by the everyday experiences and environmental contingencies of those who speak that language.

In psychology, the lexical hypothesis has been utilized to compare the experiences of people within a given language group, and to compare such individually varying traits as cognitive tendencies, motivation, and personality. When the lexical hypothesis is applied to individuals, it is the commonality, centrality, or frequency of a particular individual’s experiences or traits, relative to those of the wider group of language speakers, that influences general variation in the type and frequency of the words that individual expresses. In this way, the lexical hypothesis is extended to propose that variation in the frequency of word usage within a given group of language speakers reflects individual variation in both experiences and psychological traits.

The lexical hypothesis can, therefore, be utilized to identify the relative variation in word or concept use, according to groups, individuals, and experiences.

Language corpora.

A corpus is a collection of written documents; more than one such collection is referred to as corpora. A language corpus is an aggregated body of work, which is gathered together to facilitate the identification of frequently used words and their forms. Lists or tables of words, parts of speech, and other linguistic features are typically derived from such corpora, and are often organized according to their frequency of appearance in the corpus at hand.
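As a rough illustration of how a frequency-ordered word list can be derived from a corpus, the following Python sketch counts words across a tiny, made-up collection of documents. The function name `word_frequencies`, the two example sentences, and the simple `\w+` tokenization are all assumptions chosen for illustration; they are not a description of how Raven's Eye builds its own corpora.

```python
import re
from collections import Counter


def word_frequencies(documents):
    """Count how often each lowercased word appears across all documents."""
    counts = Counter()
    for text in documents:
        # \w+ matches runs of letters, digits, and underscores.
        # Real corpora usually need language-specific tokenization.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# A hypothetical two-document "corpus", invented for this example.
corpus = [
    "The horse ran across the field.",
    "A horse and a mare grazed in the field.",
]

freq = word_frequencies(corpus)

# Print the words ordered from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

The number of distinct keys in `freq` corresponds to a count of unique words, and the sum of its values to a count of total words.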

While language corpora serve many functions, in Raven’s Eye, they serve primarily as a background pool of words against which to compare an acquired natural language sample. When combined with the lexical hypothesis, these corpora facilitate the identification of words and themes that are relatively essential to your acquired natural language sample (and, as an extension, to your study).
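To make the idea of comparing a sample against a background pool more concrete, here is a minimal Python sketch. It computes, for each word in a sample, the ratio of its relative frequency in the sample to its relative frequency in a background corpus. The function name `relative_frequency_ratios`, the add-one smoothing, and all of the counts are hypothetical and chosen only for illustration; the actual comparison performed by Raven's Eye may differ.

```python
from collections import Counter


def relative_frequency_ratios(sample_counts, background_counts, smoothing=1.0):
    """Return {word: sample relative frequency / background relative frequency}.

    Additive smoothing (an assumption of this sketch) keeps words that are
    absent from the background from causing a division by zero.
    """
    vocab_size = len(set(sample_counts) | set(background_counts))
    sample_total = sum(sample_counts.values()) + smoothing * vocab_size
    background_total = sum(background_counts.values()) + smoothing * vocab_size

    ratios = {}
    for word, count in sample_counts.items():
        p_sample = (count + smoothing) / sample_total
        p_background = (background_counts.get(word, 0) + smoothing) / background_total
        ratios[word] = p_sample / p_background
    return ratios


# Hypothetical counts, invented for this example.
background = Counter({"the": 50, "and": 30, "field": 19, "horse": 1})
sample = Counter({"the": 5, "horse": 4, "field": 1})

ratios = relative_frequency_ratios(sample, background)

# Words with the largest ratios are over-represented in the sample
# relative to the background corpus.
for word in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{word}: {ratios[word]:.2f}")
```

In this toy example, "horse" is rare in the background but common in the sample, so it receives the largest ratio, whereas "the" is common in both and its ratio stays close to 1.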

Currently, Raven's Eye maintains corpora in 65 languages. These include:

| Language | Unique Words | Total Words |
|---|---|---|
| Afrikaans | 430,000 | 25,020,000 |
| Amharic | 90,000 | 1,450,000 |
| Arabic | 2,140,000 | 215,430,000 |
| Assamese | 170,000 | 5,000,000 |
| Azerbaijani | 770,000 | 40,250,000 |
| Bengali | 610,000 | 29,100,000 |
| Bulgarian | 1,370,000 | 146,950,000 |
| Burmese | 420,000 | 6,540,000 |
| Croatian | 1,270,000 | 75,360,000 |
| Czech | 2,260,000 | 228,360,000 |
| Dutch | 3,730,000 | 545,070,000 |
| English | 16,780,000 | 4,812,300,000 |
| Farsi | 2,010,000 | 188,060,000 |
| Finnish | 2,600,000 | 155,040,000 |
| French | 6,030,000 | 1,436,790,000 |
| German | 9,670,000 | 1,441,420,000 |
| Greek | 1,130,000 | 85,860,000 |
| Gujarati | 230,000 | 8,910,000 |
| Hebrew | 1,610,000 | 183,610,000 |
| Hindi | 630,000 | 41,810,000 |
| Hungarian | 2,910,000 | 246,590,000 |
| Icelandic | 300,000 | 12,800,000 |
| Indonesian | 1,540,000 | 140,250,000 |
| Irish | 160,000 | 6,950,000 |
| Italian | 4,150,000 | 934,100,000 |
| Javanese | 290,000 | 11,530,000 |
| Kannada | 640,000 | 17,350,000 |
| Kazakh | 880,000 | 54,570,000 |
| Korean | 3,250,000 | 145,340,000 |
| Kurdish | 180,000 | 4,850,000 |
| Luxembourgish | 370,000 | 14,150,000 |
| Malagasy | 470,000 | 14,290,000 |
| Malay | 820,000 | 74,620,000 |
| Malayalam | 800,000 | 20,510,000 |
| Marathi | 340,000 | 12,880,000 |
| Marathi | 310,000 | 12,880,000 |
| Nepali | 320,000 | 13,660,000 |
| Norwegian | 660,000 | 38,690,000 |
| Oriya | 190,000 | 5,930,000 |
| Polish | 3,780,000 | 482,010,000 |
| Portuguese | 2,980,000 | 503,400,000 |
| Punjabi | 240,000 | 9,540,000 |
| Pushto | 140,000 | 3,800,000 |
| Romanian | 1,610,000 | 147,670,000 |
| Russian | 7,900,000 | 999,930,000 |
| Sindhi | 90,000 | 2,380,000 |
| Slovenian | 1,070,000 | 72,380,000 |
| Spanish | 4,620,000 | 1,021,060,000 |
| Sundanese | 180,000 | 8,210,000 |
| Swahili | 190,000 | 7,650,000 |
| Swedish | 4,980,000 | 874,690,000 |
| Tagalog | 290,000 | 13,090,000 |
| Tajik | 260,000 | 14,530,000 |
| Tamil | 880,000 | 32,090,000 |
| Tatar | 600,000 | 24,940,000 |
| Telugu | 780,000 | 30,610,000 |
| Thai | 1,280,000 | 37,700,000 |
| Tibetan | 60,000 | 440,000 |
| Turkish | 190,000 | 3,890,000 |
| Ukrainian | 3,890,000 | 353,510,000 |
| Urdu | 590,000 | 38,440,000 |
| Uzbek | 380,000 | 25,020,000 |
| Vietnamese | 1,870,000 | 289,930,000 |
| Yiddish | 90,000 | 3,680,000 |
| Yoruba | 100,000 | 3,240,000 |
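One way to read the Unique Words and Total Words figures together is as a ratio of unique words to total words. The Python sketch below computes that ratio for three of the corpora listed above; the helper name `unique_to_total_ratio` is hypothetical, and the ratio itself is offered only as an illustrative calculation, not as a statistic that Raven's Eye is known to report.

```python
def unique_to_total_ratio(unique_words, total_words):
    """Return the share of a corpus's running words that are distinct words."""
    return unique_words / total_words


# (unique words, total words) for three corpora, copied from the list above.
corpora = {
    "English": (16_780_000, 4_812_300_000),
    "Finnish": (2_600_000, 155_040_000),
    "Tibetan": (60_000, 440_000),
}

for language, (unique_words, total_words) in corpora.items():
    ratio = unique_to_total_ratio(unique_words, total_words)
    print(f"{language}: {ratio:.4f}")
```

This ratio generally falls as a corpus grows, because new text mostly repeats words already seen, so differences between languages here partly reflect corpus size rather than the languages themselves.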

Notes.

1 In such instances, one may modify existing words (e.g., “blueish,” for something similar to blue, but not quite blue in the sense that the existing word describes), borrow a word from another language, create neologisms, or produce similes or metaphors that utilize words used to describe somewhat similar experiences.

2 For instance, languages from areas where horses have been found historically tend to have many words to describe horses and their various attributes. However, languages from areas without the historic presence of horses often do not contain a similarly diverse set of words to describe such attributes, if they contain any at all.