Raven's Eye | Quantitative foundations

Quantitative methods.

As noted in General tendencies and attitudes, the apparent majority of social scientists analyze verbal information quantitatively, but in doing so generally exclude the analysis of natural language in its original form from their methodological purview. They do so for various heretofore valid philosophical and statistical reasons. Surveying all of them is beyond the scope of these Technicals, but most are covered by popular graduate-level research methods textbooks, in their discussions of such topics as levels of measurement and scaling, and of the distinction between variables amenable to parametric and to non-parametric statistics.

We write “heretofore” in the preceding paragraph for a reason, however. With Quantitative Phenomenology and Raven’s Eye, we present a hybrid method that unites the quantitative and qualitative analysis of natural language, and in doing so dissolves the necessity of their previous distinction in this area. To understand how we do so, it is useful to describe two of the few previous quantitative approaches to natural language analysis in the social and information sciences: word frequency approaches, and word relationship approaches. Raven’s Eye integrates advanced forms of these two basic approaches into its algorithms.

Word frequency.

Raven’s Eye utilizes various aspects of word frequency as a foundation for its algorithms. Word frequency approaches generally incorporate the lexical hypothesis as a basic attitude toward their data. In other words, researchers utilizing word frequencies in their research often assume that the frequency of a given word in a set of responses (or language corpus) is positively related to the relative importance, pertinence, or centrality of that word to the stimuli at hand. Generally, such word frequency approaches proceed by either: (a) identifying a list of specific words, or researcher-developed word classifications, and then counting the number of their occurrences found in a pool of natural language, or (b) developing a rank-ordered list of all words, based on each word’s frequency in the pool of natural language being analyzed.
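As a minimal sketch of the two general procedures just described, the following Python example counts a researcher-chosen word list (a) and builds a ranked frequency list (b). The function names, the simple tokenizing rule, and the sample sentence are our own illustrative assumptions; this is not the Raven’s Eye implementation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def count_listed_words(text, word_list):
    """Approach (a): count occurrences of a researcher-chosen list of words."""
    counts = Counter(tokenize(text))
    return {word: counts[word] for word in word_list}

def ranked_frequencies(text):
    """Approach (b): list every word with its frequency, most frequent first."""
    return Counter(tokenize(text)).most_common()

# Hypothetical sample text, for illustration only.
sample = "The raven saw the fox. The fox saw nothing."
print(count_listed_words(sample, ["raven", "fox", "crow"]))
print(ranked_frequencies(sample)[:3])
```

Approach (a) reports a count of zero for any listed word that never occurs, whereas approach (b) only ever lists words that are present in the sample.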

Due to technological limitations, for much of the 20th century word frequency approaches were time-intensive and laborious. Perhaps because of this, they were relatively seldom performed.

Moreover, due to perceived limitations in word frequency data, conclusions drawn from early studies using such methods remained statistically and nomothetically limited. Statistical limitations arose from both the form of word frequency distributions and the inherent contingency of words when expressed in natural language. The frequency distribution of words in a given sample of natural language is known to follow Zipf’s law, and will thus be non-normally distributed. The expression of a particular word is also contingent on the word that precedes it, as well as on the other words that contextualize it. For instance, the probability approaches nil that the word “nil” in this sentence will directly precede the word “flower.” However, the probability that the word “the” will precede the word “word” is relatively high. Therefore, the frequency of a given word is in part contingent on the frequency of those words that contextualize it. Because of the apparently chaotic, multifactorially contingent, and non-normal distribution of words in natural language, approaches relying solely on word frequency were (and remain) of relatively limited statistical use.
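The contingency of a word on the word that precedes it can be illustrated with simple bigram counts. The sketch below estimates, from a single sample, how often one word is directly followed by another. The function names and the sample sentence are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def next_word_probability(tokens, previous, current):
    """Estimate P(next word is `current` | this word is `previous`) from bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    starts = sum(n for (first, _), n in bigrams.items() if first == previous)
    if starts == 0:
        return 0.0  # `previous` never occurs before another word in this sample
    return bigrams[(previous, current)] / starts

# Hypothetical sample text, for illustration only.
tokens = tokenize("The frequency of the word depends on the word before the word")
print(next_word_probability(tokens, "the", "word"))    # 3 of the 4 uses of "the" precede "word"
print(next_word_probability(tokens, "nil", "flower"))  # "nil" never occurs in this sample
```

Estimates of this kind depend heavily on the size and content of the sample, which is one reason raw counts alone are hard to interpret statistically.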

Nomothetically, then, while a word might be frequent in a natural language sample, and thus describe that sample in some way, it was difficult to know whether that frequency meant something when compared to other samples. One could utilize nonparametric procedures to calculate the probability of a difference in the frequency of word use between two or more independent samples, but this has historically been performed one word at a time.
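One common nonparametric procedure of this kind is a Pearson chi-square test on a 2×2 table (the word versus all other words, in sample A versus sample B). The sketch below is a generic illustration using only the Python standard library; the function name and the counts in the usage lines are invented for the example, and no continuity correction is applied.

```python
import math

def chi_square_word_difference(count_a, total_a, count_b, total_b):
    """Pearson chi-square test (1 degree of freedom) for one word in two samples.

    count_x is the number of occurrences of the word in sample x;
    total_x is the total number of word tokens in sample x.
    Assumes the word occurs at least once overall and is not the only word,
    so that no expected cell count is zero.
    """
    table = [[count_a, total_a - count_a],
             [count_b, total_b - count_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented counts: the word occurs 30 times in 1000 tokens versus 10 times in 1000 tokens.
chi2, p = chi_square_word_difference(30, 1000, 10, 1000)
print(round(chi2, 3), p)
```

Repeating this test for every word in a vocabulary is what makes the one-word-at-a-time approach laborious, and it also raises the usual multiple-comparison concerns.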

Raven’s Eye resolves most of these previous difficulties through its standardized algorithmic approach, which relies on word proportionality rather than raw frequency, and then compares this proportionality to language corpora. In doing so, it allows for standard statistical and nomothetic comparison of themes arising from natural language data.
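To make the general idea of proportionality concrete, the sketch below converts raw counts to proportions and then expresses each sample proportion relative to the same word’s proportion in a reference corpus. This is only an illustration under our own assumptions: the ratio used here, the function names, and the toy token lists are hypothetical, and the sketch does not reproduce the Raven’s Eye algorithms.

```python
from collections import Counter

def word_proportions(tokens):
    """Convert raw word counts into proportions of all tokens in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def proportion_ratios(sample_tokens, reference_tokens):
    """Ratio of each word's proportion in the sample to its proportion in the reference.

    Words absent from the reference are skipped, because the ratio is undefined for them.
    """
    sample = word_proportions(sample_tokens)
    reference = word_proportions(reference_tokens)
    return {word: p / reference[word] for word, p in sample.items() if word in reference}

# Toy data, invented for illustration; a real reference corpus would be far larger.
sample_tokens = "the raven saw the fox".split()
reference_tokens = "the the the the a fox ran and the dog".split()
print(word_proportions(sample_tokens))
print(proportion_ratios(sample_tokens, reference_tokens))
```

Because proportions do not grow with the length of a sample, they can be compared across samples of different sizes in a way that raw frequencies cannot.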

Word relationships.

An ongoing critique of research based on simple word frequency lists is that such approaches decontextualize the words in natural language expressions. To resolve this, investigators have focused on the relationships between words in natural language, rather than focusing primarily on their frequency. Several approaches to understanding the relationships between words exist, each with its own merits and limitations. We divide them here into those that focus on semantic networks, and the Key-Word-In-Context method.

Semantic networks.

Several approaches attempt to reach a greater understanding of the cognitive networks that underlie and connect words and their associated ideas within the human mind. Increasingly, these approaches utilize Bayesian statistics, factor analysis, and multidimensional scaling to generate graphical representations that depict the multiple ways in which certain words are dependent on others, or cluster together. Others use a combination of qualitative and quantitative approaches to generate graphical representations, which then often utilize a variety of arrows and expressions to relay the specific ways in which words in a natural language sample are related. Generally, such networks attempt to represent the relationships between ideas or concepts by delineating their propositions and productions.

While a variety of graphical semantic networks can be constructed from the data produced by Raven’s Eye, our focus is on revealing these semantic relations functionally. We do so by constructing natural language expressions, which we call revelatory themes. These revelatory themes both consist of, and proportionally represent, the semantic relations found in the original data as they are expressed in natural language. From these revelatory themes, we further facilitate the identification of their common and essential functional semantic structure. To do so, we utilize an advanced form of the venerable Key-Word-In-Context method, which we call CORVID.

Key-Word-In-Context (KWIC) method.

The Key-Word-In-Context, or KWIC, method is a means of comparing the contexts of particular words in a text, such that one can derive the specific ways in which a given word relates to other words. Variations of the KWIC method exist, but all begin by searching for every instance of a given word in a natural language sample. Each instance of the word in the sample is listed along with either the sentence containing the word, or some set number of words surrounding the word in the original natural language sample. Investigators then analyze the key word with respect to its surrounding words, in an effort to identify whether or not there are consistent relationships between the key word and other specific words.
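The basic KWIC procedure described above can be sketched in a few lines of Python. This is a generic illustration of the fixed-window variant, not the CORVID method; the function name, the `window` parameter, and the sample text are our own assumptions.

```python
import re

def kwic(text, keyword, window=3):
    """List every instance of `keyword` with up to `window` words on each side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]    # words preceding the key word
            right = tokens[i + 1:i + 1 + window]   # words following the key word
            rows.append((" ".join(left), token, " ".join(right)))
    return rows

# Hypothetical sample text, for illustration only.
sample = "The raven saw the fox. The fox saw nothing."
for left, key, right in kwic(sample, "fox", window=2):
    print(f"{left:>12} | {key} | {right}")
```

Aligning the key word in a central column, as the print loop does, makes recurring neighbors on either side easier to spot by eye.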

Variants of the KWIC method are integrated in Raven's Eye.
