2. Get a sense of the whole.

Once you have acquired your dataset, prepare it and upload it to the Raven’s Eye website according to the procedures discussed in our Practicals. While you prepare and upload your data, begin to get a sense of the whole by noting any obvious, general trends or patterns that emerge without deep or substantial inspection¹. Once your data has been uploaded, deepen that sense of the whole by identifying its gists and the geists acting on it. Doing so involves a brief visual inspection of your metadata, main chart, main table, and the subsample comparisons table for a few of your variables. You may therefore want to review the pages of the Practicals focused on understanding your results and comparing your variables.

The gists and geists identified in the main table become apparent through comparison of the pool of words in your natural language responses with those in our multimillion- to multibillion-word background corpora. The gists and geists identified in the subsample comparisons table, however, result from comparison against both the remainder of your natural language sample (column 7) and our corpora (column 8).

On this page we describe in detail how to get a sense of the whole. However, this step in the procedures for conducting a Quantitative Phenomenology is not meant to be exhaustive; its purpose is to gain an understanding of the general form of your data. Indeed, as you become practiced, this step should take substantially less time to complete than it takes to read about.

2.1 Identify overrepresented gists.

Examine column 5 of the main table and, as appropriate, columns 7 and 8 of the subsample comparisons table. These columns automatically reveal the relative overrepresentation of each word in your natural language sample. The number in the Overrepresentation and Variable Overrepresentation columns (i.e., column 5, or columns 7 and 8, depending on the table you are viewing) is the quotient produced when the proportion of a term in the sample is divided by the proportion of that same term in the background corpus (columns 5 and 8) or in the non-selected portion of the sample (column 7). Because it is a quotient, a word’s overrepresentation is also the factor by which the word’s proportion in the comparison pool must be multiplied to arrive at its proportion in the sample. For example, a word that makes up 0.5% of the sample and 0.05% of the background corpus has an overrepresentation of 10. Overrepresentation therefore expresses how many times more frequently, in proportional terms, a given word occurs in a particular sample than in the background corpus (or in the non-selected remainder of the sample).
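The quotient described above can be sketched in a few lines of Python. This is a minimal illustration, not Raven’s Eye’s own implementation: it assumes responses have already been tokenized into lowercase words, the corpus proportions are invented toy values standing in for the background corpora, and the function names `proportions` and `overrepresentation` are hypothetical. Words absent from the comparison pool are simply skipped, since the quotient is undefined there.

```python
from collections import Counter


def proportions(words):
    """Return each word's share of the whole word pool (count / total)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def overrepresentation(sample_props, comparison_props):
    """Divide each word's sample proportion by its comparison proportion.

    Words absent from the comparison pool are skipped, because the
    quotient is undefined when the denominator is zero.
    """
    return {
        word: prop / comparison_props[word]
        for word, prop in sample_props.items()
        if comparison_props.get(word, 0) > 0
    }


# Hypothetical toy data: a tiny sample and invented corpus proportions.
sample = "the storm the storm was loud the wind".split()
corpus_props = {"the": 0.06, "storm": 0.0001, "was": 0.01,
                "loud": 0.0002, "wind": 0.0003}

sample_props = proportions(sample)
vs_corpus = overrepresentation(sample_props, corpus_props)  # columns 5/8 analogue

# Column 7 analogue: a selected subsample against the non-selected remainder.
selected = "storm wind storm the".split()
remainder = "the the was loud storm".split()
vs_remainder = overrepresentation(proportions(selected), proportions(remainder))
```

Here "storm" makes up 25% of the toy sample and 0.01% of the toy corpus, so its quotient is about 2500; multiplying the corpus proportion by that quotient recovers the sample proportion.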

As the value in a cell approaches 1.0, its word is used at roughly the same proportion as in the comparison pool. As the value rises above 1.0, its word is increasingly overrepresented, and therefore increasingly pertinent to the specific state or class of phenomenon being studied. As the value falls below 1.0, its word is increasingly underrepresented, and therefore less specific to the phenomenon being studied.

Identify those terms that are overrepresented to a degree sufficient for the context and purpose of your specific project². These overrepresented terms will comprise the gists of your natural language sample.
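Reading a quotient relative to 1.0 and keeping only those terms above a cut-off can be sketched as follows. The `tolerance` band around 1.0 and the `threshold` value are hypothetical parameters chosen for illustration; the procedure itself leaves the degree of overrepresentation to the context and purpose of each project, and the quotients below are invented stand-ins for values read from column 5 of the main table.

```python
def classify(quotient, tolerance=0.05):
    """Describe a quotient relative to 1.0, i.e. parity with the comparison pool."""
    if quotient > 1.0 + tolerance:
        return "overrepresented"
    if quotient < 1.0 - tolerance:
        return "underrepresented"
    return "near parity"


def select_gists(quotients, threshold):
    """Keep words whose quotient meets a project-specific threshold,
    ordered from most to least overrepresented."""
    kept = [(word, q) for word, q in quotients.items() if q >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical quotients, standing in for values read from column 5.
quotients = {"storm": 2500.0, "was": 12.5, "the": 6.25,
             "wind": 416.7, "of": 1.02, "a": 0.4}
labels = {word: classify(q) for word, q in quotients.items()}
gists = select_gists(quotients, threshold=100.0)
```

With a threshold of 100, only "storm" and "wind" are retained; a lower threshold would admit more terms, which is why the cut-off should be justified by the project rather than fixed in advance.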

Often, if a specific word is highly overrepresented, several of its inflections and determiners will also be overrepresented in your data. Those inflections and determiners present in the data together form one gist³, which is labeled with the overrepresented inflection that has the highest proportion in the natural language sample (column 3 of the main table).

To inspect your gists for related inflections and determiners, scroll down the main table. Alternatively, you can inspect for word inflections by clicking on the Word column (column 1) of the table, which arranges the words alphabetically.
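Grouping inflections into a gist and labeling it can be sketched as below. This assumes you supply the grouping yourself through a hand-built mapping (the `lexeme_of` dictionary is hypothetical, not something Raven’s Eye exports); any determiners you judge to belong to a gist can be added to the same mapping. Each gist takes as its label the member with the highest proportion, and members are listed alphabetically, as when sorting the Word column.

```python
from collections import defaultdict


def group_into_gists(rows, lexeme_of):
    """Group overrepresented inflections into gists.

    rows: (word, proportion, quotient) tuples, loosely mirroring
          columns 1, 3 and 5 of the main table.
    lexeme_of: hypothetical hand-built map from an inflection to a shared
               key; words without an entry form a gist of their own.
    Each gist is labeled with its member that has the highest proportion.
    """
    groups = defaultdict(list)
    for word, proportion, quotient in rows:
        groups[lexeme_of.get(word, word)].append((word, proportion, quotient))

    gists = {}
    for members in groups.values():
        label = max(members, key=lambda member: member[1])[0]
        # Alphabetical order mirrors sorting the Word column.
        gists[label] = sorted(member[0] for member in members)
    return gists


# Hypothetical rows (proportions as percentages) and mapping.
rows = [("storms", 0.8, 900.0), ("storm", 1.2, 2500.0),
        ("stormed", 0.3, 300.0), ("wind", 0.5, 416.7)]
lexeme_of = {"storm": "storm", "storms": "storm", "stormed": "storm"}
gists = group_into_gists(rows, lexeme_of)
```

Because "storm" has the highest proportion of the three inflections, it becomes the label for the gist containing "storm", "stormed", and "storms".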

2.2 Identify potentially influential geists.

We operationalize the identification of potentially influential geists by accounting for the temporal, spatial, and demographic context of the natural language sample. We label these three forms of potential geist zeitgeist, ortgeist, and kulturgeist, respectively. Explicating the temporal, spatial, and cultural context of the natural language sample in this manner facilitates a greater sense of its whole.

2.2.1 Metadata.

Raven’s Eye presents general measures of language use complexity in the Meta dropdown menu. These metadata consist of the Flesch Reading Ease Score for the sample, the Flesch-Kincaid Grade Level of the sample, the total number of sentences, words, and syllables in the sample, and the average number of words in each cell of the sample. These metadata provide a sense of the complexity and length of responses, and as such may provide insight into the general cognitive abilities and tendencies of the person or people providing the natural language sample. Such data are also known to be influenced by the times, spaces, and cultures in which natural language is produced. Thus, while the average cell word count can provide information about such things as verbosity, the relative importance of the topic, or the relative motivation of the people supplying the natural language sample, it is also influenced by geist-related factors. Without further analysis, the metadata reflect some combination of these influences. As you inspect these other factors in the following substeps, watch for corresponding and substantial differences in the metadata measures. Also note any differences that appear without an obvious association.
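To see how the complexity measures follow from the totals, the sketch below applies the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. We assume, without confirmation, that Raven’s Eye uses these standard coefficients; its sentence, word, and syllable counting may differ. The subsample totals are hypothetical, chosen to show how comparing metadata across, say, two time periods can surface differences worth noting.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level (approximate U.S. school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def metadata(words, sentences, syllables, cells):
    """Bundle the summary measures for one sample or subsample."""
    return {
        "reading_ease": flesch_reading_ease(words, sentences, syllables),
        "grade_level": flesch_kincaid_grade(words, sentences, syllables),
        "avg_words_per_cell": words / cells,
    }


# Hypothetical totals for two subsamples, e.g. split by time period.
early = metadata(words=1200, sentences=100, syllables=1800, cells=60)
late = metadata(words=900, sentences=50, syllables=1260, cells=60)
```

In this toy case the later subsample has shorter cells on average (15 words versus 20) yet a higher grade level (about 7.95 versus 6.79), because its sentences are longer; a contrast like this is the kind of difference the substeps below ask you to watch for.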

2.2.2 Zeitgeist.

The potential influence of time on your natural language sample can be understood by making its timeframe explicit. To do so, define the span of time involved in its production. If that span is not instantaneous, inspect for patterns readily apparent across it. Occasionally, the influence of time can also be readily identified in the main table through differential rates of grammatical tense, or through the overrepresentation of anachronistic terms or, conversely, of neologisms. During this step, it is important to note those potential time-based influences not otherwise already accounted for by your project’s existing variables.
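One way to look for differential rates of grammatical tense across a timeframe is sketched below. It assumes each response can be tagged with a period, and it uses small hypothetical marker lists as a deliberately crude proxy for tense; a real analysis would more likely rely on a part-of-speech tagger. Note that the regular past form "rained" in the toy data is not counted, which illustrates the proxy’s limits.

```python
from collections import defaultdict

# Hypothetical marker lists: a deliberately crude proxy for tense,
# not a substitute for a part-of-speech tagger.
PAST_MARKERS = {"was", "were", "had", "did"}
PRESENT_MARKERS = {"is", "are", "am", "has", "does"}


def past_share_by_period(responses):
    """Share of tense markers that are past-tense, per time period.

    responses: iterable of (period, text) pairs. Periods in which no
    marker occurs are omitted from the result.
    """
    past = defaultdict(int)
    present = defaultdict(int)
    for period, text in responses:
        for token in text.lower().split():
            token = token.strip(".,;:!?\"'")
            if token in PAST_MARKERS:
                past[period] += 1
            elif token in PRESENT_MARKERS:
                present[period] += 1
    periods = set(past) | set(present)
    return {p: past[p] / (past[p] + present[p]) for p in periods}


responses = [
    ("2019", "It was calm and we had time."),
    ("2019", "The river is high."),
    ("2021", "It is loud. They are tired and it has rained."),
]
shares = past_share_by_period(responses)
```

A large shift in the past-tense share between periods would be a cue to consider whether time is acting as an unaccounted-for geist.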

2.2.3 Ortgeist.

The potential influence of spaces and environments on natural language expression can be better understood by making explicit the environment in which the sample was produced. To get a sense of potential ortgeist influences on the whole, describe the general environmental conditions under which the natural language sample was expressed. Next, inspect for patterns readily apparent across the spaces or geographies involved in your project. As with the previous substep, it is important to note those potential ortgeists not otherwise already accounted for by your project’s existing spatial variables. Occasionally, environmental influence can also be readily identified in the main and subsample comparisons tables through the overrepresentation of terms associated with the specific space or geographic locale in which the natural language sample was acquired, or of regionalisms prevalent in the area.

2.2.4 Kulturgeist.

The potential influences of cultural factors on natural language expression are identified by delineating the presence of differing demographic classes in your project, and noting their proportional representation in the natural language sample. Also worthy of note are cultural factors involved in the acquisition and interpretation of data. Occasionally, the influence of culture can also be readily identified through such aspects as differential personal pronoun use, or the overrepresentation of particular dialectal features, colloquialisms, or jargon in the main and subsample comparisons tables. To get a sense of potential kulturgeist influences on the whole, inspect for patterns readily apparent across the demographic states or classes involved in your project. Note those potential kulturgeists not otherwise already accounted for by your project’s existing variables.
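Noting the proportional representation of demographic classes can be sketched as follows. The class labels and responses are hypothetical. The sketch reports two shares per class, its share of cases and its share of words in the pooled sample, because the word pool analyzed in the main table can be dominated by a class whose members write longer responses even when case counts are balanced.

```python
from collections import Counter


def class_representation(cases):
    """Per demographic class: (share of cases, share of words in the pooled sample).

    cases: list of (demographic_class, response_text) pairs.
    The two shares can differ when some classes write longer responses.
    """
    case_counts = Counter(cls for cls, _ in cases)
    word_counts = Counter()
    for cls, text in cases:
        word_counts[cls] += len(text.split())
    n_cases = sum(case_counts.values())
    n_words = sum(word_counts.values())
    return {cls: (case_counts[cls] / n_cases, word_counts[cls] / n_words)
            for cls in case_counts}


# Hypothetical cases from two demographic classes, "A" and "B".
cases = [
    ("A", "we left"),
    ("A", "it rained"),
    ("B", "i stayed inside all day long"),
    ("B", "i was scared and so alone"),
]
representation = class_representation(cases)
```

Here each class supplies half the cases, but class B supplies three quarters of the words, so its language would weigh more heavily in any gists drawn from the pooled sample.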

2.3 Bracket accordingly.

Once you have identified the geists potentially influencing your natural language sample, bracket to account for those not already represented by the current states or classes of your existing variables. In Quantitative Phenomenology with Raven’s Eye, bracketing is accomplished through a standardized, practical, and transparent means of creating actual brackets in your original spreadsheet. It thereby facilitates the deep and contextual exploration of data while preserving the data’s original state. The process of bracketing is essentially the same in each of the procedures involved in performing a Quantitative Phenomenology.

Bracketing to account for geists involves creating additional variables as new columns in your original spreadsheet, which can then be subjected to additional analyses. We recommend that such newly created variables be derived from the actual linguistic features of the natural language sample, or from observable environmental or demographic conditions involved in the acquisition of the sample⁴. To aid later recall and replication, we further recommend that each variable’s label be brief and descriptive. You are otherwise free to bracket in any way that best reflects the apparent form of your data, the function of your project, or both.

Once you have created and named a new variable (and thus created and labeled a new column in your original spreadsheet), you will need to determine the state or class to assign to each case (and its corresponding row in your spreadsheet). As with the variable’s label, we recommend that these states or classes be derived from the actual linguistic features of the natural language sample, or from observable environmental or demographic conditions involved in the acquisition of the sample⁵. Such states or classes can be assigned at any appropriate level of measurement, from nominal to ratio. To assign states or classes, you may either enter them one case at a time into the appropriate cells of the newly created column, or use the filter feature of your spreadsheet program to assign a state or class to multiple cases at once.
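If your spreadsheet can be exported as CSV, bracketing with a rule derived from a linguistic feature can also be sketched in Python’s standard `csv` module. Everything named here is hypothetical: the `Response` column, the `FirstPersonPlural` label, and the pronoun rule, which assigns a nominal state ("yes" or "no") to each case. The original rows are copied rather than modified, and the result is written to a separate file-like object, in keeping with preserving the data’s original state.

```python
import csv
import io

# Hypothetical token set for a nominal "first-person plural" variable.
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours"}


def bracket(rows, label, rule):
    """Return copies of rows with a new column `label` set to rule(row).

    The input rows are not modified, so the original data stay intact.
    """
    return [{**row, label: rule(row)} for row in rows]


def uses_first_person_plural(row):
    """Assign "yes" if the response contains a first-person plural pronoun."""
    tokens = [t.strip(".,;:!?\"'") for t in row["Response"].lower().split()]
    return "yes" if FIRST_PERSON_PLURAL.intersection(tokens) else "no"


# Stand-in for an exported spreadsheet; "Response" is a hypothetical column name.
original_csv = "ID,Response\n1,We stayed home.\n2,I left early.\n"
rows = list(csv.DictReader(io.StringIO(original_csv)))
bracketed = bracket(rows, "FirstPersonPlural", uses_first_person_plural)

# Write to a new file-like object rather than overwriting the original.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]) + ["FirstPersonPlural"])
writer.writeheader()
writer.writerows(bracketed)
new_csv = out.getvalue()
```

In practice you would read from and write to files opened with `open(..., newline="")`, saving the bracketed version under a new name before uploading it as a new project.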

2.4 Reiterate, as needed.

If the process of bracketing appears to reveal additional and obvious gists or potentially influential geists, repeat substeps 2.1-2.3 until they are generally accounted for in your sense of the whole.

If, after you have gotten a sense of the whole, you have not changed the original structure of the data in your spreadsheet, you may proceed. If, however, you made additions to it based on the considerations in this step, you will need to upload it to Raven’s Eye as a new project, categorize your columns, and select it for analysis⁶. By default, you will arrive at the main table of your newly uploaded project.

For many projects and purposes, getting a sense of the whole by identifying the top gists present in the word pool will suffice for your results. If such is the case in your project, you can download the results you have been visually inspecting and use your spreadsheet, word-processing, or other programs to produce messages, tables, and graphs, or to conduct further statistical analyses.


1 There’s no need to read through your data at this point. We are recommending that you simply notice those patterns that become immediately present during data preparation and submission.

2 Given that the data in columns 3 and 5 of the main table are proportions converted to percentages, data in these columns may also be analyzed statistically at any time, as appropriate to your particular project.

3 This organization follows the same basic structure used in linguistics, in which a lexeme groups together the various inflections of a given word, while the lemma (the canonical form of the word) serves as the headword or label for the lexeme. In Quantitative Phenomenology, the inflections and determiners present in the sample form a gist, which takes as its headword or label the most proportional inflection.

4 This is a general recommendation. We can imagine a number of reasons for bracketing according to such things as researcher-constructed categories, treatment outcomes, or audience ratings, among others.

5 Again, this is a general-purpose recommendation.

6 In this way, your major data transformations are demarcated by file, which facilitates easier replication and extension in subsequent projects.