Wednesday, September 7, 2011

Methods of Data Analysis in Qualitative Research

1. Typology

A classification system derived from patterns, themes, or other kinds of groupings in the data. Categories should be mutually exclusive and exhaustive where possible; often they aren't. Basically a list of categories. Example: Lofland and Lofland's first-edition list: acts, activities, meanings, participation, relationships, settings. (In the third edition they have ten units interfaced by three aspects--see page 114--and each cell in this matrix might be related to one of seven topics--see chapter seven.)


2. Taxonomy

(See Domain Analysis below; the two are often used together, especially when developing a taxonomy from a single domain.) James Spradley. A sophisticated typology with multiple levels of concepts, in which higher levels are inclusive of lower levels: superordinate and subordinate categories.

3. Constant Comparison/Grounded Theory

(widely used; developed in the late 1960s) Anselm Strauss

Look at a document, such as field notes

Look for indicators of categories in events and behavior; name them and code them on the document

Compare codes to find consistencies and differences

Consistencies between codes (similar meanings or pointing to a basic idea) reveal categories, so specific events need to be categorized

We used to cut apart copies of field notes; now we use computers. (Any good word processor can do this. Lofland says qualitative research programs aren't all that helpful, and I tend to agree. Of the qualitative research programs, I suspect NUD*IST is probably the best--see Sage Publications.)

Memo on the comparisons and emerging categories

Eventually a category saturates: no new codes related to it are formed

Eventually certain categories become a more central focus--axial categories--and perhaps even a core category.
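To make the bookkeeping concrete, here is a minimal Python sketch of the compare-and-categorize step. The field-note excerpts and codes are invented for illustration; the interpretive work of naming codes and judging saturation stays with the researcher.

```python
from collections import defaultdict

# Hypothetical field-note excerpts paired with researcher-assigned codes.
# Constant comparison means each new incident is compared to prior codes.
incidents = [
    ("Teacher pauses lesson to settle a dispute", "managing conflict"),
    ("Students negotiate turn-taking at the computer", "managing conflict"),
    ("Teacher praises a student's drawing", "encouragement"),
    ("Student shows work to a peer for approval", "seeking approval"),
    ("Teacher defuses an argument over supplies", "managing conflict"),
]

categories = defaultdict(list)
for note, code in incidents:
    categories[code].append(note)  # group incidents under emerging categories

# A category approaches saturation when new incidents stop adding new codes.
for code, notes in categories.items():
    print(f"{code}: {len(notes)} incident(s)")
```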

4. Analytic Induction

(One of the oldest methods, and a very good one.) Look at an event and develop a hypothetical statement of what happened. Then look at another similar event and see whether it fits the hypothesis; if it doesn't, revise the hypothesis. Begin actively looking for exceptions to the hypothesis, and when you find one, revise the hypothesis to fit all examples encountered. Eventually you will develop a hypothesis that accounts for all observed cases.

5. Logical Analysis/Matrix Analysis

An outline of generalized causation, logical reasoning processes, etc. Use flow charts, diagrams, and similar devices to represent these pictorially, along with written descriptions. Matthew Miles and A. Michael Huberman give hundreds of varieties in their huge book Qualitative Data Analysis, 2nd ed.

6. Quasi-statistics

(Count the number of times something is mentioned in field notes as a very rough estimate of frequency.) Howard Becker. Enumeration is often used to provide evidence for categories created or to determine whether observations are contaminated (from LeCompte and Preissle).

7. Event Analysis/Microanalysis

(A lot like frame analysis; Erving Goffman.) Frederick Erickson, Kurt Lewin, Edward Hall. The emphasis is on finding the precise beginnings and endings of events by identifying specific boundaries and the things that mark them. Specifically oriented toward film and video. After finding the boundaries, identify phases within the event through repeated viewing.

8. Metaphorical Analysis

(Usually used in later stages of analysis.) Michael Patton, Nick Smith. Try on various metaphors and see how well they fit what is observed. One can also ask participants for metaphors and listen for spontaneous metaphors. "Hallway as a highway": like a highway in many ways--traffic, intersections, teachers as police, etc. It is best to check the validity of a metaphor with participants (a "member check").

9. Domain Analysis

(Analysis of the language of people in a cultural context.) James Spradley. Describe the social situation and the cultural patterns within it through semantic relationships. Emphasize the meanings of the social situation to participants, and interrelate the social situation and cultural meanings. There are different kinds of domains: folk domains (the participants' own terms for domains), mixed domains, and analytic domains (the researcher's terms for domains). The procedure is listed below; a small worksheet sketch follows the list.

select semantic relationships

prepare domain analysis worksheet

select sample of field notes (statements of people studied)

look for broad and narrow terms to describe semantic relationships

formulate questions about those relationships

repeat process for different semantic relationship

list all domains discovered
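Here is the promised sketch of a domain analysis worksheet rendered as a Python data structure. The cover term, included terms, and structural question are invented for illustration; Spradley's worksheets are paper forms, and this merely mirrors their fields.

```python
# A minimal sketch of a domain analysis worksheet. The semantic relationship
# here is Spradley's strict inclusion ("X is a kind of Y"); the folk terms
# below are hypothetical examples, not data from any real study.
worksheet = {
    "semantic_relationship": "strict inclusion (X is a kind of Y)",
    "cover_term": "places to hang out",  # folk domain: the participants' term
    "included_terms": ["the quad", "the stairwell", "the parking lot"],
    "structural_question": "What are all the kinds of places to hang out?",
}

# Restate each included term in terms of the semantic relationship.
for term in worksheet["included_terms"]:
    print(f'"{term}" is a kind of "{worksheet["cover_term"]}"')
```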

10. Hermeneutical Analysis

(Hermeneutics = making sense of a written text.) Max van Manen. The aim is not the objective meaning of a text but the meaning of the text for the people in the situation. Try to bracket yourself out of the analysis: tell their story, not yours, and use their words; this is less interpretive than other approaches. There are different layers of interpretation of a text. Knowledge is constructed: we construct the meaning of a text out of our background and current situation (a social construction, because of the influence of others--symbolic interactionism). Use context--the time and place of writing--to understand: what was the cultural situation? What is the historical context? Meaning resides in the author's intent or purpose, in the context, and in the encounter between author and reader; find themes and relate them to the dialectical context. (Some say authorial intent is impossible to ascertain.) Videotape probably needs to be a secondary level of analysis. Get with another person who is using another method and analyze their field notes.

11. Discourse analysis

(Linguistic analysis of an ongoing flow of communication.) James Gee. Usually uses tapes so that they can be played and replayed. The focus is on several people discussing, not on an individual person specifically. Find patterns of questions, who dominates the time and how, and other patterns of interaction.

12. Semiotics

(The science of signs and symbols, such as body language.) Peter Manning. Determine how the meanings of signs and symbols are constructed. Assume that meaning is not inherent in them; meaning comes from relationships with other things. Sometimes presented with a postmodernist emphasis.

13. Content Analysis

(Not very good with video, and only qualitative in the development of categories--primarily quantitative.) (Might be considered a specific form of typological analysis.) R. P. Weber. Look at documents, text, or speech to see what themes emerge. What do people talk about the most? See how themes relate to each other. Find latent emphases (the political view of a newspaper writer, for example, which is implicit) or look at the surface level (overt emphases). Theory-driven: theory determines what you look for. Rules are specified for data analysis. Standard rules of content analysis include:

How big a chunk of data is analyzed at a time (a line, a sentence, a phrase, a paragraph)? You must state the unit and stay with it.

What are the units of meaning, i.e., the categories used? Categories must be:

1. Inclusive (all examples fit a category)

2. Mutually exclusive

3. Defined precisely (what are their properties?)

4. Exhaustive (all data fits some category)

Also note context. Start by reading all the way through, then specify the rules. The theory could be emergent, but content analysis is usually theory-driven. After determining the categories, do the counting: how often does each category occur? Most of the literature emphasizes the quantitative aspects. The method originated with analyzing newspaper articles for bias--counting things in print. It is very print-oriented; can it be adapted for visual and verbal material?
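As a toy illustration of these rules, the sketch below fixes the sentence as the unit of analysis and assigns each unit to exactly one category. The category scheme and keyword tests are invented; an "other" bin keeps the scheme exhaustive, and first-match precedence keeps categories mutually exclusive.

```python
import re

# Hypothetical category scheme; "other" keeps the scheme exhaustive.
CATEGORIES = ("cost", "coverage", "other")

def categorize(sentence: str) -> str:
    """Assign each unit to exactly one category (mutually exclusive)."""
    if re.search(r"\b(cheap|inexpensive|afford)", sentence, re.I):
        return "cost"
    if re.search(r"\b(coverage|everyone|universal)", sentence, re.I):
        return "coverage"
    return "other"

text = ("The plan is inexpensive. It offers coverage for everyone. "
        "Critics remain skeptical.")

# The unit of analysis is the sentence -- stated up front and kept throughout.
counts = {c: 0 for c in CATEGORIES}
for sentence in re.split(r"(?<=[.!?])\s+", text):
    counts[categorize(sentence)] += 1

print(counts)  # {'cost': 1, 'coverage': 1, 'other': 1}
```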

14. Phenomenology/Heuristic Analysis

(Phenomenological emphasis: how individuals experience the world.) Clark Moustakas. Emphasizes idiosyncratic meaning to individuals, not shared constructions as much. Again, try to bracket yourself out and enter into the other person's perspective and experience. Also emphasizes the effects of the research experience on the researcher--the personal experience of the research: how does this affect me as researcher? Much like hermeneutical analysis, but even more focused on the researcher's experience. Some use the term "phenomenology" to describe the researcher's experience and the idea that this is all research is or can ever be (see Lofland and Lofland, p. 14).

15. Narrative Analysis

(Study of the individual's speech.) Catherine Riessman. Overlaps with other approaches. (Is it distinctive? Discourse analysis looks at interaction; narrative analysis is more individual.) The story is what a person shares about the self. What you choose to tell frames how you will be perceived, and people tend to avoid revealing negatives about themselves, so always compare ideas about the self. One might study autobiographies and compare them. Elements to examine:

context-situation

core plot in the story told about self

basic actions

Narrative analysis could involve study of literature or diaries or folklore.

References

Taxonomic Analysis: James P. Spradley (1980). Participant observation. Fort Worth: Harcourt Brace.

Typological Systems: John Lofland & Lyn H. Lofland (1995). Analyzing social settings, 3rd ed. Belmont, Cal.: Wadsworth.

Constant Comparison: Anselm L. Strauss (1987). Qualitative analysis for social scientists. New York: Cambridge University Press.

Case Study Analysis: Sharan B. Merriam (1988). Case study research in education. San Francisco: Jossey-Bass.

Ethnostatistics: Robert P. Gephart (1988). Ethnostatistics: Qualitative foundations for quantitative research. Newbury Park, Cal.: Sage Publications.

Logical Analysis/Matrix Analysis: Matthew B. Miles & A. Michael Huberman (1994). Qualitative data analysis, 2nd ed. Newbury Park, Cal.: Sage. [Note: I think this may well be the best book available on qualitative data analysis.]

Phenomenological/Heuristic Research: Clark Moustakas (1990). Heuristic research. Newbury Park, Cal.: Sage; and Clark Moustakas (1994). Phenomenological research methods. Newbury Park, Cal.: Sage.

Event Analysis/Microanalysis: Frederick Erickson (1992). Ethnographic microanalysis of interaction. In M. LeCompte et al. (Eds.), The handbook of qualitative research in education (chapter 5). San Diego: Academic Press.

Analytic Induction: Jack Katz (1983). A theory of qualitative methodology. In R. M. Emerson (Ed.), Contemporary field research. Prospect Heights, Ill.: Waveland.

Hermeneutical Analysis: Max van Manen (1990). Researching lived experience. Albany, N.Y.: State University of New York Press.

Semiotics: Peter K. Manning (1987). Semiotics and fieldwork. Newbury Park, Cal.: Sage.

Discourse Analysis: James P. Gee (1992). Discourse analysis. In M. LeCompte et al. (Eds.), The handbook of qualitative research in education (chapter 6). San Diego: Academic Press.

Narrative Analysis: Catherine K. Riessman (1993). Narrative analysis. Newbury Park, Cal.: Sage.

Content Analysis: R. P. Weber (1990). Basic content analysis. Newbury Park, Cal.: Sage.

Domain Analysis: James P. Spradley (1980). Participant observation. Fort Worth: Harcourt Brace. Also see J. P. Spradley (1979), The ethnographic interview (same publisher).

Metaphorical Analysis: Nick Smith (1981). Metaphors for evaluation. Newbury Park, Cal.: Sage.

Content Analysis

An Introduction to Content Analysis

Content analysis is a research tool used to determine the presence of certain words or concepts within texts or sets of texts. Researchers quantify and analyze the presence, meanings and relationships of such words and concepts, then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time of which these are a part. Texts can be defined broadly as books, book chapters, essays, interviews, discussions, newspaper headlines and articles, historical documents, speeches, conversations, advertising, theater, informal conversation, or really any occurrence of communicative language. Texts in a single study may also represent a variety of different types of occurrences, such as Palmquist's 1990 study of two composition classes, in which he analyzed student and teacher interviews, writing journals, classroom discussions and lectures, and out-of-class interaction sheets. To conduct a content analysis on any such text, the text is coded, or broken down, into manageable categories on a variety of levels--word, word sense, phrase, sentence, or theme--and then examined using one of content analysis' basic methods: conceptual analysis or relational analysis.

A Brief History of Content Analysis

Historically, content analysis was a time-consuming process. Analysis was done manually, or slow mainframe computers were used to analyze punch cards containing data punched in by human coders. Single studies could employ thousands of these cards. Human error and time constraints made this method impractical for large texts. However, despite its impracticality, content analysis was already an often-utilized research method by the 1940s. Although initially limited to studies that examined texts for the frequency of the occurrence of identified terms (word counts), by the mid-1950s researchers were already starting to consider the need for more sophisticated methods of analysis, focusing on concepts rather than simply words, and on semantic relationships rather than just presence (de Sola Pool 1959). While both traditions still continue today, content analysis now is also utilized to explore mental models, and their linguistic, affective, cognitive, social, cultural and historical significance.

Uses of Content Analysis

Perhaps because it can be applied to examine any piece of writing or occurrence of recorded communication, content analysis is currently used in a dizzying array of fields, ranging from marketing and media studies to literature and rhetoric, ethnography and cultural studies, gender and age issues, sociology and political science, psychology and cognitive science, and many other fields of inquiry. Additionally, content analysis reflects a close relationship with socio- and psycholinguistics, and is playing an integral role in the development of artificial intelligence. The following list (adapted from Berelson, 1952) offers more possibilities for the uses of content analysis:

  • Reveal international differences in communication content
  • Detect the existence of propaganda
  • Identify the intentions, focus or communication trends of an individual, group or institution
  • Describe attitudinal and behavioral responses to communications
  • Determine psychological or emotional state of persons or groups


Types of Content Analysis

Conceptual Analysis

Traditionally, content analysis has most often been thought of in terms of conceptual analysis. In conceptual analysis, a concept is chosen for examination, and the analysis involves quantifying and tallying its presence. Also known as thematic analysis [although this term is somewhat problematic, given its varied definitions in current literature--see Palmquist, Carley, & Dale (1997) vis-a-vis Smith (1992)], the focus here is on looking at the occurrence of selected terms within a text or texts, although the terms may be implicit as well as explicit. While explicit terms obviously are easy to identify, coding for implicit terms and deciding their level of implication is complicated by the need to base judgments on a somewhat subjective system. To attempt to limit the subjectivity, then (as well as to limit problems of reliability and validity), coding such implicit terms usually involves the use of either a specialized dictionary or contextual translation rules. And sometimes, both tools are used--a trend reflected in recent versions of the Harvard and Lasswell dictionaries.


Methods of Conceptual Analysis

Conceptual analysis begins with identifying research questions and choosing a sample or samples. Once chosen, the text must be coded into manageable content categories. The process of coding is basically one of selective reduction. By reducing the text to categories consisting of a word, set of words or phrases, the researcher can focus on, and code for, specific words or patterns that are indicative of the research question.

An example of a conceptual analysis would be to examine several Clinton speeches on health care, made during the 1992 presidential campaign, and code them for the existence of certain words. In looking at these speeches, the research question might involve examining the number of positive words used to describe Clinton's proposed plan, and the number of negative words used to describe the current status of health care in America. The researcher would be interested only in quantifying these words, not in examining how they are related, which is a function of relational analysis. In conceptual analysis, the researcher simply wants to examine presence with respect to his/her research question, i.e. is there a stronger presence of positive or negative words used with respect to proposed or current health care plans, respectively.
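Here is a minimal sketch of that kind of tally. The speech snippet and the positive/negative word lists are invented; a real study would derive the lists from the research question.

```python
from collections import Counter
import re

# Hypothetical coding dictionaries; in practice these come from the
# research question and pilot readings of the texts.
POSITIVE = {"affordable", "secure", "universal"}
NEGATIVE = {"costly", "broken", "failing"}

speech = ("Our plan is affordable and secure. The current system is costly, "
          "broken, and failing families. Care should be universal.")

# Tokenize, then total the occurrences in each coding dictionary.
words = Counter(re.findall(r"[a-z']+", speech.lower()))
pos_total = sum(words[w] for w in POSITIVE)
neg_total = sum(words[w] for w in NEGATIVE)
print(f"positive: {pos_total}, negative: {neg_total}")  # positive: 3, negative: 3
```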

Once the research question has been established, the researcher must make his/her coding choices with respect to the eight category coding steps indicated by Carley (1992).


Steps for Conducting Conceptual Analysis

The following discussion of the steps that can be followed to code a text or set of texts during conceptual analysis uses campaign speeches made by Bill Clinton during the 1992 presidential campaign as an example. The steps are:

1. Decide the level of analysis.

2. Decide how many concepts to code for.

3. Decide whether to code for existence or frequency of a concept.

4. Decide on how you will distinguish among concepts.

5. Develop rules for coding your texts.

6. Decide what to do with "irrelevant" information.

7. Code the texts.

8. Analyze your results.

Step One: Decide the Level of Analysis

First, the researcher must decide upon the level of analysis. With the health care speeches, to continue the example, the researcher must decide whether to code for a single word, such as "inexpensive," or for sets of words or phrases, such as "coverage for everyone."

Step Two: Decide How Many Concepts to Code For

The researcher must now decide how many different concepts to code for. This involves developing a pre-defined or interactive set of concepts and categories. The researcher must decide whether or not to code for every single positive or negative word that appears, or only certain ones that the researcher determines are most relevant to health care. Then, with this pre-defined number set, the researcher has to determine how much flexibility he/she allows him/herself when coding. The question of whether the researcher codes only from this pre-defined set, or allows him/herself to add relevant categories not included in the set as he/she finds them in the text, must be answered. Determining a certain number and set of concepts allows a researcher to examine a text for very specific things, keeping him/her on task. But introducing a level of coding flexibility allows new, important material to be incorporated into the coding process that could have significant bearings on one's results.

Step Three: Decide Whether to Code for Existence or Frequency of a Concept

After a certain number and set of concepts are chosen for coding, the researcher must answer a key question: is he/she going to code for existence or frequency? This is important, because it changes the coding process. When coding for existence, "inexpensive" would only be counted once, no matter how many times it appeared. This would be a very basic coding process and would give the researcher a very limited perspective of the text. However, the number of times "inexpensive" appears in a text might be more indicative of importance. Knowing that "inexpensive" appeared 50 times, for example, compared to 15 appearances of "coverage for everyone," might lead a researcher to interpret that Clinton is trying to sell his health care plan based more on economic benefits, not comprehensive coverage. Knowing that "inexpensive" appeared, but not that it appeared 50 times, would not allow the researcher to make this interpretation, regardless of whether it is valid or not.
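The difference is easy to show in code, reusing the hypothetical 50-versus-15 contrast from above: existence coding collapses every count to one, while frequency coding preserves the contrast.

```python
import re

# Synthetic text reproducing the hypothetical 50-vs-15 contrast above.
text = "inexpensive " * 50 + "coverage-for-everyone " * 15

tokens = re.findall(r"[\w-]+", text.lower())
frequency = {t: tokens.count(t) for t in set(tokens)}  # frequency coding
existence = {t: 1 for t in set(tokens)}                # existence coding

print(frequency)  # {'inexpensive': 50, 'coverage-for-everyone': 15} (order may vary)
print(existence)  # every concept collapses to 1, erasing the contrast
```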

Step Four: Decide on How You Will Distinguish Among Concepts

The researcher must next decide on the level of generalization, i.e. whether concepts are to be coded exactly as they appear, or if they can be recorded as the same even when they appear in different forms. For example, "expensive" might also appear as "expensiveness." The researcher needs to determine if the two words mean radically different things to him/her, or if they are similar enough that they can be coded as being the same thing, i.e. "expensive words." In line with this is the need to determine the level of implication one is going to allow. This entails more than subtle differences in tense or spelling, as with "expensive" and "expensiveness." Determining the level of implication would allow the researcher to code not only for the word "expensive," but also for words that imply "expensive." This could perhaps include technical words, jargon, or political euphemism, such as "economically challenging," that the researcher decides does not merit a separate category, but is better represented under the category "expensive," due to its implicit meaning of "expensive."

Step Five: Develop Rules for Coding Your Texts

After taking the generalization of concepts into consideration, a researcher will want to create translation rules that will allow him/her to streamline and organize the coding process so that he/she is coding for exactly what he/she wants to code for. Developing a set of rules helps the researcher ensure that he/she is coding things consistently throughout the text, in the same way every time. If a researcher coded "economically challenging" as a separate category from "expensive" in one paragraph, then coded it under the umbrella of "expensive" when it occurred in the next paragraph, his/her data would be invalid. The interpretations drawn from that data will subsequently be invalid as well. Translation rules protect against this and give the coding process a crucial level of consistency and coherence.
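Translation rules lend themselves naturally to code. In this minimal sketch the phrases and category labels are invented; the point is that every implicit variant maps to one category before counting, so "economically challenging" can never drift between categories from one paragraph to the next.

```python
# Hypothetical translation rules: each implicit variant maps to exactly one
# category label, applied uniformly before any counting takes place.
TRANSLATION_RULES = {
    "economically challenging": "expensive",
    "expensiveness": "expensive",
    "budget-friendly": "inexpensive",
}

def apply_rules(text: str) -> str:
    """Normalize implicit variants into their category labels."""
    for phrase, category in TRANSLATION_RULES.items():
        text = text.replace(phrase, category)
    return text

sample = "The plan is economically challenging, though rivals are budget-friendly."
print(apply_rules(sample))
# "The plan is expensive, though rivals are inexpensive."
```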

Step Six: Decide What To Do with "Irrelevant" Information

The next choice a researcher must make involves irrelevant information. The researcher must decide whether irrelevant information should be ignored (as Weber, 1990, suggests), or used to reexamine and/or alter the coding scheme. In the case of this example, words like "and" and "the," as they appear by themselves, would be ignored. They add nothing to the quantification of words like "inexpensive" and "expensive" and can be disregarded without impacting the outcome of the coding.

Step Seven: Code the Texts

Once these choices about irrelevant information are made, the next step is to code the text. This is done either by hand, i.e. reading through the text and manually writing down concept occurrences, or through the use of various computer programs. Coding with a computer is one of contemporary conceptual analysis' greatest assets. By inputting one's categories, content analysis programs can easily automate the coding process and examine huge amounts of data, and a wider range of texts, quickly and efficiently. But automation is very dependent on the researcher's preparation and category construction. When coding is done manually, a researcher can recognize errors far more easily. A computer is only a tool and can only code based on the information it is given. This problem is most apparent when coding for implicit information, where category preparation is essential for accurate coding.

Step Eight: Analyze Your Results

Once the coding is done, the researcher examines the data and attempts to draw whatever conclusions and generalizations are possible. Of course, before these can be drawn, the researcher must decide what to do with the information in the text that is not coded. One's options include either deleting or skipping over unwanted material, or viewing all information as relevant and important and using it to reexamine, reassess and perhaps even alter one's coding scheme. Furthermore, given that the conceptual analyst is dealing only with quantitative data, the levels of interpretation and generalizability are very limited. The researcher can only extrapolate as far as the data will allow. But it is possible to see trends, for example, that are indicative of much larger ideas. Using the example from step three, if the concept "inexpensive" appears 50 times, compared to 15 appearances of "coverage for everyone," then the researcher can pretty safely extrapolate that there does appear to be a greater emphasis on the economics of the health care plan, as opposed to its universal coverage for all Americans. It must be kept in mind that conceptual analysis, while extremely useful and effective for providing this type of information when done right, is limited by its focus and the quantitative nature of its examination. To more fully explore the relationships that exist between these concepts, one must turn to relational analysis.


Relational Analysis

Relational analysis, like conceptual analysis, begins with the act of identifying concepts present in a given text or set of texts. However, relational analysis seeks to go beyond presence by exploring the relationships between the concepts identified. Relational analysis has also been termed semantic analysis (Palmquist, Carley, & Dale, 1997). In other words, the focus of relational analysis is to look for semantic, or meaningful, relationships. Individual concepts, in and of themselves, are viewed as having no inherent meaning. Rather, meaning is a product of the relationships among concepts in a text. Carley (1992) asserts that concepts are "ideational kernels;" these kernels can be thought of as symbols which acquire meaning through their connections to other symbols.


Theoretical Influences on Relational Analysis

The kind of analysis that researchers employ will vary significantly according to their theoretical approach. Key theoretical approaches that inform content analysis include linguistics and cognitive science.

Linguistic approaches to content analysis focus analysis of texts on the level of a linguistic unit, typically single clause units. One example of this type of research is Gottschalk (1975), who developed an automated procedure which analyzes each clause in a text and assigns it a numerical score based on several emotional/psychological scales. Another technique is to code a text grammatically into clauses and parts of speech to establish a matrix representation (Carley, 1990).

Approaches that derive from cognitive science include the creation of decision maps and mental models. Decision maps attempt to represent the relationship(s) between ideas, beliefs, attitudes, and information available to an author when making a decision within a text. These relationships can be represented as logical, inferential, causal, sequential, and mathematical relationships. Typically, two of these links are compared in a single study, and are analyzed as networks. For example, Heise (1987) used logical and sequential links to examine symbolic interaction. This methodology is thought of as a more generalized cognitive mapping technique, rather than the more specific mental models approach.

Mental models are groups or networks of interrelated concepts that are thought to reflect conscious or subconscious perceptions of reality. According to cognitive scientists, internal mental structures are created as people draw inferences and gather information about the world. Mental models are a more specific approach to mapping because, beyond extraction and comparison, they can be numerically and graphically analyzed. Such models rely heavily on the use of computers to help analyze and construct mapping representations. Typically, studies based on this approach follow five general steps.

Relational Analysis: Overview of Methods

As with other sorts of inquiry, initial choices with regard to what is being studied and/or coded for often determine the possibilities of that particular study. For relational analysis, it is important to first decide which concept type(s) will be explored in the analysis. Studies have been conducted with as few as one and as many as 500 concept categories. Obviously, too many categories may obscure your results and too few can lead to unreliable and potentially invalid conclusions. Therefore, it is important to allow the context and necessities of your research to guide your coding procedures.

Three Subcategories of Relational Analysis

Affect extraction: This approach provides an emotional evaluation of concepts explicit in a text. It is problematic because emotion may vary across time and populations. Nevertheless, when extended it can be a potent means of exploring the emotional/psychological state of the speaker and/or writer. Gottschalk (1995) provides an example of this type of analysis: by assigning identified concepts a numeric value on corresponding emotional/psychological scales that can then be statistically examined, Gottschalk claims that the emotional/psychological state of the speaker or writer can be ascertained via their verbal behavior.

Proximity analysis: This approach, on the other hand, is concerned with the co-occurrence of explicit concepts in the text. In this procedure, the text is defined as a string of words. A given length of words, called a window, is determined. The window is then scanned across the text to check for the co-occurrence of concepts. The result is the creation of a concept co-occurrence matrix. In other words, a matrix, or a group of interrelated, co-occurring concepts, might suggest a certain overall meaning. The technique is problematic because the window records only explicit concepts and treats meaning as proximal co-occurrence. Other techniques such as clustering, grouping, and scaling are also useful in proximity analysis.
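Here is a minimal sketch of the window scan, assuming a hypothetical five-word window and an invented concept list. Concepts falling inside the same window are tallied as co-occurring; pairs that stay close land in more windows, which crudely weights proximity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept list and window size for illustration.
CONCEPTS = {"perhaps", "forgotten", "maybe", "possibly"}
WINDOW = 5

tokens = ("I may have perhaps forgotten although maybe it was "
          "possibly never said").lower().split()

# Slide the window across the text; tally concept pairs seen together.
co_occurrences = Counter()
for i in range(len(tokens) - WINDOW + 1):
    window = set(tokens[i:i + WINDOW]) & CONCEPTS
    for pair in combinations(sorted(window), 2):
        co_occurrences[pair] += 1

print(co_occurrences.most_common())
```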

Cognitive mapping: This approach is one that allows for further analysis of the results from the two previous approaches. It attempts to take the above processes one step further by representing these relationships visually for comparison. Whereas affective and proximal analysis function primarily within the preserved order of the text, cognitive mapping attempts to create a model of the overall meaning of the text. This can be represented as a graphic map that represents the relationships between concepts.

In this manner, cognitive mapping lends itself to the comparison of semantic connections across texts. This is known as map analysis, which allows for comparisons to explore "how meanings and definitions shift across people and time" (Palmquist, Carley, & Dale, 1997). Maps can depict a variety of different mental models (such as that of the text, the writer/speaker, or the social group/period), according to the focus of the researcher. This variety is indicative of the theoretical assumptions that support mapping: mental models are representations of interrelated concepts that reflect conscious or subconscious perceptions of reality; language is the key to understanding these models; and these models can be represented as networks (Carley, 1990). Given these assumptions, it's not surprising to see how closely this technique reflects the cognitive concerns of socio- and psycholinguistics, and lends itself to the development of artificial intelligence models.

The steps to relational analysis that we consider in this guide suggest some of the possible avenues available to a researcher doing content analysis. We provide an example to make the process easier to grasp. However, the choices made within the context of the example are only a few of many possibilities. The diversity of techniques available suggests that there is quite a bit of enthusiasm for this mode of research. Once a procedure is rigorously tested, it can be applied and compared across populations over time. The process of relational analysis has achieved a high degree of computer automation but still is, like most forms of research, time consuming. Perhaps the strongest claim that can be made is that it maintains a high degree of statistical rigor without losing the richness of detail apparent in even more qualitative methods.

Steps for Conducting Relational Analysis

The following discussion outlines the steps (or, perhaps more accurately, strategies) that can be followed to code a text or set of texts during relational analysis. These explanations are accompanied by examples of relational analysis possibilities for statements made by Bill Clinton during the 1998 hearings. The steps are:

1. Identify the Question.

2. Choose a sample or samples for analysis.

3. Determine the type of analysis.

4. Reduce the text to categories and code for words or patterns.

5. Explore the relationships between concepts (Strength, Sign & Direction).

6. Code the relationships.

7. Perform Statistical Analyses.

8. Map out the Representations.

Step One: Identify the Question

The question is important because it indicates where you are headed and why. Without a focused question, the concept types and options open to interpretation are limitless, and the analysis is therefore difficult to complete. Possibilities for the Hairy Hearings of 1998 might be:

What did Bill Clinton say in the speech? OR What concrete information did he present to the public?

Step Two: Choose a Sample or Samples for Analysis

Once the question has been identified, the researcher must select sections of text/speech from the hearings in which Bill Clinton may not have told the entire truth or is obviously holding back information. For relational content analysis, the primary consideration is how much information to preserve for analysis. One must be careful not to limit the results by preserving too little, but the researcher must also take special care not to take on so much that the coding process becomes too heavy and extensive to supply worthwhile results.

Step Three: Determine the Type of Relationships to Examine

Once the sample has been chosen for analysis, it is necessary to determine what type or types of relationships you would like to examine. There are different subcategories of relational analysis that can be used to examine the relationships in texts. For more information regarding subcategories of relational analysis, see the discussion of Three Subcategories of Relational Analysis.

In this example, we will use proximity analysis because it is concerned with the co-occurrence of explicit concepts in the text. In this instance, we are not particularly interested in affect extraction because we are trying to get to the hard facts of what exactly was said rather than determining the emotional considerations of speaker and receivers surrounding the speech which may be unrecoverable.

Once the subcategory of analysis is chosen, the selected text must be reviewed to determine the level of analysis. The researcher must decide whether to code for a single word, such as "perhaps," or for sets of words or phrases like "I may have forgotten."

Step Four: Reduce the Text to Categories and Code for Words or Patterns

At the simplest level, a researcher can code merely for existence. This is not to say that simplicity of procedure leads to simplistic results. Many studies have successfully employed this strategy. For example, Palmquist (1990) did not attempt to establish the relationships among concept terms in the classrooms he studied; his study did, however, look at the change in the presence of concepts over the course of the semester, comparing a map analysis from the beginning of the semester to one constructed at the end. On the other hand, the requirement of one's specific research question may necessitate deeper levels of coding to preserve greater detail for analysis.

In relation to our extended example, the researcher might code for how often Bill Clinton used words that were ambiguous, held double meanings, or left an opening for change or "re-evaluation." The researcher might also choose to code for what words he used that have such an ambiguous nature in relation to the importance of the information directly related to those words.

Step Five: Explore the Relationships Between Concepts

Once words are coded, the text can be analyzed for the relationships among the concepts set forth. There are three concepts which play a central role in exploring the relations among concepts in content analysis.

a. Strength of Relationship: Refers to the degree to which two or more concepts are related. These relationships are easiest to analyze, compare, and graph when all relationships between concepts are considered to be equal. However, assigning strength to relationships retains a greater degree of the detail found in the original text. Identifying strength of a relationship is key when determining whether or not words like unless, perhaps, or maybe are related to a particular section of text, phrase, or idea.

b. Sign of a Relationship: Refers to whether the concepts are positively or negatively related. To illustrate, the concept "bear" is negatively related to the concept "stock market" in the same sense as the concept "bull" is positively related. Thus "it's a bear market" could be coded to show a negative relationship between "bear" and "market". Another approach to coding for sign entails the creation of separate categories for binary oppositions. The above example emphasizes "bull" as the negation of "bear," but the two could be coded as separate categories, one positive and one negative. There has been little research to determine the benefits and liabilities of these differing strategies. A use of sign coding in regard to the hearings may be to find out whether the words under observation were used adversely or in favor of the concepts in question (this is tricky, but important to establishing meaning).

c. Direction of the Relationship: Refers to the type of relationship categories exhibit. Coding for this sort of information can be useful in establishing, for example, the impact of new information in a decision making process. Various types of directional relationships include "X implies Y," "X occurs before Y," and "if X then Y," or quite simply the decision whether concept X is the "prime mover" of Y or vice versa. In the case of the 1998 hearings, the researcher might note that "maybe implies doubt," "perhaps occurs before statements of clarification," and "if possibly exists, then there is room for Clinton to change his stance." In some cases, concepts can be said to be bi-directional, or having equal influence; this is equivalent to ignoring directionality. Both approaches are useful, but differ in focus. Coding all categories as bi-directional is most useful for exploratory studies where pre-coding may influence results, and is also most easily automated, or computer coded. A minimal sketch of how these three properties might be recorded follows this list.
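Here is the promised sketch of how coded relationships might be recorded. The statements and strength values are invented; the three fields correspond to the strength, sign, and direction properties described above.

```python
from dataclasses import dataclass

# A minimal record for one coded relationship between two concepts.
@dataclass
class CodedRelationship:
    source: str
    target: str
    strength: float   # 0.0 (weak) to 1.0 (strong), assigned by the coder
    sign: str         # "positive" or "negative"
    direction: str    # e.g., "implies", "occurs before", "bidirectional"

# Invented examples drawn from the discussion above.
relationships = [
    CodedRelationship("maybe", "doubt", 0.8, "positive", "implies"),
    CodedRelationship("perhaps", "clarification", 0.6, "positive", "occurs before"),
    CodedRelationship("bear", "market", 0.9, "negative", "bidirectional"),
]

for r in relationships:
    print(f"{r.source} --[{r.direction}, {r.sign}, {r.strength}]--> {r.target}")
```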

Step Six: Code the Relationships

One of the main differences between conceptual analysis and relational analysis is that the statements or relationships between concepts are coded. At this point, to continue our extended example, it is important to take special care with assigning value to the relationships in an effort to determine whether the ambiguous words in Bill Clinton's speech are just fillers, or hold information about the statements he is making.

Step Seven: Perform Statistical Analyses

This step involves conducting statistical analyses of the data you've coded during your relational analysis. This may involve exploring for differences or looking for relationships among the variables you've identified in your study. For more information about conducting statistical analysis, see our reference unit on Statistics.

Step Eight: Map the Representations

In addition to statistical analysis, relational analysis often leads to viewing the representations of the concepts and their associations in a text (or across texts) in a graphical -- or map -- form. Relational analysis is also informed by a variety of different theoretical approaches: linguistic content analysis, decision mapping, and mental models.

See the Palmquist, Carley, and Dale study for an example of mapping.

The Palmquist, Carley and Dale Study

Consider these two questions: How has the depiction of robots changed over more than a century's worth of writing? And, do students and writing instructors share the same terms for describing the writing process? Although these questions seem totally unrelated, they do share a commonality: in the Palmquist, Carley & Dale study, their answers rely on computer-aided text analysis to demonstrate how different texts can be analyzed.

Literary texts

One half of the study explored the depiction of robots in 27 science fiction texts written between 1818 and 1988. After the texts were divided into three historically defined groups, readers looked for how the depiction of robots had changed over time. To do this, researchers had to create concept lists and relationship types, create maps using computer software (see Fig. 1), modify those maps, and then ultimately analyze them. The final product of the analysis revealed that over time authors were less likely to depict robots as metallic humanoids.


Figure 1: A map representing relationships among concepts.

Non-literary texts

The second half of the study used student journals and interviews, teacher interviews, textbooks, and classroom observations as the non-literary texts from which concepts and words were taken. The purpose of the study was to determine whether, in fact, over time teachers and students would begin to share a similar vocabulary about the writing process. Again, researchers used computer software to assist in the process. This time, computers helped researchers generate a concept list based on frequently occurring words and phrases from all texts. Maps were also created and analyzed in this study (see Fig. 2).


Figure 2: Pairs of co-occurring words drawn from a source text

Content Analysis: Commentary

The authors of this guide offer the following commentary on content analysis.

Issues of Reliability & Validity

The issues of reliability and validity are concurrent with those addressed in other research methods. The reliability of a content analysis study refers to its stability, or the tendency for coders to consistently re-code the same data in the same way over a period of time; reproducibility, or the tendency for a group of coders to classify category membership in the same way; and accuracy, or the extent to which the classification of a text corresponds statistically to a standard or norm. Gottschalk (1995) points out that the issue of reliability may be further complicated by the inescapably human nature of researchers. For this reason, he suggests that coding errors can only be minimized, not eliminated (he shoots for 80% as an acceptable margin for reliability).
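A reproducibility check can be sketched in a few lines. The two coders' category assignments below are invented; simple percent agreement is compared against Gottschalk's 80% figure, though chance-corrected statistics such as Cohen's kappa are stricter alternatives.

```python
# Hypothetical category assignments by two coders for the same ten units.
coder_a = ["cost", "cost", "coverage", "other", "cost",
           "coverage", "coverage", "other", "cost", "coverage"]
coder_b = ["cost", "cost", "coverage", "cost", "cost",
           "coverage", "other", "other", "cost", "coverage"]

# Simple percent agreement: the share of units coded identically.
matches = sum(a == b for a, b in zip(coder_a, coder_b))
agreement = matches / len(coder_a)

print(f"percent agreement: {agreement:.0%}")  # 80%
print("acceptable" if agreement >= 0.80 else "recode and refine rules")
```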

On the other hand, the validity of a content analysis study refers to the correspondence of the categories to the conclusions, and the generalizability of results to a theory.

The validity of categories in implicit concept analysis, in particular, is achieved by utilizing multiple classifiers to arrive at an agreed upon definition of the category. For example, a content analysis study might measure the occurrence of the concept category "communist" in presidential inaugural speeches. Using multiple classifiers, the concept category can be broadened to include synonyms such as "red," "Soviet threat," "pinkos," "godless infidels" and "Marxist sympathizers." "Communist" is held to be the explicit variable, while "red," etc. are the implicit variables.

The overarching problem of concept analysis research is the challengeable nature of conclusions reached by its inferential procedures. The question lies in what level of implication is allowable, i.e. do the conclusions follow from the data or are they explainable by some other phenomenon? For occurrence-specific studies, for example, can the second occurrence of a word carry the same weight as the ninety-ninth? Reasonable conclusions can be drawn from substantive amounts of quantitative data, but the question of proof may still remain unanswered.

This problem is again best illustrated when one uses computer programs to conduct word counts. The problem of distinguishing between synonyms and homonyms can completely throw off one's results, invalidating any conclusions one infers from them. The word "mine," for example, variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. One may obtain an accurate count of that word's occurrence and frequency, but not an accurate accounting of the meaning inherent in each particular usage. For example, one may find 50 occurrences of the word "mine." But if one is looking specifically for "mine" as an explosive device, and 17 of the occurrences are actually personal pronouns, the count of 50 is inaccurate, and any conclusions drawn from that number would be invalid.
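The "mine" problem can be made concrete with a toy example. The occurrences below carry hand-labeled senses (purely invented); a raw count lumps together what a sense-aware count keeps apart.

```python
from collections import Counter

# Invented occurrences of "mine," each hand-labeled with its sense.
occurrences = [
    ("mine", "explosive"), ("mine", "pronoun"), ("mine", "explosive"),
    ("mine", "pronoun"),   ("mine", "pronoun"), ("mine", "excavation"),
]

# A raw count ignores sense entirely; the sense-aware count does not.
raw_count = sum(1 for word, _ in occurrences if word == "mine")
by_sense = Counter(sense for _, sense in occurrences)

print(f"raw count of 'mine': {raw_count}")  # 6
print(f"by sense: {dict(by_sense)}")        # {'explosive': 2, 'pronoun': 3, 'excavation': 1}
```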

The generalizability of one's conclusions, then, is very dependent on how one determines concept categories, as well as on how reliable those categories are. It is imperative that one defines categories that accurately measure the idea and/or items one is seeking to measure. Akin to this is the construction of rules. Developing rules that allow one, and others, to categorize and code the same data in the same way over a period of time, referred to as stability, is essential to the success of a conceptual analysis. Reproducibility, not only of specific categories, but of general methods applied to establishing all sets of categories, makes a study, and its subsequent conclusions and results, more sound. A study which does this, i.e. in which the classification of a text corresponds to a standard or norm, is said to have accuracy.

Example of a Problematic Text for Content Analysis

In this example, both students observed a scientist and were asked to write about the experience.

Student A: I found that scientists engage in research in order to make discoveries and generate new ideas. Such research by scientists is hard work and often involves collaboration with other scientists which leads to discoveries which make the scientists famous. Such collaboration may be informal, such as when they share new ideas over lunch, or formal, such as when they are co-authors of a paper.

Student B: It was hard work to research famous scientists engaged in collaboration and I made many informal discoveries. My research showed that scientists engaged in collaboration with other scientists are co-authors of at least one paper containing their new ideas. Some scientists make formal discoveries and have new ideas.

Content analysis coding for explicit concepts may not reveal any significant differences. For example, the concepts "I, scientist, research, hard work, collaboration, discoveries, new ideas," etc. are explicit in both texts, occur the same number of times, and carry the same emphasis. Relational analysis or cognitive mapping, however, reveals that while all concepts in the text are shared, only five concepts are common to both maps. Analyzing these statements reveals that Student A reports on what "I" found out about "scientists," and elaborates the notion of "scientists" doing "research." Student B focuses on what "I's" research was and sees scientists as "making discoveries" without emphasis on research.

Advantages of Content Analysis

Content analysis offers several advantages to researchers who consider using it. In particular, content analysis:

  • looks directly at communication via texts or transcripts, and hence gets at the central aspect of social interaction
  • can allow for both quantitative and qualitative operations
  • can provide valuable historical/cultural insights over time through analysis of texts
  • allows a closeness to text which can alternate between specific categories and relationships and also statistically analyzes the coded form of the text
  • can be used to interpret texts for purposes such as the development of expert systems (since knowledge and rules can both be coded in terms of explicit statements about the relationships among concepts)
  • is an unobtrusive means of analyzing interactions
  • provides insight into complex models of human thought and language use

Disadvantages of Content Analysis

Content analysis suffers from several disadvantages, both theoretical and procedural. In particular, content analysis:

  • can be extremely time consuming
  • is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation
  • is often devoid of theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study
  • is inherently reductive, particularly when dealing with complex texts
  • tends too often to simply consist of word counts
  • often disregards the context that produced the text, as well as the state of things after the text is produced
  • can be difficult to automate or computerize