Digital Stylistic Analyses of Genre
in 19th Century Spanish American Novels



Ulrike Henny-Krahmer
(CLiGS, University of Würzburg)


Seminar "Digital Approaches to Literature", University of Düsseldorf
February 1, 2019


Slides at: https://hennyu.github.io/due_19/

Overview

  • Context: Digital Stylistics, Genre Stylistics, CLiGS
  • Case study 1: Topics, genre, and text development
  • Case study 2: Sentiments and genre
  • Conclusions

Context: Digital Stylistics, Genre Stylistics, CLiGS

Digital Stylistics

  • digital style analysis: computer-aided analysis of literary style
  • "stylometry", authorship attribution (overview: Juola 2006)
  • quantitative methods
  • literary studies, (computational) linguistics, information science

Genre Stylistics

  • relationship between genre and style?
  • "stylometry beyond authorship"

CLiGS

  • Junior Research Group "Computergestützte Literarische Gattungsstilistik"
  • University of Würzburg, funded by BMBF (2015-2019)
  • Genre Stylistics for literary corpora in Romance languages
  • broad concept of style: quantifiable textual features

Case study 1: Topics, genre, and text development

Topics, genre, and text development

  • Background:
    • Topic modeling (Blei 2011)
    • How to model text (and plot) development? (Jockers 2015, Schmidt 2014)
  • Questions:
    • How are topics and subgenres related?
    • How are topics related to text development?
    • How are topics in text development related to subgenres?

Data

  • 128 Spanish-language novels (from Spain, Argentina, Cuba, Mexico)
  • 1850-1930
  • subgenres: sentimental, historical, socio-political, subjective
  • 7,318,130 tokens

Method: Topic Modeling

  • algorithms to discover "hidden thematic structure in large archives of documents" (Blei 2011)
  • based on assumptions from distributional semantics
  • groups of semantically related words are detected based on their joint occurrence in the documents
  • topic: probability distribution of words
  • document: probability distribution of topics

(David Blei, "Introduction to Probabilistic Topic Models", 2011)
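
The two definitions above can be illustrated with a minimal sketch. The topic names "school" and "illness" echo the topics shown later in the slides; the words and all probabilities are invented for illustration and are not values from the corpus.

```python
# Toy topic model: every number below is invented for illustration.
# A topic is a probability distribution over words.
topics = {
    "school": {"maestro": 0.5, "escuela": 0.4, "enfermedad": 0.1},
    "illness": {"maestro": 0.1, "escuela": 0.1, "enfermedad": 0.8},
}

# A document (here: one text segment) is a probability distribution over topics.
document = {"school": 0.25, "illness": 0.75}


def word_probability(word, doc_topics, topic_words):
    """p(word | document) = sum over topics of p(topic | document) * p(word | topic)."""
    return sum(
        p_topic * topic_words[topic].get(word, 0.0)
        for topic, p_topic in doc_topics.items()
    )


def dominant_topic(doc_topics):
    """Return the topic with the highest probability in the document."""
    return max(doc_topics, key=doc_topics.get)


if __name__ == "__main__":
    for w in ("maestro", "escuela", "enfermedad"):
        print(w, round(word_probability(w, document, topics), 3))
    print("dominant topic:", dominant_topic(document))
```

Read this way, a segment dominated by the "illness" topic is expected to contain mostly words that are probable under that topic.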

Method: Topic Modeling

  • tools: Mallet (McCallum 2002), tmw (Schöch et al.)
  • parameters:
    • text segment length: 1500 words
    • just nouns
    • without the 70 MFW
    • number of topics: 70
    • 6 bins (sections) per text
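
A rough sketch of how the segmentation, MFW removal, and binning parameters could be applied in plain Python. This is not the tmw or Mallet implementation: the function names are hypothetical, and keeping a shorter final segment is an assumption made here for simplicity.

```python
from collections import Counter


def remove_most_frequent(tokens, n_mfw):
    """Drop the n_mfw most frequent word types (the study removes the 70 MFW)."""
    counts = Counter(tokens)
    mfw = {word for word, _ in counts.most_common(n_mfw)}
    return [t for t in tokens if t not in mfw]


def split_into_segments(tokens, segment_length):
    """Cut a token list into consecutive segments (1500 words in the study).

    Assumption: the remainder is kept as a shorter last segment.
    """
    return [
        tokens[i:i + segment_length]
        for i in range(0, len(tokens), segment_length)
    ]


def bin_of_segment(index, n_segments, n_bins):
    """Map the position of a segment in its text to one of n_bins sections (0-based)."""
    return index * n_bins // n_segments


if __name__ == "__main__":
    tokens = ["casa"] * 5 + ["amor"] * 3 + ["guerra", "escuela"]
    print(remove_most_frequent(tokens, 1))
    print(split_into_segments(list(range(10)), 4))
    print([bin_of_segment(i, 12, 6) for i in range(12)])
```

Binning by relative position makes texts of different lengths comparable: segment 3 of 12 and segment 30 of 120 both fall into the second of six bins.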

Results: Topics

Results: Subgenres

Distinctive topics for subgenres

Results: Text development

Important topics at the beginning

Results: Text development

Important topics at the end

Results: Topics, subgenres, and text development

School

Results: Topics, subgenres, and text development

Illness

Conclusions: Topics, genre, and text development

  • there are:
    • topics which are specific to certain subgenres,
    • topics which are typical of specific points in the text development,
    • topics which are characteristic of the beginning or end of novels of a subgenre,
  • but: these are exceptions rather than the general rule!

Case study 2: Sentiments and genre

Sentiments and genre: Background

  • Sentiment Analysis: computational treatment of sentiment, opinion, emotion in text (Pang and Lee, 2008)
  • Sentiments modelled as polarity values or emotion values
  • Method has been used for genre analysis (Kim et al., 2017; Zehe et al., 2016)
  • Here: analysis of Spanish American novels

Aims & Hypotheses

  • Exploration of relationship between sentiments and genre in 19th c. Spanish-American novels
  • Sentiments as linguistic manifestations of emotions on the textual surface
  • Hypothesis 1: degree and kind of emotionality differs for different subgenres
  • Hypothesis 2: it matters whether emotions are expressed in direct speech or narrated text
  • Side goal: test two sentiment lexica for Spanish
  • Methods: Sentiment Analysis & Machine Learning

Sentence examples

from El Chacho by Eduardo Gutiérrez, 1884, military novel, narrated text:

Quiroga entretanto permanecía en Buenos Aires, bebiendo en la inspiración infame del tirano las más sangrientas ideas, y recibiendo las más terribles instrucciones.

Quiroga, in the meantime, stayed in Buenos Aires, drinking the most bloodthirsty ideas from the infamous inspiration of the tyrant, and receiving the most terrible instructions.

from Clemencia by Ignacio Manuel Altamirano, 1869, sentimental novel, direct speech:

— ¡Oh! sí podrá usted, Fernando, sí podrá usted. A una mujer tan hermosa como ésta, lo difícil, lo imposible es no amarla. Es demasiado encantadora para que el corazón de usted pueda permanecer indiferente.

Oh! yes, you can, Fernando, yes, you can. It's difficult, impossible not to love a woman as beautiful as this one. She is too charming for your heart to remain indifferent.

Data: Spanish American Novels

  • 30 novels, 1840-1910
  • 3 countries:
    • Argentina (16)
    • Cuba (9)
    • Mexico (5)
  • 16 authors
  • 4 subgenres:
    • sentimental (9)
    • historical (8)
    • sociopolitical (7)
    • costumbrista (6)

Distribution of novels per decade and subgenre

Methods: Sentiment features

Sentiment lexica:

  • SentiWordNet 3.0 (Miller, 1995; Baccianella et al., 2010)
    • polarity (positive, negative, neutral)
    • 117,653 entries
  • NRC Emotion Lexicon (Saif and Turney, 2013)
    • polarity + 8 basic emotions (Trust, Fear, Joy, Sadness, Anger, Disgust, Anticipation, Surprise)
    • 14,182 entries
  • Linguistic annotation: FreeLing (Padró and Stanislovsky, 2012)
  • Separation of direct speech and narrated text
  • Sentiment values per sentence
  • Threshold: 1
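
A minimal sketch of lexicon-based sentence scoring, under several assumptions: the mini-lexicon is hypothetical and only mirrors the NRC tags shown for the sample sentence in the results, lemmas are assumed to come from a prior annotation step, and direct speech is detected with a simple heuristic (a leading dialogue dash, as in the Clemencia example) that may differ from the procedure actually used. The threshold is not modelled here.

```python
from collections import Counter

# Hypothetical mini-lexicon keyed by lemma; it only mirrors the NRC tags
# shown for the sample sentence, not the full lexicon.
LEXICON = {
    "salvaje": {"negative", "anger", "fear"},
    "lucha": {"negative", "anger", "fear"},
    "espantoso": {"negative", "disgust", "fear"},
}

DIALOGUE_DASH = "\u2014"  # the dash that opens direct speech in Spanish novels


def is_direct_speech(sentence):
    """Heuristic (assumption): a sentence opening with a dialogue dash is direct speech."""
    return sentence.lstrip().startswith(DIALOGUE_DASH)


def sentence_sentiments(lemmas, lexicon):
    """Count, per sentiment category, how many lemmas of the sentence carry it."""
    counts = Counter()
    for lemma in lemmas:
        counts.update(lexicon.get(lemma, ()))
    return dict(counts)


if __name__ == "__main__":
    lemmas = ["el", "pluma", "no", "alcanzar", "a", "describir", "el",
              "salvaje", "peripecia", "de", "aquel", "lucha", "espantoso"]
    print(sentence_sentiments(lemmas, LEXICON))
    print(is_direct_speech("\u2014 \u00a1Oh! s\u00ed podr\u00e1 usted, Fernando."))
```

Looking words up by lemma rather than by surface form lets "salvajes" and "espantosa" match their lexicon entries, which is one reason the linguistic annotation step comes first.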

Methods: Subgenre classification

  • Decision Tree Classifier: easily interpretable
  • sentence values aggregated into 5 sections per novel and divided by section length: 30 novels × 5 sections = 150 data points
  • 60 experiments: varying type of features/lexicon, depth of decision tree
  • 5-fold cross-validation
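
The aggregation step can be sketched as follows. The function name is hypothetical, and measuring section length in sentences is an assumption; the slides do not say whether length is counted in sentences or tokens.

```python
def section_features(sentence_values, n_sections=5):
    """Aggregate per-sentence values of one feature into n_sections values.

    Each section value is the sum of its sentence values divided by the
    section length (assumption: length counted in sentences).
    """
    n = len(sentence_values)
    features = []
    for s in range(n_sections):
        start = s * n // n_sections
        end = (s + 1) * n // n_sections
        section = sentence_values[start:end]
        features.append(sum(section) / len(section) if section else 0.0)
    return features


if __name__ == "__main__":
    # Invented fear counts for ten sentences of one novel.
    fear_per_sentence = [1, 1, 0, 0, 2, 2, 0, 0, 4, 4]
    print(section_features(fear_per_sentence))

    # 30 novels with 5 sections each give the 150 data points.
    novels = [[1] * 50 for _ in range(30)]
    print(sum(len(section_features(v)) for v in novels))
```

Dividing by section length keeps long and short novels comparable, since a long section would otherwise score higher simply because it contains more sentences.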

Results: Sentence example

from Romualdo. Uno de tantos, by Francisco Calcagno, 1881, anti-slavery novel (narrated text)

SentiWordNet 3.0

La pluma no alcanza a describir las salvajes peripecias de aquella lucha espantosa.
The quill not accomplish to describe the ferocious events of that fight frightening.
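  • La: -
  • pluma: neutral
  • no: -
  • alcanza: positive 0.125
  • a: -
  • describir: neutral
  • las: -
  • salvajes: negative 0.75, positive 0.125
  • peripecias: -
  • de: -
  • aquella: neutral
  • lucha: negative 0.625
  • espantosa: negative 0.125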

NRC Emotion Lexicon

La pluma no alcanza a describir las salvajes peripecias de aquella lucha espantosa.
The quill not accomplish to describe the ferocious events of that fight frightening.
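  • La: -
  • pluma: neutral
  • no: -
  • alcanza: neutral
  • a: -
  • describir: neutral
  • las: -
  • salvajes: negative (Anger, Fear)
  • peripecias: -
  • de: -
  • aquella: -
  • lucha: negative (Anger, Fear)
  • espantosa: negative (Disgust, Fear)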

Results: subgenre classification

Results: Feature importance

  • tree depth: 3
  • feature set: NRC speech
  • F1 training set: 0.75
  • F1 test set: 0.66
  • most important features:
    positive speech
    narrated fear
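
For readers unfamiliar with the F1 score reported above, a small sketch of how it is computed from true and predicted subgenre labels. The labels in the example are invented, and macro-averaging over classes is an assumption; the slides do not state which averaging scheme was used.

```python
def f1_per_class(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), computed per label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Invented labels for four data points.
    y_true = ["sentimental", "sentimental", "historical", "historical"]
    y_pred = ["sentimental", "historical", "historical", "historical"]
    print(f1_per_class(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 3))
```

A gap between training and test scores, such as 0.75 versus 0.66, indicates that the tree fits the training folds somewhat better than unseen data.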

Results: Decision tree

tree depth: 3, feature set: NRC speech, F1 training set: 0.75, F1 test set: 0.66

Comparison of classification methods

Conclusions: Sentiments and genre

  • Classification by sentiment features: better understanding of how emotions are expressed linguistically in subgenres
  • Distinction narrated text – direct speech matters
  • For Spanish: NRC Emotion Lexicon better than SentiWordNet, relevance of 8 basic emotions
  • Next: increase corpus size, use other classifiers, combine sentiment and other types of features, analyze genre pairs, revise genre assignments

Conclusions

  • Digital stylistic analyses can give new insights into literary genres
    • broader empirical basis possible
    • many different textual features can be explored (heuristically, systematically)
  • prerequisites:
    • texts in machine-readable digital format
    • some knowledge: text encoding, programming, statistics
  • challenges:
    • theory?
    • mapping of concepts (e.g. topic vs. theme, sentiment vs. emotion)
    • interpretation of results

Thank you!

Slides at: https://hennyu.github.io/due_19/

CLiGS: http://cligs.hypotheses.de/

CC-BY 4.0

References

  • Baccianella, S., Esuli, A. and Sebastiani, F. (2010). SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of LREC 2010. Valletta, Malta: ELRA: 2200-2204. http://www.lrec-conf.org/proceedings/lrec2010/summaries/769.html.
  • Blei, David M. (2011). Introduction to Probabilistic Topic Models. Communications of the ACM.
  • Henríquez Miranda, C. and Guzmán, J. (2017). A Review of Sentiment Analysis in Spanish. Una Revisión Sobre el Análisis de Sentimientos en Español. TECCIENCIA 12 (22): 35-48. doi: 10.18180/tecciencia.2017.22.5.
  • Hettinger, L., Jannidis, F., Reger, I. and Hotho, A. (2016). Classification of Literary Subgenres. DHd2016. Leipzig: Universität Leipzig: 154-158. http://dhd2016.de/boa.pdf.
  • Jockers, M. (2015). Revealing Sentiment and Plot Arcs with the Syuzhet Package. Matthew L. Jockers. http://www.matthewjockers.net/2015/02/02/syuzhet/.
  • Juola, P. (2006). Authorship attribution. Foundations and Trends in Information Retrieval 1 (3): 233-334.
  • Kim, E., Padó, S. and Klinger, R. (2017). Prototypical Emotion Developments in Literary Genres. Digital Humanities 2017. Conference Abstracts. Montréal: McGill University. https://dh2017.adho.org/abstracts/203/203.pdf.
  • Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the ACM 38 (11): 39-41.
  • Molina, H. B. (2011). Como crecen los hongos. La novela argentina entre 1838 y 1872. Buenos Aires: Teseo.
  • Padró, L. and Stanislovsky, E. (2012). FreeLing 3.0: Towards Wider Multilinguality. Proceedings of the Language Resources and Evaluation Conference (LREC 2012). Istanbul, Turkey: ELRA: 2473-2479. http://nlp.lsi.upc.edu/publications/papers/padro12.pdf.
  • Pang, B. and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval 2 (1-2): 1-135.
  • Saif, M. and Turney, P. (2013). Crowdsourcing a Word-Emotion Association Lexicon. Computational Intelligence 29 (3): 436-465.
  • Schmidt, B. (2014). Typical TV Episodes: Visualizing Topics in Screen Time. Sapping Attention. http://sappingattention.blogspot.de/2014/12/typical-tv-episodes-visualizing-topics.html.
  • Winko, S. (2003). Über Regeln emotionaler Bedeutung in und von literarischen Texten. In Jannidis, F., Lauer, G., Martínez, M., Winko, S. (eds.), Regeln der Bedeutung. Berlin: de Gruyter: 329-348.
  • Zehe, A., Becker, M., Hettinger, L., Hotho, A., Reger, I., and Jannidis, F. (2016). Prediction of Happy Endings in German Novels based on Sentiment Information. Proceedings of DMNLP, Workshop at ECML/PKDD. Riva del Garda, Italy. http://ceur-ws.org/Vol-1646/paper2.pdf.
  • Zó, R. E. (2015). Emociones escriturales. La novela sentimental latinoamericana. Saarbrücken: Editorial Académica Española.