Talks and presentations

A Fisher’s Exact Test Interpretation of the TF-IDF Term-weighting Scheme

November 28, 2024

Talk, Dalhousie University, Halifax, Canada

Term frequency–inverse document frequency, or TF–IDF for short, is arguably the most celebrated mathematical expression in the history of information retrieval. Conceived as a simple heuristic that quantifies how strongly a given term’s occurrences are concentrated in any one document out of many, TF–IDF and its many variants are routinely used as term-weighting schemes in diverse text analysis applications. A growing body of scholarship is dedicated to placing TF–IDF on a sound theoretical foundation. In this talk, I build on that tradition by motivating TF–IDF for the statistics community, deriving the famed expression from a significance-testing perspective. I sketch out how TF–IDF is, under some admittedly restrictive conditions, asymptotically equal to the negative logarithm of a one-tailed Fisher’s exact (significance) test p-value. The Fisher’s exact test interpretation of TF–IDF equips the working statistician with a justification for TF–IDF’s use, together with a ready explanation of its long-established effectiveness. Presentation slides are available here.
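To make the two quantities in the abstract concrete, here is a minimal Python sketch that computes each of them for one term in one document. It assumes one common TF–IDF variant (raw term count times the natural log of N over document frequency). It also assumes the one-tailed Fisher’s exact test is set up on a token-level 2×2 table that crosses "this term vs. all other tokens" with "this document vs. the rest of the corpus." Both choices are illustrative assumptions, not necessarily the exact construction used in the talk, and the helper names tf_idf and fisher_one_tailed_p are hypothetical.

```python
import math

# Hypothetical helpers for illustration; documents are lists of tokens.


def tf_idf(term, doc, corpus):
    """ASSUMED TF-IDF variant: raw count in doc times ln(N / document frequency)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df)


def fisher_one_tailed_p(term, doc, corpus):
    """Upper-tail (one-tailed) Fisher's exact test p-value.

    ASSUMPTION: the 2x2 table crosses "this term vs. other tokens" with
    "this document vs. the rest of the corpus". Under the null hypothesis
    the term's count in doc is hypergeometric, so p = P(X >= observed).
    """
    population = sum(len(d) for d in corpus)        # all tokens in the corpus
    successes = sum(d.count(term) for d in corpus)  # all occurrences of term
    draws = len(doc)                                # tokens in this document
    observed = doc.count(term)                      # occurrences of term in doc
    upper = min(successes, draws)
    # Exact integer arithmetic for the tail sum, then a single division.
    tail = sum(
        math.comb(successes, k) * math.comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / math.comb(population, draws)


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cat cat cat chased the dog".split(),
]
doc = corpus[2]
print(tf_idf("cat", doc, corpus))                          # ≈ 1.216 (3 * ln 1.5)
print(-math.log(fisher_one_tailed_p("cat", doc, corpus)))  # ≈ 2.485 (p = 1/12)
```

On a toy corpus like this one the two numbers are not expected to coincide; the talk’s claim is asymptotic and applies only under its stated conditions. The upper-tail sum should match SciPy’s scipy.stats.fisher_exact called with alternative='greater' on the same 2×2 table, which makes a convenient cross-check.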

The Literary Theme Ontology for Media Annotation and Information Retrieval

September 24, 2019

Talk, The Medical University of Graz, Graz, Austria

Abstract: Literary theme identification and interpretation are a focal point of literary studies scholarship. Classical forms of literary scholarship, such as close reading, have flourished with scarcely any need for commonly defined literary themes. However, the rising popularity of collaborative and algorithmic analyses of literary themes in works of fiction, together with a requirement for computational search and indexing facilities for large corpora, creates the need for a collection of shared literary themes that ensures common terminology and definitions. To address this need, we introduce here a first draft of the Literary Theme Ontology. Inspired by a traditional framing from literary theory, the ontology comprises literary themes drawn from the authors’ own analyses, reference books, and online sources.