Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

Portfolio item number 1

This is an item in your portfolio. It can have images or nice text. If you name the file .md, it will be parsed as Markdown. If you name the file .html, it will be parsed as HTML.

publications

talks

The Literary Theme Ontology for Media Annotation and Information Retrieval

Published:

Abstract: Literary theme identification and interpretation is a focal point of literary studies scholarship. Classical forms of literary scholarship, such as close reading, have flourished with scarcely any need for commonly defined literary themes. However, the rise in popularity of collaborative and algorithmic analyses of literary themes in works of fiction, together with a requirement for computational searching and indexing facilities for large corpora, creates the need for a collection of shared literary themes to ensure common terminology and definitions. To address this need, we here introduce a first draft of the Literary Theme Ontology. Inspired by a traditional framing from literary theory, the ontology comprises literary themes drawn from the authors' own analyses, reference books, and online sources.

A Fisher’s Exact Test Interpretation of the TF-IDF Term-weighting Scheme

Published:

Term frequency–inverse document frequency, or TF–IDF for short, is arguably the most celebrated mathematical expression in the history of information retrieval. Conceived as a simple heuristic quantifying the extent to which a given term’s occurrences are concentrated in any one given document out of many, TF–IDF and its many variants are routinely used as term-weighting schemes in diverse text analysis applications. There is a growing body of scholarship dedicated to placing TF–IDF on a sound theoretical foundation. In this talk, I build on that tradition by motivating TF–IDF to the statistics community by deriving the famed expression from a significance testing perspective. I will sketch out how TF–IDF is, under some admittedly restrictive conditions, asymptotically equal to the negative logarithm of a one-tailed Fisher’s exact (significance) test p-value. The Fisher’s exact test interpretation of TF–IDF equips the working statistician with a justification for TF–IDF’s use together with a ready explanation of its long-established effectiveness. Presentation slides are available here.
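For readers who want to experiment with the two quantities mentioned in the abstract, here is a minimal sketch in Python. It computes the textbook TF–IDF weight (term frequency times the log of the inverse document frequency) and, separately, a one-tailed Fisher's exact test p-value for a 2x2 table. The function names, the example counts, and the layout of the contingency table are assumptions made for illustration only; they are not taken from the talk, and the sketch does not reproduce the talk's derivation or its conditions.

```python
import math


def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    return tf * math.log(n_docs / df)


def fisher_one_tailed_p(a: int, b: int, c: int, d: int) -> float:
    """One-tailed (greater) Fisher's exact p-value for the table [[a, b], [c, d]].

    With all margins held fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table whose
    top-left cell is at least as large as the observed value a.
    """
    row1 = a + b          # size of the first row
    col1 = a + c          # size of the first column
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    numerator = sum(
        math.comb(col1, k) * math.comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / math.comb(n, row1)


# Hypothetical counts, chosen only to exercise the two functions.
weight = tf_idf(tf=3, df=10, n_docs=1000)

# Assumed table layout (illustrative, not the talk's):
#   rows    = tokens inside the document / tokens in the rest of the collection
#   columns = occurrences of the term    / occurrences of all other terms
p_value = fisher_one_tailed_p(a=3, b=97, c=12, d=9888)

print("TF-IDF weight:", weight)
print("-log(p):", -math.log(p_value))
```

The two printed numbers are not expected to agree for arbitrary counts; the abstract states the correspondence only asymptotically and under restrictive conditions, so the sketch is meant as a starting point for exploring when the values move together.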

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.