Economics Webinar - On the Testability of the Anchor Words Assumption in Topic Models

9:00am - 10:30am
Online via Zoom

Topic models are a simple and popular tool for the statistical analysis of textual data. Their identification and estimation are typically enabled by assuming the existence of anchor words; that is, words that are exclusive to specific topics. In this paper, we show that the existence of anchor words is statistically testable: there exists a test with correct size that has nontrivial power. This means that, in general, the anchor word assumption cannot be viewed simply as a convenient normalization. At the core of our result lies a simple characterization of when a column-stochastic matrix with known nonnegative rank admits a separable factorization. We use a simulation study to analyze the power of a bootstrapped version of our suggested procedure and to discuss its computational limitations.

Prof. Jose Luis Montiel Olea
Cornell University

