Grimmer/Roberts/Stewart (2022): Text as Data: A New Framework for Machine Learning and the Social Sciences. More reading material will be provided during the course.
Students should have basic knowledge of R programming and be proficient in the basics of statistics and quantitative analysis.
Portfolio examination: presentations (25%) and written papers (75%) on students' own research projects, applying appropriate text analysis methods to address example research questions of their choosing (in coordination with the lecturer).
In this course, we review and apply recent advances in computational methods for analyzing text as data. Following the framework established by Grimmer, Roberts, and Stewart (2022), students will learn the theoretical foundations of important text analysis models alongside practical implementation in R. The course provides both conceptual understanding and hands-on skills needed to leverage textual data for research, with a focus on economic analyses.

Preliminary Outline

1. Selection and Representation
   - Fundamental concepts and approaches to text as data
   - Text preprocessing and representation techniques
   - Basic text features and quantification methods
2. Discovery
   - Theoretical foundations of key discovery models
   - Unsupervised methods for exploring textual data
   - Approaches to pattern identification in large text corpora
3. Measurement
   - Supervised learning approaches for text analysis
   - Methods for quantifying concepts in textual data
   - Validation and reliability assessment
4. Inference
   - Statistical inference with text data
   - Causal inference approaches using textual information
   - Applications and limitations of text-based inference
   - Text as outcome, treatment, or confounder

Course Format

The course combines lecture elements with practical lab-style sessions. Lectures will cover theoretical foundations and methodological considerations, while lab sessions will focus on implementation in R. Students will work on their own research projects throughout the course, applying appropriate text analysis methods to address example questions of their choosing. These projects will allow students to gain hands-on experience with the full text analysis pipeline, from data preparation to inference.