The wide availability of digitized text data has created new opportunities for understanding social phenomena. Relatedly, social issues like toxicity, discrimination, and propaganda frequently manifest in text, making text analysis critical for understanding and mitigating them. This course centers on one question: how can we use NLP as a tool for understanding society? Students will learn core methods and recent advances in text analysis, building from word-level metrics to embeddings and language models, and incorporating statistical methods such as time series analysis and causal inference.

Prerequisites: one of (EN.601.465/665, EN.601.467/667, EN.601.468/668) and familiarity with Python/PyTorch. Students may receive credit for EN.601.472 or EN.601.672, but not both.



Schedule

The current class schedule is below and is subject to change:

Date Topic Reference Readings Work Due
Wed Jan 22 Introduction, course expectations [slides]
  1. Wallach, Hanna. "Computational social science ≠ computer science + social data." Communications of the ACM 61.3 (2018): 42-44.
Mon Jan 27 Word statistics [slides]
  1. Monroe et al. "Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis (2008)
  2. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 6.6]
Submit course goals form; set up iClicker
Wed Jan 29 Topic Modeling pt 1 [slides]
  1. Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3.Jan (2003): 993-1022.
  2. Griffiths, Thomas L., and Mark Steyvers. "Finding scientific topics." Proceedings of the National Academy of Sciences 101.suppl_1 (2004): 5228-5235.
Mon Feb 3 Topic Modeling pt 2 [slides]
  1. Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3.Jan (2003): 993-1022.
  2. Roberts, Margaret E., et al. "stm: R Package for Structural Topic Models" Journal of Statistical Software
  3. Roberts, Margaret E., et al. "The structural topic model and applied social science." Advances in neural information processing systems workshop on topic models: computation, application, and evaluation. Vol. 4. No. 1. 2013.
HW 1 Released
Wed Feb 5 Word Embeddings: Construction [slides]
  1. Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space."
  2. Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NeurIPS (2013).
Mon Feb 10 Word Embeddings: Applications and Evaluations [slides]
  1. Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings." NeurIPS (2016).
  2. Hamilton, William L., Jure Leskovec, and Dan Jurafsky. "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change." ACL (2016).
  3. Garg, Nikhil, et al. "Word embeddings quantify 100 years of gender and ethnic stereotypes." PNAS (2018)
  4. Joseph, Kenneth, and Jonathan Morgan. "When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?." ACL (2020).
Wed Feb 12 Affect and Lexicons [slides]
  1. Giovanna Colombetti. "From affect programs to dynamical discrete emotions", Philosophical Psychology (2009).
  2. Saif M. Mohammad. "Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words." ACL (2018)
  3. Hamilton, William L., et al. "Inducing domain-specific sentiment lexicons from unlabeled corpora." EMNLP. (2016)
Mon Feb 17 Data Annotation [slides]
  1. Zeerak Waseem. "Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter". In Proceedings of the First Workshop on NLP and Computational Social Science at ACL (2016)
  2. Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. "Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts". In EMNLP (2019)
  3. ICML 2023 tutorial on RLHF
HW 1 due; HW 2 released
Wed Feb 19 Classification Models [slides]
  1. Jurafsky & Martin Chap. 5
  2. Jurafsky & Martin Chap. 7
  3. Keith, Katherine, and Brendan O’Connor. "Uncertainty-aware generative models for inferring document class prevalence." EMNLP (2018)
Mon Feb 24 Drawing Conclusions from Measurements [slides] HW 2 due
Wed Feb 26 Causal inference: Adjustments [slides]
  1. Brady Neal, “Introduction to Causal Inference from a Machine Learning Perspective”, Course Lecture Notes, Chapter 2
  2. Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci. 2010 Feb 1;25(1):1-21. doi: 10.1214/09-STS313
  3. Chesnaye NC, Stel VS, Tripepi G, Dekker FW, Fu EL, Zoccali C, Jager KJ. An introduction to inverse probability of treatment weighting in observational research. Clin Kidney J. 2021 Aug 26;15(1):14-20. doi: 10.1093/ckj/sfab158
  4. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res. 2011 May;46(3):399-424. doi: 10.1080/00273171.2011.568786.
HW 3 released
Mon Mar 3 Causal inference: Text and NLP [slides]
  1. Keith, Katherine, David Jensen, and Brendan O’Connor. "Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates." ACL. 2020.
  2. Roberts, Margaret E., Brandon M. Stewart, and Richard A. Nielsen. "Adjusting for confounding with text matching." American Journal of Political Science 64.4 (2020): 887-903.
  3. Veitch, Victor, Dhanya Sridhar, and David Blei. "Adapting text embeddings for causal inference." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.
  4. Field, Anjalie, and Yulia Tsvetkov. "Unsupervised Discovery of Implicit Gender Bias." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
Wed Mar 5 Network metrics [slides]
  1. Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864).
  2. Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How powerful are graph neural networks? ICLR.
  3. Yuan, H., Yu, H., Gui, S., & Ji, S. (2022). Explainability in graph neural networks: A taxonomic survey. IEEE transactions on pattern analysis and machine intelligence, 45(5), 5782-5799.
HW 3 due on Friday
Mon Mar 10 Midterm Review [slides]
Wed Mar 12 Midterm
Mon Mar 17 BREAK
Wed Mar 19 BREAK
Mon Mar 24 Ethics [slides]
  1. Bianchi, Federico, et al. "Easily accessible text-to-image generation amplifies demographic stereotypes at large scale." FAccT. 2023.
  2. Sap, Maarten, et al. "The Risk of Racial Bias in Hate Speech Detection." ACL. 2019
  3. Blodgett, Su Lin, et al. "Language (Technology) is Power: A Critical Survey of “Bias” in NLP" ACL. 2020
Project Proposal Released
Wed Mar 26 Language Modeling: Background [slides]
  1. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 3]
  2. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 7.6]
Mon Mar 31 Language Modeling: MLM Use Cases [slides]
  1. The Illustrated Transformer
  2. Card, Dallas, et al. "Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration." Proceedings of the National Academy of Sciences 119.31 (2022)
  3. Myra Cheng, et al. "AnthroScore: A Computational Linguistic Measure of Anthropomorphism" EACL (2024).
Wed Apr 2 History Applications with guest Louis Hyman
HW 4 released
Mon Apr 7 Political Science Applications with Jae Yeon Kim Project Proposals due

Policies

Late Days Each student can use 5 late days for HW assignments over the course of the semester. Late days can be distributed in any way across assignments. Additional extensions will not be granted, and work turned in late after all late days have been used will receive 0 credit. If a group assignment is turned in late, it will count as a late day for all students in the group. Late days cannot be used for the final project report.

Course Conduct This course includes topics that could raise differing opinions. All students are expected to respect everyone's perspective and input and to contribute towards creating a welcoming and inclusive climate. We, the instructors, will strive to make this classroom an inclusive space for all students, and we welcome feedback on ways to improve.

Academic Integrity This course will have a zero-tolerance philosophy regarding plagiarism or other forms of cheating, and incidents of academic dishonesty will be reported. A student who has doubts about how the Honor Code applies to this course should obtain specific guidance from the course instructor before submitting the respective assignment.

Discrimination and Harassment The Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the earlier sentence, and the University’s prompt and equitable response to such complaints.

Personal Well-being Take care of yourself! Being a student can be challenging, and your physical and mental health are important. If you need support, please seek it out. Here are several of the many helpful resources on campus:

Acknowledgements Thank you to Daniel Khashabi for sharing the course website template!