May 2023

Chat Featurizer

A sentiment analysis package to quantify various semantic levers of team/group communication.

Tech Stack: Python, Jupyter Notebooks, Matplotlib, RoBERTa, LIWC, ConvoKit, Sentence Transformers, GitHub Actions, Pytest

01.

Project Motivation

Chat Featurizer is a Python package that takes a series of chat messages (i.e. a conversation) as input and featurizes it into 150+ quantified semantic metrics. These features are computed in varying ways, drawing inspiration from social science literature, external frameworks, and original methods.

This project originally grew out of a research question: which features of team communication lead to productivity? However, after expanding our feature set to 150+ metrics, our team saw the potential in opening up our work as a sentiment analysis tool for public research projects. That way, other scientists and researchers can avoid “reinventing the wheel” and simply leverage our modular package to extract different characteristics of their own conversational datasets.

I worked on this project for 15 months during my time at UPenn’s Computational Social Science Lab. My contributions largely consisted of computing 100+ chat- and conversation-level features, updating our codebase infrastructure and data model to keep up with varying feature requirements, and building an automated testing pipeline via GitHub Actions to validate our output metrics. Big thanks to Emily Hu, Priya D’Costa, and CSSLab for granting me this formative experience!

02.

Feature Computation Overview

This package takes a Pandas DataFrame as input, corresponding to some conversation or message exchange, and returns a set of desired characteristics, i.e. features. Some examples include positivity, number of words, etc. We compute features at the chat and conversation levels: chat features take a single row of our DataFrame as input (i.e. a chat), while conversation features analyze a contiguous chunk of rows (i.e. a conversation). Observe this difference in the plots below.

[Plots: chat-level features operate on single rows of the DataFrame; conversation-level features operate on contiguous blocks of rows]
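To make the distinction concrete, here is a minimal sketch of the two levels. The column names, example data, and feature logic are hypothetical stand-ins, not the package’s actual schema:

```python
import pandas as pd

# Hypothetical input: each row is one chat (message) within a conversation.
df = pd.DataFrame({
    "conversation_num": [0, 0, 0, 1, 1],
    "speaker": ["alice", "bob", "alice", "carol", "dan"],
    "message": [
        "Let's finalize the report today.",
        "Sounds good, I'll draft the intro.",
        "Great, I'll handle the figures.",
        "Can we push the deadline?",
        "I'd rather not, honestly.",
    ],
})

# Chat-level feature: one value per row (i.e. per message).
df["num_words"] = df["message"].str.split().str.len()

# Conversation-level feature: one value per contiguous block of rows,
# e.g. the average message length within each conversation.
conv_features = df.groupby("conversation_num")["num_words"].mean().rename("avg_num_words")
```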

While some features involve simple functions (e.g. counting the number of words), many require more complex computation. One example is discursive diversity, a feature that measures how semantically divergent speakers are from each other in a given conversation. To compute this metric, we use a Sentence Transformer model, SBERT, to represent each user as a “vector” of sorts, computed by taking the average of their corresponding message embeddings. The visuals on the right depict this process. Ultimately, we are left with a single discursive diversity metric for the given conversation.
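As a rough sketch of the idea, the snippet below embeds each speaker’s messages with an off-the-shelf SBERT model, averages them into a speaker centroid, and scores divergence as the mean pairwise cosine distance between centroids. Both the model name and the distance formula are illustrative choices here, not necessarily the ones the package uses:

```python
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any SBERT model works here; "all-MiniLM-L6-v2" is just a
# lightweight stand-in, not necessarily the model the package uses.
model = SentenceTransformer("all-MiniLM-L6-v2")

def discursive_diversity(messages_by_speaker: dict[str, list[str]]) -> float:
    """Mean pairwise cosine distance between speaker centroid vectors."""
    # Represent each speaker as the average of their message embeddings.
    centroids = []
    for msgs in messages_by_speaker.values():
        embeddings = model.encode(msgs)        # shape: (n_messages, dim)
        centroids.append(embeddings.mean(axis=0))

    # Average cosine distance over all pairs of speakers.
    distances = []
    for u, v in combinations(centroids, 2):
        cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        distances.append(1.0 - cos_sim)
    return float(np.mean(distances))

print(discursive_diversity({
    "alice": ["Let's finalize the report.", "I'll handle the figures."],
    "bob":   ["Sounds good, drafting the intro now."],
    "carol": ["Can we push the deadline instead?"],
}))
```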

In addition to discursive diversity, I computed various complex features including forward flow, mimicry, and user-level aggregates, all of which have public-facing documentation (link coming soon!). The process involved extensive literature review, design discussions, and frequent codebase restructuring.

03.

Testing Pipeline

To assure users that Chat Featurizer generates features as expected, I implemented an automated testing pipeline. When developers want to test a feature, all they have to do is add test cases to a designated DataFrame and push their code to our GitHub repository. From there, GitHub Actions* takes over, running a series of jobs to validate the newly added feature. When errors arise, developers are provided with a running test.log of inputs and expected vs. actual outputs.
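Here is a rough sketch of the pattern the test jobs follow: read test cases from a DataFrame, compare expected against actual outputs, and log any mismatches. The file name, columns, and feature function are illustrative, not the project’s actual code:

```python
# test_features.py -- illustrative sketch of the DataFrame-driven test pattern.
import logging

import pandas as pd
import pytest

# Mismatches get appended to a running log for developers to inspect.
logging.basicConfig(filename="test.log", level=logging.INFO)

def num_words(message: str) -> int:
    """Toy stand-in for a real chat-level feature."""
    return len(message.split())

# Developers add rows like these to a designated test-case DataFrame.
test_cases = pd.DataFrame({
    "input":    ["hello there", "one", ""],
    "expected": [2, 1, 0],
})

@pytest.mark.parametrize("inp,expected", test_cases.itertuples(index=False))
def test_num_words(inp, expected):
    actual = num_words(inp)
    if actual != expected:
        logging.info("input=%r expected=%r actual=%r", inp, expected, actual)
    assert actual == expected
```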

*GitHub Actions is a continuous integration and continuous delivery (CI/CD) service that lets developers automate their build, test, and deployment pipelines. You can create triggers for certain workflows, such as a pull request or a push to the repository, to customize your development workstreams.