Employing Neural Language Models and A Bayesian Hierarchical Framework for Classification and Engagement Analysis of Misinformation on Social Media

dc.contributor.advisor

Jiang, Yue

dc.contributor.advisor

Wiseman, Sam

dc.contributor.author

List, Abbey

dc.date.accessioned

2022-12-19T13:44:47Z

dc.date.available

2022-12-19T13:44:47Z

dc.date.issued

2022-04

dc.department

Computer Science

dc.department

Statistical Science

dc.description.abstract

While social media can be an effective tool for maintaining personal relationships and making global connections, it has also become a powerful driver of the harmful spread of misinformation, especially during universally difficult and taxing events such as the COVID-19 pandemic. In this study, we collected a sample of Tweets related to COVID-19 from the Twitter accounts of influential political media commentators and news organizations, and we labeled each Tweet as misinformation, misleading, or legitimate. We constructed a Bayesian hierarchical negative binomial regression model to analyze associations between Tweet engagement and misleading status while controlling for factors such as political lean, lexical diversity, and Retweet status. We found evidence that engagement had a positive association with misleading status and text readability, and a negative association with Retweet status. We also employed a DeBERTa neural language model to predict the presence of misinformative or misleading content in Tweets. We experimented with external datasets, multitask fine-tuning, backtranslation, and weighted loss, and the model reached an accuracy of 0.683 and a macro F1-score of 0.593. We then examined the explainability of DeBERTa through word attributions computed with integrated gradients. The tokens with the greatest influence on model predictions often carried connotations or context intuitively related to the predicted label. The results of this study indicate that misleading status, Retweet status, and linguistic features may be associated with overall Tweet engagement. They also indicate that the DeBERTa model is a potentially useful tool for determining whether misinformation is present from Tweet text alone, without an external knowledge base.

dc.identifier.uri

https://hdl.handle.net/10161/26386

dc.subject

Bayesian statistics

dc.subject

Misinformation on social media

dc.subject

Hierarchical models

dc.subject

COVID-19

dc.subject

Neural language models

dc.subject

Text analysis

dc.title

Employing Neural Language Models and A Bayesian Hierarchical Framework for Classification and Engagement Analysis of Misinformation on Social Media

dc.type

Honors thesis

Files

Original bundle

Name: AbbeyList_UndergraduateThesis_Final.pdf
Size: 4.22 MB
Format: Adobe Portable Document Format