Content Modeling in Multi-Platform Multilingual Social Media Data

Arman Setser; Libby Lange; Kyle Weiss; Vlad Barash

doi:10.54501/jots.v2i2.136

Vol. 2 No. 2 (2024), Peer-reviewed Articles

Published 2024-02-28

Authors

Arman Setser

Graphika

https://orcid.org/0000-0002-3569-8993

Libby Lange

https://orcid.org/0009-0006-1932-4787

Kyle Weiss

https://orcid.org/0009-0008-2981-0652

Vlad Barash

https://orcid.org/0000-0001-9096-1830

Keywords

content modeling
natural language processing
sentence embedding
social media data

How to Cite

Setser, A., Lange, L., Weiss, K., & Barash, V. (2024). Content Modeling in Multi-Platform Multilingual Social Media Data. Journal of Online Trust and Safety, 2(2). https://doi.org/10.54501/jots.v2i2.136

Abstract

An increase in the use of social media as the primary news source for the general population has created an ecosystem in which organic conversation commingles with inorganically seeded and amplified narratives, which can include public relations and marketing activity but also covert and malign influence operations. An efficient and easily understandable analysis of such data is important, as it allows relevant stakeholders to protect online communities and free discussion while better identifying activity and content that may violate social media platform terms of service. To accomplish this, we propose a method of large-scale social media data analysis that allows multilingual conversations to be analyzed in depth across any number of social media platforms simultaneously. Our method uses a text embedding model, i.e., a natural language processing model that encodes a semantic and contextual understanding of language. The model uses this “understanding” to represent posts as coordinates in a high-dimensional space, such that posts with similar meanings are assigned coordinates close together. We then cluster and analyze the posts to identify online topics of conversation that exist across multiple social media platforms. We explicitly show how our method can be applied to four different datasets, three consisting of Chinese social media posts related to the Belt and Road Initiative and one relating to the Russia-Ukraine war, and we find politically influenced conversations that contain misleading information relating to the Chinese government and the Russia-Ukraine war.
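As a rough illustration of the embed-then-cluster idea described above, the following minimal sketch groups posts whose coordinates lie close together. It makes several assumptions that are not taken from the article: the posts, platform names, and three-dimensional vectors are hand-made stand-ins for the output of a real multilingual embedding model, and the greedy cosine-similarity grouping (`greedy_cluster`, with a hypothetical `threshold` of 0.8) is a simple substitute chosen for readability, not the clustering procedure the authors use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is
    at least `threshold` similar; otherwise start a new cluster.

    This is an illustrative stand-in, not the article's algorithm.
    """
    seeds = []   # first vector seen in each cluster
    labels = []  # cluster id for every input vector, in order
    for vec in embeddings:
        for cluster_id, seed in enumerate(seeds):
            if cosine_similarity(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical posts: (platform, text, made-up embedding).
# A real pipeline would obtain the vectors from an embedding model.
posts = [
    ("platform_a", "New rail link opens next month", [0.9, 0.1, 0.0]),
    ("platform_b", "新铁路线下个月开通", [0.8, 0.2, 0.1]),
    ("platform_a", "Grain exports resume from the port", [0.0, 0.1, 0.9]),
    ("platform_c", "Port restarts grain shipments", [0.1, 0.0, 0.95]),
]

labels = greedy_cluster([vec for _, _, vec in posts])

# Group posts by cluster so each topic can be inspected across platforms.
topics = {}
for (platform, text, _), label in zip(posts, labels):
    topics.setdefault(label, []).append((platform, text))

for label, members in topics.items():
    print(label, members)
```

Because the grouping operates only on the vectors, posts in different languages or from different platforms land in the same topic whenever their embeddings are close, which is the property the method relies on.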

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
