Abstract
The growing use of social media as the primary news source for the general population has created an ecosystem in which organic conversation commingles with inorganically seeded and amplified narratives. These narratives can include public relations and marketing activity, but also covert and malign influence operations. Efficient and easily understandable analysis of such data is important because it allows relevant stakeholders to protect online communities and free discussion while better identifying activity and content that may violate social media platforms' terms of service. To this end, we propose a method for large-scale social media data analysis that allows multilingual conversations to be analyzed in depth across any number of social media platforms simultaneously. Our method uses a text embedding model, i.e., a natural language processing model that captures the semantic and contextual meaning of language. The model represents posts as coordinates in a high-dimensional space such that posts with similar meanings are assigned coordinates close together. We then cluster and analyze the posts to identify topics of conversation that exist across multiple social media platforms. We show how our method can be applied to four datasets, three consisting of Chinese social media posts related to the Belt and Road Initiative and one relating to the Russia-Ukraine war, and we find politically influenced conversations that contain misleading information relating to the Chinese government and the Russia-Ukraine war.
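The embed-then-cluster idea summarized above can be illustrated with a minimal sketch. It does not reproduce the authors' pipeline: the three-dimensional vectors are hypothetical stand-ins for the output of a real multilingual text embedding model, the platform names and post descriptions are invented for illustration, and the greedy cosine-similarity threshold clustering is a simple substitute for whatever clustering algorithm the paper actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.9):
    """Assign each vector to the first cluster whose seed vector is at least
    `threshold` similar to it; otherwise start a new cluster.
    (Illustrative stand-in, not the clustering method from the paper.)"""
    seeds = []
    labels = []
    for vec in embeddings:
        for cluster_id, seed in enumerate(seeds):
            if cosine_similarity(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No existing cluster was close enough: this post seeds a new one.
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical posts: (platform, description, assumed embedding vector).
posts = [
    ("platform_a", "post about infrastructure loans", [0.90, 0.10, 0.00]),
    ("platform_b", "same topic, different language", [0.88, 0.15, 0.02]),
    ("platform_a", "post about the war", [0.05, 0.10, 0.95]),
    ("platform_c", "another post about the war", [0.00, 0.20, 0.90]),
]

labels = greedy_cluster([vec for _, _, vec in posts])
for (platform, description, _), label in zip(posts, labels):
    print(label, platform, description)
```

Because clustering operates only on the vectors, posts from different platforms and languages can fall into the same topic cluster as long as their embeddings lie close together.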
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright (c) 2024 Journal of Online Trust and Safety