As the Internet has become more pervasive, abusive behavior on online platforms has surged in recent decades. Abusers exploit the popularity and convenience of these platforms to target other users, which has drawn researchers' attention to effective methods for combating such activities. However, traditional machine learning methods, which focus on modeling abusive behavior patterns, struggle to detect and mitigate these behaviors because they cannot account for the intricate relationships among abusers. In response, this dissertation proposes to use graphs to depict the complex relationships among abusive users and to employ Graph Representation Learning (GRL) methods to detect online abusive activities (namely, drug trafficking and malicious repositories). Despite the notable success of existing GRL methods on benchmark graph datasets such as social, academic, and molecular graphs, they still face challenges for abuse detection in real-world scenarios: (i) many real-world abuse detection tasks lack sufficient labeled data for model training, as obtaining a large amount of labeled data is time-consuming and resource-intensive; (ii) online abuse graph datasets frequently exhibit class imbalance and topology imbalance. To address the first challenge, this dissertation presents advanced techniques, including graph meta-learning and graph self-supervised learning, for drug trafficker identification and malicious repository detection. Specifically, to make the best use of the limited labeled data, two novel models (Meta-AHIN and Meta-HG) that combine GRL with meta-learning are first designed to detect malicious repositories on social coding platforms and drug sellers on social media, respectively.
In the second part, this dissertation exploits the readily available unlabeled data. It proposes two graph self-supervised learning methods (Rep2Vec and HyGCL-DC) to detect malicious repositories and drug trafficking communities, respectively. To address the second challenge, class and topology imbalance in graphs, this dissertation designs two novel models, CM-GCL and AD-GSMOTE, which alleviate imbalance in the self-supervised learning setting and the supervised learning setting, respectively, for various detection tasks. All models designed in this dissertation are evaluated on benchmark or real-world datasets, demonstrating the effectiveness of GRL-based methods in real-world online abuse detection. This dissertation also publicly releases two newly collected datasets (Twitter-Drug and Twitter-HyDrug), providing resources for researchers in online abuse detection and GRL.
Alt Title
Graph Representation Learning Techniques for the Combat against Online Abuse