Cyberbullying is the deliberate use of online digital media to communicate false, embarrassing, or hostile information about another person. It is the most common online risk for adolescents, and well over half of young people do not tell their parents when it occurs. While there have been many studies of the nature and prevalence of cyberbullying, there has been relatively little work on its automated identification on social media sites. The focus of our work is to develop an automated model to identify and measure the degree of cyberbullying on social networking sites, and we propose a new representation learning method to tackle this problem. Our method is built on the Support Vector Machine (SVM), whose goal is to find the optimal separating hyperplane that maximizes the margin of the training data. The classifier is first trained on labelled data and is then used to classify held-out data to test its accuracy. Before the data can be used to train the classifier, it must be preprocessed. A good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information.
Cyberbullying has emerged as a serious problem afflicting children and young adults. Previous studies of cyberbullying focused on extensive surveys and on its psychological effects on victims, and were mainly conducted by social scientists and psychologists. Although these efforts improve our understanding of cyberbullying, the psychological approach based on personal surveys is very time-consuming and may not be suitable for automatic detection of cyberbullying.
Automatic cyberbullying detection is becoming possible. Machine learning-based cyberbullying detection involves two issues: 1) text representation learning, which transforms each post or message into a numerical vector, and 2) classifier training.
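The two steps above can be illustrated with a minimal sketch. It assumes a plain bag-of-words count vector as the text representation and a linear SVM trained by subgradient descent on the regularized hinge loss; the toy posts, the labels, and every function name here are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_vocabulary(posts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for post in posts:
        for token in post.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_vector(post, vocab):
    """Step 1: bag-of-words count vector; unknown tokens are ignored."""
    counts = Counter(post.lower().split())
    vec = [0.0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vec[vocab[token]] = float(n)
    return vec


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Step 2: subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (bullying) or -1 (not bullying).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Weight decay from the regularization term.
            w = [wj * (1.0 - eta * lam) for wj in w]
            # Hinge-loss subgradient only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(post, vocab, w, b):
    """Return +1 or -1 according to the side of the hyperplane."""
    x = to_vector(post, vocab)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy data, for illustration only.
POSTS = [
    "you are an idiot",
    "nobody likes you loser",
    "see you at practice",
    "great game today",
]
LABELS = [1, 1, -1, -1]

vocab = build_vocabulary(POSTS)
X = [to_vector(p, vocab) for p in POSTS]
w, b = train_linear_svm(X, LABELS)
train_predictions = [predict(p, vocab, w, b) for p in POSTS]
```

In practice the count vectors would be replaced by the learned representation, and the classifier would be evaluated on held-out posts rather than on the training posts.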
In cyberbullying detection, most bullying posts contain bullying words such as profanity and foul language. These bullying words are strongly predictive of the existence of cyberbullying. However, using these bullying features directly may not achieve good performance, because these words account for only a small portion of the whole vocabulary and vulgar words are only one kind of discriminative feature for bullying. Dropout noise whose structure is designed using semantic information about these features is referred to as semantic dropout noise.
This mapping matrix is learned to reconstruct removed features from the other, uncorrupted features, and hence is able to capture feature correlation information. Here, we impose sparsity constraints on the mapping weights so that each row has a small number of nonzero elements. This sparsity constraint is intuitive, because one word is related to only a small portion of the vocabulary rather than to the whole vocabulary.
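One way to write such an objective is sketched below. The notation is an assumption made for illustration and is not taken from the text: $X$ holds the original feature vectors as columns, $\tilde{X}$ is its corrupted copy, $W$ is the mapping matrix with rows $w_i$, and $\lambda > 0$ weights an $\ell_1$ penalty, which is one common way to encourage rows with few nonzero elements.

```latex
\[
  \min_{W} \; \bigl\lVert X - W\tilde{X} \bigr\rVert_F^{2}
  \; + \; \lambda \sum_{i} \lVert w_i \rVert_{1}
\]
```

The first term rewards reconstructing removed features from the remaining ones; the second term pushes most entries of each row $w_i$ to zero.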
The bullying features play an important role and should be chosen properly. In the following, the steps for constructing the bullying feature set are given, with the first layer and the other layers addressed separately. For the first layer, expert knowledge and word embeddings are used. For the other layers, discriminative feature selection is conducted.
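For the first layer, the combination of expert knowledge and word embeddings might be sketched as follows. The sketch assumes that a seed list supplied by experts is expanded with any vocabulary word whose embedding has a cosine similarity of at least a chosen threshold to some seed. The threshold rule, the three-dimensional toy embeddings, and the function names are hypothetical; real embeddings would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_word_list(seeds, embeddings, threshold=0.8):
    """Add every word that is close enough to at least one seed word."""
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return sorted(expanded)


# Hypothetical toy embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "idiot": [0.9, 0.1, 0.0],
    "moron": [0.85, 0.15, 0.05],
    "loser": [0.8, 0.3, 0.0],
    "practice": [0.0, 0.2, 0.9],
    "game": [0.1, 0.1, 0.95],
}

expanded_list = expand_word_list(["idiot"], TOY_EMBEDDINGS)
```

With these toy vectors, "moron" and "loser" join the seed word "idiot", while "practice" and "game" stay out of the list.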
As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, which could help to build a healthy and safe social media environment. A method named Semantic-Enhanced Marginalized Denoising Auto-Encoder (smSDA) is developed via semantic extension of a popular deep learning model, the stacked denoising auto-encoder. Whereas the original problem may be stated in a finite-dimensional space, it often happens that the sets to discriminate are not linearly separable in that space.
Earlier work combined bag-of-words (BoW) features, sentiment features and contextual features to train a support vector machine for online harassment detection, and utilized label-specific features to extend the general features, where the label-specific features are learned by Linear Discriminant Analysis. During training of smSDA, we attempt to reconstruct bullying features from other, normal words by discovering the latent structure, i.e., the correlation, between bullying and normal words. The intuition behind this idea is that some bullying messages do not contain bullying words.
Three kinds of information are often used in cyberbullying detection: text, user demography, and social network features. Since the text content is the most reliable, our work here focuses on text-based cyberbullying detection. In this paper, we investigate one machine learning method, the Support Vector Machine (SVM). Its goal is to find the optimal separating hyperplane that maximizes the margin of the training data.
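For reference, the standard hard-margin form of this goal can be written as follows. The symbols are assumptions introduced here: $x_i$ is the feature vector of the $i$-th training post, $y_i \in \{+1, -1\}$ is its label, and $w$ and $b$ define the hyperplane $w^{\top}x + b = 0$.

```latex
\[
  \min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{subject to} \quad
  y_i \bigl( w^{\top} x_i + b \bigr) \ge 1, \qquad i = 1, \dots, n
\]
```

Under these constraints the distance between the two supporting hyperplanes is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.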
The classifier is first trained on labelled data and is then used to classify test data to measure its accuracy. Whereas the original problem may be stated in a finite-dimensional space, it often happens that the sets to discriminate are not linearly separable in that space; this problem is defined and solved through the support vector machine method.
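A small illustration of the separability issue, under assumptions of our own: the four XOR points below cannot be split by any line in two dimensions, yet a hyperplane separates them once each point is lifted into three dimensions with the extra feature x1 * x2. Mapping to a higher-dimensional space is the standard SVM remedy, which the text above does not spell out; the data, the feature map, and the hand-picked weights are hypothetical.

```python
from itertools import product

# XOR labelling: points with exactly one coordinate set are positive.
XOR_DATA = [((0, 0), -1), ((1, 1), -1), ((0, 1), 1), ((1, 0), 1)]


def identity(point):
    """Keep the original two-dimensional features."""
    return tuple(point)


def lift(point):
    """Map (x1, x2) to (x1, x2, x1 * x2)."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


def separates(weights, bias, data, feature_map):
    """True if every point lies strictly on the correct side."""
    for point, label in data:
        features = feature_map(point)
        score = sum(w * f for w, f in zip(weights, features)) + bias
        if label * score <= 0:
            return False
    return True


# A coarse grid search in the original space. It illustrates, but does
# not prove, that no separating line exists for XOR.
GRID = [k / 2 for k in range(-8, 9)]
found_in_2d = any(
    separates((w1, w2), b, XOR_DATA, identity)
    for w1, w2, b in product(GRID, repeat=3)
)

# One hand-picked hyperplane in the lifted space: x1 + x2 - 2*x1*x2 - 0.5.
LIFTED_WEIGHTS = (1.0, 1.0, -2.0)
LIFTED_BIAS = -0.5
found_in_3d = separates(LIFTED_WEIGHTS, LIFTED_BIAS, XOR_DATA, lift)
```

Here `found_in_2d` is `False` and `found_in_3d` is `True`. Kernel SVMs apply the same idea without computing the lifted features explicitly.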
This paper addresses the text-based cyberbullying detection problem, where robust and discriminative representations of messages are critical for an effective detection system. By designing semantic dropout noise and enforcing sparsity, we have developed a Support Vector Machine based approach. Its goal is to find the optimal separating hyperplane that maximizes the margin of the training data. The classifier is first trained on labelled data and is then used to classify test data to measure its accuracy, even though the original problem may be stated in a finite-dimensional space in which the sets to discriminate are not linearly separable. In addition, word embeddings have been used to automatically expand and refine the bullying word list, which is initialized by domain knowledge.