Ensemble Learning Approach for Clickbait Detection Using Article Headline Features
Informing Science: The International Journal of an Emerging Transdiscipline
• Volume 22
• 2019
• pp. 031-044
Aim/Purpose: The aim of this paper is to propose an ensemble learners based classification model for classification clickbaits from genuine article headlines.
Background: Clickbaits are online articles with deliberately designed misleading titles for luring more and more readers to open the intended web page. Clickbaits are used to tempted visitors to click on a particular link either to monetize the landing page or to spread the false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception.
Methodology: A total of one hundred thousand article headlines are collected from news aggregator sites consists of clickbaits and authentic news headlines. The collected data samples are divided into five training sets of balanced and unbalanced data. The natural language processing techniques are used to extract 19 manually selected features from article headlines.
Contribution: Three ensemble learning techniques including bagging, boosting, and random forests are used to design a classifier model for classifying a given headline into the clickbait or non-clickbait. The performances of learners are evaluated using accuracy, precision, recall, and F-measures.
Findings: It is observed that the random forest classifier detects clickbaits better than the other classifiers with an accuracy of 91.16 %, a total precision, recall, and f-measure of 91 %.
Background: Clickbaits are online articles with deliberately designed misleading titles for luring more and more readers to open the intended web page. Clickbaits are used to tempted visitors to click on a particular link either to monetize the landing page or to spread the false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception.
Methodology: A total of one hundred thousand article headlines are collected from news aggregator sites consists of clickbaits and authentic news headlines. The collected data samples are divided into five training sets of balanced and unbalanced data. The natural language processing techniques are used to extract 19 manually selected features from article headlines.
Contribution: Three ensemble learning techniques including bagging, boosting, and random forests are used to design a classifier model for classifying a given headline into the clickbait or non-clickbait. The performances of learners are evaluated using accuracy, precision, recall, and F-measures.
Findings: It is observed that the random forest classifier detects clickbaits better than the other classifiers with an accuracy of 91.16 %, a total precision, recall, and f-measure of 91 %.
authentic news; clickbaits; ensemble learning; natural language processing; performance measures
218 total downloads