Wednesday, December 11, 2019
3:00pm - 5:00pm
Masters Presentation
Classifying Political Articles from Fox News and CNN Using Random Forests

Media bias in national news sources has been a hot topic in recent years, especially in the political sphere. National news sources are supposed to be unbiased, but many people claim this is not the case. This project examines two national media sources, Fox News and CNN, chosen because of the popular belief that the former has a conservative political bias and the latter a liberal one. Both sources were web scraped using Python’s “newspaper” package. Ten articles from each news source’s politics page were scraped per day during the weeks of September 16 through October 4, 2019, for a total of 300 articles, 150 from each source. Natural language processing was then used to remove stop words, convert text to lowercase, remove punctuation, tokenize, lemmatize, and vectorize using both Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words methods. Finally, random forest and logistic regression models were used to classify each article’s news source from its text and from its title. The random forest model using TF-IDF classified article text with the highest accuracy of all models. The models that classified article titles were less accurate than their text counterparts, but the random forest model using TF-IDF again had the highest accuracy among the title models. The five tokens with the highest feature importance in the most accurate model, listed from highest to lowest, were “click”, “president donald”, “donald”, “contributed” and “report”.
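As a rough illustration of the preprocessing and TF-IDF steps named in the abstract, here is a minimal sketch in plain Python. It is not the presenter's code: the stop-word list, the function names, and the two example sentences are all assumptions made for illustration, lemmatization is left out, and the TF-IDF formula is the unsmoothed textbook form, which may differ from whatever the project actually used. Bigrams are included only because one of the reported tokens (“president donald”) is a two-word term.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the project's list is not given).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "this"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def with_bigrams(tokens):
    """Return the unigrams plus adjacent-pair bigrams such as 'president donald'."""
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / number of terms in the document and idf = ln(N / df).
    Library implementations often apply smoothing, so their numbers will differ.
    """
    term_lists = [with_bigrams(preprocess(d)) for d in docs]
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        weights.append({
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up example sentences, not taken from the scraped articles.
docs = [
    "President Donald Trump spoke to reporters on Tuesday.",
    "The Senate voted on the budget on Tuesday.",
]
vectors = tfidf(docs)
```

In this toy input, a term that occurs in both documents (such as "tuesday") gets weight zero, while terms unique to one document get positive weight. Vectors like these are what a classifier such as a random forest would then be trained on.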
Speaker: Michael Ingram
Location: SCB 4018
