About This Project
What is it?
This Fake News Detector is a machine learning tool built to automatically classify news content as real or fake using a fine-tuned BERT language model.
Why was it built?
In the age of viral misinformation, detecting fake news is critical. This tool aims to help researchers, educators, and developers explore how NLP and AI can be used to fight misinformation using modern deep learning techniques.
Performance
The model achieves roughly 97.8% accuracy on the test set (4,393 of 4,490 items classified correctly), as the confusion matrix shows:

- 2119 real news items were correctly predicted as real
- 2274 fake news items were correctly predicted as fake
- 23 real items were wrongly predicted as fake
- 74 fake items were wrongly predicted as real
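These counts also pin down the other headline metrics; the short sketch below simply re-derives them with plain arithmetic, treating the fake class as positive:

```python
# Confusion-matrix counts from the list above ("fake" treated as the positive class).
tp = 2274  # fake correctly predicted as fake
tn = 2119  # real correctly predicted as real
fp = 23    # real wrongly predicted as fake
fn = 74    # fake wrongly predicted as real

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # ~0.978
precision = tp / (tp + fp)                                  # ~0.990
recall    = tp / (tp + fn)                                  # ~0.968
f1        = 2 * precision * recall / (precision + recall)   # ~0.979

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```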
The model is trained on a dataset derived from the Kaggle fake-and-real-news source, which contains 23,502 fake news articles and 21,417 true news articles.
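A minimal preprocessing sketch is shown below; it assumes the dataset ships as the usual `Fake.csv` / `True.csv` files with `title` and `text` columns (the file and column names are assumptions, not taken from this project):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file names for the Kaggle fake-and-real-news dataset.
fake = pd.read_csv("Fake.csv")   # 23,502 fake articles
true = pd.read_csv("True.csv")   # 21,417 true articles

fake["label"] = 1   # fake
true["label"] = 0   # real

df = pd.concat([fake, true], ignore_index=True)
df["input_text"] = df["title"] + " " + df["text"]   # title + statement text

train_df, test_df = train_test_split(
    df[["input_text", "label"]], test_size=0.1, stratify=df["label"], random_state=42
)
```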
How It Works
Model: Fine-tuned BERT from Hugging Face Transformers
Input features: news title and statement
Output:
- Binary Classification – Real or Fake
- SHAP waterfall plot – visually highlighting the most influential words that contributed to the model’s decision.
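For the SHAP output, one possible way to produce such a plot is through SHAP's support for Transformers text-classification pipelines. This is an untested sketch: the checkpoint path, the class index for "fake", and the exact plotting call are assumptions and may need adjusting.

```python
import shap
from transformers import pipeline

# Assumed checkpoint path and label ordering; point this at the actual fine-tuned model.
clf = pipeline(
    "text-classification",
    model="path/to/fine-tuned-bert",
    top_k=None,   # return scores for both classes, not just the argmax
)

explainer = shap.Explainer(clf)                 # SHAP infers a text masker from the pipeline
shap_values = explainer(["Some news title and statement to classify ..."])

# Token-level contributions toward class index 1 (assumed here to be "fake").
shap.plots.waterfall(shap_values[0, :, 1])
```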
Training Method
The model is fine-tuned on a binary classification task using BERT-base-uncased from Hugging Face Transformers.
It was trained with cross-entropy loss and the Adam optimizer for 4 epochs, using a batch size of 32 and a learning rate of 1e-5.
During training, the BERT encoder was partially frozen (only the last two layers were left unfrozen) to reduce overfitting and speed up convergence. Token inputs included the title and a truncated portion of the statement, since long inputs were a bottleneck on the available GPU.

As the training-loss plot above shows, the loss starts around 0.6 and drops to roughly 0.05 after 4 epochs.
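As an illustration of the setup described in this section, here is a condensed sketch using bert-base-uncased with the stated hyperparameters. It is not the project's actual training script: the data is a tiny placeholder, and the freezing and batching details are simplified assumptions.

```python
import torch
from torch.nn import CrossEntropyLoss
from torch.optim import Adam
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the whole BERT encoder, then unfreeze only its last two transformer layers;
# the classification head on top is never frozen and stays trainable.
for param in model.bert.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[-2:]:
    for param in layer.parameters():
        param.requires_grad = True

optimizer = Adam((p for p in model.parameters() if p.requires_grad), lr=1e-5)
loss_fn = CrossEntropyLoss()

# Placeholder data so the loop runs end to end; in practice each example is the
# tokenized title + truncated statement, batched at 32.
texts = ["Example title and statement", "Another example title and statement"]
labels = torch.tensor([0, 1])
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
train_loader = DataLoader(list(zip(enc["input_ids"], enc["attention_mask"], labels)), batch_size=32)

model.train()
for epoch in range(4):
    for input_ids, attention_mask, batch_labels in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        loss = loss_fn(logits, batch_labels)
        loss.backward()
        optimizer.step()
```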