Automated Fact Checking

17 November 2016 | Mevan Babakar

Full Fact is building scalable, robust, automated fact checking tools to be used in newsrooms and by fact checkers all over the world.

If you want to use or test our automated fact checking software:

Our Goals

Bad information ruins lives. It harms our communities, by spreading hate through misleading claims. It hurts our democracy, by damaging trust in politicians and political processes. It leads to bad decisions, by disrupting public debate on the issues that most affect us, including climate change and public spending.

Since 2015, we have been developing technology to help increase the speed, scale and impact of our own and others' fact checking. Our goal is to create a global collaborative effort to help media outlets, civil society, platforms and public policy makers better understand the landscape, and to bring the benefits of those tools to everyone by working in partnership.

We launched our roadmap, The State of Automated Fact Checking, in August 2016. In it we set out a plan for making fact checking dramatically more effective using existing technology. In the autumn of that year we were one of the first UK organisations to use the “Fact Check” label in Google News.

In November 2016, we announced support from Google’s Digital News Initiative for the first stages of our automated fact checking work, and we’re grateful for vital support from hosting experts Bytemark and open source search specialists Flax too. This funding helped build our first prototypes. In May 2019 we – along with Africa Check, Chequeado and the Open Data Institute – won the Google AI Impact Challenge. We are just one of 20 international winners, chosen from more than 2,600 entrants. Over the next three years, with Google’s support, we will use machine learning to dramatically improve and scale fact checking, working with international experts to define how artificial intelligence could transform this work, to develop new tools and to deploy and evaluate them.

What we are building

We have made a set of tools designed to alleviate the pain points we experience in the fact checking process. As fact checkers with ten years' experience, we understand the operational advantages these tools can bring, which makes us uniquely placed to build them.

We are not attempting to replace fact checkers with technology, but to empower fact checkers with the best tools. We expect most fact checks to be completed by a highly trained human, but we want to use technology to help:

Know the most important thing to be fact checking each day
Know when someone repeats something they already know to be false
Check things in as close to real-time as possible

Across a suite of products, our technology does the following tasks:

Collecting and monitoring the data

We start by collecting a range of data from leading news sites and social media platforms that may contain claims we want to fact check. Data we collect can be taken from speech on live TV, online news sites, and social media pages. We are able to add new monitoring inputs for fact checkers in other countries and have done so already for a number of countries in Africa.

Once we have all the input information available as text, we split everything into individual sentences, which are our atomic unit for fact checks. The sentences are then passed through a number of steps that enrich them and make them progressively more useful in the process of fact checking.
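To make the idea of a sentence as the atomic unit concrete, here is a minimal sketch of a sentence splitter. It is an illustration only, not the splitter our tools use: the function name `split_sentences` and the boundary rule (split after a full stop, exclamation mark or question mark that is followed by whitespace and a capital letter or opening quote) are assumptions made for this example.

```python
import re

# Assumed boundary rule: sentence-ending punctuation, then whitespace,
# then a capital letter or an opening quotation mark.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"“])')


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in text, in their original order."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


transcript = "GDP has risen by 2%. This policy leads to more jobs! Will the economy grow?"
sentences = split_sentences(transcript)
```

A rule this simple will wrongly split after abbreviations such as "Mr." and would need more care on real transcripts.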

Identifying and labelling claims

We define a claim as the checkable part of any sentence, whether it is made by a politician, by a journalist or online.

There are many different types of claims, including claims about quantities (“GDP has risen by x%”), claims about cause and effect (“this policy leads to y”), predictive claims about the future (“the economy will grow by z”) and more.

We have developed a claim-type classifier to guide fact checkers towards claims that might be worth investigating. It helps us to identify and label every new sentence according to what type of claim it contains (whether it is about cause and effect, quantities, etc.).

We started building this with the BERT model recently published by Google Research and fine-tuned it using our own annotated data. BERT has been pre-trained on hundreds of millions of sentences in over 100 languages, which makes it a broad statistical model of language as it is actually used.

Labelling claims in this way filters the volume of data we could fact check from hundreds of thousands to tens of thousands. It is a vital first step in ensuring that the users of our tools have a chance to make sense of all the information.
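The sketch below shows only the input and output shape of claim-type labelling and the filtering it enables. It is a keyword-based stand-in, not the fine-tuned BERT classifier described above: the label names, the patterns and the functions `label_claim_types` and `filter_claims` are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical labels and patterns; a trained classifier would replace these.
CLAIM_PATTERNS = {
    "quantity": re.compile(r"\d"),
    "cause_and_effect": re.compile(r"\b(leads? to|causes?|because of|results? in)\b", re.I),
    "prediction": re.compile(r"\b(will|is going to|are going to)\b", re.I),
}


def label_claim_types(sentence: str) -> list[str]:
    """Return every claim-type label whose pattern appears in the sentence."""
    return [label for label, pattern in CLAIM_PATTERNS.items() if pattern.search(sentence)]


def filter_claims(sentences: list[str]) -> dict[str, list[str]]:
    """Keep only sentences that received at least one claim-type label."""
    labelled = {sentence: label_claim_types(sentence) for sentence in sentences}
    return {sentence: labels for sentence, labels in labelled.items() if labels}


examples = [
    "GDP has risen by 3%.",
    "This policy leads to higher prices.",
    "The economy will grow next year.",
    "Thank you all for coming.",
]
claims = filter_claims(examples)
```

In this example the last sentence receives no label and is dropped, which is the filtering effect described above on a very small scale.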

Matching claims

Once we have labelled claims, sentences are checked to see if they match something we have previously fact checked. Some claims are easier to model than others, depending on how specific or ambiguous the language used to describe them is.

The plan is to train a BERT-style model to predict match/no-match for sentences and then add in entity analysis (e.g. checking whether both sentences contain the same numbers, people, organisations etc.). In combination, we hope these two stages will find repeats of a claim even if different words are used to describe it.
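One way the two stages could be combined is sketched below. This is an assumption for illustration, not our implementation: `model_score` stands in for the output of a trained match/no-match model, the entity analysis is reduced to comparing numbers only, and the rule that conflicting numbers veto a match is a hypothetical design choice.

```python
import re

# Matches integers, decimals and percentages such as "4", "4.2" or "4.2%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def numbers_in(sentence: str) -> set[str]:
    """Return the set of numeric tokens mentioned in the sentence."""
    return set(NUMBER.findall(sentence))


def is_repeat(model_score: float, new_sentence: str, checked_claim: str,
              threshold: float = 0.5) -> bool:
    """Combine a (stand-in) model score with a simple number-overlap check."""
    new_numbers = numbers_in(new_sentence)
    checked_numbers = numbers_in(checked_claim)
    # If both sentences mention numbers but share none, treat it as a different claim.
    if new_numbers and checked_numbers and not (new_numbers & checked_numbers):
        return False
    return model_score >= threshold


checked = "Unemployment fell to 4.2% last year."
same_figure = is_repeat(0.8, "Last year the jobless rate dropped to 4.2%.", checked)
different_figure = is_repeat(0.8, "Last year the jobless rate dropped to 5.1%.", checked)
```

Here the same model score gives a match when the figure is repeated and no match when the figure differs, even though the wording of both sentences is similar.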

Taking matching and identifying a step further

Additionally, we semantically enrich the content to help our model detect semantically similar words and phrases. The first step is to identify people, places and other entities of interest and match them to external URIs. We then deduplicate the information across multiple sentences, to identify and group together references to the same entity (e.g. ‘the prime minister’ and ‘Boris Johnson’). This allows us to extract greater value from the data we process and means we can make sophisticated interfaces showing all statements made by individuals. We currently use Wikidata via Google BigQuery to power this service.
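The grouping step can be pictured with a small sketch. Everything in it is hypothetical: the alias table is hand-written rather than drawn from Wikidata, the identifier `person:boris-johnson` is a placeholder and not a real URI, and a mapping from a role such as "the prime minister" to a person would in practice depend on the date of the statement.

```python
from collections import defaultdict

# Hypothetical alias table; a real one would be built from a knowledge base.
ALIASES = {
    "boris johnson": "person:boris-johnson",
    "the prime minister": "person:boris-johnson",
}


def group_by_entity(sentences: list[str]) -> dict[str, list[str]]:
    """Group sentences under the placeholder ID of each entity they mention."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        # A set avoids adding a sentence twice when two aliases of one entity appear.
        matched = {entity for alias, entity in ALIASES.items() if alias in lowered}
        for entity in sorted(matched):
            groups[entity].append(sentence)
    return dict(groups)


statements = [
    "The Prime Minister said taxes would fall.",
    "Boris Johnson visited a hospital.",
    "The weather was mild.",
]
grouped = group_by_entity(statements)
```

Both of the first two statements end up under one identifier, which is what makes it possible to list everything attributed to one individual.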

Real time checks

Finally, we use external processes to help spot more claims and further identify patterns of language that can be automatically checked.

Given a sentence, our tool attempts to identify the topic, trend, values, dates and location. If that succeeds, it compares the extracted information with the corresponding data via the UK Office for National Statistics API. It knows about 15 topics and c.60 verbs that define trends (e.g. rising, falling). This means our technology can automatically compare a claim against significantly more data to identify whether the claim is correct.
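As a rough sketch of this kind of check, the example below extracts a topic and a trend verb from a sentence and compares the trend with the last two values of a time series. It is a simplified illustration under stated assumptions: the topic list, the verb table and the functions `parse_claim` and `check_trend` are hypothetical, the series is hard-coded with invented figures rather than fetched from the Office for National Statistics, and values, dates and locations are ignored.

```python
# Hypothetical subsets of the topics and trend verbs a real tool would know.
TOPICS = {"unemployment", "inflation"}
TREND_VERBS = {"rising": 1, "rose": 1, "risen": 1, "falling": -1, "fell": -1, "fallen": -1}


def parse_claim(sentence: str):
    """Return (topic, direction) if both are found in the sentence, else None."""
    words = [word.strip(".,!?").lower() for word in sentence.split()]
    topic = next((word for word in words if word in TOPICS), None)
    direction = next((TREND_VERBS[word] for word in words if word in TREND_VERBS), None)
    if topic is None or direction is None:
        return None
    return topic, direction


def check_trend(sentence: str, series: dict[str, list[float]]):
    """Return True or False if the claimed trend can be checked, else None."""
    parsed = parse_claim(sentence)
    if parsed is None:
        return None
    topic, direction = parsed
    values = series.get(topic, [])
    if len(values) < 2:
        return None
    change = values[-1] - values[-2]
    if change == 0:
        return False  # no movement, so neither "rising" nor "falling" holds
    return (change > 0) == (direction > 0)


# Invented figures for illustration only; not real statistics.
example_series = {"unemployment": [4.5, 4.1]}
falling_claim = check_trend("Unemployment is falling.", example_series)
rising_claim = check_trend("Unemployment is rising.", example_series)
unparsed = check_trend("The weather is nice.", example_series)
```

Returning `None` when the sentence cannot be parsed, or when there is no data for the topic, keeps "could not check" separate from "checked and found incorrect".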

Once the claim has been identified

The fact checking process is often undertaken offline. We then publish the results on our website. We also describe each fact check with some very specific markup, called ClaimReview. This is part of the wider schema.org project, which describes content on a range of topics in domain-specific terms. This is important for us because describing our content so specifically helps ensure that our fact checks can travel further than our own platforms. Fact checks can form a vital part of the web. Just over 60,000 fact checks exist in the Google Fact Check Explorer and these were seen over 4 billion times in 2019 in Google Search alone.
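For readers unfamiliar with ClaimReview, the sketch below builds a minimal example as JSON-LD. The property names (`claimReviewed`, `reviewRating`, `datePublished`, `author`, `url`) come from the schema.org vocabulary, but the helper `build_claim_review`, the choice of which properties to include and every value shown are placeholders for illustration, not the markup we publish.

```python
import json


def build_claim_review(claim_text: str, review_url: str, date_published: str,
                       rating_name: str, publisher_name: str) -> dict:
    """Assemble a minimal ClaimReview description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,
        "datePublished": date_published,
        "claimReviewed": claim_text,
        "author": {"@type": "Organization", "name": publisher_name},
        "reviewRating": {"@type": "Rating", "alternateName": rating_name},
    }


# Placeholder values only.
markup = build_claim_review(
    claim_text="Example claim text.",
    review_url="https://example.org/fact-checks/example",
    date_published="2020-01-01",
    rating_name="Incorrect",
    publisher_name="Example Fact Checker",
)
json_ld = json.dumps(markup, indent=2)
```

The resulting string would typically be embedded in the fact check's web page in a script element of type `application/ld+json`, so that search engines and other services can read it.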

Limitations

We are careful not to overstate our results. There are a lot of people who say that artificial intelligence and machine learning are a panacea, but we have been at the front lines of fact checking since 2010 and we know first hand how difficult fact checking is. Humans aren't going anywhere anytime soon, and nor would we want them to.

Our automated fact checking team is made up of:

Andy Dudfield, Head of Automated Fact Checking
Ed Ingold, Tech Lead
Simon Coltman, Front End Developer
David Corney, NLP Engineer
Alex Joseph, NLP Engineer

We need support and funding to develop this work further. Please get in touch if you can help.

In the news

Poynter: Full Fact has developed and is using an inward-facing automated fact checking platform
BBC Click: Full Fact talks automated fact checking on BBC Click
The Guardian: Journalists to use 'immune system' software against fake news
TechCrunch: Full Fact aims to end fake news with automated fact checking tools
Wired: Google is helping Full Fact create an automated, real-time fact-checker
The Guardian: Fake news clampdown: Google gives €150,000 to fact-checking projects
Engadget: Full Fact wants to automate fact checking to fight fake news
Independent: Google funds automated fact-checking software in bid to fight fake news
Nieman Lab: Fact-checking and data-driven projects among winners of Google’s Digital News Initiative funding