One of the largest media organisations in the world

Accurate prediction of online views

Publication strategy optimisation

Client

News UK

Purpose

News Corporation is one of the world’s biggest media organisations. It owns News UK, which publishes The Sun, The Times, The Sunday Times and The TLS. With the rapid rise of social media, the publications promote many of their articles online. The aim is to ensure that the articles most relevant and important to readers are promoted, and that readers share stories within their networks.

The team was given the challenge of helping the newsrooms make informed decisions by predicting how individual news articles would perform (how much web traffic or social media engagement they would generate) before an article was published, or very shortly afterwards. The newsroom wanted to evaluate whether these types of algorithms could aid their decision making, such as which stories to promote on social media.

Approach

The S2DS team was given access to a large number of news articles from recent months, together with associated metadata such as headline, article content and topics. They also received detailed minute-by-minute data on the traffic each article received once published on the website, drawn from News UK’s in-house analytics tool, INCA (Intelligent News Contextual Analytics). The team’s first goal was to develop an accurate model for the total number of website visitors each article would get in the first 24 hours after publication. To build it, the team engineered a wide range of features for each article. Some were available before publication, such as length, number of words and topics. Others captured the article’s performance in the first few minutes after publication, as well as when it was published (day of week, time of day).
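The kind of feature engineering described above can be illustrated with a minimal Python sketch. It is not the team's actual code: the record layout (`headline`, `body`, `topics`, `published_at`, `visits_per_minute`), the function name `build_features` and the example article are all hypothetical, chosen only to show how pre-publication, timing and early-traffic features might sit side by side.

```python
from datetime import datetime


def build_features(article, early_minutes=30):
    """Turn one article record into a flat dict of model features.

    The record layout is an assumption made for this sketch.
    """
    published = article["published_at"]
    # Assumed layout: visits_per_minute[i] is the visitor count in
    # minute i after publication.
    early = article.get("visits_per_minute", [])[:early_minutes]
    return {
        # Available before publication
        "headline_words": len(article["headline"].split()),
        "body_words": len(article["body"].split()),
        "n_topics": len(article["topics"]),
        # Publication timing
        "day_of_week": published.weekday(),  # Monday is 0
        "hour_of_day": published.hour,
        # Available only shortly after publication
        "early_visitors": sum(early),
    }


# A made-up article, purely for illustration.
example = {
    "headline": "Budget vote passes after late debate",
    "body": "MPs voted late on Tuesday to approve the budget.",
    "topics": ["politics", "economy"],
    "published_at": datetime(2018, 3, 6, 21, 15),
    "visits_per_minute": [12, 30, 41, 38],
}
features = build_features(example)
print(features)
```

A dict like this could then be fed to any regression model. Keeping the early-traffic feature separate makes it easy to drop when a prediction is needed before publication.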

Not surprisingly, the amount of traffic an article receives in the first few minutes strongly predicts its performance over its first 24 hours. However, the model’s performance improved significantly when the team included the wide range of other features. When the S2DS team dropped features that were only available after publication (e.g. visitors in the first 30 minutes), performance dropped. The pre-publication model nevertheless provided valuable signals that are of use to newsrooms in advance of publication. Next, the team developed a model for the number of visitors an article would receive from social media. By comparing articles, using a similarity metric, to previously published articles that performed well on social media, they were able to recommend which articles should be considered for sharing. They worked through several iterations of the model, and saw the biggest lift in performance after adding their own topic features using NLP techniques.
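The similarity-based recommendation can be sketched as follows. The case study does not say which similarity metric the team used, so this example assumes cosine similarity over simple bag-of-words counts. The function names, the threshold of 0.3 and the sample headlines are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def should_consider_sharing(candidate, top_social_articles, threshold=0.3):
    """Flag a candidate if it resembles any past social-media hit.

    Returns (flag, best_score). The threshold is an arbitrary
    illustrative value, not a figure from the project.
    """
    cand = bag_of_words(candidate)
    best = max(
        (cosine_similarity(cand, bag_of_words(t)) for t in top_social_articles),
        default=0.0,
    )
    return best >= threshold, best


# Made-up headlines standing in for articles that did well on social media.
past_hits = [
    "royal wedding guest list revealed",
    "transfer deadline day deals",
]
flag, score = should_consider_sharing("royal wedding dress revealed", past_hits)
print(flag, round(score, 3))
```

In practice the comparison would more likely run over richer representations, such as the NLP-derived topic features mentioned above, rather than raw word counts.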

S2DS provides a great platform to help accelerate innovation around how you apply machine learning in your business as well as a way to find talented new data scientists. The quality of the teams is outstanding and they quickly familiarise themselves with your business concerns and data. They complement your existing data science teams and you benefit from multiple data scientists collaborating and testing competing approaches, to find the best solution in a short space of time.

Dan Gilbert, Director of Data, News UK

The Outcome

The team set out to help News UK use data science methods to find articles that will attract large numbers of visitors online, with results that can complement the expertise within the newsroom. Using both classification and regression models, the team wrote an algorithm that takes in an article and predicts the total number of visitors it will receive. The algorithm also considers whether the article should be shared on social media. The team discovered that applying NLP techniques significantly boosted the performance of their models. They also learned how predictive models can improve workflows in a business, as well as where the models’ limitations lie and where human expertise is crucial. The team worked alongside News UK’s data science and editorial teams throughout the project, and were mentored by S2DS alumnus and News UK data scientist Jonathan Brooks-Bartlett, PhD. The next steps are for News UK’s data team to productionise these models and add them to the wider suite of machine learning models that they generate about news content. They will also look to integrate them into the INCA analytics toolset, so they can be used by the newsroom.
