Graph Databases are composed of nodes and relationships. A node represents an entity, such as a person or a topic, and a relationship represents the connection between two nodes. When data is stored in a graph database, algorithms can surface these relationships and return them to whoever requested the information.
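To make the node and relationship idea concrete, here is a minimal sketch in plain Python. It is not how a production graph database stores data, and every name in it (the people, the topic, the relationship types, the `neighbours` helper) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: node ids, labels and names are illustrative only.
nodes = {
    "alice": {"label": "Person", "name": "Alice Example"},
    "bob": {"label": "Person", "name": "Bob Example"},
    "energy": {"label": "Topic", "name": "Energy policy"},
}

# Each relationship is (source node, relationship type, target node).
relationships = [
    ("alice", "SPOKE_ABOUT", "energy"),
    ("bob", "SPOKE_ABOUT", "energy"),
    ("alice", "KNOWS", "bob"),
]

# Index outgoing relationships by source node for fast lookup.
outgoing = defaultdict(list)
for source, rel_type, target in relationships:
    outgoing[source].append((rel_type, target))


def neighbours(node_id, rel_type=None):
    """Return ids of nodes reachable from node_id, optionally filtered by type."""
    return [t for r, t in outgoing[node_id] if rel_type is None or r == rel_type]
```

Calling `neighbours("alice", "KNOWS")` follows only the KNOWS relationships from that node, which is the kind of question a graph query answers directly.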
Working with Pivigo, The Ditchley Foundation required a graph database that could validate the connections between members. They chose two groups: MPs and academics. One of the most common issues when trying to obtain information on previous attendees is identity verification. Common names often create confusion, and it would be far more efficient to automate this process.
What does Ditchley do?
Founded in 1958, the Ditchley Foundation brings together decision-makers in government, business, academia, and technology to discuss wider issues in society and the world, influencing policy through conversation. Having run events and conferences for more than 60 years, they have a large database of academics and politicians. As a result, confirming identities when sending out invitations to their events is difficult, and current methods are often unreliable.
In initial experiments, the S2DS team looked for MPs in the Ditchley database after removing titles and hyphens. This made names easier to match, but the approach was not robust enough. Adding fuzzy matching, where variations and typos are still counted as matches, let them handle issues such as nicknames and double-barrelled surnames. The algorithm produced a match score for each name to indicate how likely the match was. However, a high-scoring pair can differ by a single letter and still be the wrong person, and common names appear frequently.
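The cleaning and fuzzy-matching step might look something like the sketch below. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, since the article does not say which matching library the team used. The `TITLES` list and the `normalise` and `match_score` helpers are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of titles to strip; the team's real list is not given.
TITLES = {"mr", "mrs", "ms", "dr", "sir", "dame", "lord", "lady", "prof", "professor"}


def normalise(name):
    """Lower-case a name, turn hyphens into spaces, and drop titles."""
    cleaned = re.sub(r"[.,]", " ", name.lower().replace("-", " "))
    return " ".join(tok for tok in cleaned.split() if tok not in TITLES)


def match_score(a, b):
    """Similarity of two normalised names on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, normalise(a), normalise(b)).ratio())


# Titles and hyphens no longer block an exact match...
same = match_score("Dr. Anne-Marie Smith", "Anne Marie Smith")
# ...but a one-letter difference between two different people still scores highly.
near_miss = match_score("Tony Smith", "Toby Smith")
```

The second comparison shows the weakness described above: a one-letter difference gives a high score even though the names may belong to different people, so a score alone cannot confirm identity.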
After this initial exploration, the data was cleaned and the process began with a surname match. If the surname score was below 99, the candidate was discarded. The algorithm then checked the first name and the MP suffix in the database to confirm whether it was the same individual. For example, “Toby Smith MP” may return a high score when looking for “Tony Smith MP”. This further check clears up the ambiguity.
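A rough sketch of this two-stage check is shown below. Only the surname cut-off of 99 comes from the description above. The first-name threshold, the `split_record` parsing of simple "First Last MP" strings, and the use of `difflib` for scoring are all assumptions.

```python
from difflib import SequenceMatcher

SURNAME_THRESHOLD = 99     # from the article: lower surname scores are discarded
FIRST_NAME_THRESHOLD = 99  # assumption: the article gives no first-name cut-off


def score(a, b):
    """Case-insensitive similarity on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def split_record(record):
    """Split 'First Last MP' into (first, surname, is_mp). Assumes simple names."""
    tokens = record.split()
    is_mp = tokens[-1].upper() == "MP"
    if is_mp:
        tokens = tokens[:-1]
    return tokens[0], tokens[-1], is_mp


def is_same_mp(query, candidate):
    """Stage 1: surname must score at least 99. Stage 2: first name and MP suffix."""
    q_first, q_last, q_mp = split_record(query)
    c_first, c_last, c_mp = split_record(candidate)
    if score(q_last, c_last) < SURNAME_THRESHOLD:
        return False
    return q_mp and c_mp and score(q_first, c_first) >= FIRST_NAME_THRESHOLD
```

With these assumed thresholds, `is_same_mp("Tony Smith MP", "Toby Smith MP")` returns `False`: the surnames match exactly, but the first-name check rejects the pair.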
Academics far outnumber MPs in Ditchley’s database, and the associated data runs to more than 4 million publications. This presented a greater challenge when trying to verify identities and highlights how integral Graph Databases are to datasets that need to scale.
The team looked at a number of features, such as titles, that indicate how likely a person is to be an academic. Combining these with the institution address, field of study, and year of publication gives a stronger confirmation score. For instance, if an individual named “Dr George Williams” published papers on machine learning in 2012 but attended an event in the 1960s, it is unlikely to be the same individual, even though the name appears correct at face value.
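One way to picture how such features could be combined is the sketch below. The weights, the 40-year plausibility window, the title list, and the `Attendee` and `Author` record shapes are all illustrative assumptions, not the team's actual model.

```python
from dataclasses import dataclass

ACADEMIC_TITLES = {"dr", "prof", "professor"}  # assumed list
MAX_GAP_YEARS = 40  # assumption: a larger gap makes a match implausible


@dataclass
class Attendee:
    name: str
    event_year: int
    institution: str


@dataclass
class Author:
    name: str
    publication_year: int
    institution: str


def confirmation_score(attendee, author):
    """Score in [0, 1] for a pair whose names already match. Weights are illustrative."""
    if abs(attendee.event_year - author.publication_year) > MAX_GAP_YEARS:
        return 0.0  # the timeline rules the match out, whatever the name says
    score = 0.4  # baseline for a name-matched pair
    if attendee.name.split()[0].lower().rstrip(".") in ACADEMIC_TITLES:
        score += 0.2  # an academic title makes an academic match more likely
    if attendee.institution.lower() == author.institution.lower():
        score += 0.4  # same institution is strong supporting evidence
    return round(score, 2)


# Hypothetical records mirroring the example in the text.
guest_1965 = Attendee("Dr George Williams", 1965, "Example University")
ml_author = Author("George Williams", 2012, "Example University")
```

Here `confirmation_score(guest_1965, ml_author)` is `0.0`, because the 47-year gap exceeds the assumed window even though the name and institution agree.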
The S2DS team looked at Twitter data from MPs to analyse the type of subjects they spoke about and the frequency. This information was used in conjunction with follower counts and the connections between followers. Using this, they created a network showing which individuals can be brought into future conversations to help Ditchley achieve their aims.
Twitter is a useful resource for showing connections: it details who is following whom and which department they work in. An individual's importance within the network can be estimated from their follower count and from how many influential people follow them. It also reveals the connectors within these networks, who can bring different groups together.
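The ideas of importance and connectors can be sketched with a small follower graph. The accounts, departments, the crude two-hop `influence` score, and the definition of a connector as an account with a follow link into another department are all assumptions for illustration. The article does not describe the metrics the team actually used.

```python
from collections import defaultdict

# Hypothetical follower edges as (follower, followed). Not real accounts.
follows = [
    ("ann", "bea"), ("cal", "bea"), ("dee", "bea"),
    ("bea", "eve"), ("fay", "eve"), ("eve", "bea"),
    ("gus", "fay"),
]
department = {
    "ann": "health", "bea": "health", "cal": "health", "dee": "health",
    "eve": "energy", "fay": "energy", "gus": "energy",
}

# Map each account to the set of accounts that follow it.
followers = defaultdict(set)
for follower, followed in follows:
    followers[followed].add(follower)


def follower_count(account):
    return len(followers[account])


def influence(account):
    """Own follower count plus the follower counts of each follower."""
    return follower_count(account) + sum(follower_count(f) for f in followers[account])


def connectors():
    """Accounts with a follow link, in either direction, into another department."""
    found = set()
    for follower, followed in follows:
        if department[follower] != department[followed]:
            found.update((follower, followed))
    return sorted(found)
```

In this toy network, "bea" has the most followers, but "eve" gets the higher `influence` score because a well-followed account follows her. Both appear in `connectors()` because they link the two departments.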
For the second project, the team set out to create a network of relationships from media articles. Starting from 1 million publicly available Guardian articles covering a period of 10 years, they analysed the most critical and influential individuals in these networks. From this data, they acquired 30,000 articles and identified 80,000 people and 550,000 relationships, improving on the efficiency of traditional manual labelling methods.
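The article does not say how relationships were extracted from the articles. One common approach, shown here purely as an assumption, is to link two people whenever they are mentioned in the same article. The `article_people` data stands in for the output of a separate entity-extraction step and is entirely hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of an entity-extraction step: people mentioned per article.
article_people = {
    "article-1": ["Person A", "Person B", "Person C"],
    "article-2": ["Person A", "Person B"],
    "article-3": ["Person C"],
}


def co_mention_edges(people_by_article):
    """Count how often each unordered pair of people shares an article."""
    edges = Counter()
    for people in people_by_article.values():
        # Sort and de-duplicate so each pair has one canonical ordering.
        for pair in combinations(sorted(set(people)), 2):
            edges[pair] += 1
    return edges


edges = co_mention_edges(article_people)
```

The resulting counts can serve as relationship weights when loading the pairs into a graph database, so that people who appear together often are linked more strongly.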
The result is a queryable graph database. Ditchley can look up an organisation, see where an individual appeared in an article, find the authors publishing relevant material, identify the critical connectors in this space, and use this information to guide invitations. Ditchley is deploying these techniques in their in-house database, which enables the crucial task of robustly matching people across contrasting portions of the graph. In addition, Ditchley has reduced false positive matches by 80%.
To see how we built this database and the ways in which this is now being deployed by Ditchley, watch our webinar below.
To discover how Graph Databases can help your business utilise its data and optimise performance, get in contact with one of the Pivigo team here.
"Our participation in S2DS has helped us to advance our work in bridging divides using data science