Diagonal is building Privacy Enhancing Technology

One of our Founders, Andrew Eland, gives us the benefit of his healthcare background with this software engineer’s take on Privacy Enhancing Technologies (PETs), why we need them, how they work, and why they are important in our world.

What we are doing

At Diagonal we build tools to help people understand and solve urban geospatial data problems. Working with many different, often large, datasets, we frequently end up dealing with problems that touch upon public health and equity. These topics intersect with neighbourhoods, travel behaviour, and other physical markers of place. To address these challenges, community organisers, public health officials, and local governments need to use data, including data that represents people and their behaviours. Though the benefits of using such data are huge, there are significant risks to consider, particularly around privacy.

We are building tools that respond to the need to embed more personal, sensitive data in planning decisions responsibly, and to reduce the risk of data breaches exposing personal data. These tools are known as Privacy Enhancing Technologies (PETs).

PETs are not often used in data analysis for urban planning. They are techniques and tools developed for sectors such as consumer marketing, medicine, security, and financial services. By applying our expertise to this field we can harness the benefits of privacy-preserving techniques to solve problems for planners and place-makers, applying them to types of data that have been overlooked until now.

Our work in this area has already been recognised. We have recently been selected as Phase 1 winners in a UK-US privacy-enhancing technologies prize challenge. The challenge seeks to develop PETs to unlock the sharing of sensitive data between healthcare providers, and use a federated learning approach to model infection risk to improve pandemic responses.

Why are PETs needed to enhance privacy?

Making sure that sensitive data has suitable privacy safeguards in place is a significant challenge. Even in vast datasets featuring anonymised data from hundreds of thousands of individuals there may be methods of reverse engineering data and combining it with other public datasets, which can lead to the retrospective identification of individual users.

Well-known cases demonstrate the inadequacy of anonymisation. Harvard professor Latanya Sweeney, founder of the Data Privacy Lab, showed that the combination of ZIP code, date of birth, and sex is enough to identify a huge proportion of the population in hospital visit data. She demonstrated this by finding the medical records of William Weld, then Governor of Massachusetts. In another case, cross-referencing two very different datasets, timed and dated paparazzi photographs and NYC taxi data, circumvented anonymisation well enough to reveal where celebrities had travelled. Such events reduce wider trust in the protection of sensitive data. This potentially reduces the likelihood of the public giving permission for its data to be used, even in cases where there is an obvious benefit or ‘public good’.

It can also lead institutions or organisations holding large datasets to decide they cannot analyse or share their data for the public good because data privacy cannot be assured. The result is huge pools of potentially life-saving or life-improving data about people, their behaviours, and their movement around places, left underutilised. Effective PETs offer a solution to these problems.

How our approach to PETs is different

PETs haven’t been widely used with location data in practice. Our place-centric approach attempts to redress this by focussing on predictions based on individuals’ activity and location patterns. We use the locations of infected individuals to help predict infection risk for other individuals.
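To make the place-centric idea concrete, here is a minimal sketch in Python of one way a location-based feature could be computed. It is an illustration under our own assumptions, not the model itself: the visit records, the names and the `exposure_feature` function are all hypothetical, and no privacy protection is applied at this stage.

```python
from collections import defaultdict


def exposure_feature(visits, infected):
    """For each person, count infected visitors (other than themselves)
    at the places that person visited.

    visits:   iterable of (person, place) pairs (hypothetical record format)
    infected: set of people known to be infected
    """
    infected_at = defaultdict(set)  # place -> infected people seen there
    places_of = defaultdict(set)    # person -> distinct places they visited
    for person, place in visits:
        places_of[person].add(place)
        if person in infected:
            infected_at[place].add(person)

    return {
        person: sum(len(infected_at[place] - {person}) for place in places)
        for person, places in places_of.items()
    }


# Made-up example data.
visits = [
    ("ana", "cafe"),
    ("ben", "cafe"),
    ("ben", "gym"),
    ("cat", "gym"),
    ("dan", "library"),
]
print(exposure_feature(visits, infected={"ben"}))
```

In this toy data, Ana and Cat each share one place with an infected person, while Dan shares none. A feature like this could then be one input to a risk model.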

We will apply two key techniques to obfuscate the sensitive data used in the infection forecast model. The first is homomorphic encryption, a cryptographic technique that allows computations to be performed on encrypted data: one party can run a calculation using another party’s data without being able to see the data itself. Second, we will introduce noise into the system to mask the impact of any one individual. The purpose of adding noise is to thwart attempts to discover data about an individual, a guarantee defined mathematically as differential privacy.
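The two techniques can be illustrated with a small Python sketch. This is a toy under stated assumptions, not our implementation: it uses a textbook Paillier scheme (one well-known additively homomorphic cryptosystem) with tiny, insecure primes, and Laplace noise as one common way of achieving differential privacy. The function names, the counts, and the `epsilon` and `sensitivity` values are all hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. These primes are far too small to be secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rng = random.Random()
n, priv = keygen()

# Three data holders each encrypt a made-up count of infected visits.
ciphertexts = [encrypt(n, count, rng) for count in (3, 5, 2)]

# An aggregator sums the counts without ever seeing them in the clear.
total_ct = ciphertexts[0]
for ct in ciphertexts[1:]:
    total_ct = add_encrypted(n, total_ct, ct)

# The key holder decrypts the total only, then adds noise before release.
total = decrypt(n, priv, total_ct)  # 3 + 5 + 2 = 10
epsilon, sensitivity = 1.0, 1.0     # assumed: one person shifts the count by at most 1
print(total + laplace_noise(sensitivity / epsilon, rng))
```

The aggregator only ever handles ciphertexts, and the released figure is the true total plus random noise, so a single individual's presence or absence is hard to infer from it. A smaller `epsilon` means more noise and stronger privacy, at the cost of accuracy.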

Where we are going

While our approach is implemented in terms of geospatial data, it could easily be adapted to produce features based on other graph data, for example, the population contact graph rather than the place visitation graph.

We will be developing this technology as part of Phase 2 of the UK-US privacy-enhancing technologies prize challenge, and we will be sharing more about what we mean when we talk about PETs in future blog posts. At the conclusion of this challenge, we will release our core technology as open source.

If you are working with sensitive data and want to uncover new ways to gain value from that data through collaboration, get in touch with us: hello@diagonal.works


December 2022