Beyond open data: scoping city data and analytics services

By Jeni Tennison, expert in technology, governance, and public policy, and Public Digital Network member.

As part of the Bloomberg Philanthropies City Data Alliance, Public Digital will teach city leaders how to move beyond a portal-only strategy and channel data into tools tailored to the needs of internal and external stakeholders. The essence of this transformation is to shift cities from thinking of data as something they might publish simply for transparency, towards implementing what we call “Data as a Service”, a concept we explored in the first post in this series.

In this post, we’ll look at how cities are moving beyond supplying open government data, towards supplying more sensitive data, and data from third parties. We’ll also look at how the self-service portal-based mode of delivery can be extended towards supplying more curated and responsive data services, and stimulating the use of data.

Beyond open government data

Traditional data portals focused on non-personal data stewarded by government bodies. There are certainly good arguments for prioritising this: when a public body is the only organisation that holds particular data, it needs to make it available if others are going to use it; and non-personal data is less risky to publish than personal data.

However, if we consider the needs of potential users of this data, it falls short in a few ways:

people and organisations in cities want data about the city, not just data the city government holds,
they want to access data that may be sensitive, and therefore can’t be open,
and they may even want to access data city governments hold about themselves.

City data, not just city government data

To meet user needs, the emphasis needs to be on city data rather than city government data. Organisations intending to build digital services within a city are interested in data held by organisations other than the city government. They might be interested in data that is collated at the state, national or even international level: geographic information, census data, data about the weather or environment. They might be interested in data held by private sector companies operating in the city, such as electricity or telecommunications utilities, Airbnb, crowdfunding platforms, or the local Scooter-as-a-Service.

A good example of how these needs can be filled is the High Streets Data Service in London, which draws on data from a range of partners, including procuring commercial data from private sector organisations such as Mastercard and O2, as well as providing cuts of open data from public sector ones such as the UK’s Land Registry to create a service that helps high streets recover from the pandemic.

Organisations and citizens might be interested in data that they’re happy to collect and maintain themselves (or collaboratively, such as Gilbert, AZ’s crowdsourcing of holiday light displays), but that needs a more robust institutional home for sustainability and perceived trustworthiness, which a city can offer. An example of this involving companies, rather than citizens, is London’s Infrastructure Mapping Application, a closed platform on which utilities and construction companies share data. This helps ensure they dig up the road only once for water, gas and other connections, saving money and causing less disruption to Londoners.

Sensitive data, not just open data

The classification of data into non-personal vs personal data is not straightforward. When a non-personal dataset has been generated from personal data, it will often contain lingering traces of personal information – “anonymised” is a sliding scale reflecting re-identification risk, not an absolute state. Furthermore, the publication of even completely non-personal data, such as the location of areas protected for scientific, environmental or national security reasons, can be harmful.
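
One way to make the "sliding scale" concrete is to count how many records share the same combination of potentially identifying attributes. The sketch below uses k-anonymity, a common measure that this post does not itself name, as a stand-in for re-identification risk. The records, the field names and the function `smallest_group_size` are all made up for illustration.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values for the given quasi-identifier fields.
    A lower k means a person is easier to single out."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values(), default=0)

# Hypothetical "anonymised" extract: names removed, but age band and
# district remain and could be combined to narrow down individuals.
records = [
    {"age_band": "30-39", "district": "North"},
    {"age_band": "30-39", "district": "North"},
    {"age_band": "40-49", "district": "North"},
    {"age_band": "60-69", "district": "South"},
    {"age_band": "60-69", "district": "South"},
]

# Publishing both fields leaves one person alone in their group.
k_fine = smallest_group_size(records, ["age_band", "district"])

# Publishing only the district puts everyone in a larger group.
k_coarse = smallest_group_size(records, ["district"])

print(k_fine, k_coarse)
```

In this toy extract, dropping the age band raises the smallest group size, so the same data becomes harder to re-identify at the cost of detail. A real assessment would also need to consider what other data an attacker could link to, which this sketch ignores.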

More fundamentally, publishing data openly does not always lead to equitable outcomes. As Barbara Prainsack describes, inequities can arise in who is represented in data, who uses it, who benefits from it, and who governs it. Data being non-personal reduces some types of risks, namely those to our right to privacy, but not others. This is why data governance – which we’ll come onto in a later post – is so important: it helps to identify, mitigate and manage risks around data, whatever shape they might have.

Recognising the need for some people and organisations (particularly researchers) to have access to data that cannot be open, many government data portals, such as the London Datastore, have rightly moved beyond only listing open data, and towards providing an inventory of a broader range of datasets, some of which may have more restrictive access conditions placed on them.

Individual level data, not just bulk data

Conversely, individuals having access to personal data held by local governments about them can be incredibly useful and empowering. We can see how this can work through examples in the health sector: being able to give third parties such as travel or venue management companies access to your Covid-19 vaccination status can ease journeys and access to events. Letting car hire or insurance companies directly access your driving licence information saves you time and provides them with additional reassurance about the truth and accuracy of that information.

Secure access to personal data held by public bodies, under the control of the person it’s about (or a legally recognised representative) can lead to smoother, easier and more personalised services. In a similar way, organisations may find it useful to be able to access data governments hold about them (for example if they collect local business taxes). Unlike data from third parties, this account-level data is a kind of data that only governments can provide.

Each of these shifts in the kinds of data city governments may want to provide brings new and different demands on data teams within city governments. They might need to source and procure data from other organisations, run data ethics and governance processes, and develop secure identity, authentication and data provision services. Because of this, city data teams will need a broader range of skills than those needed to work directly with data.

Beyond portals

We’ve looked at the scope of data and analytics, but what about the scope of service? Traditional government data portals have a self-service model, where they provide a registry or inventory of available datasets, with descriptions and other metadata, and sometimes basic visualisations, that can be browsed or searched by users to find useful data. An analogy might be to a supermarket, where there are rough groupings of types of products and customers can wander the aisles freely, picking items off shelves to take a closer look at nutrition and other information, before choosing the ingredients they want.

There are other models, though. Some cities and other organisations offer a more curated service, broadly oriented around the questions users might have or the things they are trying to achieve, and with advice about which datasets go well together (for example, if you want to use these statistics, you’ll need this reference data). A nice example is the way Los Angeles presents data about Improving Digital Equity in Los Angeles, offering both a walkthrough of what the data shows and guidance on how to use data to create similar maps. An analogy here would be a supermarket where ingredients were clustered in terms of the kinds of meals that could be made with them, alongside recipe cards that recommend good combinations.

There are also opportunities to add more social and community features to the provision of data, making it more of a brokerage service. This might include featuring visualisations and analyses that other people have created using particular datasets – providing both inspiration and open source code – and adding features that let users support each other, such as asking and answering questions about datasets through comments or forums. An analogy would be a service that not only provided sample meals and their recipes, but also enabled you to get in touch with local chefs and restaurants who might simply cook you a meal, or help you create your own dishes.

And finally, a city could offer a request service, to enable people to ask for the data or analytics they need. Responses might be pointers to existing datasets, support in their interpretation, or the provision of cuts and queries over data. This may be particularly useful for data that is too sensitive for wide publication. For example, the UK Office for National Statistics has a request service that enables people to request particular statistics they need, which are then published openly after a few days or weeks. The analogy here would be the city itself having a public kitchen, from which people could request anything from advice, to prepared ingredients, to full meals.

Again, a move away from the self-service model requires new skills in teams providing Data as a Service. They might need their own data analytics expertise, knowledge of diverse domains or the ability to source it, skills in community building and moderation, and in customer support.

Supply and demand

This post has focused on the supply of data by a city, and how the data and services cities are providing are shifting. Many cities also aim to boost demand for data and analytics, as well as supplying it. City data teams often find that those they are serving with data – internally and externally – have limited capability to understand or use it. They can design services to meet the needs of existing data users, but there is a large field of potential data users who don’t know what they don’t know.

Many cities try to fill this gap by providing data training and support services, and by incentivising data use, alongside provision, through schemes like New York City’s Innovation Fellows program or the London Office of Technology and Innovation, or those in Barcelona and Helsinki. As we reflect on implementing “Data as a Service” in cities, we recognise that it needs to fit in with these larger programs of activity that focus on making a city and its decision making more data-driven.