On May 3, 2019 I was very pleased to give a keynote talk at the Go Open Data 2019 Conference in Toronto (video recordings of the conference proceedings are now available from the site). The following post includes the gist of my talk, along with hyperlinks to the different sources and examples I referenced. My talk was built around the theme of the conference: Inclusive, Equitable, Ethical, and Impactful.
In my talk this morning I am going to use the conference’s themes of Inclusive, Equitable, Ethical and Impactful to shape my remarks. In particular, I will apply these concepts to data in the smart cities context as this has been garnering so much attention lately. But it is also important to think about these in the artificial intelligence (AI) context which is increasingly becoming part of our everyday interactions with public and private sector actors, and is a part of smart cities as well.
As this is an open data conference, it might be fair to ask what smart cities and AI have to do with open data. In my view, these contexts extend the open data discussion because both depend upon vast quantities of data as inputs. They also complicate it. This is for three broad reasons:
First, the rise of smart cities means that there are expanding categories and quantities of municipal data (and provincial) that could be available as open data. There are also growing quantities of private sector data gathered in urban contexts in a variety of different ways over which arguments for sharing could be made. Thus, there is more and more data and issues of ownership, control and access become complex and often conflictual. Open government data used to be about the operations and activities of government, and there were strong arguments for making it broadly open and accessible. But government data is changing in kind, quality and quantity, particularly in smart cities contexts. Open data may therefore be shifting towards a more nuanced approach to data sharing.
Second, smart cities and AI are just two manifestations of the expanding demand for access to data for multiple new uses. There is not just MORE data, there are more applications for that data and more demand from public, private sector and civil society actors for access to it. Yet the opacity of data-hungry analytics and AI contribute to a deepening unease about data sharing.
Third, there is a growing recognition that perhaps data sharing should not be entirely free and open. Open data, under an open licence, with few if any restrictions and with no registration requirement was a kind of ideal, and it fit with the narrower concept of government data described earlier. But it is one that may not be best suited to our current environment. Not only are there potential use restrictions that we might want to apply to protect privacy or to limit undesirable impacts on individuals or communities, but there might also be arguments for cost recovery as data governance becomes more complex and more expensive. This may particularly be the case if use is predominantly by private sector actors – particularly large foreign companies. The lack of a registration requirement limits our ability to fully understand who is using our data, and it reduces the possibility of holding users to account for misuse. Again this may be something we want to address.
I mentioned that I would use the themes of this conference as a frame for my comments. Let me start with the first – the idea of inclusiveness.
Inclusive
We hear a lot about inclusiveness in smart cities – and at the same time we hear about privacy. These are complicated and intertwined.
The more we move towards using technology as an interface for public and private sector services, for interaction with government, for public consultations, elections, and so on, the more we need to focus on the problem of the digital divide and what it means to include everyone in the benefits of technology. Narrowing the digital divide will require providing greater access to devices, access to WIFI/broadband services, access to computer and data literacy, and access in terms of inclusiveness of differently-abled individuals. These are all important goals, but their achievement will inevitably have the consequence of facilitating the collection of greater quantities and more detailed personal information about those formerly kept on the other side of the digital divide. The more we use devices, the more data we generate. The same can be said of the use of public WIFI. Moving from analog to digital increases our data exhaust, and we are more susceptible to tracking, monitoring, profiling, etc. Consider the controversial LinkNYC Kiosks in New York. These large sidewalk installations include WiFi Access, android tablets, charging stations, and free nation-wide calling. But they have also raised concerns about enhanced tracking and monitoring. This is in part because the kiosks are also equipped with cameras and a range of sensors.
No matter how inclusiveness is manifested, it comes with greater data collection. The more identifiable data collected, the greater the risks to privacy, dignity, and autonomy. But de-identified data also carries its own risks to groups and communities. While privacy concerns may prompt individuals to share less data and to avoid data capture, the value of inclusiveness may actually require having one’s data be part of any collection. In many ways, smart cities are about collecting vast quantities of data of many different kinds (including human behavioural data) for use in analytics in order to identify problems, understand them, and solve them. If one is invisible in the data, so are one’s particular needs, challenges and circumstances. In cases where decisions are made based on available data, we want that data to be as complete and comprehensive as possible in order to minimize bias and to make better diagnoses and decisions. Even more importantly, we want to be included/represented in the data so that our specificity is able to influence outcomes. Inclusiveness in this sense is being counted, and counting.
Yet this type of inclusion has privacy consequences – for individuals as well as groups. One response to this has been to talk about deidentification. And while deidentification may reduce some privacy risks, but it does not reduce or eliminate all of them. It also does not prevent harmful or negative uses of the data (and it may evade the accountability provided by data protection laws). It also does not address the dignity/autonomy issues that come from the sense of being under constant surveillance.
Equitable and Ethical
If we think about issues of equity and ethics in the context of the sharing of data it becomes clear that conventional open data models might not be ideal. These models are based on unrestricted data sharing, or data sharing with a bare minimum of restrictions. Equitable and ethical data sharing may require more restrictions to be placed on data sharing – it may require the creation of frameworks for assessing proposed uses to which the data may be put. And it may even require changing how access to data is provided.
In the privacy context we have already seen discussion about reforming the law to move away from a purely consent-based model to one in which there may be “no-go zones” for data use/processing. The idea is that if we can’t really control the collection of the information, we should turn our attention to identifying and banning certain inappropriate uses. Translated into the data sharing context, licence agreements could be used to put limits on what can be done with data that is shared. Some open data licences already explicitly prohibit any attempts to reidentify deidentified data. The Responsible Data Use Assessment process created by Sidewalk Labs for its proposed data governance framework for Toronto’s Quayside development similarly would require an ‘independent’ body to assess whether a proposed use of urban data is acceptable.
The problem, of course, is that licence-based restrictions require oversight and enforcement to have any meaning. I wrote about this a couple of years ago in the context of the use of social media data for analytics services provided to police services across North America. The analytics companies contracted for access to social media data but were prohibited in their terms of use from using this data in the way they ultimately did. The problem was uncovered after considerable effort by the ACLU and the Brennan Center for Justice – it was not discovered by the social media companies who provided access to their data or who set the terms of use. In the recent Report of Findings by the Privacy Commissioner of Canada into Facebook’s role in the Cambridge Analytica scandal, the Commissioner found that although Facebook’s terms of service with developers prohibited the kind of activities engaged in by Dr Kogan who collected the data, they failed in their duty to safeguard personal information, and in particular, ignored red flags that should have told them that there was a problem. Let’s face it; companies selling access to data may have no interest in policing the behaviour of their customers or in terminating their access. An ‘independent’ body set up to perform such functions may lack the resources and capacity to monitor and enforce compliance.
Another issue that exists with ethical approaches is, of course, whose ethics? Taking an ethical approach does not mean being value-neutral and it does not mean that there will not be winners and losers. It is like determining the public interest – an infinitely malleable concept. This is why the composition of decision-making bodies and the location of decision-making power, when it comes to data collection and data sharing, is so important and so challenging.
Impactful
In approaching this last of the conference’s themes – impactful – I think it is useful to talk about solutions. And since I am almost out of time and this is the start of the day’s events, I am going to be very brief as solutions will no doubt be part of the broader discussion today.
The challenges of big data, AI and smart cities have led to a broad range of different proposed data governance solutions. Some of these are partial; for example, deidentification/anonymization or privacy by design approaches address what data is collected and how, but they do not necessarily address uses.
Some are aspirational. For example, developing ethical approaches to AI such as the Montreal Declaration for a Responsible Development of Artificial Intelligence. Others attempt to embed both privacy and ethics into concrete solutions – for example the federal Directive on Automated Decision-Making for the public sector, which sets parameters for the adoption, implementation and oversight of AI deployment in government. In addition, there are a number of models emerging, including data trusts in all their variety (ODI), or bottom-up solutions such as Civic Data Trusts (see, e.g.: MaRS, Element AI, SeanMcDonald), which involve access moderated by an independent (?), representative (?) body, in the public interest (?) according to set principles.
Safe sharing sites is another concept discussed by Lisa Austin and David Lie of the University of Toronto – they are not necessarily independent of data trusts or civic data trusts. Michel Girard is currently doing very interesting work on the use of data standards (see his recent CIGI paper).
Some solutions may also be rooted in law reform as there are deficiencies in our legal infrastructure when it comes to data governance. One key target of reform is data protection laws, but context-specific laws may also be required.
Many of these solutions are in the debate/discussion/development stage. Without a doubt there is a great deal of work to be done. Let’s start doing it.