Tuesday, 07 February 2023 13:50
Court decision explores reidentification risk in access to information request
A recent decision of the Federal Court of Canada exposes the tensions between access to information and privacy in our data society. It also provides important insights into how reidentification risk should be assessed when government agencies or departments respond to requests for datasets with the potential to reveal personal information.

The case involved a challenge by two journalists to Health Canada’s refusal to disclose certain data elements in a dataset of persons permitted to grow medical marijuana for personal use under the licensing scheme that existed before the legalization of cannabis. [See journalist Molly Hayes’ report on the story here]. Health Canada had agreed to provide the first character of the Forward Sortation Area (FSA) of the postal codes of licensed premises but declined to provide the second and third characters or the names of the cities in which licensed production took place. At issue was whether these location data constituted “personal information” – which the government cannot disclose under s. 19(1) of the Access to Information Act (ATIA). A second issue was the degree of effort required of a government department or agency to maximize the release of information in a privacy-protective way. Essentially, this case is about “the appropriate analytical approach to measuring privacy risks in relation to the release of information from structured datasets that contain personal information” (at para 2).

The licensing scheme was available to those who wished to grow their own marijuana for medical purposes or to anyone seeking to be a “designated producer” for a person in need of medical marijuana. Part of the licence application required the disclosure of the medical condition that justified the use of medical marijuana. Where a personal supply of medical marijuana is grown at the user’s home, location information could easily be linked to that individual.

Both parties agreed that the last three characters in a six-character postal code would make it too easy to identify individuals. The dispute concerned the first three characters – the FSA. The first character represents a postal district. For example, Ontario, Canada’s largest province, has five postal districts. The second character indicates whether an area within the district is urban or rural. The third character identifies either a “specific rural region, an entire medium-sized city, or a section of a major city” (at para 12). FSAs differ in size; StatCan data from 2016 indicated that populations in FSAs ranged from no inhabitants to over 130,000.

Information about medical marijuana and its production in a rapidly evolving public policy context is a subject in which there is a public interest. In fact, Health Canada proactively publishes some data on its own website regarding the production and use of medical marijuana. Yet, even where a government department or agency publishes data, members of the public can use the ATI system to request different or more specific data. This is what happened in this case.

In his decision, Justice Pentney emphasized that both access to information and the protection of privacy are fundamental rights. The right of access to government information, however, does not include a right to access the personal information of third parties. Personal information is defined in the ATIA as “information about an identifiable individual” (s. 3).
This means that all that is required for information to be considered personal is that it can be used – alone or in combination with other information – to identify a specific individual. Justice Pentney reaffirmed that the test for personal information from Gordon v. Canada (Health) remains definitive. Information is personal information “where there is a serious possibility that an individual could be identified through the use of that information, alone or in combination with other available information.” (Gordon, at para 34, emphasis added). More recently, the Federal Court has defined a “serious possibility” as “a possibility that is greater than speculation or a ‘mere possibility’, but does not need to reach the level of ‘more likely than not’” (Public Safety, at para 53).

Geographic information is strongly linked to reidentification. A street address is, in many cases, clearly personal information. However, city, town or even province of residence would only be personal information if it can be used in combination with other available data to link to a specific individual. In Gordon, the Federal Court upheld a decision to not release province of residence data for those who had suffered reported adverse drug reactions because these data could be combined with other available data (including obituary notices and even the observations of ‘nosy neighbors’) to identify specific individuals.

The Information Commissioner argued that to meet the ‘serious possibility’ test, Health Canada should be able to concretely demonstrate identifiability by connecting the dots between the data and specific individuals. Justice Pentney disagreed, noting that in the case before him, the expert opinion combined with evidence about other available data and the highly sensitive nature of the information at issue made proof of actual linkages unnecessary. However, he cautioned that “in future cases, the failure to engage in such an exercise might well tip the balance in favour of disclosure” (at para 133).

Justice Pentney also ruled that, because the proceeding before the Federal Court is a hearing de novo, he was not limited to considering the data that were available at the time of the ATIP request. A court can take into account data made available after the request and even after the decision of the Information Commissioner. This makes sense. The rapidly growing availability of new datasets as well as new tools for the analysis and dissemination of data demand a timelier assessment of identifiability. Nevertheless, any pending or possible future ATI requests would be irrelevant to assessing reidentification risk, since these would be hypothetical. Justice Pentney noted: “The fact that a more complete mosaic may be created by future releases is both true and irrelevant, because Health Canada has an ongoing obligation to assess the risks, and if at some future point it concludes that the accumulation of information released created a serious risk, it could refuse to disclose the information that tipped the balance” (at para 112).

The court ultimately agreed with Health Canada that disclosing anything beyond the first character of the FSA could lead to the identification of some individuals within the dataset, and thus would amount to personal information.
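To make the “serious possibility” reasoning more concrete, here is a minimal sketch, in Python, of the kind of group-size analysis that underlies conclusions about how many FSA characters can safely be disclosed. The data are entirely invented; this is not the record that was before the Court or the expert’s actual methodology, only an illustration of the intuition that the shorter the disclosed prefix, the larger the group of licensees who share it.

from collections import Counter

# Hypothetical licence records; only the FSA (first three postal code characters) is shown.
licence_fsas = ["K1A", "K1A", "K1B", "K2P", "K2P", "M5V", "M5V", "M5V", "M4C", "M4C"]

def group_sizes(fsas, prefix_len):
    # Count how many licences share each disclosed FSA prefix.
    return Counter(fsa[:prefix_len] for fsa in fsas)

for n in (1, 2, 3):
    counts = group_sizes(licence_fsas, n)
    print(f"{n} character(s) disclosed -> smallest group: {min(counts.values())}, groups: {dict(counts)}")

# With this toy data, the smallest group shrinks from 5 (one character) to 1 (full FSA):
# the fewer characters disclosed, the harder it is to single anyone out.

A real assessment would use actual licence counts and population figures per FSA, and would weigh what other data a motivated observer could bring to bear, but the structure of the inquiry is similar.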
Health Canada had identified three categories of other available data: data that it had proactively published on its own website; StatCan data about population counts and FSAs; and publicly available data that included data released in response to previous ATIP requests relating to medical marijuana. In this latter category the court noted that there had been a considerable number of prior requests that provided various categories of data, including “type of license, medical condition (with rare conditions removed), dosage, and the issue date of the licence” (at para 64). Other released data included the licensee’s “year of birth, dosage, sex, medical condition (rare conditions removed), and province (city removed)” (at para 64). Once released, these data are in the public domain, and can contribute to a “mosaic effect” which allows data to be combined in ways that might ultimately identify specific individuals.

Health Canada had provided evidence of an interactive map of Canada published on the internet that showed the licensing of medical marijuana by FSA between 2001 and 2007. Justice Pentney noted that “[a]n Edmonton Journal article about the interactive map provided a link to a database that allowed users to search by medical condition, postal code, doctor’s speciality, daily dosage, and allowed storage of marijuana” (at para 66). He stated: “the existence of evidence demonstrating that connections among disparate pieces of relevant information have previously been made and that the results have been made available to the public is a relevant consideration in applying the serious possibility test” (at para 109).

Justice Pentney observed that members of the public might already have knowledge (such as the age, gender or address) of persons they know who consume marijuana that they might combine with other released data to learn about the person’s underlying medical condition. Further, he noted that “the pattern of requests and the existence of the interactive map show a certain motivation to glean more information about the administration of the licensing regime” (at para 144).

Health Canada had commissioned Dr. Khaled El Emam to produce an expert report. Dr. El Emam determined that “there are a number of FSAs that are high risk if either three or two characters of the FSA are released, there are no high-risk FSAs if only the first character is released” (at para 80). Relying on this evidence, Justice Pentney concluded that “releasing more than the first character of an FSA creates a significantly greater risk of reidentification” (at para 157). This risk would meet the “serious possibility” threshold, and therefore the information amounts to “personal information” and cannot be disclosed under the legislation.

The Information Commissioner raised issues about the quality of other available data, suggesting that incomplete and outdated datasets would be less likely to create reidentification risk. For example, since cannabis laws had changed, there are now many more people cultivating marijuana for personal use. This would make it harder to connect the knowledge that a particular person was cultivating marijuana with other data that might lead to the disclosure of a medical condition. Justice Pentney was unconvinced, since the quantities of marijuana required for ongoing medical use might exceed the general personal use amounts, and thus would still require a licence, creating continuity in the medical cannabis licensing data before and after the legalization of cannabis.
He noted: “The key point is not that the data is statistically comparable for the purposes of scientific or social science research. Rather, the question is whether there is a significant possibility that this data can be combined to identify particular individuals.” (at para 118). Justice Pentney therefore distinguished between data quality from a data science perspective and data quality from the perspective of someone seeking to identify specific individuals. He stated: “the fact that the datasets may not be exactly comparable might be a problem for a statistician or social scientist, but it is not an impediment to a motivated user seeking to identify a person who was licensed for personal production or a designated producer under the medical marijuana licensing regime” (at para 119).

Justice Pentney emphasized the relationship between sensitivity of information and reidentification risk, noting that “the type of personal information in question is a central concern for this type of analysis” (at para 107). This is because “the disclosure of some particularly sensitive types of personal information can be expected to have particularly devastating consequences” (at para 107). With highly sensitive information, it is important to reduce reidentification risk, which means limiting disclosure “as much as is feasible” (at para 108).

Justice Pentney also dealt with a further argument that Health Canada should not be able to apply the same risk assessment to all the FSA data; rather, it should assess reidentification risk based on the size of the area identified by the different FSA characters. The legislation allows for severance of information from disclosed records, and the journalists argued that Health Canada could have used severance to reduce the risk of reidentification while releasing more data where the risks were acceptably low. Health Canada responded that a more fine-grained analysis of the reidentification risk by FSA would impose an undue burden because of the complexity of the task. In its submissions as intervenor in the case, the Office of the Privacy Commissioner suggested that other techniques could be used to perturb the data so as to significantly lower the risk of reidentification. Such techniques are used, for example, where data are anonymized.

Justice Pentney noted that the effort required of a government department or agency was a matter of proportionality. Here, the data at issue were highly sensitive. The already-disclosed first character of the FSA provided general location information about the licences. Given these facts, “[t]he question is whether a further narrowing of the lens would bring significant benefits, given the effort that doing so would require” (at para 181). He concluded that it would not, noting the lack of in-house expertise at Health Canada to carry out such a complex task. Regarding the suggestion of the Privacy Commissioner that anonymization techniques should be applied, he found that while this is not precluded by the ATIA, it was a complex task that, on the facts before him, went beyond what the law requires in terms of severance.

This is an interesting and important decision. First, it reaffirms the test for ‘personal information’ in a more complex data society context than the earlier jurisprudence. Second, it makes clear that the sensitivity of the information at issue is a crucial factor that will influence an assessment not just of the reidentification risk, but of tolerance for the level of risk involved.
This is entirely appropriate. Not only is personal health information highly sensitive; at the time these data were collected, licensing was an important means of gaining access to medical marijuana for people suffering from serious and ongoing medical issues. Their sharing of data with the government was driven by their need and vulnerability. Failure to robustly protect these data would only increase that vulnerability.

The decision also clarifies the evidentiary burden on government to demonstrate reidentification risk – something that will vary according to the sensitivity of the data. It highlights the dynamic and iterative nature of reidentification risk assessment, as the risk will change as more data are made available.

Indirectly, the decision also casts light on the challenges of using the ATI system to access data, and perhaps on the need to overhaul that system to provide better access to high-quality public-sector information for research and other purposes. Although Health Canada has engaged in proactive disclosure (interestingly, such disclosures were a factor in assessing the ‘other available data’ that could lead to reidentification in this case), more should be done by governments (both federal and provincial) to support and ensure proactive disclosure that better meets the needs of data users while properly protecting privacy. Done properly, this would require an investment in capacity and infrastructure, as well as legislative reform.
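As a footnote to the “mosaic effect” discussed above, here is a toy illustration in Python of how two separately released datasets can be combined. The field names and values are invented (they are not the data at issue, nor the method used by the expert); the point is simply that a join on overlapping fields can link licence details to a sensitive attribute.

import pandas as pd

# Two hypothetical prior releases, each innocuous on its own.
release_1 = pd.DataFrame({
    "fsa": ["K1A", "M5V"],
    "year_of_birth": [1961, 1984],
    "sex": ["F", "M"],
    "licence_type": ["personal", "designated"],
})
release_2 = pd.DataFrame({
    "fsa": ["K1A", "M5V"],
    "year_of_birth": [1961, 1984],
    "sex": ["F", "M"],
    "medical_condition": ["condition A", "condition B"],
})

# Joining on the overlapping fields links licence details to medical condition.
mosaic = release_1.merge(release_2, on=["fsa", "year_of_birth", "sex"])
print(mosaic)

# An observer who already knows a neighbour's FSA, birth year and sex can now read
# off the sensitive attribute, which is why prior releases count as "other available
# information" in the identifiability analysis.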
Published in
Privacy
Wednesday, 06 July 2022 09:05
Anonymization and De-identification in Bill C-27

This is the second post in a series on Bill C-27, a bill introduced in Parliament in June 2022 to reform Canada's private sector data protection law. The first post, on consent provisions, is found here.
In a data-driven economy, data protection laws are essential to protect privacy. In Canada, the proposed Consumer Privacy Protection Act in Bill C-27 will, if passed, replace the aging Personal Information Protection and Electronic Documents Act (PIPEDA) to govern the collection, use and disclosure of personal information by private sector organizations.

Personal information is defined in Bill C-27 (as it was in PIPEDA) as “information about an identifiable individual”. The concept of identifiability of individuals from information has always been an important threshold issue for the application of the law. According to established case law, if an individual can be identified directly or indirectly from data, alone or in combination with other available data, then those data are personal information. Direct identification comes from the presence of unique identifiers that point to specific individuals (for example, a name or a social insurance number). Indirect identifiers are data that, if combined with other available data, can lead to the identification of individuals. To give a simple example, a postal code on its own is not a direct identifier of any particular individual, but in a data set with other data elements such as age and gender, a postal code can lead to the identification of a specific individual. In the context of that larger data set, the postal code can constitute personal information.

As the desire to access and use more data has grown in the private (and public) sector, the concepts of de-identification and anonymization have become increasingly important in dealing with personal data that have already been collected by organizations. The removal of both direct and indirect identifiers from personal data can protect privacy in significant ways. PIPEDA did not define ‘de-identify’, nor did it create particular rules around the use or disclosure of de-identified information. Bill C-11, the predecessor to C-27, addressed de-identified personal information, and contained the following definition:

de-identify means to modify personal information — or create information from personal information — by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual

This definition was quite inclusive (information created from personal information, for example, would include synthetic data). Bill C-11 set a relative standard for de-identification – in other words, it accepted that de-identification was sufficient if the information could not be used to identify individuals “in reasonably foreseeable circumstances”. This was reinforced by s. 74, which required organizations that de-identified personal information to use measures that were proportionate to the sensitivity of the information and the way in which the information was to be used. De-identification did not have to be perfect – but it had to be sufficient for the context.

Bill C-11’s definition of de-identification was criticized by private sector organizations that wanted de-identified data to fall outside the scope of the Act. In other words, they sought either an exemption from the application of the law for de-identified personal information, or a separate category of “anonymized” data that would be exempt from the law.
According to this view, if data cannot be linked to an identifiable individual, then they are not personal data and should not be subject to data protection law. For their part, privacy advocates were concerned about the very real re-identification risks, particularly in a context in which there is a near endless supply of data and vast computing power through which re-identification can take place. These concerns are supported by research (see also here and here). The former federal Privacy Commissioner recommended that it be made explicit that the legislation would apply to de-identified data.

The changes in Bill C-27 reflect the power of the industry lobby on this issue. Bill C-27 creates separate definitions for anonymized and de-identified data. These are:

anonymize means to irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means.

[. . .]

de-identify means to modify personal information so that an individual cannot be directly identified from it, though a risk of the individual being identified remains. [my emphasis]

Organizations will therefore be pleased that there is now a separate category of “anonymized” data, although such data must be irreversibly and permanently modified to ensure that individuals are not identifiable. This is harder than it sounds; there is, even with synthetic data, for example, still some minimal risk of reidentification. An important concern, therefore, is whether the government is actually serious about this absolute standard, whether it will water it down by amendment before the bill is enacted, or whether it will let interpretation and argument around ‘generally accepted best practices’ soften it up. To ensure the integrity of this provision, the law should enable the Privacy Commissioner to play a clear role in determining what counts as anonymization.

Significantly, under Bill C-27, information that is ‘anonymized’ would be out of scope of the statute. This is made clear in a new s. 6(5) which provides that “this Act does not apply in respect of personal information that has been anonymized”. The argument to support this is that placing data that are truly anonymized out of scope of the legislation creates an incentive for industry to anonymize data, and anonymization (if irreversible and permanent) is highly privacy protective. Of course, similar incentives can be present if more tailored exceptions are created for anonymized data without it falling ‘out of scope’ of the law. Emerging and evolving concepts of collective privacy take the view that there should be appropriate governance of the use of human-derived data, even if it has been anonymized. Another argument for keeping anonymized data in scope relates to the importance of oversight, given re-identification risks.

Placing anonymized data outside the scope of data protection law is contrary to the recent recommendations of the ETHI Standing Committee of the House of Commons following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada. ETHI recommended that the federal laws be amended “to render these laws applicable to the collection, use, and disclosure of de-identified and aggregated data”. Aggregated data is generally considered to be data that has been anonymized.
The trust issues referenced by ETHI when it comes to the use of de-identified data reinforce the growing importance of notions of collective privacy. It might therefore make sense to keep anonymized data within scope of the legislation (with appropriate exceptions to maintain incentives for anonymization), leaving room for governance of anonymization.

Bill C-27 also introduces a new definition of “de-identify”, which refers to modifying data so that individuals cannot be directly identified. Direct identification has come to mean identification through specific identifiers such as names, or assigned numbers. The new definition of ‘de-identify’ in C-27 suggests that simply removing direct identifiers will suffice to de-identify personal data (a form of what, in the GDPR, is referred to as pseudonymization). Thus, according to this definition, as long as direct identifiers are removed from a data set, an organization can use data without knowledge or consent in certain circumstances, even though specific individuals might still be identifiable from those data. While it will be argued that these circumstances are limited, the exception for sharing for ‘socially beneficial purposes’ is disturbingly broad given this weak definition (more to come on this in a future blog post). In addition, the government can add new exceptions to the list by regulation.

The reference in the definition of ‘de-identify’ only to direct identification is meant to be read alongside s. 74 of Bill C-27, which provides:

74 An organization that de-identifies personal information must ensure that any technical and administrative measures applied to the information are proportionate to the purpose for which the information is de-identified and the sensitivity of the personal information.

Section 74 remains unchanged from Bill C-11, where it made more sense, since that bill defined de-identification in terms of direct or indirect identifiers using a relative standard. In the context of the new definition of ‘de-identify’, it is jarring, since de-identification according to the new definition requires only the removal of direct identifiers. What this, perhaps, means is that although the definition of de-identify only requires removal of direct identifiers, actual de-identification might mean something else. This is not how definitions are supposed to work.

In adopting these new definitions, the federal government sought to align its terminology with that used in Quebec’s Loi 25 that reformed its public and private sector data protection laws. The Quebec law provides, in a new s. 23, that:

[. . .] For the purposes of this Act, information concerning a natural person is anonymized if it is, at all times, reasonably foreseeable in the circumstances that it irreversibly no longer allows the person to be identified directly or indirectly.

Information anonymized under this Act must be anonymized according to generally accepted best practices and according to the criteria and terms determined by regulation.

Loi 25 also provides that data is de-identified (as opposed to anonymized) “if it no longer allows the person concerned to be directly identified”. At first glance, it seems that Bill C-27 has adopted similar definitions – but there are differences. First, the definition of anonymization in Loi 25 uses a relative standard (not an absolute one, as in C-27).
It also makes specific reference not just to generally accepted best practices, but to criteria and terms to be set out in regulation, whereas in setting standards for anonymization, C-27 refers only to “generally accepted best practices”. [Note that in its recommendations following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada, the ETHI Committee of Parliament recommended that federal data protection laws should include “a standard for de-identification of data or the ability for the Privacy Commissioner to certify a code of practice in this regard.”]

Second, and most importantly, in the Quebec law, anonymized data does not fall outside the scope of the legislation – instead, a relative standard is used to provide some flexibility while still protecting privacy. Anonymized data are still subject to governance under the law, even though the scope of that governance is limited. Further, under the Quebec law, recognizing that the definition of de-identification is closer to pseudonymization, the uses of de-identified data are more restricted than they are in Bill C-27.

Further, in an eye-glazing bit of drafting, s. 2(3) of Bill C-27 provides:

2(3) For the purposes of this Act, other than sections 20 and 21, subsections 22(1) and 39(1), sections 55 and 56, subsection 63(1) and sections 71, 72, 74, 75 and 116, personal information that has been de-identified is considered to be personal information.

This is a way of saying that de-identified personal information remains within the scope of the Act except where it does not. Yet, data that has only direct identifiers stripped from it should always be considered personal information, since the reidentification risk, as noted above, could be very high. What s. 2(3) does is allow de-identified data to be treated as anonymized (out of scope) in some circumstances.

For example, s. 21 allows organizations to use ‘de-identified’ personal information for internal research purposes without knowledge or consent. The reference in s. 2(3) amplifies this by providing that such information is not considered personal information. As a result, presumably, other provisions in Bill C-27 would not apply. This might include data breach notification requirements – yet if information is only pseudonymized and there is a breach, it is not clear why such provisions should not apply. Pseudonymization might provide some protection to those affected by a breach, although it is also possible that the key was part of the breach, or that individuals remain re-identifiable in the data. The regulator should have jurisdiction.

Subsection 22(1) allows for the use and even the disclosure of de-identified personal information between parties to a prospective business transaction. In this context, the de-identified information is not considered personal information (according to s. 2(3)), and so the only safeguards are those set out in s. 22(1) itself. Bizarrely, s. 22(1) makes reference to the sensitivity of the information – requiring safeguards appropriate to its sensitivity, even though it is apparently not considered personal information.

De-identified (not anonymized) personal information can also be shared without knowledge or consent for socially beneficial purposes under s. 39(1). (I have a blog post coming on this provision, so I will say no more about it here, other than to note that given the definition of ‘de-identify’, such sharing seems rash and the safeguards provided are inadequate.)
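As noted above, what C-27 calls ‘de-identification’ looks much like pseudonymization. A minimal sketch in Python (the key, record and field names are invented for illustration only) shows why pseudonymized data should remain subject to the law: the identifier is replaced with a keyed token, and whoever holds, or breaches, the key can re-link that token to the individual.

import hmac, hashlib

SECRET_KEY = b"org-held-secret"  # hypothetical key retained by the organization

def pseudonymize(identifier: str) -> str:
    # Replace a direct identifier with a stable keyed token (HMAC-SHA256).
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "A. Singh", "condition": "condition A"}
released = {"token": pseudonymize(record["name"]), "condition": record["condition"]}
print(released)

# Re-linking: anyone with the key and a list of candidate names can regenerate the
# tokens and match them back to individuals; the data is not anonymous in any
# absolute sense.
assert pseudonymize("A. Singh") == released["token"]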
Section 55 provides for a right of erasure of personal information; since information stripped of direct identifiers is not personal information for the purposes of section 55 (according to s. 2(3)), this constitutes an important limitation on the right of erasure. If data are only pseudonymized, and if the organization retains the key, then why is there no right of erasure? Section 56 addresses the accuracy of personal information. Personal information de-identified according to the definition in C-27 would also be exempted from this requirement.

In adopting the definitions of ‘anonymize’ and ‘de-identify’, the federal government meets a number of public policy objectives. It enhances the ability of organizations to make use of data. It also better aligns the federal law with Quebec’s law (at least at the definitional level). The definitions may also create scope for other privacy protective technologies such as pseudonymization (which is what the definition of de-identify in C-27 probably really refers to) or different types of encryption. But the approach it has adopted creates the potential for confusion, for risks to privacy, and for swathes of human-derived data to fall ‘outside the scope’ of data protection law.

The government view may be that, once you stir all of Bill C-27’s provisions into the pot, and add a healthy dose of “trust us”, the definition of “de-identify” and its exceptions are not as problematic as they appear at first glance. Yet, this seems like a peculiar way to draft legislation. The definition should say what it is supposed to say, rather than have its defects mitigated by a smattering of other provisions in the law and by faith in the goodness of others. And the exceptions still lean towards facilitating data use rather than protecting privacy.

In a nutshell, C-27 has downgraded the definition of de-identification from C-11. It has completely excluded anonymized data from the scope of the Act, but has provided little or no guidance beyond “generally accepted best practices” to address anonymization. If an organization claims that its data are anonymized and therefore outside of the scope of the legislation, it will be an uphill battle to get past the threshold issue of anonymization in order to have a complaint considered under what would be the new law. The organization can simply dig in and challenge the jurisdiction of the Commissioner to investigate the complaint.

All personal data, whether anonymized or ‘de-identified’, should remain within the scope of the legislation. Specific exceptions can be provided where necessary. Exceptions in the legislation for the uses of de-identified information without knowledge or consent must be carefully constrained and reinforced with safeguards. Further, the regulator should play a role in establishing standards for anonymization and de-identification. This may involve consultation and collaboration with standards-setting bodies, but references in the legislation must be to more than just “generally accepted best practices”.
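As a closing illustration of why removing only direct identifiers is such a weak standard, the following sketch (again with invented data, offered only as an illustration and not as anything drawn from the bill or a real dataset) strips the direct identifier from a small dataset and then checks how many records remain unique on the remaining quasi-identifiers – the combinations that the case law treats as capable of identifying individuals.

import pandas as pd

records = pd.DataFrame({
    "name": ["A. Singh", "B. Tremblay", "C. Lee"],      # direct identifier
    "postal_code": ["K1A 0B1", "K1A 0B1", "M5V 2T6"],   # quasi-identifiers follow
    "year_of_birth": [1961, 1984, 1961],
    "sex": ["F", "M", "F"],
    "condition": ["condition A", "condition B", "condition C"],
})

# "De-identify" in the C-27 sense: remove only the direct identifier.
deidentified = records.drop(columns=["name"])

# Every record here is still unique on (postal_code, year_of_birth, sex), so anyone
# who knows those three facts about a person can recover their row, and with it the
# sensitive attribute.
quasi = ["postal_code", "year_of_birth", "sex"]
unique_rows = deidentified.groupby(quasi).size().eq(1).sum()
print(f"{unique_rows} of {len(deidentified)} records are unique on the quasi-identifiers")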
Published in
Privacy