Anonymization and De-identification in Bill C-27

Wednesday, 06 July 2022 09:05

Anonymization and De-identification in Bill C-27

Written by Teresa Scassa

font size decrease font size increase font size
Print
E-mail

Rate this item

(6 votes)

This is the second post in a series on Bill C-27, a bill introduced in Parliament in June 2022 to reform Canada's private sector data protection law. The first post, on consent provisions, is found here.

In a data-driven economy, data protection laws are essential to protect privacy. In Canada, the proposed Consumer Privacy Protection Act in Bill C-27 will, if passed, replace the aging Personal Information Protection and Electronic Documents Act (PIPEDA) to govern the collection, use and disclosure of personal information by private sector organizations. Personal information is defined in Bill C-27 (as it was in PIPEDA) as “information about an identifiable individual”. The concept of identifiability of individuals from information has always been an important threshold issue for the application of the law. According to established case law, if an individual can be identified directly or indirectly from data, alone or in combination with other available data, then those data are personal information. Direct identification comes from the presence of unique identifiers that point to specific individuals (for example, a name or a social insurance number). Indirect identifiers are data that, if combined with other available data, can lead to the identification of individuals. To give a simple example, a postal code on its own is not a direct identifier of any particular individual, but in a data set with other data elements such as age and gender, a postal code can lead to the identification of a specific individual. In the context of that larger data set, the postal code can constitute personal information.

As the desire to access and use more data has grown in the private (and public) sector, the concepts of de-identification and anonymization have become increasingly important in dealing with personal data that have already been collected by organizations. The removal of both direct and indirect identifiers from personal data can protect privacy in significant ways. PIPEDA did not define ‘de-identify’, nor did it create particular rules around the use or disclosure of de-identified information. Bill C-11, the predecessor to C-27, addressed de-identified personal information, and contained the following definition:

de-identify means to modify personal information — or create information from personal information — by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual

This definition was quite inclusive (information created from personal information, for example, would include synthetic data). Bill C-11 set a relative standard for de-identification – in other words, it accepted that de-identification was sufficient if the information could not be used to identify individuals “in reasonably foreseeable circumstances”. This was reinforced by s. 74 which required organizations that de-identified personal information to use measures that were proportionate to the sensitivity of the information and the way in which the information was to be used. De-identification did not have to be perfect – but it had to be sufficient for the context.

Bill C-11’s definition of de-identification was criticized by private sector organizations that wanted de-identified data to fall outside the scope of the Act. In other words, they sought either an exemption from the application of the law for de-identified personal information, or a separate category of “anonymized” data that would be exempt from the law. According to this view, if data cannot be linked to an identifiable individual, then they are not personal data and should not be subject to data protection law. For their part, privacy advocates were concerned about the very real re-identification risks, particularly in a context in which there is a near endless supply of data and vast computing power through which re-identification can take place. These concerns are supported by research (see also here and here). The former federal Privacy Commissioner recommended that it be made explicit that the legislation would apply to de-identified data.

The changes in Bill C-27 reflect the power of the industry lobby on this issue. Bill C-27 creates separate definitions for anonymized and de-identified data. These are:

anonymize means to irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means.

[. . .]

de-identify means to modify personal information so that an individual cannot be directly identified from it, though a risk of the individual being identified remains. [my emphasis]

Organizations will therefore be pleased that there is now a separate category of “anonymized” data, although such data must be irreversibly and permanently modified to ensure that individuals are not identifiable. This is harder than it sounds; there is, even with synthetic data, for example, still some minimal risk of reidentification. An important concern, therefore, is whether the government is actually serious about this absolute standard, whether it will water it down by amendment before the bill is enacted, or whether it will let interpretation and argument around ‘generally accepted best practices’ soften it up. To ensure the integrity of this provision, the law should enable the Privacy Commissioner to play a clear role in determining what counts as anonymization.

Significantly, under Bill C-27, information that is ‘anonymized’ would be out of scope of the statute. This is made clear in a new s. 6(5) which provides that “this Act does not apply in respect of personal information that has been anonymized”. The argument to support this is that placing data that are truly anonymized out of scope of the legislation creates an incentive for industry to anonymize data, and anonymization (if irreversible and permanent) is highly privacy protective. Of course, similar incentives can be present if more tailored exceptions are created for anonymized data without it falling ‘out of scope’ of the law.

Emerging and evolving concepts of collective privacy take the view that there should be appropriate governance of the use of human-derived data, even if it has been anonymized. Another argument for keeping anonymized data in scope relates to the importance of oversight, given re-identification risks. Placing anonymized data outside the scope of data protection law is contrary to the recent recommendations of the ETHI Standing Committee of the House of Commons following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada. ETHI recommended that the federal laws be amended “to render these laws applicable to the collection, use, and disclosure of de-identified and aggregated data”. Aggregated data is generally considered to be data that has been anonymized. The trust issues referenced by ETHI when it comes to the use of de-identified data reinforce the growing importance of notions of collective privacy. It might therefore make sense to keep anonymized data within scope of the legislation (with appropriate exceptions to maintain incentives for anonymization) leaving room for governance of anonymization.

Bill C-27 also introduces a new definition of “de-identify”, which refers to modifying data so that individuals cannot be directly identified. Direct identification has come to mean identification through specific identifiers such as names, or assigned numbers. The new definition of ‘de-identify’ in C-27 suggests that simply removing direct identifiers will suffice to de-identify personal data (a form of what, in the GDPR, is referred to as pseudonymization). Thus, according to this definition, as long as direct identifiers are removed from a data set, an organization can use data without knowledge or consent in certain circumstances, even though specific individuals might still be identifiable from those data. While it will be argued that these circumstances are limited, the exception for sharing for ‘socially beneficial purposes’ is disturbingly broad given this weak definition (more to come on this in a future blog post). In addition, the government can add new exceptions to the list by regulation.

The reference in the definition of ‘de-identify’ only to direct identification is meant to be read alongside s. 74 of Bill C-27, which provides:

74 An organization that de-identifies personal information must ensure that any technical and administrative measures applied to the information are proportionate to the purpose for which the information is de-identified and the sensitivity of the personal information.

Section 74 remains unchanged from Bill C-11, where it made more sense, since it defined de-identification in terms of direct or indirect identifiers using a relative standard. In the context of the new definition of ‘de-identify’, it is jarring, since de-identification according to the new definition requires only the removal of direct identifiers. What this, perhaps, means is that although the definition of de-identify only requires removal of direct identifiers, actual de-identification might mean something else. This is not how definitions are supposed to work.

In adopting these new definitions, the federal government sought to align its terminology with that used in Quebec’s Loi 25 that reformed its public and private sector data protection laws. The Quebec law provides, in a new s. 23, that:

[. . .]

For the purposes of this Act, information concerning a natural person is anonymized if it is, at all times, reasonably foreseeable in the circumstances that it irreversibly no longer allows the person to be identified directly or indirectly.

Information anonymized under this Act must be anonymized according to generally accepted best practices and according to the criteria and terms determined by regulation.

Loi 25 also provides that data is de-identified (as opposed to anonymized) “if it no longer allows the person concerned to be directly identified”. At first glance, it seems that Bill C-27 has adopted similar definitions – but there are differences. First, the definition of anonymization in Loi 25 uses a relative standard (not an absolute one as in C-27). It also makes specific reference not just to generally accepted best practices, but to criteria and terms to be set out in regulation, whereas in setting standards for anonymization, C-27 refers only to “generally accepted best practices”. [Note that in its recommendations following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada, the ETHI Committee of Parliament recommended that federal data protection laws should include “a standard for de-identification of data or the ability for the Privacy Commissioner to certify a code of practice in this regard.”]

Second, and most importantly, in the Quebec law, anonymized data does not fall outside the scope of the legislation –instead, a relative standard is used to provide some flexibility while still protecting privacy. Anonymized data are still subject to governance under the law, even though the scope of that governance is limited. Further, under the Quebec law, recognizing that the definition of de-identification is closer to pseudonymization, the uses of de-identified data are more restricted than they are in Bill C-27.

Further, in an eye-glazing bit of drafting, s. 2(3) of Bill C-27 provides:

2(3) For the purposes of this Act, other than sections 20 and 21, subsections 22(1) and 39(1), sections 55 and 56, subsection 63(1) and sections 71, 72, 74, 75 and 116, personal information that has been de-identified is considered to be personal information.

This is a way of saying that de-identified personal information remains within the scope of the Act except where it does not. Yet, data that has only direct identifiers stripped from it should always be considered personal information, since the reidentification risk, as noted above, could be very high. What s. 2(3) does is allow de-identified data to be treated as anonymized (out of scope) in some circumstances. For example, s. 21 allows organizations to use ‘de-identified’ personal information for internal research purposes without knowledge or consent. The reference in s. 2(3) amplifies this by providing that such information is not considered personal information. As a result, presumably, other provisions in Bill C-27 would not apply. This might include data breach notification requirements – yet if information is only pseudonymized and there is a breach, it is not clear why such provisions should not apply. Pseudonymization might provide some protection to those affected by a breach, although it is also possible that the key was part of the breach, or that individuals remain re-identifiable in the data. The regulator should have jurisdiction. Subsection 22(1) allows for the use and even the disclosure of de-identified personal information between parties to a prospective business transaction. In this context, the de-identified information is not considered personal information (according to s. 2(3)) and so the only safeguards are those set out in s. 22(1) itself. Bizarrely, s. 22(1) makes reference to the sensitivity of the information – requiring safeguards appropriate to its sensitivity, even though it is apparently not considered personal information. De-identified (not anonymized) personal information can also be shared without knowledge or consent for socially beneficial purposes under s. 39(1). (I have a blog post coming on this provision, so I will say no more about it here, other than to note that given the definition of ‘de-identify’, such sharing seems rash and the safeguards provided are inadequate). Section 55 provides for a right of erasure of personal information; since information stripped of direct identifiers is not personal information for the purposes of section 55 (according to s. 2(3)), this constitutes an important limitation on the right of erasure. If data are only pseudonymized, and if the organization retains the key, then why is there no right of erasure? Section 56 addresses the accuracy of personal information. Personal information de-identified according to the definition in C-27 would also be exempted from this requirement.

In adopting the definitions of ‘anonymize’ and ‘de-identify’, the federal government meets a number of public policy objectives. It enhances the ability of organizations to make use of data. It also better aligns the federal law with Quebec’s law (at least at the definitional level). The definitions may also create scope for other privacy protective technologies such as pseudonymization (which is what the definition of de-identify in C-27 probably really refers to) or different types of encryption. But the approach it has adopted creates the potential for confusion, for risks to privacy, and for swathes of human-derived data to fall ‘outside the scope’ of data protection law. The government view may be that, once you stir all of Bill C-27’s provisions into the pot, and add a healthy dose of “trust us”, the definition of “de-identify” and its exceptions are not as problematic as they are at first glance. Yet, this seems like a peculiar way to draft legislation. The definition should say what it is supposed to say, rather than have its defects mitigated by a smattering of other provisions in the law and faith in the goodness of others and the exceptions still lean towards facilitating data use rather than protecting privacy.

In a nutshell, C-27 has downgraded the definition of de-identification from C-11. It has completely excluded from the scope of the Act anonymized data, but has provided little or no guidance beyond “generally accepted best practices” to address anonymization. If an organization claims that their data are anonymized and therefore outside of the scope of the legislation, it will be an uphill battle to get past the threshold issue of anonymization in order to have a complaint considered under what would be the new law. The organization can simply dig in and challenge the jurisdiction of the Commissioner to investigate the complaint.

All personal data, whether anonymized or ‘de-identified’ should remain within the scope of the legislation. Specific exceptions can be provided where necessary. Exceptions in the legislation for the uses of de-identified information without knowledge or consent must be carefully constrained and reinforced with safeguards. Further, the regulator should play a role in establishing standards for anonymization and de-identification. This may involve consultation and collaboration with standards-setting bodies, but references in the legislation must be to more than just “generally accepted best practices”.

Read 22604 times

Published in Privacy

Tagged under

Social sharing