Teresa Scassa - Blog

This is the second in a series of posts on Bill C-27’s proposed Artificial Intelligence and Data Act (AIDA). The first post looked at the scope of application of the AIDA. This post considers what activities and what data will be subject to governance.

Bill C-27’s proposed Artificial Intelligence and Data Act (AIDA) governs two categories of “regulated activity” so long as they are carried out “in the course of international or interprovincial trade and commerce”. These are set out in s. 5(1):

(a) processing or making available for use any data relating to human activities for the purpose of designing, developing or using an artificial intelligence system;

(b) designing, developing or making available for use an artificial intelligence system or managing its operations.

These activities are cast in broad terms, capturing activities related both to the general curating of the data that fuel AI, and the design, development, distribution and management of AI systems. The obligations in the statute do not apply universally to all engaged in the AI industry. Instead, different obligations apply to those performing different roles. The chart below identifies the actor in the left-hand column, and the obligation in the column on the right.

 

| Actor | Obligation |
| --- | --- |
| A person who carries out any regulated activity and who processes or makes available for use anonymized data in the course of that activity (see definition of “regulated activity” in s. 5(1)) | s. 6 (data anonymization, use and management); s. 10 (record keeping regarding measures taken under s. 6) |
| A person who is responsible for an artificial intelligence system (see definition of ‘person responsible’ in s. 5(2)) | s. 7 (assess whether a system is high impact); s. 10 (record keeping regarding reasons supporting their assessment of whether the system is high-impact under s. 7) |
| A person who is responsible for a high-impact system (see definition of ‘person responsible’ in s. 5(2); definition of “high-impact” system, s. 5(1)) | s. 8 (measures to identify, assess and mitigate risk of harm or biased output); s. 9 (measures to monitor compliance with the mitigation measures established under s. 8 and the effectiveness of the measures); s. 10 (record keeping regarding measures taken under ss. 8 and 9); s. 12 (obligation to notify the Minister as soon as feasible if the use of the system results or is likely to result in material harm) |
| A person who makes available for use a high-impact system | s. 11(1) (publish a plain language description of the system and other required information) |
| A person who manages the operation of a high-impact system | s. 11(2) (publish a plain language description of how the system is used and other required information) |

 

For most of these provisions, the details of what is actually required of the identified actor will depend upon regulations that have yet to be drafted.

A “person responsible” for an AI system is defined in s. 5(2) of the AIDA in these terms:

5(2) For the purposes of this Part, a person is responsible for an artificial intelligence system, including a high-impact system, if, in the course of international or interprovincial trade and commerce, they design, develop or make available for use the artificial intelligence system or manage its operation.

Thus, the obligations in ss. 7, 8, 9, 10 and 11 apply only to those engaged in the activities described in s. 5(1)(b) (designing, developing or making available an AI system or managing its operation). Further, it is important to note that, with the exception of sections 6 and 7, the obligations in the AIDA also apply only to ‘high-impact’ systems. The definition of a high-impact system has been left to regulations and is as yet unknown.
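The chart above is, in effect, a lookup from roles to statutory sections. A minimal sketch of that lookup follows, on the assumption that a person can occupy several roles at once. The role labels and the function name are hypothetical shorthand of mine and do not appear in the bill.

```python
# Mapping transcribed from the chart above. The keys are hypothetical
# shorthand labels for the actors; the values are the AIDA sections listed
# against each actor.
OBLIGATIONS = {
    "processes_anonymized_data": ["s. 6", "s. 10"],
    "responsible_for_ai_system": ["s. 7", "s. 10"],
    "responsible_for_high_impact_system": ["s. 8", "s. 9", "s. 10", "s. 12"],
    "makes_available_high_impact_system": ["s. 11(1)"],
    "manages_high_impact_system": ["s. 11(2)"],
}


def obligations_for(roles):
    """Return the sections that apply to a person holding the given roles.

    Sections are returned in the order first encountered, without duplicates.
    """
    sections = []
    for role in roles:
        for section in OBLIGATIONS[role]:
            if section not in sections:
                sections.append(section)
    return sections


# A developer of a system that turns out to be high-impact holds two roles.
developer = obligations_for(
    ["responsible_for_ai_system", "responsible_for_high_impact_system"]
)
print(developer)
```

What each section actually requires is left to regulations, so the sketch can only say which sections are engaged, not what compliance looks like.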

Section 6 stands out somewhat as a distinct obligation relating to the governance of data used in AI systems. It applies to a person who carries out a regulated activity and who “processes or makes available for use anonymized data in the course of that activity”. Of course, the first part of the definition of a regulated activity includes someone who processes or makes available for use “any data relating to human activities for the purpose of designing, developing or using” an AI system. So, this obligation will apply to anyone “who processes or makes available for use anonymized data” (s. 6) in the course of “processing or making available for use any data relating to human activities for the purpose of designing, developing or using an artificial intelligence system” (s. 5(1)). Basically, then, for s. 6 to apply, the anonymized data must be processed for the purposes of development of an AI system. All of this must also be in the course of international or interprovincial trade and commerce.

Note that the first of these two purposes involves data “related to human activities” that are used in AI. This is interesting. The new Consumer Privacy Protection Act (CPPA) that forms the first part of Bill C-27 will regulate the collection, use and disclosure of personal data in the course of commercial activity. However, it provides, in s. 6(5), that: “For greater certainty, this Act does not apply in respect of personal information that has been anonymized.” By using the phrase “data relating to human activities” instead of “personal data”, s. 5(1) of the AIDA clearly addresses human-derived data that fall outside the definition of personal information in the CPPA because of anonymization.

Superficially, at least, s. 6 of the AIDA appears to pick up the governance slack that arises where anonymized data are excluded from the scope of the CPPA. [See my post on this here]. However, for this to happen, the data have to be used in relation to an “AI system”, as defined in the legislation. Not all anonymized data will be used in this way, and much will depend on how the definition of an AI system is interpreted. Beyond that, the AIDA only applies to a ‘regulated activity’ which is one carried out in the course of international and inter-provincial trade and commerce. It does not apply outside the trade and commerce context, nor does it apply to any excluded actors [as discussed in my previous post here]. As a result, there remain clear gaps in the governance of anonymized data. Some of those gaps might (eventually) be filled by provincial governments, and by the federal government with respect to public-sector data usage. Other gaps – e.g., with respect to anonymized data used for purposes other than AI in the private sector context – will remain. Further, governance and oversight under the proposed CPPA will be by the Privacy Commissioner of Canada, an independent agent of Parliament. Governance under the AIDA (as will be discussed in a forthcoming post) is by the Minister of Industry and his staff, who are also responsible for supporting the AI industry in Canada. Basically, the treatment of anonymized data between the CPPA and the AIDA creates a significant governance gap in terms of scope, substance and process.

On the issue of definitions, it is worth making a small side-trip into ‘personal information’. The definition of ‘personal information’ in the AIDA provides that the term “has the meaning assigned by subsections 2(1) and (3) of the Consumer Privacy Protection Act.” Section 2(1) is pretty straightforward – it defines “personal information” as “information about an identifiable individual”. However, s. 2(3) is more complicated. It provides:

2(3) For the purposes of this Act, other than sections 20 and 21, subsections 22(1) and 39(1), sections 55 and 56, subsection 63(1) and sections 71, 72, 74, 75 and 116, personal information that has been de-identified is considered to be personal information.

The default rule for ‘de-identified’ personal information is that it is still personal information. However, the CPPA distinguishes between ‘de-identified’ (pseudonymized) data and anonymized data. Nevertheless, for certain purposes under the CPPA – set out in s. 2(3) – de-identified personal information is not personal information. This excruciatingly-worded limit on the meaning of ‘personal information’ is ported into the AIDA, even though the statutory provisions referenced in s. 2(3) are neither part of the AIDA nor particularly relevant to it. Since the legislator is presumed not to be daft, this must mean that some of these circumstances are relevant to the AIDA. It is just not clear how. The term “personal information” is used most significantly in the AIDA in the s. 38 offence of possessing or making use of illegally obtained personal information. It is hard to see why it would be relevant to add the CPPA s. 2(3) limit on the meaning of ‘personal information’ to this offence. If de-identified (not anonymized) personal data (from which individuals can be re-identified) are illegally obtained and then used in AI, it is hard to see why that should not also be captured by the offence.

 

Published in Privacy

This is the second post in a series on Bill C-27, a bill introduced in Parliament in June 2022 to reform Canada's private sector data protection law. The first post, on consent provisions, is found here.

In a data-driven economy, data protection laws are essential to protect privacy. In Canada, the proposed Consumer Privacy Protection Act in Bill C-27 will, if passed, replace the aging Personal Information Protection and Electronic Documents Act (PIPEDA) to govern the collection, use and disclosure of personal information by private sector organizations. Personal information is defined in Bill C-27 (as it was in PIPEDA) as “information about an identifiable individual”. The concept of identifiability of individuals from information has always been an important threshold issue for the application of the law. According to established case law, if an individual can be identified directly or indirectly from data, alone or in combination with other available data, then those data are personal information. Direct identification comes from the presence of unique identifiers that point to specific individuals (for example, a name or a social insurance number). Indirect identifiers are data that, if combined with other available data, can lead to the identification of individuals. To give a simple example, a postal code on its own is not a direct identifier of any particular individual, but in a data set with other data elements such as age and gender, a postal code can lead to the identification of a specific individual. In the context of that larger data set, the postal code can constitute personal information.
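The postal code example can be made concrete with a short sketch. The records below are invented for illustration (the postal codes, ages and field names are all hypothetical), and the check is a simple count of how many rows share each combination of values. It is not a legal test of identifiability.

```python
from collections import Counter

# Hypothetical toy data set: no names or social insurance numbers,
# only indirect identifiers.
records = [
    {"postal_code": "K1N 6N5", "age": 34, "gender": "F"},
    {"postal_code": "K1N 6N5", "age": 34, "gender": "F"},
    {"postal_code": "K1N 6N5", "age": 71, "gender": "M"},
    {"postal_code": "K2P 1L4", "age": 34, "gender": "F"},
    {"postal_code": "K2P 1L4", "age": 52, "gender": "M"},
]


def unique_combinations(rows, fields):
    """Return the value combinations of `fields` that occur in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Postal code alone: every value is shared by at least two rows.
by_postal_code = unique_combinations(records, ["postal_code"])

# Postal code combined with age and gender: some rows become one of a kind.
by_all_three = unique_combinations(records, ["postal_code", "age", "gender"])

print(by_postal_code)
print(by_all_three)
```

In this toy set no postal code singles anyone out on its own, but three of the five rows are unique once age and gender are added. Anyone holding another data set with the same three fields could link those rows to named individuals.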

As the desire to access and use more data has grown in the private (and public) sector, the concepts of de-identification and anonymization have become increasingly important in dealing with personal data that have already been collected by organizations. The removal of both direct and indirect identifiers from personal data can protect privacy in significant ways. PIPEDA did not define ‘de-identify’, nor did it create particular rules around the use or disclosure of de-identified information. Bill C-11, the predecessor to C-27, addressed de-identified personal information, and contained the following definition:

de-identify means to modify personal information — or create information from personal information — by using technical processes to ensure that the information does not identify an individual or could not be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual

This definition was quite inclusive (information created from personal information, for example, would include synthetic data). Bill C-11 set a relative standard for de-identification – in other words, it accepted that de-identification was sufficient if the information could not be used to identify individuals “in reasonably foreseeable circumstances”. This was reinforced by s. 74 which required organizations that de-identified personal information to use measures that were proportionate to the sensitivity of the information and the way in which the information was to be used. De-identification did not have to be perfect – but it had to be sufficient for the context.

Bill C-11’s definition of de-identification was criticized by private sector organizations that wanted de-identified data to fall outside the scope of the Act. In other words, they sought either an exemption from the application of the law for de-identified personal information, or a separate category of “anonymized” data that would be exempt from the law. According to this view, if data cannot be linked to an identifiable individual, then they are not personal data and should not be subject to data protection law. For their part, privacy advocates were concerned about the very real re-identification risks, particularly in a context in which there is a near endless supply of data and vast computing power through which re-identification can take place. These concerns are supported by research (see also here and here). The former federal Privacy Commissioner recommended that it be made explicit that the legislation would apply to de-identified data.

The changes in Bill C-27 reflect the power of the industry lobby on this issue. Bill C-27 creates separate definitions for anonymized and de-identified data. These are:

anonymize means to irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means.

[. . .]

de-identify means to modify personal information so that an individual cannot be directly identified from it, though a risk of the individual being identified remains. [my emphasis]

Organizations will therefore be pleased that there is now a separate category of “anonymized” data, although such data must be irreversibly and permanently modified to ensure that individuals are not identifiable. This is harder than it sounds; there is, even with synthetic data, for example, still some minimal risk of reidentification. An important concern, therefore, is whether the government is actually serious about this absolute standard, whether it will water it down by amendment before the bill is enacted, or whether it will let interpretation and argument around ‘generally accepted best practices’ soften it up. To ensure the integrity of this provision, the law should enable the Privacy Commissioner to play a clear role in determining what counts as anonymization.

Significantly, under Bill C-27, information that is ‘anonymized’ would be out of scope of the statute. This is made clear in a new s. 6(5) which provides that “this Act does not apply in respect of personal information that has been anonymized”. The argument to support this is that placing data that are truly anonymized out of scope of the legislation creates an incentive for industry to anonymize data, and anonymization (if irreversible and permanent) is highly privacy protective. Of course, similar incentives can be present if more tailored exceptions are created for anonymized data without it falling ‘out of scope’ of the law.

Emerging and evolving concepts of collective privacy take the view that there should be appropriate governance of the use of human-derived data, even if it has been anonymized. Another argument for keeping anonymized data in scope relates to the importance of oversight, given re-identification risks. Placing anonymized data outside the scope of data protection law is contrary to the recent recommendations of the ETHI Standing Committee of the House of Commons following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada. ETHI recommended that the federal laws be amended “to render these laws applicable to the collection, use, and disclosure of de-identified and aggregated data”. Aggregated data is generally considered to be data that has been anonymized. The trust issues referenced by ETHI when it comes to the use of de-identified data reinforce the growing importance of notions of collective privacy. It might therefore make sense to keep anonymized data within scope of the legislation (with appropriate exceptions to maintain incentives for anonymization) leaving room for governance of anonymization.

Bill C-27 also introduces a new definition of “de-identify”, which refers to modifying data so that individuals cannot be directly identified. Direct identification has come to mean identification through specific identifiers such as names, or assigned numbers. The new definition of ‘de-identify’ in C-27 suggests that simply removing direct identifiers will suffice to de-identify personal data (a form of what, in the GDPR, is referred to as pseudonymization). Thus, according to this definition, as long as direct identifiers are removed from a data set, an organization can use data without knowledge or consent in certain circumstances, even though specific individuals might still be identifiable from those data. While it will be argued that these circumstances are limited, the exception for sharing for ‘socially beneficial purposes’ is disturbingly broad given this weak definition (more to come on this in a future blog post). In addition, the government can add new exceptions to the list by regulation.

The reference in the definition of ‘de-identify’ only to direct identification is meant to be read alongside s. 74 of Bill C-27, which provides:

74 An organization that de-identifies personal information must ensure that any technical and administrative measures applied to the information are proportionate to the purpose for which the information is de-identified and the sensitivity of the personal information.

Section 74 remains unchanged from Bill C-11, where it made more sense, since it defined de-identification in terms of direct or indirect identifiers using a relative standard. In the context of the new definition of ‘de-identify’, it is jarring, since de-identification according to the new definition requires only the removal of direct identifiers. What this, perhaps, means is that although the definition of de-identify only requires removal of direct identifiers, actual de-identification might mean something else. This is not how definitions are supposed to work.

In adopting these new definitions, the federal government sought to align its terminology with that used in Quebec’s Loi 25 that reformed its public and private sector data protection laws. The Quebec law provides, in a new s. 23, that:

[. . .]

For the purposes of this Act, information concerning a natural person is anonymized if it is, at all times, reasonably foreseeable in the circumstances that it irreversibly no longer allows the person to be identified directly or indirectly.

Information anonymized under this Act must be anonymized according to generally accepted best practices and according to the criteria and terms determined by regulation.

Loi 25 also provides that data is de-identified (as opposed to anonymized) “if it no longer allows the person concerned to be directly identified”. At first glance, it seems that Bill C-27 has adopted similar definitions – but there are differences. First, the definition of anonymization in Loi 25 uses a relative standard (not an absolute one as in C-27). It also makes specific reference not just to generally accepted best practices, but to criteria and terms to be set out in regulation, whereas in setting standards for anonymization, C-27 refers only to “generally accepted best practices”. [Note that in its recommendations following its hearings into the use of de-identified private sector mobility data by the Public Health Agency of Canada, the ETHI Committee of Parliament recommended that federal data protection laws should include “a standard for de-identification of data or the ability for the Privacy Commissioner to certify a code of practice in this regard.”]

Second, and most importantly, in the Quebec law, anonymized data does not fall outside the scope of the legislation – instead, a relative standard is used to provide some flexibility while still protecting privacy. Anonymized data are still subject to governance under the law, even though the scope of that governance is limited. Further, under the Quebec law, recognizing that the definition of de-identification is closer to pseudonymization, the uses of de-identified data are more restricted than they are in Bill C-27.

Further, in an eye-glazing bit of drafting, s. 2(3) of Bill C-27 provides:

2(3) For the purposes of this Act, other than sections 20 and 21, subsections 22(1) and 39(1), sections 55 and 56, subsection 63(1) and sections 71, 72, 74, 75 and 116, personal information that has been de-identified is considered to be personal information.

This is a way of saying that de-identified personal information remains within the scope of the Act except where it does not. Yet, data that has only direct identifiers stripped from it should always be considered personal information, since the reidentification risk, as noted above, could be very high. What s. 2(3) does is allow de-identified data to be treated as anonymized (out of scope) in some circumstances.

For example, s. 21 allows organizations to use ‘de-identified’ personal information for internal research purposes without knowledge or consent. The reference in s. 2(3) amplifies this by providing that such information is not considered personal information. As a result, presumably, other provisions in Bill C-27 would not apply. This might include data breach notification requirements – yet if information is only pseudonymized and there is a breach, it is not clear why such provisions should not apply. Pseudonymization might provide some protection to those affected by a breach, although it is also possible that the key was part of the breach, or that individuals remain re-identifiable in the data. The regulator should have jurisdiction.

Subsection 22(1) allows for the use and even the disclosure of de-identified personal information between parties to a prospective business transaction. In this context, the de-identified information is not considered personal information (according to s. 2(3)) and so the only safeguards are those set out in s. 22(1) itself. Bizarrely, s. 22(1) makes reference to the sensitivity of the information – requiring safeguards appropriate to its sensitivity, even though it is apparently not considered personal information.

De-identified (not anonymized) personal information can also be shared without knowledge or consent for socially beneficial purposes under s. 39(1). (I have a blog post coming on this provision, so I will say no more about it here, other than to note that given the definition of ‘de-identify’, such sharing seems rash and the safeguards provided are inadequate.)

Section 55 provides for a right of erasure of personal information; since information stripped of direct identifiers is not personal information for the purposes of section 55 (according to s. 2(3)), this constitutes an important limitation on the right of erasure. If data are only pseudonymized, and if the organization retains the key, then why is there no right of erasure? Section 56 addresses the accuracy of personal information. Personal information de-identified according to the definition in C-27 would also be exempted from this requirement.
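Read together with s. 6(5), the carve-out in s. 2(3) amounts to a small decision rule. The sketch below is only my simplification of the quoted text. The three state labels and the function name are hypothetical, and the rule says nothing about how a court or the Commissioner would actually apply these provisions.

```python
# Provisions listed in s. 2(3) of the CPPA, as quoted above.
CARVE_OUTS = {
    "20", "21", "22(1)", "39(1)", "55", "56",
    "63(1)", "71", "72", "74", "75", "116",
}

# Hypothetical labels for the state of the information.
STATES = {"identifiable", "de-identified", "anonymized"}


def treated_as_personal_information(state, provision):
    """Is information in this state personal information for this provision?"""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    if state == "anonymized":
        # s. 6(5): the Act does not apply to anonymized information at all.
        return False
    if state == "de-identified":
        # s. 2(3): still personal information, except under the listed provisions.
        return provision not in CARVE_OUTS
    return True


# De-identified data and the right of erasure in s. 55: carved out.
print(treated_as_personal_information("de-identified", "55"))
# De-identified data under a provision that is not on the s. 2(3) list.
print(treated_as_personal_information("de-identified", "57"))
```

The rule turns on which provision is being applied, not on how easily individuals can be re-identified. That is the difficulty with defining ‘de-identify’ by reference to direct identifiers alone.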

In adopting the definitions of ‘anonymize’ and ‘de-identify’, the federal government meets a number of public policy objectives. It enhances the ability of organizations to make use of data. It also better aligns the federal law with Quebec’s law (at least at the definitional level). The definitions may also create scope for other privacy protective technologies such as pseudonymization (which is what the definition of de-identify in C-27 probably really refers to) or different types of encryption. But the approach it has adopted creates the potential for confusion, for risks to privacy, and for swathes of human-derived data to fall ‘outside the scope’ of data protection law. The government view may be that, once you stir all of Bill C-27’s provisions into the pot, and add a healthy dose of “trust us”, the definition of “de-identify” and its exceptions are not as problematic as they are at first glance. Yet, this seems like a peculiar way to draft legislation. The definition should say what it is supposed to say, rather than have its defects mitigated by a smattering of other provisions in the law and faith in the goodness of others. And the exceptions still lean towards facilitating data use rather than protecting privacy.

In a nutshell, C-27 has downgraded the definition of de-identification from C-11. It has completely excluded from the scope of the Act anonymized data, but has provided little or no guidance beyond “generally accepted best practices” to address anonymization. If an organization claims that its data are anonymized and therefore outside of the scope of the legislation, it will be an uphill battle to get past the threshold issue of anonymization in order to have a complaint considered under what would be the new law. The organization can simply dig in and challenge the jurisdiction of the Commissioner to investigate the complaint.

All personal data, whether anonymized or ‘de-identified’, should remain within the scope of the legislation. Specific exceptions can be provided where necessary. Exceptions in the legislation for the uses of de-identified information without knowledge or consent must be carefully constrained and reinforced with safeguards. Further, the regulator should play a role in establishing standards for anonymization and de-identification. This may involve consultation and collaboration with standards-setting bodies, but references in the legislation must be to more than just “generally accepted best practices”.

Published in Privacy
