Teresa Scassa - Blog

A Global News story about Statistics Canada’s collection of detailed financial data of a half million Canadians has understandably raised concerns about privacy and data security. It also raises interesting questions about how governments can or should meet their obligations to produce quality national statistics in an age of big data.

According to Andrew Russell’s follow-up story, Stats Canada plans to collect detailed customer information from Canada’s nine largest banks. The information sought includes financial information including account balances, transaction data, credit card and bill payments. It is unclear whether the collection has started.

As a national statistical agency, Statistics Canada is charged with the task of collecting and producing data that “ensures Canadians have the key information on Canada's economy, society and environment that they require to function effectively as citizens and decision makers.” Canadians are perhaps most familiar with providing census data to Statistics Canada, including more detailed data through the long form census. However, the agency’s data collection is not limited to the census.

Statistics Canada’s role is important, and the agency has considerable expertise in carrying out its mission and in protecting privacy in the data it collects. This is not to say, however, that Statistics Canada never makes mistakes and never experiences privacy breaches. One of the concerns, therefore, with this large-scale collection of frankly sensitive data is the increased risk of privacy breaches.

The controversial collection of detailed financial data finds its legislative basis in this provision of the Statistics Act:

13 A person having the custody or charge of any documents or records that are maintained in any department or in any municipal office, corporation, business or organization, from which information sought in respect of the objects of this Act can be obtained or that would aid in the completion or correction of that information, shall grant access thereto for those purposes to a person authorized by the Chief Statistician to obtain that information or aid in the completion or correction of that information. [My emphasis]

Essentially, it conveys enormous power on Stats Canada to request “documents or records” from third parties. Non-compliance with a request is an offence under s. 32 of the Act, which carries a penalty on conviction of a fine of up to $1000. A 2017 amendment to the legislation removed the possibility of imprisonment for this offence.

In case you were wondering whether Canada’s private sector data protection legislation offers any protection when it comes to companies sharing customer data with Statistics Canada, rest assured that it does not. Paragraph 7(3)(c.1) of the Personal Information Protection and Electronic Documents Act provides that an organization may disclose personal information without the knowledge or consent of an individual where the disclosure is:

(c.1) made to a government institution or part of a government institution that has made a request for the information, identified its lawful authority to obtain the information and indicated that

[. . .]

(iii) the disclosure is requested for the purpose of administering any law of Canada or a province

According to the Global News story, Statistics Canada notified the Office of the Privacy Commissioner about its data collection plan and obtained the Commissioner’s advice. In his recent Annual Report to Parliament the Commissioner reported on Statistic’s Canada’s growing practice of seeking private sector data:

We have consulted with Statistics Canada (StatCan) on a number of occasions over the past several years to discuss the privacy implications of its collection of administrative data – such as individuals’ mobile phone records, credit bureau reports, electricity bills, and so on. We spoke with the agency about this again in the past year, after a number of companies contacted us with concerns about StatCan requests for customer data.

The Commissioner suggested that Stats Canada might consider the collection of only data that has been deidentified at source rather than detailed personal information. He also recommended an ongoing assessment of the necessity and effectiveness of such programs.

The Commissioner also indicated that one of the problems with the controversial data collection by Statistics Canada is its lack of openness. He stated: “many Canadians might be surprised to learn the government is collecting their information in this way and for this purpose.” While part of this lack of transparency lies in the decision not to be more upfront about the data collection, part of it lies in the fact that the legislation itself – while capable of being read to permit this type of collection – clearly does not expressly contemplate it. Section 13 was drafted in a pre-digital, pre-big data era. It speaks of “documents or records”, and not “data”. While it is possible to interpret it so as to include massive quantities of data, the original drafters no doubt contemplated a collection activity on a much more modest scale. If Section 13 really does include the power to ask any organization to share its data with Stats Canada, then it has become potentially limitless in scope. At the time it was drafted, the limits were inherent in the analogue environment. There was only so much paper Stats Canada could ask for, and only so much paper it had the staff to process. In addition, there was only so much data that entities and organizations collected because they experienced the same limitations. The digital era means not only that there is a vast and increasing amount of detailed data collected by private sector organizations, but that this data can be transferred in large volumes with relative ease, and can be processed and analyzed with equal facility.

Statistics Canada is not the only national statistics organization to be using big data to supplement and enhance its data collection and generation. In some countries where statistical agencies struggle with a lack of human resources and funding, big data from the private sector offer opportunities to meet the data needs of their governments and economies. Statistical agencies everywhere recognize the potential of big data to produce more detailed, fine-grained and reliable data about many aspects of the economy. For example, the United Nations maintains a big data project inventory that catalogues experiments by national statistical agencies around the world with big data analytics. Remember the cancellation of the long form census by the Harper government? This was not a measure to protect Canadians’ privacy by collecting less information; it was motivated by a belief that better and more detailed data could be sought using other means – including reliance on private sector data.

It may well be that Statistics Canada needs the power to collect digital data to assist in data collection programs that serve national interests. However, the legislation that authorizes such collection must be up-to-date with our digital realities. Transparency requires an amendment to the legislation that would specifically enable the collection and use of digital and big data from the private sector tor statistical purposes. Debate over the scope and wording of such a provision would give both the public and the potential third party data sources an opportunity to identify their concerns. It would also permit the shaping of limits and conditions that are specific to the nature and risks of this form of data collection.

Published in Privacy

The Supreme Court of Canada has issued its unanimous decision in The Queen v. Philip Morris International Inc. This appeal arose out of an ongoing lawsuit brought by the province of British Columbia against tobacco companies to recover the health care costs associated with tobacco-related illnesses in the province. Similar suits brought by other provincial governments are at different stages across the country. In most cases, the litigation is brought under provincial legislation passed specifically to enable and to structure this recourse.

The central issue in this case concerned the degree of access to be provided to Philip Morris International (PMI)to the databases relied upon by the province to calculate tobacco-related health care costs. PMI wanted access to the databases in order to develop its own experts’ opinions on the nature and extent of these costs, and to challenge the opinions to be provided by provincial experts who would have full access to the databases. Although the databases contained aggregate, de-identified data, the government refused access, citing the privacy interests of British Columbians in their health care data. As a compromise, they offered limited and supervised access to the databases at Statistics Canada Data Centre. Although the other tobacco company defendants accepted this compromise, PMI did not, and sought a court order granting it full access. The court at first instance and later the Court of Appeal for British Columbia sided with PMI and ordered that access be provided. The SCC overturned this order.

This case had been watched with interest by many because of the broader issues onto which it might have shed some light. On one view, the case raised issues about how to achieve fairness in litigation where one party relies on its own vast stores of data – which might include confidential commercial data – and the other party seeks to test the validity or appropriateness of analytics based on this data. What level of access, if any, should be granted, and under what conditions? Another issue of broader interest was, where potentially re-identifiable personal information is sought, what measures are appropriate to protect privacy, including the deemed undertaking rule. Others were interested in knowing what parameters the court might set for assessing the re-identification risk where anonymized data are disclosed.

Those who hoped for broader take-aways for big data, data analytics and privacy, are bound to be disappointed in the decision. In deciding in favour of the BC government, the Supreme Court largely confined its decision to an interpretation of the specific language of the Tobacco Damages and Health Care Costs Recovery Act. The statute offered the government two ways to proceed against tobacco companies – it could seek damages related to the healthcare costs of specific individuals, in which case the health records of those individuals would be subject to discovery, or it could proceed in a manner that considered only aggregate health care data. The BC government chose the latter route. Section 2(5) set out the rules regarding discovery in an aggregate action. The focus of the Supreme Court’s interpretation was s. 2(5)(b) of the Act which reads:

2(5)(b) the health care records and documents of particular individual insured persons or the documents relating to the provision of health care benefits for particular individual insured persons are not compellable except as provided under a rule of law, practice or procedure that requires the production of documents relied on by an expert witness [My emphasis]

While it was generally accepted that this meant that the tobacco companies could not have access to individual health care records, PMI argued that the aggregate data was not a document “relating to the provision of health care benefits for particular individual insured persons”, and therefore its production could be compelled.

The Supreme Court disagreed. Writing for the unanimous court, Justice Brown defined both “records” and “documents” as “means of storing information” (at para 22). He therefore found that the relevant databases “are both “records” and “documents” within the meaning of the Act.” (at para 22) He stated:

Each database is a collection of health care information derived from original records or documents which relate to particular individual insured persons. That information is stored in the databases by being sorted into rows (each of which pertains to a particular individual) and columns (each of which contains information about the field or characteristic that is being recorded, such as the type of medical service provided). (at para 22)

He also observed that many of the fields in the database were filled with data from individual patient records, making the databases “at least in part, collections of health care information taken from individuals’ clinical records and stored in an aggregate form alongside the same information drawn from the records of others.” (at para 23) As a result, the majority found that the databases qualified under the legislation as “documents relating to the provision of health care benefits for particular individual insured persons”, whether or not those individuals were identified within the database.

Perhaps the most interesting passage in the Court’s decision is the following:

The mere alteration of the method by which that health care information is stored — that is, by compiling it from individual clinical records into aggregate databases — does not change the nature of the information itself. Even in an aggregate form, the databases, to the extent that they contain information drawn from individuals’ clinical records, remain “health care records and documents of particular individual insured persons”. (at para 24)

A reader eager to draw lessons for use in other contexts might be see the Court to be saying that aggregate data derived from personal data are still personal data. This would certainly be important in the context of current debates about whether the deidentification of personal information removes it from the scope of private sector data protection laws such as the Personal Information Protection and Electronic Documents Act. But it would be a mistake to read that much into this decision. The latter part of the quoted passage grounds the Court’s conclusion on this point firmly in the language of the BC tobacco legislation. Later the Court specifically rejects the idea that a “particular” individual under the BC statute is the same as an “identifiable individual”.

Because the case is decided on the basis of the interpretation of s. 2(5)(b), the Court neatly avoids a discussion of what degree of reidentification risk would turn aggregate or anonymized data into information about identifiable individuals. This topic is also of great interest in the big data context, particularly in relation to data protection law. And, although it might have been interesting to know whether any degree of reidentification risk could be sufficiently mitigated by the deemed undertaking rule so as to permit discovery remains unexplored territory, those looking for a discussion of the relationship between re-identification risk and the deemed undertaking rule will also have to wait for a different case.

Published in Privacy

On June 13, 2018 the Supreme Court of Canada handed down a decision that may have implications for how issues of bias in algorithmic decision-making in Canada will be dealt with. Ewert v. Canada is the result of an eighteen-year struggle by Mr. Ewert, a federal inmate and Métis man, to challenge the use of certain actuarial risk-assessment tools to make decisions about his carceral needs and about his risk of recidivism. His concerns, raised in his initial grievance in 2000, have been that these tools were “developed and tested on predominantly non-Indigenous populations and that there was no research confirming that they were valid when applied to Indigenous persons.” (at para 12) After his grievances went nowhere, he eventually sought a declaration in Federal Court that the tests breached his rights to equality and to due process under the Canadian Charter of Rights and Freedoms, and that they were also a breach of the Corrections and Conditional Release Act (CCRA), which requires the Correctional Service of Canada (CSC) to “take all reasonable steps to ensure that any information about an offender that it uses is as accurate, up to data and complete as possible.” (s. 24(1)). Although the Charter arguments were unsuccessful, the majority of the Supreme Court of Canada agreed with the trial judge that CSC had breached its obligations under the CCRA. Two justices in dissent agreed with the Federal Court of Appeal that neither the Charter nor the CCRA had been breached.

Although this is not explicitly a decision about ‘algorithmic decision-making’ as the term is used in the big data and artificial intelligence (AI) contexts, the basic elements are present. An assessment tool developed and tested using a significant volume of data is used to generate predictive data to aid in decision-making in individual cases. The case also highlights a common concern in the algorithmic decision-making context: that either the data used to develop and train the algorithm, or the assumptions coded into the algorithm, create biases that can lead to inaccurate predictions about individuals who fall outside the dominant group that has influenced the data and the assumptions.

As such, my analysis is not about the particular circumstances of Mr. Ewert, nor is it about the impact of the judgement within the correctional system in Canada. Instead, I parse the decision to see what it reveals about how courts might approach issues of bias in algorithmic decision-making, and what impact the decision may have in this emerging context.

1. ‘Information’ and ‘accuracy’

A central feature of the decision of the majority in Ewert is its interpretation of s. 24(1) of the CCRA. To repeat the wording of this section, it provides that “The Service shall take all reasonable steps to ensure that any information about an offender that it uses is as accurate, up to date and complete as possible.” [My emphasis] In order to conclude that this provision was breached, it was necessary for the majority to find that Mr. Ewert’s test results were “information” within the meaning of this section, and that the CCRA had not taken all reasonable steps to ensure its accuracy.

The dissenting justices took the view that when s. 24(1) referred to “information” and to the requirement to ensure its accuracy, the statute included only the kind of personal information collected from inmates, information about the offence committed, and a range of other information specified in s. 23 of the Act. The dissenting justices preferred the view of the CSC that “information” meant ““primary facts” and not “inferences or assessments drawn by the Service”” (at para 107). The majority disagreed. It found that when Parliament intended to refer to specific information in the CCRA it did so. When it used the term “information” in an unqualified way, as it did in s. 24(1), it had a much broader meaning. Thus, according to the majority, “the knowledge the CSC might derive from the impugned tools – for example, that an offender has a personality disorder or that there is a high risk than an offender will violently reoffend – is “information” about that offender” (at para 33). This interpretation of “information” is an important one. According to the majority, profiles and predictions applied to a person are “information” about that individual.

In this case, the Crown had argued that s. 24(1) should not apply to the predictive results of the assessment tools because it imposed an obligation to ensure that “information” is “as accurate” as possible. It argued that the term “accurate” was not appropriate to the predictive data generated by the tools. Rather, the tools “may have “different levels of predictive validity, in the sense that they predict poorly, moderately well or strongly””. (at para 43) The dissenting justices were clearly influenced by this argument, finding that: “a psychological test can be more or less valid or reliable, but it cannot properly be described as being “accurate” or “inaccurate”.” (at para 115) According to the dissent, all that was required was that accurate records of an inmate’s test scores must be maintained – not that the tests themselves must be accurate. The majority disagreed. In its view, the concept of accuracy could be adapted to different types of information. When applied to psychological assessment tools, “the CSC must take steps to ensure that it relies on test scores that predict risks strongly rather than those that do so poorly.” (at para 43)

It is worth noting that the Crown also argued that the assessment tools were important in decision-making because “the information derived from them is objective and thus mitigates against bias in subjective clinical assessments” (at para 41). While the underlying point is that the tools might produce more objective assessments than individual psychologists who might bring their own biases to an assessment process, the use of the term “objective” to describe the output is troubling. If the tools incorporate biases, or are not appropriately sensitive to cultural differences, then the output is ‘objective’ in only a very narrow sense of the word, and the use of the word masks underlying issues of bias. Interestingly, the majority took the view that if the tools are considered useful “because the information derived from them can be scientifically validated. . . this is all the more reason to conclude that s. 24(1) imposes an obligation on the CSC to take reasonable steps to ensure that the information is accurate.” (at para 41)

It should be noted that while this discussion all revolves around the particular wording of the CCRA, Principle 4.6 of Schedule I of the Personal Information Protection and Electronic Documents Act (PIPEDA) contains the obligation that: “Personal information shall be as accurate, complete, and up-to-date as is necessary for the purposes for which it is to be used.” Further, s. 6(2) of the Privacy Act provides that: “A government institution shall take all reasonable steps to ensure that personal information that is used for an administrative purpose by the institution is as accurate, up-to-date and complete as possible.A similar interpretation of “information” and “accuracy” in these statutes could be very helpful in addressing issues of bias in algorithmic decision-making more broadly.

2. Reasonable steps to ensure accuracy

According to the majority, “[t]he question is not whether the CSC relied on inaccurate information, but whether it took all reasonable steps to ensure that it did not.” (at para 47). This distinction is important – it means that Mr. Ewert did not have to show that his actual test scores were inaccurate, something that would be quite burdensome for him to do. According to the majority, “[s]howing that the CSC failed to take all reasonable steps in this respect may, as a practical matter, require showing that there was some reason for the CSC to doubt the accuracy of information in its possession about an offender.” (at para 47, my emphasis) The majority noted that the trial judge had found that “the CSC had long been aware of concerns regarding the possibility of psychological and actuarial tools exhibiting cultural bias.” (at para 49) The concerns had led to research being carried out in other jurisdictions about the validity of the tools when used to assess certain other cultural minority groups. The majority also noted that the CSC had carried out research “into the validity of certain actuarial tools other than the impugned tools when applied to Indigenous offenders” (at para 49) and that this research had led to those tools no longer being used. However, in this case, in spite of concerns, the CSC had taken no steps to assess the validity of the tools, and it continued to apply them to Indigenous offenders. The majority noted that the CCRA, which set out guiding principles in s. 4, specifically required correctional policies and practices to respect cultural, linguistic and other differences and to take into account “the special needs of women, aboriginal peoples, persons requiring mental health care and other groups” (s. 4(g)) The majority found that this principle “represents an acknowledgement of the systemic discrimination faced by Indigenous persons in the Canadian correctional system.” (at para 53) As a result, it found it incumbent on CSC to give “meaningful effect” to this principle “in performing all of its functions”. In particular, the majority found that “this provision requires the CSC to ensure that its practices, however neutral they may appear to be, do not discriminate against Indigenous persons.”(at para 54) The majority observed that although it has been 25 years since this principle was added to the legislation, “there is nothing to suggest that the situation has improved in the realm of corrections” (at para 60). It expressed dismay that “the gap between Indigenous and non-Indigenous offenders has continued to widen on nearly every indicator of correctional performance”. (at para 60) It noted that “Although many factors contributing to the broader issue of Indigenous over-incarceration and alienation from the criminal justice system are beyond the CSC’s control, there are many matters within its control that could mitigate these pressing societal problems. . . Taking reasonable steps to ensure that the CSC uses assessment tools that are free of cultural bias would be one.”(at para 61) [my emphasis]

According to the majority of the Court, therefore, what is required by s. 24(1) of the CCRA is for the CSC to carry out research into whether and to what extent the assessment tools it uses “are subject to cross-cultural variance when applied to Indigenous offenders.” (at para 67) Any further action would depend on the results of the research.

What is interesting here is that the onus is placed on the CSC (influenced by the guiding principles in the CCRA) to take positive steps to verify the validity of the assessment tools on which it relies. The Court does not specify who is meant to carry out the research in question, what standards it should meet, or how extensive it should be. These are important issues. It should be noted that discussions of algorithmic bias often consider solutions involving independent third-party assessment of the algorithms or the data used to develop them.

3. The Charter arguments

Two Charter arguments were raised by counsel for Mr. Ewert. The first was a s. 7 due process argument. Counsel for Mr. Ewert argued that reliance on the assessment tools violated his right to liberty and security of the person in a manner that was not in accordance with the principles of fundamental justice. The tools were argued to fall short of the principles of fundamental justice because of their arbitrariness (lacking any rational connection to the government objective) and overbreadth. The court was unanimous in finding that reliance on the tools was not arbitrary, stating that “The finding that there is uncertainty about the extent to which the tests are accurate when applied to Indigenous offenders is not sufficient to establish that there is no rational connection between reliance on the tests and the relevant government objective.” (at para 73) Without further research, the extent and impact of any cultural bias could not be known.

Mr. Ewert also argued that the results of the use of the tools infringed his right to equality under s. 15 of the Charter. The Court gave little time or attention to this argument, finding that there was not enough evidence to show that the tools had a disproportionate impact on Indigenous inmates when compared to non-Indigenous inmates.

The Charter is part of the Constitution and applies only to government action. There are many instances in which governments may come to rely upon algorithmic decision-making. While concerns might be raised about bias and discriminatory impacts from these processes, this case demonstrates the challenge faced by those who would raise such arguments. The decision in Ewert suggests that in order to establish discrimination, it will be necessary either to demonstrate discriminatory impacts or effects, or to show how the algorithm itself and/or the data used to develop it incorporate biases or discriminatory assumptions. Establishing any of these things will impose a significant evidentiary burden on the party raising the issue of discrimination. Even where the Charter does not apply and individuals must rely upon human rights legislation, establishing discrimination with complex (and likely inaccessible or non-transparent algorithms and data) will be highly burdensome.

Concluding thoughts

This case raises important and interesting issues that are relevant in algorithmic decision-making of all kinds. The result obtained in this case favoured Mr. Ewert, but it should be noted that it took him 18 years to achieve this result, and he required the assistance of a dedicated team of lawyers. There is clearly much work to do to ensure that fairness and transparency in algorithmic decision-making is accessible and realizable.

Mr. Ewert’s success was ultimately based, not upon human rights legislation or the Charter, but upon federal legislation which required the keeping of accurate information. As noted above, PIPEDA and the Privacy Act impose a similar requirement on organizations that collect, use or disclose personal information to ensure the accuracy of that information. Using the interpretive approach of the Supreme Court of Canada in Ewert v. Canada, this statutory language may provide a basis for supporting a broader right to fair and unbiased algorithmic decision-making. Yet, as this case also demonstrates, it may be challenging for those who feel they are adversely impacted to make their case, absent evidence of long-standing and widespread concerns about particular tests in specific contexts.

 

Published in Privacy

The recent scandal regarding the harvesting and use of the personal information of millions of Facebook users in order to direct content towards them aimed at influence their voting behavior raises some interesting questions about the robustness of our data protection frameworks. In this case, a UK-based professor collected personal information via an app, ostensibly for non-commercial research purposes. In doing so he was bound by terms of service with Facebook. The data collection was in the form of an online quiz. Participants were paid to answer a series of questions, and in this sense they consented to and were compensated for the collection of this personal information. However, their consent was to the use of this information only for non-commercial academic research. In addition, the app was able to harvest personal information from the Facebook friends of the study participants – something which took place without the knowledge or consent of those individuals. The professor later sold his app and his data to Cambridge Analytica, which used it to target individuals with propaganda aimed at influencing their vote in the 2016 US Presidential Election.

A first issue raised by this case is a tip-of-the-iceberg issue. Social media platforms – not just Facebook – collect significant amounts of very rich data about users. They have a number of strategies for commercializing these treasure troves of data, including providing access to the platform to app developers or providing APIs on a commercial basis that give access to streams of user data. Users typically consent to some secondary uses of their personal information under the platform’s terms of service (TOS). Social media platform companies also have TOS that set the terms and conditions under which developers or API users can obtain access to the platform and/or its data. What the Cambridge Analytica case reveals is what may (or may not) happen when a developer breaches these TOS.

Because developer TOS are a contract between the platform and the developer, a major problem is the lack of transparency and the grey areas around enforcement. I have written about this elsewhere in the context of another ugly case involving social media platform data – the Geofeedia scandal (see my short blog post here, full article here). In that case, a company under contract with Twitter and other platforms misused the data it contracted for by transforming it into data analytics for police services that allowed police to target protesters against police killings of African American men. This was a breach of contractual terms between Twitter and the developer. It came to public awareness only because of the work of a third party (in that case, the ACLU of California). In the case of Cambridge Analytica, the story also only came to light because of a whistleblower (albeit one who had been involved with the company’s activities). In either instance it is important to ask whether, absent third party disclosure, the situation would ever have come to light. Given that social media companies provide, on a commercial basis, access to vast amounts of personal information, it is important to ask what, if any, proactive measures they take to ensure that developers comply with their TOS. Does enforcement only take place when there is a public relations disaster? If so, what other unauthorized exploitations of personal information are occurring without our knowledge or awareness? And should platform companies that are sources of huge amounts of personal information be held to a higher standard of responsibility when it comes to their commercial dealing with this personal information?

Different countries have different data protection laws, so in this instance I will focus on Canadian law, to the extent that it applies. Indeed, the federal Privacy Commissioner has announced that he is looking into Facebook’s conduct in this case. Under the Personal Information Protection and Electronic Documents Act (PIPEDA), a company is responsible for the personal information it collects. If it shares those data with another company, it is responsible for ensuring proper limitations and safeguards are in place so that any use or disclosure is consistent with the originating company’s privacy policy. This is known as the accountability principle. Clearly, in this case, if the data of Canadians was involved, Facebook would have some responsibility under PIPEDA. What is less clear is how far this responsibility extents. Clause 4.1.3 of Schedule I to PIPEDA reads: “An organization is responsible for personal information in its possession or custody, including information that has been transferred to a third party for processing. The organization shall use contractual or other means to provide a comparable level of protection while the information is being processed by a third party.” [My emphasis]. One question, therefore, is whether it is enough for Facebook to simply have in place a contract that requires its developers to respect privacy laws, or whether Facebook’s responsibility goes further. Note that in this case Facebook appears to have directed Cambridge Analytica to destroy all improperly collected data. And it appears to have cut Cambridge Analytica off from further access to its data. Do these steps satisfy Facebook’s obligations under PIPEDA? It is not at all clear that PIPEDA places any responsibilities on organizations to actively supervise or monitor companies with which it has shared data under contract. It is fair to ask, therefore, whether in cases where social media platforms share huge volumes of personal data with developers, is the data-sharing framework in PIPEDA sufficient to protect the privacy interests of the public.

Another interesting question arising from the scandal is whether what took place amounts to a data breach. Facebook has claimed that it was not a data breach – from their perspective, this is a case of a developer that broke its contract with Facebook. It is easy to see why Facebook would want to characterize the incident in this way. Data breaches can bring down a whole other level of enforcement, and can also give rise to liability in class action law suits for failure to properly protect the information. In Canada, new data breach notification provisions (which have still not come into effect under PIPEDA) would impose notification requirements on an organization that experienced a breach. It is interesting to note, though, that he data breach notification requirements are triggered where there is a “real risk of significant harm to an individual” [my emphasis]. Given what has taken place in the Cambridge Analytical scandal, it is worth asking whether the drafters of this provision should have included a real risk of significant harm to the broader public. In this case, the personal information was used to subvert democratic processes, something that is a public rather than an individual harm.

The point about public harm is an important one. In both the Geofeedia and the Cambridge Analytica scandals, the exploitation of personal information was on such a scale and for such purposes that although individual privacy may have been compromised, the greater harms were to the public good. Our data protection model is based upon consent, and places the individual and his or her choices at its core. Increasingly, however, protecting privacy serves goals that go well beyond the interests of any one individual. Not only is the consent model broken in an era of ubiquitous and continuous collection of data, it is inadequate to address the harms that come from improper exploitation of personal information in our big data environment.

Published in Privacy

In October 2016, the data analytics company Geofeedia made headlines when the California chapter of the American Civil Liberties Union (ACLU) issued the results of a major study which sought to determine the extent to which police services in California were using social media data analytics. These analytics were based upon geo-referenced information posted by ordinary individuals to social media websites such as Twitter and Facebook. Information of this kind is treated as “public” in the United States because it is freely contributed by users to a public forum. Nevertheless, the use of social media data analytics by police raises important civil liberties and privacy questions. In some cases, users may not be aware that their tweets or posts contain additional meta data including geolocation information. In all cases, the power of data analytics permits rapid cross-referencing of data from multiple sources, permitting the construction of profiles that go well beyond the information contributed in single posts.

The extent to which social media data analytics are used by police services is difficult to assess because there is often inadequate transparency both about the actual use of such services and the purposes for which they are used. Through a laborious process of filing freedom of information requests the ACLU sought to find out which police services were contracting for social media data analytics. The results of their study showed widespread use. What they found in the case of Geofeedia went further. Although Geofeedia was not the only data analytics company to mine social media data and to market its services to government authorities, its representatives had engaged in email exchanges with police about their services. In these emails, company employees used two recent sets of protests against police as examples of the usefulness of social media data analytics. These protests were those that followed the death in police custody of Freddie Gray, a young African-American man who had been arrested in Baltimore, and the shooting death by police of Michael Brown, an eighteen-year-old African-American man in Ferguson, Missouri. By explicitly offering services that could be used to monitor those who protested police violence against African Americans, the Geofeedia emails aggravated a climate of mistrust and division, and confirmed a belief held by many that authorities were using surveillance and profiling to target racialized communities.

In a new paper, just published in the online, open-access journal SCRIPTed, I use the story around the discovery of Geofeedia’s activities and the backlash that followed to frame a broader discussion of police use of social media data analytics. Although this paper began as an exploration of the privacy issues raised by the state’s use of social media data analytics, it shifted into a paper about transparency. Clearly, privacy issues – as well as other civil liberties questions – remain of fundamental importance. Yet, the reality is that without adequate transparency there simply is no easy way to determine whether police are relying on social media data analytics, on what scale and for what purposes. This lack of transparency makes it difficult to hold anyone to account. The ACLU’s work to document the problem in California was painstaking and time consuming, as was a similar effort by the Brennan Center for Justice, also discussed in this paper. And, while the Geofeedia case provided an important example of the real problems that underlie such practices, it only came to light because Geofeedia’s employees made certain representations by email instead of in person or over the phone. A company need only direct that email not be used for these kinds of communications for the content of these communications to disappear from public view.

My paper examines the use of social media data analytics by police services, and then considers a range of different transparency issues. I explore some of the challenges to transparency that may flow from the way in which social media data analytics are described or characterized by police services. I then consider transparency from several different perspectives. In the first place I look at transparency in terms of developing explicit policies regarding social media data analytics. These policies are not just for police, but also for social media platforms and the developers that use their data. I then consider transparency as a form of oversight. I look at the ways in which greater transparency can cast light on the activities of the providers and users of social media data and data analytics. Finally, I consider the need for greater transparency around the monitoring of compliance with policies (those governing police or developers) and the enforcement of these policies.

A full text of my paper is available here under a CC Licence.

Published in Privacy

A recent news story from the Ottawa area raises interesting questions about big data, smart cities, and citizen engagement. The CBC reported that Ottawa and Gatineau have contracted with Strava, a private sector company to purchase data on cycling activity in their municipal boundaries. Strava makes a fitness app that can be downloaded for free onto a smart phone or other GPS-enabled device. The app uses the device’s GPS capabilities to gather data about the users’ routes travelled. Users then upload their data to Strava to view the data about their activities. Interested municipalities can contract with Strava Metro for aggregate de-identified data regarding users’ cycling patterns over a period of time (Ottawa and Gatineau have apparently contracted for 2 years’ worth of data). According to the news story, their goal is to use this data in planning for more bike-friendly cities.

On the face of it, this sounds like an interesting idea with a good objective in mind. And arguably, while the cities might create their own cycling apps to gather similar data, it might be cheaper in the end for them to contract for the Strava data rather than to design and then promote the use of theirs own apps. But before cities jump on board with such projects, there are a number of issues that need to be taken into account.

One of the most important issues, of course, is the quality of the data that will be provided to the city, and its suitability for planning purposes. The data sold to the city will only be gathered from those cyclists who carry GPS-enabled devices, and who use the Strava app. This raises the question of whether some cyclists – those, for example, who use bikes to get around to work, school or to run errands and who aren’t interested in fitness apps – will not be included in planning exercises aimed at determining where to add bike paths or bike lanes. Is the data most likely to come from spandex-wearing, affluent, hard core recreational cyclists than from other members of the cycling community? The cycling advocacy group Citizens for Safe Cycling in Ottawa is encouraging the public to use the app to help the data-gathering exercise. Interestingly, this group acknowledges that the typical Strava user is not necessarily representative of the average Ottawa cyclist. This is in part why they are encouraging a broader public use of the app. They express the view that some data is better than no data. Nevertheless, it is fair to ask whether this is an appropriate data set to use in urban planning. What other data will be needed to correct for its incompleteness, and are there plans in place to gather this data? What will the city really know about who is using the app and who is not? The purchased data will be deidentified and aggregated. Will the city have any idea of the demographic it represents? Still on the issue of data quality, it should be noted that some Strava users make use of the apps’ features to ride routes that create amusing map pictures (just Google “strava funny routes” to see some examples). How much of the city’s data will reflect this playful spirit rather than actual data about real riding routes is a question also worth asking.

Some ethical issues arise when planning data is gathered in this way. Obviously, the more people in Ottawa and Gatineau who use this app, the more data there will be. Does this mean that the cities have implicitly endorsed the use of one fitness app over another? Users of these apps necessarily enable tracking of their daily activities – should the city be encouraging this? While it is true that smart phones and apps of all variety are already harvesting tracking data for all sorts of known and unknown purposes, there may still be privacy implications for the user. Strava seems to have given good consideration to user privacy in its privacy policy, which is encouraging. Further, the only data sold to customers by Strava is deidentified and aggregated – this protects the privacy of app users in relation to Strava’s clients. Nevertheless, it would be interesting to know if the degree of user privacy protection provided was a factor for either city in choosing to use Strava’s services.

Another important issue – and this is a big one in the emerging smart cities context – relates to data ownership. Because the data is collected by Strava and then sold to the cities for use in their planning activities, it is not the cities’ own data. The CBC report makes it clear that the contract between Strava and its urban clients leaves ownership of the data in Strava’s hands. As a result, this data on cycling patterns in Ottawa cannot be made available as open data, nor can it be otherwise published or shared. It will also not be possible to obtain the data through an access to information request. This will surely reduce the transparency of planning decisions made in relation to cycling.

Smart cities and big data analytics are very hot right now, and we can expect to see all manner of public-private collaborations in the gathering and analysis of data about urban life. Much of this data may come from citizen-sensors as is the case with the Strava data. As citizens opt or are co-opted into providing the data that fuels analytics, there are many important legal, ethical and public policy questions which need to be asked.

Last week I wrote about a very early ‘finding’ under Canada’s Personal Information Protection and Electronic Documents Act which raises some issues about how the law might apply in the rapidly developing big data environment. This week I look at a more recent ‘finding’ – this time 5 years old – that should raise red flags regarding the extent to which Canada’s laws will protect individual privacy in the big data age.

In 2009, the Assistant Privacy Commissioner Elizabeth Denham (who is now the B.C. Privacy Commissioner) issued her findings as a result of an investigation into a complaint by the Canadian Internet Policy and Public Interest Clinic into the practices of a Canadian direct marketing company. The company combined information from different sources to create profiles of individuals linked to their home addresses. Customized mailing lists based on these profiles were then sold to clients looking for individuals falling within particular demographics for their products or services.

Consumer profiling is a big part of big data analytics, and today consumer profiles will draw upon vast stores of personal information collected from a broad range of online and offline sources. The data sources at issue in this case were much simpler, but the lessons that can be learned remain important.

The respondent organization used aggregate geodemographic data, which it obtained from Statistics Canada, and which was sorted according to census dissemination areas. This data was not specific to particular identifiable individuals – the aggregated data was not meant to reveal personal information, but it did give a sense of, for example, distribution of income by geographic area (in this case, by postal code). The company then took name and address information from telephone directories so as to match the demographic data with the name and location information derived from the directories. Based on the geo-demographic data, assumptions were made about income, marital status, likely home-ownership, and so on. The company also added its own assumptions about religion, ethnicity and gender based upon the telephone directory information – essentially drawing inferences based upon the subscribers’ names. These assumptions were made according to ‘proprietary models’. Other proprietary models were used to infer whether the individuals lived in single or multi-family dwellings. The result was a set of profiles of named individuals with inferences drawn about their income, ethnicity and gender. CIPPIC’s complaint was that the respondent company was collecting, using and disclosing the personal information of Canadians without their consent.

The findings of the Assistant Privacy Commissioner (APC) are troubling for a number of reasons. She began by characterizing the telephone directory information as “publicly available personal information”. Under PIPEDA, information that falls into this category, as defined by the regulations, can be collected, used and disclosed without consent, so long as the collection, use and disclosure are for the purposes for which it was made public. Telephone directories fall within the Regulations Specifying Publicly Available Information. However, the respondent organization did more than simply resell directory information.

Personal information is defined in PIPEDA as “information about an identifiable individual”. The APC characterized the aggregate geodemographic data as information about certain neighborhoods, and not information about identifiable individuals. She stated that “the fact that a person lives in a neighborhood with certain characteristics” was not personal information about that individual.

The final piece of information associated with the individuals in this case was the set of assumptions about, among other things, religion, ethnicity and gender. The APC characterized these as “assumptions”, rather than personal information – after all, the assumptions might not be correct.

Because the respondent’s clients provided the company with the demographic characteristics of the group it sought to reach, and because the respondent company merely furnished names and addresses in response to these requests, the APC concluded that the only personal information that was collected, used or disclosed was publicly available personal information for which consent was not required. (And, in case you are wondering, allowing people to contact individuals was one of the purposes for which telephone directory information is published – so the “use” by companies of sending out marketing information fell within the scope of the exception).

And thus, by considering each of the pieces of information used in the profile separately, the respondent’s creation of consumer profiles from diffuse information sources fell right through the cracks in Canada’s data protection legislation. This does not bode well for consumer privacy in an age of big data analytics.

The most troubling part of the approach taken by the APC is that which dismisses “assumptions” made about individuals as being merely assumptions and not personal information. Consumer profiling is about attributing characteristics to individuals based on an analysis of their personal information from a variety of sources. It is also about acting on those assumptions once the profile is created. The assumptions may be wrong, the data may be flawed, but the consumer will nonetheless have to bear the effects of that profile. These effects may be as minor as being sent advertising that may or may not match their activities or interests; but they could be as significant as decisions made about entitlements to certain products or services, about what price they should be offered for products or services, or about their desirability as a customer, tenant or employee. If the assumptions are not “actual” personal information, they certainly have the same effect, and should be treated as personal information. Indeed, the law accepts that personal information in the hands of an organization may be incorrect (hence the right to correct personal information), and it accepts that opinions about an individual constitute their personal information, even though the opinions may be unfair.

The treatment of the aggregate geodemographic information is also problematic. On its own, it is safe to say that aggregate geodemographic information is information about neighborhoods and not about individuals. But when someone looks up the names and addresses of the individuals living in an area and matches that information to the average age, income and other data associated with their postal codes, then they have converted that information into personal information. As with the ethnicity and gender assumptions, the age, income, and other assumptions may be close or they may be way off base. Either way, they become part of a profile of an individual that will be used to make decisions about that person. Leslie O’Keefe may not be Irish, he may not be a woman, and he may not make $100,000 a year – but if he is profiled in this way for marketing or other purposes, it is not clear why he should have no recourse under data protection laws.

Of course, the challenged faced by the APC in this case was how to manage the ‘balance’ set out in s. 3 of PIPEDA between the privacy interests of individuals and the commercial need to collect, use and disclose personal information. In this case, to find that consent – that cornerstone of data protection laws – was required for the use and disclosure of manufactured personal information, would be to hamstring an industry built on the sale of manufactured personal information. As the use – and the sophistication – of big data and big data analytics advances, organizations will continue to insist that they cannot function or compete without the use of massive stores of personal information. If this case is any indication, decision makers will be asked to continue to blur and shrink the edges of key concepts in the legislation, such as “consent” and “personal information”.

The PIPEDA complaint in this case dealt with relatively unsophisticated data used for relatively mundane purposes, and its importance may be too easily overlooked as a result. But how we define personal information and how we interpret data protection legislation will have enormous importance as to role of big data analytics in our lives continues to grow. Both this decision and the one discussed last week offer some insights into how Canada’s data protection laws might be interpreted or applied – and they raise red flags about the extent to which these laws are adequately suited to protecting privacy in the big data era.

Published in Privacy

A long past and largely forgotten ‘finding* from the Office of the Privacy Commissioner of Canada offers important insights into the challenges that big data and big data analytics will pose for the protection of Canadians’ privacy and consumer rights.

13 years ago, former Privacy Commissioner George Radwanski issued his findings on a complaint that had been brought against a bank. The complainant had alleged that the bank had wrongfully denied her access to her personal information. The requirement to provide access is found in the Personal Information Protection and Electronic Documents Act (PIPEDA). The right of access also comes with a right to demand the correction of any errors in the personal information in the hands of the organization. This right is fundamentally important, not just to privacy. Without access to the personal information being used to inform decision-making, consumers have very little recourse of any kind against adverse or flawed decision-making.

The complainant in this case had applied for and been issued a credit card by the bank. What she sought was access to the credit score that had been used to determine her entitlement to the card. The bank had relied upon two credit scores in reaching its decision. The first was the type produced by a credit reporting agency – in this case, Equifax. The second was an internal score generated by the bank using its own data and algorithm. The bank was prepared to release the former to the complainant, but refused to give her access to the latter. The essence of the complaint, therefore, was whether the bank had breached its obligations under PIPEDA to give her access to the personal information it held about her.

The Privacy Commissioner’s views on the interpretation and application of the statute in this case are worth revisiting 13 years later as big data analytics now fuel so much decision-making regarding consumers and their entitlement to or eligibility for a broad range of products and services. Credit reporting agencies are heavily regulated to ensure that decisions about credit-worthiness are made fairly and equitably, and to ensure that individuals have clear rights to access and to correct information in their files. For example, credit reporting legislation may limit the types of information and the data sources that may be used by credit reporting agencies in arriving at their credit scores. But big data analytics are now increasingly relied upon by all manner of organizations that are not regulated in the same way as credit-reporting agencies. These analytics are used to make decisions of similar importance to consumers – including decisions about credit-worthiness. There are few limits on the data that is used to fuel these analytics, nor is there much transparency in the process.

In this case, the bank justified its refusal to disclose its internal credit score on two main grounds. First, it argued that this information was not “personal information” within the meaning of PIPEDA because it was ‘created’ internally and not collected from the consumer or any other sources. The bank argued that this meant that it did not have to provide access, and that in any event, the right of access was linked to the right to request correction. The nature of the information – which was generated based upon a proprietary algorithm – was such that was not “facts” that could be open to correction.

The argument that generated information is not personal information is a dangerous one, as it could lead to a total failure of accountability under data protection laws. The Commissioner rejected this argument. In his view, it did not matter whether the information was generated or collected; nor did it matter whether it was subject to correction or not. The information was personal information because it related to the individual. He noted that “opinions” about an individual were still considered to be personal information, even though they are not subject to correction. This view of ‘opinions’ is consistent with subsequent findings and decisions under PIPEDA and comparable Canadian data protection laws. Thus, in the view of the Commissioner, the bank’s internally generated credit score was the complainant’s personal information and was subject to PIPEDA.

The bank’s second argument was more successful, and is problematic for consumers. The bank argued that releasing the credit score to the complainant would reveal confidential commercial information. Under s. 9(3)(b) of PIPEDA, an organization is not required to release personal information in such circumstances. The bank was not arguing so much that the complainant’s score itself was confidential commercial information; rather, what was confidential were the algorithms used to arrive at the score. The bank argued that these algorithms could be reverse-engineered from a relatively small sample of credit scores. Thus, a finding that such credit scores must be released to individuals would leave the bank open to the hypothetical situation where a rival might organize or pay 20 or so individuals to seek access to their internally generated credit scores in the hands of the bank, and that set of scores could then be used to arrive at the confidential algorithms. The Commissioner referred this issue to an expert on algorithms and concluded that “although an exact determination of a credit-scoring model was difficult and highly unlikely, access to customized credit scores would definitely make it easier to approximate a bank’s model.”

The Commissioner noted that under s. 9(3)(b) there has to be some level of certainty that the disclosure of personal information will reveal confidential commercial information before disclosure can be refused. In this case, the Commissioner indicated that he had “some difficulty believing that either competitors or rings of algorithmically expert fraud artists would go to the lengths involved.” He went on to say that “[t]he spectre of the banks falling under systematic assault from teams of loan-hungry mathematicians is simply not one I find particularly persuasive.” Notwithstanding this, he ruled in favour of the bank. He noted that other banks shared the same view as the respondent bank, and that competition in the banking industry was high. Since he had found it was technically possible to reverse-engineer the algorithm, he was of the view that he had to find that the release of the credit score would reveal confidential commercial information. He was satisfied with the evidence the bank supplied to demonstrate how closely guarded the credit-scoring algorithm was. He noted that in the UK and Australia, relatively new guidelines required organizations to provide only general information regarding why credit was denied.

The lack of transparency of algorithms used in the big data environment becomes increasingly problematic the more such algorithms are used. Big data analytics can be used to determine credit-worthiness – and such these determinations are made not just by banks but by all manner of companies that extend consumer credit through loans, don’t-pay-for-a-year deals, purchase-by-installment, store credit cards, and so on. They can also be used to determine who is entitled to special offers or promotions, for price discrimination (where some customers are offered better prices for the same products or services), and in a wide range of other contexts. Analytics may also be used by prospective employers, landlords or others whose decisions may have important impacts on people’s lives. Without algorithmic transparency, it might be impossible to know whether the assumptions, weightings or scoring factors are biased, influenced by sexism or racism (or other discriminatory considerations), or simply flawed.

There may be some comfort to be had that in this case the Commissioner was allowed to have access to the scoring model used. He stated that he found it innocuous – although it is not clear what kind of scrutiny he gave it. After all, his mandate extended only to decisions relating to the management of personal information, and did not extend to issues of discrimination. It is also worth noting that the Commissioner seems to suggest that each case must be decided on its own facts, and that what the complainant stood to gain and the respondent stood to lose were relevant considerations. In this case, the complainant had not been denied credit, so in the Commissioner’s view there was little benefit to her in the release of the information to be weighed against the potential harm to the bank. Nevertheless, the decision raises a red flag around transparency in the big data context.

In the next week or so I will be posting a ‘Back to the Future II’ account of another, not quite so old, PIPEDA finding that is also significant in the big data era. Disturbingly, this decision eats away at Commissioner Radwanski’s conclusion on the issue of “personal information” as it relates to generated or inferred information about individuals. Stay tuned!



* Because the Privacy Commissioner of Canada has no order-making powers, he can only issue “findings” in response to complaints filed with the office. The ‘findings’ are essentially opinions as to how the act applies in the circumstances of the complaint. If the complaint is considered well-founded, the Commissioner can also make recommendations as to how the organization should correct these practices. For binding orders or compensation the complainant must first go through the complaints process and then take the matter to the Federal Court. Few complainants do so. Thus, while findings are non-binding and set no precedent, they do provide some insight into how the Commissioner would interpret and apply the legislation.

 

Published in Privacy

Canadian Trademark Law

Published in 2015 by Lexis Nexis

Canadian Trademark Law 2d Edition

Buy on LexisNexis

Electronic Commerce and Internet Law in Canada, 2nd Edition

Published in 2012 by CCH Canadian Ltd.

Electronic Commerce and Internet Law in Canada

Buy on CCH Canadian

Intellectual Property for the 21st Century

Intellectual Property Law for the 21st Century:

Interdisciplinary Approaches

Purchase from Irwin Law