Teresa Scassa - Blog


A recent news story from the Ottawa area raises interesting questions about big data, smart cities, and citizen engagement. The CBC reported that Ottawa and Gatineau have contracted with Strava, a private sector company, to purchase data on cycling activity within their municipal boundaries. Strava makes a fitness app that can be downloaded for free onto a smart phone or other GPS-enabled device. The app uses the device’s GPS capabilities to gather data about the routes users travel. Users then upload their data to Strava to view information about their activities. Interested municipalities can contract with Strava Metro for aggregate, de-identified data regarding users’ cycling patterns over a period of time (Ottawa and Gatineau have apparently contracted for two years’ worth of data). According to the news story, their goal is to use this data in planning for more bike-friendly cities.
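To make the data flow concrete, here is a minimal sketch, in Python, of how individual ride traces might be reduced to the kind of aggregate, de-identified counts a city would receive. The schema, segment names and suppression threshold are all assumptions made for illustration; Strava Metro’s actual pipeline is not public.

```python
# A minimal sketch (not Strava's actual pipeline): reduce individual ride traces
# to per-segment, per-hour ride counts, suppressing small counts so that
# no small group of riders is identifiable in the output.
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical schema: each ride is a list of (road_segment_id, hour_of_day) points.
Ride = List[Tuple[str, int]]

def aggregate_rides(rides: List[Ride], min_count: int = 3) -> Dict[Tuple[str, int], int]:
    """Count distinct rides per (segment, hour) cell and drop cells below
    a suppression threshold."""
    counts: Counter = Counter()
    for ride in rides:
        for cell in set(ride):     # count each segment/hour at most once per ride
            counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    rides = [
        [("laurier_ave_w", 8), ("bank_st", 8)],
        [("laurier_ave_w", 8)],
        [("laurier_ave_w", 8), ("bank_st", 17)],
    ]
    print(aggregate_rides(rides))  # only cells seen in at least 3 rides survive
```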

On the face of it, this sounds like an interesting idea with a good objective in mind. And arguably, while the cities might create their own cycling apps to gather similar data, it might be cheaper in the end for them to contract for the Strava data rather than to design and then promote the use of their own apps. But before cities jump on board with such projects, there are a number of issues that need to be taken into account.

One of the most important issues, of course, is the quality of the data that will be provided to the city, and its suitability for planning purposes. The data sold to the city will only be gathered from those cyclists who carry GPS-enabled devices and who use the Strava app. This raises the question of whether some cyclists – those, for example, who use bikes to get to work or school or to run errands, and who aren’t interested in fitness apps – will be left out of planning exercises aimed at determining where to add bike paths or bike lanes. Is the data more likely to come from spandex-wearing, affluent, hard-core recreational cyclists than from other members of the cycling community? The cycling advocacy group Citizens for Safe Cycling in Ottawa is encouraging the public to use the app to help the data-gathering exercise. Interestingly, this group acknowledges that the typical Strava user is not necessarily representative of the average Ottawa cyclist. This is in part why they are encouraging broader public use of the app. They express the view that some data is better than no data. Nevertheless, it is fair to ask whether this is an appropriate data set to use in urban planning. What other data will be needed to correct for its incompleteness, and are there plans in place to gather this data? What will the city really know about who is using the app and who is not? The purchased data will be de-identified and aggregated. Will the city have any idea of the demographic it represents?

Still on the issue of data quality, it should be noted that some Strava users make use of the app’s features to ride routes that create amusing map pictures (just Google “strava funny routes” to see some examples). How much of the city’s data will reflect this playful spirit rather than actual riding routes is also a question worth asking.

Some ethical issues arise when planning data is gathered in this way. Obviously, the more people in Ottawa and Gatineau who use this app, the more data there will be. Does this mean that the cities have implicitly endorsed the use of one fitness app over another? Users of these apps necessarily enable tracking of their daily activities – should the city be encouraging this? While it is true that smart phones and apps of all varieties are already harvesting tracking data for all sorts of known and unknown purposes, there may still be privacy implications for the user. Strava seems to have given good consideration to user privacy in its privacy policy, which is encouraging. Further, the only data sold to customers by Strava is de-identified and aggregated – this protects the privacy of app users in relation to Strava’s clients. Nevertheless, it would be interesting to know whether the degree of user privacy protection provided was a factor for either city in choosing to use Strava’s services.

Another important issue – and this is a big one in the emerging smart cities context – relates to data ownership. Because the data is collected by Strava and then sold to the cities for use in their planning activities, it is not the cities’ own data. The CBC report makes it clear that the contract between Strava and its urban clients leaves ownership of the data in Strava’s hands. As a result, this data on cycling patterns in Ottawa cannot be made available as open data, nor can it be otherwise published or shared. It will also not be possible to obtain the data through an access to information request. This will surely reduce the transparency of planning decisions made in relation to cycling.

Smart cities and big data analytics are very hot right now, and we can expect to see all manner of public-private collaborations in the gathering and analysis of data about urban life. Much of this data may come from citizen-sensors, as is the case with the Strava data. As citizens opt in to – or are co-opted into – providing the data that fuels analytics, there are many important legal, ethical and public policy questions that need to be asked.

Last week I wrote about a very early ‘finding’ under Canada’s Personal Information Protection and Electronic Documents Act which raises some issues about how the law might apply in the rapidly developing big data environment. This week I look at a more recent ‘finding’ – this time 5 years old – that should raise red flags regarding the extent to which Canada’s laws will protect individual privacy in the big data age.

In 2009, Assistant Privacy Commissioner Elizabeth Denham (who is now the B.C. Privacy Commissioner) issued her findings following an investigation into a complaint brought by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) about the practices of a Canadian direct marketing company. The company combined information from different sources to create profiles of individuals linked to their home addresses. Customized mailing lists based on these profiles were then sold to clients looking for individuals falling within particular demographics for their products or services.

Consumer profiling is a big part of big data analytics, and today consumer profiles will draw upon vast stores of personal information collected from a broad range of online and offline sources. The data sources at issue in this case were much simpler, but the lessons that can be learned remain important.

The respondent organization used aggregate geodemographic data, which it obtained from Statistics Canada, and which was sorted according to census dissemination areas. This data was not specific to particular identifiable individuals – the aggregated data was not meant to reveal personal information, but it did give a sense of, for example, the distribution of income by geographic area (in this case, by postal code). The company then matched this demographic data against name and address information taken from telephone directories. Based on the geodemographic data, assumptions were made about income, marital status, likely home ownership, and so on. The company also added its own assumptions about religion, ethnicity and gender based upon the telephone directory information – essentially drawing inferences from the subscribers’ names. These assumptions were made according to ‘proprietary models’. Other proprietary models were used to infer whether the individuals lived in single- or multi-family dwellings. The result was a set of profiles of named individuals with inferences drawn about their income, ethnicity and gender. CIPPIC’s complaint was that the respondent company was collecting, using and disclosing the personal information of Canadians without their consent.
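The mechanics of such matching are easy to illustrate. The sketch below is a hypothetical reconstruction of the kind of profiling described in the finding – directory listings joined to area-level averages by postal code, with gender inferred from first names. The data, lookup tables and field names are invented; the respondent’s actual ‘proprietary models’ were never disclosed.

```python
# Hypothetical illustration only -- not the respondent's proprietary models.
# A directory listing (name + address) is joined to aggregate census-area data
# by postal code prefix, and a gender guess is attached based on the first name.
from dataclasses import dataclass

# Aggregate area data: average household income by postal code prefix (invented).
AREA_INCOME = {"K1N": 62000, "K2P": 98000}

# Crude first-name lookup standing in for a 'proprietary model' -- easily wrong.
FIRST_NAME_GENDER = {"leslie": "F", "john": "M"}

@dataclass
class Profile:
    name: str
    postal_code: str
    est_household_income: int   # inherited from the area average, not the person
    inferred_gender: str        # inferred from the first name -- may be incorrect

def build_profile(name: str, postal_code: str) -> Profile:
    first = name.split()[0].lower()
    return Profile(
        name=name,
        postal_code=postal_code,
        est_household_income=AREA_INCOME.get(postal_code[:3], 0),
        inferred_gender=FIRST_NAME_GENDER.get(first, "unknown"),
    )

if __name__ == "__main__":
    print(build_profile("Leslie O'Keefe", "K2P 1L4"))
```

Each output record looks like individualized personal information, even though every attribute beyond the name and address is an area average or a guess.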

The findings of the Assistant Privacy Commissioner (APC) are troubling for a number of reasons. She began by characterizing the telephone directory information as “publicly available personal information”. Under PIPEDA, information that falls into this category, as defined by the regulations, can be collected, used and disclosed without consent, so long as the collection, use and disclosure are for the purposes for which it was made public. Telephone directories fall within the Regulations Specifying Publicly Available Information. However, the respondent organization did more than simply resell directory information.

Personal information is defined in PIPEDA as “information about an identifiable individual”. The APC characterized the aggregate geodemographic data as information about certain neighborhoods, and not information about identifiable individuals. She stated that “the fact that a person lives in a neighborhood with certain characteristics” was not personal information about that individual.

The final piece of information associated with the individuals in this case was the set of assumptions about, among other things, religion, ethnicity and gender. The APC characterized these as “assumptions”, rather than personal information – after all, the assumptions might not be correct.

Because the respondent’s clients provided the company with the demographic characteristics of the groups they sought to reach, and because the respondent company merely furnished names and addresses in response to these requests, the APC concluded that the only personal information that was collected, used or disclosed was publicly available personal information for which consent was not required. (And, in case you are wondering, allowing people to contact individuals was one of the purposes for which telephone directory information is published – so the “use” by companies of sending out marketing information fell within the scope of the exception.)

And thus, by considering each of the pieces of information used in the profile separately, the respondent’s creation of consumer profiles from diffuse information sources fell right through the cracks in Canada’s data protection legislation. This does not bode well for consumer privacy in an age of big data analytics.

The most troubling part of the approach taken by the APC is that which dismisses “assumptions” made about individuals as being merely assumptions and not personal information. Consumer profiling is about attributing characteristics to individuals based on an analysis of their personal information from a variety of sources. It is also about acting on those assumptions once the profile is created. The assumptions may be wrong, the data may be flawed, but the consumer will nonetheless have to bear the effects of that profile. These effects may be as minor as being sent advertising that may or may not match their activities or interests; but they could be as significant as decisions made about entitlements to certain products or services, about what price they should be offered for products or services, or about their desirability as a customer, tenant or employee. If the assumptions are not “actual” personal information, they certainly have the same effect, and should be treated as personal information. Indeed, the law accepts that personal information in the hands of an organization may be incorrect (hence the right to correct personal information), and it accepts that opinions about an individual constitute their personal information, even though the opinions may be unfair.

The treatment of the aggregate geodemographic information is also problematic. On its own, it is safe to say that aggregate geodemographic information is information about neighborhoods and not about individuals. But when someone looks up the names and addresses of the individuals living in an area and matches that information to the average age, income and other data associated with their postal codes, then they have converted that information into personal information. As with the ethnicity and gender assumptions, the age, income, and other assumptions may be close or they may be way off base. Either way, they become part of a profile of an individual that will be used to make decisions about that person. Leslie O’Keefe may not be Irish, he may not be a woman, and he may not make $100,000 a year – but if he is profiled in this way for marketing or other purposes, it is not clear why he should have no recourse under data protection laws.

Of course, the challenge faced by the APC in this case was how to manage the ‘balance’ set out in s. 3 of PIPEDA between the privacy interests of individuals and the commercial need to collect, use and disclose personal information. In this case, to find that consent – that cornerstone of data protection laws – was required for the use and disclosure of manufactured personal information would be to hamstring an industry built on the sale of such information. As the use – and the sophistication – of big data and big data analytics advances, organizations will continue to insist that they cannot function or compete without the use of massive stores of personal information. If this case is any indication, decision makers will be asked to continue to blur and shrink the edges of key concepts in the legislation, such as “consent” and “personal information”.

The PIPEDA complaint in this case dealt with relatively unsophisticated data used for relatively mundane purposes, and its importance may be too easily overlooked as a result. But how we define personal information and how we interpret data protection legislation will take on enormous importance as the role of big data analytics in our lives continues to grow. Both this decision and the one discussed last week offer some insights into how Canada’s data protection laws might be interpreted or applied – and they raise red flags about the extent to which these laws are adequately suited to protecting privacy in the big data era.

Published in Privacy

A long past and largely forgotten ‘finding’* from the Office of the Privacy Commissioner of Canada offers important insights into the challenges that big data and big data analytics will pose for the protection of Canadians’ privacy and consumer rights.

Thirteen years ago, former Privacy Commissioner George Radwanski issued his findings on a complaint that had been brought against a bank. The complainant had alleged that the bank had wrongfully denied her access to her personal information. The requirement to provide access is found in the Personal Information Protection and Electronic Documents Act (PIPEDA). The right of access also comes with a right to demand the correction of any errors in the personal information in the hands of the organization. This right is fundamentally important, and not just to privacy. Without access to the personal information being used to inform decision-making, consumers have very little recourse of any kind against adverse or flawed decision-making.

The complainant in this case had applied for and been issued a credit card by the bank. What she sought was access to the credit score that had been used to determine her entitlement to the card. The bank had relied upon two credit scores in reaching its decision. The first was the type produced by a credit reporting agency – in this case, Equifax. The second was an internal score generated by the bank using its own data and algorithm. The bank was prepared to release the former to the complainant, but refused to give her access to the latter. The essence of the complaint, therefore, was whether the bank had breached its obligations under PIPEDA to give her access to the personal information it held about her.

The Privacy Commissioner’s views on the interpretation and application of the statute in this case are worth revisiting 13 years later as big data analytics now fuel so much decision-making regarding consumers and their entitlement to or eligibility for a broad range of products and services. Credit reporting agencies are heavily regulated to ensure that decisions about credit-worthiness are made fairly and equitably, and to ensure that individuals have clear rights to access and to correct information in their files. For example, credit reporting legislation may limit the types of information and the data sources that may be used by credit reporting agencies in arriving at their credit scores. But big data analytics are now increasingly relied upon by all manner of organizations that are not regulated in the same way as credit-reporting agencies. These analytics are used to make decisions of similar importance to consumers – including decisions about credit-worthiness. There are few limits on the data that is used to fuel these analytics, nor is there much transparency in the process.

In this case, the bank justified its refusal to disclose its internal credit score on two main grounds. First, it argued that this information was not “personal information” within the meaning of PIPEDA because it was ‘created’ internally and not collected from the consumer or any other source. The bank argued that this meant it did not have to provide access, and that in any event, the right of access was linked to the right to request correction. The nature of the information – which was generated using a proprietary algorithm – was such that it was not “facts” open to correction.

The argument that generated information is not personal information is a dangerous one, as it could lead to a total failure of accountability under data protection laws. The Commissioner rejected this argument. In his view, it did not matter whether the information was generated or collected; nor did it matter whether it was subject to correction or not. The information was personal information because it related to the individual. He noted that “opinions” about an individual were still considered to be personal information, even though they are not subject to correction. This view of ‘opinions’ is consistent with subsequent findings and decisions under PIPEDA and comparable Canadian data protection laws. Thus, in the view of the Commissioner, the bank’s internally generated credit score was the complainant’s personal information and was subject to PIPEDA.

The bank’s second argument was more successful, and is problematic for consumers. The bank argued that releasing the credit score to the complainant would reveal confidential commercial information. Under s. 9(3)(b) of PIPEDA, an organization is not required to release personal information in such circumstances. The bank was not arguing so much that the complainant’s score itself was confidential commercial information; rather, what was confidential were the algorithms used to arrive at the score. The bank argued that these algorithms could be reverse-engineered from a relatively small sample of credit scores. Thus, a finding that such credit scores must be released to individuals would leave the bank open to the hypothetical situation where a rival might organize or pay 20 or so individuals to seek access to their internally generated credit scores in the hands of the bank, and that set of scores could then be used to arrive at the confidential algorithms. The Commissioner referred this issue to an expert on algorithms and concluded that “although an exact determination of a credit-scoring model was difficult and highly unlikely, access to customized credit scores would definitely make it easier to approximate a bank’s model.”
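The bank’s concern is easiest to see with a toy example. The sketch below assumes, purely for illustration, that the internal score is a weighted sum of attributes a requester already knows about themselves; in that simplified setting, twenty disclosed scores are enough to recover a close approximation of the weights by least squares. Real scoring models are more complex, which is consistent with the expert’s view that exact recovery is difficult but approximation becomes easier with access to customized scores.

```python
# Toy illustration of the reverse-engineering concern (invented model and data):
# if a score is roughly a weighted sum of known attributes, a small sample of
# (attributes, score) pairs lets an outsider approximate the weights.
import numpy as np

rng = np.random.default_rng(0)
true_weights = np.array([0.4, -0.25, 0.15])   # the bank's hypothetical internal model

# Attributes for 20 customers (e.g. income, utilization, age -- standardized).
X = rng.normal(size=(20, 3))
# The scores those customers would obtain through access requests.
scores = X @ true_weights + rng.normal(scale=0.01, size=20)

# Ordinary least squares recovers weights very close to the originals.
estimated, *_ = np.linalg.lstsq(X, scores, rcond=None)
print("true:     ", true_weights)
print("estimated:", estimated.round(3))
```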

The Commissioner noted that under s. 9(3)(b) there has to be some level of certainty that the disclosure of personal information will reveal confidential commercial information before disclosure can be refused. In this case, the Commissioner indicated that he had “some difficulty believing that either competitors or rings of algorithmically expert fraud artists would go to the lengths involved.” He went on to say that “[t]he spectre of the banks falling under systematic assault from teams of loan-hungry mathematicians is simply not one I find particularly persuasive.” Notwithstanding this, he ruled in favour of the bank. He noted that other banks shared the same view as the respondent bank, and that competition in the banking industry was high. Since he had found it was technically possible to reverse-engineer the algorithm, he was of the view that he had to find that the release of the credit score would reveal confidential commercial information. He was satisfied with the evidence the bank supplied to demonstrate how closely guarded the credit-scoring algorithm was. He noted that in the UK and Australia, relatively new guidelines required organizations to provide only general information regarding why credit was denied.

The lack of transparency of algorithms used in the big data environment becomes increasingly problematic the more such algorithms are used. Big data analytics can be used to determine credit-worthiness – and such determinations are made not just by banks but by all manner of companies that extend consumer credit through loans, don’t-pay-for-a-year deals, purchase-by-installment plans, store credit cards, and so on. They can also be used to determine who is entitled to special offers or promotions, for price discrimination (where some customers are offered better prices for the same products or services), and in a wide range of other contexts. Analytics may also be used by prospective employers, landlords or others whose decisions may have important impacts on people’s lives. Without algorithmic transparency, it might be impossible to know whether the assumptions, weightings or scoring factors are biased, influenced by sexism or racism (or other discriminatory considerations), or simply flawed.

There may be some comfort to be had in the fact that, in this case, the Commissioner was allowed to have access to the scoring model used. He stated that he found it innocuous – although it is not clear what kind of scrutiny he gave it. After all, his mandate extended only to decisions relating to the management of personal information, and did not extend to issues of discrimination. It is also worth noting that the Commissioner seems to suggest that each case must be decided on its own facts, and that what the complainant stood to gain and the respondent stood to lose were relevant considerations. In this case, the complainant had not been denied credit, so in the Commissioner’s view there was little benefit to her in the release of the information to be weighed against the potential harm to the bank. Nevertheless, the decision raises a red flag around transparency in the big data context.

In the next week or so I will be posting a ‘Back to the Future II’ account of another, not quite so old, PIPEDA finding that is also significant in the big data era. Disturbingly, this decision eats away at Commissioner Radwanski’s conclusion on the issue of “personal information” as it relates to generated or inferred information about individuals. Stay tuned!



* Because the Privacy Commissioner of Canada has no order-making powers, he can only issue “findings” in response to complaints filed with the office. The ‘findings’ are essentially opinions as to how the Act applies in the circumstances of the complaint. If the complaint is considered well-founded, the Commissioner can also make recommendations as to how the organization should correct its practices. For binding orders or compensation, the complainant must first go through the complaints process and then take the matter to the Federal Court. Few complainants do so. Thus, while findings are non-binding and set no precedent, they do provide some insight into how the Commissioner would interpret and apply the legislation.

 

Published in Privacy
