Teresa Scassa - Blog


Skirmishes over the right to freely access and use “publicly available” data hosted by internet platform companies have led to an interesting decision from the U.S. District Court for the Northern District of California. The decision is on a motion for an interlocutory injunction, so it does not decide the merits of the competing claims. Nevertheless, it provides insight into a set of issues that are likely only to increase in importance as these rich troves of data are mined by competitors, opportunistic businesses, big data giants, researchers and civil society actors.

The parties in hiQ Labs Inc. v LinkedIn Corp. are companies whose business models are based upon career-related personal information provided by professionals. LinkedIn offers a professional networking platform to over 500 million users, and it is easily the leading company in its space. hiQ, for its part, is a data analytics company with two main products aimed at enterprises. The first is “Keeper”, a product which informs corporations about which of their employees are at greatest risk of being poached by other companies. The second is “Skill Mapper” which provides businesses with summaries of the skills of their employees. For both of its products hiQ relies on data that it scrapes from LinkedIn’s publicly accessible web pages.

Data featured on LinkedIn’s site are provided by users who create accounts and populate their profiles with a broad range of information about their background and skills. LinkedIn members have some control over the extent to which their information will be shared with others. They can choose to limit access to their profile information to only their close contacts or to an expanded list of contacts. Alternatively, they can provide access to all other members of LinkedIn. They also have the option to make their profiles entirely public. These public profiles are searchable by search engines such as Google. It is the data in the fully public profiles that is scraped and used by hiQ.

hiQ is not the only company that scrapes data from LinkedIn as part of an independent business model. In fact, LinkedIn has only recently attempted to take legal action against a large number of users of its data. hiQ was just one of many companies that received a cease and desist letter from LinkedIn. Because being cut off from the LinkedIn data would effectively decimate its business, hiQ responded by seeking a declaration from the California court that its activities were legal. The recent decision from the court is in relation to hiQ’s request for an interlocutory injunction that will allow it to continue to access the LinkedIn data pending resolution of the substantive legal issues raised by both sides.

hiQ argued that in moving against its data scraping activities, LinkedIn engaged in unfair business practices and violated hiQ’s free speech rights under the California constitution. LinkedIn, for its part, argued that hiQ’s data scraping activities violated the Computer Fraud and Abuse Act (CFAA), as well as the digital locks provisions of the Digital Millennium Copyright Act (DMCA) (although these latter claims do not feature in the decision on the interlocutory injunction).

As with other platform companies, access to and use of LinkedIn’s site are governed by website Terms of Service (TOS). These TOS prohibit data scraping. When LinkedIn demanded that hiQ cease scraping data from its site, it also implemented technological protection measures to prevent access by hiQ to its data. LinkedIn’s claims under the CFAA and the DMCA are based largely on the circumvention of these technological barriers by hiQ.

The court ultimately granted the injunction barring LinkedIn from limiting hiQ’s access to its publicly available data pending the resolution of the issues in the case. In doing so, it expressed its doubts that the CFAA applied to hiQ’s activity, noting that if it did, it would “profoundly impact open access to the Internet.” It also found that attempts by LinkedIn to block hiQ’s access might be in breach of state law as anti-competitive behavior. In reaching its decision, the court had some interesting things to say about the importance of access to publicly accessible data, and the privacy rights of those who provided the data. These issues are highlighted in the discussion below.

In deciding whether to grant an interlocutory injunction, a court must assess both the possibility of irreparable harm and the balance of convenience as between the parties. In this case, the court found that denying hiQ access to LinkedIn data would essentially put it out of business – causing it irreparable harm. LinkedIn argued that it was imperative that it be allowed to protect its data because of its users’ privacy interests. While hiQ only scraped data from public profiles, LinkedIn argued that even those users with public profiles had privacy interests. It noted that 50 million of its users with public profiles had selected its “Do Not Broadcast” feature, which prevents profile updates from being broadcast to a user’s connections. LinkedIn described this as a privacy feature that would essentially be circumvented by routine data scraping. The court was not convinced. In the first place, it found that there might be many reasons besides privacy concerns that motivated users to choose “Do Not Broadcast”. It gave as an example the concern by users that their connections not be spammed by endless notifications. The court also noted that LinkedIn had its own service for professional recruiters that kept them apprised of updates even from users who had implemented “Do Not Broadcast”. The court dismissed arguments by LinkedIn that this was different because users had consented to such sharing in the privacy policy. The court stated: “It is unlikely, however, that most users’ actual privacy expectations are shaped by the fine print of a privacy policy buried in the User Agreement that likely few, if any, users have actually read.” [Emphasis in original] This is interesting, because the court discounts the relevance of a privacy policy in informing users’ expectations of privacy. Essentially, the court finds that users who make their profiles public have no real expectation of privacy in the information. LinkedIn could therefore not rely on its users’ privacy interests to justify its actions.

In assessing whether the parties raised serious questions going to the merits of the case, the court considered LinkedIn’s arguments about the CFAA. The CFAA essentially criminalizes intentional access to a computer without authorization, or in a way that exceeds the authorization provided, with the result that information is obtained. The question, therefore, was whether hiQ’s continued access to the LinkedIn site, after LinkedIn expressly revoked any permission and tried to bar its access, was a violation of the CFAA. The court dismissed the cases cited by LinkedIn in support of its position, noting that these cases involved unauthorized access to password-protected sites as opposed to accessing publicly available information.

The court observed that the CFAA was enacted largely to deal with the problem of computer hacking. It noted that if the application of the law was extended to publicly accessible websites it would greatly expand the scope of the legislation with serious consequences. The court noted that this would mean that “merely viewing a website in contravention of a unilateral directive from a private company would be a crime.” [Emphasis in original] It went on to note that “The potential for such exercise of power over access to publicly viewable information by a private entity weaponized by the potential of criminal sanctions is deeply concerning.” The court placed great emphasis on the importance of an open internet. It noted that “LinkedIn, here, essentially seeks to prohibit hiQ from viewing a sign publicly visible to all”. It clearly preferred an interpretation of the CFAA that would be limited to unauthorized access to a computer system through some form of “authentication gateway”.

The court also found that hiQ raised serious questions that LinkedIn’s behavior might fall afoul of competition laws in California. It noted that LinkedIn is in a dominant position in the field of professional networking, and that it might be leveraging its position to get a “competitively unjustified advantage in a different market.” It also accepted that it was possible that LinkedIn was denying its competitors access to an essential facility that it controls.

The court was not convinced by hiQ’s arguments that the technological barriers erected by LinkedIn violated the free speech guarantees in the California Constitution. Nevertheless, it found that on balance the public interest favoured the granting of the injunction to hiQ pending the outcome of litigation on the merits.

This dispute is extremely interesting and worth following. There are a growing number of platforms that host vast stores of publicly accessible data, and these data are often relied upon by upstart businesses (as well as established big data companies, researchers, and civil society) for a broad range of purposes. The extent to which a platform company can control its publicly accessible data is an important question, and one which, as the California court points out, will have important public policy ramifications. The related privacy issues – where the data is also personal information – are also important and interesting. These latter issues may be treated differently in different jurisdictions depending upon the applicable data protection laws.

Published in Privacy
