Legal Landscape of Web Scraping and Practice Tips
July 26, 2021
By Tony Caldwell and Carina Arellano*
This Legal Alert is a follow-up to our June 3, 2021 Legal Alert, “Supreme Court Narrows Scope of the Computer Fraud and Abuse Act,” and provides an overview of related legal developments as well as considerations for entities seeking to engage in web scraping.
Computer Fraud and Abuse Act
The Computer Fraud and Abuse Act (“CFAA”)1 generally prohibits computer hacking and enumerates severe criminal penalties when an individual “intentionally accesses a computer without authorization or exceeds [his or her] authorized access.” The CFAA also includes a private right of action in which persons suffering “damage” or “loss” as a result of a CFAA violation can sue the violator for money damages and equitable relief.2
Two recent high-profile cases concerning CFAA interpretation will affect the legality of web scraping of publicly available information. As a reminder, web scraping is the process of extracting data from a website or specific webpage.
Van Buren v. United States3 and “Exceeding Authorized Access”
On June 3, 2021, the U.S. Supreme Court ruled in Van Buren that an individual “exceeds authorized access” when such individual “accesses a computer with authorization but then obtains information located in particular areas of the computer—such as files, folders, or databases—that are off limits to [the individual].” In other words, the Court found that CFAA liability depends on a “gates-up-or-down” inquiry. On the one hand, an individual may not be liable under the CFAA if they are authorized to access an entire computer system (i.e., the gates are up) and access any portion(s) thereof. On the other hand, an individual may be liable under the CFAA if they are granted access only to a limited portion of a computer system (i.e., the gates are down) and the individual accesses areas within the system beyond those to which such individual was granted access.
LinkedIn v. hiQ Labs, Inc.4 and “Without Authorization”
On June 14, 2021, the U.S. Supreme Court granted LinkedIn’s petition for certiorari filed in the hiQ case. The Court then vacated the Ninth Circuit’s prior ruling and remanded the case to the appeals court for further consideration in light of the Court’s ruling in Van Buren. For purposes of CFAA liability, it would seem that hiQ may turn on whether LinkedIn actually lowered the gate (i.e., limited access) to publicly available information on its website through technical restrictions and a formal revocation of access such that hiQ’s access to LinkedIn’s data was “without authorization.” Ultimately, the Ninth Circuit may determine that CFAA liability does not apply to publicly available website data and that the gate for public website content is always up.
As of today, in addition to potential CFAA liability, various state statutes and common law claims may be applicable to web scraping actions.
In light of the foregoing, entities who are seeking to engage in web scraping practices may consider the following:
- Avoid performing prohibited operations on websites. This may include using tools that circumvent the security measures in place to deter automatic data downloads and/or ignoring explicit limitations on the allowable access to data that may forbid duplication and storage. Such avoidance may mitigate the risk of breach of website contract claims.
- Use the target website’s relevant Application Programming Interface (API). An API is a set of procedures and communication protocols that provide access to the data of an application, operating system, or other services. Because APIs are controlled by the owner of the dataset in question, this approach may give an entity seeking relevant data clear access to the owner’s publicly available data for free or at a set price.
- Respect the robots.txt file on the target website. A robots.txt file contains instructions on how bots should treat a site when they access it. A bot that disregards those instructions may overload the target website with requests and prompt the website to block the bot. By consuming a significant portion of the target website’s capacity, the scraping entity could also cause physical harm to the computer network, which may lead to a potential trespass to chattels claim.
- Adhere to restrictive actions taken by the target website. This may include the use of CAPTCHAs, rate limits, and/or blocking of IP addresses. By reproducing a portion of a website’s database deemed to be a trade secret, a scraping entity may be subject to unlawful misappropriation of trade secret claims.
- Respect cease-and-desist letters. While the Ninth Circuit has not opined on whether the formal revocation (in the form of a cease-and-desist letter) of hiQ’s access to LinkedIn’s information is enough to close the gate on hiQ’s access to public information, it is prudent to be aware of this unanswered legal question.
- Avoid collecting personally identifiable information. The collection and use of such information may be governed by privacy laws such as the E.U.’s General Data Protection Regulation (“GDPR”) and the California Consumer Privacy Act of 2018 (“CCPA”). If the scraping entity collects personally identifiable information, it may be in violation of privacy statutes, certain of which include a private right of action.
- Avoid bypassing or deceptively creating access permissions to gain access to a computer network. This may include websites with username and password requirements. Otherwise, the scraping entity could be accessing information which may no longer be deemed “publicly available.” Additionally, web scraping that simulates organic human use of the website may constitute a fraudulent misrepresentation and may lead to potential fraudulent misrepresentation claims.
- Avoid using scraped data for commercial purposes. This may include reproducing scraped information on other website(s) and/or using such information for commercial gain. If the scraping entity uses scraped data for commercial purposes, the scraping entity may be subject to claims under the Digital Millennium Copyright Act (“DMCA”), which protects the creative selection, coordination, and arrangement of information and materials forming a database or compilation.
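For readers who want to see what the robots.txt and rate-limit considerations above might look like in practice, the following is a minimal sketch in Python using the standard library's urllib.robotparser module. The bot name, the example.com URLs, the sample robots.txt text, and the helper functions load_rules, may_fetch, and request_delay are all hypothetical and chosen for illustration only. The sketch parses robots.txt text that has already been retrieved; it does not itself fetch anything, and following robots.txt is only one of the considerations listed above.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the parsed rules allow this bot to request the URL."""
    return parser.can_fetch(user_agent, url)


def request_delay(parser: RobotFileParser, user_agent: str, default_seconds: float) -> float:
    """Seconds to wait between requests: the site's Crawl-delay, if any,
    but never less than our own (assumed) default."""
    site_delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay is given
    if site_delay is None:
        return default_seconds
    return max(default_seconds, site_delay)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/public/page", "https://example.com/private/data"):
        print(url, may_fetch(rules, "example-bot", url))
    print("delay:", request_delay(rules, "example-bot", 1.0))
```

A scraper built along these lines would call may_fetch before each request and pause for request_delay seconds (for example, with time.sleep) between requests. Whether such measures are sufficient in a given case remains a legal question, not a technical one.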
We will continue to monitor the hiQ case as well as other web scraping claims brought under the CFAA, and provide additional developments as they become available.
*Carina Arellano is a summer associate in Snell & Wilmer’s Phoenix office, working under the supervision of Tony Caldwell. She is anticipated to graduate from the Sandra Day O’Connor College of Law in May 2022.
1. 18 U.S.C. § 1030
2. 18 U.S.C. § 1030(g)
3. 593 U.S. __ (2021)
4. 938 F.3d 985 (9th Cir. 2019); No. 19-1116, 2021 WL 2405144 (U.S. June 14, 2021)
©2023 Snell & Wilmer L.L.P. All rights reserved. The purpose of this publication is to provide readers with information on current topics of general interest and nothing herein shall be construed to create, offer, or memorialize the existence of an attorney-client relationship. The content should not be considered legal advice or opinion, because it may not apply to the specific facts of a particular matter. As guidance in areas is constantly changing and evolving, you should consider checking for updated guidance, or consult with legal counsel, before making any decisions.
The material in this newsletter may not be reproduced, distributed, transmitted, cached or otherwise used, except with the written permission of Snell & Wilmer.