Scraping: Is It Allowed to Get Data From The Internet?
Jul 27th - 3min.
Can we automatically "scrape" data from the internet without restrictions? Is it all there for the taking? Are there any laws we need to take into account and how ethical is web scraping?
Now that you are familiar with the term scraping, we can go a step further and ask an important question. Can we just 'scrape' data from the internet automatically? Is it all up for grabs, or are there restrictions we have to take into account? Need a refresher on the basics of scraping? Then read “What is scraping? Automatic data collection from the internet”.
Automation: a trick to work faster
Scraping is a shortcut for retrieving data faster. With scraping you do nothing more than replace a person's manual actions with an automated (ro)bot. So we can also put the question differently: are you allowed to manually retrieve and copy information from a website? And are you allowed to create a list of leads 'by hand'?
Yes, in principle. We could have collected the same information by hand, and that is the crux of scraping: a bot automatically performs the actions a person would otherwise perform manually. Most of the information on the internet is public.
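To make the idea of a bot replacing manual copying concrete, here is a minimal sketch in Python. It assumes a made-up recipe page, hard-coded as SAMPLE_PAGE, and a hypothetical class name "ingredient" in its HTML; a real bot would download the page first and would need to match the markup of the actual site. The names IngredientParser and scrape_ingredients are our own illustrations, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real bot would download this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <h1>Tomato soup</h1>
  <ul>
    <li class="ingredient">4 tomatoes</li>
    <li class="ingredient">1 onion</li>
    <li class="note">Serves two</li>
  </ul>
</body></html>
"""


class IngredientParser(HTMLParser):
    """Collects the text of every <li class="ingredient"> element."""

    def __init__(self):
        super().__init__()
        self.in_ingredient = False
        self.ingredients = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "li" and ("class", "ingredient") in attrs:
            self.in_ingredient = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_ingredient = False

    def handle_data(self, data):
        # Keep only non-empty text found inside an ingredient item.
        if self.in_ingredient and data.strip():
            self.ingredients.append(data.strip())


def scrape_ingredients(html):
    """Does what a person would do by hand: read the page, copy the list."""
    parser = IngredientParser()
    parser.feed(html)
    return parser.ingredients


print(scrape_ingredients(SAMPLE_PAGE))
```

The bot copies exactly what a visitor could copy by hand, only faster. Whether you are then allowed to use the result is a separate question.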
However, there is an important caveat: your intention matters a great deal. What you plan to do with the data after you've retrieved it is decisive. Copying a recipe from a website to forward to a friend is no problem, but putting it on your own website and charging money for it isn't okay! Copyright law applies to this, as well as to databases.
Two laws that pull the strings: copyright and GDPR
In fact, we have to take two different sets of rules into account: copyright law and the GDPR. An example makes this clearer. Suppose we scrape LinkedIn to generate leads and forward these leads to our own CRM system. In this case, we can say that copyright legislation no longer applies, because we do not sell this data.
But we still need to be careful! Under the GDPR you are, in principle, not allowed to send messages to these people. The General Data Protection Regulation doesn't allow you to simply use scraped data. When it comes to personal data, you're not even allowed to hold this data.
Even when data has been published, for example on a Facebook profile, that doesn't mean you can copy it for your own use. This information can be subject to certain permissions. In short, there are laws in place that oblige us to be careful.
Case study: start-up wins scraping lawsuit against LinkedIn
Scraping is and remains a gray area.
In some cases, you can even win a lawsuit against a giant such as LinkedIn. That is what the American start-up hiQ managed to do. The social network wanted to stop the start-up from collecting and analyzing public information from LinkedIn. That quickly backfired. The San Francisco judge ruled that LinkedIn was just trying to shut out a competitor, and therefore ordered LinkedIn to remove all the restrictions it had imposed on hiQ.
The start-up, on the other hand, was allowed to continue to scrape and analyze all public information. “Based on profile embellishments, hiQ warns other companies that they can expect staff turnover in the short term. The start-up says that it doesn't tip off employers in response to changes in individual profiles, but rather looks at the big picture.” (source: De Morgen (Staalduine, 2017))
The legal system threatens to fall behind
Whether you're allowed to scrape or not is difficult for a judge to evaluate. Judges usually have no technical background and don't really understand what indexing or crawling data entails. They can, however, pass judgment on infringements of copyright law.
To date, there is no explicit legislation on scraping. The GDPR regulates the processing of personal data but contains no provisions aimed specifically at scraping. And yet scraping, or crawling, is practically as old as the first search engine. The law is lagging behind (again); luckily, we're already used to that. Some even called the GDPR obsolete before it came into effect.
The real influence of the GDPR
The GDPR rules out many ways of using scraping to download data sets. Originally, these regulations were created to rein in companies such as Google and Facebook. In reality, the opposite happened. Giants such as Facebook tweaked their “terms & conditions” slightly, and that was it. Local companies, on the other hand, are the real victims of this legislation. The GDPR has actually ensured that the biggest companies have gained even more power.