An API as a Tool against Scraping

Aug 24th - 2min.

Giants such as Facebook don't want other companies to index their platforms, because that is something they can't control. To keep scrapers away, they deliberately share part of their data with you through a public Application Programming Interface (API). In this blog post, we explain the advantages and disadvantages of a public API and then introduce the private API.

The pros and cons of a public API

An API offers stable, well-documented information and, unlike scraping, retrieving data from it is not a cat-and-mouse game. In most cases, there are clear rules about which data is available and how you may use it. For example, you can only store data from Instagram for a limited time. Using a public API is the cleanest and legally safest way to collect a platform's data.
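A rule such as "store data only for a limited time" is easy to enforce in your own code. The following is a minimal sketch, not an official client: the class name `ExpiringStore`, the key format and the 60-second retention period are assumptions made for illustration, and the real limit depends on the platform's terms.

```python
import time


class ExpiringStore:
    """Keeps API records only for a fixed retention period (hypothetical policy)."""

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age_seconds = max_age_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._records = {}       # key -> (stored_at, value)

    def put(self, key, value):
        """Store a value together with the time it was stored."""
        self._records[key] = (self._clock(), value)

    def get(self, key):
        """Return the value, or None if it is missing or past its retention period."""
        entry = self._records.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.max_age_seconds:
            del self._records[key]   # expired: delete instead of serving it
            return None
        return value

    def purge_expired(self):
        """Delete every expired record and return how many were removed."""
        now = self._clock()
        expired = [
            key
            for key, (stored_at, _) in self._records.items()
            if now - stored_at > self.max_age_seconds
        ]
        for key in expired:
            del self._records[key]
        return len(expired)


# Example: keep data for at most 60 seconds (assumed value).
store = ExpiringStore(max_age_seconds=60)
store.put("post:1", {"likes": 3})
```

Running `purge_expired()` on a schedule keeps stored data within the agreed period even for records nobody requests anymore.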

The biggest issue with relying solely on a public API is that the company chooses to share its data. If it no longer wants to, you're out of luck: a company can shut down its API whenever it wants. Some businesses, such as browser plugins and social media tools, depend completely on other companies' APIs. Building your business entirely on another company's API is never a good plan!
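One way to soften this dependency is to keep the third-party API behind a small interface of your own, so that a second source can take over when access disappears. The sketch below uses stand-in functions only; `ProviderUnavailable`, `fetch_with_fallback`, `revoked_api` and `local_cache` are hypothetical names, not part of any real API.

```python
class ProviderUnavailable(Exception):
    """Raised when a data source can no longer be reached or has revoked access."""


def fetch_with_fallback(sources, query):
    """Try each (name, fetch) pair in order; return (name, result) from the first that works."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch(query)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all sources failed: " + "; ".join(errors))


def revoked_api(query):
    # Stand-in for a third-party API whose access has been withdrawn.
    raise ProviderUnavailable("access revoked")


def local_cache(query):
    # Stand-in for a source you control, such as previously stored data.
    return {"query": query, "items": []}


# The third-party API fails, so the result comes from the second source.
source_name, result = fetch_with_fallback(
    [("third_party", revoked_api), ("cache", local_cache)],
    "posts",
)
```

A fallback does not replace the missing data, but it means a revoked API key degrades your product instead of taking it down entirely.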

There are ample stories about companies misusing the power of their API. For example, there was reportedly an app, similar to Buffer, that was gaining popularity. The company behind it also integrated with Twitter and preferred to remain independent, so it refused to sell itself to Facebook. Its access to the API was then revoked. For a social media tool, that is of course a disaster.

Private vs. public API

An app like Facebook also uses an API internally, to connect its own microservices and request data. In this case, we’re talking about the private API: an endpoint that serves data Facebook doesn’t want to share with you and for which no documentation is available. Your application, on the other hand, integrates with the public Facebook API. To clarify this structure, we created the following diagram:

You could compare the difference between a private and a public API with the phone numbers of a CEO and of customer service. Everyone can find and call the customer service number, and it doesn’t matter that customer service receives many calls. A CEO, on the other hand, shouldn’t be interrupted easily and doesn’t give his or her number to everyone. In this example, the customer service phone is the public API, and the connection between customer service and the CEO is the private API.
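The same split can be expressed as a toy model in code. This is only an illustration of the idea and says nothing about how Facebook is actually built; the class `Platform`, its methods and the sample record are all invented for this example.

```python
class Platform:
    """Toy model of a platform with a private (internal) and a public (documented) API."""

    def __init__(self):
        # Full internal records, as the platform's own services see them.
        self._users = {
            1: {
                "name": "Alice",
                "email": "alice@example.com",
                "ad_profile": "segment-7",
            },
        }

    def _internal_get_user(self, user_id):
        """Private API: undocumented, returns everything, meant for the platform itself."""
        return self._users[user_id]

    def get_public_profile(self, user_id):
        """Public API: documented, returns only the fields the platform chooses to share."""
        record = self._internal_get_user(user_id)
        return {"name": record["name"]}


platform = Platform()
public_view = platform.get_public_profile(1)   # only the shared fields
```

The public method is the customer service line: it answers everyone, but only with what the platform has decided to hand out. The internal method is the CEO's number.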

In some cases, the public API is too limited. Scrapers then try to intercept the private API in order to retrieve more data, because the data a private API returns is very stable, compact and clear. It is like getting hold of the CEO's number.

Facebook (API) for developers

If you search for "Facebook API", you will quickly find "Facebook for Developers". Certain parts of the API can be used without permission, such as the integration of a login function. For other functionality you need permission, which means passing a verification process. Large Belgian players such as Proximus also share selected data via their own APIs.