Every move you make on the internet is tracked. Whether it's for legit website analytics or shady user profiling, websites know who you are. It's creepy, I know. But have you ever wondered how all of this happens in your browser? In my other article, I outlined the things involved in online tracking. In this article, I'll try to explain how tracking happens in your browser.
Not your old-school newspaper
To most people, reading a page on the internet is analogous to reading a newspaper. To read the news in a newspaper, you pick up the newspaper and start flipping. To read the news on your phone, you pick up your phone and start swiping. They're done in pretty much the same way, so they must be the same, right? Well, not really.
The web is powered by HTTP. You can think of HTTP as a request-response protocol. That is, in order to get content, you have to ask for that content first. For instance, the web browser you're using right now had to make a request for this page, and for every single thing on this page, just to show you the content you see right now.
It's more than just a request
The interesting thing about HTTP requests is that they do more than just ask for content. They also send a lot of metadata. Take the following snippet for example. I got this from Firefox's Developer Tools when I searched for "cats" in the search box on Google's home page (potential identifier values replaced with xyz):
    GET /search?source=hp&ei=xyz&q=cats&btnK=Google+Search&oq=cats&gs_l=xyz
    Host: www.google.com
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate, br
    Referer: https://www.google.com/
    DNT: 1
    Connection: keep-alive
    Cookie: 1P_JAR=xyz; NID=xyz; OGPC=xyz
    Upgrade-Insecure-Requests: 1
    Pragma: no-cache
    Cache-Control: no-cache
    TE: Trailers
It's definitely more than just cats.
The browser normally sends some default data. Things like which browser it is (User-Agent), where the request came from (Referer), whether cached copies of the content are acceptable (Cache-Control), how it wants to receive content (Accept-*), and how it wants the connection to be treated (Connection).
Then there are the extra things like cookies, query strings, headers, and request bodies. These are added to the request either by hard-coding them in code, by forms and scripts on the page, by software and extensions installed, or even by web servers.
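To make that metadata concrete, here is a minimal sketch of how a script on the receiving end might split a raw request like the one above into its parts. The helper name parseRawRequest, the shortened sample request, and the placeholder base URL are illustrative assumptions, not how any particular server does it.

```javascript
// Hypothetical helper: splits raw HTTP request text into a request line,
// a header map, and the query string parameters.
function parseRawRequest(raw) {
  const lines = raw.split(/\r?\n/).filter((line) => line.trim() !== '');
  const [method, target] = lines[0].split(' ');

  const headers = {};
  for (const line of lines.slice(1)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip anything that isn't "Name: value"
    const name = line.slice(0, idx).trim().toLowerCase();
    headers[name] = line.slice(idx + 1).trim();
  }

  // The query string travels inside the request target. The base URL is a
  // placeholder that is only needed to parse a relative target.
  const query = Object.fromEntries(
    new URL(target, 'http://placeholder.invalid').searchParams
  );
  return { method, target, headers, query };
}

// Shortened version of the request shown earlier (assumed sample data).
const sample = [
  'GET /search?source=hp&q=cats',
  'Host: www.google.com',
  'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
  'Referer: https://www.google.com/',
  'Cookie: NID=xyz',
].join('\n');

const parsed = parseRawRequest(sample);
console.log(parsed.query.q);
console.log(parsed.headers['referer']);
console.log(parsed.headers['user-agent']);
```

Everything the function returns arrived with a single request for search results, before any page script had a chance to run.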
Getting to know all about you
Tracking starts with the server giving your browser a universally unique ID. The ID is handed to you via a cookie in the response to the very first request you make to the tracking server. From there, future requests to the tracking server will carry tracking data together with this ID. Basically, data is tied to an ID, which is tied to your browser and, by extension, you.
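The ID handshake described above might look roughly like this on the tracking server. This is only a sketch under assumptions: the cookie name tid, the one-year lifetime, and the function name identifyVisitor are made up for illustration.

```javascript
// Hypothetical tracking-server logic: reuse the ID from the Cookie header,
// or mint a new one when the browser has not been seen before.
function identifyVisitor(cookieHeader, generateId = () => globalThis.crypto.randomUUID()) {
  // Parse "name=value; name2=value2" into an object.
  const cookies = {};
  for (const pair of (cookieHeader || '').split(';')) {
    const idx = pair.indexOf('=');
    if (idx === -1) continue;
    cookies[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }

  if (cookies.tid) {
    // Returning browser: the ID came back with the request, no new cookie needed.
    return { id: cookies.tid, setCookie: null };
  }

  // First request: mint an ID and ask the browser to store it for a year.
  const id = generateId();
  return { id, setCookie: `tid=${id}; Max-Age=31536000; Path=/` };
}

// A fixed generator is passed here only to keep the example predictable.
const first = identifyVisitor('', () => 'id-123');
const second = identifyVisitor('theme=dark; tid=id-123');
console.log(first.setCookie);
console.log(second.id);
```

The value in setCookie would go out in a Set-Cookie response header. A real cross-site tracker would also need cookie attributes such as SameSite=None and Secure, which are left out here to keep the sketch short.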
At the bare minimum, trackers want to know your movement across the internet. This information is obtained from the Referer header. Browser and OS usage is obtained from the User-Agent header. IP addresses can be obtained from
Some trackers provide APIs to allow developers to programmatically report custom data. For instance, Google Analytics allows the reporting of "pages" and "events". Page Tracking is used to report page visits or views on sections of an app. Event Tracking is used to report user interactions.
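As a rough illustration of what such a reporting API could look like, here is a sketch of a tiny tracker client. It is not Google Analytics' actual API: createTracker, trackPage, trackEvent, and the payload fields are all hypothetical names chosen for this example.

```javascript
// Hypothetical tracker client. The `send` callback stands in for whatever
// actually delivers the payload to the tracking server.
function createTracker(send) {
  return {
    // Report a page visit or a view of a section within an app.
    trackPage(path) {
      send({ type: 'page', path, at: Date.now() });
    },
    // Report a user interaction, such as a click or a video play.
    trackEvent(category, action) {
      send({ type: 'event', category, action, at: Date.now() });
    },
  };
}

// Here the payloads are collected in an array instead of being sent anywhere.
const reported = [];
const tracker = createTracker((payload) => reported.push(payload));
tracker.trackPage('/pricing');
tracker.trackEvent('video', 'play');
console.log(reported.map((payload) => payload.type));
```

Keeping the delivery step behind a callback is just a design choice for the sketch. It separates what is reported from how it is shipped.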
Of course, anything can be taken to the extreme. The EFF has a tool called Panopticlick that simulates browser fingerprinting. It gathers several pieces of data from your browser and uses them to generate a fingerprint. The list of gathered data shows just how much information browser APIs reveal about your device.
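The general idea of a fingerprint can be sketched as combining many small attributes into one value. The code below is an assumption-laden illustration and not Panopticlick's method: the attribute names are invented, the values are hard-coded, and FNV-1a is a simple non-cryptographic hash picked only because it fits in a few lines.

```javascript
// FNV-1a, a simple 32-bit non-cryptographic hash (operates on UTF-16 code
// units here, which matches bytes for plain ASCII input).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// Hypothetical fingerprint: join attributes in a stable key order, then hash.
function fingerprint(attributes) {
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join('|');
  return fnv1a(canonical);
}

// In a browser these values might be read from objects such as navigator and
// screen. They are hard-coded here so the example runs anywhere.
const deviceA = { userAgent: 'Firefox/63.0', screen: '1920x1080', timezone: 'UTC', language: 'en-US' };
const deviceB = { ...deviceA, screen: '1366x768' };
console.log(fingerprint(deviceA));
console.log(fingerprint(deviceB));
```

No single attribute identifies a device, but the combination can. The more attributes that go into the hash, the fewer devices are likely to share the same result.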
E.T. phone home
The last leg of the race involves sending all of this gathered data to a tracking server via a request. As mentioned earlier, any resource can make a request. However, there are certain aspects that make certain resource types less than ideal.
Pop-up windows and iframes are annoying and bad for user experience. Scripts and styles require parsing and can break if errors are thrown. Fonts, audio, and video are heavy. They're also fairly recent tech, so they have varying browser support. XHR is subject to the Same-Origin Policy unless you make the extra effort to set up Cross-Origin Resource Sharing. This leaves us with images.
    // All it takes is this one line to send a loaded request
    (new Image()).src = 'https://tracking-domain.com?d=' + encodeURIComponent(stringLoadedWithData)
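The loaded string has to come from somewhere. One way to assemble it, sketched here under assumptions, is to pack the gathered data into the query string of the image URL. The helper name buildPixelUrl, the /pixel.gif path, and the field names are hypothetical.

```javascript
// Hypothetical helper: packs tracking data into the query string of an image URL.
function buildPixelUrl(endpoint, data) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(data)) {
    // searchParams takes care of escaping characters like "?", "&" and "=".
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const pixelUrl = buildPixelUrl('https://tracking-domain.com/pixel.gif', {
  page: '/articles/cats?id=1',
  event: 'view',
});
console.log(pixelUrl);

// In a browser, the request would then be fired with:
// (new Image()).src = pixelUrl
```

Because the data rides along in the URL, the tracking server can read it from the request target without the image itself carrying any meaning.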
The concept of tracking is simple: a request loaded with metadata sent to a tracking server. Do this to every user on the internet on every single page they visit, then you have yourself a data collection monster. Pixels anyone? Anyone?
Hopefully this article gave you some insight into how tracking happens in your browser. As always, if you have comments or suggestions, feel free to drop a line.