# Web Discovery Project Overview
*Web Discovery Project* (*WDP*) is a methodology and system developed by Brave, heavily inspired by Cliqz's [Human Web](https://0x65.dev/blog/2019-12-03/human-web-collecting-data-in-a-socially-responsible-manner.html). We recommend reading that blog post as additional material, even though there are, and will be, significant departures as *WDP* evolves.
## Motivation
*Web Discovery Project* is a methodology and system developed by Brave to collect data generated by their users while protecting their privacy and anonymity.
Brave needs data to power its privacy-preserving search. This data, provided by Brave users, is collected very differently from typical data collection. We want to depart from the current standard model, where users must trust that the company collecting the data will never misuse it under any circumstances. We do not want users to have no other choice but to trust us. There are many ways a trust model can fail. Hackers can steal data. Governments can issue subpoenas or obtain direct access to the data. Unethical employees can dig into the data for personal interests. Companies can go bankrupt and have their data auctioned to the highest bidder. Finally, companies can unilaterally decide to change their privacy policies.
In the trust model, which is the industry standard, the user has very little control, and protection comes only from privacy policies and enforcement bodies. We believe we must do better, if only for selfish reasons: we use our own products, and consequently our own data is collected. We are not comfortable with only a promise based on Terms of Service and Privacy Policy agreements. It is not enough for us, and should not be enough for our users either. As someone once said, if you do not like reality, feel free to change it. The Web Discovery Project is our proposal for more responsible and less invasive data collection.
## Fundamentals
The fundamental idea of the *Web Discovery Project* data collection is simple: **to actively prevent [Record Linkage](https://en.wikipedia.org/wiki/Record_linkage)**.
Record linkage is the ability to know that multiple data elements, e.g. messages or records, come from the same user. This linkage leads to sessions, and these sessions are very dangerous with regard to privacy. For instance, [Google Analytics data can be used to build sessions that can sometimes be de-anonymized by anyone who has access to them](http://josepmpujol.net/public/papers/big_green_tracker.pdf). Was it intentional? Most likely not. Will Google Analytics try to de-anonymize the data? I bet not. But still, the session is there, stored somewhere, and trust that it is not going to be misused is the only protection we have.
The *Web Discovery Project* is a methodology and system designed to collect data that cannot be turned into sessions once it reaches Brave. How? By strictly forbidding any user identifier that could be used to link records as belonging to the same person, considering not only explicit UIDs but also implicit ones. Consequently, aggregating a user's data on the server side (on Brave premises) is not technically feasible, as we have no means of knowing who the original owner of the data is.
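
To make the idea of record linkage concrete, here is a minimal sketch in Python. It is not WDP code: the record layout, the `uid` field, and the `build_sessions` helper are all assumptions made for illustration. It only shows why an explicit identifier is enough for a server to rebuild sessions, and why records without one stay unlinked.

```python
from collections import defaultdict


def build_sessions(records, key="uid"):
    """Group records by a linking key (hypothetical helper, not part of WDP).

    Records that carry the key are grouped into per-user sessions;
    records without it cannot be attributed and stay unlinked.
    """
    sessions = defaultdict(list)
    unlinked = []
    for record in records:
        if key in record:
            sessions[record[key]].append(record["url"])
        else:
            unlinked.append(record["url"])
    return dict(sessions), unlinked


# Hypothetical records as a conventional collector might store them.
with_uid = [
    {"uid": "u1", "url": "https://example.com/a"},
    {"uid": "u2", "url": "https://example.com/b"},
    {"uid": "u1", "url": "https://example.com/c"},
]

# The same records with the explicit identifier removed before sending.
without_uid = [{"url": record["url"]} for record in with_uid]

linked_sessions, _ = build_sessions(with_uid)
no_sessions, unlinked = build_sessions(without_uid)

print(linked_sessions)  # per-user sessions can be rebuilt
print(no_sessions)      # nothing to group on
```

Dropping an explicit `uid` is only the simplest case. As noted above, implicit identifiers also have to be considered, since any value that repeats across a user's records could play the role of `key` in this sketch.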