Google Safe Browsing is enabled by default in Brave Browser. Brave’s documentation explains how Brave has made the feature more privacy-preserving. One thing I couldn’t find any information about is the browser’s own description of how the feature works, which Keith on Twitter thinks might be wrong.
According to Brave’s settings pages, “if a page does something suspicious, URLs and bits of page content are sent to Google Safe Browsing”. Is this true? Does Brave really send bits of the actual page content to Google or is the text just a leftover from Chromium?
@karlemilnikka let me edit. I didn’t click on your links and was providing some of the same details you linked to. But if you look through in detail, I don’t see where it says it shares content. It has hash data and it sends some metadata when downloading. So what is it you’re looking at?
I guess I should say: which areas specifically?
Thanks for the reply. It’s not mentioned in the support documentation, only in the browser settings (brave://settings/security). See the text I highlighted in the screenshot.
A privacy-preserving system
Safe Browsing in Brave has the following privacy properties:
- URLs are never sent to the Google-operated server.
- The vast majority of website visits do not lead to server requests.
- On desktop, the browser does not connect to the server directly; instead, it routes through a Brave-operated proxy server so that Google servers never see your IP addresses.
At the core of Safe Browsing are lists of dangerous sites or files. Rather than submitting full URLs to a server, the protocol is structured so that the bulk of these checks are done locally using lists downloaded from the Safe Browsing server.
When a user visits a website with a URL of phishing.example/:
- The browser will turn the URL into bac52b0b455d4b0435379a9cb61d43cd54bcd0f17ff0a5477b2598373fd7b997 using the SHA-256 hash function.
- Then it will truncate this “hash” to bac52b0b and look it up against its local copy of the Safe Browsing lists.
- If this hash prefix is found in a list (and only in that case), the browser will ask the Safe Browsing server for the list of all full-size hashes that start with that prefix (only the prefix is sent, not the full hash).
- Finally, the browser will compare the hash of the website URL with the full hashes it just received and show a warning page in case of a match.
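For anyone who wants to see the steps above as code, here is a minimal sketch in Python. It is only an illustration of the prefix-lookup idea, not Brave's or Google's actual implementation. The names `check_url`, `local_prefixes`, and `fetch_full_hashes` are made up for this example, the "server" is a local function, and the real protocol also canonicalizes URLs and checks several host/path combinations, which is skipped here.

```python
import hashlib

PREFIX_BYTES = 4  # 4 bytes = 8 hex characters, the same length as "bac52b0b"


def full_hash(url_expression: str) -> bytes:
    """SHA-256 of the URL expression (assumed already canonicalized)."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()


def check_url(url_expression, local_prefixes, fetch_full_hashes):
    """Return True if the URL should trigger a warning.

    local_prefixes: set of hash prefixes stored in the browser (hypothetical).
    fetch_full_hashes: callable standing in for the server request; it
    receives only the prefix, never the URL or the full hash.
    """
    digest = full_hash(url_expression)
    prefix = digest[:PREFIX_BYTES]
    if prefix not in local_prefixes:
        return False  # the common case: no server request at all
    candidates = fetch_full_hashes(prefix)
    return digest in candidates


# Stand-in for the server side: a blocklist with one entry.
blocklist = {full_hash("phishing.example/")}
local_prefixes = {h[:PREFIX_BYTES] for h in blocklist}
requests_seen = []  # records what the "server" was sent


def fake_server(prefix):
    requests_seen.append(prefix)
    return {h for h in blocklist if h.startswith(prefix)}


print(check_url("phishing.example/", local_prefixes, fake_server))
print(check_url("safe.example/", local_prefixes, fake_server))
print([p.hex() for p in requests_seen])
```

The point of the design is visible in `requests_seen`: the only thing that ever leaves the browser is a short prefix, and only when the local list already contains it.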
and then:
In addition to the URL-based checks described in the previous section, downloaded files undergo a second check after they’re fully downloaded by the browser, but before they’re made available to the user. This check involves sending metadata about the downloaded file (but not the file itself) to an application reputation service operated by Google. This service returns a verdict which determines which warning (if any) will be shown to the user.
I’m not familiar with coding, but I ran the metadata portion through ChatGPT and asked what it collects. Here is its summary:
This code defines a protocol buffer message (ClientDownloadRequest) that collects a wide range of metadata about a file download. In summary, it gathers:
- Download Information:
- URL: The final download URL after any redirects.
- File Length: The size of the file in bytes.
- Download Type: An enumeration indicating the nature of the file (e.g., Windows executable, Chrome extension, APK, etc.).
- Integrity and Security Checks:
- Cryptographic Digests: SHA-256, SHA-1, and MD5 digests for verifying file integrity.
- Digital Signature Information: Details on certificate chains, signed data, and extended attributes that help validate the authenticity and trustworthiness of the binary.
- Certificate Chains: A complete chain from the signing certificate to a trusted root, similar to what is used for SSL/TLS or code signing.
- Resource Tracking:
- Resources and Redirects: A repeated list of resources that include not only the download URL but also any intermediate URLs (redirects, tab URLs, plugin URLs) encountered during the download process.
- Network Details: Optional information like remote IP addresses and referrer URLs.
- File and Binary Specific Metadata:
- File Basename: The intended file name on the disk.
- Image Headers: Detailed headers for Windows Portable Executable (PE) files and Mac OS X Mach-O binaries, which can include headers, section data, and debug information.
- Archived Binary Details: Information about binaries or archives within an archive, including whether they are encrypted or executable.
- Additional Context:
- Locale: The locale of the device (e.g., en_US).
- User Interaction: A flag to indicate if the download was initiated by the user.
- Advanced Processing Info: Metadata for document processing (such as detecting macros in documents) and summaries of archive contents.
- Tailored Information: Data used by the server to provide customized safe browsing warnings or verdicts.
Overall, the structure is designed to collect comprehensive details not only about the file being downloaded but also about its source, security attributes, and contextual metadata—all of which can be used to analyze the file’s integrity, trustworthiness, and potential security risks.
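To make "metadata about the file, but not the file itself" a bit more concrete, here is a rough Python sketch. It is not the real ClientDownloadRequest schema: the `DownloadMetadata` class, its field names, and `collect_metadata` are hypothetical and only loosely mirror a few of the items in the summary above (URL, file name, length, SHA-256 digest, locale, user-initiated flag).

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class DownloadMetadata:
    """Hypothetical subset of download metadata; field names are assumptions."""
    url: str
    file_basename: str
    length: int
    sha256: str
    user_initiated: bool
    locale: str


def collect_metadata(path, url, user_initiated=True, locale="en_US"):
    """Describe a downloaded file without including its bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return DownloadMetadata(
        url=url,
        file_basename=os.path.basename(path),
        length=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        user_initiated=user_initiated,
        locale=locale,
    )
```

Calling `collect_metadata("/tmp/setup.exe", "https://downloads.example/setup.exe")` would return a small record with the name, size, and digest of the file. The file contents are read locally to compute the digest but are not part of the record.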
Ah, okay. I believe this is related to the metadata being shared.
It includes details like the URL, file size, and other attributes, which might technically be considered “some content.” However, based on what I have seen, it does not appear to expose any actual page content. Also, this metadata sharing only happens when downloading something, not when simply visiting a site.
That said, @Mattches or @fmarier would probably have a more definitive answer. I just wanted to do some research and share what little I could. Keep in mind that I am just another user with limited knowledge on the subject.
François Marier (@fmarier) replied to me on Twitter.
The highlighted string is wrong (we need to fix this). It comes from Chromium which also has client-side phishing detection. That’s what’s described in the highlighted sentence. We disable that part of Safe Browsing in Brave.
Thanks for providing the update. I’m glad you asked and that we were able to have this conversation, because it’s definitely educational and good to know.