Feature request: de-rank / unlist / hide paywalled pages

Problem

Sadly, the Internet seems to be becoming less and less open. More and more websites require logins or accounts to view their content. It is getting so egregious that probably about 20% of the websites I visit from Brave Search results show a paywall or login prompt instead of the content shown in the search result preview. (To be clear, I am not blaming the Brave Search algorithm; this may be the case with Google et al. as well.)

Solution

It would be awesome if Brave Search implemented a feature to detect, de-rank, or filter out search results that show a login prompt or paywall instead of the indexed content. Even a “Might be Paywalled / Requires Login” tag next to each such result would be an incredible user experience improvement.

Alternative Solution

I don’t know how websites manage to get indexed when they have paywalls or login prompts, but I imagine they strategically serve the content without these obstructions when the User-Agent header (or something else) indicates that the visitor is a search engine crawler rather than a normal browser.

If this is the case, maybe the Brave browser itself could detect this scenario, retry the navigation to the indexed URL with a spoofed User-Agent header, and render that result in the user’s browser. (Or at least ask whether the user wants this.)

@jonathanwilbur, this is a great request, and perhaps Brave will consider it in the future.

In the meantime, in case you’re unaware, users can create their own Goggles to handle cases like this. It’s likely something that would need to be managed by users rather than by Brave.

Here are some resources on how to create and use Goggles:

Brave is committed to not reranking or censoring results directly. They provide the most relevant results from their database, and it’s up to us as users to use tools like Goggles to customize or rerank those results.

I’m hoping they simplify the process of creating Goggles in the future, but for now, this is essentially what Brave has communicated when similar questions have been raised.

I understand the principle of not censoring results, but I think more than mere text matching is already going on when I search for something. For instance, when I perform a search, I start off using “Moderate” Safe Search, which presumably filters out some vulgar results or results with malware. So there is already a little “censorship” happening at some level, if you want to call it that; but it is understood that this is desirable as a default, and you can turn it off. I think it would be objectively desirable to add demotion of “walled” results to the default ranking algorithm, so users do not have to use Goggles to “hack” these results away.

That said, it sounds like Brave would be more open to merely tagging search results that use paywalls or login walls and defining Goggles rules by which these can be re-ranked. I looked into Goggles and I don’t see a feature that would let me do this, but it seems like it shouldn’t be too complicated to add. Adding recognition of this to the crawler itself and to the existing data might be a challenge, though…

While we’re on the subject, I also want to point out that Brave doesn’t use a web crawler in the way people tend to picture one for existing search engines like Google. Essentially, we users are the crawler, so long as we opt in. If you’re not familiar with what I mean, you may want to check out Brave’s article on the Web Discovery Project.

Essentially, it’s as Brave describes in its FAQ at https://brave.com/search/#censorship

  • Does Brave Search filter, downrank, or censor search results?

No, Brave Search does not filter, downrank, or censor search results. Nor will we change our search algorithm to increase or decrease the prominence of results in response to current events or anyone’s political, religious, ethical, or other beliefs. Brave Search—like Brave itself—is intended to be a user-first portal to the Web, free of Big Tech’s manipulation.

However, there is one exception to this rule: We do need to comply with laws governing search engines, including CSAM, copyright takedown (DMCA), right to be forgotten (GDPR), and nation-state orders.

Also note that, if you’ve chosen to enable it, Brave Search can check Google for fallback mixing in your browser. If you’ve enabled fallback mixing, and a result is censored, filtered, or re-ranked in Google, those changes would pass through to our results. You can easily see how often a third-party result is mixed (via our independence score), and our aim is to gradually reduce this mixing over time.

There’s nothing like that yet; we would have to create it. The key here is that someone would have to know all the websites that are paywalled and compile a list. Then you’d use the Goggles syntax to turn that list into a Goggle. For example, take a look at the Goggle someone set up to Downrank State Media - Germany at https://search.brave.com/goggles/profile?goggles_id=https%3A%2F%2Fraw.githubusercontent.com%2FAlpSantoGlobalMomentumLLC%2FBraveGoggles%2F4ec6a398154f14b6eebbe71f20b2a5b68f065bfb%2Fde_lib_conservative.Goggles

They boost some sites, such as:

$boost=10,site=ef-magazin.de
$boost=10,site=misesde.org

Then downrank others, such as:

$downrank=3,site=sueddeutsche.de
$downrank=3,site=zeit.de
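Since the list has to be compiled by hand anyway, turning it into a Goggle can at least be automated. Here is a minimal Python sketch, assuming you keep a plain list of domains you have personally found to be walled (the domains below are placeholders, not real findings). It emits one `$downrank` rule per domain in the same syntax as the example above. The `! name:` / `! description:` header lines and the 1 to 10 range for the rank value reflect my understanding of the Goggles format, so double-check Brave’s Goggles quickstart before publishing; `build_goggle` is just a hypothetical helper name.

```python
# Minimal sketch: turn a hand-maintained list of walled domains into Goggle text.
# Assumptions: Goggle metadata uses "! key: value" header lines, and the value
# given to $downrank must lie between 1 and 10.


def build_goggle(name: str, description: str, domains: list[str], downrank: int = 3) -> str:
    if not 1 <= downrank <= 10:
        raise ValueError("downrank must be between 1 and 10")
    lines = [
        f"! name: {name}",
        f"! description: {description}",
    ]
    seen = set()
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blank entries, "#" comments in the source list, and duplicates.
        if not domain or domain.startswith("#") or domain in seen:
            continue
        seen.add(domain)
        lines.append(f"$downrank={downrank},site={domain}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder domains for illustration only.
    example = ["paywalled-news.example", "Login-Wall.example", "", "paywalled-news.example"]
    print(build_goggle("Demote walled sites", "Downranks domains I found to be walled", example))
```

Using `$downrank` rather than removing results outright keeps walled pages reachable when they are the only source for something, which seems closer to the spirit of Brave’s no-censorship stance.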

I don’t think any of this would be easy to track automatically, especially since it would have to distinguish websites that require a paid subscription from those that just lock content behind a free account or something similar.
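That said, some publishers do label themselves. Google documents a schema.org structured-data convention for subscription and paywalled content: a gated article is marked with `"isAccessibleForFree": false` (often written as the string `"False"`), partly so that serving fuller content to the crawler than to readers is not treated as cloaking. Below is a rough Python sketch, assuming the page HTML has already been fetched, that checks its JSON-LD for that flag. It is only a partial signal: only cooperating publishers add it, it says nothing about free-account login walls, and the function names are hypothetical.

```python
# Minimal sketch: look for schema.org paywall markup in a page's JSON-LD.
# A True result means the publisher declared the content as not free;
# a False result proves nothing (most sites simply don't use the markup).
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def _marked_not_free(node) -> bool:
    """Recursively search parsed JSON-LD for isAccessibleForFree set to false."""
    if isinstance(node, dict):
        value = node.get("isAccessibleForFree")
        if value is False or (isinstance(value, str) and value.strip().lower() == "false"):
            return True
        return any(_marked_not_free(v) for v in node.values())
    if isinstance(node, list):
        return any(_marked_not_free(v) for v in node)
    return False


def declares_paywall(html: str) -> bool:
    collector = _JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common; ignore it
        if _marked_not_free(data):
            return True
    return False
```

A search engine could use a check like this to attach a “Might be Paywalled” label without maintaining any domain list, while still falling back on user Goggles for sites that don’t mark themselves.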

I see. I guess the index is receiving the previews from people who are logged in.

I still think it would be generally desirable for Goggles itself to have some kind of “tags” / “labels” feature so that you can filter on those tags. Tags like HasPaywall, RequiresLogin, VulgarSpeech, or Violence could then be filtered out generally, rather than users having to maintain a “list of every paywalled page.txt.” This would be useful for more than just this thread’s use case.
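To make the tags idea concrete, here is a hypothetical Python sketch of how tag rules could be applied to a result list. Nothing like this exists in Goggles today; the `Result` type, `apply_tag_rules`, and the multiplicative score penalty are all my own assumptions (Goggles’ real `$downrank` values are integers whose exact effect on scoring isn’t documented here). Tag names are the ones suggested above.

```python
# Hypothetical sketch of tag-based filtering; Goggles has no such feature today.
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    score: float
    tags: set[str] = field(default_factory=set)  # e.g. {"HasPaywall"}


def apply_tag_rules(results, discard=frozenset(), downrank=None):
    """Drop results carrying any discarded tag, multiply scores by a penalty
    factor for each downranked tag, then re-sort by score (highest first).
    Returns new Result objects; the input list is left unchanged."""
    downrank = downrank or {}
    kept = []
    for r in results:
        if r.tags & set(discard):
            continue
        score = r.score
        for tag in r.tags:
            score *= downrank.get(tag, 1.0)
        kept.append(Result(r.url, score, set(r.tags)))
    return sorted(kept, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    results = [
        Result("https://a.example", 0.9, {"HasPaywall"}),
        Result("https://b.example", 0.7),
        Result("https://c.example", 0.8, {"VulgarSpeech"}),
    ]
    ranked = apply_tag_rules(results, discard={"VulgarSpeech"}, downrank={"HasPaywall": 0.5})
    for r in ranked:
        print(r.url, r.score)
```

With these rules, the paywalled result drops from 0.9 to 0.45 and falls below the unwalled one, while the VulgarSpeech result is removed entirely. The point is that the user writes two rules instead of a per-domain list.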

But it does seem like creating a domains list is a somewhat viable technique until then.

Thank you for all your help!
