API output file question - Differentiating suspended sites

Hi all! @Mattches … Not sure who to tag in this question. Long post, sorry…

I use the API output file provided at https://publishers-distro.basicattentiontoken.org/api/v1/public/channels

While processing the data, I flag 404 Not Found, 403 Forbidden (for manual review), Account Suspended, 5xx unavailable (for manual review), and INVALID_HTTPS errors.
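
A minimal sketch of this kind of flagging, assuming the status code, response body, and TLS result have already been fetched for each site. The function name, the flag labels, and the body-text check for suspended accounts are all hypothetical, and the unavailable category is read here as any status from 500 to 599.

```python
from typing import Optional


def flag_for_response(status: int, body: str = "", tls_error: bool = False) -> Optional[str]:
    """Return a flag label for a fetched site, or None if nothing looks wrong.

    All labels are illustrative; they are not part of any Brave API.
    """
    if tls_error:
        # The TLS handshake or certificate check failed before any HTTP status existed.
        return "INVALID_HTTPS"
    if status == 404:
        return "NOT_FOUND"
    if status == 403:
        # Could be bot protection rather than a dead site, so review by hand.
        return "FORBIDDEN_REVIEW"
    if 500 <= status <= 599:
        # Might be a temporary outage, so review by hand.
        return "UNAVAILABLE_REVIEW"
    if "account suspended" in body.lower():
        # Assumption: the hosting provider serves a page containing this phrase.
        return "ACCOUNT_SUSPENDED"
    return None
```

Keeping the classification separate from the network code makes it easy to re-run the rules over saved responses when the criteria change.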

About 5%-10% of sites are being flagged for serious reasons.

My question: if the API file contains so many websites that violate the publisher TOS, how do I know what sites are “suspended” internally? Or is the API file only including approved channels, and if a channel is removed, is it removed from the API file?

There are going to be thousands of sites in clear violation of the Publisher TOS (many of which, I presume, just wanted the referral ID).

How you handle it is up to you. I just want my data to be accurate and to include only sites that meet publisher requirements.

You raise a good point. Essentially, we don’t have any process that checks whether a site is still verified after its initial creation. That would get us into some issues with some DNS-verified sites.

To answer your question

how do I know what sites are “suspended” internally?

We don’t currently expose this because suspended accounts can be unsuspended and should still be able to receive tips and contributions.

is the api file only including approved channels and if a channel is removed, is it removed from the API file?

Correct. The browser uses this channels endpoint to determine which sites are verified publishers and can receive tips to their publisher account.

One thing I mentioned to Maxence, who runs batgrowth.com, is that we will soon show only verified websites whose publishers have completed KYC (due to legal requirements around facilitating tips). I can’t give an estimated date because we’re still working through the messaging and PR, but it will happen before EOY.

@cory in the API file there are two true/false attributes (just after the domain). It appears that if the first one is false, then the publisher channel is not active.

I show approximately 37000 sites, 15000 are “not” verified publishers (if my findings are correct).
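
A rough sketch of how such a count could be produced, assuming each record is a list with the domain first and the two true/false attributes right after it. That layout is taken only from the description above, the real file may carry more fields, and the sample rows and function name are made up for illustration.

```python
from typing import List, Sequence, Tuple


def split_by_first_flag(rows: Sequence[Sequence]) -> Tuple[List[str], List[str]]:
    """Split domains by the first true/false attribute after the domain."""
    flag_true: List[str] = []
    flag_false: List[str] = []
    for row in rows:
        # Assumed layout: row[0] is the domain, row[1] is the first flag.
        domain, first_flag = row[0], row[1]
        if first_flag:
            flag_true.append(domain)
        else:
            flag_false.append(domain)
    return flag_true, flag_false


# Hypothetical sample data, not taken from the real API file.
sample_rows = [
    ["example.com", True, False],
    ["example.org", False, False],
    ["example.net", True, True],
]

active, inactive = split_by_first_flag(sample_rows)
print(len(active), len(inactive))
```

With real data, the rows would come from parsing the downloaded file instead of the inline sample.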

I will just go with what we have, and when API v2 is out I will be using that :+1:

I show approximately 37000 sites, 15000 are “not” verified publishers (if my findings are correct).

Ah yeah, that is the case. That’s to prevent auto-contribute from contributing to banking, government websites, etc.

We talked about this at Brave yesterday because a user raised the concern on Twitter that this is still happening for some websites that should be automatically excluded. I think we’ll automate this, but you can see the full list here: https://github.com/brave-intl/publishers/blob/staging/config/excluded_site_channels.yml

Well… BraveDB.com has reports for slow/404 not found/BAD SSL… I guess it makes sense to track excluded sites and make that searchable.

Thanks for the info/reply