Data w/ Storage Rights TOS

Our team is interested in using Brave’s search results to build a dataset for training open-source ML models. You offer a “Data w/ Storage Rights” subscription, which seems to imply we can do what we want with it, but the terms of service that we’re pointed to do not give any clarification on this plan:
https://api.search.brave.com/app/documentation/general/terms-of-service

Could you help clarify what the acceptable use is?

  1. Can we store the data?
  2. Can we train an ML model with the results of the data (i.e. the webpages)?
  3. Can we release said ML model to the public?
  4. Can we release said data to the public?

@oumi-balerion let me start by saying I don’t work for Brave and am very limited in what I can answer here. I’m going to tag in @steeven and @Mattches to see if either can help, but I want to ask whether you tried to email [email protected] as the website instructs?

In the interim, I at least want to make sure that you have seen all the information below.

https://brave.com/search/api/ does speak to this a bit, as you have mentioned. It says you’d have rights to store data, use for AI interface, etc.

Plus there’s:
https://brave.com/ai/using-brave-search-api/ and https://brave.com/ai/what-sets-brave-search-api-apart/ if you hadn’t browsed through them.

But for any conclusive answers, I’d say try the email or wait to see if either of the people from Brave I have tagged can follow up.

Hi, just to echo a few points from Saoiray

  • Can we store the data?

With the data storage plan option, yes.

  • Can we train an ML model with the results of the data (i.e. the webpages)?

The API responses can be used for ML training. ML training is becoming a popular use case for the Brave Search API.

  • Can we release said data to the public?

I am not 100% clear on how you intend to “release the data” from how the question was framed, but if there are specific questions related to this item and the terms of use, I recommend you connect directly with our Search API team at [email protected], who can go through questions regarding the specific use case this question is referring to. Our team works with our customers across an array of use cases and will be able to answer specifics of your use case with greater precision than I am able to here.

Hope this is helpful, and great to hear you’re considering using our API!

@Saoiray @luke.mulks Thanks! Yes, I sent emails to that address over a week ago, with no response yet.

Yeah, the public documentation implies that we should be good, but the license agreement seems to be written as if the storage rights offering never existed.

Regarding releasing the data, I’m generally referring to collecting URLs related to a particular search query, filtering them down to specific websites, pulling those websites’ data (within their own license constraints), then publishing the dataset, which will be a combination of:

  1. Search Query
  2. URLs associated with the query
  3. HTML associated with the URLs

It’s not clear what the specific guidelines are related to 1+2 (Brave doesn’t own 3) for storage rights.
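
To make the shape of that dataset concrete, here is a minimal Python sketch of one record and the domain-filtering step. All names (`DatasetRecord`, `filter_urls`, `build_record`, `fetch_html`) are hypothetical and only for illustration. Nothing here calls the Brave Search API, and `fetch_html` is a stand-in the caller would supply.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlparse


@dataclass
class DatasetRecord:
    """One row of the proposed dataset (hypothetical field names)."""
    query: str                                           # 1. search query
    urls: List[str]                                      # 2. URLs associated with the query
    html: Dict[str, str] = field(default_factory=dict)   # 3. URL -> HTML of that page


def filter_urls(urls: Iterable[str], allowed_domains: Iterable[str]) -> List[str]:
    """Keep URLs whose host is an allowed domain or a subdomain of one."""
    allowed = [d.lower() for d in allowed_domains]
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in allowed):
            kept.append(url)
    return kept


def build_record(
    query: str,
    result_urls: Iterable[str],
    allowed_domains: Iterable[str],
    fetch_html: Callable[[str], str],
) -> DatasetRecord:
    """Filter the result URLs, then attach HTML using a caller-supplied fetcher."""
    urls = filter_urls(result_urls, allowed_domains)
    return DatasetRecord(query=query, urls=urls, html={u: fetch_html(u) for u in urls})
```

In this sketch, fields 1 and 2 are the parts that would come from search results, while field 3 is fetched separately from each site under its own license.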

If there’s a way you can pass along that I’ve sent an email to the search team that’d be great!