Our team is interested in using Brave’s search results to build a dataset for training open-source ML models. You offer a “Data w/ Storage Rights” subscription, which seems to imply we can do what we want with the data, but the terms of service that we’re pointed to do not clarify what this plan allows: https://api.search.brave.com/app/documentation/general/terms-of-service
Could you help clarify what the acceptable use is?
Can we store the data?
Can we train an ML model with the results of the data (i.e. the webpages)?
@oumi-balerion let me start by saying I don’t work for Brave and am very limited in what I can help with here. I’m going to tag in @steeven and @Mattches to see if either can help, but I want to ask whether you tried emailing [email protected] as the website instructs?
Can we train an ML model with the results of the data (i.e. the webpages)?
The API responses can be used for ML training. ML training is becoming a popular use case for the Brave Search API.
Can we release said data to the public?
I am not 100% clear on how you intend to “release the data” from how the question was framed, but if you have specific questions about this item and the terms of use, I recommend you connect directly with our Search API team at [email protected], who can go through the questions for the specific use case you are referring to. Our team works with customers across an array of use cases and will be able to answer the specifics of yours with greater precision than I can here.
Hope this is helpful, and great to hear you’re considering using our API!
@Saoiray @luke.mulks Thanks! Yes, I sent emails to that address over a week ago, no response yet.
Yeah, the public documentation implies that we should be good, but the license agreement seems to be written as if the storage rights offering never existed.
Regarding releasing the data, I’m generally referring to collecting the URLs returned for a particular search query, filtering them down to specific websites, pulling those websites’ data (within their own license constraints), then publishing a dataset that combines:
1. Search Query
2. URLs associated with the query
3. HTML associated with the URLs
It’s not clear what the specific guidelines are related to 1+2 (Brave doesn’t own 3) for storage rights.
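To make the layout concrete, here is a minimal Python sketch of one dataset record and the domain filtering step. All names (`DatasetRecord`, `filter_urls`, `build_record`, `fetch_html`) are hypothetical and only for illustration; the sketch does not call the Brave Search API, and fetching is left to a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlparse


@dataclass
class DatasetRecord:
    """One row of the proposed dataset (hypothetical layout)."""
    query: str                                            # 1. search query
    urls: list[str] = field(default_factory=list)         # 2. URLs kept for the query
    html: dict[str, str] = field(default_factory=dict)    # 3. HTML keyed by URL


def filter_urls(urls: list[str], allowed_domains: set[str]) -> list[str]:
    """Keep URLs whose host is an allowed domain or a subdomain of one."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if any(host == d or host.endswith("." + d) for d in allowed_domains):
            kept.append(url)
    return kept


def build_record(
    query: str,
    urls: list[str],
    allowed_domains: set[str],
    fetch_html: Callable[[str], str],
) -> DatasetRecord:
    """Filter the URLs, then fetch HTML for each remaining URL.

    fetch_html is supplied by the caller (assumption), so each site's
    own license and robots constraints can be enforced there.
    """
    kept = filter_urls(urls, allowed_domains)
    return DatasetRecord(query=query, urls=kept, html={u: fetch_html(u) for u in kept})
```

Fields 1 and 2 are the parts that come from the search results; field 3 comes from the websites themselves.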
If there’s a way you can pass along that I’ve sent an email to the search team that’d be great!