Feature Request - Custom Context Length

I’m currently facing a challenge that I believe many of you may share. When using the Brave browser with my custom ollama server for web page queries via the Leo UI, the page content is automatically cropped to a default context length.

This cropping not only hurts the user experience but also undermines my RAG (Retrieval-Augmented Generation) middleware, which I use to improve response quality for these web page queries within a single platform.

The goal here is simple: give users of Brave’s Leo integration the ability to decide how much context their web page queries should include, without inadvertently cutting off information that would otherwise enrich responses from the ollama server (the conversational model) and from RAG-backed pipeline endpoints.
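For what it’s worth, the context window is already adjustable on the ollama side via the `num_ctx` option (per request, or with a `PARAMETER num_ctx` line in a Modelfile); what’s missing is for Leo to stop cropping the page content before the request is ever made. A minimal sketch, assuming a local ollama server on the default port and an installed model named `qwen2.5`:

```python
# Minimal sketch: ask a local ollama server to use a larger context
# window for a single chat request. Assumes ollama is listening on
# its default port (11434) and a model named "qwen2.5" is installed.
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5",
        "messages": [{"role": "user", "content": "Summarize this page: ..."}],
        "options": {"num_ctx": 32768},  # raise the context window for this call
        "stream": False,
    },
    timeout=120,
)
print(response.json()["message"]["content"])
```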

6 Likes

I ran into this limitation when summarizing a PDF of about 74k characters (~20k tokens), even though the model I’m using supports 128k tokens. This would be a great feature, and honestly a no-brainer if you’re going to support custom models.
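For context, that chars-to-tokens conversion is just a rough ballpark (about 4 characters per token for typical English text):

```python
# Rough heuristic only: English text averages roughly 4 characters
# per token, so a 74k-character PDF is on the order of 18-20k tokens,
# far below a 128k-token window.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return int(len(text) / chars_per_token)

print(estimate_tokens("x" * 74_000))  # -> 18500
```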

2 Likes

I was bitten by this today when attempting to summarize a medium-length blog post using ollama and qwen2.5, which has a 128K context window. I was surprised when Leo reported that only 56% of the content could be summarized.

Adding an optional “Custom context length” field to the “Bring your own model” form makes sense to me.

2 Likes

Please be assured that I’m a +1 on this request. I badly need such a feature: from time to time, a site’s cookie-consent boilerplate also counts toward the limit, so only a limited amount of useful “payload” actually gets posted to the LLM because of the character limits.
I would really appreciate your effort in this regard.
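If you’re running middleware in front of the LLM anyway, a rough stopgap like the sketch below could drop obvious consent-banner lines before the page text is sent; the patterns here are purely illustrative, not exhaustive:

```python
# Illustrative middleware step: drop lines that look like cookie-consent
# boilerplate so the context budget is spent on actual page content.
# The patterns are hypothetical examples, not a complete filter.
import re

CONSENT_PATTERNS = [
    r"we use cookies",
    r"accept all cookies",
    r"cookie settings",
    r"privacy preferences",
]

def strip_consent_boilerplate(page_text: str) -> str:
    kept = []
    for line in page_text.splitlines():
        lowered = line.lower()
        if any(re.search(p, lowered) for p in CONSENT_PATTERNS):
            continue  # skip lines that look like consent-banner text
        kept.append(line)
    return "\n".join(kept)
```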