I am not sure what the confusion is about, since this is a very common situation on the web today. It also happens on your phone. An example of picture-in-picture is when you try to switch to your messaging app because someone has texted you, but the video you are watching in YouTube, instead of minimizing to the background, floats on top of your screen and continues to play. On a cell phone this is the devil’s own work, because why obscure half the screen or more when I am obviously trying to do something else?
Another example is when you are using a navigation application and you need to minimize it. The app assumes you have no idea what you really want and keeps a screen on the front/top layer of the UI instead of allowing you to minimize it fully. The same behavior is seen on CNN and other websites with embedded video. This is not the same as autoplay, although many of these videos also autoplay, and I find that the picture-in-picture setting allows a video to autoplay even when you have specifically set autoplay to stop.
The solution on a cell phone is to disable the picture-in-picture option in the settings, which of course has to be done in the most problematic way: for each application individually.
However, this post is about the desktop browser behavior, which I am also having a problem with and which is why I came across this topic. When you scroll down a page, the video at the top of the page, whether it is playing or not, follows you down the page. It is like a floating CSS frame that stays fixed in place on the screen. It generally appears in a lower corner of the screen, but it is a nuisance on the same level as pop-up advertising. I am frustrated that the concept is so foreign to everyone answering these posts, because it means that no one from the Brave development team is paying attention to this, and I feel it is a big deal. It can be used for browser hijacking and malicious code. It is, I feel, one of the things Brave as a browser was designed to stop.
I have included two images so that you can see the actual frames we are referencing and hopefully this clears up any confusion.