Is PageGraph branch's basic Graph representation feature completed or still under development?

Hi,
I am following https://github.com/brave/brave-browser/wiki/PageGraph.

In the docs it says “PageGraph’s name comes from the graph-based representation of the document’s execution it builds in memory. Every relevant event in the document (a node being created, a network request being triggered, a script being executed, etc.) is recorded in the graph, noting both the relevant event, and the event’s cause.”

I built this project and exported a .graphml file for a webpage, but I could not see the graph representation of the DOM’s execution when opening the .graphml file in yEd, Gephi, or other GraphML viewers.

  1. Have I made a mistake in building brave-browser, exporting the .graphml file, or opening the .graphml file in an appropriate application?
  2. Or is this branch’s code base still under development, and the graph-based representation of the DOM’s execution not completed yet?

Can you please share more resources for learning about the PageGraph feature?

Hi @alamin19,

Welcome to the community, and thanks for writing in!

CC’ing my colleague @pes.

Hi @alamin19!

Excited to hear about your interest in PageGraph!

PageGraph is working well, and while it’s not 100% feature complete (it doesn’t handle, for example, module scripts, and it will ASSERT hard if it hits functionality it can’t correctly attribute, which we see on ~5% of websites), it’s been very useful and we’re using it for a variety of ongoing research projects.

I couldn’t say from the details provided whether you’ve built things or exported the file correctly, but if you can send me the graphml file you’ve generated, I’d be happy to take a look and see what I can tell from there.

Also, your message is a good nudge on my end that I should clean up the tooling we’re using to extract information from the graph. I expect you have all the data, but since those graphs grow very quickly, it may be tricky to understand how the graph encodes different operations.

Hi @pes,
Thank you for your reply.

What I Need:

  1. Graph-based representation of an HTML page, i.e. the DOM.
  2. Every relevant event in the document (a node being created, a network request being triggered, a script being executed, etc.) is recorded in the graph, noting both the relevant event, and the event’s cause.

What I did:

  1. Followed the instructions here and built the Brave browser.
    https://github.com/brave/brave-browser/wiki/PageGraph
  2. Loaded a very simple custom made HTML page and google.com home page in the brave browser.
  3. Exported .graphml file from the brave browser in the following steps:
    Right-click on the webpage => Inspect => Developer console window => Under the element menu there is a page graph sub-menu => In the page graph sub-menu there is a button “Save full page graph”, which exports a .graphml file.
  4. I tried to open this .graphml file using the yEd graph editor and got an error. [Attachment below]
  5. I was able to open the .graphml file in Gephi, but the output was not what I expected. It did not contain the details that I needed.

What I expected:
After exporting the .graphml file, I expected to be able to open it in a visualization tool and see the graph in detail: every relevant event in the document (a node being created, a network request being triggered, a script being executed, etc.) recorded in the graph, noting both the relevant event and the event’s cause.

Attaching:

  1. Sample HTML file, loaded in the Brave browser from localhost.
  2. Exported .graphml file for the sample HTML
  3. Exported .graphml file for the google.com home page
  4. The error shown by yEd when trying to open the .graphml file
  5. Image of the graph shown in Gephi. [This tool’s graph representation does not contain the details that I need]

I am uploading these files to Google Drive because I cannot upload .graphml files here.

https://drive.google.com/drive/folders/1l5GKp9iZvengLakuhBuwFfpZka7Knr3N?usp=sharing

Could you please tell me what I am missing here?

Hi @alamin19,

I tried opening the file you referenced in yEd and it seems to open fine for me. yEd gives annoying errors about not supporting longs and truncating them to uints (despite the graphml format supporting long), but it’s not an issue unless you have some really, really wild graphs (I’ve never seen a graph id approach MAX_INT).

I’ve attached a screenshot of yEd’s rendering of the graph.

The graph you attached includes all the events you described.

Nodes being created are depicted as edges of “edge type” “create node”, with the edge leaving the thing doing the creating (i.e. the parser or a script unit) and pointing at the node being created.

Event registrations are encoded as edges with “edge type” “add event listener”, with the edge leaving the script registering the event, and pointing at the DOM node having the event registered on it.

Script executions are encoded as an edge with “edge type” “execute”, leaving the HTML script element (or similar) that “owns” the code, and pointing at the Script Node (representing code independent of how it got on the page), etc.
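
If a viewer makes these edges hard to pick out, one option is to read the edge types straight from the file. The following is only a minimal sketch, not PageGraph tooling: it assumes the export uses the standard GraphML `<key>`/`<data>` mechanism with an edge attribute named “edge type” (the label used above). The function name `read_typed_edges`, the `SAMPLE` document, and its node ids are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(element):
    # Drop any "{namespace}" prefix so this works with or without an xmlns.
    return element.tag.rsplit("}", 1)[-1]


def read_typed_edges(graphml_text, attr_name="edge type"):
    """Return (edge_type, source_id, target_id) for every <edge> element.

    Assumption: the attribute is declared through a GraphML <key> whose
    attr.name equals attr_name, and stored in <data key="..."> children.
    """
    root = ET.fromstring(graphml_text)

    # GraphML refers to attributes by key id, so resolve the id(s) first.
    key_ids = {
        el.get("id")
        for el in root.iter()
        if local_name(el) == "key" and el.get("attr.name") == attr_name
    }

    edges = []
    for el in root.iter():
        if local_name(el) != "edge":
            continue
        edge_type = None
        for child in el:
            if local_name(child) == "data" and child.get("key") in key_ids:
                edge_type = (child.text or "").strip()
        edges.append((edge_type, el.get("source"), el.get("target")))
    return edges


# Hypothetical, hand-written sample; a real export is far larger.
# n1 = parser, n2 = HTML script element, n3 = script node, n4 = some DOM node.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="edge type" attr.type="string"/>
  <graph edgedefault="directed">
    <node id="n1"/><node id="n2"/><node id="n3"/><node id="n4"/>
    <edge id="e1" source="n1" target="n2"><data key="d0">create node</data></edge>
    <edge id="e2" source="n2" target="n3"><data key="d0">execute</data></edge>
    <edge id="e3" source="n1" target="n4"><data key="d0">create node</data></edge>
    <edge id="e4" source="n3" target="n4"><data key="d0">add event listener</data></edge>
  </graph>
</graphml>"""

edges = read_typed_edges(SAMPLE)
counts = Counter(edge_type for edge_type, _, _ in edges)
for edge_type, count in counts.most_common():
    print(edge_type, count)
```

For a real export, the same function could be called on the file contents, e.g. `read_typed_edges(open(path, encoding="utf-8").read())`, where `path` is wherever the .graphml file was saved. Each tuple reads in the direction described above: the source is the actor (parser, script, or script element) and the target is the thing acted on.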

Hope this helps! If you have further questions, just let me know.