What is: Context-aware Visual Attention-based (CoVA) webpage object detection pipeline?
Source | CoVA: Context-aware Visual Attention for Webpage Information Extraction |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Context-Aware Visual Attention-based end-to-end pipeline for Webpage Object Detection (CoVA) aims to learn a function f that predicts labels y = [y_1, y_2, ..., y_N] for a webpage containing N elements. The input to CoVA consists of:
- a screenshot of a webpage,
- a list of bounding boxes [x, y, w, h] of the web elements, and
- neighborhood information for each element obtained from the DOM tree.
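To make the shape of these three inputs concrete, the following is a minimal sketch of a container for one webpage. The names WebpageInput, BBox, and num_elements are hypothetical and chosen for illustration only; the source does not specify how the inputs are stored, and representing the neighborhood information as lists of element indices is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a bounding box is (x, y, w, h) in screenshot pixel coordinates.
BBox = Tuple[int, int, int, int]


@dataclass
class WebpageInput:
    """Hypothetical container for the three CoVA inputs of one webpage."""

    screenshot_path: str        # path to the rendered screenshot image
    bboxes: List[BBox]          # one bounding box per web element
    neighbors: List[List[int]]  # for element i, indices of its neighboring elements

    def __post_init__(self) -> None:
        # Every web element needs exactly one box and one neighbor list.
        if len(self.bboxes) != len(self.neighbors):
            raise ValueError("bboxes and neighbors must have the same length")
        n = len(self.bboxes)
        for ids in self.neighbors:
            if any(k < 0 or k >= n for k in ids):
                raise ValueError("neighbor index out of range")

    @property
    def num_elements(self) -> int:
        """N, the number of web elements on the page."""
        return len(self.bboxes)


page = WebpageInput(
    screenshot_path="page.png",
    bboxes=[(0, 0, 100, 20), (0, 30, 100, 40), (110, 30, 50, 40)],
    neighbors=[[1, 2], [0, 2], [0, 1]],
)
```

Here the page has N = 3 elements, and each element lists the other two as its neighbors.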
This information is processed in four stages:
- the graph representation extraction for the webpage,
- the Representation Network (RN),
- the Graph Attention Network (GAT), and
- a fully connected (FC) layer.
The graph representation extraction computes, for every web element i, its set N_i of K neighboring web elements. The RN consists of a Convolutional Neural Network (CNN) and a positional encoder, and learns a visual representation for each web element i ∈ {1, ..., N}. The GAT combines the visual representation of the web element i to be classified with those of its neighbors k ∈ N_i to compute the contextual representation of web element i. Finally, the visual and contextual representations of the web element are concatenated and passed through the FC layer to obtain the classification output.
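The last two stages can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the paper's implementation: the visual representations are taken as given (standing in for the RN output), the attention score is a plain dot product rather than the learned attention of a real GAT, and the function names and FC parameters are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_representation(visual: List[Vector], i: int, neighbor_ids: List[int]) -> Vector:
    """Attention-weighted sum of the neighbors' visual representations.

    Assumption: dot-product scores replace the learned GAT attention.
    """
    dim = len(visual[i])
    if not neighbor_ids:
        return [0.0] * dim  # no context available
    weights = softmax([dot(visual[i], visual[k]) for k in neighbor_ids])
    return [sum(w * visual[k][d] for w, k in zip(weights, neighbor_ids)) for d in range(dim)]


def classify(visual: List[Vector], neighbors: List[List[int]],
             fc_weights: List[Vector], fc_bias: Vector) -> List[int]:
    """Concatenate visual and contextual representations, then apply one FC layer."""
    labels = []
    for i, neighbor_ids in enumerate(neighbors):
        context = contextual_representation(visual, i, neighbor_ids)
        features = visual[i] + context  # list concatenation: [visual ; contextual]
        logits = [dot(row, features) + b for row, b in zip(fc_weights, fc_bias)]
        labels.append(max(range(len(logits)), key=logits.__getitem__))
    return labels


# Toy example: 3 elements, 2-dimensional visual representations, 2 classes.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1, 2], [0, 2], [0, 1]]
fc_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # hypothetical parameters
fc_bias = [0.0, 0.0]
labels = classify(visual, neighbors, fc_weights, fc_bias)
```

Because the FC layer sees the concatenated vector, its input size is twice the visual representation size. In the actual pipeline the attention and FC parameters would be learned jointly with the RN rather than fixed by hand as above.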