Segmentation
The segmentation module divides a web page into smaller parts based on both tag properties and visual cues. It is an extended version of the Vision Based Page Segmentation Algorithm (VIPS); our main contribution is to extend both its tag set and its rule set. When this module segments a web page, it generates a block tree of the page. This tree is structurally similar to the underlying DOM tree, but it captures the hierarchy of visual elements on the page.
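To make the idea of a block tree more concrete, the following is a minimal sketch of one possible representation. It is not the eMine data model: the `Block` class, its fields, the `leaf_blocks` helper, and the block names such as `B1.1` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One visual block; field names are assumptions, not the eMine schema."""
    name: str                      # hypothetical block identifier
    tag: str                       # tag of the DOM element the block came from
    children: List["Block"] = field(default_factory=list)


def leaf_blocks(block: Block) -> List[Block]:
    """Return the leaf blocks of the tree in left-to-right order."""
    if not block.children:
        return [block]
    leaves: List[Block] = []
    for child in block.children:
        leaves.extend(leaf_blocks(child))
    return leaves


# A tiny hand-built tree standing in for the output of segmentation.
page = Block("B1", "body", [
    Block("B1.1", "div", [Block("B1.1.1", "h1"), Block("B1.1.2", "ul")]),
    Block("B1.2", "div"),
])
print([b.name for b in leaf_blocks(page)])
```

Like a DOM tree, the structure is a plain hierarchy of nodes; the difference described above is that each node stands for a visual block rather than a single element.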
Further Reading: eMine Deliverable 2 and eMine Deliverable 4
Available at: ACTF – eMine Page
Heuristic Role Detection
After a web page is segmented, the heuristic role detection module labels the visual elements with their corresponding roles in the page. The system includes a rule generator, which converts the attributes of the roles in the knowledge base into rules, and a role detector, which applies these rules to the visual elements to detect their roles. The attributes of the roles are stored in an ontology. The system also uses the Jess Rule Engine for role detection.
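The split between rule generator and role detector can be sketched roughly as follows. This is a plain Python stand-in, not the actual implementation: the real system stores role attributes in an ontology and uses the Jess Rule Engine, whereas here the knowledge base is assumed to be a simple dictionary, rules are assumed to be exact attribute matches, and the names `generate_rules`, `detect_role`, and the example roles and attributes are hypothetical.

```python
def generate_rules(knowledge_base):
    """Turn each role's attributes into a predicate over a visual element.

    Assumption: a rule fires when every attribute listed for the role
    has the same value on the element.
    """
    rules = {}
    for role, attributes in knowledge_base.items():
        # Bind attributes as a default argument so each rule keeps its own copy.
        def rule(element, attributes=attributes):
            return all(element.get(key) == value
                       for key, value in attributes.items())
        rules[role] = rule
    return rules


def detect_role(element, rules):
    """Return the first role whose rule matches the element, else None."""
    for role, rule in rules.items():
        if rule(element):
            return role
    return None


# Hypothetical knowledge base and visual element.
knowledge_base = {
    "Header": {"tag": "div", "position": "top"},
    "Footer": {"tag": "div", "position": "bottom"},
}
rules = generate_rules(knowledge_base)
print(detect_role({"tag": "div", "position": "bottom", "name": "B1.3"}, rules))
```

Keeping the role attributes separate from the matching logic means a new role can be described in the knowledge base without changing the detector, which is the design the paragraph above outlines.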
Further Reading: eMine Deliverable 3 and eMine Deliverable 4
The source code will be available soon via ACTF, but in the meantime, if you would like to access this component, please get in touch with us.
eMine Scanpath Analysis Algorithm
The eMine Scanpath Analysis Algorithm takes a set of scanpaths and returns a scanpath that is common to all of them. It first finds the two most similar scanpaths in the given list, using the Levenshtein distance, the traditional string-edit algorithm. It then removes these two scanpaths from the list and adds their common scanpath to the list in their place. This continues until only one scanpath remains. The algorithm also generates a JSON file which stores the common scanpath of a web page.
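The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the eMine source code: each scanpath is assumed to be a sequence of block names, and because the text does not say how the common scanpath of two scanpaths is derived, the sketch substitutes their longest common subsequence. The function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic string-edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def common_of_two(a, b):
    """Assumed merge step: longest common subsequence of a and b."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover the subsequence itself.
    out, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def common_scanpath(scanpaths):
    """Repeatedly merge the two most similar scanpaths until one remains."""
    pool = [list(p) for p in scanpaths]
    if not pool:
        raise ValueError("at least one scanpath is required")
    while len(pool) > 1:
        i, j = min(combinations(range(len(pool)), 2),
                   key=lambda ij: levenshtein(pool[ij[0]], pool[ij[1]]))
        merged = common_of_two(pool[i], pool[j])
        pool = [p for k, p in enumerate(pool) if k not in (i, j)]
        pool.append(merged)
    return pool[0]


print(common_scanpath(["ABCD", "ABD", "ACBD"]))
```

Each pass removes two scanpaths and adds one, so the list shrinks by one per iteration and the loop ends after one fewer pass than the number of input scanpaths.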
Further Reading: eMine Deliverable 5 and eMine Deliverable 6
The source code will be available soon via ACTF, but in the meantime, if you would like to access this component, please get in touch with us.
eMine Transcoding Module
As the final step of the eMine project, the transcoding module combines the segmentation and role detection modules with the output of the eMine Scanpath Analysis Algorithm. The transcoding application provides two modes. In the first mode, the web page is transcoded with respect to its common scanpath, which the user loads as a JSON file. This file consists of the names of the blocks in the common scanpath. The other mode is based on the roles of visual elements. Using a general role path, the visual elements are grouped under their roles, and the transcoding application follows the order of roles in the common role path.
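The two modes can be sketched as two ordering functions. This is only an illustration: the JSON file is assumed to hold a flat list of block names, each block is assumed to be a dictionary with `name` and `role` keys, blocks that do not appear in the scanpath or role path are simply left out, and the function names are hypothetical. The actual file schema and transcoding behaviour may differ.

```python
import json


def order_by_scanpath(blocks, scanpath_json):
    """Mode 1: order blocks by the common scanpath loaded from JSON."""
    names = json.loads(scanpath_json)  # assumed shape: ["B1.2", "B1.1", ...]
    by_name = {block["name"]: block for block in blocks}
    ordered, seen = [], set()
    for name in names:
        # Skip unknown names and emit a revisited block only once.
        if name in by_name and name not in seen:
            ordered.append(by_name[name])
            seen.add(name)
    return ordered


def order_by_roles(blocks, role_path):
    """Mode 2: group blocks under their roles, then follow the role path."""
    groups = {}
    for block in blocks:
        groups.setdefault(block["role"], []).append(block)
    ordered = []
    for role in dict.fromkeys(role_path):  # keep order, drop repeated roles
        ordered.extend(groups.get(role, []))
    return ordered


# Hypothetical blocks as they might come out of segmentation and role detection.
blocks = [
    {"name": "B1.1", "role": "Header"},
    {"name": "B1.2", "role": "Menu"},
    {"name": "B1.3", "role": "Article"},
]
print([b["name"] for b in order_by_scanpath(blocks, '["B1.3", "B1.1"]')])
print([b["name"] for b in order_by_roles(blocks, ["Article", "Menu", "Header"])])
```

In both modes the blocks themselves are unchanged; only the order in which they are presented differs, driven by either the scanpath or the role path.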
Further Reading: eMine Deliverable 7 and eMine Deliverable 8
The source code will be available soon via ACTF, but in the meantime, if you would like to access this component, please get in touch with us.