The document was partitioned into several topic blocks by parsing it into a DOM (Document Object Model) tree and comparing the semantic similarity of its parts.
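- When the page source file is parsed, the document's structural information is used to build a DOM tree, and the document is then divided into topics on that basis.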
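
The idea in the sentence above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any particular paper: the text of block-level elements stands in for the leaves of a DOM tree, "semantic similarity" is approximated by bag-of-words cosine similarity, and the names `BLOCK_TAGS`, `BlockCollector`, `partition_topics`, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags mark the boundaries of candidate blocks.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td"}


class BlockCollector(HTMLParser):
    """Collect the text of each block-level element in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Normalize whitespace and store the buffered text if non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a semantic representation."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def partition_topics(html, threshold=0.2):
    """Merge adjacent blocks whose similarity reaches the (assumed) threshold."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    topics = []
    for block in collector.blocks:
        if topics and cosine(bag_of_words(topics[-1][-1]), bag_of_words(block)) >= threshold:
            topics[-1].append(block)
        else:
            topics.append([block])
    return topics


sample = (
    "<div><p>cats eat fish</p><p>cats chase fish</p>"
    "<p>stocks fell sharply today</p></div>"
)
topics = partition_topics(sample)
```

Here the first two paragraphs share vocabulary and end up in one topic block, while the third starts a new one. A fuller version would keep the tree structure (for example, preferring to merge siblings under the same parent) and use a stronger similarity measure.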