Abstract | Automatically assessing the quality of health related Web pages is an emerging method for assisting consumers in evaluating online health information. We propose a rule-based method of detecting technical criteria for automatic quality assessment in this paper. Firstly, we defined corresponding measurable indicators for each criterion with the indicator value and expected location. Then candidate lines that may contain indicators are extracted by matching the indicator value with the content of a Web page. The actual location of a candidate line is detected by analyzing the Web page DOM tree. The expression pattern of each candidate line is identified by regular expressions. Each candidate line is classified into a criterion according to rules for matching location and expression patterns. The occurrences of criteria on a Web page are summarized based on the results of line classification. The performance of this rule-based criteria detection method is tested on two data sets. It is also compared with a direct criteria detection method. The results show that the overall accuracy of the rule-based method is higher than that of the direct detection method. Some criteria, such as authors name, authors credential and authors affiliation, which were difficult to detect using the direct detection method, can be effectively detected based on location and expression patterns. The approach of rule-based detecting criteria for assessing the quality of health Web pages is effective. Automatic detection of technical criteria is complementary to the evaluation of content quality, and it can contribute in assessing the comprehensive quality of health related Web sites. |
---|