Online news web text extraction based on modified maximum subsequence segmentation
Priyanka Gangurde, Dipali Rakh, Sushant Valvi
In our daily lives we use the web as our main source of information, and we often read online news on web pages. News pages, however, carry a large number of advertisements and external links, which makes it complicated to extract the news from the original HTML document. Besides the article itself, they contain a lot of extra content such as advertisements, headers, footers, external links and navigation bars that is not useful to users. Because this redundant and irrelevant information is distributed and mixed throughout the page, it is hard to identify the useful information automatically. This not only increases the cost of searching web pages but also makes the pages difficult to use on small display devices. In this paper, we propose the maximum subsequence segmentation algorithm to extract the news from a web page and convert it into multiple languages. Using this algorithm, we obtain accurate extraction results.
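To illustrate the general idea behind maximum subsequence segmentation, the following is a minimal sketch and not the modified algorithm proposed in this paper. It assumes the page is split into tag tokens and word tokens, that each word receives a positive score and each tag a negative one, and that the news body is the contiguous run of tokens with the largest total score. The function names and the score values (`tag_penalty`, `word_reward`) are hypothetical choices made for illustration only, and the conversion into multiple languages is not covered.

```python
import re


def tokenize(html):
    """Split an HTML string into tag tokens (<...>) and word tokens."""
    return re.findall(r"<[^>]+>|[^<\s]+", html)


def score(token, tag_penalty=-3.25, word_reward=1.0):
    """Assumed scoring: tags are penalised, words are rewarded."""
    return tag_penalty if token.startswith("<") else word_reward


def max_subsequence(scores):
    """Return (start, end) of the contiguous run with the largest sum.

    This is the classic linear-time maximum subarray scan; `end` is exclusive.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the run here.
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_start, best_end


def extract_main_text(html):
    """Keep only the words inside the highest-scoring token run."""
    tokens = tokenize(html)
    start, end = max_subsequence([score(t) for t in tokens])
    return " ".join(t for t in tokens[start:end] if not t.startswith("<"))


page = (
    '<html><body><div><a href="#">Home</a><a href="#">Ads</a></div>'
    "<p>The city council approved the new budget on Monday after a long debate.</p>"
    '<div><a href="#">Contact</a></div></body></html>'
)
print(extract_main_text(page))
```

In this toy page the navigation and footer links are dense in tags and sparse in words, so they fall outside the selected run, while the article paragraph is kept. On real news pages the scores would need tuning, since a fixed penalty is only an assumption here.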