International Journal of Multidisciplinary Research and Development

ISSN Online: 2349-4182
ISSN Print: 2349-5979


Vol. 2, Issue 4 (2015)

Online news web text extraction based on modified maximum subsequence segmentation


Priyanka Gangurde, Dipali Rakh, Sushant Valvi

In our daily lives, we use the web as our main source of information, and we often search web pages for online news. News web pages, however, carry a large amount of extra content, such as advertisements, headers, footers, external links, and navigation bars, that is of no use to the reader and makes it complicated to extract the news from the original HTML document. Because this redundant and irrelevant information is distributed and mixed throughout the page, it is hard for users to identify the useful information automatically. This not only increases the cost of searching web pages but also creates difficulties for users of small-display devices. In this paper, we propose a maximum subsequence segmentation algorithm to extract the news from a web page and convert it into multiple languages. Using the maximum subsequence segmentation algorithm, we obtain accurate results.
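The abstract does not describe the modified algorithm in detail, so the following is only a minimal sketch of the general maximum subsequence idea, not the authors' method. It assumes that each HTML tag token receives a negative score and each word token a positive score, and that the news text is the contiguous token run with the largest total score. The score values and the names `TAG_SCORE`, `WORD_SCORE`, `tokenize`, `max_subsequence`, and `extract_main_text` are hypothetical and chosen for illustration.

```python
import re

# Hypothetical scores, assumed for illustration (not taken from the paper).
TAG_SCORE = -3.25
WORD_SCORE = 1.0

# A token is either a whole HTML tag or a run of non-space, non-tag characters.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html):
    """Split an HTML string into tag tokens and word tokens."""
    return TOKEN_RE.findall(html)


def max_subsequence(scores):
    """Return (start, end) of the contiguous run with the largest sum.

    `end` is exclusive. This is Kadane's algorithm, linear in len(scores).
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive prefix cannot help; restart the run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_start, best_end


def extract_main_text(html):
    """Keep the words inside the highest-scoring token run."""
    tokens = tokenize(html)
    scores = [TAG_SCORE if t.startswith("<") else WORD_SCORE for t in tokens]
    start, end = max_subsequence(scores)
    return " ".join(t for t in tokens[start:end] if not t.startswith("<"))


if __name__ == "__main__":
    page = (
        '<div><a href="#">Home</a><a href="#">Ads</a></div>'
        "<p>The city council approved the new budget on Monday "
        "after a long debate.</p>"
        '<div><a href="#">Contact</a></div>'
    )
    print(extract_main_text(page))
```

In this toy page, the navigation and footer links are tag-dense and score poorly, while the paragraph is word-dense, so the selected run covers only the paragraph text. A practical system would likely need learned or tuned scores rather than the fixed constants assumed here.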
Pages: 10-12