VOL. 9, ISSUE 7 (2022)
Text analytics on Covid-19 sentiment in Malaysia using K-means clustering approach
Authors
Ashraff Ruslan, Norshahida Shaadan, Muhammad Khalis Abdul Karim
Abstract
Text clustering is recognized as a useful technique for grouping unstructured data. One of the main sources of unstructured data is digital media, which people usually access through the internet. For example, the English Wikipedia includes 6,295,065 articles and averages 593 new articles per day. This study therefore applies a web mining technique to cluster text data related to COVID-19 (Cov-19) sentiment from digital media. Twenty articles related to Malaysian sentiment on Cov-19 were extracted using the Python language. Feature selection and term weighting were performed using the Term Frequency-Inverse Document Frequency (TF-IDF) approach. K-means clustering was then applied to the text using the sklearn module in Python. This approach was able to cluster the 20 articles into 6 main themes: the period in which the Malaysian government acted on Cov-19 issues, Cov-19 death cases, the timeline of the pandemic, the origin of the virus, diagnosis and treatment measures for Cov-19, and the medical institutions that handled the cases. In conclusion, the proposed clustering technique identified the topics on which Cov-19 was discussed in digital media. This approach can be extended to more in-depth research areas such as sentiment analysis and building corpora in other fields.
Pages: 43-49
How to cite this article:
Ashraff Ruslan, Norshahida Shaadan, Muhammad Khalis Abdul Karim. "Text analytics on Covid-19 sentiment in Malaysia using K-means clustering approach". International Journal of Multidisciplinary Research and Development, Vol 9, Issue 7, 2022, Pages 43-49
