Design and Develop Semantic Textual Document Clustering Model
Abstract
The utilization of textual documents is spontaneously increasing over the internet, email, web pages, reports, journals, articles and they stored in the electronic database format. It is challenging to find and access these documents without proper classification mechanisms. To overcome such difficulties we proposed a semantic document clustering model and develop this model. The document pre-processing steps, semantic information from WordNet help us to be bioavailable the semantic relation from raw text. By reminding the limitation of traditional clustering algorithms on the natural language, we consider semantic clustering by COBWEB conceptual clustering. Clustering quality and high accuracy were one of the most important aims of our research, and we chose F-Measure evaluation for ensuring the purity of clustering. However, there still exist many challenges, like the word, high spatial property, extracting core linguistics from texts, and assignment adequate description for the generated clusters. By the help of Word Net database, we eliminate those issues. In this research paper, there have a proposed framework and describe our development evaluation with evaluation.
Full Text: PDF DOI: 10.15640/jcsit.v5n2a4
Abstract
The utilization of textual documents is spontaneously increasing over the internet, email, web pages, reports, journals, articles and they stored in the electronic database format. It is challenging to find and access these documents without proper classification mechanisms. To overcome such difficulties we proposed a semantic document clustering model and develop this model. The document pre-processing steps, semantic information from WordNet help us to be bioavailable the semantic relation from raw text. By reminding the limitation of traditional clustering algorithms on the natural language, we consider semantic clustering by COBWEB conceptual clustering. Clustering quality and high accuracy were one of the most important aims of our research, and we chose F-Measure evaluation for ensuring the purity of clustering. However, there still exist many challenges, like the word, high spatial property, extracting core linguistics from texts, and assignment adequate description for the generated clusters. By the help of Word Net database, we eliminate those issues. In this research paper, there have a proposed framework and describe our development evaluation with evaluation.
Full Text: PDF DOI: 10.15640/jcsit.v5n2a4
Browse Journals
Journal Policies
Information
Useful Links
- Call for Papers
- Submit Your Paper
- Publish in Your Native Language
- Subscribe the Journal
- Frequently Asked Questions
- Contact the Executive Editor
- Recommend this Journal to Librarian
- View the Current Issue
- View the Previous Issues
- Recommend this Journal to Friends
- Recommend a Special Issue
- Comment on the Journal
- Publish the Conference Proceedings
Latest Activities
Resources
Visiting Status
Today | 118 |
Yesterday | 258 |
This Month | 4979 |
Last Month | 7528 |
All Days | 1465106 |
Online | 20 |