
Topic Detection by Clustering Keywords

The goal of this project is to find the prominent topics in a collection of documents. We propose a system that detects topics using an efficient method known as a topic model. A topic model is a type of statistical model for discovering the topics that occur in a collection of documents. One would expect particular words to appear in a document more or less frequently depending on its subject: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear more often in documents about cats, and “the” and “is” will appear about equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words.

The proposed system captures this intuition in a mathematical framework and uses it to examine the topics of a given set of documents. It extracts the keywords that occur often, clusters these keywords using a clustering algorithm, and detects the topics of the collection from the resulting clusters. The system takes the co-occurrence of terms into account, which improves the results.

The system can be useful for web crawlers and for web users, whom it helps to search easily for information on a particular topic. When a user searches for a topic, the system extracts from the set of documents the keywords that match the topic name entered by the user, clusters those keywords, and returns the topic-related information. Web users therefore get information quickly on the topic they are searching for.
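The pipeline described above (extract frequent keywords, count their co-occurrences, cluster them) can be sketched roughly as follows. This is a minimal Python illustration, not the project's C# implementation. The stop-word list, the thresholds `min_docs` and `min_cooc`, the sample documents and all function names are assumptions made for this example. The description does not name a specific clustering algorithm, so the sketch simply groups keywords that co-occur often enough (connected components) as a stand-in.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, used only for this sketch.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "on", "for", "at"}

def tokenize(text):
    """Lower-case the text and return its words, without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def frequent_keywords(docs, min_docs=2):
    """Keep the words that appear in at least min_docs documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {w for w, n in doc_freq.items() if n >= min_docs}

def cooccurrence(docs, keywords):
    """Count, for each keyword pair, the documents containing both words."""
    pairs = Counter()
    for doc in docs:
        present = sorted(set(tokenize(doc)) & keywords)
        pairs.update(combinations(present, 2))
    return pairs

def cluster_keywords(keywords, pairs, min_cooc=2):
    """Group keywords linked by at least min_cooc co-occurrences (union-find)."""
    parent = {w: w for w in keywords}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for (a, b), n in pairs.items():
        if n >= min_cooc:
            parent[find(a)] = find(b)

    clusters = {}
    for w in keywords:
        clusters.setdefault(find(w), set()).add(w)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

def detect_topics(docs):
    """Run the whole sketch: keywords, co-occurrence counts, clusters."""
    keywords = frequent_keywords(docs)
    return cluster_keywords(keywords, cooccurrence(docs, keywords))

# Made-up sample documents for illustration.
DOCS = [
    "The dog chewed a bone",
    "A dog buried the bone in the garden",
    "The cat said meow",
    "A cat will meow at the door",
]

print(detect_topics(DOCS))
```

On these sample documents the sketch yields two keyword clusters, one containing "bone" and "dog" and one containing "cat" and "meow", each of which can be read as a topic. A real system would likely replace the connected-components step with a proper clustering algorithm over the co-occurrence statistics.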

Features:

  • The system extracts keywords and uses a clustering algorithm to discover the topics of a particular set of documents.

  • The system takes the co-occurrence of terms into account, which improves the results.

  • The system extracts the keywords that occur often, clusters these keywords using a clustering algorithm, and detects the topics of a collection of documents.

  • The system helps web users to search easily for information on a particular topic.

  • The system uses a method known as a topic model. A topic model is a type of statistical model for discovering topics in a collection of documents.
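To make the topic-model idea concrete, here is a small Python sketch of the arithmetic behind the example of a document that is 10% about cats and 90% about dogs. The per-topic word probabilities in `TOPICS` are invented for illustration, and the function name is hypothetical. The only point shown is that mixing topics in a 90/10 proportion gives roughly nine times as much probability to dog words as to cat words, while a common word such as "the" is unaffected.

```python
# Hypothetical per-topic word probabilities (each topic sums to 1).
TOPICS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "the": 0.3},
    "cats": {"cat": 0.4, "meow": 0.3, "the": 0.3},
}

def word_probabilities(mixture):
    """Combine topic word distributions, weighted by the topic proportions."""
    probs = {}
    for topic, weight in mixture.items():
        for word, p in TOPICS[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs

# A document that is 90% about dogs and 10% about cats.
probs = word_probabilities({"dogs": 0.9, "cats": 0.1})

dog_mass = probs["dog"] + probs["bone"]  # probability of a dog word
cat_mass = probs["cat"] + probs["meow"]  # probability of a cat word
ratio = dog_mass / cat_mass              # about 9, up to rounding error

print(round(ratio, 6), round(probs["the"], 6))
```

A topic model works in the opposite direction: it starts from the observed word counts and estimates the topics and their proportions.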

Feasibility Study

The system will extract the keywords that occur often in a collection of documents, cluster them using a clustering algorithm, and detect the topics of the collection.

  • Economic Feasibility

This system will help web users to search easily for information on a particular topic, and it will be useful for web crawlers. It can provide economic benefits for many websites. Economic feasibility includes identifying and quantifying all of the expected benefits.

  • Operational Feasibility

The system is designed to be reliable, maintainable, affordable and producible. These parameters were considered during the design and development of the project, and engineering and management effort was applied appropriately and on time during those phases to meet them.

  • Technical Feasibility

The back end of this project is SQL Server, which stores the parameters related to the project. The hardware requirements for running the application are basic. The system is developed in C# on the .NET Framework. The application will be online, so it can be accessed from any device, such as a personal computer, a laptop or some handheld devices.

Advantages

  • The system takes the co-occurrence of terms into account, which improves the results.

  • The system helps web users to search easily for information on a particular topic.

  • Web users get information quickly on the topic they are searching for.

Disadvantages:

  • The system extracts single words rather than phrases. If the system extracted phrases, topic detection would be faster.
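As a rough illustration of what phrase extraction could look like, the following Python sketch counts two-word phrases (bigrams) and keeps those that occur repeatedly. It is not part of the described system. The function name, the `min_count` threshold and the sample documents are assumptions for this example, and the sketch does not filter stop words.

```python
import re
from collections import Counter

def frequent_phrases(docs, min_count=2):
    """Return the two-word phrases that occur at least min_count times."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(zip(words, words[1:]))  # adjacent word pairs
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}

# Made-up sample documents for illustration.
docs = [
    "machine learning for topic detection",
    "topic detection with machine learning",
]
phrases = frequent_phrases(docs)
print(phrases)
```

Here the phrases "machine learning" and "topic detection" each occur twice and are kept, while single-occurrence pairs such as "learning for" are dropped.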

Software Requirements:

  • Windows
  • SQL Server
  • Visual Studio 2010

Hardware Components:

  • Processor – Dual Core
  • Hard Disk – 50 GB
  • Memory – 1 GB RAM
  • Internet Connection

Application:

This application can be used by many web users.

Reference:

  • http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4624691&queryText=Topic+Detection+by+Clustering+Keywords&newsearch=true&searchField=Search_All
