Biomedical Data Mining For Web Page Analysis

 Data mining is a technique for extracting useful data and patterns from large data sets and putting the results to use. Web mining builds on data mining and adds text mining methods that read and classify unstructured data; text mining detects patterns, keywords and relevant information in unstructured text. The two have different strengths: data mining algorithms work efficiently on organized data sets, while web mining algorithms are widely used to scan and mine the unorganized, unstructured web pages and text available on the internet.

 Websites built on different platforms have different data structures, which makes them hard for a single algorithm to read. Web pages are made up of HTML (HyperText Markup Language) in various arrangements, with images, videos and other media intermixed on a single page. Since it is not feasible to build a separate algorithm for each web technology, efficient web mining algorithms are needed to mine this huge amount of web data. We therefore propose to use carefully designed web mining algorithms to mine the textual information on web pages and judge its relevance to the biomedical sector. The system can be useful to biomedical organizations and to search engines that need to classify and sort web pages by their relevance to the biomedical field.
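The text-extraction step described above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the project's actual C# implementation; the class name `TextExtractor`, the function `extract_text` and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script and style content (hypothetical helper)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []      # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up page mixing text, an image, a style sheet and a script.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Insulin</h1><img src='a.png'>"
    "<p>Insulin regulates <b>glucose</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))
```

Images and other media carry no text and are simply passed over, so only the words on the page reach the later mining steps.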

Features

  • We use web mining algorithms to mine the textual information on web pages and rank each page according to the biomedical entities it contains.
  • Websites built on different platforms can be ranked using this application.
  • The system identifies the web pages that are most relevant to the biomedical field.
  • The system classifies web pages into categories and sorts them accordingly.
  • The system combines two techniques: data mining and web mining.
  • Data mining is a technique for extracting patterns of useful data from large data sets and putting the results to use.
  • Web mining also includes text mining methods that scan unstructured data and extract useful content from it.
  • Data mining and web mining are sometimes used together to build more efficient systems.
  • Data mining works on “static databases” that contain “structured” data, whereas web mining works on data that is “dynamic” and “unstructured”.
  • The goal of this system is to mine data from hypertext documents (e.g. web page content).
  • The system preprocesses the hypertext documents and extracts the text data.
  • The more often a biomedical entity occurs in a page, the more relevant the page is considered, so the documents can be re-ranked with text mining techniques to find the most relevant ones.
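The occurrence-based ranking in the last feature can be sketched as below. This is only an illustration in Python under stated assumptions: the entity list `BIOMEDICAL_ENTITIES`, the helper names and the sample pages are hypothetical, and a real system would load a curated biomedical vocabulary and handle multi-word entities.

```python
import re
from collections import Counter

# Hypothetical single-word entity list, used only for this example.
BIOMEDICAL_ENTITIES = {"insulin", "glucose", "protein", "gene", "diabetes"}


def entity_count(text, entities=BIOMEDICAL_ENTITIES):
    """Count how many tokens in the text are known biomedical entities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[entity] for entity in entities)


def rank_pages(pages):
    """Take a dict of url -> extracted text and return (url, count) pairs,
    most entity occurrences first."""
    scored = [(url, entity_count(text)) for url, text in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up pages standing in for preprocessed web page text.
pages = {
    "page-a": "Insulin lowers blood glucose. Insulin therapy treats diabetes.",
    "page-b": "Our store sells shoes and bags.",
    "page-c": "A gene encodes a protein.",
}
for url, score in rank_pages(pages):
    print(url, score)
```

With these sample pages, page-a has four entity occurrences, page-c has two and page-b has none, so they are ranked in that order.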

Feasibility Study

Our system detects patterns, keywords and relevant information in the unstructured text of web pages, using web mining together with data mining. A web mining algorithm mines the textual information on each page and identifies the pages that are relevant to biomedical entities. Data mining and web mining are combined where this gives more efficient results.

  • Economic Feasibility

This system ranks web pages that contain information related to biomedical entities, which raises the rating of pages that provide more relevant information about the biomedical sector. This is expected to bring economic benefits to the organization. The economic feasibility study includes identifying and quantifying all the expected benefits.

  • Operational Feasibility

The system is designed to be reliable, maintainable, affordable and producible. These parameters were considered throughout the design and development of this project, and engineering and management effort was applied appropriately and in a timely manner to meet them.

  • Technical Feasibility

The back end of this project is SQL Server, which stores the data and other details related to the project. Only basic hardware is required to run the application. The system is developed in the .NET Framework using C#. The application will be online, so it can be accessed from any device, such as personal computers, laptops and some handheld devices.

Advantages

  • This system helps users find relevant biomedical web pages quickly.
  • Users do not have to put much effort into searching for biomedical web pages.
  • Users quickly get the results they are searching for.
  • This system saves the user's time.

Disadvantages

  • If the internet connection fails, the system will not work.
  • The system can give a web page a low percentage rating even if the page is highly relevant to the biomedical field.
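The second disadvantage can be illustrated with a small sketch. The source does not say how the percentage is computed; the Python example below assumes, purely for illustration, that the rating is the share of words on the page that are biomedical entities. The entity list, function name and sample texts are hypothetical.

```python
import re

# Hypothetical entity list, used only for this example.
ENTITIES = {"insulin", "glucose", "diabetes"}


def relevance_percent(text, entities=ENTITIES):
    """Assumed rating: percentage of tokens that are biomedical entities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in entities)
    return 100.0 * hits / len(tokens)


# A short page made only of entity words.
short_page = "insulin glucose"

# A long, detailed page: more entity mentions, but diluted by other words.
long_page = "insulin " * 5 + "the study describes patient outcomes in detail " * 10

print(relevance_percent(short_page))
print(relevance_percent(long_page))
```

Under this assumed formula the long page mentions an entity five times yet rates far lower than the two-word page, because the percentage is diluted by the surrounding text. A detailed, relevant page can therefore receive a low rating.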

Future Scope

  • A module could be added that shows the user a short summary paragraph of each biomedical web page along with its rating, so that the user can easily decide which page will provide the most relevant information.

Software Requirements:

  • Windows
  • SQL Server
  • Visual Studio 2010

Hardware Components:

  • Processor – Dual Core
  • Hard Disk – 50 GB
  • Memory – 1 GB RAM
  • Internet Connection

Application

  • This system can be applied to many web pages, helping many users find relevant information.

