Deep Text Mining for Automatic Keyphrase Extraction from Text Documents

Muhammad Abulaish; Jahiruddin; Lipika Dey

Journal of Intelligent Systems 20 (4):327-351 (2011)

Abstract

Due to the existence of a huge amount of textual data, either on the World Wide Web or in textual databases like PubMed, the development of novel automatic keyphrase extraction methods has emerged as one of the key research problems in the recent past. Consequently, a number of machine learning techniques, mostly supervised, have been proposed to extract keyphrases from text documents. However, one of the main bottlenecks hindering the success of such systems is the requirement of annotated corpora for training. In this paper, we propose the design of a deep text mining system to identify keyphrases in text documents that are either unstructured or semi-structured in nature. The novelty of our system lies in its applicability to a single document: it identifies the keyphrases embedded within that document instead of demanding a collection of annotated texts for training. The proposed system applies parsing techniques to identify candidate phrases. After mapping the original set of candidate phrases into a low-dimensional space using Singular Value Decomposition, the Markov Clustering technique is applied to cluster related sentences together. Finally, treating each cluster as a document, Latent Dirichlet Allocation is applied to identify feasible keyphrases, which are presented to users in non-increasing order of their relevance scores. The efficacy of the proposed system is established through experimentation on datasets from two different domains. On comparative evaluation, we found that the proposed system outperforms KEA and KEA, which apply a supervised machine learning approach to automatic keyphrase extraction from text documents.
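The two middle stages named in the abstract, dimensionality reduction with Singular Value Decomposition followed by Markov Clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. The sentence-by-phrase count matrix, the cosine similarity graph, the function names (`reduce_with_svd`, `cosine_similarity`, `markov_cluster`), and the parameter defaults are all assumptions made for illustration; the abstract does not specify any of them. The final Latent Dirichlet Allocation stage is left out and would need a topic-modelling library.

```python
import numpy as np


def reduce_with_svd(matrix, k):
    """Map each row (here: a sentence) into a k-dimensional latent space."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_similarity(rows):
    """Pairwise cosine similarity, with negative values clipped to zero."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = rows / norms
    return np.clip(unit @ unit.T, 0.0, None)


def markov_cluster(similarity, expansion=2, inflation=2.0,
                   iterations=50, tol=1e-6):
    """Basic Markov Clustering on a non-negative similarity matrix.

    Assumed defaults; the paper's actual settings are not given in the abstract.
    """
    m = similarity.astype(float).copy()
    np.fill_diagonal(m, 1.0)                  # add self-loops
    m = m / m.sum(axis=0, keepdims=True)      # make columns stochastic
    for _ in range(iterations):
        previous = m
        m = np.linalg.matrix_power(m, expansion)   # expansion: random-walk step
        m = m ** inflation                         # inflation: favour strong flows
        m = m / m.sum(axis=0, keepdims=True)
        if np.allclose(m, previous, atol=tol):
            break
    clusters = []
    for row in m:                             # each non-empty row lists one cluster
        members = frozenset(np.nonzero(row > 1e-6)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Hypothetical toy input: 5 sentences (rows) by 4 candidate phrases (columns).
toy_counts = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

toy_clusters = markov_cluster(cosine_similarity(reduce_with_svd(toy_counts, 2)))
print(toy_clusters)
```

The toy matrix is built so that the first three sentences share one set of phrases and the last two share another, so the sketch should group them accordingly. Each resulting cluster could then be treated as a pseudo-document for the topic-modelling step.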

Keywords

Keyphrase Extraction; Latent Dirichlet Allocation; Markov Clustering; Natural Language Processing; Text Mining

DOI

10.1515/jisys.2011.017
