TY - JOUR
T1 - Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications
AU - Verma, Jai Prakash
AU - Bhargav, Shir
AU - Bhavsar, Madhuri
AU - Bhattacharya, Pronaya
AU - Bostani, Ali
AU - Chowdhury, Subrata
AU - Webber, Julian
AU - Mehbodniya, Abolfazl
N1 - Funding Information:
Authors would like to thank Kuwait College of Science and Technology (KCST) for supporting this work.
Publisher Copyright:
© 2023 by the authors.
PY - 2023/8/22
Y1 - 2023/8/22
N2 - The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, a shift towards graph-based extractive TS (ETS) schemes is becoming apparent. Such models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, which utilizes a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes graph-based sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the recall-oriented understudy for gisting evaluation (ROUGE) measures ROUGE-1, ROUGE-2, and ROUGE-L for comparison. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, 2, and L, respectively (when using the clustered approach). Compared with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme demonstrates notable improvements, substantiating its applicability and effectiveness in real-world scenarios.
AB - The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, a shift towards graph-based extractive TS (ETS) schemes is becoming apparent. Such models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, which utilizes a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes graph-based sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the recall-oriented understudy for gisting evaluation (ROUGE) measures ROUGE-1, ROUGE-2, and ROUGE-L for comparison. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, 2, and L, respectively (when using the clustered approach). Compared with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme demonstrates notable improvements, substantiating its applicability and effectiveness in real-world scenarios.
KW - extractive text summarization
KW - graph analytics
KW - graph-based clustering
KW - opinion mining
KW - sentence scoring scheme
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85172231005&partnerID=8YFLogxK
U2 - 10.3390/info14090472
DO - 10.3390/info14090472
M3 - Article
AN - SCOPUS:85172231005
SN - 2078-2489
VL - 14
SP - 472
JO - Information (Switzerland)
JF - Information (Switzerland)
IS - 9
M1 - 472
ER -