An Optimized Stochastic Gradient Descent Approach to a Bidirectional Long Short-Term Memory with Bidirectional Contextual Embeddings for Extractive Text Summarisation
Keywords:
BiLSTM, Contextual embedding, Stochastic gradient descent, Encoder-decoder
Abstract
With the increase in the amount of textual data on the web, this study explores the performance of an extractive text summarisation model that integrates pretrained contextual word embeddings with a Bidirectional Long Short-Term Memory (BiLSTM) encoder–decoder architecture. The embeddings capture context and semantic relationships, while the BiLSTM mechanism addresses the vanishing gradient problem and enables learning of long-term dependencies in both directions. Experiments were conducted on subsets of the Amazon Fine Food Reviews dataset comprising 5000 samples. The model was trained using Stochastic Gradient Descent with a learning rate of 0.05 for 10, 20, and 30 epochs. The results show that at 10 epochs, training and validation metrics are closely matched, indicating good generalisation with minimal overfitting. As the number of epochs increases, training loss decreases significantly; however, validation loss increases as dataset size increases, indicating overfitting. Thus, training performance improves dramatically while validation performance deteriorates. The findings demonstrate that longer training enhances memorisation of the summarised text but requires early stopping and careful epoch selection to preserve generalisation in extractive text summarisation tasks.
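The early-stopping recommendation can be illustrated with a minimal sketch. It assumes a patience-based rule on validation loss, which is one common form of early stopping and not necessarily the procedure used in the study. The function name `early_stopping_epoch`, the parameters `patience` and `min_delta`, and the loss values are all hypothetical and chosen for illustration only; they are not results from the experiments.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training is considered stopped once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            # No improvement this epoch.
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Patience was never exhausted; training ran for every epoch given.
    return best_epoch, len(val_losses)


# Hypothetical validation-loss curve (illustrative values, not study data):
# it falls at first, then rises as the model starts to overfit.
hypothetical_val_losses = [2.10, 1.80, 1.65, 1.60, 1.62, 1.70, 1.85, 2.00]

best_epoch, stop_epoch = early_stopping_epoch(hypothetical_val_losses, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```

In practice the weights saved at the best epoch would be restored after stopping, so that the model used for summarisation is the one with the lowest validation loss rather than the one from the final epoch.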
Copyright (c) 2026 Khadijha-Kuburat A. Abdullah, Tola J. Odule, Ayobami J. Ajayi, Omobola E. Oladiran, Olufunmilayo A. Lawal, Emmanuel F. Ologunleko, Olatunde D. Tijani

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.