An Optimized Stochastic Gradient Descent Approach to a Bidirectional Long Short-Term Memory with Bidirectional Contextual Embeddings for Extractive Text Summarisation

Authors

Keywords:

BiLSTM, Contextual embedding, Stochastic gradient descent, Encoder-decoder

Abstract

With the growing volume of textual data on the web, this study explores the performance of an extractive text summarisation model that integrates pretrained contextual word embeddings with a Bidirectional Long Short-Term Memory (BiLSTM) encoder–decoder architecture. The embeddings capture context and semantic relationships, while the BiLSTM mechanism mitigates the vanishing gradient problem and enables learning of long-term dependencies in both directions. Experiments were conducted on subsets of the Amazon Fine Food Reviews dataset of 5000 samples. The model was optimised using Stochastic Gradient Descent with a learning rate of 0.05 and trained for 10, 20, and 30 epochs. The results show that at 10 epochs the training and validation metrics are closely matched, indicating good generalisation with minimal overfitting. As the number of epochs increases, training loss decreases significantly; however, validation loss increases as the dataset size grows, indicating overfitting. In other words, training performance improves dramatically while validation performance deteriorates. The findings demonstrate that longer training enhances memorisation of the summarised text but requires early stopping and careful epoch selection to preserve generalisation in extractive text summarisation tasks.
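
The early stopping recommended above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical, chosen only to show how a patience rule picks a stopping epoch once validation loss stops improving.

```python
from typing import Sequence, Tuple


def select_stopping_epoch(
    val_losses: Sequence[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, int]:
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    Training is considered stopped once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    if not val_losses:
        raise ValueError("val_losses must contain at least one value")

    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch

    # Patience was never exhausted, so training ran to the final epoch.
    return best_epoch, len(val_losses)


# Made-up validation losses (not results from the study): the loss falls,
# then rises again as the model begins to overfit.
illustrative_losses = [2.0, 1.5, 1.2, 1.1, 1.15, 1.3, 1.5, 1.8]
best, stopped = select_stopping_epoch(illustrative_losses, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stopped}")
```

In practice the weights saved at the best epoch would be restored, so that the reported model is the one with the lowest validation loss rather than the last one trained.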

Published

2026-05-21

How to Cite

Abdullah, K.-K. A., Odule, T. J., Ajayi, A. J., Oladiran, O. E., Lawal, O. A., Ologunleko, E. F., & Tijani, O. D. (2026). An Optimized Stochastic Gradient Descent Approach to a Bidirectional Long Short-Term Memory with Bidirectional Contextual Embeddings for Extractive Text Summarisation. Nigerian Journal of Physics, 35(2), 264-270. https://doi.org/10.62292/njp.v35i2.2026.592
