Temporal fusion strategy for violence detection: utilising convolutional and LSTM neural networks for surveillance videos

Authors

  • Khaled Merit, Tahri Mohammed University of Bechar, Department of Electrical Engineering, Laboratory of TIT, Street of Independence, Road of Kenadsa, B.P 417, 08000 Bechar, Algeria
  • Mohammed Beladgham, Tahri Mohammed University of Bechar, Department of Electrical Engineering, Laboratory of TIT, Street of Independence, Road of Kenadsa, B.P 417, 08000 Bechar, Algeria
  • Abdelmalik Taleb-Ahmed, University of Valenciennes, UMR CNRS 8520, Laboratory of IEMN DOAE, F-59313 Valenciennes, France

DOI:

https://doi.org/10.14311/AP.2025.65.0306

Keywords:

deep learning, efficient violence detection, temporal fusion, LSTM, automated video surveillance, intelligent cities, video recognition

Abstract

Modern intelligent cities pursue the highest possible degree of automation and service integration. A major challenge in the surveillance industry is automating real-time video analysis to identify critical events. This paper introduces models based on Convolutional Neural Networks (CNNs), specifically MobileNet V3, VGG16, and InceptionV3, as well as networks using LSTM and feedforward layers. The models classify videos into two mutually exclusive classes, “Non-Violence” and “Violence”, using the RLVS database. The temporal fusion approaches operate on several different data representations. The best result was an accuracy of 91.03 % and an F1-score of 90.90 %, which exceeds the results reported in similar studies on the same database for recognising violent actions in surveillance videos.
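
The abstract does not spell out the fusion scheme or the evaluation code, so the following is only a minimal sketch under stated assumptions. It assumes a frame-level classifier has already produced one violence probability per frame, fuses those scores by simple averaging (a basic late-fusion stand-in, not the authors' CNN and LSTM pipeline), and then computes accuracy and F1-score for the binary “Violence” versus “Non-Violence” task. The function names, the 0.5 threshold, and all numbers are hypothetical.

```python
from statistics import mean


def fuse_frame_scores(frame_scores, threshold=0.5):
    """Average per-frame violence probabilities into one clip label.

    Returns 1 ("Violence") or 0 ("Non-Violence"). Score averaging is an
    assumed, simplified fusion rule used only for illustration.
    """
    return 1 if mean(frame_scores) >= threshold else 0


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1-score, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Made-up per-frame scores for four clips; these are not data from the paper.
clips = [
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.3],
    [0.6, 0.4, 0.2],
    [0.1, 0.9, 0.8],
]
labels = [1, 0, 1, 1]

preds = [fuse_frame_scores(scores) for scores in clips]
acc, f1 = accuracy_and_f1(labels, preds)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Averaging treats frames as unordered, which is exactly the limitation a recurrent layer such as an LSTM is meant to address by modelling the sequence of frame features.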

Published

2025-07-09

Section

Articles

How to Cite

Merit, K., Beladgham, M., & Taleb-Ahmed, A. (2025). Temporal fusion strategy for violence detection: utilising convolutional and LSTM neural networks for surveillance videos. Acta Polytechnica, 65(3), 306–319. https://doi.org/10.14311/AP.2025.65.0306