Akçay, Mehmet Berkehan, and Kaya Oğuz, "Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers", Speech Communication, 2020, Vol.116, pp.56-76.
 Imani, Maryam, and Gholam Ali Montazer, "A survey of emotion recognition methods with emphasis on E-Learning environments", Journal of Network and Computer Applications, 2019, Vol.147, p.102423.
 Lugović, S., I. Dunđer, and M. Horvat, "Techniques and applications of emotion recognition in speech", In 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2016, pp.1278-1283.
 Swain, Monorama, Aurobinda Routray, and Prithviraj Kabisatpathy, "Databases, features and classifiers for speech emotion recognition: a review", International Journal of Speech Technology, 2018, Vol.21, no.1, pp.93-120.
 France, Daniel Joseph, Richard G. Shiavi, Stephen Silverman, Marilyn Silverman, and M. Wilkes, "Acoustical properties of speech as indicators of depression and suicidal risk", IEEE Transactions on Biomedical Engineering, 2000, Vol.47, no.7, pp.829-837.
 Pao, Tsang-Long, Chun-Hsiang Wang, and Yu-Ji Li, "A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition", In 2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming, IEEE, 2012, pp.157-162.
 Ting, K.M., "Confusion matrix", In Encyclopedia of Machine Learning, edited by C. Sammut and G.I. Webb, Springer, Boston, MA, 2011.
 Tamulevičius, Gintautas, Gražina Korvel, Anil Bora Yayak, Povilas Treigys, Jolita Bernatavičienė, and Bożena Kostek, "A study of cross-linguistic speech emotion recognition based on 2D feature spaces", Electronics, 2020, Vol.9, no.10, p.1725.
 Nguyen, Dung, Kien Nguyen, Sridha Sridharan, David Dean, and Clinton Fookes, "Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition", Computer Vision and Image Understanding, 2018, Vol.174, pp.33-42.
 Iqbal, Aseef, and Kakon Barua, "A real-time emotion recognition from speech using gradient boosting", In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), IEEE, 2019, pp.1-5.
 Chapaneri, Santosh V., and Deepak D. Jayaswal, "Multi-taper spectral features for emotion recognition from speech", In 2015 International Conference on Industrial Instrumentation and Control (ICIC), IEEE, 2015, pp.1044-1049.
 Badshah, Abdul Malik, Jamil Ahmad, Nasir Rahim, and Sung Wook Baik, "Speech emotion recognition from spectrograms with deep convolutional neural network", In 2017 International Conference on Platform Technology and Service (PlatCon), IEEE, 2017, pp.1-5.
 Kumbhar, Harshawardhan S., and Sheetal U. Bhandari, "Speech emotion recognition using MFCC features and LSTM network", In 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), IEEE, 2019, pp.1-3.
 Etienne, Caroline, Guillaume Fidanza, Andrei Petrovskii, Laurence Devillers, and Benoit Schmauch, "CNN+LSTM architecture for speech emotion recognition with data augmentation", arXiv preprint arXiv:1802.05630, 2018.
 Guizzo, Eric, Tillman Weyde, and Jack Barnett Leveson, "Multi-time-scale convolution for emotion recognition from speech audio signals", In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp.6489-6493.
 Li, Chao, Jinlong Jiao, Yiqin Zhao, and Ziping Zhao, "Combining gated convolutional networks and self-attention mechanism for speech emotion recognition", In 2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), IEEE, 2019, pp.105-109.
 Stolar, Melissa N., Margaret Lech, Robert S. Bolia, and Michael Skinner, "Real time speech emotion recognition using RGB image classification and transfer learning", In 2017 11th International Conference on Signal Processing and Communication Systems (ICSPCS), IEEE, 2017, pp.1-8.
 Livingstone, Steven R., and Frank A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English", PLoS ONE, 2018, Vol.13, no.5, p.e0196391.
 Venkataramanan, Kannan, and Haresh Rengaraj Rajamohan, "Emotion recognition from speech", arXiv preprint arXiv:1912.10458, 2019.
 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri, "Learning spatiotemporal features with 3D convolutional networks", In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp.4489-4497.
 Demir, Fatih, Muammer Turkoglu, Muzaffer Aslan, and Abdulkadir Sengur, "A new pyramidal concatenated CNN approach for environmental sound classification", Applied Acoustics, 2020, Vol.170, p.107520.
 Sankisa, Arun, Arjun Punjabi, and Aggelos K. Katsaggelos, "Temporal capsule networks for video motion estimation and error concealment", Signal, Image and Video Processing, 2020, Vol.14, no.7, pp.1369-1377.