Please use this identifier to cite or link to this item: http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/9478
Full metadata record
DC Field | Value | Language
dc.contributor.author | Jiménez Moreno, Robinson | -
dc.contributor.author | Castro Pescador, Andrés Mauricio | -
dc.contributor.author | Espitia Cubillos, Anny Astrid | -
dc.date.accessioned | 2025-05-08T14:55:56Z | -
dc.date.available | 2025-05-08T14:55:56Z | -
dc.date.issued | 2025-01-01 | -
dc.identifier.citation | Jiménez Moreno, R., Castro Pescador, A. M., & Espitia Cubillos, A. A. (2025). Aprendizaje profundo para selección de opciones numéricas por voz como herramientas para chatbot. REVISTA COLOMBIANA DE TECNOLOGIAS DE AVANZADA (RCTA), 1(45), 74–81. https://doi.org/10.24054/rcta.v1i45.3044 | es_CO
dc.identifier.issn | 1692-7257 | -
dc.identifier.issn | 2500-8625 | -
dc.identifier.uri | http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/9478 | -
dc.description | This document presents the design of a voice-operated chatbot-type assistant that follows a user-robot dialogue model and is trained with deep learning algorithms on a database of spectrograms built from both male and female voices, using the short-time Fourier transform and Mel-frequency cepstral coefficients as signal preprocessing techniques. For voice pattern recognition and classification, five convolutional network architectures are designed with the same parameters. Their training performance is compared: all achieved accuracy above 92.8%, and the number of layers is observed to affect the number of learnable parameters, the accuracy, and the stored size of each network; in general, more layers increase both training time and classification time. Finally, for validation through a chatbot app, the selected network design is applied to filling out a survey that uses a 1-to-5 Likert scale, where users say the selected option and then confirm it with a Yes or a No; the app plays the audio of each question, displays its identifier, and listens to and confirms the user's answers. It is concluded that the selected network design makes it possible to develop chatbot applications based on audio interaction. | es_CO
dc.description.abstract | This document presents the design of a voice-operated chatbot-type assistant that works following a dialogue model between user and robot and is trained with deep learning algorithms, using a database of spectrograms constructed from male and female voices, based on the short-time Fourier transform and Mel-frequency cepstral coefficients as signal preprocessing techniques. For the recognition and classification of voice patterns, five convolutional network architectures are designed with the same parameters. Their training performance is compared: all networks achieved accuracy above 92.8%. The number of layers is observed to affect the number of learnable parameters, the accuracy, and the stored size of each network; in general, a greater number of layers increases both the training time and the classification time. Finally, for validation through a chatbot App, the selected network is applied to the completion of a survey that uses a Likert scale from 1 to 5, where users, in addition to saying the selected option, confirm it with a Yes or a No; the App plays the audio of each question, shows its identification, and listens to and confirms the user's answers. It is concluded that the selected network design enables the development of chatbot applications based on audio interaction. (An illustrative sketch of this pipeline follows the metadata record below.) | es_CO
dc.format.extent | 8 | es_CO
dc.format.mimetype | application/pdf | es_CO
dc.language.iso | es | es_CO
dc.publisher | Aldo Pardo García, Revista Colombiana de Tecnologías de Avanzada, Universidad de Pamplona. | es_CO
dc.relation.ispartofseries | 74;81 | -
dc.subject | deep learning | es_CO
dc.subject | artificial intelligence | es_CO
dc.subject | robotics | es_CO
dc.subject | application | es_CO
dc.subject | chatbot | es_CO
dc.title | Aprendizaje profundo para selección de opciones numéricas por voz como herramientas para chatbot | es_CO
dc.type | http://purl.org/coar/resource_type/c_2df8fbb1 | es_CO
dc.description.edition | Vol. 1 No. 45 (2025): January – June | es_CO
dc.relation.references | P. Rashmi and M. P. Singh, "Convolution neural networks with hybrid feature extraction methods for classification of voice sound signals," World Journal of Advanced Engineering Technology and Sciences, vol. 8, no. 2, pp. 110-125, doi: 10.30574/wjaets.2023.8.2.0083, 2023. | es_CO
dc.relation.references | S. A. El-Moneim, M. A. Nassar and M. Dessouky, "Cancellable template generation for speaker recognition based on spectrogram patch selection and deep convolutional neural networks," International Journal of Speech Technology, vol. 25, no. 3, pp. 689-696, doi: 10.1007/s10772-020-09791-y, 2022. | es_CO
dc.relation.references | P. H. Chandankhede, A. S. Titarmare and S. Chauhvan, "Voice recognition based security system using convolutional neural network," in 2021 International Conference on Computing, Communication and Intelligent Systems (ICCCIS), 2021. | es_CO
dc.relation.references | O. Cetin, "Accent Recognition Using a Spectrogram Image Feature-Based Convolutional Neural Network," Arabian Journal for Science and Engineering, vol. 48, no. 2, pp. 1973-1990, 2023. | es_CO
dc.relation.references | A. Soliman, S. Mohamed and I. A. Abdelrahman, "Isolated word speech recognition using convolutional neural network," in 2020 International Conference on Computer, Control, Electrical and Electronics Engineering (ICCCEEE), 2021. | es_CO
dc.relation.references | A. Alsobhani, A. H. M. and H. Mahdi, "Speech recognition using convolution deep neural networks," in Journal of Physics: Conference Series, 2021. | es_CO
dc.relation.references | J. Li, L. Han, X. Li, J. Zhu, B. Yuan and Z. Gou, "An evaluation of deep neural network models for music classification using spectrograms," Multimedia Tools and Applications, vol. 81, pp. 4621-4627, doi: 10.1007/s11042-020-10465-9, 2022. | es_CO
dc.relation.references | V. Gupta, S. Juyal and Y. C. Hu, "Understanding human emotions through speech spectrograms using deep neural network," The Journal of Supercomputing, vol. 78, no. 5, pp. 6944-6973, doi: 10.1007/s11227-021-04124-5, 2022. | es_CO
dc.relation.references | D. Issa, M. F. Demirci and A. Yazici, "Speech emotion recognition with deep convolutional neural networks," Biomedical Signal Processing and Control, vol. 59, p. 101894, doi: 10.1016/j.bspc.2020.101894, 2020. | es_CO
dc.relation.references | K. Bhangale and K. Mohanaprasad, "Speech emotion recognition using mel frequency log spectrogram and deep convolutional neural network," in International Conference on Futuristic Communication and Network Technologies, Singapore. | es_CO
dc.relation.references | A. Iyer, A. Kemp, Y. Rahmatallah, L. Pillai, A. Glover, F. Prior, L. Larson-Prior and T. Virmani, "A machine learning method to process voice samples for identification of Parkinson’s disease," Scientific Reports, vol. 13, p. 20615, doi: 10.1038/s41598-023-47568-w, 2023. | es_CO
dc.relation.references | M. A. Mohammed, K. H. Abdulkareem, S. A. Mostafa, M. Khanapi Abd Ghani, M. S. Maashi, B. Garcia-Zapirain and F. T. Al-Dhief, "Voice pathology detection and classification using convolutional neural network model," Applied Sciences, vol. 10, no. 11, p. 3723, doi: 10.3390/app10113723, 2020. | es_CO
dc.relation.references | L. Vavrek, M. Hires, D. Kumar and P. Drotár, "Deep convolutional neural network for detection of pathological speech," in 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2021. | es_CO
dc.relation.references | A. Tursunov, Mustaqeem, J. Y. Choeh and S. Kwon, "Age and gender recognition using a convolutional neural network with a specially designed multi-attention module through speech spectrograms," Sensors, vol. 21, no. 17, p. 5892, 2021. | es_CO
dc.relation.references | C. Cheng, K.-L. Lay, H. Yung-Fong and T. Yi-Miau, "Can Likert scales predict choices? Testing the congruence between using Likert scale and comparative judgment on measuring attribution," Methods in Psychology, vol. 5, p. 100081, doi: 10.1016/j.metip.2021.100081, 2021. | es_CO
dc.relation.references | R. Liu, G. Yibei, J. Runxiang and Z. Xiaoli, "A Review of Natural-Language-Instructed Robot Execution Systems," AI, vol. 5, no. 3, pp. 948-989, doi: 10.3390/ai5030048, 2024. | es_CO
dc.relation.references | A. Koshy and S. Tavakoli, "Exploring British Accents: Modelling the Trap–Bath Split with Functional Data Analysis," Journal of the Royal Statistical Society Series C: Applied Statistics, vol. 71, pp. 773-805, doi: 10.1111/rssc.12555, 2022. | es_CO
dc.relation.references | M. M. Kabir, M. F. Mridha, J. Shin, I. Jahan and A. Q. Ohi, "A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities," IEEE Access, vol. 9, pp. 79236-79263, doi: 10.1109/ACCESS.2021.3084299, 2021. | es_CO
dc.relation.references | Q. Kong, Y. Cao, T. Iqbal, Y. Wang, W. Wang and M. D. Plumbley, "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2880-2894, doi: 10.1109/TASLP.2020.3030497, 2020. | es_CO
dc.relation.references | J. Martinsson and M. Sandsten, "DMEL: The Differentiable Log-Mel Spectrogram as a Trainable Layer in Neural Networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024. | es_CO
dc.relation.references | J. Ancilin and A. Milton, "Improved speech emotion recognition with Mel frequency magnitude coefficient," Applied Acoustics, vol. 179, doi: 10.1016/j.apacoust.2021.108046, 2021. | es_CO
dc.relation.references | M. Samaneh, C. Talen, A. Olayinka, T. John Michael, P. Christian, P. Dave and S. Sandra L., "Speech emotion recognition using machine learning — A systematic review," Intelligent Systems with Applications, vol. 20, doi: 10.1016/j.iswa.2023.200266, 2023. | es_CO
dc.relation.references | A. Yenni, H. Risanuri and B. Agus, "A Mel-weighted Spectrogram Feature Extraction for Improved Speaker," International Journal of Intelligent Engineering and Systems, vol. 15, no. 6, pp. 74-82, doi: 10.22266/ijies2022.1231.08, 2022. | es_CO
dc.rights.accessrights | http://purl.org/coar/access_right/c_abf2 | es_CO
dc.type.coarversion | http://purl.org/coar/resource_type/c_2df8fbb1 | es_CO
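
The abstract above describes a preprocessing stage that converts each recorded voice into spectrogram "images" using the short-time Fourier transform (STFT) and Mel-frequency cepstral coefficients (MFCC). Below is a minimal sketch of that stage, assuming the librosa library and illustrative sampling and windowing parameters that the record does not specify:

```python
# Hedged sketch of the preprocessing described in the abstract: STFT and
# MFCC features computed per utterance. The 16 kHz sample rate, frame
# sizes, and coefficient count are assumptions, not the paper's values.
import numpy as np
import librosa

def voice_to_features(wav_path, sr=16000, n_fft=512, hop=160, n_mfcc=40):
    """Return (log-magnitude STFT spectrogram, MFCC matrix) for one file."""
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    # Short-time Fourier transform -> log-magnitude spectrogram.
    spec = librosa.amplitude_to_db(
        np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)), ref=np.max)
    # Mel-frequency cepstral coefficients on the same frame grid.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)
    return spec, mfcc
```

Either feature matrix can then be stored as a fixed-size image and used to train the convolutional networks the abstract compares.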
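The abstract also states that the five convolutional architectures share the same parameters but differ in layer count, and that deeper networks cost more learnable parameters, training time, and classification time. The following sketch shows how such a depth comparison could be set up in Keras; the filter counts, input shape, and the seven output classes (options 1 to 5 plus yes/no) are illustrative assumptions, not the paper's actual configuration:

```python
# Build comparable CNNs that differ only in the number of conv blocks,
# mirroring the depth comparison reported in the abstract. All concrete
# sizes below are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_blocks, input_shape=(40, 101, 1), num_classes=7):
    """Stack n_blocks Conv2D/MaxPooling2D blocks; depth is the variable."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for i in range(n_blocks):
        model.add(layers.Conv2D(16 * 2**i, 3, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# More blocks -> more learnable parameters, consistent with the trend
# the abstract reports for training and classification time.
for depth in range(1, 6):
    print(depth, build_cnn(depth).count_params())
```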
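Finally, the validation app follows a fixed survey dialogue: play the audio of a question, listen for a Likert option from 1 to 5, then ask the user to confirm with a yes or a no. A schematic of that loop, where recognize and play_audio are hypothetical stand-ins for the trained classifier and the app's audio output:

```python
# Schematic of the survey dialogue described in the abstract. Both
# callbacks are hypothetical placeholders, not the paper's actual API.
def run_survey(questions, recognize, play_audio):
    """Collect one confirmed Likert answer (1-5) per question."""
    answers = []
    for question in questions:
        while True:
            play_audio(question)  # the app reads the question aloud
            option = recognize(expected={"1", "2", "3", "4", "5"})
            play_audio(f"¿Confirma la opción {option}? Diga sí o no.")
            # Unconfirmed answers repeat the question (assumed behavior).
            if recognize(expected={"si", "no"}) == "si":
                answers.append(int(option))
                break
    return answers
```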
Appears in collections: Revista Colombiana de Tecnologias de Avanzada (RCTA)

Files in this item:
File | Description | Size | Format
Art08_V1_N45_2025_esp.pdf | Art08_V1_N45_2025_esp | 701.73 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.