Please use this identifier to cite or link to this item: http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/10379
Full metadata record
DC field: Value (Language)
dc.contributor.author: Ríos Pérez, Jesús David
dc.contributor.author: Sánchez Torres, German
dc.contributor.author: Henriquez Miranda, Carlos
dc.date.accessioned: 2025-10-14T22:09:57Z
dc.date.available: 2025-10-14T22:09:57Z
dc.date.issued: 2025-01-01
dc.identifier.citation: J. D. Ríos Pérez, G. Sánchez Torres, y C. Henríquez Miranda, «Una arquitectura de aprendizaje profundo multimodal basada en ViT para la clasificación binaria de accidentes de tráfico», RCTA, vol. 1, n.º 45, pp. 225–239, may 2025. https://doi.org/10.24054/rcta.v1i45.3751 (es_CO)
dc.identifier.issn: 1692-7257
dc.identifier.issn: 2500-8625
dc.identifier.uri: http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/10379
dc.description: Cada año, más de un millón de personas mueren debido a accidentes de tráfico, y un tercio de estas vidas podrían salvarse reduciendo el tiempo de respuesta médica. El aprendizaje profundo multimodal (MMDL) ha surgido en los últimos años como una poderosa herramienta que integra diferentes tipos de datos para mejorar las capacidades de toma de decisiones en los modelos. Además, los Transformadores Visuales (ViT) son un enfoque de aprendizaje profundo para procesar imágenes y videos que ha mostrado resultados prometedores en varias áreas del conocimiento. En este proyecto, proponemos una arquitectura basada en ViT para la clasificación binaria de accidentes de tráfico utilizando datos de múltiples fuentes, como datos ambientales e imágenes. La integración de un enfoque MMDL basado en ViT puede mejorar la precisión del modelo en la clasificación de accidentes y no accidentes. Este proyecto explora un enfoque MMDL integrando ViT para la monitorización de accidentes de tráfico en el contexto de las ciudades inteligentes, logrando un recall del 91%, lo que evidencia una alta robustez del modelo en la identificación de casos positivos. Sin embargo, la escasez de datos multimodales representa un gran desafío para el entrenamiento de este tipo de modelos. (es_CO)
dc.description.abstract: Each year, more than 1 million people die due to traffic accidents, and one-third of these lives could be saved by reducing medical response time. Multi-Modal Deep Learning (MMDL) has emerged in recent years as a powerful tool that integrates different types of data to enhance decision-making capabilities in models. Additionally, Vision Transformers (ViT) are a Deep Learning approach for processing images and videos that has shown promising results in various fields of knowledge. In this project, we propose a ViT-based architecture for binary classification of traffic accidents using data from multiple sources, such as environmental data and images. The integration of an MMDL approach based on ViT can improve the model's accuracy in classifying accidents and non-accidents. This project explores an MMDL approach integrating ViT for traffic accident monitoring in the context of smart cities, achieving a recall of 91%, which demonstrates the model's high robustness in identifying positive cases. However, the scarcity of multimodal data represents a major challenge for training these types of models. (es_CO) (An illustrative code sketch of this kind of architecture appears at the end of this record.)
dc.format.extent: 15 (es_CO)
dc.format.mimetype: application/pdf (es_CO)
dc.language.iso: es (es_CO)
dc.publisher: Aldo Pardo García, Revista Colombiana de Tecnologías de Avanzada, Universidad de Pamplona. (es_CO)
dc.relation.ispartofseries: 225;239
dc.subject: multimodal (es_CO)
dc.subject: aprendizaje profundo (es_CO)
dc.subject: transformadores visuales (es_CO)
dc.subject: accidentes de tránsito (es_CO)
dc.title: Una arquitectura de aprendizaje profundo multimodal basada en ViT para la clasificación binaria de accidentes de tráfico (es_CO)
dc.title.alternative: A Multi-Modal ViT-Based Deep Learning Architecture for Binary Classification of Traffic Accident (es_CO)
dc.type: http://purl.org/coar/resource_type/c_2df8fbb1 (es_CO)
dc.description.edition: Vol. 1 Núm. 45 (2025): Enero – Junio (es_CO)
dc.relation.references (es_CO):
«Traumatismos causados por el tránsito». Accedido: 18 de marzo de 2025. [En línea]. Disponible en: https://www.who.int/es/news-room/fact-sheets/detail/road-traffic-injuries
M. T. Pulgarín et al., «Autores: Agencia Nacional de Seguridad Vial».
R. Sánchez-Mangas, A. García-Ferrer, A. de Juan, y A. M. Arroyo, «The probability of death in road traffic accidents. How important is a quick medical response?», Accid. Anal. Prev., vol. 42, n.o 4, pp. 1048-1056, jul. 2010, doi: 10.1016/j.aap.2009.12.012.
Y. Li, F.-X. Wu, y A. Ngom, «A review on machine learning principles for multi-view biological data integration», Brief. Bioinform., vol. 19, n.o 2, pp. 325-340, mar. 2018, doi: 10.1093/bib/bbw113.
C. Manzoni et al., «Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences», Brief. Bioinform., vol. 19, n.o 2, pp. 286-302, mar. 2018, doi: 10.1093/bib/bbw114.
«Milestones in Genomic Sequencing». Accedido: 23 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/immersive/d42859-020-00099-0/index.html
S. R. Stahlschmidt, B. Ulfenborg, y J. Synnergren, «Multimodal deep learning for biomedical data fusion: a review», Brief. Bioinform., vol. 23, n.o 2, p. bbab569, ene. 2022, doi: 10.1093/bib/bbab569.
«Single-cell multiomics: technologies and data analysis methods | Experimental & Molecular Medicine». Accedido: 23 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/articles/s12276-020-0420-2
T. Baltrušaitis, C. Ahuja, y L.-P. Morency, «Multimodal Machine Learning: A Survey and Taxonomy», IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, n.o 2, pp. 423-443, feb. 2019, doi: 10.1109/TPAMI.2018.2798607.
«The promise and challenges of multimodal learning analytics», doi: 10.1111/bjet.13015.
D. Hong et al., «More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification», IEEE Trans. Geosci. Remote Sens., vol. 59, n.o 5, pp. 4340-4354, may 2021, doi: 10.1109/TGRS.2020.3016820.
S. Jabeen, X. Li, M. S. Amin, O. Bourahla, S. Li, y A. Jabbar, «A Review on Methods and Applications in Multimodal Deep Learning», ACM Trans. Multimed. Comput. Commun. Appl., vol. 19, n.o 2s, p. 76:1-76:41, feb. 2023, doi: 10.1145/3545572.
Proceedings of the 2020 International Conference on Multimodal Interaction. Association for Computing Machinery, 2020.
J. Chen et al., «HEU Emotion: a large-scale database for multimodal emotion recognition in the wild», Neural Comput. Appl., vol. 33, n.o 14, pp. 8669-8685, jul. 2021, doi: 10.1007/s00521-020-05616-w.
«Improving reasoning with contrastive visual information for visual question answering - Long - 2021 - Electronics Letters - Wiley Online Library». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ell2.12255
B. P. Yuhas, M. H. Goldstein, y T. J. Sejnowski, «Integration of acoustic and visual speech signals using neural networks», IEEE Commun. Mag., vol. 27, n.o 11, pp. 65-71, nov. 1989, doi: 10.1109/35.41402.
S. Bai y S. An, «A survey on automatic image caption generation», Neurocomputing, vol. 311, pp. 291-304, oct. 2018, doi: 10.1016/j.neucom.2018.05.080.
«Future Internet | Free Full-Text | Video Captioning Based on Channel Soft Attention and Semantic Reconstructor». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/1999-5903/13/2/55
R. Souza, A. Fernandes, T. S. F. X. Teixeira, G. Teodoro, y R. Ferreira, «Online multimedia retrieval on CPU–GPU platforms with adaptive work partition», J. Parallel Distrib. Comput., vol. 148, pp. 31-45, feb. 2021, doi: 10.1016/j.jpdc.2020.10.001.
P. K. Atrey, M. A. Hossain, A. El Saddik, y M. S. Kankanhalli, «Multimodal fusion for multimedia analysis: a survey», Multimed. Syst., vol. 16, n.o 6, pp. 345-379, nov. 2010, doi: 10.1007/s00530-010-0182-0.
C. G. M. Snoek y M. Worring, «Multimodal Video Indexing: A Review of the State-of-the-art», Multimed. Tools Appl., vol. 25, n.o 1, pp. 5-35, ene. 2005, doi: 10.1023/B:MTAP.0000046380.27575.a5.
A. H. Yazdavar et al., «Multimodal mental health analysis in social media», PLoS ONE, vol. 15, n.o 4, p. e0226248, abr. 2020, doi: 10.1371/journal.pone.0226248.
«Sensors | Free Full-Text | Effective Techniques for Multimodal Data Fusion: A Comparative Analysis». Accedido: 8 de diciembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/1424-8220/23/5/2381
«Cascade recurrent neural network for image caption generation - Wu - 2017 - Electronics Letters - Wiley Online Library». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/el.2017.3159
M. Chen, G. Ding, S. Zhao, H. Chen, Q. Liu, y J. Han, «Reference Based LSTM for Image Captioning», Proc. AAAI Conf. Artif. Intell., vol. 31, n.o 1, Art. n.o 1, feb. 2017, doi: 10.1609/aaai.v31i1.11198.
W. Jiang, L. Ma, Y.-G. Jiang, W. Liu, y T. Zhang, «Recurrent Fusion Network for Image Captioning», 30 de julio de 2018, arXiv: arXiv:1807.09986. doi: 10.48550/arXiv.1807.09986.
J. Ji et al., «Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network», Proc. AAAI Conf. Artif. Intell., vol. 35, n.o 2, Art. n.o 2, may 2021, doi: 10.1609/aaai.v35i2.16258.
Z. Zhang, Q. Wu, Y. Wang, y F. Chen, «High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention», IEEE Trans. Multimed., vol. 21, n.o 7, pp. 1681-1693, jul. 2019, doi: 10.1109/TMM.2018.2888822.
P. Cao, Z. Yang, L. Sun, Y. Liang, M. Q. Yang, y R. Guan, «Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory», Neural Process. Lett., vol. 50, n.o 1, pp. 103-119, ago. 2019, doi: 10.1007/s11063-018-09973-5.
«Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9174742
L. Chen, Z. Jiang, J. Xiao, y W. Liu, «Human-like Controllable Image Captioning with Verb-specific Semantic Roles», 22 de marzo de 2021, arXiv: arXiv:2103.12204. doi: 10.48550/arXiv.2103.12204.
B. Wang, L. Ma, W. Zhang, y W. Liu, «Reconstruction Network for Video Captioning», en 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, jun. 2018, pp. 7622-7631. doi: 10.1109/CVPR.2018.00795.
W. Pei, J. Zhang, X. Wang, L. Ke, X. Shen, y Y.-W. Tai, «Memory-Attended Recurrent Network for Video Captioning», en 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA: IEEE, jun. 2019, pp. 8339-8348. doi: 10.1109/CVPR.2019.00854.
N. Aafaq, N. Akhtar, W. Liu, S. Z. Gilani, y A. Mian, «Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning», 29 de abril de 2019, arXiv: arXiv:1902.10322. Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: http://arxiv.org/abs/1902.10322
S. Liu, Z. Ren, y J. Yuan, «SibNet: Sibling Convolutional Encoder for Video Captioning», IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, n.o 9, pp. 3259-3272, sep. 2021, doi: 10.1109/TPAMI.2019.2940007.
J. Perez-Martin, B. Bustos, y J. Perez, «Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding», en 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA: IEEE, ene. 2021, pp. 3038-3048. doi: 10.1109/WACV48630.2021.00308.
M. M. Rahman, T. Abedin, K. S. S. Prottoy, A. Moshruba, y F. H. Siddiqui, «Semantically Sensible Video Captioning (SSVC)», ArXiv, sep. 2020, Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.semanticscholar.org/paper/Semantically-Sensible-Video-Captioning-(SSVC)-Rahman-Abedin/cf2193f4e9e203fe05addffabed27e0c37a89efa
Z. Fang, T. Gokhale, P. Banerjee, C. Baral, y Y. Yang, «Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning», en Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, y Y. Liu, Eds., Online: Association for Computational Linguistics, nov. 2020, pp. 840-860. doi: 10.18653/v1/2020.emnlp-main.61.
Z. Zhang, D. Xu, W. Ouyang, y L. Zhou, «Dense Video Captioning Using Graph-Based Sentence Summarization», IEEE Trans. Multimed., vol. 23, pp. 1799-1810, 2021, doi: 10.1109/TMM.2020.3003592.
X. Wang, W. Chen, J. Wu, Y.-F. Wang, y W. Y. Wang, «Video Captioning via Hierarchical Reinforcement Learning», 29 de marzo de 2018, arXiv: arXiv:1711.11135. doi: 10.48550/arXiv.1711.11135.
Y. Chen, S. Wang, W. Zhang, y Q. Huang, «Less Is More: Picking Informative Frames for Video Captioning», 4 de marzo de 2018, arXiv: arXiv:1803.01457. doi: 10.48550/arXiv.1803.01457.
L. Li y B. Gong, «End-to-End Video Captioning With Multitask Reinforcement Learning», en 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), ene. 2019, pp. 339-348. doi: 10.1109/WACV.2019.00042.
J. Mun, L. Yang, Z. Ren, N. Xu, y B. Han, «Streamlined Dense Video Captioning», 8 de abril de 2019, arXiv: arXiv:1904.03870. doi: 10.48550/arXiv.1904.03870.
W. Zhang, B. Wang, L. Ma, y W. Liu, «Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning», 3 de junio de 2019, arXiv: arXiv:1906.01452. doi: 10.48550/arXiv.1906.01452.
W. Xu, J. Yu, Z. Miao, L. Wan, Y. Tian, y Q. Ji, «Deep Reinforcement Polishing Network for Video Captioning», IEEE Trans. Multimed., vol. 23, pp. 1772-1784, 2021, doi: 10.1109/TMM.2020.3002669.
H. Ben-younes, R. Cadene, M. Cord, y N. Thome, «MUTAN: Multimodal Tucker Fusion for Visual Question Answering», 18 de mayo de 2017, arXiv: arXiv:1705.06676. doi: 10.48550/arXiv.1705.06676.
R. Cadene, H. Ben-younes, M. Cord, y N. Thome, «MUREL: Multimodal Relational Reasoning for Visual Question Answering», 25 de febrero de 2019, arXiv: arXiv:1902.09487. doi: 10.48550/arXiv.1902.09487.
B. N. Patro, S. Pate, y V. P. Namboodiri, «Robust Explanations for Visual Question Answering», 23 de enero de 2020, arXiv: arXiv:2001.08730. Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: http://arxiv.org/abs/2001.08730
S. Lobry, D. Marcos, J. Murray, y D. Tuia, «RSVQA: Visual Question Answering for Remote Sensing Data», IEEE Trans. Geosci. Remote Sens., vol. 58, n.o 12, pp. 8555-8566, dic. 2020, doi: 10.1109/TGRS.2020.2988782.
Z. Yu, J. Yu, J. Fan, y D. Tao, «Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering», 4 de agosto de 2017, arXiv: arXiv:1708.01471. doi: 10.48550/arXiv.1708.01471.
P. Anderson et al., «Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering», 14 de marzo de 2018, arXiv: arXiv:1707.07998. doi: 10.48550/arXiv.1707.07998.
Z. Yu, J. Yu, Y. Cui, D. Tao, y Q. Tian, «Deep Modular Co-Attention Networks for Visual Question Answering», 25 de junio de 2019, arXiv: arXiv:1906.10770. doi: 10.48550/arXiv.1906.10770.
L. Li, Z. Gan, Y. Cheng, y J. Liu, «Relation-Aware Graph Attention Network for Visual Question Answering», 9 de octubre de 2019, arXiv: arXiv:1903.12314. doi: 10.48550/arXiv.1903.12314.
P. Wang, Q. Wu, C. Shen, A. Dick, y A. van den Hengel, «FVQA: Fact-Based Visual Question Answering», IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, n.o 10, pp. 2413-2427, oct. 2018, doi: 10.1109/TPAMI.2017.2754246.
K. Marino, M. Rastegari, A. Farhadi, y R. Mottaghi, «OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge», 4 de septiembre de 2019, arXiv: arXiv:1906.00067. doi: 10.48550/arXiv.1906.00067.
J. Yu, Z. Zhu, Y. Wang, W. Zhang, Y. Hu, y J. Tan, «Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering», Pattern Recognit., vol. 108, p. 107563, dic. 2020, doi: 10.1016/j.patcog.2020.107563.
K. Basu, F. Shakerin, y G. Gupta, «AQuA: ASP-Based Visual Question Answering», en Practical Aspects of Declarative Languages: 22nd International Symposium, PADL 2020, New Orleans, LA, USA, January 20–21, 2020, Proceedings, Berlin, Heidelberg: Springer-Verlag, ene. 2020, pp. 57-72. doi: 10.1007/978-3-030-39197-3_4.
Y. Wang et al., «Tacotron: Towards End-to-End Speech Synthesis», 6 de abril de 2017, arXiv: arXiv:1703.10135. doi: 10.48550/arXiv.1703.10135.
S. O. Arik et al., «Deep Voice: Real-time Neural Text-to-Speech», 7 de marzo de 2017, arXiv: arXiv:1702.07825. doi: 10.48550/arXiv.1702.07825.
S. Arik et al., «Deep Voice 2: Multi-Speaker Neural Text-to-Speech», 20 de septiembre de 2017, arXiv: arXiv:1705.08947. doi: 10.48550/arXiv.1705.08947.
W. Ping et al., «Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning», 22 de febrero de 2018, arXiv: arXiv:1710.07654.
A. van den Oord et al., «Parallel WaveNet: Fast High-Fidelity Speech Synthesis», 28 de noviembre de 2017, arXiv: arXiv:1711.10433. doi: 10.48550/arXiv.1711.10433.
Y. Taigman, L. Wolf, A. Polyak, y E. Nachmani, «VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop», 1 de febrero de 2018, arXiv: arXiv:1707.06588. doi: 10.48550/arXiv.1707.06588.
J. Shen et al., «Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions», 15 de febrero de 2018, arXiv: arXiv:1712.05884. doi: 10.48550/arXiv.1712.05884.
F. Tao y C. Busso, «End-to-End Audiovisual Speech Recognition System With Multitask Learning», IEEE Trans. Multimed., vol. 23, pp. 1-11, 2021, doi: 10.1109/TMM.2020.2975922.
I. Elias et al., «Parallel Tacotron: Non-Autoregressive and Controllable TTS», 22 de octubre de 2020, arXiv: arXiv:2010.11439. doi: 10.48550/arXiv.2010.11439.
D. Nguyen, K. Nguyen, S. Sridharan, A. Ghasemi, D. Dean, y C. Fookes, «Deep Spatio-Temporal Features for Multimodal Emotion Recognition», en 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), mar. 2017, pp. 1215-1223. doi: 10.1109/WACV.2017.140.
D. Nguyen, K. Nguyen, S. Sridharan, D. Dean, y C. Fookes, «Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition», Comput. Vis. Image Underst., vol. 174, pp. 33-42, sep. 2018, doi: 10.1016/j.cviu.2018.06.005.
D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, y R. Zimmermann, «ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection», en Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, y J. Tsujii, Eds., Brussels, Belgium: Association for Computational Linguistics, oct. 2018, pp. 2594-2604. doi: 10.18653/v1/D18-1280.
L. Chong, M. Jin, y Y. He, EmoChat: Bringing Multimodal Emotion Detection to Mobile Conversation. 2019, p. 221. doi: 10.1109/BIGCOM.2019.00037.
«Multistep Deep System for Multimodal Emotion Detection With Invalid Data in the Internet of Things | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9216023
H. Lai, H. Chen, y S. Wu, «Different Contextual Window Sizes Based RNNs for Multimodal Emotion Detection in Interactive Conversations», IEEE Access, vol. 8, pp. 119516-119526, 2020, doi: 10.1109/ACCESS.2020.3005664.
R.-H. Huan, J. Shu, S.-L. Bao, R.-H. Liang, P. Chen, y K.-K. Chi, «Video multimodal emotion recognition based on Bi-GRU and attention fusion», Multimed. Tools Appl., vol. 80, n.o 6, pp. 8213-8240, mar. 2021, doi: 10.1007/s11042-020-10030-4.
Y. Gao, H. Zhang, X. Zhao, y S. Yan, «Event Classification in Microblogs via Social Tracking», ACM Trans. Intell. Syst. Technol., vol. 8, n.o 3, p. 35:1-35:14, feb. 2017, doi: 10.1145/2967502.
Z. Yang, Q. Li, W. Liu, y J. Lv, «Shared Multi-View Data Representation for Multi-Domain Event Detection», IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, n.o 5, pp. 1243-1256, may 2020, doi: 10.1109/TPAMI.2019.2893953.
«Prediction of Alzheimer's disease based on deep neural network by integrating gene expression and DNA methylation dataset - ScienceDirect». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www-sciencedirect-com.biblioteca.unimagdalena.edu.co/science/article/pii/S0957417419305834
M. J. Rafiee, K. Eyre, M. Leo, M. Benovoy, M. G. Friedrich, y M. Chetrit, «Comprehensive review of artifacts in cardiac MRI and their mitigation», Int. J. Cardiovasc. Imaging, vol. 40, n.o 10, pp. 2021-2039, oct. 2024, doi: 10.1007/s10554-024-03234-4.
H. Suresh, N. Hunt, A. Johnson, L. A. Celi, P. Szolovits, y M. Ghassemi, «Clinical Intervention Prediction and Understanding using Deep Networks», 23 de mayo de 2017, arXiv: arXiv:1705.08498. doi: 10.48550/arXiv.1705.08498.
Y. Chang et al., «Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature», Sci. Rep., vol. 8, n.o 1, p. 8857, jun. 2018, doi: 10.1038/s41598-018-27214-6.
C. Peng, Y. Zheng, y D.-S. Huang, «Capsule Network Based Modeling of Multi-omics Data for Discovery of Breast Cancer-Related Genes», IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 17, n.o 5, pp. 1605-1612, 2020, doi: 10.1109/TCBB.2019.2909905.
Y. Fu et al., «A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model», Commun. Biol., vol. 3, n.o 1, Art. n.o 1, sep. 2020, doi: 10.1038/s42003-020-01233-4.
I. Bichindaritz, G. Liu, y C. Bartlett, «Integrative survival analysis of breast cancer with gene expression and DNA methylation data», Bioinforma. Oxf. Engl., vol. 37, n.o 17, pp. 2601-2608, sep. 2021, doi: 10.1093/bioinformatics/btab140.
«Frontiers | SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www.frontiersin.org/articles/10.3389/fgene.2019.00166/full
«Predicting Alzheimer's disease progression using multi-modal deep learning approach | Scientific Reports». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/articles/s41598-018-37769-z
O. B. Poirion, K. Chaudhary, y L. X. Garmire, «Deep Learning data integration for better risk stratification models of bladder cancer», AMIA Jt. Summits Transl. Sci. Proc. AMIA Jt. Summits Transl. Sci., vol. 2017, pp. 197-206, 2018.
S. Takahashi et al., «Predicting Deep Learning Based Multi-Omics Parallel Integration Survival Subtypes in Lung Cancer Using Reverse Phase Protein Array Data», Biomolecules, vol. 10, n.o 10, p. 1460, oct. 2020, doi: 10.3390/biom10101460.
O. B. Poirion, Z. Jing, K. Chaudhary, S. Huang, y L. X. Garmire, «DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data», Genome Med., vol. 13, n.o 1, p. 112, jul. 2021, doi: 10.1186/s13073-021-00930-x.
L. Tong, J. Mitchel, K. Chatlin, y M. D. Wang, «Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis», BMC Med. Inform. Decis. Mak., vol. 20, n.o 1, p. 225, sep. 2020, doi: 10.1186/s12911-020-01225-8.
T. Ma y A. Zhang, «Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)», BMC Genomics, vol. 20, n.o 11, p. 944, dic. 2019, doi: 10.1186/s12864-019-6285-x.
M. T. Hira, M. A. Razzaque, C. Angione, J. Scrivens, S. Sawan, y M. Sarker, «Integrated multi-omics analysis of ovarian cancer using variational autoencoders», Sci. Rep., vol. 11, n.o 1, Art. n.o 1, mar. 2021, doi: 10.1038/s41598-021-85285-4.
S. Albaradei, F. Napolitano, M. A. Thafar, T. Gojobori, M. Essack, y X. Gao, «MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data», Comput. Struct. Biotechnol. J., vol. 19, pp. 4404-4411, 2021, doi: 10.1016/j.csbj.2021.08.006.
J. Huang, X. Zhang, Q. Xin, Y. Sun, y P. Zhang, «Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network», ISPRS J. Photogramm. Remote Sens., vol. 151, pp. 91-105, may 2019, doi: 10.1016/j.isprsjprs.2019.02.019.
G. Masi, D. Cozzolino, L. Verdoliva, y G. Scarpa, «Pansharpening by Convolutional Neural Networks», Remote Sens., vol. 8, n.o 7, Art. n.o 7, jul. 2016, doi: 10.3390/rs8070594.
Q. Liu, H. Zhou, Q. Xu, X. Liu, y Y. Wang, «PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening», IEEE Trans. Geosci. Remote Sens., vol. 59, n.o 12, pp. 10227-10242, dic. 2021, doi: 10.1109/TGRS.2020.3042974.
T.-J. Zhang, L.-J. Deng, T.-Z. Huang, J. Chanussot, y G. Vivone, «A Triple-Double Convolutional Neural Network for Panchromatic Sharpening», IEEE Trans. Neural Netw. Learn. Syst., vol. 34, n.o 11, pp. 9088-9101, nov. 2023, doi: 10.1109/TNNLS.2022.3155655.
H. Zhou, Q. Liu, y Y. Wang, «PanFormer: a Transformer Based Model for Pan-sharpening», 22 de marzo de 2022, arXiv: arXiv:2203.02916. doi: 10.48550/arXiv.2203.02916.
F. Palsson, J. R. Sveinsson, y M. O. Ulfarsson, «Multispectral and Hyperspectral Image Fusion Using a 3-D-Convolutional Neural Network», IEEE Geosci. Remote Sens. Lett., vol. 14, n.o 5, pp. 639-643, may 2017, doi: 10.1109/LGRS.2017.2668299.
«Physics-Based GAN With Iterative Refinement Unit for Hyperspectral and Multispectral Image Fusion | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9435191
J.-F. Hu, T.-Z. Huang, y L.-J. Deng, «Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution», IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1-5, 2022, doi: 10.1109/LGRS.2022.3194257.
W. G. C. Bandara, J. M. J. Valanarasu, y V. M. Patel, «Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction», IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1-16, 2022, doi: 10.1109/TGRS.2021.3139292.
«HPGAN: Hyperspectral Pansharpening Using 3-D Generative Adversarial Networks | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9097446
«Hyperspectral and LiDAR Data Fusion Using Extinction Profiles and Deep Convolutional Neural Network | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/7786851
«Multimodal Hyperspectral Unmixing: Insights From Attention Networks | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9724217
S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, y L. Jiao, «A deep learning framework for remote sensing image registration», ISPRS J. Photogramm. Remote Sens., vol. 145, pp. 148-164, nov. 2018, doi: 10.1016/j.isprsjprs.2017.12.012.
«Remote Sensing | Free Full-Text | A Fusion Method of Optical Image and SAR Image Based on Dense-UGAN and Gram–Schmidt Transformation». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/2072-4292/13/21/4274
A. Meraner, P. Ebel, X. X. Zhu, y M. Schmitt, «Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion», ISPRS J. Photogramm. Remote Sens., vol. 166, pp. 333-346, ago. 2020, doi: 10.1016/j.isprsjprs.2020.05.013.
J. Hu, L. Mou, A. Schmitt, y X. X. Zhu, «FusioNet: A two-stream convolutional neural network for urban scene classification using PolSAR and hyperspectral data», en 2017 Joint Urban Remote Sensing Event (JURSE), mar. 2017, pp. 1-4. doi: 10.1109/JURSE.2017.7924565.
J. Li, Z. Liu, X. Lei, y L. Wang, «Distributed Fusion of Heterogeneous Remote Sensing and Social Media Data: A Review and New Developments», Proc. IEEE, vol. 109, n.o 8, pp. 1350-1363, ago. 2021, doi: 10.1109/JPROC.2021.3079176.
Z. Shao, L. Zhang, y L. Wang, «Stacked Sparse Autoencoder Modeling Using the Synergy of Airborne LiDAR and Satellite Optical and SAR Data to Map Forest Above-Ground Biomass», IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 10, n.o 12, pp. 5569-5582, dic. 2017, doi: 10.1109/JSTARS.2017.2748341.
J. Li et al., «Deep learning in multimodal remote sensing data fusion: A comprehensive review», Int. J. Appl. Earth Obs. Geoinformation, vol. 112, p. 102926, ago. 2022, doi: 10.1016/j.jag.2022.102926.
Y. Xu et al., «Transformers in computational visual media: A survey», Comput. Vis. Media, vol. 8, n.o 1, pp. 33-62, mar. 2022, doi: 10.1007/s41095-021-0247-3.
«A Survey of Visual Transformers». Accedido: 22 de marzo de 2025. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/10088164
S. Lee, Y. Yu, G. Kim, T. Breuel, J. Kautz, y Y. Song, «Parameter Efficient Multimodal Transformers for Video Representation Learning», 22 de septiembre de 2021, arXiv: arXiv:2012.04124. doi: 10.48550/arXiv.2012.04124.
Z. Pan, B. Zhuang, J. Liu, H. He, y J. Cai, «Scalable Vision Transformers with Hierarchical Pooling», 18 de agosto de 2021, arXiv: arXiv:2103.10619. doi: 10.48550/arXiv.2103.10619.
X. Chu, Z. Tian, B. Zhang, X. Wang, y C. Shen, «Conditional Positional Encodings for Vision Transformers», 13 de febrero de 2023, arXiv: arXiv:2102.10882. doi: 10.48550/arXiv.2102.10882.
J. Fang, L. Xie, X. Wang, X. Zhang, W. Liu, y Q. Tian, «MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens», 25 de marzo de 2022, arXiv: arXiv:2105.15168. doi: 10.48550/arXiv.2105.15168.
B. Wu et al., «Visual Transformers: Token-based Image Representation and Processing for Computer Vision», 20 de noviembre de 2020, arXiv: arXiv:2006.03677. doi: 10.48550/arXiv.2006.03677.
R. Mallick, J. Benois-Pineau, y A. Zemmari, «I Saw: A Self-Attention Weighted Method for Explanation of Visual Transformers», en 2022 IEEE International Conference on Image Processing (ICIP), oct. 2022, pp. 3271-3275. doi: 10.1109/ICIP46576.2022.9897347.
Q. Zhang, Y. Xu, J. Zhang, y D. Tao, «ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond», Int. J. Comput. Vis., vol. 131, n.o 5, pp. 1141-1162, may 2023, doi: 10.1007/s11263-022-01739-w.
S. Robles-Serrano, G. Sanchez-Torres, y J. Branch-Bedoya, «Automatic Detection of Traffic Accidents from Video Using Deep Learning Techniques», Computers, vol. 10, n.o 11, Art. n.o 11, nov. 2021, doi: 10.3390/computers10110148.
H. Hozhabr Pour et al., «A Machine Learning Framework for Automated Accident Detection Based on Multimodal Sensors in Cars», Sensors, vol. 22, n.o 10, Art. n.o 10, ene. 2022, doi: 10.3390/s22103634.
I. de Zarzà, J. de Curtò, G. Roig, y C. T. Calafate, «LLM Multimodal Traffic Accident Forecasting», Sensors, vol. 23, n.o 22, Art. n.o 22, ene. 2023, doi: 10.3390/s23229225.
«liuhaotian/llava-v1.5-7b · Hugging Face». Accedido: 31 de marzo de 2025. [En línea]. Disponible en: https://huggingface.co/liuhaotian/llava-v1.5-7b
dc.rights.accessrights: http://purl.org/coar/access_right/c_abf2 (es_CO)
dc.type.coarversion: http://purl.org/coar/resource_type/c_2df8fbb1 (es_CO)
Appears in collections: Revista Colombiana de Tecnologias de Avanzada (RCTA)

Files in this item:
File: Art22_V1_N45_2025_esp.pdf
Description: Art22_V1_N45_2025_esp
Size: 466,03 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
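
Illustrative sketch (referenced from the abstract above): the abstract describes a late-fusion design in which a Vision Transformer (ViT) encodes the scene image, environmental (tabular) data are encoded separately, and the two representations are combined for a binary accident / no-accident decision evaluated mainly by recall. The Python (PyTorch) sketch below shows one way such a model could be wired together. It is not the authors' published implementation; the class names (TinyViTEncoder, MultimodalAccidentClassifier), the feature dimensions, the number of environmental features, and the simple concatenation-based fusion are assumptions made for this example.

# Hypothetical sketch (not the paper's code): a ViT-style image encoder fused
# with an MLP over environmental/tabular features for binary accident detection.
import torch
import torch.nn as nn


class TinyViTEncoder(nn.Module):
    """Minimal ViT-style encoder: patch embedding + Transformer encoder + [CLS] token."""

    def __init__(self, image_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Non-overlapping patches are embedded with a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)

    def forward(self, images):                       # images: (B, 3, H, W)
        x = self.patch_embed(images)                 # (B, dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)             # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.norm(x[:, 0])                    # [CLS] embedding, (B, dim)


class MultimodalAccidentClassifier(nn.Module):
    """Late fusion: ViT image embedding concatenated with encoded tabular features."""

    def __init__(self, num_env_features=8, dim=256):
        super().__init__()
        self.image_encoder = TinyViTEncoder(dim=dim)
        self.env_encoder = nn.Sequential(            # environmental-data branch
            nn.Linear(num_env_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.classifier = nn.Sequential(             # fused binary head
            nn.Linear(dim + 64, 128), nn.ReLU(), nn.Dropout(0.2), nn.Linear(128, 1),
        )

    def forward(self, images, env):
        fused = torch.cat([self.image_encoder(images), self.env_encoder(env)], dim=-1)
        return self.classifier(fused).squeeze(-1)    # raw logits, (B,)


if __name__ == "__main__":
    model = MultimodalAccidentClassifier(num_env_features=8)
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 8))
    preds = (torch.sigmoid(logits) > 0.5).long()     # 1 = accident, 0 = no accident
    # Recall = TP / (TP + FN); the abstract reports 91% recall for the authors' model.
    labels = torch.tensor([1, 0])
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    print("logits:", logits.shape, "recall on toy batch:", tp / max(tp + fn, 1))

Training a sketch like this would typically use torch.nn.BCEWithLogitsLoss on the logits, with recall measured on a held-out set; the 91% recall reported in the abstract refers to the authors' own architecture and data, not to this illustrative example.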