Please use this identifier to cite or link to this item: http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/10379
Full metadata record
DC field: Value (Language)
dc.contributor.author: Ríos Pérez, Jesús David
dc.contributor.author: Sánchez Torres, German
dc.contributor.author: Henriquez Miranda, Carlos
dc.date.accessioned: 2025-10-14T22:09:57Z
dc.date.available: 2025-10-14T22:09:57Z
dc.date.issued: 2025-01-01
dc.identifier.citation: J. D. Ríos Pérez, G. Sánchez Torres, y C. Henríquez Miranda, «Una arquitectura de aprendizaje profundo multimodal basada en ViT para la clasificación binaria de accidentes de tráfico», RCTA, vol. 1, n.º 45, pp. 225–239, may 2025. https://doi.org/10.24054/rcta.v1i45.3751 (es_CO)
dc.identifier.issn: 1692-7257
dc.identifier.issn: 2500-8625
dc.identifier.uri: http://repositoriodspace.unipamplona.edu.co/jspui/handle/20.500.12744/10379
dc.description: Cada año, más de un millón de personas mueren debido a accidentes de tráfico, y un tercio de estas vidas podrían salvarse reduciendo el tiempo de respuesta médica. El aprendizaje profundo multimodal (MMDL) ha surgido en los últimos años como una poderosa herramienta que integra diferentes tipos de datos para mejorar las capacidades de toma de decisiones en los modelos. Además, los Transformadores Visuales (ViT) son un enfoque de aprendizaje profundo para procesar imágenes y videos que ha mostrado resultados prometedores en varias áreas del conocimiento. En este proyecto, proponemos una arquitectura basada en ViT para la clasificación binaria de accidentes de tráfico utilizando datos de múltiples fuentes, como datos ambientales e imágenes. La integración de un enfoque MMDL basado en ViT puede mejorar la precisión del modelo en la clasificación de accidentes y no accidentes. Este proyecto explora un enfoque MMDL integrando ViT para la monitorización de accidentes de tráfico en el contexto de las ciudades inteligentes, logrando un recall del 91%, lo que evidencia una alta robustez del modelo en la identificación de casos positivos. Sin embargo, la escasez de datos multimodales representa un gran desafío para el entrenamiento de este tipo de modelos. (es_CO)
dc.description.abstract: Each year, more than 1 million people die due to traffic accidents, and one-third of these lives could be saved by reducing medical response time. Multi-Modal Deep Learning (MMDL) has emerged in recent years as a powerful tool that integrates different types of data to enhance decision-making capabilities in models. Additionally, Vision Transformers (ViT) are a Deep Learning approach for processing images and videos that has shown promising results in various fields of knowledge. In this project, we propose a ViT-based architecture for binary classification of traffic accidents using data from multiple sources, such as environmental data and images. The integration of an MMDL approach based on ViT can improve the model's accuracy in classifying accidents and non-accidents. This project explores an MMDL approach integrating ViT for traffic accident monitoring in the context of smart cities, achieving a recall of 91%, which demonstrates the model's high robustness in identifying positive cases. However, the scarcity of multimodal data represents a major challenge for training these types of models. (es_CO) (An illustrative code sketch of this kind of architecture appears at the end of this record.)
dc.format.extent: 15 (es_CO)
dc.format.mimetype: application/pdf (es_CO)
dc.language.iso: es (es_CO)
dc.publisher: Aldo Pardo García, Revista Colombiana de Tecnologías de Avanzada, Universidad de Pamplona. (es_CO)
dc.relation.ispartofseries: 225;239
dc.subject: multimodal (es_CO)
dc.subject: aprendizaje profundo (es_CO)
dc.subject: transformadores visuales (es_CO)
dc.subject: accidentes de tránsito (es_CO)
dc.title: Una arquitectura de aprendizaje profundo multimodal basada en ViT para la clasificación binaria de accidentes de tráfico (es_CO)
dc.title.alternative: A Multi-Modal ViT-Based Deep Learning Architecture for Binary Classification of Traffic Accident (es_CO)
dc.type: http://purl.org/coar/resource_type/c_2df8fbb1 (es_CO)
dc.description.edition: Vol. 1 Núm. 45 (2025): Enero – Junio (es_CO)
dc.relation.references (es_CO):
«Traumatismos causados por el tránsito». Accedido: 18 de marzo de 2025. [En línea]. Disponible en: https://www.who.int/es/news-room/fact-sheets/detail/road-traffic-injuries
M. T. Pulgarín et al., «Autores: Agencia Nacional de Seguridad Vial».
R. Sánchez-Mangas, A. García-Ferrer, A. de Juan, y A. M. Arroyo, «The probability of death in road traffic accidents. How important is a quick medical response?», Accid. Anal. Prev., vol. 42, n.o 4, pp. 1048-1056, jul. 2010, doi: 10.1016/j.aap.2009.12.012.
Y. Li, F.-X. Wu, y A. Ngom, «A review on machine learning principles for multi-view biological data integration», Brief. Bioinform., vol. 19, n.o 2, pp. 325-340, mar. 2018, doi: 10.1093/bib/bbw113.
C. Manzoni et al., «Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences», Brief. Bioinform., vol. 19, n.o 2, pp. 286-302, mar. 2018, doi: 10.1093/bib/bbw114.
«Milestones in Genomic Sequencing». Accedido: 23 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/immersive/d42859-020-00099-0/index.html
S. R. Stahlschmidt, B. Ulfenborg, y J. Synnergren, «Multimodal deep learning for biomedical data fusion: a review», Brief. Bioinform., vol. 23, n.o 2, p. bbab569, ene. 2022, doi: 10.1093/bib/bbab569.
«Single-cell multiomics: technologies and data analysis methods | Experimental & Molecular Medicine». Accedido: 23 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/articles/s12276-020-0420-2
T. Baltrušaitis, C. Ahuja, y L.-P. Morency, «Multimodal Machine Learning: A Survey and Taxonomy», IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, n.o 2, pp. 423-443, feb. 2019, doi: 10.1109/TPAMI.2018.2798607.
«The promise and challenges of multimodal learning analytics», doi: 10.1111/bjet.13015.
D. Hong et al., «More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification», IEEE Trans. Geosci. Remote Sens., vol. 59, n.o 5, pp. 4340-4354, may 2021, doi: 10.1109/TGRS.2020.3016820.
S. Jabeen, X. Li, M. S. Amin, O. Bourahla, S. Li, y A. Jabbar, «A Review on Methods and Applications in Multimodal Deep Learning», ACM Trans. Multimed. Comput. Commun. Appl., vol. 19, n.o 2s, p. 76:1-76:41, feb. 2023, doi: 10.1145/3545572.
Proceedings of the 2020 International Conference on Multimodal Interaction. Association for Computing Machinery, 2020.
J. Chen et al., «HEU Emotion: a large-scale database for multimodal emotion recognition in the wild», Neural Comput. Appl., vol. 33, n.o 14, pp. 8669-8685, jul. 2021, doi: 10.1007/s00521-020-05616-w.
«Improving reasoning with contrastive visual information for visual question answering - Long - 2021 - Electronics Letters - Wiley Online Library». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ell2.12255
B. P. Yuhas, M. H. Goldstein, y T. J. Sejnowski, «Integration of acoustic and visual speech signals using neural networks», IEEE Commun. Mag., vol. 27, n.o 11, pp. 65-71, nov. 1989, doi: 10.1109/35.41402.
S. Bai y S. An, «A survey on automatic image caption generation», Neurocomputing, vol. 311, pp. 291-304, oct. 2018, doi: 10.1016/j.neucom.2018.05.080.
«Future Internet | Free Full-Text | Video Captioning Based on Channel Soft Attention and Semantic Reconstructor». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/1999-5903/13/2/55
R. Souza, A. Fernandes, T. S. F. X. Teixeira, G. Teodoro, y R. Ferreira, «Online multimedia retrieval on CPU–GPU platforms with adaptive work partition», J. Parallel Distrib. Comput., vol. 148, pp. 31-45, feb. 2021, doi: 10.1016/j.jpdc.2020.10.001.
P. K. Atrey, M. A. Hossain, A. El Saddik, y M. S. Kankanhalli, «Multimodal fusion for multimedia analysis: a survey», Multimed. Syst., vol. 16, n.o 6, pp. 345-379, nov. 2010, doi: 10.1007/s00530-010-0182-0.
C. G. M. Snoek y M. Worring, «Multimodal Video Indexing: A Review of the State-of-the-art», Multimed. Tools Appl., vol. 25, n.o 1, pp. 5-35, ene. 2005, doi: 10.1023/B:MTAP.0000046380.27575.a5.
A. H. Yazdavar et al., «Multimodal mental health analysis in social media», PLoS ONE, vol. 15, n.o 4, p. e0226248, abr. 2020, doi: 10.1371/journal.pone.0226248.
«Sensors | Free Full-Text | Effective Techniques for Multimodal Data Fusion: A Comparative Analysis». Accedido: 8 de diciembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/1424-8220/23/5/2381
«Cascade recurrent neural network for image caption generation - Wu - 2017 - Electronics Letters - Wiley Online Library». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/el.2017.3159
M. Chen, G. Ding, S. Zhao, H. Chen, Q. Liu, y J. Han, «Reference Based LSTM for Image Captioning», Proc. AAAI Conf. Artif. Intell., vol. 31, n.o 1, Art. n.o 1, feb. 2017, doi: 10.1609/aaai.v31i1.11198.
W. Jiang, L. Ma, Y.-G. Jiang, W. Liu, y T. Zhang, «Recurrent Fusion Network for Image Captioning», 30 de julio de 2018, arXiv: arXiv:1807.09986. doi: 10.48550/arXiv.1807.09986.
J. Ji et al., «Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network», Proc. AAAI Conf. Artif. Intell., vol. 35, n.o 2, Art. n.o 2, may 2021, doi: 10.1609/aaai.v35i2.16258.
Z. Zhang, Q. Wu, Y. Wang, y F. Chen, «High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention», IEEE Trans. Multimed., vol. 21, n.o 7, pp. 1681-1693, jul. 2019, doi: 10.1109/TMM.2018.2888822.
P. Cao, Z. Yang, L. Sun, Y. Liang, M. Q. Yang, y R. Guan, «Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory», Neural Process. Lett., vol. 50, n.o 1, pp. 103-119, ago. 2019, doi: 10.1007/s11063-018-09973-5.
«Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9174742
L. Chen, Z. Jiang, J. Xiao, y W. Liu, «Human-like Controllable Image Captioning with Verb-specific Semantic Roles», 22 de marzo de 2021, arXiv: arXiv:2103.12204. doi: 10.48550/arXiv.2103.12204.
B. Wang, L. Ma, W. Zhang, y W. Liu, «Reconstruction Network for Video Captioning», en 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, jun. 2018, pp. 7622-7631. doi: 10.1109/CVPR.2018.00795.
W. Pei, J. Zhang, X. Wang, L. Ke, X. Shen, y Y.-W. Tai, «Memory-Attended Recurrent Network for Video Captioning», en 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA: IEEE, jun. 2019, pp. 8339-8348. doi: 10.1109/CVPR.2019.00854.
N. Aafaq, N. Akhtar, W. Liu, S. Z. Gilani, y A. Mian, «Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning», 29 de abril de 2019, arXiv: arXiv:1902.10322. Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: http://arxiv.org/abs/1902.10322
S. Liu, Z. Ren, y J. Yuan, «SibNet: Sibling Convolutional Encoder for Video Captioning», IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, n.o 9, pp. 3259-3272, sep. 2021, doi: 10.1109/TPAMI.2019.2940007.
J. Perez-Martin, B. Bustos, y J. Perez, «Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding», en 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA: IEEE, ene. 2021, pp. 3038-3048. doi: 10.1109/WACV48630.2021.00308.
M. M. Rahman, T. Abedin, K. S. S. Prottoy, A. Moshruba, y F. H. Siddiqui, «Semantically Sensible Video Captioning (SSVC)», ArXiv, sep. 2020, Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.semanticscholar.org/paper/Semantically-Sensible-Video-Captioning-(SSVC)-Rahman-Abedin/cf2193f4e9e203fe05addffabed27e0c37a89efa
Z. Fang, T. Gokhale, P. Banerjee, C. Baral, y Y. Yang, «Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning», en Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, y Y. Liu, Eds., Online: Association for Computational Linguistics, nov. 2020, pp. 840-860. doi: 10.18653/v1/2020.emnlp-main.61.
Z. Zhang, D. Xu, W. Ouyang, y L. Zhou, «Dense Video Captioning Using Graph-Based Sentence Summarization», IEEE Trans. Multimed., vol. 23, pp. 1799-1810, 2021, doi: 10.1109/TMM.2020.3003592.
X. Wang, W. Chen, J. Wu, Y.-F. Wang, y W. Y. Wang, «Video Captioning via Hierarchical Reinforcement Learning», 29 de marzo de 2018, arXiv: arXiv:1711.11135. doi: 10.48550/arXiv.1711.11135.
Y. Chen, S. Wang, W. Zhang, y Q. Huang, «Less Is More: Picking Informative Frames for Video Captioning», 4 de marzo de 2018, arXiv: arXiv:1803.01457. doi: 10.48550/arXiv.1803.01457.
L. Li y B. Gong, «End-to-End Video Captioning With Multitask Reinforcement Learning», en 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), ene. 2019, pp. 339-348. doi: 10.1109/WACV.2019.00042.
J. Mun, L. Yang, Z. Ren, N. Xu, y B. Han, «Streamlined Dense Video Captioning», 8 de abril de 2019, arXiv: arXiv:1904.03870. doi: 10.48550/arXiv.1904.03870.
W. Zhang, B. Wang, L. Ma, y W. Liu, «Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning», 3 de junio de 2019, arXiv: arXiv:1906.01452. doi: 10.48550/arXiv.1906.01452.
W. Xu, J. Yu, Z. Miao, L. Wan, Y. Tian, y Q. Ji, «Deep Reinforcement Polishing Network for Video Captioning», IEEE Trans. Multimed., vol. 23, pp. 1772-1784, 2021, doi: 10.1109/TMM.2020.3002669.
H. Ben-younes, R. Cadene, M. Cord, y N. Thome, «MUTAN: Multimodal Tucker Fusion for Visual Question Answering», 18 de mayo de 2017, arXiv: arXiv:1705.06676. doi: 10.48550/arXiv.1705.06676.
R. Cadene, H. Ben-younes, M. Cord, y N. Thome, «MUREL: Multimodal Relational Reasoning for Visual Question Answering», 25 de febrero de 2019, arXiv: arXiv:1902.09487. doi: 10.48550/arXiv.1902.09487.
B. N. Patro, S. Pate, y V. P. Namboodiri, «Robust Explanations for Visual Question Answering», 23 de enero de 2020, arXiv: arXiv:2001.08730. Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: http://arxiv.org/abs/2001.08730
S. Lobry, D. Marcos, J. Murray, y D. Tuia, «RSVQA: Visual Question Answering for Remote Sensing Data», IEEE Trans. Geosci. Remote Sens., vol. 58, n.o 12, pp. 8555-8566, dic. 2020, doi: 10.1109/TGRS.2020.2988782.
Z. Yu, J. Yu, J. Fan, y D. Tao, «Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering», 4 de agosto de 2017, arXiv: arXiv:1708.01471. doi: 10.48550/arXiv.1708.01471.
P. Anderson et al., «Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering», 14 de marzo de 2018, arXiv: arXiv:1707.07998. doi: 10.48550/arXiv.1707.07998.
Z. Yu, J. Yu, Y. Cui, D. Tao, y Q. Tian, «Deep Modular Co-Attention Networks for Visual Question Answering», 25 de junio de 2019, arXiv: arXiv:1906.10770. doi: 10.48550/arXiv.1906.10770.
L. Li, Z. Gan, Y. Cheng, y J. Liu, «Relation-Aware Graph Attention Network for Visual Question Answering», 9 de octubre de 2019, arXiv: arXiv:1903.12314. doi: 10.48550/arXiv.1903.12314.
P. Wang, Q. Wu, C. Shen, A. Dick, y A. van den Hengel, «FVQA: Fact-Based Visual Question Answering», IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, n.o 10, pp. 2413-2427, oct. 2018, doi: 10.1109/TPAMI.2017.2754246.
K. Marino, M. Rastegari, A. Farhadi, y R. Mottaghi, «OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge», 4 de septiembre de 2019, arXiv: arXiv:1906.00067. doi: 10.48550/arXiv.1906.00067.
J. Yu, Z. Zhu, Y. Wang, W. Zhang, Y. Hu, y J. Tan, «Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering», Pattern Recognit., vol. 108, p. 107563, dic. 2020, doi: 10.1016/j.patcog.2020.107563.
K. Basu, F. Shakerin, y G. Gupta, «AQuA: ASP-Based Visual Question Answering», en Practical Aspects of Declarative Languages: 22nd International Symposium, PADL 2020, New Orleans, LA, USA, January 20–21, 2020, Proceedings, Berlin, Heidelberg: Springer-Verlag, ene. 2020, pp. 57-72. doi: 10.1007/978-3-030-39197-3_4.
Y. Wang et al., «Tacotron: Towards End-to-End Speech Synthesis», 6 de abril de 2017, arXiv: arXiv:1703.10135. doi: 10.48550/arXiv.1703.10135.
S. O. Arik et al., «Deep Voice: Real-time Neural Text-to-Speech», 7 de marzo de 2017, arXiv: arXiv:1702.07825. doi: 10.48550/arXiv.1702.07825.
S. Arik et al., «Deep Voice 2: Multi-Speaker Neural Text-to-Speech», 20 de septiembre de 2017, arXiv: arXiv:1705.08947. doi: 10.48550/arXiv.1705.08947.
W. Ping et al., «Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning», 22 de febrero de 2018, arXiv: arXiv:1710.07654.
A. van den Oord et al., «Parallel WaveNet: Fast High-Fidelity Speech Synthesis», 28 de noviembre de 2017, arXiv: arXiv:1711.10433. doi: 10.48550/arXiv.1711.10433.
Y. Taigman, L. Wolf, A. Polyak, y E. Nachmani, «VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop», 1 de febrero de 2018, arXiv: arXiv:1707.06588. doi: 10.48550/arXiv.1707.06588.
J. Shen et al., «Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions», 15 de febrero de 2018, arXiv: arXiv:1712.05884. doi: 10.48550/arXiv.1712.05884.
F. Tao y C. Busso, «End-to-End Audiovisual Speech Recognition System With Multitask Learning», IEEE Trans. Multimed., vol. 23, pp. 1-11, 2021, doi: 10.1109/TMM.2020.2975922.
I. Elias et al., «Parallel Tacotron: Non-Autoregressive and Controllable TTS», 22 de octubre de 2020, arXiv: arXiv:2010.11439. doi: 10.48550/arXiv.2010.11439.
D. Nguyen, K. Nguyen, S. Sridharan, A. Ghasemi, D. Dean, y C. Fookes, «Deep Spatio-Temporal Features for Multimodal Emotion Recognition», en 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), mar. 2017, pp. 1215-1223. doi: 10.1109/WACV.2017.140.
D. Nguyen, K. Nguyen, S. Sridharan, D. Dean, y C. Fookes, «Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition», Comput. Vis. Image Underst., vol. 174, pp. 33-42, sep. 2018, doi: 10.1016/j.cviu.2018.06.005.
D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, y R. Zimmermann, «ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection», en Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, y J. Tsujii, Eds., Brussels, Belgium: Association for Computational Linguistics, oct. 2018, pp. 2594-2604. doi: 10.18653/v1/D18-1280.
L. Chong, M. Jin, y Y. He, EmoChat: Bringing Multimodal Emotion Detection to Mobile Conversation. 2019, p. 221. doi: 10.1109/BIGCOM.2019.00037.
«Multistep Deep System for Multimodal Emotion Detection With Invalid Data in the Internet of Things | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9216023
H. Lai, H. Chen, y S. Wu, «Different Contextual Window Sizes Based RNNs for Multimodal Emotion Detection in Interactive Conversations», IEEE Access, vol. 8, pp. 119516-119526, 2020, doi: 10.1109/ACCESS.2020.3005664.
R.-H. Huan, J. Shu, S.-L. Bao, R.-H. Liang, P. Chen, y K.-K. Chi, «Video multimodal emotion recognition based on Bi-GRU and attention fusion», Multimed. Tools Appl., vol. 80, n.o 6, pp. 8213-8240, mar. 2021, doi: 10.1007/s11042-020-10030-4.
Y. Gao, H. Zhang, X. Zhao, y S. Yan, «Event Classification in Microblogs via Social Tracking», ACM Trans. Intell. Syst. Technol., vol. 8, n.o 3, p. 35:1-35:14, feb. 2017, doi: 10.1145/2967502.
Z. Yang, Q. Li, W. Liu, y J. Lv, «Shared Multi-View Data Representation for Multi-Domain Event Detection», IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, n.o 5, pp. 1243-1256, may 2020, doi: 10.1109/TPAMI.2019.2893953.
«Prediction of Alzheimer's disease based on deep neural network by integrating gene expression and DNA methylation dataset - ScienceDirect». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www-sciencedirect-com.biblioteca.unimagdalena.edu.co/science/article/pii/S0957417419305834
M. J. Rafiee, K. Eyre, M. Leo, M. Benovoy, M. G. Friedrich, y M. Chetrit, «Comprehensive review of artifacts in cardiac MRI and their mitigation», Int. J. Cardiovasc. Imaging, vol. 40, n.o 10, pp. 2021-2039, oct. 2024, doi: 10.1007/s10554-024-03234-4.
H. Suresh, N. Hunt, A. Johnson, L. A. Celi, P. Szolovits, y M. Ghassemi, «Clinical Intervention Prediction and Understanding using Deep Networks», 23 de mayo de 2017, arXiv: arXiv:1705.08498. doi: 10.48550/arXiv.1705.08498.
Y. Chang et al., «Cancer Drug Response Profile scan (CDRscan): A Deep Learning Model That Predicts Drug Effectiveness from Cancer Genomic Signature», Sci. Rep., vol. 8, n.o 1, p. 8857, jun. 2018, doi: 10.1038/s41598-018-27214-6.
C. Peng, Y. Zheng, y D.-S. Huang, «Capsule Network Based Modeling of Multi-omics Data for Discovery of Breast Cancer-Related Genes», IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 17, n.o 5, pp. 1605-1612, 2020, doi: 10.1109/TCBB.2019.2909905.
Y. Fu et al., «A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model», Commun. Biol., vol. 3, n.o 1, Art. n.o 1, sep. 2020, doi: 10.1038/s42003-020-01233-4.
I. Bichindaritz, G. Liu, y C. Bartlett, «Integrative survival analysis of breast cancer with gene expression and DNA methylation data», Bioinforma. Oxf. Engl., vol. 37, n.o 17, pp. 2601-2608, sep. 2021, doi: 10.1093/bioinformatics/btab140.
«Frontiers | SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www.frontiersin.org/articles/10.3389/fgene.2019.00166/full
«Predicting Alzheimer's disease progression using multi-modal deep learning approach | Scientific Reports». Accedido: 24 de noviembre de 2023. [En línea]. Disponible en: https://www-nature-com.biblioteca.unimagdalena.edu.co/articles/s41598-018-37769-z
O. B. Poirion, K. Chaudhary, y L. X. Garmire, «Deep Learning data integration for better risk stratification models of bladder cancer», AMIA Jt. Summits Transl. Sci. Proc. AMIA Jt. Summits Transl. Sci., vol. 2017, pp. 197-206, 2018.
S. Takahashi et al., «Predicting Deep Learning Based Multi-Omics Parallel Integration Survival Subtypes in Lung Cancer Using Reverse Phase Protein Array Data», Biomolecules, vol. 10, n.o 10, p. 1460, oct. 2020, doi: 10.3390/biom10101460.
O. B. Poirion, Z. Jing, K. Chaudhary, S. Huang, y L. X. Garmire, «DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data», Genome Med., vol. 13, n.o 1, p. 112, jul. 2021, doi: 10.1186/s13073-021-00930-x.
L. Tong, J. Mitchel, K. Chatlin, y M. D. Wang, «Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis», BMC Med. Inform. Decis. Mak., vol. 20, n.o 1, p. 225, sep. 2020, doi: 10.1186/s12911-020-01225-8.
T. Ma y A. Zhang, «Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)», BMC Genomics, vol. 20, n.o 11, p. 944, dic. 2019, doi: 10.1186/s12864-019-6285-x.
M. T. Hira, M. A. Razzaque, C. Angione, J. Scrivens, S. Sawan, y M. Sarker, «Integrated multi-omics analysis of ovarian cancer using variational autoencoders», Sci. Rep., vol. 11, n.o 1, Art. n.o 1, mar. 2021, doi: 10.1038/s41598-021-85285-4.
S. Albaradei, F. Napolitano, M. A. Thafar, T. Gojobori, M. Essack, y X. Gao, «MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data», Comput. Struct. Biotechnol. J., vol. 19, pp. 4404-4411, 2021, doi: 10.1016/j.csbj.2021.08.006.
J. Huang, X. Zhang, Q. Xin, Y. Sun, y P. Zhang, «Automatic building extraction from high-resolution aerial images and LiDAR data using gated residual refinement network», ISPRS J. Photogramm. Remote Sens., vol. 151, pp. 91-105, may 2019, doi: 10.1016/j.isprsjprs.2019.02.019.
G. Masi, D. Cozzolino, L. Verdoliva, y G. Scarpa, «Pansharpening by Convolutional Neural Networks», Remote Sens., vol. 8, n.o 7, Art. n.o 7, jul. 2016, doi: 10.3390/rs8070594.
Q. Liu, H. Zhou, Q. Xu, X. Liu, y Y. Wang, «PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening», IEEE Trans. Geosci. Remote Sens., vol. 59, n.o 12, pp. 10227-10242, dic. 2021, doi: 10.1109/TGRS.2020.3042974.
T.-J. Zhang, L.-J. Deng, T.-Z. Huang, J. Chanussot, y G. Vivone, «A Triple-Double Convolutional Neural Network for Panchromatic Sharpening», IEEE Trans. Neural Netw. Learn. Syst., vol. 34, n.o 11, pp. 9088-9101, nov. 2023, doi: 10.1109/TNNLS.2022.3155655.
H. Zhou, Q. Liu, y Y. Wang, «PanFormer: a Transformer Based Model for Pan-sharpening», 22 de marzo de 2022, arXiv: arXiv:2203.02916. doi: 10.48550/arXiv.2203.02916.
F. Palsson, J. R. Sveinsson, y M. O. Ulfarsson, «Multispectral and Hyperspectral Image Fusion Using a 3-D-Convolutional Neural Network», IEEE Geosci. Remote Sens. Lett., vol. 14, n.o 5, pp. 639-643, may 2017, doi: 10.1109/LGRS.2017.2668299.
«Physics-Based GAN With Iterative Refinement Unit for Hyperspectral and Multispectral Image Fusion | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9435191
J.-F. Hu, T.-Z. Huang, y L.-J. Deng, «Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution», IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1-5, 2022, doi: 10.1109/LGRS.2022.3194257.
W. G. C. Bandara, J. M. J. Valanarasu, y V. M. Patel, «Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction», IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1-16, 2022, doi: 10.1109/TGRS.2021.3139292.
«HPGAN: Hyperspectral Pansharpening Using 3-D Generative Adversarial Networks | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9097446
«Hyperspectral and LiDAR Data Fusion Using Extinction Profiles and Deep Convolutional Neural Network | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/7786851
«Multimodal Hyperspectral Unmixing: Insights From Attention Networks | IEEE Journals & Magazine | IEEE Xplore». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/9724217
S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, y L. Jiao, «A deep learning framework for remote sensing image registration», ISPRS J. Photogramm. Remote Sens., vol. 145, pp. 148-164, nov. 2018, doi: 10.1016/j.isprsjprs.2017.12.012.
«Remote Sensing | Free Full-Text | A Fusion Method of Optical Image and SAR Image Based on Dense-UGAN and Gram–Schmidt Transformation». Accedido: 19 de noviembre de 2023. [En línea]. Disponible en: https://www.mdpi.com/2072-4292/13/21/4274
A. Meraner, P. Ebel, X. X. Zhu, y M. Schmitt, «Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion», ISPRS J. Photogramm. Remote Sens., vol. 166, pp. 333-346, ago. 2020, doi: 10.1016/j.isprsjprs.2020.05.013.
J. Hu, L. Mou, A. Schmitt, y X. X. Zhu, «FusioNet: A two-stream convolutional neural network for urban scene classification using PolSAR and hyperspectral data», en 2017 Joint Urban Remote Sensing Event (JURSE), mar. 2017, pp. 1-4. doi: 10.1109/JURSE.2017.7924565.
J. Li, Z. Liu, X. Lei, y L. Wang, «Distributed Fusion of Heterogeneous Remote Sensing and Social Media Data: A Review and New Developments», Proc. IEEE, vol. 109, n.o 8, pp. 1350-1363, ago. 2021, doi: 10.1109/JPROC.2021.3079176.
Z. Shao, L. Zhang, y L. Wang, «Stacked Sparse Autoencoder Modeling Using the Synergy of Airborne LiDAR and Satellite Optical and SAR Data to Map Forest Above-Ground Biomass», IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 10, n.o 12, pp. 5569-5582, dic. 2017, doi: 10.1109/JSTARS.2017.2748341.
J. Li et al., «Deep learning in multimodal remote sensing data fusion: A comprehensive review», Int. J. Appl. Earth Obs. Geoinformation, vol. 112, p. 102926, ago. 2022, doi: 10.1016/j.jag.2022.102926.
Y. Xu et al., «Transformers in computational visual media: A survey», Comput. Vis. Media, vol. 8, n.o 1, pp. 33-62, mar. 2022, doi: 10.1007/s41095-021-0247-3.
«A Survey of Visual Transformers». Accedido: 22 de marzo de 2025. [En línea]. Disponible en: https://ieeexplore.ieee.org/document/10088164
S. Lee, Y. Yu, G. Kim, T. Breuel, J. Kautz, y Y. Song, «Parameter Efficient Multimodal Transformers for Video Representation Learning», 22 de septiembre de 2021, arXiv: arXiv:2012.04124. doi: 10.48550/arXiv.2012.04124.
Z. Pan, B. Zhuang, J. Liu, H. He, y J. Cai, «Scalable Vision Transformers with Hierarchical Pooling», 18 de agosto de 2021, arXiv: arXiv:2103.10619. doi: 10.48550/arXiv.2103.10619.
X. Chu, Z. Tian, B. Zhang, X. Wang, y C. Shen, «Conditional Positional Encodings for Vision Transformers», 13 de febrero de 2023, arXiv: arXiv:2102.10882. doi: 10.48550/arXiv.2102.10882.
J. Fang, L. Xie, X. Wang, X. Zhang, W. Liu, y Q. Tian, «MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens», 25 de marzo de 2022, arXiv: arXiv:2105.15168. doi: 10.48550/arXiv.2105.15168.
B. Wu et al., «Visual Transformers: Token-based Image Representation and Processing for Computer Vision», 20 de noviembre de 2020, arXiv: arXiv:2006.03677. doi: 10.48550/arXiv.2006.03677.
R. Mallick, J. Benois-Pineau, y A. Zemmari, «I Saw: A Self-Attention Weighted Method for Explanation of Visual Transformers», en 2022 IEEE International Conference on Image Processing (ICIP), oct. 2022, pp. 3271-3275. doi: 10.1109/ICIP46576.2022.9897347.
Q. Zhang, Y. Xu, J. Zhang, y D. Tao, «ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond», Int. J. Comput. Vis., vol. 131, n.o 5, pp. 1141-1162, may 2023, doi: 10.1007/s11263-022-01739-w.
S. Robles-Serrano, G. Sanchez-Torres, y J. Branch-Bedoya, «Automatic Detection of Traffic Accidents from Video Using Deep Learning Techniques», Computers, vol. 10, n.o 11, Art. n.o 11, nov. 2021, doi: 10.3390/computers10110148.
H. Hozhabr Pour et al., «A Machine Learning Framework for Automated Accident Detection Based on Multimodal Sensors in Cars», Sensors, vol. 22, n.o 10, Art. n.o 10, ene. 2022, doi: 10.3390/s22103634.
I. de Zarzà, J. de Curtò, G. Roig, y C. T. Calafate, «LLM Multimodal Traffic Accident Forecasting», Sensors, vol. 23, n.o 22, Art. n.o 22, ene. 2023, doi: 10.3390/s23229225.
«liuhaotian/llava-v1.5-7b · Hugging Face». Accedido: 31 de marzo de 2025. [En línea]. Disponible en: https://huggingface.co/liuhaotian/llava-v1.5-7b
dc.rights.accessrights: http://purl.org/coar/access_right/c_abf2 (es_CO)
dc.type.coarversion: http://purl.org/coar/resource_type/c_2df8fbb1 (es_CO)
Appears in collections: Revista Colombiana de Tecnologias de Avanzada (RCTA)

Files in this item:
File: Art22_V1_N45_2025_esp.pdf
Description: Art22_V1_N45_2025_esp
Size: 466,03 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
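
Illustrative sketch (referenced from the abstract above): the abstract describes a late-fusion design in which a Vision Transformer (ViT) encodes the scene image, environmental (tabular) data are encoded separately, and the two representations are combined for a binary accident / no-accident decision evaluated mainly by recall. The Python (PyTorch) sketch below shows one way such a model could be wired together. It is not the authors' published implementation; the class names (TinyViTEncoder, MultimodalAccidentClassifier), the feature dimensions, the number of environmental features, and the simple concatenation-based fusion are assumptions made for this example.

# Hypothetical sketch (not the paper's code): a ViT-style image encoder fused
# with an MLP over environmental/tabular features for binary accident detection.
import torch
import torch.nn as nn


class TinyViTEncoder(nn.Module):
    """Minimal ViT-style encoder: patch embedding + Transformer encoder + [CLS] token."""

    def __init__(self, image_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Non-overlapping patches are embedded with a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)

    def forward(self, images):                       # images: (B, 3, H, W)
        x = self.patch_embed(images)                 # (B, dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)             # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.norm(x[:, 0])                    # [CLS] embedding, (B, dim)


class MultimodalAccidentClassifier(nn.Module):
    """Late fusion: ViT image embedding concatenated with encoded tabular features."""

    def __init__(self, num_env_features=8, dim=256):
        super().__init__()
        self.image_encoder = TinyViTEncoder(dim=dim)
        self.env_encoder = nn.Sequential(            # environmental-data branch
            nn.Linear(num_env_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.classifier = nn.Sequential(             # fused binary head
            nn.Linear(dim + 64, 128), nn.ReLU(), nn.Dropout(0.2), nn.Linear(128, 1),
        )

    def forward(self, images, env):
        fused = torch.cat([self.image_encoder(images), self.env_encoder(env)], dim=-1)
        return self.classifier(fused).squeeze(-1)    # raw logits, (B,)


if __name__ == "__main__":
    model = MultimodalAccidentClassifier(num_env_features=8)
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 8))
    preds = (torch.sigmoid(logits) > 0.5).long()     # 1 = accident, 0 = no accident
    # Recall = TP / (TP + FN); the abstract reports 91% recall for the authors' model.
    labels = torch.tensor([1, 0])
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    print("logits:", logits.shape, "recall on toy batch:", tp / max(tp + fn, 1))

Training a sketch like this would typically use torch.nn.BCEWithLogitsLoss on the logits, with recall measured on a held-out set; the 91% recall reported in the abstract refers to the authors' own architecture and data, not to this illustrative example.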