Jiaxin Deng (邓嘉鑫)

I am a master's student at the Institute of Automation in Beijing, where I work on multi-modal video understanding and time series mining.

I received my undergraduate degree from UESTC and am now pursuing my master's degree at the Institute of Automation, where I am advised by Gaofeng Meng.

Email  /  CV  /  Bio  /  Twitter  /  Github

Research

I'm interested in multi-modal video understanding, continual learning, and time series mining. Much of my research is about multi-modal learning (visual, text, speech, etc.) from videos. Representative papers are highlighted.

MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou*, Gaofeng Meng
KDD, 2024   (Just Accepted!)
paper / slides / video

Live streaming services are becoming increasingly popular thanks to their real-time interaction and entertainment. Accurately modeling gifting interactions not only enhances users' experience but also increases streamers' revenue. In this work, we propose MMBee, based on real-time Multi-Modal Fusion and Behaviour Expansion, to address these issues. Specifically, we first present a Multi-modal Fusion Module with Learnable Query (MFQ) that perceives the dynamic content of streaming segments and handles complex multi-modal interactions, including images, text comments, and speech. To alleviate the sparsity of gifting behaviors, we present a novel Graph-guided Interest Expansion (GIE) approach that learns both user and streamer representations on large-scale gifting graphs with multi-modal attributes. It consists of two main parts, graph node representation pre-training and metapath-based behavior expansion, both of which help the model move beyond specific historical gifting behaviors for exploration and greatly enrich the behavior representations.
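For readers curious about the fusion step, the sketch below shows one way a learnable-query fusion module over frame, comment, and speech tokens could look in PyTorch. All names (LearnableQueryFusion, num_queries, the token shapes) are illustrative assumptions, not the released MMBee code.

```python
import torch
import torch.nn as nn

class LearnableQueryFusion(nn.Module):
    """Fuse per-modality token sequences (frames, comments, speech) into a
    fixed-size segment embedding via a small set of learnable queries.
    Illustrative sketch only; not the MMBee implementation."""

    def __init__(self, dim=256, num_queries=8, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim),
                                 nn.GELU(), nn.Linear(dim, dim))

    def forward(self, frame_tok, text_tok, speech_tok):
        # Concatenate tokens from all modalities along the sequence axis.
        tokens = torch.cat([frame_tok, text_tok, speech_tok], dim=1)   # (B, L, D)
        B = tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)                # (B, Q, D)
        fused, _ = self.cross_attn(q, tokens, tokens)                  # queries attend across modalities
        fused = fused + self.ffn(fused)
        return fused.mean(dim=1)                                       # (B, D) segment embedding
```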

A Multimodal Transformer for Live Streaming Highlight Prediction
Jiaxin Deng, Shiyao Wang, Dong Shen, Liqin Zhao, Fan Yang, Guorui Zhou, Gaofeng Meng*
ICME, 2024   (Poster)
paper / slides / video

Recently, live streaming platforms have gained immense popularity. Traditional video highlight detection mainly focuses on visual features and uses both past and future content for prediction. Live streaming, however, requires models to infer without future frames and to handle complex multi-modal interactions, including images, audio, and text comments. To address these issues, we propose a multimodal transformer that incorporates historical look-back windows, together with a novel Modality Temporal Alignment Module that handles the temporal shift of cross-modal signals. In addition, existing datasets with limited manual annotations are insufficient for live streaming, whose topics are constantly updated and changing, so we propose a novel Border-aware Pairwise Loss that learns from a large-scale dataset using implicit user feedback as a weak supervision signal. Extensive experiments show that our model outperforms various strong baselines on both real-world scenarios and public datasets.
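As a rough illustration of the weak-supervision idea, here is a hedged PyTorch sketch of a pairwise ranking loss over clip pairs built from implicit feedback, where pairs whose feedback values are nearly equal (i.e., sitting close to the positive/negative border) are down-weighted. The function name, weighting scheme, and arguments are assumptions for illustration, not the paper's actual Border-aware Pairwise Loss.

```python
import torch
import torch.nn.functional as F

def border_aware_pairwise_loss(pos_scores, neg_scores, pos_fb, neg_fb,
                               margin=0.3, border_temp=1.0):
    """Pairwise ranking loss over (positive, negative) clip pairs formed from
    implicit user feedback. Pairs with nearly equal feedback (near the border)
    get low weight, so noisy borderline pairs contribute less.
    Hypothetical sketch, not the paper's code.

    pos_scores/neg_scores: model highlight scores for paired clips, shape (N,)
    pos_fb/neg_fb: implicit-feedback signals used to form the pairs, shape (N,)
    """
    # Standard margin ranking term: the positive clip should outscore the negative clip.
    rank_term = F.relu(margin - (pos_scores - neg_scores))
    # Border-aware weight: a small feedback gap means the pair is near the border -> low weight.
    weight = torch.sigmoid((pos_fb - neg_fb) / border_temp)
    return (weight * rank_term).mean()
```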

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset
Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng*, Fan Yang, Tingting Gao, Ruiji Fu, Zhongyuan Wang
ICMR, 2023   (Oral Presentation, Best Paper Award Candidate)
paper / slides / video

We propose a heterogeneous dataset that contains multi-modal video entities and rich common-sense relations, and that provides several novel video inference tasks such as Video-Relation-Tag (VRT) and Video-Relation-Video (VRV). Building on this dataset, we propose an end-to-end model that jointly optimizes the video understanding objective with knowledge-graph embedding, which not only injects factual knowledge into video understanding more effectively but also produces effective multi-modal entity embeddings for the KG.
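To make the joint-training idea concrete, the toy sketch below pairs a video-entity head with a TransE-style margin loss over KG triples. The TransE choice, class names, and dimensions are my assumptions; the paper's actual knowledge-embedding objective may differ.

```python
import torch
import torch.nn as nn

class JointVideoKGModel(nn.Module):
    """Toy sketch of training a video representation jointly with a TransE-style
    knowledge-graph embedding objective. Names and the TransE choice are
    illustrative, not taken from the paper."""

    def __init__(self, video_dim=512, kg_dim=256, num_relations=100, num_entities=10000):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, kg_dim)      # map video features into the KG space
        self.rel_emb = nn.Embedding(num_relations, kg_dim)
        self.ent_emb = nn.Embedding(num_entities, kg_dim)

    def kg_score(self, head, rel, tail):
        # TransE: a valid triple should satisfy head + rel ≈ tail (small distance).
        return (head + rel - tail).norm(p=2, dim=-1)

    def forward(self, video_feat, rel_id, tail_id, neg_tail_id, margin=1.0):
        head = self.video_proj(video_feat)                  # the video entity acts as the head
        rel = self.rel_emb(rel_id)
        pos = self.kg_score(head, rel, self.ent_emb(tail_id))
        neg = self.kg_score(head, rel, self.ent_emb(neg_tail_id))
        # Margin ranking over corrupted triples; in joint training this term would be
        # added to the video-understanding loss.
        return torch.relu(margin + pos - neg).mean()
```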


Please don't scrape the HTML of this page; it contains analytics tags that you won't want on your own website.