What is: CAMoE?
Source | Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
CAMoE is a multi-stream Corpus Alignment network with a single-gate Mixture-of-Experts (MoE) for video-text retrieval. CAMoE uses the MoE to extract video representations from multiple perspectives (action, entity, scene, etc.) and then aligns each of them with the corresponding part of the text. A Dual Softmax Loss (DSL) is used to avoid the one-way optimal match that occurs in previous contrastive methods. By introducing the intrinsic prior of each pair in a batch, DSL acts as a reviser that corrects the similarity matrix and achieves a dual optimal match.
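
The idea behind DSL can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `dual_softmax_loss`, the `scale` parameter and its default value, and the convention that matched video-text pairs lie on the diagonal of the similarity matrix are all assumptions made for illustration. The sketch computes a prior by applying a softmax along the opposite retrieval direction, multiplies the similarity matrix by that prior, and then applies an ordinary softmax cross-entropy in each direction.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def dual_softmax_loss(sim, scale=100.0):
    """Sketch of a dual-softmax-style loss (illustrative, not the reference code).

    sim[i, j] is the similarity between video i and text j; the matched
    pairs are assumed to sit on the diagonal. `scale` is a hypothetical
    logit scaling factor.
    """
    sim = np.asarray(sim, dtype=np.float64)

    # Priors: a softmax taken along the *other* retrieval direction.
    prior_for_v2t = softmax(sim * scale, axis=0)  # normalised over videos
    prior_for_t2v = softmax(sim * scale, axis=1)  # normalised over texts

    # Revise the similarity matrix with the prior before the usual softmax.
    logits_v2t = sim * prior_for_v2t * scale
    logits_t2v = sim * prior_for_t2v * scale

    # Cross-entropy on the diagonal: rows for video-to-text, columns for text-to-video.
    loss_v2t = -np.mean(np.diag(log_softmax(logits_v2t, axis=1)))
    loss_t2v = -np.mean(np.diag(log_softmax(logits_t2v, axis=0)))
    return loss_v2t + loss_t2v


# Toy batch of three video-text pairs with a clearly dominant diagonal.
aligned = np.full((3, 3), 0.1) + 0.8 * np.eye(3)
# Same scores, but rows shifted so the diagonal no longer holds the matches.
misaligned = np.roll(aligned, 1, axis=0)

loss_aligned = dual_softmax_loss(aligned)
loss_misaligned = dual_softmax_loss(misaligned)
```

In this toy batch the loss is much smaller for `aligned` than for `misaligned`, because a pair keeps a high revised score only when it is the best match in both the video-to-text and the text-to-video direction.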