What is: CAMoE?

Source: Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

CAMoE is a multi-stream Corpus Alignment network with a single-gate Mixture-of-Experts (MoE) for video-text retrieval. CAMoE uses the MoE to extract video representations from several perspectives (action, entity, scene, etc.) and aligns each of them with the corresponding part of the text. A Dual Softmax Loss (DSL) is used to avoid the one-way optimum match that occurs in previous contrastive methods. By introducing the intrinsic prior of each pair in a batch, DSL acts as a reviser that corrects the similarity matrix and achieves the dual optimal match.
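To make the "reviser" idea concrete, here is a minimal sketch of a dual-softmax-style loss in plain Python. It is one reading of the description above, not the authors' implementation. It assumes that `sim[i][j]` holds the similarity between video `i` and text `j`, that matched pairs lie on the diagonal, and that the prior is a softmax taken along the opposite axis of the similarity matrix. The names `dual_softmax_loss`, `temp`, and `logit_scale` are hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def dual_softmax_loss(sim, temp=1.0, logit_scale=1.0):
    """Sketch of a dual softmax loss (assumed form, see lead-in).

    sim[i][j] is the similarity of video i and text j; the
    ground-truth pairs are assumed to be on the diagonal.
    """
    n = len(sim)

    # Prior for video-to-text: for each text (column), softmax over videos.
    prior_v2t = transpose(
        [softmax([temp * s for s in col]) for col in transpose(sim)]
    )
    # Prior for text-to-video: for each video (row), softmax over texts.
    prior_t2v = [softmax([temp * s for s in row]) for row in sim]

    # Revise the similarity matrix by multiplying it with each prior.
    rev_v2t = [
        [logit_scale * sim[i][j] * prior_v2t[i][j] for j in range(n)]
        for i in range(n)
    ]
    rev_t2v = [
        [logit_scale * sim[i][j] * prior_t2v[i][j] for j in range(n)]
        for i in range(n)
    ]

    # Video-to-text: cross-entropy over each row of the revised matrix.
    loss_v2t = -sum(math.log(softmax(rev_v2t[i])[i]) for i in range(n)) / n
    # Text-to-video: cross-entropy over each column of the revised matrix.
    cols = transpose(rev_t2v)
    loss_t2v = -sum(math.log(softmax(cols[j])[j]) for j in range(n)) / n

    return loss_v2t + loss_t2v


aligned = [[1.0, 0.0], [0.0, 1.0]]      # matched pairs score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]   # mismatched pairs score highest
print(dual_softmax_loss(aligned) < dual_softmax_loss(misaligned))
```

In this sketch a pair keeps a high revised score only when it scores well in both retrieval directions, which is the intuition behind the "dual optimal match": a text that is the best match for a video is down-weighted if that video is not also a good match for the text.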