
What is: Modular Interactive VOS?

Source: Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

MiVOS (Modular Interactive VOS) is an interactive video object segmentation model that decouples interaction-to-mask conversion from mask propagation. Because the two are decoupled, MiVOS is versatile and not limited to any particular type of user interaction. It consists of three modules: Interaction-to-Mask, Propagation, and Difference-Aware Fusion. The interaction module, which is trained separately, converts user interactions into an object mask. The propagation module then propagates that mask through time, using a novel top-k filtering strategy when reading the space-time memory. To take the user's intent into account, a novel difference-aware fusion module learns how to fuse the masks from before and after each interaction; these masks are aligned with the target frames using the space-time memory.
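The idea of top-k filtering in a memory read can be illustrated with a small sketch. It is not the authors' implementation: the function names `top_k_softmax` and `read_memory` are hypothetical, the dot-product affinity is an assumption, and plain Python lists stand in for feature tensors. For one query location, only the k memory entries with the highest affinity receive non-zero softmax weight, and the read-out is the weighted sum of their values.

```python
import math


def top_k_softmax(scores, k):
    """Softmax restricted to the k largest scores; all other weights are 0."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k highest-affinity memory entries.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the largest kept score for numerical stability.
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def read_memory(query_key, memory_keys, memory_values, k):
    """Read one query location from memory as a top-k weighted sum of values."""
    # Assumption: affinity is the dot product of the query key and each memory key.
    scores = [sum(q * m for q, m in zip(query_key, key)) for key in memory_keys]
    weights = top_k_softmax(scores, k)
    dim = len(memory_values[0])
    return [
        sum(w * value[d] for w, value in zip(weights, memory_values))
        for d in range(dim)
    ]
```

With `k` equal to the number of memory entries this reduces to an ordinary softmax read; a smaller `k` discards low-affinity entries, so they cannot contribute to the propagated mask.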