
What is: MUSIQ?

Source: MUSIQ: Multi-scale Image Quality Transformer
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

MUSIQ, or Multi-scale Image Quality Transformer, is a Transformer-based model for multi-scale image quality assessment. It processes native-resolution images of varying sizes and aspect ratios. In MUSIQ, we construct a multi-scale image representation as input, consisting of the native-resolution image and its aspect-ratio-preserving (ARP) resized variants. Each image is split into fixed-size patches, which are embedded by a patch encoding module (blue boxes).

To capture the 2D structure of the image and handle images of varying aspect ratios, the spatial embedding is encoded by hashing the patch position $(i, j)$ to $(t_i, t_j)$ within a grid of learnable embeddings (red boxes). A scale embedding (green boxes) is introduced to capture scale information.

The Transformer encoder takes the input tokens and performs multi-head self-attention. To predict image quality, MUSIQ follows a common strategy in Transformers: it adds a [CLS] token to the sequence to represent the whole multi-scale input, and the corresponding Transformer output is used as the final representation.
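The input construction described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the MUSIQ implementation: the patch size (32), the grid side G (10), the ARP target longer sides (224 and 384), the toy embedding width, and the scale ordering (native image first) are all assumed values. The hash is assumed to be $t_i = \lfloor i \cdot G / H \rfloor$, $t_j = \lfloor j \cdot G / W \rfloor$ for an $H \times W$ patch grid. All names (`arp_resize_shape`, `hash_position`, `build_token_layout`, `embed_tokens`, and so on) are hypothetical. Embeddings are random placeholders rather than trained parameters, images are assumed to be padded up to a multiple of the patch size (padded patches are not masked here), and the patch encoder and Transformer encoder themselves are omitted.

```python
import math
import random

# Hypothetical hyperparameters, chosen for illustration only.
PATCH = 32                         # patch side length in pixels
GRID = 10                          # G: the spatial-embedding table is G x G
DIM = 8                            # toy embedding width (a real model is much wider)
RESIZED_LONGER_SIDES = (224, 384)  # longer side of each ARP-resized variant

rng = random.Random(0)


def table(n):
    """n random vectors standing in for learnable embeddings."""
    return [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(n)]


spatial_table = table(GRID * GRID)                  # flattened G x G grid
scale_table = table(len(RESIZED_LONGER_SIDES) + 1)  # native + each resized scale
cls_token = table(1)[0]


def arp_resize_shape(h, w, longer_side):
    """Aspect-ratio-preserving resize: scale (h, w) so max(h, w) == longer_side."""
    s = longer_side / max(h, w)
    return max(1, round(h * s)), max(1, round(w * s))


def patch_grid(h, w):
    """Patch rows/cols, assuming the image is padded up to a multiple of PATCH."""
    return math.ceil(h / PATCH), math.ceil(w / PATCH)


def hash_position(i, j, rows, cols):
    """Hash patch (i, j) of a rows x cols patch grid to cell (t_i, t_j) of the G x G table."""
    return (i * GRID) // rows, (j * GRID) // cols


def build_token_layout(h, w):
    """(scale_id, t_i, t_j) for every patch; scale 0 is the native image (assumed order)."""
    shapes = [(h, w)] + [arp_resize_shape(h, w, L) for L in RESIZED_LONGER_SIDES]
    layout = []
    for scale_id, (sh, sw) in enumerate(shapes):
        rows, cols = patch_grid(sh, sw)
        for i in range(rows):
            for j in range(cols):
                layout.append((scale_id, *hash_position(i, j, rows, cols)))
    return layout


def embed_tokens(layout, patch_vectors):
    """Input tokens: [CLS] first, then patch + spatial + scale embedding per patch."""
    tokens = [cls_token]
    for (s, ti, tj), p in zip(layout, patch_vectors):
        spatial = spatial_table[ti * GRID + tj]
        scale = scale_table[s]
        tokens.append([a + b + c for a, b, c in zip(p, spatial, scale)])
    return tokens


layout = build_token_layout(384, 512)  # a 512-wide, 384-tall image
patch_vectors = table(len(layout))     # placeholders for patch-encoder outputs
tokens = embed_tokens(layout, patch_vectors)
print(len(layout), len(tokens))        # 342 343
```

For this 512×384 image, the native scale contributes 12 × 16 = 192 patches, and the 224 and 384 variants contribute 6 × 7 = 42 and 9 × 12 = 108, so the encoder would see 342 patch tokens plus one [CLS] token. Because every patch grid, whatever its shape, is hashed into the same G × G table, images of any size or aspect ratio share one set of spatial embeddings.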