Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

1City University of Hong Kong 2Nanyang Technological University 3Shanghai Jiao Tong University 4Jiangxi University of Finance and Economics
*Equal Contribution.

Abstract

While recent advancements in large multimodal models (LMMs) have significantly improved their image quality assessment (IQA) abilities based on absolute quality ratings, how to transfer reliable relative quality comparison outputs into continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score—an all-around LMM-based no-reference IQA (NR-IQA) model, which is capable of producing qualitatively comparative responses and effectively translating these discrete comparative levels into a continuous quality score. Specifically, during training, we propose generating scaled-up comparative instructions by comparing images from the same IQA dataset, allowing for more flexible integration of diverse IQA datasets. Utilizing the established large-scale training corpus, we develop a human-like visual quality comparator. During inference, moving beyond binary choices, we propose a soft comparison method that calculates the likelihood of the test image being preferred over multiple predefined anchor images. The quality score is further optimized by maximum a posteriori estimation with the resulting probability matrix. Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with the converted single-image quality scores used during inference, surpassing state-of-the-art IQA models across diverse scenarios. Moreover, we verify that the probability-matrix-based inference conversion improves not only the rating accuracy of Compare2Score but also that of zero-shot general-purpose LMMs, suggesting its intrinsic effectiveness.

Motivation

Illustrations of the motivation of this work. (a) Images with identical rescaled MOS from various IQA datasets exhibit significant variations in perceptual quality. (b) Images that cluster at the same rating level from different IQA datasets display mismatches due to differing subjective testing methodologies. (c) Comparing MOSs within the same dataset facilitates the flexible combination of multiple IQA datasets.

A repurposed training dataset.

We propose generating scaled-up comparative instructions by comparing MOSs of images within each IQA dataset, which allows for more flexible integration of diverse IQA datasets. Specifically, the approach simulates subjective testing by posing the question, "Compared with the first image, how is the quality of the second image?" Responses are then generated based on the MOS comparisons of the image pairs. Using the empirical rule, we categorize the image pairs into five distinct comparative levels: inferior, worse, similar, better, and superior. This method produces a comprehensive training dataset that enables the LMM to effectively handle various distortion scenarios, resulting in a human-like visual quality comparator.
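The labeling step above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the standardized-difference thresholds (±1σ and ±2σ, following the empirical rule) and the `sigma` value are assumptions for demonstration.

```python
LEVELS = ["inferior", "worse", "similar", "better", "superior"]

def comparative_level(mos_a, mos_b, sigma):
    """Assign a comparative level to the second image relative to the first,
    based on the standardized MOS difference (thresholds are illustrative)."""
    z = (mos_b - mos_a) / sigma
    if z <= -2.0:
        return "inferior"
    elif z <= -1.0:
        return "worse"
    elif z < 1.0:
        return "similar"
    elif z < 2.0:
        return "better"
    else:
        return "superior"

# Example pairs drawn from the same dataset; sigma is a hypothetical
# standard deviation of MOS differences within that dataset.
pairs = [(3.2, 1.0), (3.2, 2.0), (3.2, 3.3), (3.2, 4.4), (3.2, 5.2)]
sigma = 0.9
labels = [comparative_level(a, b, sigma) for a, b in pairs]
print(labels)  # ['inferior', 'worse', 'similar', 'better', 'superior']
```

Each labeled pair then becomes one comparative instruction–response example, so the training set scales quadratically with the number of rated images in a dataset.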

An inference conversion strategy

We develop an adaptive soft comparison scheme that efficiently translates discrete comparative levels into continuous quality scores. Unlike traditional two-alternative forced choice (2AFC) methods, our approach calculates the likelihood that an input image is preferred over multiple anchor images. This probability is derived from a weighted summation of the softmax-transformed log probabilities across five comparative levels. Subsequently, the quality score of the input image is calculated through maximum a posteriori (MAP) estimation based on the resulting probability matrix.
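The two inference stages can be sketched as below. This is a simplified illustration under stated assumptions: the level-to-probability weights, the logistic (rather than, e.g., Gaussian) comparison model, its slope, and the grid search are all placeholders for the paper's actual formulation.

```python
import numpy as np

# Illustrative weights mapping the five levels (inferior..superior) to a
# preference probability; the paper's exact mapping may differ.
LEVEL_WEIGHTS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

def preference_probability(level_logits):
    """Soft comparison: softmax over the five comparative-level logits,
    then a weighted sum giving P(test image preferred over anchor)."""
    p = np.exp(level_logits - level_logits.max())
    p /= p.sum()
    return float(p @ LEVEL_WEIGHTS)

def map_quality_score(pref_probs, anchor_scores, grid=np.linspace(0, 1, 1001)):
    """MAP-style estimate of the test image's quality: assume
    P(test > anchor_i) = sigmoid(k * (q - q_i)) and pick the q on a grid
    that maximizes the log-likelihood of the observed probability row."""
    pref = np.asarray(pref_probs)
    anchors = np.asarray(anchor_scores)
    eps, k = 1e-6, 8.0  # k: assumed slope of the comparison model
    best_q, best_ll = grid[0], -np.inf
    for q in grid:
        model = 1.0 / (1.0 + np.exp(-k * (q - anchors)))
        ll = np.sum(pref * np.log(model + eps)
                    + (1 - pref) * np.log(1 - model + eps))
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# Hypothetical anchors spanning the quality range, and the preference
# probabilities of one test image against each of them.
anchors = [0.1, 0.3, 0.5, 0.7, 0.9]
probs = [0.95, 0.80, 0.50, 0.20, 0.05]
print(round(map_quality_score(probs, anchors), 2))
```

Because the test image clearly beats the low-quality anchors and loses to the high-quality ones, the estimate lands near the middle of the anchor range, which is the intended behavior of the conversion.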

Experiments

BibTeX

@misc{zhu2024adaptive,
      title={Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare}, 
      author={Hanwei Zhu and Haoning Wu and Yixuan Li and Zicheng Zhang and Baoliang Chen and Lingyu Zhu and Yuming Fang and Guangtao Zhai and Weisi Lin and Shiqi Wang},
      year={2024},
      eprint={2405.19298},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}