From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance

University of Southern California
Equal contribution
Defogging Comparison

Comparison of defogging pipelines on ACDC and Foggy Cityscapes datasets.

Abstract

Autonomous driving perception systems are particularly vulnerable in foggy conditions, where light scattering reduces contrast and obscures fine details critical for safe operation. While numerous defogging methods exist, from handcrafted filters to learned restoration models, improvements in image fidelity do not consistently translate into better downstream detection and segmentation. Moreover, prior evaluations often rely on synthetic data, leaving questions about real-world transferability.

We present a structured empirical study that benchmarks a comprehensive set of pipelines, including (i) classical dehazing filters, (ii) modern defogging networks, (iii) chained variants (filter→model, model→filter), and (iv) prompt-driven vision-language image-editing models (VLMs) applied directly to foggy images. To bridge the gap between simulated and physical environments, we employ both the synthetic Foggy Cityscapes dataset and the real-world Adverse Conditions Dataset with Correspondences (ACDC).
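As a concrete illustration of the classical dehazing filters in category (i), the sketch below implements a simplified Dark Channel Prior (DCP) pipeline in pure NumPy. It is a minimal sketch under our own assumptions, not the benchmarked implementation: the patch-wise minimum filter is naive, the transmission map is not refined with a guided filter, and the names `dark_channel` and `dehaze_dcp` and their defaults are illustrative.

```python
# Simplified Dark Channel Prior (DCP) dehazing sketch (after He et al.).
# Assumes an RGB image as a float array in [0, 1]; no guided-filter refinement.
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel min over RGB channels, then a local min filter of size `patch`."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dehaze_dcp(img, omega=0.95, t0=0.1, patch=15):
    """img: float RGB in [0, 1]. Returns the dehazed radiance estimate J."""
    dc = dark_channel(img, patch)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, int(dc.size * 0.001))
    idx = np.unravel_index(np.argsort(dc, axis=None)[-n:], dc.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate: t(x) = 1 - omega * dark_channel(I / A).
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)[..., None]
    # Recover scene radiance: J = (I - A) / t + A, clipped back to [0, 1].
    return np.clip((img - A) / t + A, 0.0, 1.0)
```

In the chained variants above, a filter like this would either precede a learned model (filter→model) or post-process its output (model→filter).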

Crucially, we examine the generalizability of these pipelines by evaluating performance under both synthetic fog and real-world conditions. We assess image quality alongside downstream performance on object detection (mean Average Precision, mAP) and panoptic segmentation (Panoptic Quality, PQ). Our analysis identifies when defogging is effective, what combining models contributes, and how VLMs compare to traditional approaches. Additionally, to evaluate the quality of dehazed images, we report qualitative rubric-based scores from both VLM and human judges and discuss their alignment with downstream task metrics, finding reasonable correlations with mAP. Together, these results establish a transparent, task-oriented benchmark for defogging methods and highlight the conditions under which pre-processing genuinely enhances autonomous perception in adverse weather.

Quantitative Results

Detection and Segmentation Performance

mAP is reported in percent (%); PQ scores for Cityscapes are normalized relative to the ground-truth image (GT PQ = 100).

Base Method        Next Method    Cityscapes        ACDC
                                  mAP%    PQ        mAP%    PQ
GT                 None           25.60   100.0     N/A     N/A
Foggy              None           22.95   57.9      37.32   48.5
DehazeFormer(Tr)   None           25.57   74.4      35.63   44.5
DehazeFormer(Tr)   CLAHE          25.55   70.3      35.06   38.5
DehazeFormer(Tr)   MSR            25.30   59.4      31.05   35.0
DehazeFormer(Tr)   DCP            24.52   69.9      34.17   38.0
DehazeFormer       None           23.79   63.0      38.04   48.0
DehazeFormer       CLAHE          24.45   62.4      37.98   46.6
DehazeFormer       DCP            24.09   64.2      34.18   42.1
DehazeFormer       MSR            23.36   59.4      33.89   43.9
CLAHE              None           23.83   58.0      37.73   47.0
CLAHE              Dehaze(Tr)     25.05   69.9      34.77   40.7
CLAHE              Dehaze         24.10   61.1      37.97   46.8
CLAHE              FocalNet       23.85   60.9      37.88   46.7
CLAHE              MITNet         23.28   56.1      32.59   46.0
MSR                None           23.18   50.7      33.46   44.5
MSR                Dehaze(Tr)     24.35   53.2      32.09   31.8
MSR                Dehaze         23.43   51.5      34.74   44.4
MSR                FocalNet       23.51   51.7      33.74   44.4
MSR                MITNet         23.38   57.9      33.04   34.1
DCP                None           23.69   64.0      35.08   43.0
DCP                Dehaze(Tr)     23.76   64.8      33.14   36.1
DCP                Dehaze         23.45   64.2      34.14   42.3
DCP                MITNet         23.58   59.5      6.51    9.7
DCP                FocalNet       23.47   63.6      34.58   42.4
FocalNet           None           23.53   62.6      37.27   48.0
FocalNet           CLAHE          23.85   60.4      37.32   45.5
FocalNet           DCP            23.56   62.8      34.47   41.3
FocalNet           MSR            21.71   51.7      32.77   43.4
MITNet             None           22.51   57.9      28.32   31.5
MITNet             CLAHE          22.79   56.1      28.13   27.9
MITNet             DCP            22.38   56.8      27.89   22.7
MITNet             MSR            21.51   51.0      22.68   25.2
Flux               CoT Prompt     23.76   59.7      37.78   46.6
Flux               Base Prompt    23.06   55.5      29.47   43.2
NanoBanana         Base Prompt    19.41   61.2      23.48   39.8

Qualitative VLM-Judge Metrics

Scores range from 0 to 5. All models are applied to the Cityscapes dataset. Values in parentheses indicate average human-judge scores.

Method                   Visibility Restoration   Boundary Clarity   Perceived Detectability
Ground Truth (GT)        4.95                     4.95               4.95
Flux (CoT Prompt)        4.30 (3.95)              3.92 (3.28)        4.22 (4.12)
Flux (Baseline Prompt)   1.22 (2.32)              1.21 (1.43)        1.22 (1.58)
DehazeFormer (Trained)   4.07 (3.74)              3.96 (3.55)        4.07 (3.95)
DehazeFormer             2.17 (2.40)              2.17 (2.65)        2.18 (1.95)
NanoBanana               2.44 (2.35)              2.44 (2.85)        2.44 (3.05)
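The judge-score/mAP alignment discussed in the abstract can be checked with a simple Pearson correlation. The sketch below pairs the VLM-judge visibility scores with the Cityscapes mAP values reported in the two tables for the methods they share; the resulting coefficient is purely illustrative of the analysis procedure, not a number reported by this work.

```python
# Illustrative Pearson correlation between VLM-judge visibility scores
# and Cityscapes mAP, using values taken from the two tables above.
import numpy as np

methods = ["GT", "Flux (CoT)", "Flux (Base)", "DehazeFormer(Tr)",
           "DehazeFormer", "NanoBanana"]
visibility = np.array([4.95, 4.30, 1.22, 4.07, 2.17, 2.44])        # judge, 0-5
map_city   = np.array([25.60, 23.76, 23.06, 25.57, 23.79, 19.41])  # mAP %

def pearson(x, y):
    """Plain Pearson r: covariance normalized by both standard deviations."""
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

r = pearson(visibility, map_city)
print(f"Pearson r (visibility vs. Cityscapes mAP): {r:.2f}")
```

On this small shared subset the correlation comes out positive, consistent with the "reasonable correlations with mAP" noted in the abstract.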
Qualitative Metrics Visualization

BibTeX

@article{aryashad2025filters,
  title={From Filters to VLMs: Benchmarking Defogging Methods through Object Detection and Segmentation Performance},
  author={Aryashad, Ardalan and Razmara, Parsa and Mahjoub, Amin and Azizi, Seyedarmin and Salmani, Mahdi and Firouzkouhi, Arad},
  journal={arXiv preprint arXiv:2510.03906},
  year={2025}
}