PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers
Authors | Li, Zhikai1,2; Chen, Mengjuan2; Xiao, Junrui1,2; Gu, Qingyi2
Journal | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Publication Date | 2023-08-14
Pages | 12
Keywords | Data-free quantization; model compression; patch similarity; quantized vision transformers (ViTs)
ISSN | 2162-237X
DOI | 10.1109/TNNLS.2023.3301007 |
Corresponding Author | Gu, Qingyi (qingyi.gu@ia.ac.cn)
Abstract | Data-free quantization can potentially address data privacy and security concerns in model compression and thus has been widely investigated. Recently, patch similarity aware data-free quantization for vision transformers (PSAQ-ViT) designed a relative value metric, patch similarity, to generate data from pretrained vision transformers (ViTs), achieving the first attempt at data-free quantization for ViTs. In this article, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model in a competitive and interactive fashion under the supervision of the full-precision (FP) model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without the auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models for image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mean Intersection over Union (mIoU) on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a practical solution in real-world applications involving sensitive data. Code is available at: https://github.com/zkkli/PSAQ-ViT.
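The abstract notes that results are obtained "with a naive quantization strategy," i.e., plain uniform quantization without learned rounding or mixed precision. The sketch below is not the paper's implementation; it is a minimal, hedged illustration of asymmetric min-max uniform quantization (the function name and NumPy-based setup are assumptions for demonstration only):

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Naive asymmetric min-max uniform quantization (fake-quantize):
    map float values to integers in [0, 2^b - 1], then dequantize."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)        # step size from the value range
    zero_point = round(qmin - x.min() / scale)         # integer offset so x.min() maps near qmin
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale                    # dequantized ("fake-quantized") tensor

# Example: quantizing a random weight tensor to 8 bits keeps values
# within about half a quantization step of the original.
w = np.random.randn(16, 16).astype(np.float32)
w_q = uniform_quantize(w, num_bits=8)
```

Post-training schemes like the one the paper targets calibrate `scale` and `zero_point` per tensor (or per channel) from sample statistics; the data-free setting replaces real calibration samples with images synthesized from the FP model itself.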
Funding Projects | National Key Research and Development Program of China [2022ZD0119402]; National Natural Science Foundation of China [62276255]
WOS Research Areas | Computer Science; Engineering
Language | English
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS Accession Number | WOS:001051284800001
Funding Organizations | National Key Research and Development Program of China; National Natural Science Foundation of China
Content Type | Journal Article
Source URL | [http://ir.ia.ac.cn/handle/173211/53951]
Collection | Engineering Laboratory for Industrial Vision and Intelligent Equipment, Chinese Academy of Sciences
Author Affiliations | 1. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Recommended Citation (GB/T 7714) | Li, Zhikai, Chen, Mengjuan, Xiao, Junrui, et al. PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023: 12.
APA | Li, Zhikai, Chen, Mengjuan, Xiao, Junrui, & Gu, Qingyi. (2023). PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 12.
MLA | Li, Zhikai, et al. "PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers". IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023): 12.