Leaderboard

OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities

- 19 models evaluated
- 33 OCR generative tasks
- 5 text categories
- 1,060 human-annotated samples
- 2 languages (ZH & EN)
| # | Model | Date | Type | Source | Params | OCRGenScore | T2I (VIEScore) ↑ | T2I (AR) ↑ | Edit (1-LPIPS) ↑ | Edit (AR) ↑ | Dewarping (DD) ↑ | Deshadow (MS-SSIM) ↑ | Deblur (MS-SSIM) ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | | | |
| 1 | Nano Banana Pro | 2025.11 | Unified U&G | Closed | – | 77.19 | 92.24 | 76.96 | 85.22 | 71.46 | 42.52 | 85.93 | 61.37 |
| 3 | Seedream 4.5 | 2025.12 | Specialized | Closed | – | 63.35 | 90.45 | 74.09 | 65.75 | 45.19 | 23.85 | 57.99 | 55.09 |
| 9 | GPT Image 1.5 | 2025.03 | Specialized | Closed | – | 54.00 | 93.41 | 68.72 | 57.36 | 42.73 | 29.55 | 36.10 | 32.09 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | | | |
| 5 | BAGEL | 2025.05 | Unified U&G | Open | 14B | 59.11 | 57.08 | 14.95 | 87.03 | 15.07 | 20.80 | 76.18 | 76.27 |
| 8 | OmniGen2 | 2025.06 | Unified U&G | Open | 7B | 54.24 | 59.96 | 26.37 | 65.83 | 13.82 | 21.21 | 64.26 | 65.17 |
| 11 | InternVL-U | 2026.03 | Unified U&G | Open | 4B | 43.64 | 66.25 | 44.17 | 53.87 | 17.21 | 28.69 | 49.45 | 30.80 |
| 13 | Janus-4o | 2025.06 | Unified U&G | Open | 7B | 29.58 | 53.59 | 13.63 | 45.53 | 5.80 | 24.88 | 34.28 | 39.33 |
| 15 | ILLUME+ | 2025.04 | Unified U&G | Open | 7B | 28.39 | 43.15 | 5.46 | 42.71 | 2.68 | 27.21 | 42.78 | 38.28 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | | | |
| 2 | FLUX.2-dev | 2025.11 | Specialized | Open | 32B | 70.19 | 88.88 | 66.37 | 83.92 | 41.56 | 41.41 | 67.97 | 71.87 |
| 4 | LongCat-Image | 2025.12 | Specialized | Open | 6B | 66.39 | 84.02 | 67.51 | 85.56 | 51.53 | 28.04 | 72.12 | 56.62 |
| 6 | FLUX.2-Klein-9B | 2026.01 | Specialized | Open | 9B | 59.28 | 82.84 | 39.63 | 81.18 | 31.10 | 24.44 | 57.98 | 56.74 |
| 7 | Qwen-Image | 2025.12 | Specialized | Open | 20B | 56.29 | 84.41 | 65.21 | 65.65 | 41.20 | 25.14 | 50.05 | 38.81 |
| 10 | GLM-Image | 2026.01 | Specialized | Open | 9B | 50.12 | 83.53 | 69.36 | 70.97 | 21.54 | 24.99 | 43.15 | 41.13 |
| 12 | FLUX.1-Kontext-dev | 2025.06 | Specialized | Open | 12B | 36.51 | 39.58 | 21.69 | 53.76 | 15.13 | 24.30 | 30.36 | 30.80 |
| 14 | SD-3.5-Large | 2024.10 | Specialized | Open | 8B | 29.53 | 50.94 | 27.51 | 47.43 | 5.77 | 30.99 | 29.07 | 32.64 |
| # | Model | OCRGenScore | Text Removal: Handwriting (MS-SSIM) ↑ | Text Removal: Scene Text (MS-SSIM) ↑ | Style Transfer, Artistic Text: VIEScore ↑ | Style Transfer, Artistic Text: AR ↑ | Style Transfer, Hist. Doc. (VIEScore) ↑ | Hist. Doc. Restoration (1-LPIPS) ↑ | Scene Text SR (MS-SSIM) ↑ | Layout-Aware Text Gen.: VIEScore ↑ | Layout-Aware Text Gen.: AR ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | |
| 1 | Nano Banana Pro | 77.19 | 84.16 | 91.66 | 78.27 | 89.00 | 77.66 | 74.15 | 64.67 | 88.87 | 100.00 |
| 9 | GPT Image 1.5 | 54.00 | 47.59 | 51.38 | 87.16 | 94.22 | 80.57 | 46.18 | 23.23 | 85.69 | 97.77 |
| 3 | Seedream 4.5 | 64.69 | 69.97 | 83.93 | 81.52 | 99.34 | 78.07 | 59.88 | 42.02 | 54.50 | 95.00 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | |
| 5 | BAGEL | 59.11 | 80.23 | 79.78 | 33.19 | 12.80 | 44.49 | 73.71 | 87.67 | 75.31 | 32.99 |
| 8 | OmniGen2 | 54.24 | 47.28 | 82.11 | 35.17 | 47.59 | 64.66 | 50.64 | 83.75 | 72.58 | 43.33 |
| 11 | InternVL-U | 43.64 | 57.19 | 61.13 | 53.41 | 59.99 | 20.75 | 37.45 | 17.74 | 53.94 | 85.83 |
| 15 | ILLUME+ | 28.39 | 48.93 | 68.42 | 0.50 | 2.26 | 1.00 | 25.84 | 21.88 | 7.67 | 3.29 |
| 13 | Janus-4o | 29.58 | 43.69 | 35.77 | 25.08 | 8.17 | 16.53 | 28.60 | 19.05 | 41.55 | 17.32 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | |
| 7 | Qwen-Image | 56.29 | 49.22 | 44.56 | 70.63 | 95.47 | 72.53 | 59.48 | 27.65 | 84.44 | 98.93 |
| 4 | LongCat-Image | 66.39 | 78.11 | 89.56 | 69.22 | 94.27 | 69.74 | 66.66 | 24.37 | 83.57 | 96.14 |
| 14 | SD-3.5-Large | 29.53 | 44.64 | 26.11 | 56.64 | 44.01 | 0.00 | 42.79 | 17.37 | 18.52 | 4.84 |
| 12 | FLUX.1-Kontext-dev | 36.51 | 58.34 | 31.87 | 58.66 | 52.59 | 52.80 | 43.24 | 22.49 | 49.00 | 28.51 |
| 2 | FLUX.2-dev | 70.19 | 78.75 | 93.96 | 79.86 | 91.32 | 68.48 | 74.62 | 52.78 | 76.53 | 44.61 |
| 6 | FLUX.2-Klein-9B | 59.28 | 65.71 | 80.59 | 78.48 | 93.39 | 67.48 | 63.81 | 46.46 | 74.05 | 41.61 |
| 10 | GLM-Image | 50.12 | 64.61 | 16.21 | 59.77 | 51.90 | 77.81 | 67.98 | 11.80 | 80.81 | 66.67 |
T2I = Text-to-Image generation  |  DD = Document-Distortion metric
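Several columns above (Deshadow, Deblur, Text Removal, Scene Text SR) report MS-SSIM, a multi-scale structural similarity between the model output and the ground-truth image. Below is a minimal scoring sketch, assuming the third-party `pytorch_msssim` package; the file names are placeholders:

```python
# Minimal sketch: scoring a restoration output against its ground truth with
# MS-SSIM, as used for the Deshadow / Deblur / Text Removal columns above.
# Assumes the `pytorch_msssim` pip package; image paths are placeholders.
import numpy as np
import torch
from PIL import Image
from pytorch_msssim import ms_ssim

def load_tensor(path: str) -> torch.Tensor:
    """Load an RGB image as a 1x3xHxW float tensor in [0, 1]."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

pred = load_tensor("model_output.png")  # restored image from the model
gt = load_tensor("ground_truth.png")    # reference image, same size as pred
# Default MS-SSIM uses 5 scales, so inputs should be reasonably large (>160 px).
score = ms_ssim(pred, gt, data_range=1.0).item()
print(f"MS-SSIM: {100 * score:.2f}")  # the leaderboard reports percentages
```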
| # | Model | Document: Modern ↑ | Document: Historical ↑ | Handwriting ↑ | Scene Text ↑ | Artistic Text ↑ | Layout-Rich: Slide ↑ | Layout-Rich: Poster ↑ | Layout-Rich: Layout-Aware ↑ |
|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | |
| 1 | Nano Banana Pro | 70.87 | 71.37 | 78.28 | 84.67 | 89.11 | 92.86 | 80.85 | 94.43 |
| 3 | Seedream 4.5 | 48.74 | 61.82 | 60.30 | 70.34 | 89.07 | 77.20 | 74.23 | 74.75 |
| 9 | GPT Image 1.5 | 36.31 | 56.47 | 57.16 | 60.37 | 86.26 | 80.67 | 63.21 | 91.73 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | |
| 5 | BAGEL | 56.33 | 47.63 | 55.47 | 63.83 | 40.25 | 47.79 | 44.91 | 54.15 |
| 8 | OmniGen2 | 45.97 | 46.84 | 40.66 | 63.81 | 47.45 | 42.84 | 42.59 | 57.96 |
| 11 | InternVL-U | 36.83 | 29.96 | 47.58 | 49.94 | 57.94 | 52.56 | 41.60 | 69.88 |
| 13 | Janus-4o | 31.75 | 25.88 | 33.28 | 33.51 | 29.09 | 45.85 | 26.94 | 29.43 |
| 15 | ILLUME+ | 33.87 | 19.48 | 32.55 | 36.05 | 13.40 | 34.90 | 21.57 | 5.48 |
| | **Open-Source · Specialized Generation** | | | | | | | | |
| 2 | FLUX.2-dev | 60.17 | 64.36 | 69.75 | 75.50 | 87.04 | 80.17 | 70.68 | 80.52 |
| 4 | LongCat-Image | 54.79 | 63.47 | 70.92 | 72.21 | 85.95 | 79.04 | 82.48 | 89.86 |
| 6 | FLUX.2-Klein-9B | 47.85 | 56.55 | 60.73 | 66.89 | 72.56 | 68.01 | 57.04 | 57.83 |
| 7 | Qwen-Image | 45.22 | 56.37 | 52.12 | 58.59 | 87.26 | 75.63 | 70.40 | 91.68 |
| 10 | GLM-Image | 41.42 | 62.62 | 57.83 | 49.74 | 68.66 | 66.84 | 64.12 | 73.74 |
| 12 | FLUX.1-Kontext-dev | 31.10 | 35.72 | 33.39 | 32.38 | 52.95 | 38.09 | 27.40 | 38.76 |
| 14 | SD-3.5-Large | 30.11 | 22.86 | 33.23 | 30.56 | 47.90 | 43.93 | 36.96 | 11.68 |
T2I Generation Subtasks: Text-to-Image generation tasks across multiple text categories. OCRGenScore-T2I is a weighted average of VIEScore and Accuracy Rate (AR) across all T2I subtasks (a small scoring sketch follows the table). Models marked with * support only T2I generation (no image editing capabilities).
| # | Model | OCRGenScore-T2I | Hist. Doc.: VIEScore ↑ | Hist. Doc.: AR ↑ | Handwriting: VIEScore ↑ | Handwriting: AR ↑ | Scene Text: VIEScore ↑ | Scene Text: AR ↑ | Artistic Text: VIEScore ↑ | Artistic Text: AR ↑ | Slide: VIEScore ↑ | Poster: VIEScore ↑ | Poster: AR ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | | | |
| 1 | Nano Banana Pro | 85.53 | 92.02 | 61.76 | 91.36 | 74.35 | 94.06 | 90.89 | 90.91 | 91.32 | 94.82 | 91.87 | 58.22 |
| 2 | Seedream 4.5 | 83.83 | 89.60 | 53.38 | 89.48 | 53.92 | 90.81 | 84.58 | 90.98 | 95.42 | 90.72 | 92.50 | 83.87 |
| 3 | GPT Image 1.5 | 82.63 | 92.35 | 49.63 | 94.10 | 67.74 | 93.20 | 83.19 | 92.37 | 77.29 | 94.25 | 93.62 | 59.43 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | | | |
| 11 | InternVL-U | 58.44 | 58.26 | 10.27 | 56.66 | 40.59 | 69.78 | 80.09 | 83.89 | 46.83 | 76.11 | – | – |
| 12 | OmniGen2 | 45.58 | 73.25 | 18.98 | 59.34 | 24.48 | 63.12 | 32.71 | 58.37 | 24.57 | 48.42 | 72.77 | 12.08 |
| 14 | Janus-4o | 38.79 | 72.78 | 10.01 | 49.22 | 14.59 | 53.66 | 15.70 | 42.48 | 19.20 | 62.31 | 62.30 | 0.07 |
| 15 | BAGEL | 37.34 | 54.34 | 8.59 | 54.11 | 16.05 | 65.46 | 9.53 | 55.08 | 25.76 | 43.69 | 70.40 | 1.40 |
| 17 | ILLUME+ | 27.26 | 59.42 | 8.37 | 43.87 | 7.81 | 51.07 | 2.72 | 24.62 | 4.12 | 36.13 | 52.72 | 0.14 |
| 18 | Show-o2* | 19.44 | 28.71 | 5.01 | 25.83 | 11.26 | 24.79 | 5.47 | 35.14 | 23.67 | 20.07 | 33.22 | 0.07 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | | | |
| 4 | FLUX.2-dev | 79.28 | 85.59 | 31.41 | 81.89 | 29.94 | 79.47 | 99.99 | 85.70 | 92.47 | 81.14 | 89.75 | 67.62 |
| 5 | LongCat-Image | 78.92 | 82.91 | 42.03 | 78.30 | 51.74 | 87.47 | 95.99 | 88.55 | 88.05 | 83.82 | 88.41 | 75.96 |
| 6 | GLM-Image | 77.95 | 79.85 | 60.99 | 77.95 | 64.16 | 88.51 | 82.17 | 88.64 | 75.77 | 78.70 | 90.49 | 69.48 |
| 7 | Qwen-Image | 76.03 | 87.40 | 26.36 | 60.28 | 88.00 | 77.76 | 99.53 | 86.49 | 95.47 | 85.91 | 84.32 | 58.30 |
| 8 | Z-Image* | 74.46 | 66.22 | 57.31 | 68.52 | 72.78 | 75.87 | 93.08 | 77.29 | 89.80 | 66.44 | 79.48 | 77.30 |
| 9 | Ovis-Image* | 69.02 | 52.45 | 52.75 | 55.30 | 54.00 | 76.67 | 85.82 | 77.38 | 92.70 | 57.14 | 75.97 | 90.87 |
| 10 | FLUX.2-Klein-9B | 59.28 | 82.84 | 39.63 | 88.91 | 31.10 | 80.47 | 84.11 | 83.86 | 45.04 | 51.34 | 51.37 | 37.23 |
| 13 | SD-3.5-Large | 43.36 | 52.39 | 7.40 | 41.20 | 19.86 | 47.48 | 42.44 | 59.53 | 43.76 | 58.06 | 64.30 | 25.70 |
| 16 | FLUX.1-Kontext-dev | 31.78 | 37.21 | 9.55 | 36.20 | 21.25 | 40.15 | 29.01 | 39.80 | 33.69 | 39.25 | 51.98 | 3.96 |
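As noted in the caption, OCRGenScore-T2I aggregates VIEScore and AR over the T2I subtasks. The official weights are not given on this page, so the sketch below assumes a plain equal-weight average purely for illustration, using Nano Banana Pro's row from the table:

```python
# Sketch of the OCRGenScore-T2I aggregation described in the table caption:
# a weighted average of VIEScore and AR over the T2I subtasks. The official
# weights are not given on this page, so an equal-weight mean is ASSUMED here.
from statistics import mean

# Per-subtask (VIEScore, AR) pairs, taken from Nano Banana Pro's row above;
# the Slide subtask reports VIEScore only.
subtask_scores = {
    "hist_doc":    (92.02, 61.76),
    "handwriting": (91.36, 74.35),
    "scene_text":  (94.06, 90.89),
    "artistic":    (90.91, 91.32),
    "slide":       (94.82, None),
    "poster":      (91.87, 58.22),
}

components = [v for pair in subtask_scores.values() for v in pair if v is not None]
print(f"equal-weight OCRGenScore-T2I ~ {mean(components):.2f}")  # ~84.69
```

With equal weights this lands near, but not exactly on, the reported 85.53, so the official metric presumably weights the components differently.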
Text Editing Tasks: Modify specific textual content within images while preserving other elements unchanged. Each text category is evaluated by 1 − LPIPS (perceptual similarity, ↑ better) and Accuracy Rate (AR, ↑ better); a metric sketch follows the table. Note: models marked with * currently support only image editing tasks.
| # | Model | OCRGenScore-Edit | Modern Doc.: 1-LPIPS ↑ | Modern Doc.: AR ↑ | Historical Doc.: 1-LPIPS ↑ | Historical Doc.: AR ↑ | Handwriting: 1-LPIPS ↑ | Handwriting: AR ↑ | Scene Text: 1-LPIPS ↑ | Scene Text: AR ↑ | Artistic Text: 1-LPIPS ↑ | Artistic Text: AR ↑ | Slide: 1-LPIPS ↑ | Slide: AR ↑ | Poster: 1-LPIPS ↑ | Poster: AR ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | | | | | | |
| 2 | Nano Banana Pro | 80.30 | 88.15 | 80.28 | 74.66 | 38.87 | 77.42 | 58.21 | 92.03 | 73.36 | 92.15 | 94.00 | 85.92 | 96.90 | 93.50 | 79.80 |
| 6 | Seedream 4.5 | 56.74 | 65.14 | 41.20 | 60.95 | 13.77 | 53.77 | 24.48 | 68.67 | 45.31 | 67.15 | 100.00 | 64.06 | 64.31 | 59.84 | 69.71 |
| 7 | GPT Image 1.5 | 52.32 | 55.66 | 28.30 | 48.73 | 8.80 | 54.08 | 31.89 | 59.92 | 45.78 | 67.45 | 65.31 | 60.28 | 67.98 | 55.36 | 44.42 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | | | | | | |
| 8 | BAGEL | 50.57 | 87.58 | 9.00 | 79.45 | 2.50 | 86.27 | 15.95 | 90.55 | 20.66 | 87.63 | 17.63 | 83.69 | 20.09 | 92.57 | 15.28 |
| 11 | OmniGen2 | 39.86 | 61.76 | 9.32 | 68.48 | 3.66 | 62.11 | 11.90 | 73.55 | 10.21 | 74.24 | 24.66 | 49.88 | 24.11 | 51.50 | 6.13 |
| 13 | InternVL-U | 35.03 | 51.20 | 5.59 | 43.98 | 10.75 | 58.54 | 15.32 | 58.50 | 21.29 | 58.39 | 45.15 | 44.29 | 13.72 | 49.59 | 14.15 |
| 14 | Janus-4o | 28.09 | 43.82 | 2.42 | 32.06 | 1.92 | 44.59 | 3.89 | 47.29 | 6.24 | 52.61 | 27.68 | 56.12 | 2.68 | 43.76 | 0.82 |
| 16 | ILLUME+ | 22.68 | 39.25 | 1.25 | 41.83 | 1.19 | 41.46 | 3.89 | 61.53 | 3.46 | 48.29 | 0.62 | 63.53 | 3.78 | 42.74 | 0.69 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | | | | | | |
| 1 | FireRed-Image-Edit-v1.1* | 81.27 | 89.75 | 60.08 | 80.46 | 44.70 | 90.68 | 76.18 | 91.97 | 66.97 | 21.18 | 93.26 | 88.02 | 98.66 | 96.19 | 79.47 |
| 3 | LongCat-Image | 70.97 | 84.35 | 29.83 | 76.75 | 32.06 | 86.55 | 52.73 | 87.56 | 54.12 | 83.62 | 91.95 | 83.81 | 64.72 | 92.98 | 72.66 |
| 4 | FLUX.2-dev | 64.14 | 83.53 | 34.86 | 74.61 | 13.08 | 80.93 | 31.32 | 87.57 | 48.38 | 87.57 | 89.77 | 84.51 | 56.67 | 91.28 | 33.86 |
| 5 | Qwen-Image | 57.68 | 67.63 | 33.60 | 63.37 | 21.50 | 51.40 | 25.06 | 72.08 | 58.58 | 85.95 | 72.33 | 59.48 | 27.65 | 84.44 | 98.93 |
| 9 | FLUX.2-Klein-9B | 48.32 | 76.15 | 16.35 | 67.51 | 5.09 | 59.92 | 15.77 | 82.84 | 24.61 | 67.98 | 47.48 | 75.98 | 33.96 | 79.19 | 16.77 |
| 10 | GLM-Image | 46.72 | 41.42 | 14.42 | 54.20 | 14.42 | 65.15 | 44.39 | 59.40 | 77.81 | 67.98 | 47.48 | 78.31 | 84.08 | 58.31 | 74.23 |
| 12 | FLUX.1-Kontext-dev | 35.48 | 51.49 | 6.82 | 45.34 | 10.75 | 54.66 | 15.52 | 62.04 | 11.27 | 62.81 | 20.97 | 57.29 | 16.53 | 52.86 | 5.39 |
| 15 | SD-3.5-Large | 26.74 | 35.32 | 4.45 | 37.15 | 0.38 | 43.19 | 5.87 | 42.86 | 5.56 | 64.87 | 18.46 | 59.60 | 0.00 | 51.50 | 6.13 |
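The 1 − LPIPS similarity used in the editing columns can be computed with the `lpips` reference package; LPIPS is a learned perceptual distance, so subtracting it from 1 yields a similarity where higher is better. A minimal sketch with placeholder file names:

```python
# Sketch of the 1 - LPIPS similarity used in the editing columns above.
# LPIPS is a learned perceptual distance, so 1 - LPIPS is a similarity
# (higher = better). Assumes the `lpips` pip package; paths are placeholders.
import torch
import lpips

loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, the package default

def to_lpips_tensor(path: str) -> torch.Tensor:
    # lpips.load_image returns an HxWx3 uint8 RGB array;
    # lpips.im2tensor maps it to a 1x3xHxW tensor scaled to [-1, 1].
    return lpips.im2tensor(lpips.load_image(path))

edited = to_lpips_tensor("edited_output.png")
reference = to_lpips_tensor("reference.png")

with torch.no_grad():
    distance = loss_fn(edited, reference).item()
print(f"1 - LPIPS: {100 * (1.0 - distance):.2f}")  # leaderboard-style percentage
```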
Chinese (ZH) Performance: All metrics (OCRGenScore, T2I, Editing, I2I Translation) are computed exclusively on Chinese-language test samples.
| # | Model | OCRGenScore | T2I (VIEScore) ↑ | T2I (AR) ↑ | Edit (1-LPIPS) ↑ | Edit (AR) ↑ | Dewarping (DD) ↑ | Deshadow (MS-SSIM) ↑ | Deblur (MS-SSIM) ↑ |
|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | |
| 1 | Nano Banana Pro | 76.02 | 91.61 | 82.27 | 83.43 | 67.96 | 43.99 | 93.82 | 60.26 |
| 4 | Seedream 4.5 | 63.55 | 91.09 | 66.89 | 61.87 | 40.88 | 21.45 | 83.87 | 49.51 |
| 8 | GPT Image 1.5 | 53.04 | 93.50 | 69.24 | 55.35 | 34.08 | 31.69 | 34.34 | 33.60 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | |
| 7 | BAGEL | 55.87 | 53.09 | 2.32 | 86.09 | 8.97 | 19.25 | 74.19 | 59.08 |
| 9 | OmniGen2 | 47.82 | 51.27 | 0.87 | 63.12 | 4.99 | 19.56 | 68.38 | 46.71 |
| 11 | InternVL-U | 43.88 | 70.18 | 44.30 | 50.93 | 9.34 | 23.52 | 64.68 | 33.54 |
| 12 | ILLUME+ | 29.17 | 53.17 | 0.56 | 42.34 | 1.69 | 22.80 | 44.82 | 42.50 |
| 15 | Janus-4o | 25.50 | 44.85 | 0.19 | 44.02 | 1.17 | 25.94 | 31.45 | 39.39 |
| | **Open-Source · Specialized Generation** | | | | | | | | |
| 2 | FLUX.2-dev | 69.30 | 89.60 | 58.42 | 81.81 | 28.48 | 37.52 | 81.94 | 61.19 |
| 3 | LongCat-Image | 65.97 | 87.39 | 68.92 | 85.83 | 52.70 | 28.00 | 79.45 | 59.62 |
| 5 | FLUX.2-Klein-9B | 56.53 | 80.47 | 11.39 | 78.95 | 12.37 | 23.16 | 83.86 | 51.34 |
| 6 | Qwen-Image | 56.44 | 88.11 | 68.43 | 60.69 | 33.06 | 19.27 | 60.62 | 42.47 |
| 10 | GLM-Image | 46.72 | 87.98 | 63.47 | 70.68 | 13.82 | 21.44 | 25.73 | 44.02 |
| 13 | FLUX.1-Kontext-dev | 28.72 | 3.85 | 0.22 | 53.46 | 2.90 | 19.78 | 26.00 | 34.36 |
| 14 | SD-3.5-Large | 26.74 | 22.55 | 0.60 | 45.29 | 1.01 | 31.62 | 22.71 | 37.63 |
| # | Model | OCRGenScore | Text Removal: Handwriting (MS-SSIM) ↑ | Text Removal: Scene Text (MS-SSIM) ↑ | Style Transfer, Artistic Text: VIEScore ↑ | Style Transfer, Artistic Text: AR ↑ | Style Transfer, Hist. Doc. (VIEScore) ↑ | Hist. Doc. Restoration (1-LPIPS) ↑ | Scene Text SR (MS-SSIM) ↑ | Layout-Aware Text Gen.: VIEScore ↑ | Layout-Aware Text Gen.: AR ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | |
| 1 | Nano Banana Pro | 76.02 | 84.16 | 93.53 | 64.21 | 79.23 | 77.66 | 74.15 | 52.98 | 88.02 | 100.00 |
| 8 | GPT Image 1.5 | 53.04 | 47.59 | 50.75 | 81.63 | 90.11 | 80.57 | 46.18 | 19.68 | 86.05 | 95.30 |
| 4 | Seedream 4.5 | 63.55 | 69.97 | 88.99 | 76.60 | 99.23 | 78.07 | 59.88 | 29.30 | 55.26 | 89.47 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | |
| 7 | BAGEL | 55.87 | 80.23 | 87.20 | 12.74 | 1.30 | 44.49 | 73.71 | 88.05 | 82.51 | 1.74 |
| 9 | OmniGen2 | 47.82 | 47.28 | 83.19 | 5.66 | 6.52 | 64.66 | 50.64 | 82.67 | 67.37 | 0.00 |
| 11 | InternVL-U | 43.88 | 57.19 | 53.47 | 44.41 | 57.21 | 20.75 | 37.45 | 19.48 | 62.28 | 83.30 |
| 12 | ILLUME+ | 29.17 | 48.93 | 71.65 | 0.00 | 0.00 | 1.00 | 25.84 | 16.19 | 3.08 | 0.00 |
| 15 | Janus-4o | 28.50 | 43.69 | 37.33 | 0.00 | 0.00 | 16.53 | 28.60 | 16.21 | 32.96 | 0.00 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | |
| 6 | Qwen-Image | 56.44 | 49.22 | 38.52 | 68.27 | 96.01 | 72.53 | 59.48 | 27.24 | 82.44 | 99.00 |
| 3 | LongCat-Image | 65.97 | 78.11 | 90.32 | 58.02 | 91.30 | 69.74 | 66.66 | 18.05 | 82.47 | 92.90 |
| 14 | SD-3.5-Large | 26.74 | 44.64 | 25.62 | 48.05 | 3.90 | 0.00 | 42.79 | 18.16 | 17.93 | 0.00 |
| 13 | FLUX.1-Kontext-dev | 28.52 | 58.34 | 27.83 | 37.85 | 3.96 | 52.80 | 43.24 | 21.34 | 22.61 | 0.00 |
| 2 | FLUX.2-dev | 69.30 | 78.75 | 93.96 | 73.96 | 82.64 | 68.48 | 74.62 | 33.10 | 75.85 | 36.57 |
| 5 | FLUX.2-Klein-9B | 56.53 | 65.71 | 92.63 | 67.54 | 86.79 | 67.48 | 63.81 | 44.49 | 75.16 | 10.47 |
| 10 | GLM-Image | 46.72 | 64.61 | 18.00 | 54.20 | 44.39 | 77.81 | 67.98 | 11.07 | 79.31 | 58.31 |
Ranked by OCRGenScore on Chinese (ZH) text samples.
English (EN) Performance: All metrics (OCRGenScore, T2I, Editing, I2I Translation) are computed exclusively on English-language test samples.
| # | Model | OCRGenScore | T2I (VIEScore) ↑ | T2I (AR) ↑ | Edit (1-LPIPS) ↑ | Edit (AR) ↑ | Dewarping (DD) ↑ | Deshadow (MS-SSIM) ↑ | Deblur (MS-SSIM) ↑ |
|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | |
| 1 | Nano Banana Pro | 78.75 | 92.87 | 71.71 | 87.00 | 74.96 | 41.53 | 81.99 | 62.92 |
| 4 | Seedream 4.5 | 62.73 | 89.82 | 75.22 | 61.62 | 49.51 | 25.45 | 45.05 | 62.90 |
| 9 | GPT Image 1.5 | 53.95 | 93.31 | 68.21 | 59.36 | 51.37 | 28.13 | 36.99 | 29.98 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | |
| 5 | OmniGen2 | 62.26 | 68.55 | 51.47 | 68.54 | 22.66 | 22.11 | 62.20 | 39.44 |
| 7 | BAGEL | 60.96 | 61.02 | 27.37 | 87.96 | 21.16 | 21.84 | 77.17 | 61.57 |
| 11 | InternVL-U | 45.45 | 62.36 | 44.04 | 56.82 | 25.08 | 32.13 | 41.83 | 26.97 |
| 14 | Janus-4o | 33.54 | 63.19 | 26.86 | 47.04 | 10.43 | 24.18 | 35.70 | 33.95 |
| 15 | ILLUME+ | 31.02 | 33.17 | 10.34 | 43.09 | 3.67 | 30.16 | 41.75 | 38.54 |
| | **Open-Source · Specialized Generation** | | | | | | | | |
| 2 | FLUX.2-dev | 70.74 | 88.17 | 74.19 | 86.04 | 54.64 | 43.81 | 60.98 | 59.35 |
| 3 | LongCat-Image | 65.76 | 80.70 | 66.12 | 85.29 | 56.37 | 28.07 | 68.45 | 60.66 |
| 6 | FLUX.2-Klein-9B | 61.95 | 85.18 | 67.43 | 83.40 | 49.82 | 25.28 | 45.04 | 51.37 |
| 8 | Qwen-Image | 55.14 | 80.76 | 62.04 | 70.61 | 49.34 | 29.06 | 44.77 | 33.67 |
| 10 | GLM-Image | 47.17 | 81.11 | 75.16 | 71.25 | 29.26 | 27.36 | 51.96 | 37.08 |
| 12 | FLUX.1-Kontext-dev | 43.59 | 72.99 | 42.82 | 54.67 | 27.37 | 37.31 | 33.54 | 25.53 |
| 13 | SD-3.5-Large | 35.32 | 75.96 | 54.00 | 49.65 | 10.30 | 30.37 | 32.24 | 25.66 |
| # | Model | OCRGenScore | Text Removal: Handwriting (MS-SSIM) ↑ | Text Removal: Scene Text (MS-SSIM) ↑ | Style Transfer, Artistic Text: VIEScore ↑ | Style Transfer, Artistic Text: AR ↑ | Style Transfer, Hist. Doc. (VIEScore) ↑ | Hist. Doc. Restoration (1-LPIPS) ↑ | Scene Text SR (MS-SSIM) ↑ | Layout-Aware Text Gen.: VIEScore ↑ | Layout-Aware Text Gen.: AR ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Closed-Source Models** | | | | | | | | | | |
| 1 | Nano Banana Pro | 78.75 | – | 89.79 | 92.33 | 98.77 | – | – | 76.36 | 89.72 | 100.00 |
| 9 | GPT Image 1.5 | 53.95 | – | 52.01 | 92.69 | 98.33 | – | – | 26.78 | 85.33 | 100.00 |
| 4 | Seedream 4.5 | 62.73 | – | 78.87 | 86.44 | 99.45 | – | – | 54.74 | 53.74 | 100.00 |
| | **Open-Source · Unified Understanding & Generation** | | | | | | | | | | |
| 7 | BAGEL | 60.96 | – | 72.36 | 53.64 | 24.30 | – | – | 87.29 | 68.11 | 64.24 |
| 5 | OmniGen2 | 62.26 | – | 81.03 | 64.68 | 88.66 | – | – | 84.83 | 77.79 | 86.66 |
| 11 | InternVL-U | 45.45 | – | 68.79 | 62.41 | 62.77 | – | – | 16.00 | 45.60 | 88.36 |
| 15 | ILLUME+ | 31.02 | – | 65.19 | 1.00 | 4.52 | – | – | 27.57 | 12.26 | 6.58 |
| 14 | Janus-4o | 33.54 | – | 34.21 | 50.16 | 16.34 | – | – | 21.89 | 50.14 | 34.64 |
| | **Open-Source · Specialized Generation** | | | | | | | | | | |
| 8 | Qwen-Image | 55.14 | – | 50.60 | 72.99 | 94.93 | – | – | 28.06 | 86.44 | 98.86 |
| 3 | LongCat-Image | 65.76 | – | 88.80 | 80.42 | 97.24 | – | – | 30.69 | 84.67 | 99.38 |
| 13 | SD-3.5-Large | 35.32 | – | 26.60 | 65.23 | 84.12 | – | – | 16.58 | 19.11 | 9.68 |
| 12 | FLUX.1-Kontext-dev | 43.59 | – | 35.91 | 79.47 | 100.00 | – | – | 23.64 | 75.39 | 57.02 |
| 2 | FLUX.2-dev | 70.74 | – | 99.71 | 85.76 | 100.00 | – | – | 72.46 | 77.21 | 52.65 |
| 6 | FLUX.2-Klein-9B | 61.95 | – | 68.55 | 89.42 | 100.00 | – | – | 48.43 | 72.94 | 72.75 |
| 10 | GLM-Image | 47.17 | – | 14.42 | 65.34 | 59.41 | – | – | 12.53 | 82.31 | 75.03 |
Ranked by OCRGenScore on English (EN) text samples.

Benchmark Overview — Task & Data Examples


OCRGenBench is the most comprehensive benchmark to date for evaluating the OCR generative capabilities of generative models. It is the first to unify T2I generation, text editing, and OCR-related image-to-image (I2I) translation into a single evaluation of a model's visual text synthesis abilities, i.e., its OCR generative capabilities. The benchmark covers 5 common text categories and 33 OCR generative tasks, comprising 1,060 challenging, human-annotated samples with dense text, varied layouts, multiple aspect ratios, and bilingual content. We also design OCRGenScore, a unified metric that assesses text accuracy, instruction following, visual quality, and structural consistency in visual text synthesis.
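For intuition, the Accuracy Rate (AR) columns throughout the leaderboard measure whether the rendered text is actually readable as the intended string. OCRGenBench's exact AR protocol is not spelled out on this page, so the sketch below is only one plausible instantiation: OCR the generated image (Tesseract via `pytesseract`, as an example engine) and compare the result to the target text with a normalized similarity ratio.

```python
# Illustrative text-accuracy check in the spirit of the AR columns.
# NOT the official AR protocol (which this page does not specify): this
# variant OCRs the generated image and scores similarity to the target text.
from difflib import SequenceMatcher

from PIL import Image
import pytesseract  # any OCR engine would do; Tesseract is just an example

def text_accuracy(image_path: str, target_text: str, lang: str = "eng") -> float:
    recognized = pytesseract.image_to_string(Image.open(image_path), lang=lang)
    # Normalize whitespace before comparing.
    a = " ".join(recognized.split())
    b = " ".join(target_text.split())
    return SequenceMatcher(None, a, b).ratio()  # 1.0 = exact match

# Hypothetical usage with placeholder file name and target string.
print(text_accuracy("generated_poster.png", "GRAND OPENING SALE"))
```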

Submit Your Model Results

Have you evaluated your model on OCRGenBench? Upload your results as an Excel or CSV file to request a leaderboard entry.


Download the template, fill in your model's scores, then upload to preview the format. The submission portal will open soon — follow the GitHub repo for updates.
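Until the portal opens, a quick local sanity check of your results file can catch format issues early. The column names below are hypothetical placeholders; match them to the headers in the downloaded template, which may differ:

```python
# Quick pre-upload sanity check for a results file, sketched with pandas.
# The EXPECTED_COLUMNS names are hypothetical placeholders, not the official
# template schema; replace them with the headers from the template you download.
import pandas as pd

EXPECTED_COLUMNS = ["model", "task", "metric", "score"]  # placeholder schema

df = pd.read_csv("my_results.csv")  # use pd.read_excel() for .xlsx files

missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
if missing:
    raise ValueError(f"Missing columns: {missing}")
if not df["score"].between(0, 100).all():
    raise ValueError("Scores should be percentages in [0, 100].")
print(df.head())  # eyeball the first rows before submitting
```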

BibTeX
@article{zhang2025ocrgenbench,
  title     = {{OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities}},
  author    = {Zhang, Peirong and Xu, Haowei and Zhang, Jiaxin and Zheng, Xuhan and Xu, Guitao and Zhang, Yuyi and Liu, Junle and Yang, Zhenhua and Zhou, Wei and Jin, Lianwen},
  journal   = {arXiv preprint arXiv:2507.15085},
  year      = {2025},
}