Political Email Extraction Leaderboard

What kind of sicko signs up for political fundraising emails from just about every committee? Oh, right, that's me. I've collected thousands of political fundraising emails and challenged various LLMs to extract committee names from their disclaimers (like "Paid for by The Pennsylvania Democratic Party"). This extraction isn't straightforward: disclaimers vary in format and position, and while some are simple, others continue with additional text about contributions and treasurers.
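To see why a fixed pattern falls short, here is a minimal regex baseline in Python. It is a sketch for illustration only, not part of the leaderboard pipeline. The `extract_committee` helper and its pattern are hypothetical, and the sample disclaimers are made up. The pattern stops at the first period, comma, line break, or "and (not) authorized" clause, so committee names that themselves contain periods or commas get truncated.

```python
import re
from typing import Optional

# Hypothetical baseline: capture the text after "Paid for by" and stop at the
# first period, comma, line break, or an "and (not) authorized" clause.
PAID_FOR_BY = re.compile(
    r"paid\s+for\s+by\s+([^.,\n]+?)"
    r"(?=\s+and\s+(?:not\s+)?authorized\b|[.,\n]|$)",
    re.IGNORECASE,
)


def extract_committee(email_text: str) -> Optional[str]:
    """Return the committee named in the first 'Paid for by' disclaimer, if any."""
    match = PAID_FOR_BY.search(email_text)
    return match.group(1).strip() if match else None


# Made-up disclaimers: one simple, two with trailing text, one with no disclaimer.
samples = [
    "Paid for by The Pennsylvania Democratic Party",
    "Paid for by Smith for Senate and not authorized by any candidate.",
    "PAID FOR BY Friends of Jane Doe, Mary Roe, Treasurer",
    "Thanks for reading!",
]
for text in samples:
    print(extract_committee(text))
```

Running it prints the three committee names followed by None. A hypothetical disclaimer such as "Paid for by Acme Inc. PAC" would come back as "Acme Inc", the kind of edge case that hand-written stop rules keep missing.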

Using the same 1,000 emails from November 2024 and a zero-shot prompt asking models to extract committee names and senders, I've compared how different LLMs perform at this task. The leaderboard below shows each model's success rate at matching the committee names in the training dataset, first for an updated version of the prompt and then for the original one. For more details on this project, read my full blog post. You can also explore the complete code and extraction results on GitHub.
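One way to turn a model's extraction results into the Total Records, Committee Matches and Match % columns is sketched below in Python. The exact matching rule isn't spelled out here, so the normalization (case-folding, collapsing whitespace, dropping a trailing period) is an assumption, as are the hypothetical `normalize` and `match_rate` helpers, the dict-of-records input format, and the sample data.

```python
from typing import Dict, Optional, Tuple


def normalize(name: str) -> str:
    """Assumed normalization: case-fold, collapse whitespace, drop a trailing period."""
    return " ".join(name.casefold().split()).rstrip(".")


def match_rate(
    predictions: Dict[str, Optional[str]], labels: Dict[str, str]
) -> Tuple[int, int, float]:
    """Score one model's output file against the labeled committee names.

    predictions maps an email ID to the extracted committee (None if the model
    returned nothing); labels maps the same IDs to the expected committee.
    Returns (total_records, matches, match_percent).
    """
    total = len(predictions)  # records present in this model's output file
    matches = sum(
        1
        for email_id, predicted in predictions.items()
        if predicted is not None
        and email_id in labels
        and normalize(predicted) == normalize(labels[email_id])
    )
    percent = 100 * matches / total if total else 0.0
    return total, matches, percent


# Made-up example: one normalized match, one wrong name, one empty result.
labels = {
    "e1": "The Pennsylvania Democratic Party",
    "e2": "Smith for Senate",
    "e3": "Friends of Jane Doe",
}
predictions = {
    "e1": "the Pennsylvania  Democratic Party.",
    "e2": "Smith for Senate Committee",
    "e3": None,
}
print(match_rate(predictions, labels))
```

Dividing by the number of records in the output file rather than by 1,000 mirrors how the tables compute Match %: for example, 800 matches out of 996 records gives 80.32%.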

Updated Prompt

Model (JSON Filename) Total Records Committee Matches Match % Accuracy Precision Recall F1 Score
gpt-4o_november_2024_prompt2.json 1000 969 96.90% 0.97 0.91 0.91 0.91
claude37_sonnet_november_2024_prompt2.json 1000 964 96.40% 0.96 0.91 0.91 0.90
gemini_25_november_2024_prompt2.json 1000 962 96.20% 0.96 0.89 0.89 0.89
gemini25_flash_preview_04_17_november_2024_prompt2.json 1000 961 96.10% 0.96 0.89 0.88 0.88
claude35_haiku_november_2024_prompt2.json 1000 956 95.60% 0.96 0.88 0.88 0.88
qwen25_november_2024_prompt2.json 1000 955 95.50% 0.95 0.88 0.87 0.87
mistral_small_31_november_2024_prompt2.json 1000 942 94.20% 0.94 0.88 0.86 0.86
llama_4_maverick_17b_128e_instruct_november_2024_prompt2.json 1000 940 94.00% 0.94 0.83 0.82 0.82
claude35_sonnet_november_2024_prompt2.json 1000 938 93.80% 0.94 0.88 0.87 0.87
llama_4_scout_17b_16e_instruct_november_2024_prompt2.json 1000 936 93.60% 0.94 0.87 0.86 0.86
mistral_small_32_november_2024_prompt2.json 1000 934 93.40% 0.93 0.84 0.82 0.83
gemini20_flash_november_2024_prompt2.json 1000 932 93.20% 0.93 0.83 0.81 0.81
gpt_41_mini_november_2024_prompt2.json 1000 932 93.20% 0.93 0.83 0.81 0.82
cogito_70b_november_2024_prompt2.json 1000 891 89.10% 0.89 0.82 0.80 0.80
llama33_november_2024_prompt2.json 1000 890 89.00% 0.89 0.82 0.79 0.80
gpt_41_nano_november_2024_prompt2.json 1000 884 88.40% 0.88 0.76 0.74 0.74
mistral_nemo_november_2024_prompt2.json 1000 869 86.90% 0.87 0.80 0.77 0.77
openthinker_32b_november_2024_prompt2.json 1000 868 86.80% 0.87 0.78 0.75 0.76
qwq_november_2024_prompt2.json 1000 863 86.30% 0.86 0.79 0.76 0.77
cogito_32b_november_2024_prompt2.json 1000 861 86.10% 0.86 0.79 0.75 0.76
gemma2_november_2024_prompt2.json 1000 859 85.90% 0.86 0.75 0.70 0.72
gemma2_27b_november_2024_prompt2.json 1000 858 85.80% 0.86 0.76 0.71 0.73
cogito_14b_november_2024_prompt2.json 1000 857 85.70% 0.86 0.77 0.74 0.75
qwen25_72b_november_2024_prompt2.json 1000 857 85.70% 0.86 0.79 0.75 0.77
phi4_reasoning_plus_november_2024_prompt2.json 1000 853 85.30% 0.85 0.71 0.67 0.68
phi4_november_2024_prompt2.json 1000 852 85.20% 0.85 0.74 0.70 0.72
tulu3_8b_november_2024_prompt2.json 1000 850 85.00% 0.85 0.78 0.74 0.75
olmo2_13b_november_2024_prompt2.json 1000 848 84.80% 0.85 0.75 0.70 0.72
o3-mini_november_2024_prompt2.json 1000 836 83.60% 0.84 0.68 0.64 0.65
mistral_small_24b_november_2024_prompt2.json 1000 835 83.50% 0.83 0.76 0.71 0.73
mixtral_november_2024_prompt2.json 1000 834 83.40% 0.83 0.75 0.71 0.72
command_a_november_2024_prompt2.json 1000 818 81.80% 0.82 0.77 0.72 0.73
gemma3_27b_november_2024_prompt2.json 1000 817 81.70% 0.82 0.71 0.66 0.67
internlm2_november_2024_prompt2.json 1000 815 81.50% 0.81 0.69 0.65 0.66
granite31_dense_8b_november_2024_prompt2.json 1000 813 81.30% 0.81 0.71 0.66 0.67
olmo2_7b_november_2024_prompt2.json 1000 810 81.00% 0.81 0.71 0.66 0.68
granite3.2_november_2024_prompt2.json 1000 807 80.70% 0.81 0.69 0.65 0.66
granite3.3_november_2024_prompt2.json 1000 803 80.30% 0.80 0.68 0.63 0.65
exaone35_32b_november_2024_prompt2.json 1000 803 80.30% 0.80 0.70 0.65 0.66
gemma3_27b_it_qat_november_2024_prompt2.json 1000 802 80.20% 0.80 0.70 0.65 0.66
exaone_deep_32b_november_2024_prompt2.json 1000 798 79.80% 0.80 0.70 0.64 0.66
command_r7b_november_2024_prompt2.json 1000 770 77.00% 0.77 0.67 0.62 0.63
phi4_mini_november_2024_prompt2.json 1000 753 75.30% 0.75 0.55 0.48 0.50
qwen3_8b_november_2024_prompt2.json 1000 736 73.60% 0.74 0.69 0.62 0.64
gemma3_12b_november_2024_prompt2.json 1000 736 73.60% 0.74 0.60 0.54 0.56
deepseek_r1_32b_november_2024_prompt2.json 1000 729 72.90% 0.73 0.69 0.60 0.63
internlm3_8b_instruct_november_2024_prompt2.json 1000 653 65.30% 0.65 0.41 0.35 0.37
qwen3_32b_november_2024_prompt2.json 1000 638 63.80% 0.64 0.70 0.55 0.60
hermes3_70b_november_2024_prompt2.json 1000 637 63.70% 0.64 0.71 0.56 0.61
phi3_november_2024_prompt2.json 1000 580 58.00% 0.58 0.48 0.38 0.41
qwen3_14b_november_2024_prompt2.json 1000 239 23.90% 0.24 0.35 0.20 0.24

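Original Prompt

Some runs with the original prompt returned fewer than 1,000 records. For those rows, Match % is Committee Matches divided by Total Records, so it reflects only the records the model returned.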

Model (JSON Filename) Total Records Committee Matches Match % Accuracy Precision Recall F1 Score
gpt-4o-mini_november_2024.json 1000 969 96.90% 0.97 0.91 0.91 0.91
llama33_november_2024.json 1000 943 94.30% 0.94 0.87 0.86 0.86
openai4o_nov24_november_2024.json 1000 940 94.00% 0.94 0.87 0.86 0.87
openai4o_november_2024.json 1000 938 93.80% 0.94 0.88 0.86 0.86
phi4_november_2024.json 1000 849 84.90% 0.85 0.73 0.69 0.70
deepseek_r1_32b_november_2024.json 1000 742 74.20% 0.74 0.70 0.62 0.64
solar-pro_november_2024.json 1000 330 33.00% 0.33 0.18 0.13 0.14
qwq_november_2024.json 996 800 80.32% 0.80 0.77 0.70 0.72
mistral_small24b_november_2024.json 995 810 81.41% 0.81 0.71 0.66 0.68
gemma2_27b_november_2024.json 994 822 82.70% 0.83 0.70 0.64 0.66
internlm2_november_2024.json 992 712 71.77% 0.72 0.53 0.47 0.49
claude3_sonnet_november_2024.json 985 927 94.11% 0.94 0.88 0.87 0.87
exaone35_november_2024.json 960 671 69.90% 0.70 0.51 0.44 0.47
claude35_haiku_november_2024.json 880 842 95.68% 0.96 0.89 0.89 0.88
deepseek_r1_8b_november_2024.json 781 371 47.50% 0.48 0.52 0.37 0.42
llama-3.2-3b-preview_november_2024.json 552 219 39.67% 0.40 0.40 0.28 0.31
llama323b_november_2024.json 524 303 57.82% 0.58 0.34 0.29 0.30
phi3_november_2024.json 521 191 36.66% 0.37 0.34 0.23 0.26
llama318b_november_2024.json 507 442 87.18% 0.87 0.73 0.71 0.72
mistral_small_november_2024.json 502 398 79.28% 0.79 0.65 0.61 0.62
gemma2_november_2024.json 495 426 86.06% 0.86 0.75 0.71 0.72
mixtral_november_2024.json 461 420 91.11% 0.91 0.86 0.84 0.85
starling-lm_november_2024.json 270 202 74.81% 0.75 0.66 0.60 0.62
Last Updated: 2025-07-11 00:47:46
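Because the Original Prompt scores divide by each file's own record count, a model that returned only part of the dataset can rank above one that processed all 1,000 emails. The Python sketch below recomputes a few rows from that table both ways. The `rates` helper is hypothetical, and counting missing records as misses in the second figure is an assumption made for comparison, not how the leaderboard scores them.

```python
from typing import Tuple

# (file, total_records, committee_matches), copied from the Original Prompt table.
ROWS = [
    ("qwq_november_2024.json", 996, 800),
    ("claude35_haiku_november_2024.json", 880, 842),
    ("llama318b_november_2024.json", 507, 442),
    ("mixtral_november_2024.json", 461, 420),
]
DATASET_SIZE = 1000  # emails in the November 2024 sample


def rates(
    total_records: int, matches: int, dataset_size: int = DATASET_SIZE
) -> Tuple[float, float]:
    """Return (match % over returned records, match % over the full dataset)."""
    reported = round(100 * matches / total_records, 2)
    # Assumption: records missing from the output file count as misses.
    full = round(100 * matches / dataset_size, 2)
    return reported, full


for name, total, matches in ROWS:
    reported, full = rates(total, matches)
    print(f"{name}: {reported:.2f}% of {total} returned, {full:.2f}% of {DATASET_SIZE}")
```

Under that assumption, mixtral_november_2024.json drops from the reported 91.11% to 42.00% once its 539 missing emails are counted against it, while qwq_november_2024.json barely moves (80.32% to 80.00%).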