Political Email Extraction Leaderboard

What kind of sicko signs up for political fundraising emails from just about every committee? Oh, right, that's me. I've collected thousands of political fundraising emails and challenged various LLMs to extract committee names from their disclaimers (like "Paid for by The Pennsylvania Democratic Party"). This extraction isn't straightforward: disclaimers vary in format and position. Some are a single short line, while others continue with additional text about contributions and treasurers.
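To make the difficulty concrete, here is a minimal sketch of a naive regex baseline. It is not part of this project, which uses LLM prompts; the function name, the pattern, and the sample disclaimers other than the Pennsylvania one are made up for illustration. It handles the simple case and then breaks on ordinary variations.

```python
import re
from typing import Optional

# Hypothetical baseline (not the project's method): capture whatever follows
# "Paid for by" up to a sentence-ending period, a line break, or end of text.
DISCLAIMER_RE = re.compile(r"paid for by\s+(.+?)(?:\.(?:\s|$)|\n|$)", re.IGNORECASE)


def extract_committee(text: str) -> Optional[str]:
    """Return the text after 'Paid for by', or None if no disclaimer is found."""
    match = DISCLAIMER_RE.search(text)
    return match.group(1).strip() if match else None


# Simple disclaimer: works.
print(extract_committee("Paid for by The Pennsylvania Democratic Party"))

# Trailing contribution language: the period happens to end the name cleanly.
print(extract_committee(
    "Paid for by Friends of Jane Doe. Contributions are not tax deductible."
))

# An abbreviation inside the name: the pattern stops too early.
print(extract_committee("Paid for by Smith for U.S. Senate"))

# A treasurer appended with a comma: the pattern keeps too much.
print(extract_committee("Paid for by Jane Doe for Congress, John Smith, Treasurer"))
```

The last two calls return a truncated name and a name with the treasurer attached, which is the kind of variation that makes a fixed pattern unreliable here.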

Using the same 1,000 emails from November 2024 and a zero-shot prompt asking models to extract committee names and senders, I've compared how different LLMs perform at this task. The leaderboard below shows how often each model's extracted committee name matches the one in the training dataset. For more details on this project, read my full blog post. You can also explore the complete code and extraction results on GitHub.
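As a rough illustration of how the leaderboard columns relate to each other, the sketch below computes a match count, a match percentage, and precision, recall, and F1 from two parallel lists of committee names. The page does not say how precision and recall are averaged, so the macro averaging over committee names used here is an assumption, as are the function name `score` and the exact-string comparison.

```python
from collections import Counter


def score(expected, predicted):
    """Compare expected and predicted committee names, one pair per email.

    Assumption: exact string equality counts as a match, and precision,
    recall, and F1 are macro-averaged over every name seen in either list.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for exp, pred in zip(expected, predicted):
        if exp == pred:
            tp[exp] += 1
        else:
            fp[pred] += 1  # the predicted name was wrong for this email
            fn[exp] += 1   # the expected name was missed for this email

    labels = set(expected) | set(predicted)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)

    total = len(expected)
    matches = sum(tp.values())
    n = len(labels) or 1  # avoid dividing by zero on empty input
    return {
        "total": total,
        "matches": matches,
        "match_pct": 100.0 * matches / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Tiny made-up example: four emails, two correct extractions.
result = score(["A", "A", "B", "C"], ["A", "B", "B", ""])
print(result)
```

Under this reading, the match percentage is just matches divided by total records (967 of 1,000 is 96.70%), and accuracy is the same ratio on a 0 to 1 scale, while precision and recall can fall below it because every distinct committee name counts equally in the average.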

Updated Prompt

| Model (JSON Filename) | Total Records | Committee Matches | Match % | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4o_november_2024_prompt2.json | 1000 | 967 | 96.70% | 0.97 | 0.91 | 0.91 | 0.91 |
| claude37_sonnet_november_2024_prompt2.json | 1000 | 962 | 96.20% | 0.96 | 0.90 | 0.90 | 0.90 |
| gemini_25_november_2024_prompt2.json | 1000 | 960 | 96.00% | 0.96 | 0.89 | 0.89 | 0.89 |
| claude35_haiku_november_2024_prompt2.json | 1000 | 954 | 95.40% | 0.95 | 0.88 | 0.88 | 0.88 |
| qwen25_november_2024_prompt2.json | 1000 | 953 | 95.30% | 0.95 | 0.88 | 0.86 | 0.87 |
| mistral_small_31_november_2024_prompt2.json | 1000 | 940 | 94.00% | 0.94 | 0.87 | 0.86 | 0.86 |
| claude35_sonnet_november_2024_prompt2.json | 1000 | 936 | 93.60% | 0.94 | 0.88 | 0.87 | 0.87 |
| gemini20_flash_november_2024_prompt2.json | 1000 | 930 | 93.00% | 0.93 | 0.82 | 0.80 | 0.81 |
| cogito_70b_november_2024_prompt2.json | 1000 | 889 | 88.90% | 0.89 | 0.82 | 0.79 | 0.80 |
| llama33_november_2024_prompt2.json | 1000 | 888 | 88.80% | 0.89 | 0.81 | 0.79 | 0.80 |
| openthinker_32b_november_2024_prompt2.json | 1000 | 868 | 86.80% | 0.87 | 0.79 | 0.75 | 0.76 |
| mistral_nemo_november_2024_prompt2.json | 1000 | 867 | 86.70% | 0.87 | 0.79 | 0.76 | 0.77 |
| qwq_november_2024_prompt2.json | 1000 | 861 | 86.10% | 0.86 | 0.79 | 0.76 | 0.77 |
| cogito_32b_november_2024_prompt2.json | 1000 | 859 | 85.90% | 0.86 | 0.79 | 0.75 | 0.76 |
| gemma2_27b_november_2024_prompt2.json | 1000 | 858 | 85.80% | 0.86 | 0.76 | 0.71 | 0.73 |
| gemma2_november_2024_prompt2.json | 1000 | 857 | 85.70% | 0.86 | 0.75 | 0.70 | 0.72 |
| cogito_14b_november_2024_prompt2.json | 1000 | 855 | 85.50% | 0.85 | 0.77 | 0.73 | 0.74 |
| qwen25_72b_november_2024_prompt2.json | 1000 | 855 | 85.50% | 0.85 | 0.79 | 0.75 | 0.76 |
| phi4_november_2024_prompt2.json | 1000 | 850 | 85.00% | 0.85 | 0.74 | 0.70 | 0.71 |
| tulu3_8b_november_2024_prompt2.json | 1000 | 848 | 84.80% | 0.85 | 0.77 | 0.74 | 0.75 |
| olmo2_13b_november_2024_prompt2.json | 1000 | 846 | 84.60% | 0.85 | 0.75 | 0.70 | 0.72 |
| o3-mini_november_2024_prompt2.json | 1000 | 838 | 83.80% | 0.84 | 0.68 | 0.64 | 0.65 |
| mistral_small_24b_november_2024_prompt2.json | 1000 | 833 | 83.30% | 0.83 | 0.75 | 0.71 | 0.72 |
| mixtral_november_2024_prompt2.json | 1000 | 832 | 83.20% | 0.83 | 0.75 | 0.71 | 0.72 |
| gemma3_27b_november_2024_prompt2.json | 1000 | 817 | 81.70% | 0.82 | 0.71 | 0.66 | 0.67 |
| command_a_november_2024_prompt2.json | 1000 | 816 | 81.60% | 0.82 | 0.76 | 0.72 | 0.73 |
| internlm2_november_2024_prompt2.json | 1000 | 813 | 81.30% | 0.81 | 0.69 | 0.65 | 0.66 |
| olmo2_7b_november_2024_prompt2.json | 1000 | 812 | 81.20% | 0.81 | 0.71 | 0.67 | 0.68 |
| granite31_dense_8b_november_2024_prompt2.json | 1000 | 811 | 81.10% | 0.81 | 0.71 | 0.66 | 0.67 |
| granite3.2_november_2024_prompt2.json | 1000 | 805 | 80.50% | 0.81 | 0.69 | 0.64 | 0.66 |
| exaone35_32b_november_2024_prompt2.json | 1000 | 803 | 80.30% | 0.80 | 0.70 | 0.65 | 0.66 |
| command_r7b_november_2024_prompt2.json | 1000 | 768 | 76.80% | 0.77 | 0.67 | 0.62 | 0.63 |
| phi4_mini_november_2024_prompt2.json | 1000 | 751 | 75.10% | 0.75 | 0.55 | 0.48 | 0.50 |
| gemma3_12b_november_2024_prompt2.json | 1000 | 738 | 73.80% | 0.74 | 0.61 | 0.55 | 0.56 |
| deepseek_r1_32b_november_2024_prompt2.json | 1000 | 727 | 72.70% | 0.73 | 0.69 | 0.60 | 0.62 |
| hermes3_70b_november_2024_prompt2.json | 1000 | 636 | 63.60% | 0.64 | 0.70 | 0.56 | 0.61 |
| phi3_november_2024_prompt2.json | 1000 | 580 | 58.00% | 0.58 | 0.48 | 0.38 | 0.41 |

Original Prompt

| Model (JSON Filename) | Total Records | Committee Matches | Match % | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4o-mini_november_2024.json | 1000 | 967 | 96.70% | 0.97 | 0.91 | 0.91 | 0.91 |
| llama33_november_2024.json | 1000 | 941 | 94.10% | 0.94 | 0.87 | 0.85 | 0.86 |
| openai4o_november_2024.json | 1000 | 940 | 94.00% | 0.94 | 0.89 | 0.86 | 0.87 |
| openai4o_nov24_november_2024.json | 1000 | 938 | 93.80% | 0.94 | 0.87 | 0.86 | 0.86 |
| phi4_november_2024.json | 1000 | 847 | 84.70% | 0.85 | 0.73 | 0.68 | 0.70 |
| deepseek_r1_32b_november_2024.json | 1000 | 740 | 74.00% | 0.74 | 0.70 | 0.61 | 0.64 |
| solar-pro_november_2024.json | 1000 | 330 | 33.00% | 0.33 | 0.18 | 0.13 | 0.14 |
| qwq_november_2024.json | 996 | 799 | 80.22% | 0.80 | 0.77 | 0.70 | 0.72 |
| mistral_small24b_november_2024.json | 995 | 808 | 81.21% | 0.81 | 0.71 | 0.66 | 0.67 |
| gemma2_27b_november_2024.json | 994 | 822 | 82.70% | 0.83 | 0.70 | 0.64 | 0.66 |
| internlm2_november_2024.json | 992 | 710 | 71.57% | 0.72 | 0.53 | 0.47 | 0.49 |
| claude3_sonnet_november_2024.json | 985 | 925 | 93.91% | 0.94 | 0.88 | 0.87 | 0.87 |
| exaone35_november_2024.json | 960 | 671 | 69.90% | 0.70 | 0.51 | 0.45 | 0.47 |
| claude35_haiku_november_2024.json | 880 | 840 | 95.45% | 0.95 | 0.88 | 0.88 | 0.88 |
| deepseek_r1_8b_november_2024.json | 781 | 371 | 47.50% | 0.48 | 0.52 | 0.37 | 0.42 |
| llama-3.2-3b-preview_november_2024.json | 552 | 218 | 39.49% | 0.39 | 0.40 | 0.28 | 0.31 |
| llama323b_november_2024.json | 524 | 301 | 57.44% | 0.57 | 0.34 | 0.29 | 0.30 |
| phi3_november_2024.json | 521 | 191 | 36.66% | 0.37 | 0.34 | 0.23 | 0.26 |
| llama318b_november_2024.json | 507 | 441 | 86.98% | 0.87 | 0.73 | 0.71 | 0.71 |
| mistral_small_november_2024.json | 502 | 397 | 79.08% | 0.79 | 0.64 | 0.61 | 0.62 |
| gemma2_november_2024.json | 495 | 428 | 86.46% | 0.86 | 0.75 | 0.72 | 0.73 |
| mixtral_november_2024.json | 461 | 419 | 90.89% | 0.91 | 0.86 | 0.84 | 0.84 |
| starling-lm_november_2024.json | 270 | 202 | 74.81% | 0.75 | 0.66 | 0.60 | 0.62 |

Last Updated: 2025-04-11 16:48:38