What kind of sicko signs up for political fundraising emails from just about every committee? Oh, right, that's me. I've collected thousands of political fundraising emails and challenged various LLMs to extract committee names from their disclaimers (like "Paid for by The Pennsylvania Democratic Party"). This extraction isn't straightforward: disclaimers vary in format and position, and while some are a single short sentence, others run on with additional text about contributions and treasurers.
Using the same set of 1,000 emails from November 2024 for every model, with a zero-shot prompt asking each to extract committee names and senders, I've compared how different LLMs perform at this task. The leaderboard below shows each model's success rate at correctly matching the committee names in the training dataset. For more details on this project, read my full blog post. You can also explore the complete code and extraction results on GitHub.
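For a sense of why this task is harder than it looks, here is a minimal regex baseline for pulling the committee name out of a "Paid for by" disclaimer. This is a hypothetical sketch, not the extraction method used in the project (which prompts an LLM); the cutoff phrases are assumptions about common disclaimer wording.

```python
import re

# Hypothetical baseline: capture the committee name after "Paid for by",
# stopping at a common continuation phrase, a period, or end of line.
DISCLAIMER_RE = re.compile(
    r"Paid for by\s+(?P<committee>[^.\n]+?)"
    r"(?:\s+and not authorized|\s*[.\n]|$)",
    re.IGNORECASE,
)

def extract_committee(text):
    """Return the committee name from a disclaimer, or None if absent."""
    match = DISCLAIMER_RE.search(text)
    return match.group("committee").strip() if match else None
```

A baseline like this breaks down quickly on real emails, where the disclaimer may be split across lines, embedded in footer boilerplate, or phrased differently, which is exactly the variation the LLMs are being tested against.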
Model (JSON Filename) | Total Records | Committee Matches | Match % | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|---|---|
gpt-4o_november_2024_prompt2.json | 1000 | 967 | 96.70% | 0.97 | 0.91 | 0.91 | 0.91 |
claude37_sonnet_november_2024_prompt2.json | 1000 | 962 | 96.20% | 0.96 | 0.90 | 0.90 | 0.90 |
gemini_25_november_2024_prompt2.json | 1000 | 960 | 96.00% | 0.96 | 0.89 | 0.89 | 0.89 |
claude35_haiku_november_2024_prompt2.json | 1000 | 954 | 95.40% | 0.95 | 0.88 | 0.88 | 0.88 |
qwen25_november_2024_prompt2.json | 1000 | 953 | 95.30% | 0.95 | 0.88 | 0.86 | 0.87 |
mistral_small_31_november_2024_prompt2.json | 1000 | 940 | 94.00% | 0.94 | 0.87 | 0.86 | 0.86 |
claude35_sonnet_november_2024_prompt2.json | 1000 | 936 | 93.60% | 0.94 | 0.88 | 0.87 | 0.87 |
gemini20_flash_november_2024_prompt2.json | 1000 | 930 | 93.00% | 0.93 | 0.82 | 0.80 | 0.81 |
cogito_70b_november_2024_prompt2.json | 1000 | 889 | 88.90% | 0.89 | 0.82 | 0.79 | 0.80 |
llama33_november_2024_prompt2.json | 1000 | 888 | 88.80% | 0.89 | 0.81 | 0.79 | 0.80 |
openthinker_32b_november_2024_prompt2.json | 1000 | 868 | 86.80% | 0.87 | 0.79 | 0.75 | 0.76 |
mistral_nemo_november_2024_prompt2.json | 1000 | 867 | 86.70% | 0.87 | 0.79 | 0.76 | 0.77 |
qwq_november_2024_prompt2.json | 1000 | 861 | 86.10% | 0.86 | 0.79 | 0.76 | 0.77 |
cogito_32b_november_2024_prompt2.json | 1000 | 859 | 85.90% | 0.86 | 0.79 | 0.75 | 0.76 |
gemma2_27b_november_2024_prompt2.json | 1000 | 858 | 85.80% | 0.86 | 0.76 | 0.71 | 0.73 |
gemma2_november_2024_prompt2.json | 1000 | 857 | 85.70% | 0.86 | 0.75 | 0.70 | 0.72 |
cogito_14b_november_2024_prompt2.json | 1000 | 855 | 85.50% | 0.85 | 0.77 | 0.73 | 0.74 |
qwen25_72b_november_2024_prompt2.json | 1000 | 855 | 85.50% | 0.85 | 0.79 | 0.75 | 0.76 |
phi4_november_2024_prompt2.json | 1000 | 850 | 85.00% | 0.85 | 0.74 | 0.70 | 0.71 |
tulu3_8b_november_2024_prompt2.json | 1000 | 848 | 84.80% | 0.85 | 0.77 | 0.74 | 0.75 |
olmo2_13b_november_2024_prompt2.json | 1000 | 846 | 84.60% | 0.85 | 0.75 | 0.70 | 0.72 |
o3-mini_november_2024_prompt2.json | 1000 | 838 | 83.80% | 0.84 | 0.68 | 0.64 | 0.65 |
mistral_small_24b_november_2024_prompt2.json | 1000 | 833 | 83.30% | 0.83 | 0.75 | 0.71 | 0.72 |
mixtral_november_2024_prompt2.json | 1000 | 832 | 83.20% | 0.83 | 0.75 | 0.71 | 0.72 |
gemma3_27b_november_2024_prompt2.json | 1000 | 817 | 81.70% | 0.82 | 0.71 | 0.66 | 0.67 |
command_a_november_2024_prompt2.json | 1000 | 816 | 81.60% | 0.82 | 0.76 | 0.72 | 0.73 |
internlm2_november_2024_prompt2.json | 1000 | 813 | 81.30% | 0.81 | 0.69 | 0.65 | 0.66 |
olmo2_7b_november_2024_prompt2.json | 1000 | 812 | 81.20% | 0.81 | 0.71 | 0.67 | 0.68 |
granite31_dense_8b_november_2024_prompt2.json | 1000 | 811 | 81.10% | 0.81 | 0.71 | 0.66 | 0.67 |
granite3.2_november_2024_prompt2.json | 1000 | 805 | 80.50% | 0.81 | 0.69 | 0.64 | 0.66 |
exaone35_32b_november_2024_prompt2.json | 1000 | 803 | 80.30% | 0.80 | 0.70 | 0.65 | 0.66 |
command_r7b_november_2024_prompt2.json | 1000 | 768 | 76.80% | 0.77 | 0.67 | 0.62 | 0.63 |
phi4_mini_november_2024_prompt2.json | 1000 | 751 | 75.10% | 0.75 | 0.55 | 0.48 | 0.50 |
gemma3_12b_november_2024_prompt2.json | 1000 | 738 | 73.80% | 0.74 | 0.61 | 0.55 | 0.56 |
deepseek_r1_32b_november_2024_prompt2.json | 1000 | 727 | 72.70% | 0.73 | 0.69 | 0.60 | 0.62 |
hermes3_70b_november_2024_prompt2.json | 1000 | 636 | 63.60% | 0.64 | 0.70 | 0.56 | 0.61 |
phi3_november_2024_prompt2.json | 1000 | 580 | 58.00% | 0.58 | 0.48 | 0.38 | 0.41 |
The second table covers runs whose results files lack the `prompt2` suffix; note that several of these models returned fewer than 1,000 usable records.

Model (JSON Filename) | Total Records | Committee Matches | Match % | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|---|---|
gpt-4o-mini_november_2024.json | 1000 | 967 | 96.70% | 0.97 | 0.91 | 0.91 | 0.91 |
llama33_november_2024.json | 1000 | 941 | 94.10% | 0.94 | 0.87 | 0.85 | 0.86 |
openai4o_november_2024.json | 1000 | 940 | 94.00% | 0.94 | 0.89 | 0.86 | 0.87 |
openai4o_nov24_november_2024.json | 1000 | 938 | 93.80% | 0.94 | 0.87 | 0.86 | 0.86 |
phi4_november_2024.json | 1000 | 847 | 84.70% | 0.85 | 0.73 | 0.68 | 0.70 |
deepseek_r1_32b_november_2024.json | 1000 | 740 | 74.00% | 0.74 | 0.70 | 0.61 | 0.64 |
solar-pro_november_2024.json | 1000 | 330 | 33.00% | 0.33 | 0.18 | 0.13 | 0.14 |
qwq_november_2024.json | 996 | 799 | 80.22% | 0.80 | 0.77 | 0.70 | 0.72 |
mistral_small24b_november_2024.json | 995 | 808 | 81.21% | 0.81 | 0.71 | 0.66 | 0.67 |
gemma2_27b_november_2024.json | 994 | 822 | 82.70% | 0.83 | 0.70 | 0.64 | 0.66 |
internlm2_november_2024.json | 992 | 710 | 71.57% | 0.72 | 0.53 | 0.47 | 0.49 |
claude3_sonnet_november_2024.json | 985 | 925 | 93.91% | 0.94 | 0.88 | 0.87 | 0.87 |
exaone35_november_2024.json | 960 | 671 | 69.90% | 0.70 | 0.51 | 0.45 | 0.47 |
claude35_haiku_november_2024.json | 880 | 840 | 95.45% | 0.95 | 0.88 | 0.88 | 0.88 |
deepseek_r1_8b_november_2024.json | 781 | 371 | 47.50% | 0.48 | 0.52 | 0.37 | 0.42 |
llama-3.2-3b-preview_november_2024.json | 552 | 218 | 39.49% | 0.39 | 0.40 | 0.28 | 0.31 |
llama323b_november_2024.json | 524 | 301 | 57.44% | 0.57 | 0.34 | 0.29 | 0.30 |
phi3_november_2024.json | 521 | 191 | 36.66% | 0.37 | 0.34 | 0.23 | 0.26 |
llama318b_november_2024.json | 507 | 441 | 86.98% | 0.87 | 0.73 | 0.71 | 0.71 |
mistral_small_november_2024.json | 502 | 397 | 79.08% | 0.79 | 0.64 | 0.61 | 0.62 |
gemma2_november_2024.json | 495 | 428 | 86.46% | 0.86 | 0.75 | 0.72 | 0.73 |
mixtral_november_2024.json | 461 | 419 | 90.89% | 0.91 | 0.86 | 0.84 | 0.84 |
starling-lm_november_2024.json | 270 | 202 | 74.81% | 0.75 | 0.66 | 0.60 | 0.62 |
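The metrics in both tables can be reproduced in spirit with a short scoring pass over a results file. This is a hedged sketch, not the project's actual scoring code: the record field names (`expected`, `extracted`) are assumptions, and it computes exact-match rate plus token-overlap precision/recall/F1, which may differ from how the repo scores partial matches.

```python
# Hypothetical scoring sketch for a list of extraction records, each a dict
# with an "expected" (labeled) and "extracted" (model output) committee name.
def score(records):
    matches = 0
    precisions, recalls = [], []
    for rec in records:
        expected = rec["expected"].lower().split()
        extracted = rec["extracted"].lower().split()
        if expected == extracted:          # exact match after normalization
            matches += 1
        overlap = len(set(expected) & set(extracted))
        precisions.append(overlap / len(extracted) if extracted else 0.0)
        recalls.append(overlap / len(expected) if expected else 0.0)
    n = len(records)
    p = sum(precisions) / n
    r = sum(recalls) / n
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"match_pct": 100 * matches / n, "precision": p, "recall": r, "f1": f1}
```

Averaging per-record token precision and recall before combining them into F1 is one reasonable convention; the published tables may use a different aggregation.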