Spamstats
History
Spamstats was originally written in 2002 for the Council of Europe, and has since been modified to support new products and new features. In January 2004 the German edition of Linux Magazine published an article about Spamstats, followed a month later by the English edition of the same magazine.
Functionalities
This script analyses log entries from your Postfix, Exim or Sendmail mail server, together with data from SpamAssassin, and reports how many spam and non-spam messages your site receives. Other nice features include a ranking of the most-spammed email addresses in your domain, volume information, and HTML output.
Spamstats can easily be interfaced with the excellent Cricket graphing program to report precisely how many spam and non-spam emails your site receives at any given time, together with other interesting spam-related information.
Here are a few sample Cricket graphs that Spamstats allows you to draw:
- Graphing of daily scores tagged by SpamAssassin
- Graphing of daily number of clean emails/spams
- Weekly graphing of SpamAssassin processing time
Functionalities
This Perl script parses the mail.log file generated by Exim, Postfix or Sendmail and by SpamAssassin's spamd. It currently does not work with mailers other than Exim, Postfix or Sendmail (contributions are welcome), or if you do not use spamd.
Known to work with Postfix v 1.1.11 and later versions, and spamd v 2.43/2.44/2.50 and later releases
Tested with Exim (version unknown)
Sendmail is supported from Spamstats version 0.4 on.
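To make the parsing step above more concrete, here is a minimal Python sketch of extracting the verdict, score, processing time and size from one spamd log line. It is not Spamstats' own code (which is Perl). The sample line and the regular expression assume the older "identified spam (score/threshold) for user:uid in N seconds, M bytes" shape; real lines vary between spamd releases, and the name `parse_spamd_line` is made up for this illustration.

```python
import re

# Assumed line shape, for illustration only; real spamd output differs
# between releases.
SPAMD_RE = re.compile(
    r"spamd\[\d+\]: (?P<verdict>identified spam|clean message) "
    r"\((?P<score>-?\d+(?:\.\d+)?)/(?P<threshold>\d+(?:\.\d+)?)\) "
    r"for (?P<user>\S+?):\d+ in (?P<seconds>\d+(?:\.\d+)?) seconds, "
    r"(?P<size>\d+) bytes"
)


def parse_spamd_line(line):
    """Return a dict describing one spamd verdict, or None if no match."""
    match = SPAMD_RE.search(line)
    if match is None:
        return None
    return {
        "is_spam": match.group("verdict") == "identified spam",
        "score": float(match.group("score")),
        "seconds": float(match.group("seconds")),
        "bytes": int(match.group("size")),
        "user": match.group("user"),
    }


# Hypothetical sample line, not taken from a real log.
sample = (
    "Nov 15 00:06:04 mx spamd[1234]: identified spam (12.9/5.0) "
    "for spamd:1001 in 2.1 seconds, 3456 bytes."
)
record = parse_spamd_line(sample)
print(record)
```

Lines that do not match (for instance mailer lines) simply return `None`, so the same loop can skip them or hand them to a mailer-specific parser.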
Qmail support
As the Qmail log format is very different, and provides very different information from other mailers (Sendmail, Exim, Postfix), no support for Qmail logs is planned at this time.
IN
Supports multiple input files and .gz input. Autodetects the mailer format (exim/postfix/sendmail). TODO: Exiscan support
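As an illustration of reading several input files where some are gzip-compressed, here is a minimal Python sketch. It is not how Spamstats itself does it (the script is Perl and uses Compress::Zlib). The helper names `open_log` and `read_logs` and the throwaway files are assumptions made for the example.

```python
import gzip
import os
import tempfile


def open_log(path):
    """Open a log file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def read_logs(paths):
    """Yield (path, line) pairs from several log files, in the order given."""
    for path in paths:
        with open_log(path) as handle:
            for line in handle:
                yield path, line.rstrip("\n")


# Small demonstration with two throwaway files (hypothetical content).
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "mail.log")
gz_path = os.path.join(workdir, "mail.log.2.gz")
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("first line\n")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("second line\n")

lines = [line for _, line in read_logs([plain_path, gz_path])]
print(lines)
```

Choosing the opener from the file name keeps the rest of the pipeline unaware of compression; checking the file's magic bytes instead would be a sturdier alternative.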
OUT (stdout)
Reports number of spams, number of clean messages, overall spamassassin average score, spam average score, clean message average score, average time to process an email. Also reports (as an option) main spam recipients on your network.
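The figures listed above are plain counts and averages. A minimal Python sketch of how they could be derived from parsed records follows; the `(is_spam, score, seconds)` tuple layout, the function name `summarise` and the sample values are assumptions for illustration, not Spamstats internals.

```python
def summarise(records):
    """Compute counts and averages from (is_spam, score, seconds) tuples."""
    records = list(records)
    spam = [r for r in records if r[0]]
    clean = [r for r in records if not r[0]]

    def mean(rows, index):
        # Guard against empty groups (e.g. a log without any spam).
        return sum(r[index] for r in rows) / len(rows) if rows else 0.0

    total = len(records)
    return {
        "total": total,
        "spam": len(spam),
        "clean": len(clean),
        "spam_pct": 100.0 * len(spam) / total if total else 0.0,
        "avg_score": mean(records, 1),
        "avg_spam_score": mean(spam, 1),
        "avg_clean_score": mean(clean, 1),
        "avg_seconds": mean(records, 2),
    }


# Hypothetical records: (is_spam, score, seconds).
sample = [(True, 12.0, 2.0), (False, 0.5, 3.0), (False, 1.5, 1.0), (True, 14.0, 2.0)]
report = summarise(sample)
print("Number of spams : %d (%.2f%%)" % (report["spam"], report["spam_pct"]))
print("Average message score : %.2f" % report["avg_score"])
```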
Required perl modules
- Getopt::Long
- Compress::Zlib
OUTPUT Example
$ spamstats.pl -number=5 /var/log/mail.log /var/log/mail.log.2.gz
File /var/log/mail.log : from Nov 15 00:06:04 to Nov 15 11:07:57
File /var/log/mail.log.2.gz : from Nov 12 00:06:04 to Nov 13 00:00:15
Total number of emails processed by the spam filter : 3304
Number of spams : 500 (15.13%)
Number of clean messages : 2804 (84.87%)
Average message analysis time : 2.48 seconds
Average spam analysis time : 2.07 seconds
Average clean message analysis time : 2.55 seconds
Average message score : 2.41
Average spam score : 12.89
Average clean message score : 0.62
Recipients with highest number of spams : (top 5)
17 spams :
user1@foo.com
14 spams :
user2@foo.com
13 spams :
user3@foo.com
9 spams :
user4@foo.com
user5@foo.com
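The "top recipients" part of the report groups addresses that share the same spam count. A minimal Python sketch of that ranking is shown here, under the assumption that addresses are compared case-insensitively and that ties are ordered alphabetically. The function name `top_recipients` and the sample addresses are made up for the example.

```python
from collections import Counter


def top_recipients(spam_recipients, number):
    """Return [(count, [addresses])] for the `number` most-spammed addresses."""
    # Assumption: addresses differing only in case are the same mailbox.
    counts = Counter(address.lower() for address in spam_recipients)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:number]
    groups = {}
    for address, count in ranked:
        groups.setdefault(count, []).append(address)
    # Dicts keep insertion order, so groups stay sorted by descending count.
    return list(groups.items())


# Hypothetical recipients of messages flagged as spam.
seen = (
    ["user1@foo.com"] * 3
    + ["User2@foo.com", "user2@foo.com"]
    + ["user3@foo.com"] * 2
    + ["user4@foo.com"]
)
top = top_recipients(seen, 3)
for count, addresses in top:
    print("%d spams :" % count)
    for address in addresses:
        print(address)
```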
Download
Spamstats Version 0.6c (latest) and, optionally, Dan Larsson’s patch. This patch makes it easier to trace who actually got the email when using the "-number" command-line argument. Its usefulness depends on how the mail system is configured: on some configurations, inbound SMTP gateways resolve recipient addresses to other addresses, and seeing the resolved addresses in the report isn’t very useful, since it’s impossible to remember which user ID on which mail server belongs to which customer. This patch is only useful to some Postfix users. Beware that it was written against version 0.4, though it should apply to 0.5 without too many problems. Please read this short FAQ before asking for support.
Latest bug fixes / features:
- 0.6c Transparent support for bzip2 files. Thanks to Yen-Ming Lee for the patch.
- 0.6b Now parses logs from spamd v3.1.0rc1. "Old" format is of course still supported.
- 0.6a New option from Jean-Louis Bergamo: "-spamd", to be used if you want to use only spamd logs and no correlation with mailer logs. This option removes some useful features, so don’t use unless you have a real need for it.
- 0.6 Added lots of cricket scripts to graph minmax originating data. Corrected minor bugs, linked to those updates. Added Radko Keves’ (0.5b) version with -img option
- 0.5b Fixed a stupid typo that made the spam volume unit wrong. Thanks to Matthew McGehrin for the bug report.
- 0.5a Support for BSD’s sm-mta mailer, which is actually just a sendmail alias. -firstdate option added. Thanks to Radko Keves!
- 0.5 -minmax option added. Thanks to Cyril Chaboisseau! Fixed the display bug with -number and -html
- 0.4b5 Fixed bug when date specified a month starting with "0"
- 0.4b4 Fixed sendmail/procmail parsing bug and corrected the arg infile==0 bug
- 0.4b3 Added documentation about how to graph Spamstats output, and tiny bug fixes and cleanup from Bob Apthorpe. 2 Jun 2003 The --file switch can now be omitted, which makes it easier to feed Spamstats multiple files at a time.
- 0.4b2 Two regexp fixes (on exim, and on spamd). The documentation (README) is now in better English.
- 0.4b1 Fixed a tiny parsing bug in some sendmail relaying config. 19 May 2003
- 0.4b New counting scheme. Emails are now counted per recipient, no longer per MailerID. THIS MEANS SPAMSTATS DOESN’T COUNT THE SAME WAY AS IT DID BEFORE!! Use the (new) "-agglo-recipient" option to keep the old way of counting. WARNING: for now, EXIM USERS will want to set this!! Applied Jim Breton’s patch for a better display. Added documentation. 10 Mar 2003
- 0.4 Sendmail support. This new feature is actually just 0.3b2 with some new regexps :-)
- 0.3b2 Fixes a minor bug that happened only when logfile is [near] empty. 30 Jan 2003
- 0.3b Duration support (actually a time filter specification extension). 04 Jan 2003
- 0.3a Time filters support. This allows specifying a time frame for analysis. 29 Dec 2002
- 0.3alpha Exim support (still a bit buggy, it seems 90% accurate).
- 0.2f If an input file doesn’t exist, says which. 26 Nov 2002
- 0.2e Reports total spam and clean volumes (in bytes). Allows nonabsolute filename parameters. 26 Nov 2002
- 0.2d Local recipients are counted as well as relayed ones (pipe/local bug)
- 0.2c Email recipient ordering is now case-insensitive.
- 0.2b Fixed: the user spamd was running as had to be "spamd" for the script to work. Huh!
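The 0.4b entry above changes what is being counted. As an illustration of the difference between counting per recipient and counting per message ID, here is a small Python sketch. The message dictionaries and function names are hypothetical and only show the two counting schemes side by side, not the script's actual data structures.

```python
# Hypothetical parsed messages: one spam sent to two recipients, one clean mail.
messages = [
    {"id": "A1", "is_spam": True, "recipients": ["user1@foo.com", "user2@foo.com"]},
    {"id": "B2", "is_spam": False, "recipients": ["user1@foo.com"]},
]


def count_per_message(items):
    """Old scheme: each mailer message ID counts once."""
    return sum(1 for m in items if m["is_spam"])


def count_per_recipient(items):
    """New scheme: each delivered copy counts once."""
    return sum(len(m["recipients"]) for m in items if m["is_spam"])


per_message = count_per_message(messages)
per_recipient = count_per_recipient(messages)
print(per_message, per_recipient)
```

With the sample data, the single spam message addressed to two users counts once under the old scheme and twice under the new one, which is why totals shift after upgrading.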
