spam filtering misc spams


David Diggles-2
I'm interested in hearing about people's experiences with filtering the spam
emails that make it through to misc.  Mostly non-English.  I have been using
SpamAssassin and training it, yet the Bayes rules at their default weightings
are not enough to get the misc spams into my spam box... in fact many still
autolearn as ham.

Email coming from the list server boosts the ham score. The locale plugin
for SA doesn't help at all.

I started working on something to check what percentage of the words in an
email appear in /usr/share/dict/words, to detect English-ness.  It does work
well, but has it already been done elsewhere?
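
For readers who want to try the dictionary idea, here is a minimal sketch of one way it could work. It is not the poster's actual script: the function names, the tokenizer regex, and the 0.5 threshold are all assumptions made for illustration, and the word list is assumed to be a plain newline-separated file such as /usr/share/dict/words.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of ASCII letters and apostrophes.
# Accented or non-Latin text mostly yields fragments or no tokens at all,
# which counts against English-ness.
WORD_RE = re.compile(r"[A-Za-z']+")


def load_dictionary(path="/usr/share/dict/words"):
    """Load a newline-separated word list into a lowercase set."""
    text = Path(path).read_text(errors="ignore")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def english_ratio(text, dictionary):
    """Return the fraction of tokens in `text` that appear in `dictionary`."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not tokens:
        return 0.0  # nothing recognisable as a word
    hits = sum(1 for t in tokens if t in dictionary)
    return hits / len(tokens)


def looks_english(text, dictionary, threshold=0.5):
    """Hypothetical cut-off: treat the text as English at or above `threshold`."""
    return english_ratio(text, dictionary) >= threshold
```

In use, one would call `load_dictionary()` once and pass the resulting set to `looks_english()` for each message body. The threshold would need tuning against real list traffic, since patches and code-heavy mails contain many non-dictionary tokens.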


Re: spam filtering misc spams

Nicolai-8
On Tue, Oct 09, 2012 at 12:40:56AM +1000, David Diggles wrote:
> I'm interested in hearing about people's experiences with filtering the spam
> emails that make it through to misc.

There are a few strings common in the spam that hits OpenBSD lists which
are unlikely to be found in good messages.  And since even legitimate
messages are bulk, I don't feel bad deleting occasional false positives.

X-Mailer: PHP/4.3
[IMAGE]
X-Mailer: SmartSend
X-Mailer: SendBlaster
nuestras listas
CAN-SPAM

And a few others.  Check your mail and decide for yourself.  I also
delete mail with a Reply-To: at hotmail or yahoo.

You could possibly also nuke

X-Converted-To-Plain-Text: from text/html by demime 1.01d

Who sends HTML email to a technical list?  Spammers and FAQ posters.

> I started working on something to check what percentage of the words in an
> email appear in /usr/share/dict/words, to detect English-ness.  It does work
> well, but has it already been done elsewhere?

Good idea, but maybe more work than necessary.  It'll probably flag
submitted patches and you'd have to teach it lots of words.  By just
searching for a few simple strings, I've stopped spam from hitting my inbox.
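
A rough sketch of this kind of string matching, for anyone who wants to experiment. The marker strings and the Reply-To rule come from this message. The function name, the `drop_converted_html` switch, and the decision to search the whole raw message for the body strings are assumptions made for illustration, not a description of the actual setup.

```python
from email import message_from_string
from email.utils import parseaddr

# Strings quoted in the message above.
MAILER_MARKERS = ("PHP/4.3", "SmartSend", "SendBlaster")
BODY_MARKERS = ("[IMAGE]", "nuestras listas", "CAN-SPAM")
REPLY_TO_PREFIXES = ("hotmail.", "yahoo.")


def is_probable_spam(raw_message, drop_converted_html=False):
    """Return True if the raw message matches any of the simple markers."""
    msg = message_from_string(raw_message)

    # X-Mailer values typical of bulk-mail tools.
    mailer = str(msg.get("X-Mailer", ""))
    if any(marker in mailer for marker in MAILER_MARKERS):
        return True

    # Reply-To pointing at a hotmail or yahoo address.
    _, address = parseaddr(str(msg.get("Reply-To", "")))
    domain = address.rpartition("@")[2].lower()
    if domain.startswith(REPLY_TO_PREFIXES):
        return True

    # Optional, more aggressive rule: mail that demime converted from HTML.
    if drop_converted_html and msg.get("X-Converted-To-Plain-Text") is not None:
        return True

    # Assumption: a plain substring search over the whole raw text is enough.
    return any(marker in raw_message for marker in BODY_MARKERS)
```

Like the original approach, this accepts occasional false positives in exchange for simplicity.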

As an aside, it's nice to see someone addressing a problem that's
within their realm of control rather than complaining about it to the
list.

Nicolai


Re: spam filtering misc spams

Ted Unangst-6
In reply to this post by David Diggles-2
On Tue, Oct 09, 2012 at 00:40, David Diggles wrote:
> I'm interested in hearing about people's experiences with filtering the spam
> emails that make it through to misc.  Mostly non-English.  I have been using
> SpamAssassin and training it, yet the Bayes rules at their default weightings
> are not enough to get the misc spams into my spam box... in fact many still
> autolearn as ham.

I adjusted the scores so that anything with a Bayes probability of about
50% or higher is spam.  That works pretty well.  There's not really any
reason to go past 5 (the default spam threshold), but I figured if I ever
changed the minimum I'd be ready.

score BAYES_50 5
score BAYES_60 6
score BAYES_80 8
score BAYES_95 9
score BAYES_99 10


Re: spam filtering misc spams

Mikkel C. Simonsen
In reply to this post by David Diggles-2
David Diggles wrote:
> I'm interested in hearing about people's experiences with filtering the spam
> emails that make it through to misc.  Mostly non-English.  I have been using
> SpamAssassin and training it, yet the Bayes rules at their default weightings
> are not enough to get the misc spams into my spam box... in fact many still
> autolearn as ham.

I use bogofilter, and it tags almost all spam from this mailing list as
spam. There is an occasional false positive as well, though...

Best regards,

Mikkel C. Simonsen


Re: spam filtering misc spams

Thomas Pfaff-5
On Mon, 08 Oct 2012 18:14:25 +0200
"Mikkel C. Simonsen" <[hidden email]> wrote:

> David Diggles wrote:
> > I'm interested in hearing about people's experiences with filtering the spam
> > emails that make it through to misc.
[...]
> I use bogofilter, and it tags almost all spam from this mailing list as
> spam. There is an occasional false positive as well, though...
>

I also have very good results with bogofilter.

In my .mailfilter file I have (excerpt):

  xfilter "/usr/local/bin/bogofilter -u -e -p"

  if ( /^X-Bogosity: (Spam|Unsure)/ )
    to $MAILDIR/.Spam

Then there's a cron job that tells bogofilter that the messages
I manually move into TagSpam are spam.  The end result is pretty
damn good, and you don't get the noise that DNSBLs generate.
Fine for SOHO at least.


Re: spam filtering misc spams

David Diggles-2
In reply to this post by Ted Unangst-6
On Mon, Oct 08, 2012 at 12:11:43PM -0400, Ted Unangst wrote:

> On Tue, Oct 09, 2012 at 00:40, David Diggles wrote:
> > I'm interested in hearing about people's experiences with filtering the spam
> > emails that make it through to misc.  Mostly non-English.  I have been using
> > SpamAssassin and training it, yet the Bayes rules at their default weightings
> > are not enough to get the misc spams into my spam box... in fact many still
> > autolearn as ham.
>
> I adjusted the scores so that anything with a Bayes probability of about
> 50% or higher is spam.  That works pretty well.  There's not really any
> reason to go past 5 (the default spam threshold), but I figured if I ever
> changed the minimum I'd be ready.
>
> score BAYES_50 5
> score BAYES_60 6
> score BAYES_80 8
> score BAYES_95 9
> score BAYES_99 10

Thanks Ted,

I am now trialing the Bayes score adjustment.

I had hoped something like this would have been possible in the config:

if (header MAILING_LIST exists:list-id)
  score BAYES_50 5
  score BAYES_60 6
  score BAYES_80 8
  score BAYES_95 9
  score BAYES_99 10
endif

That way it would only adjust the Bayes scores for mailing list mail.

Apparently I need to write a plugin to do that.
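
To make the intended behaviour concrete, here is a small sketch of the logic only. It is not SpamAssassin configuration or plugin code. The boosted values are the ones from Ted's message; the function name and the `base_scores` argument are hypothetical, and the caller has to supply whatever scores should apply to non-list mail.

```python
from email import message_from_string

# Boosted scores from the thread, meant to apply to mailing-list mail only.
LIST_BAYES_SCORES = {
    "BAYES_50": 5,
    "BAYES_60": 6,
    "BAYES_80": 8,
    "BAYES_95": 9,
    "BAYES_99": 10,
}


def bayes_score(raw_message, rule, base_scores):
    """Return the score for a Bayes rule, boosted only if List-Id is present.

    `base_scores` maps rule names to the scores used for ordinary mail
    (hypothetical values supplied by the caller).
    """
    msg = message_from_string(raw_message)
    is_list_mail = msg.get("List-Id") is not None  # header lookup is case-insensitive
    if is_list_mail and rule in LIST_BAYES_SCORES:
        return LIST_BAYES_SCORES[rule]
    return base_scores[rule]
```

With a made-up base score of 1.0 for BAYES_50, a message carrying a List-Id header would score 5 for that rule, while direct mail would keep 1.0.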

.d.d.