Discussion:
Spamassassin default SHORT_URI list obsolete/outdated
jimimaseye
2016-07-01 07:35:14 UTC
Recently I was in discussion with the maintainer of a URI_SHORTENER blacklist,
who has created a list of domains that handle short URLs. (You can
find his full rule and details here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/).
He has identified over 200 CURRENT URL shorteners and maintains the list
accordingly (viewable here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/url_shorteners.txt).
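
For illustration only: a plain domain list like this can be checked locally by testing a URL's host, and each of its parent domains, against a set. The Python sketch below assumes the file holds one domain per line with optional "#" comments; that format and the helper names load_domains and is_listed are hypothetical. (His published rule is a SURBL, i.e. a DNS lookup, not a local file check.)

```python
from urllib.parse import urlsplit


def load_domains(text: str) -> set[str]:
    """Parse a domain list: one domain per line, '#' starts a comment (assumed format)."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def is_listed(url: str, domains: set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is in the set."""
    # urlsplit().hostname is already lowercased; it is None for non-URLs.
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # For "www.tinyurl.com" this tries "www.tinyurl.com", "tinyurl.com", "com".
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```

Checking parent domains means www.tinyurl.com is caught via tinyurl.com, which a pattern tied to the exact host name would miss.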

I then informed him that SA already has a URL_SHORTENER checking rule, found
in 72_active.cf. I am currently using it in a meta rule thus:

meta MY_URI_URLSHORT __URL_SHORTENER # defined in 72_active.cf

He quite rightly pointed out that the 43 shortener domains SA checks for
in the default rule make up a drastically short and outdated list (some
don't even exist anymore) compared to his more current, recently researched
list of over 200.

Is there any way that the default list SA checks for in 72_active.cf
can be updated, and how is such a request made or implemented? (Forgive me, I
don't know how these things work.)



Axb
2016-07-01 07:56:38 UTC
Post by jimimaseye
Recently I was in discussion with the maintainer of a URI_SHORTENER blacklist,
who has created a list of domains that handle short URLs. (You can
find his full rule and details here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/).
He has identified over 200 CURRENT URL shorteners and maintains the list
accordingly (viewable here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/url_shorteners.txt).
I then informed him that SA already has a URL_SHORTENER checking rule, found
in 72_active.cf:
meta MY_URI_URLSHORT __URL_SHORTENER # defined in 72_active.cf
At the moment there seems to be no such rule; please verify the name after
running sa-update.
Post by jimimaseye
He quite rightly pointed out that the 43 shortener domains SA checks for
in the default rule make up a drastically short and outdated list (some
don't even exist anymore) compared to his more current, recently researched
list of over 200.
URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of these
sites won't be around in 6 months, some have zero abuse, and some may even
be NXDOMAIN.

Such rules are best maintained/provided by interested third parties, which
may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.

Why not make this YOUR project?
Post by jimimaseye
Is there any way that the default list SA checks for in 72_active.cf
can be updated, and how is such a request made or implemented? (Forgive me, I
don't know how these things work.)
See above.
Groach
2016-07-01 08:13:19 UTC
Post by Axb
Post by jimimaseye
I then informed him that SA already has a URL_SHORTENER checking rule,
found in 72_active.cf:
meta MY_URI_URLSHORT __URL_SHORTENER # defined in 72_active.cf
At the moment there seems to be no such rule; please verify the name after
running sa-update.
As quoted, it is __URL_SHORTENER.

The entry reads as follows:

uri __URL_SHORTENER /^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?/

and is used in other meta rules such as MONEY_FRAUD_5 (you can see its
name is preceded by "__", the prefix for sub-rules that are not scored on
their own).
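
To see exactly what the default pattern accepts, it can be reproduced in Python. This is a sketch, assuming each URI is tested as a plain string; SpamAssassin's own URI extraction and normalization are not modelled, and the names SHORTENERS, URL_SHORTENER and matches are hypothetical. The pattern uses only constructs that behave the same in Perl and Python regexes.

```python
import re

# The 43 domains in the __URL_SHORTENER rule quoted above, in the same order.
SHORTENERS = [
    "bit.ly", "tinyurl.com", "ow.ly", "is.gd", "tumblr.com", "formspring.me",
    "ff.im", "youtu.be", "tl.gd", "plurk.com", "migre.me", "j.mp", "cli.gs",
    "goo.gl", "yfrog.com", "lnk.ms", "su.pr", "fb.me", "alturl.com", "wp.me",
    "ping.fm", "chatter.com", "post.ly", "twurl.nl", "tiny.cc", "4sq.com",
    "ustre.am", "short.to", "u.nu", "flic.kr", "budurl.com", "digg.com",
    "twitvid.com", "gowal.la", "om.ly", "justin.tv", "icio.us", "p.gs",
    "loopt.us", "tcrn.ch", "xrl.us", "wpo.st", "bkite.com",
]

# Same shape as the SA pattern: "http://", a listed host, "/", then at least
# three non-slash characters (the trailing optional "/" adds no constraint).
URL_SHORTENER = re.compile(
    r"^http://(?:" + "|".join(re.escape(d) for d in SHORTENERS) + r")/[^/]{3}/?"
)


def matches(uri: str) -> bool:
    """True if the URI string would be matched by the reproduced pattern."""
    return URL_SHORTENER.search(uri) is not None
```

Written out this way, a few properties are easy to confirm: only plain http:// URIs match, a host such as www.bit.ly does not match because the listed name must follow "http://" directly, and general-purpose sites such as digg.com and tumblr.com are matched just like bit.ly.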
Post by Axb
URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of
these sites won't be around in 6 months, some have zero abuse, and some
may even be NXDOMAIN.
You can see from 72_active.cf that using a URL shortener isn't treated
as bad by itself, and that SA rules do use it in conjunction with other,
more likely positive matches (such as MONEY_FRAUD_5).
Post by Axb
Such rules are best maintained/provided by interested third parties,
which may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.
Why not make this YOUR project?
OK, well, I will leave it as HIS project ;-) (the guy who has already
applied his research to provide this SURBL lookup). He has also stated
that many of these sites come and go (as you imply).

Thanks
Axb
2016-07-01 08:40:24 UTC
Post by Groach
Post by Axb
Post by jimimaseye
I then informed him that SA already has a URL_SHORTENER checking rule,
found in 72_active.cf:
meta MY_URI_URLSHORT __URL_SHORTENER # defined in 72_active.cf
At the moment there seems to be no such rule; please verify the name after
running sa-update.
As quoted, it is __URL_SHORTENER.
uri __URL_SHORTENER /^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?/
OK, found it... and I must say this rule is pretty sloppy and should
probably be deprecated. I hope whoever compiled this list takes a look
at it.
It includes domains which are clearly not URI shorteners, or which are
never used in spam, etc.

IMO, this rule can probably be deprecated in favour of network (DNS blocklist) lookups.
Post by Groach
and is used in other meta rules such as MONEY_FRAUD_5 (you can see its
name is preceded by "__", the prefix for sub-rules that are not scored on
their own).
Post by Axb
URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of
these sites won't be around in 6 months, some have zero abuse, and some
may even be NXDOMAIN.
You can see from 72_active.cf that using a URL shortener isn't treated
as bad by itself, and that SA rules do use it in conjunction with other,
more likely positive matches (such as MONEY_FRAUD_5).
Post by Axb
Such rules are best maintained/provided by interested third parties,
which may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.
Why not make this YOUR project?
OK, well, I will leave it as HIS project ;-) (the guy who has already
applied his research to provide this SURBL lookup). He has also stated
that many of these sites come and go (as you imply).
His project is to maintain a domain list, similar to the Spamhaus DBL's
"127.0.1.103 abused spammed redirector domain" return code.
Maintaining an SA rule with that data seems like a redundant effort, but if
someone needs it, it would be wiser to tackle it at the source to avoid
stale data.
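
A network lookup of this kind boils down to a DNS A query for the domain prepended to the blocklist zone, with the returned 127.0.x.x address encoding the listing category. The Python sketch below assumes the zone name dbl.spamhaus.org and checks only for the 127.0.1.103 code mentioned above; the helper names are hypothetical, other DBL return codes are not handled, and Spamhaus's usage terms are not covered. Within SpamAssassin itself this is normally left to its URIDNSBL plugin rather than done by hand.

```python
import ipaddress
import socket

# Assumed zone name; check the blocklist operator's terms before querying it.
DBL_ZONE = "dbl.spamhaus.org"

# The only return code handled here is the one quoted in the thread.
ABUSED_REDIRECTOR = ipaddress.IPv4Address("127.0.1.103")


def dbl_query_name(domain: str, zone: str = DBL_ZONE) -> str:
    """Build the DNS name to query: the domain followed by the blocklist zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def lookup(domain: str) -> list[str]:
    """Resolve the query name; any resolver error (e.g. NXDOMAIN) counts as 'not listed'."""
    try:
        _, _, addresses = socket.gethostbyname_ex(dbl_query_name(domain))
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return []
    return addresses


def is_abused_redirector(addresses: list[str]) -> bool:
    """True if any returned A record equals 127.0.1.103."""
    return any(ipaddress.IPv4Address(a) == ABUSED_REDIRECTOR for a in addresses)
```

A call such as is_abused_redirector(lookup("example.com")) performs a live query. Treating resolver failures as "not listed" is a deliberate fail-open choice here, so a DNS outage never marks mail as spam.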
