Weblog Entry

Random Absurdities

January 31, 2005

A few recent occurrences that have me wondering about the general sanity of publishing online.

Observed on this site in the last 7 days (pardon my obfuscation):

Exhibit A — a prolonged attempt to flood the search form with keywords. A sequential, alphabetical list of nouns and verbs relating to common household items, popular pop culture references, and common sp@mmer standbys.

Exhibit B — continued daily sp@m attempts on the Zen Garden submission form. P0ker, dru9s, and making quick bucks are as popular as ever.

Exhibit C — (and this has to be my favourite) A semi-Godwinian comment on a past entry comparing lack of sensitivity in the design world, to a recent prison scandal on foreign soil. A Google search into this site based on the name of said prison, by another individual. A new and politically-oriented, borderline racist comment added, with an unspecified threat toward the site-owner (um, me) if said comment were deleted. (Comment was deleted). A crap-flood of vengeful, taunting remarks over the course of a few minutes.

The first, I don’t understand. The second tells me sp@mmers are a) stupid, and b) attack anything that submits. The third tells me that com-ments on a personal site are no place for free speech.

But hey, silver lining: com-ment and refe-rrer sp@m is at an all-time low. And the less you talk about it, and the more you protect yourself, the more chance you have of similar luck.

Update: The Register published an interview with a link sp@mmer today. Worth a read. (via)


Dave S. says:
January 31, 04h

Zero tolerance for monkeying around in here today. Comments will most likely be nice and hyphenated like the original article, too. You’ve been warned.

2
W3bbo says:
January 31, 04h

What’s wrong with good ol’ fashioned “CAPTCHAs” to prevent it?

Then there are always the fallbacks of IP and cookie filtering and posting throttles.

…The rest of us just set an appropriate robots.txt on our comments pages anyway, thus preventing them from being indexed.

I don’t see the point in hyphenating s-p-a-m when a simple googleblock does the trick.

Hint: Typekey :)

Dave S. says:
January 31, 04h

CAPTCHAs are inaccessible in most ways. I have perfect vision and I have a hard time reading them myself sometimes. And they’ve already been routed around in some cases.

The point wasn’t comments in general, anyway. The point was to highlight some of the other oddities I’ve seen.

Dave S. says:
January 31, 04h

And, you’re not suggesting I set robots.txt to deny indexing of articles themselves on this site, are you? Because that’s where the hyphenation was happening when you commented.

Tom says:
January 31, 04h

I don’t like CAPTCHAs either. You might find this interesting; it’s an interview with one of the sp@mmers.

http://www.theregister.co.uk/2005/01/31/link_spamer_interview/

January 31, 04h

What’s the point in hyphenating sp@m? Are they known to search for it?

I did find an unusually high concentration of sp@mming on the one entry on my site that talks about a (flawed) counter-measure I came up with. This would seem to support this theory, but I’d be interested hearing about more solid evidence.

W3bbo: the problems with CAPTCHAs are:

1: If you can’t see images, because you are visually impaired, are using a text browser or otherwise, then you can’t pass the test.

2: Sp@mmers have been known to take CAPTCHAs from websites and use them on their own sites (usually porn). They get the clear text that way.

3: They are just plain annoying. :)

January 31, 04h

Typekey, for MT users, seems more and more attractive. I’m starting a new MT-powered site, in 1 month tops, and I think I’m going for Typekey-only com-menting.

Andrew K says:
January 31, 05h

Yesterday I posted about a particular sp@mmer that was flooding me… This morning he made his biggest single attack yet: attempting to post crude sp@m on every post in my blog.

I think I’m going to block c-o-m-c-a-s-t, the entire ISP — they bring me nothing but trouble!

Dave: Enjoy your mini sp@m break while it lasts, ‘cause you know it won’t last ;)

Dave S. says:
January 31, 05h

David — The hyphenation is because I have read more than one claim that comment sp@mmers will take on the challenge of going after anyone who ‘taunts’ them by posting about their lack of the gooey stuff, or the fighting technique, or whatever that’s overly anti-sp@m. How do the bad guys find ‘em? Google, naturally. So that lends some additional hearsay to your observation, anyway.

There’s probably only a slim chance it’s really that effective, but Google likes me enough that I’d still prefer not to risk it.

Dave S. says:
January 31, 05h

Andrew — probably not. Then comments would just go away, unfortunately.

January 31, 05h

I am so glad I don’t have a website popular enough to lure these sp@mmers. I think that might be the only happiness I get from having such an unused site.

There are so many possible solutions to the comment problem, but none provide a relatively frustration-free experience for the visitor. The time when visitors can anonymously post comments may soon be coming to an end, I fear.

So we have some options…
1) CAPTCHAs, which I don’t like.
2) Require user authentication, which I don’t like either.
3) If we’re trying to get rid of bots, simply writing a cookie could solve the problem… I like this.
4) Send an email to the visitor’s email address for verification. They click a link that activates the comment… I also like this.

January 31, 05h

I forgot, another method of alleviating bot sp@m would be checking the referrer of the form post. If the form is not posted from your site, don’t allow it.
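
A minimal sketch of the referrer check described above, in Python. The host names in `ALLOWED_HOSTS` and the function name are assumptions made up for illustration, not taken from any particular blog package. Note that the header is supplied by the client, so this only filters out tools that don't bother to set it.

```python
from urllib.parse import urlsplit

# Hypothetical host names; replace with the hosts that serve your comment form.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def referer_allowed(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the Referer header value points at one of our own hosts."""
    if not referer:
        # Header missing or empty: the form was not posted from one of our pages
        # (or the client strips the header).
        return False
    host = urlsplit(referer).hostname  # None when the value is not an absolute URL
    return host is not None and host in allowed_hosts
```

A comment handler could call `referer_allowed(headers.get("Referer"))` before accepting a POST. Because some legitimate clients and proxies omit the header, it is probably safer to treat a failure as a reason to hold the comment for review than to reject it outright.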

13
Paul D says:
January 31, 05h

In order to be efficient, sp@mmers must rely on brute force, volume, and techniques that can be applied en masse to many, many websites. So it would seem that the way to defeat them is not to make sp@mming impossible, just unprofitable.

It seems to me that a number of techniques, if combined, would accomplish this:

1. Change the address of the comment submission page.

2. Ban frequent or consecutive comments (within reason) from the same IP address.

3. Use a custom or modified CAPTCHA method, perhaps using more than one image. Ensure that these images cannot be retrieved without actually viewing the page that contains them.

4. Use Google’s new anti-referral attribute to keep comment links from being indexed.

Of the above, (3) seems the most difficult, but by no means impossible. The key might be to make your comment system unique in some way to eliminate one-size-fits-all sp@mming techniques.
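
Technique (2) in the list above, limiting frequent comments from the same IP address, could be sketched roughly like this in Python. The class name, the limit of 3 posts, and the 10-minute window are assumptions chosen for illustration; the history lives in memory, so a real site would need to keep it somewhere persistent.

```python
import time
from collections import defaultdict, deque


class CommentThrottle:
    """Allow at most max_posts comments per IP within a sliding time window."""

    def __init__(self, max_posts=3, window_seconds=600):
        self.max_posts = max_posts
        self.window = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of accepted posts

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        recent = self._history[ip]
        # Forget posts that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False  # too many recent posts from this address
        recent.append(now)
        return True
```

The `now` parameter exists only so the behaviour can be checked without waiting; in normal use `throttle.allow(ip)` is enough.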

gavin says:
January 31, 06h

It could be that I don’t fully understand the Google anti-referral thingy but it seems that it won’t stop the stupid spiced-hammers (it appears the stupid ones are the topic here) — they will keep doing it, oblivious of the lack of referrals.

Gavin says:
January 31, 06h

It certainly is annoying. I’ve been trying to figure out a solution to it for a long time, and while the process of validating every comment would be too tedious, flagging ‘suspicious’ comments (and keeping them hidden until validated) is the most efficient way I’ve found.

My site stores a long list of IPs and domain names associated with sp@mmers (I usually get around 10 unique ones per day). It flags any comment containing a predefined ‘bad word’, more than a certain number of hyperlinks, or <h1> or <b> tags used more than a certain number of times, and of course there are a few other methods of weeding out sp@m comments (I’m usually looking at about 20 a day, depending on the ‘attack’). It still needs some fine-tuning, but it is working reasonably well at this stage.
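
A rough Python sketch of this kind of flagging follows. It is not the commenter's actual code: the word list (borrowed from the obfuscated examples in the article), the thresholds, and the function name are all assumptions for illustration.

```python
import re

# Placeholder values; a real list and real thresholds would be tuned per site.
BAD_WORDS = {"p0ker", "dru9s"}
MAX_LINKS = 3
MAX_LOUD_TAGS = 2

LINK_RE = re.compile(r"https?://", re.IGNORECASE)
LOUD_TAG_RE = re.compile(r"<(?:h1|b)\b", re.IGNORECASE)  # opening <h1> or <b> tags


def suspicion_reasons(comment, ip, blocked_ips=frozenset()):
    """Return the list of reasons a comment should be hidden until validated."""
    reasons = []
    if ip in blocked_ips:
        reasons.append("blocked ip")
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    if words & BAD_WORDS:
        reasons.append("bad word")
    if len(LINK_RE.findall(comment)) > MAX_LINKS:
        reasons.append("too many links")
    if len(LOUD_TAG_RE.findall(comment)) > MAX_LOUD_TAGS:
        reasons.append("too many h1/b tags")
    return reasons
```

An empty list means the comment can be published straight away; anything else means it waits for a human to look at it, which matches the hide-until-validated approach described above.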

16
Ottawa says:
January 31, 07h

CAPTCHAs don’t have to be bad and complicated. You can set up a simple CAPTCHA, even a text one. Sp@mmers will not go out of their way to try and overcome your CAPTCHA. They do mass sp@mming and don’t check whether YOUR particular blog has a comment. If it didn’t work – it didn’t work. End of story.

A simple text CAPTCHA could be something like “What is 4+7?”
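
A text question like that could be generated and checked along these lines in Python. The function names are made up for this sketch, and it assumes the expected answer is kept on the server (for instance in the visitor's session), never in the form itself.

```python
import random


def make_question(rng=random):
    """Return a (question text, expected answer) pair for a simple sum."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a}+{b}?", a + b


def check_answer(submitted, expected):
    """Return True if the submitted form value is the expected number."""
    try:
        return int(submitted.strip()) == expected
    except (ValueError, AttributeError):
        # Not a number, or the field was missing altogether.
        return False
```

Being plain text, the question is readable by text browsers and screen readers, which addresses the accessibility complaint about image CAPTCHAs raised earlier in the thread.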

Joel says:
January 31, 08h

Horrible about the threats, Dave. =\
Incredibly unjustified.

It may or may not help, but as you know, most ISPs have a legal department if the IPs were within a consistent range.

I’m not sure about the overall intelligence of people like that in the greater blogosphere, but in the niches (mostly livejournal and derivatives), said negative individuals aren’t bright enough to mask their IPs.

I’ve made a few calls to ISPs before, and they’re usually pretty good about consistent abuse.

But, MT Commenting is just a mess to manage. I spent more time deleting junk than working on or enjoying my own site, so I just pulled it. =\

Better luck and best wishes!=)
~ Joel

Josh says:
January 31, 09h

I know you’re trying to avoid hits with your fancy spacing, but Google removes single letter hyphenation and spacing like that.

http://www.google.com/search?q=%22Find+out+what+it+means+to+me%22+site:lyricalcontent.com

http://www.google.com/search?q=i+let+u+b+u

February 01, 01h

I suggested on another site that changing the names of the form fields and the script file name, and then encoding the entire form, would prevent sp@mmers from finding it again, just as we can encode email addresses.

Discussion:
http://www.maadmob.net/donna/blog/archives/000597.html

Example:
http://www.baekdal.com/x/encode.html (points to Donna’s blog so please do not submit data)

I am, however, just theorizing; I do not know if it would actually work.

20
John Fairley says:
February 01, 01h

Here’s an interesting look at the breaking of CAPTCHAs:

http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html

Short of moderating all comments, there’s not much you can do. I’m not aware of any system that effectively removes 100% of sp@m and leaves good email/comments alone.

Oli says:
February 01, 01h

Another sp@mming favourite at the moment is ref-ferer sp@mming.
I’m receiving lots of hits from completely unrelated sites. After a bit of detective work, I found a site selling software that does this for you. Quite what they think it achieves I don’t know, and I’m not going to help them by giving out the URL here.

February 01, 03h

My site only gets a handful of visitors a day, so I haven’t yet faced the comment-sp@m problem in a major way.

It’s possible that cookie-validation (#2) could be combined with javascript encoding (#20), such that the only ones for whom commenting is inaccessible are those with cookies AND javascript disabled.

On the view-article page, set the cookie. Server side, on the request for the comment-page, check if the cookie’s set. If it is, show the plaintext form, otherwise, show the javascript encoded form.

February 01, 04h

The mechanism I’ve proposed above attempts to solve the ‘robot using your form’ problem. The other problem is ‘robot directly hitting the submit script with POST data’.

The solution to that one seems obvious to me — generate a one-time-ticket that’s submitted as a hidden field with each form. Have a simple table in your database that holds all one-time-tickets and removes them when they’re used. Simply timestamp each one and clear it out if it’s unused after 24 hours.

Even if a user submits a comment with an invalid ticket, you can just return them to the comment page with their comment already in the box, and request that they resubmit it (with a fresh ticket).
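
The one-time-ticket idea above might look something like this in Python. The class and method names are invented for this sketch, and an in-memory dictionary stands in for the database table, so this illustrates the bookkeeping rather than a production design.

```python
import secrets
import time

TICKET_LIFETIME = 24 * 60 * 60  # seconds: the 24 hours suggested above


class TicketStore:
    """Issue one-time tickets for comment forms and redeem each at most once."""

    def __init__(self):
        self._issued = {}  # ticket -> time it was issued

    def issue(self, now=None):
        """Create a ticket to embed as a hidden field in a freshly served form."""
        now = time.time() if now is None else now
        ticket = secrets.token_urlsafe(16)  # unguessable random string
        self._issued[ticket] = now
        return ticket

    def redeem(self, ticket, now=None):
        """Return True once for a known, unexpired ticket; False otherwise."""
        now = time.time() if now is None else now
        issued_at = self._issued.pop(ticket, None)  # removed whether valid or not
        return issued_at is not None and now - issued_at <= TICKET_LIFETIME

    def purge(self, now=None):
        """Drop tickets that were never used; returns how many were removed."""
        now = time.time() if now is None else now
        expired = [t for t, ts in self._issued.items() if now - ts > TICKET_LIFETIME]
        for t in expired:
            del self._issued[t]
        return len(expired)
```

When `redeem` returns False, the handler can re-display the form with the comment text preserved and a new ticket from `issue()`, as described above.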

February 01, 04h

In response to Justin up at #12: an automated sp@m tool can put whatever it wants in the HTTP_REFERER header. I would imagine somebody’s already figured out that it should be the page the form was originally found on.

I have a friend who studies his server logs assiduously to see which search terms drive people to his (commercial) site; I’ve been thinking of setting something up to hit his site over a period of a few days with a fake Google referer header with all kinds of bizarre search terms… but luckily I don’t have too much free time :-)

25
Angus says:
February 01, 05h

I’m disappointed. I don’t have com/ments enabled on my blog(s), but I get a certain number of hits on the comm/ent script anyway (I use MT, so the name is well-known). I finally substituted the default script with one of my own, just so I could see who was trying to inject their sp@m, but since I did that not one sp@mmer has even tried to sp@m me.

I know what I have to do to get an email address bombarded with sp@m: what do I have to do to lure com.ment sp@mmers down to my honeypot?

February 01, 07h

Standard image CAPTCHAs are indeed inaccessible. Other CAPTCHAs don’t have to be.

I wrote an accessible CAPTCHA package for WordPress: http://meyerweb.com/eric/thoughts/2005/01/24/wp-gatekeeper/ . Also, there’s recently been a proposal for accessible image CAPTCHAs: http://www.standards-schmandards.com/index.php?2005/01/01/11-captcha .

Of course, none of this addresses the more fundamental concern over whether CAPTCHAs are a good idea or not. There are arguments to be made either way.

February 01, 08h

Nick,
Yes, I agree the referrer can be spoofed, but this is a huge improvement over allowing anyone to post to your scripts. I don’t have enough experience to say whether most link sp@mmers already spoof the referrer or not.

There are other methods that go hand-in-hand with only allowing pages from your own site to post, such as setting up some type of user session. Give the user session a key (store it on the server, not the client), then when the form data is posted, authenticate the user against this session key. Basically, the only way to spoof that would be browser emulation, and I *know* most sp@mmers don’t do that (yet). ASP.NET does this by default; it stops a lot of potential sp@m from making it into the forms I work with.

28
John B says:
February 01, 09h

Novice alert…

Anyway, over on Autoblog.com (and I’m sure other sites) they require an e-mail confirmation for posting comments. It’s cumbersome as hell, but would something like that work? Seems to be working there. The scumbags would have to use a real e-mail address, but even if they automated the confirmation process, perhaps that would give a basis for filtering out the comment sp@m.

I dunno, just kicking out ideas.
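
One way an e-mail confirmation link could be built is with a signed token, sketched here in Python. This is not how Autoblog.com does it (that isn't known here); the secret key, the function names, and the idea of signing the comment ID together with the address are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; a real site would load this from private configuration.
SECRET_KEY = b"change-me"


def confirmation_token(comment_id, email):
    """Sign the comment ID and address so the emailed link cannot be guessed."""
    message = f"{comment_id}:{email.lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def confirm(comment_id, email, token):
    """Return True if the token from the clicked link matches this comment."""
    expected = confirmation_token(comment_id, email)
    # Constant-time comparison of the two values as bytes.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

The comment stays hidden until `confirm` succeeds. The token would travel in the link sent to the address the commenter gave, so an automated poster needs a working mailbox to get anything published.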

29
Eric Thompson says:
February 01, 10h

I’ll second Typekey.
