Taming the robots
Wednesday May 28, 2003 – 11:52 am
I just deleted a post I made a few weeks ago that contained a list of jokes of a non-politically-correct nature. My stats now show that the top search term used to reach this site (from anywhere) is not my name but "Jokes About XXXX." (XXXX being the term I want to avoid placing into the search engines!) I’ve gotten around 300 hits from people looking for that exact term, and even more hits from other variations. It’s not the kind of traffic I want to attract, so I just removed the post. I still think the jokes are funny, though. A better solution would be for me to look more closely at how the robots.txt file works, and how to tell the search engines that you don’t want certain individual pages crawled. I only know how to exclude them from whole directories.

May 29th, 2003 at 12:59 am
You disallow certain files the same way you disallow whole directories. Just use this syntax:
Disallow /directory/filename.html
or
Disallow /filename.html (for URLs on the root)
Hope this helps.
May 29th, 2003 at 1:14 am
THANKS, I’ll try that. I’d rather do it the legit way instead of having to delete posts that are attracting too much attention. Especially since I run my own cgi search engine — if people REALLY want to find something they should look there anyway instead of trusting what Google says.
May 29th, 2003 at 8:30 am
You probably figured this out, but I forgot the colon after "Disallow" in the example I sent you. Sorry ’bout that.
Great blog, BTW, and great music. If you get a chance and want to listen to some good indie label stuff, check out http://www.ibrecords.com. If your own music is any indication, you’d like e_o’s album, in particular.
May 29th, 2003 at 8:53 am
I did figure it out, thanks. Doesn’t it suck how the syntax is different for every frickin’ thing you have to do in this world?
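For anyone landing here with the same question, here is a minimal sketch of the corrected syntax from the comments above (with the colon after "Disallow"), plus a quick way to check it before uploading. It assumes Python's standard urllib.robotparser module; the "User-agent: *" line, the third test path, and the "ExampleBot" crawler name are placeholders added for illustration, not anything from my actual site. Note that this parser matches rules by path prefix, and real search engines may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the per-file syntax from the comments.
# "User-agent: *" (apply to every crawler) is an assumption for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/filename.html
Disallow: /filename.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" rules.
for path in ("/filename.html", "/directory/filename.html", "/directory/other.html"):
    allowed = parser.can_fetch("ExampleBot", path)
    print(path, "allowed" if allowed else "blocked")
```

The two listed files should come back as blocked while the rest of the directory stays crawlable, which is the whole point of excluding single pages instead of whole directories.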