24-09-2007, 09:54 PM
yonghs
Senior Webmaster
 
Join Date: Apr 2007
Location: Penang
Posts: 483
Quote:
Originally Posted by GeminiGeek
Oh! I forgot about robots.txt!! Hah. I already use it to stop the spiders from crawling my feeds... .htaccess is definitely not a good idea, because I am too lazy to log in every time I access the subdomain.


And I am not supposed to put that in the robots.txt in the domain root, right? That would stop search engines from crawling my whole domain, right?

Just wondering how effective robots.txt is. Is it a Google-only tool, or do all crawlers and spiders read robots.txt as well? If I am not mistaken, rel="nofollow" on links applies to Google only, so I am wondering whether robots.txt is the same.
robots.txt is a standard that all major (legitimate) search engines comply with, but compliance is voluntary: smaller rogue crawlers may simply ignore the file. The most effective option is still password protection with .htaccess.
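To make the "voluntary" point concrete: the robots.txt check happens inside the crawler, not on the server. Below is a minimal sketch using Python's standard `urllib.robotparser`; the `/feeds/` path, the example URL and the two crawler functions are hypothetical, for illustration only. A polite crawler asks the rules first, a rogue one never does, and nothing on the server stops it. That is why .htaccess password protection is the stronger option.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a feeds folder for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feeds/
"""

def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

def polite_crawler_may_fetch(parser: RobotFileParser, url: str) -> bool:
    # A well-behaved crawler consults robots.txt before requesting the URL.
    return parser.can_fetch("ExampleBot", url)

def rogue_crawler_may_fetch(parser: RobotFileParser, url: str) -> bool:
    # A rogue crawler never consults the rules, so everything is "allowed".
    return True

if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    url = "http://www.maindomain.com/feeds/rss.xml"
    print("polite:", polite_crawler_may_fetch(rules, url))  # polite: False
    print("rogue:", rogue_crawler_may_fetch(rules, url))    # rogue: True
```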

It should be placed at the ROOT of the domain. You just list the folder to disallow in the robots.txt file:

http://www.maindomain.com/robots.txt

User-agent: *
Disallow: /subfolder/
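A quick way to check what that file actually blocks is Python's standard `urllib.robotparser`. This is a small sketch; the `is_allowed` helper, the "Googlebot" agent name and the sample paths are illustrative assumptions, not part of the original post.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt above, as served at http://www.maindomain.com/robots.txt
MAIN_ROBOTS_TXT = """\
User-agent: *
Disallow: /subfolder/
"""

def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for path in ("/", "/index.html", "/subfolder/", "/subfolder/page.html", "/subfolder2/"):
        url = "http://www.maindomain.com" + path
        print(path, "allowed" if is_allowed(MAIN_ROBOTS_TXT, url) else "blocked")
```

Disallow values are path prefixes, so the trailing slash matters: `/subfolder/` blocks everything inside that folder, but a sibling such as `/subfolder2/` stays crawlable.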


*****************************************
The above works nicely if you want to block indexing of a folder. But since you are using a subdomain, http://subdomain.maindomain.com, you might put another robots.txt at the subdomain's own root (i.e. inside the folder the subdomain points to), so that it is served as http://subdomain.maindomain.com/robots.txt. This time I'm not sure what the content should be, though. Is it as below?

User-agent: *
Disallow: /
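Note that the value after `Disallow:` is a URL path prefix, not a keyword. `Disallow: /` matches every path and so blocks the whole host, while a literal `Disallow: all` matches nothing, because URL paths always start with "/". The sketch below demonstrates this with Python's standard `urllib.robotparser` (at least in that parser); the `is_allowed` helper and the sample URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# "/" is a prefix of every URL path, so this blocks the entire host.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
# "all" is read as a path prefix that no URL path ever starts with.
LITERAL_ALL = "User-agent: *\nDisallow: all\n"

if __name__ == "__main__":
    url = "http://subdomain.maindomain.com/index.html"
    print(is_allowed(BLOCK_EVERYTHING, url))  # False
    print(is_allowed(LITERAL_ALL, url))       # True
```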


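So if you use a subdomain, you may want to use two robots.txt files to be sure: one at the main domain's root and one at the subdomain's root, since crawlers only read the robots.txt at the root of the host they are crawling. Please confirm, as I've not used robots.txt together with a subdomain before.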
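The two-file idea can be sketched by looking up rules per host, which is how crawlers treat robots.txt. In this Python sketch the `robots_by_host` dictionary stands in for fetching `/robots.txt` from each host, and the host names and `is_allowed` helper are illustrative assumptions. It also assumes the shared-hosting layout where the subdomain's files are additionally reachable as www.maindomain.com/subfolder/: the main-domain file covers those copies, and only the subdomain's own file covers the subdomain URLs.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def is_allowed(url: str, robots_by_host: dict[str, str], agent: str = "Googlebot") -> bool:
    """Check `url` against the robots.txt of its own host only.

    `robots_by_host` maps a hostname to the text of the robots.txt served at
    that host's root; a missing entry behaves like an empty robots.txt.
    """
    host = urlsplit(url).hostname or ""
    parser = RobotFileParser()
    parser.parse(robots_by_host.get(host, "").splitlines())
    return parser.can_fetch(agent, url)

# Only the main domain has a robots.txt.
MAIN_ONLY = {
    "www.maindomain.com": "User-agent: *\nDisallow: /subfolder/\n",
}
# Both the main domain and the subdomain have their own robots.txt.
BOTH_FILES = {
    **MAIN_ONLY,
    "subdomain.maindomain.com": "User-agent: *\nDisallow: /\n",
}

if __name__ == "__main__":
    sub_url = "http://subdomain.maindomain.com/page.html"
    folder_url = "http://www.maindomain.com/subfolder/page.html"
    print(is_allowed(sub_url, MAIN_ONLY))      # True: main-domain rules don't apply
    print(is_allowed(sub_url, BOTH_FILES))     # False: the subdomain's own file blocks it
    print(is_allowed(folder_url, BOTH_FILES))  # False: the main-domain rule still applies
```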

Last edited by yonghs; 24-09-2007 at 10:02 PM.