Quote:
Originally Posted by GeminiGeek
Oh! I forgot about robots.txt!! Hah. I use it to stop spiders from crawling my feeds... .htaccess is definitely not a good idea because I'm too lazy to log in every time I access the subdomain.
And I'm not supposed to put that in the robots.txt at the domain root, right? That would stop search engines from crawling my whole domain, right?
Just wondering how effective robots.txt is. Is it a Google-only tool, or do all crawlers and spiders read robots.txt as well? If I am not mistaken, link rel=nofollow applies to Google only, so I am wondering if robots.txt is the same.
robots.txt is a standard that all major (legitimate) search engines should comply with, although smaller rogue crawlers may choose to ignore the file. The most effective option is still password protection via .htaccess.
It should be placed at the ROOT. You just state the folder to disallow in the robots.txt file.
http://www.maindomain.com/robots.txt
User-agent: *
Disallow: /subfolder/
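If you want to sanity-check rules like these before uploading them, Python's standard-library urllib.robotparser can evaluate a robots.txt body against sample URLs. This is a minimal sketch assuming the example contents above; www.maindomain.com and the page paths are placeholders, and real search engines use their own parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Contents of http://www.maindomain.com/robots.txt from the example above.
MAIN_ROBOTS_TXT = """\
User-agent: *
Disallow: /subfolder/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

main = build_parser(MAIN_ROBOTS_TXT)

# "Disallow: /subfolder/" is a prefix match on the URL path, so every page
# under /subfolder/ is blocked while the rest of the site stays crawlable.
for url in (
    "http://www.maindomain.com/index.html",
    "http://www.maindomain.com/subfolder/",
    "http://www.maindomain.com/subfolder/page.html",
):
    verdict = "allowed" if main.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```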
*****************************************
The above works nicely if you want to block crawling of a folder... but since you are using a subdomain
(http://subdomain.maindomain.com), you might put another robots.txt inside the subdomain's folder so that it is served at
http://subdomain.maindomain.com/robots.txt. This time, though, I'm not sure what the content should be. Is it as below?
User-agent: *
Disallow: /
So, if you use a subdomain, you may want to use two robots.txt files to be sure: one at the main domain root and one at the subdomain root. Please confirm, as I've not used robots.txt and subdomains together before.
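On the subdomain question: a crawler looks up robots.txt per host, at the root of whatever scheme and host it is crawling, so the rules in www.maindomain.com/robots.txt do not cover subdomain.maindomain.com. The sketch below uses Python's standard urllib.robotparser with hypothetical file contents held in a dict (no network access); the helpers robots_url_for and can_crawl are illustrations, not part of any library. It also shows that, under this parser, "Disallow: /" blocks the whole host, while a literal "Disallow: all" blocks nothing, because rules are matched as URL-path prefixes and every path starts with "/".

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by the URL each one would be served from.
ROBOTS_FILES = {
    "http://www.maindomain.com/robots.txt": "User-agent: *\nDisallow: /subfolder/\n",
    "http://subdomain.maindomain.com/robots.txt": "User-agent: *\nDisallow: /\n",
}

def robots_url_for(page_url: str) -> str:
    """Hypothetical helper: robots.txt lives at the root of the page's own host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def can_crawl(page_url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against its own host's robots.txt; no file means no restrictions."""
    body = ROBOTS_FILES.get(robots_url_for(page_url))
    if body is None:
        return True
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, page_url)

print(can_crawl("http://www.maindomain.com/index.html"))        # True
print(can_crawl("http://www.maindomain.com/subfolder/a.html"))  # False
print(can_crawl("http://subdomain.maindomain.com/feed.xml"))    # False

# "Disallow: all" is read as the path prefix "all"; since URL paths begin
# with "/", it matches nothing and the subdomain would stay crawlable.
literal_all = RobotFileParser()
literal_all.parse(["User-agent: *", "Disallow: all"])
print(literal_all.can_fetch("Googlebot", "http://subdomain.maindomain.com/feed.xml"))
```

In other words, each host needs its own robots.txt at its own root, which is why two files are needed here.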