Webmaster Malaysia Forum » Website Marketing and Promotion » Search Engine Marketing

  #1 (permalink)  
Old 24-09-2007, 05:59 PM
Novice Webmaster
 
Join Date: Apr 2004
Location: Miri, Sarawak
Posts: 93
Rep Power: 58
GeminiGeek is on a distinguished road
Can I prevent search engines from crawling my subdomain?

I plan to create a subdomain holding an exact copy of my live, working site, because I want to play with WordPress 2.3 before upgrading the live site, and I will probably start playing with WordPress SVN too. An exact copy of the content means the search engines may start penalizing my site for duplicate content. So, what can I do to prevent bots, crawlers or spiders from crawling this copy of my site?

Thanks in advance. I do plan to play around in a local environment too, but I would prefer to test in a Linux environment, e.g. my web host.
__________________
Don't Ask Me Why I Joined This Forum
GeminiGeek's Online Journal
  #2 (permalink)  
Old 24-09-2007, 06:44 PM
Senior Webmaster
 
Join Date: Apr 2007
Location: Penang
Posts: 479
Rep Power: 29
yonghs is on a distinguished road
Since the subdomain is essentially a folder, how about putting a robots.txt file in the domain root with the following content:

User-agent: *
Disallow: /subdomainfolder/

Or use .htaccess to password-protect the folder: http://www.javascriptkit.com/howto/htaccess3.shtml
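
A rule like this can be checked offline before it goes live. Below is a minimal sketch using Python's standard `urllib.robotparser`; the host `www.maindomain.com`, the page names and the helper `is_allowed` are assumptions for illustration only, and the check only tells you how a compliant crawler would read the file, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body as suggested in the post (the folder name is a placeholder).
MAIN_ROBOTS = """\
User-agent: *
Disallow: /subdomainfolder/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs on the main domain.
folder_page = is_allowed(MAIN_ROBOTS, "http://www.maindomain.com/subdomainfolder/index.php")
home_page = is_allowed(MAIN_ROBOTS, "http://www.maindomain.com/index.php")
print(folder_page, home_page)  # False True
```

The rule is a plain path prefix, so only URLs whose path starts with `/subdomainfolder/` are blocked and the rest of the site stays crawlable.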

Last edited by yonghs; 24-09-2007 at 06:49 PM.
  #3 (permalink)  
Old 24-09-2007, 07:49 PM
Super Moderator
 
Join Date: Jun 2001
Location: Mystic Kingdoms
Posts: 2,680
Rep Power: 145
mysticmind will become famous soon enough
User-agent: *
Disallow: /
__________________
Personal's Blog! - Malaysian Artist!
  #4 (permalink)  
Old 24-09-2007, 09:25 PM
Novice Webmaster
 
Join Date: Apr 2004
Location: Miri, Sarawak
Posts: 93
Rep Power: 58
GeminiGeek is on a distinguished road
Quote:
Originally Posted by yonghs View Post
Since the subdomain is essentially a folder, how about putting a robots.txt file in the domain root with the following content:

User-agent: *
Disallow: /subdomainfolder/

Or use .htaccess to password-protect the folder? Comprehensive guide to .htaccess- password protection
Oh! I forgot about robots.txt! Hah. I already use it to keep spiders from crawling my feeds. .htaccess is definitely not a good idea for me, because I am too lazy to log in every time I access the subdomain.

Quote:
Originally Posted by mysticmind View Post
User-agent: *
Disallow: /
And I am not supposed to put that in the robots.txt in the domain root, right? That would stop the search engines from crawling my whole domain, right?

Just wondering how effective robots.txt is. Is it a Google-only tool, or do all crawlers and spiders read robots.txt as well? If I am not mistaken, the link rel=nofollow applies to Google only, so I am wondering if robots.txt is the same.
  #5 (permalink)  
Old 24-09-2007, 09:54 PM
Senior Webmaster
 
Join Date: Apr 2007
Location: Penang
Posts: 479
Rep Power: 29
yonghs is on a distinguished road
Quote:
Originally Posted by GeminiGeek View Post
Oh! I forgot about robots.txt! Hah. I already use it to keep spiders from crawling my feeds. .htaccess is definitely not a good idea for me, because I am too lazy to log in every time I access the subdomain.


And I am not supposed to put that in the robots.txt in the domain root, right? That would stop the search engines from crawling my whole domain, right?

Just wondering how effective robots.txt is. Is it a Google-only tool, or do all crawlers and spiders read robots.txt as well? If I am not mistaken, the link rel=nofollow applies to Google only, so I am wondering if robots.txt is the same.
robots.txt is a standard that all major (proper) search engines should comply with, although smaller rogue crawlers may choose to ignore the file. The most effective method is still .htaccess password protection.

It should be placed at the ROOT. You just state the folder to disallow in the robots.txt file.

http://www.maindomain.com/robots.txt

User-agent: *
Disallow: /subfolder/


*****************************************
The above works nicely if you want to block indexing of a folder. But since you are using a subdomain (http://subdomain.maindomain.com), you might put another robots.txt in the subdomain's root, at http://subdomain.maindomain.com/robots.txt. This time I'm not sure what the content should be. Is it as below?

User-agent: *
Disallow: /


So, if you use a subdomain, you may want to use 2 robots.txt files to be sure. Please confirm, as I've not used the combination of robots.txt and a subdomain before.
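
To make the two-file idea concrete, here is a rough sketch of how a well-behaved crawler picks its rules: it reads the robots.txt served by the host it is about to crawl, so the main domain and the subdomain are handled separately. The hostnames, the file contents and the helper `crawler_may_fetch` are assumptions for illustration, and the lookup table stands in for actually downloading each file.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by the hostname that serves them.
ROBOTS_BY_HOST = {
    "www.maindomain.com": "User-agent: *\nDisallow: /subfolder/\n",
    "subdomain.maindomain.com": "User-agent: *\nDisallow: /\n",
}


def crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Apply the robots.txt belonging to the URL's own host."""
    host = urlsplit(url).hostname
    # A host without a robots.txt disallows nothing.
    robots_txt = ROBOTS_BY_HOST.get(host, "")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(crawler_may_fetch("http://www.maindomain.com/index.php"))           # True
print(crawler_may_fetch("http://www.maindomain.com/subfolder/page.php"))  # False
print(crawler_may_fetch("http://subdomain.maindomain.com/index.php"))     # False
```

The value after `Disallow:` is a URL path prefix, so `/` in the subdomain's file matches every URL on that host, while the main domain's file only covers the folder path.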

Last edited by yonghs; 24-09-2007 at 10:02 PM.
  #6 (permalink)  
Old 25-09-2007, 10:26 AM
Novice Webmaster
 
Join Date: Apr 2004
Location: Miri, Sarawak
Posts: 93
Rep Power: 58
GeminiGeek is on a distinguished road
I googled and found this, and it should answer all the questions:
How do I use a robots.txt file?

That means I only have to put rules in a robots.txt for both the domain root and the subdomain root. Thanks for the answers, people.
  #7 (permalink)  
Old 25-09-2007, 12:27 PM
Vibrate your Brain Please
 
Join Date: Sep 2005
Location: in my body lar...
Posts: 1,249
Rep Power: 65
iamfreelancer will become famous soon enough
Alternatively, if you are using some sort of template, you can insert the meta tag below directly into your HTML header:

HTML Code:
    <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">	
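
If the tag goes into a template, it is worth confirming that it really ends up in the rendered pages. The following is a small sketch using Python's standard `html.parser`; the class `RobotsMetaFinder`, the function `robots_directives` and the sample page are made up for illustration, and in practice you would feed it the HTML of a page fetched from the test subdomain.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page using the tag from the post.
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

An empty result for a page on the test copy would suggest the template change did not reach that page.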
  #8 (permalink)  
Old 25-09-2007, 11:01 PM
Super Moderator
 
Join Date: Jun 2001
Location: Mystic Kingdoms
Posts: 2,680
Rep Power: 145
mysticmind will become famous soon enough
Googlebot respects robots.txt,
but Yahoo Slurp etc. are so-so.
  #9 (permalink)  
Old 29-09-2007, 12:33 PM
Moderator
 
Join Date: Jun 2001
Location: BTHO
Posts: 750
Rep Power: 106
MENJ is on a distinguished road
You can block spiders with a rule in robots.txt, or use the option in WP that tells spiders not to index the blog.
  #10 (permalink)  
Old 07-10-2007, 10:57 PM
Novice Webmaster
 
Join Date: Apr 2004
Location: Miri, Sarawak
Posts: 93
Rep Power: 58
GeminiGeek is on a distinguished road
I have used all the methods listed here, and so far I think the subdomain isn't indexed. How do I check whether my subdomain is indexed?
  #11 (permalink)  
Old 08-10-2007, 08:08 AM
Senior Webmaster
 
Join Date: Apr 2007
Location: Penang
Posts: 479
Rep Power: 29
yonghs is on a distinguished road
Copy about one sentence of text from your subdomain's content, or the text of a page title, and search for it in Google/Yahoo. See whether the search results show the URL of your subdomain.

Or type this into Google search: inurl:subdomain.yourdomain.com and see whether any of the subdomain's pages are shown.
  #12 (permalink)  
Old 03-12-2007, 02:07 AM
Novice Webmaster
 
Join Date: Feb 2004
Location: KL
Posts: 37
Rep Power: 0
mac is on a distinguished road
If you don't link to your subdomain, it is unlikely to be spidered anyway.

Not all spiders respect robots.txt, but put a "Disallow: /" rule in a robots.txt at the root of the subdomain, because a subdomain is seen as a separate website, and also add "Disallow: /subdomaindirectory/" to the robots.txt of the main domain.
All times are GMT +8.