Make it hidden or searchable

Posted in Internet on November 17, 2006 by wsjoung

Some people have personal or other information that they would rather keep hidden, yet they sometimes upload it to public web space anyway. There is a way to ask search engines and other robots to stay away from that information.
Not every search robot supports the robots Meta tag, but there is another mechanism for robots: the “robots.txt” file. Placed at the root of your web site, this file tells visiting robots which directories or files they are allowed or not allowed to crawl.
Here are some examples.

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:
Or create an empty “/robots.txt” file.
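
As a rough check that both options behave the same way, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name allows, the robot name ExampleBot and the example.com URL are made up for illustration; a record with an empty Disallow line and a completely empty file should both let every robot in.

```python
from urllib.robotparser import RobotFileParser

def allows(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Option 1: a record with an empty Disallow line (nothing is disallowed).
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"
# Option 2: an empty robots.txt file.
EMPTY_FILE = ""

url = "http://example.com/any/page.html"  # hypothetical URL
print(allows(EMPTY_DISALLOW, url))  # True
print(allows(EMPTY_FILE, url))      # True
```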

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
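
To see how a well-behaved robot would read the rules above, here is a small sketch with Python's standard urllib.robotparser module. The robot name AnyBot and the example.com URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The partial-exclusion rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a hypothetical site.
for url in (
    "http://example.com/index.html",
    "http://example.com/cgi-bin/search",
    "http://example.com/private/notes.html",
):
    verdict = "allowed" if parser.can_fetch("AnyBot", url) else "disallowed"
    print(url, "->", verdict)
```

Each Disallow value is matched as a prefix of the URL path, so everything under the three listed directories is off limits while the rest of the site stays open.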

To exclude a single robot
User-agent: BadBot
Disallow: /

To allow a single robot
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
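
The per-robot rules can be checked the same way. This sketch assumes the WebCrawler record carries an empty Disallow line (a record needs at least one Disallow line to take effect) and uses Python's standard urllib.robotparser. OtherBot and the example.com URL are invented names.

```python
from urllib.robotparser import RobotFileParser

# One record that lets WebCrawler in, one that shuts everyone else out.
ROBOTS_TXT = """\
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/page.html"  # hypothetical URL
print(parser.can_fetch("WebCrawler", url))  # True
print(parser.can_fetch("OtherBot", url))    # False
```

A robot uses the record that names it and falls back to the “*” record only when no record matches its own name.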

To exclude all files except one
This is a bit awkward, as the original robots.txt standard has no “Allow” field. The easy way is to put all the files to be disallowed into a separate directory, say “docs”, and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/docs/

Alternatively, you can explicitly list every disallowed page:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
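
Listing pages by hand gets tedious, so the file could be generated instead. Below is a minimal sketch, assuming a hypothetical helper build_robots_txt and a hand-maintained list of pages; it then re-parses the result with Python's standard urllib.robotparser to confirm that only the listed pages are blocked. The example.com host and the index.html page are illustrative.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build one robots.txt record with a Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"

# The pages from the example above.
pages = ["/~joe/private.html", "/~joe/foo.html", "/~joe/bar.html"]
robots_txt = build_robots_txt(pages)
print(robots_txt)

# Re-parse the generated text and check a blocked and an unblocked page.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AnyBot", "http://example.com/~joe/foo.html"))    # False
print(parser.can_fetch("AnyBot", "http://example.com/~joe/index.html"))  # True
```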