Robots and Spiders
Robots/spiders list http://info.webcrawler.com/mak/projects/robots/robots.html
Robot/spider meta tag commands:
The Robots META tag is a simple mechanism for telling visiting Web robots whether
a page should be indexed and whether links on the page should be followed.
Note: Currently only a few robots support this tag!
Where to put the Robots META tag
Like any META tag, it should be placed in the HEAD
section of an HTML page:
<html>
<head>
<meta name="robots" content="noindex, nofollow">
<meta name="description" content="This page ....">
<title>...</title>
</head>
<body>
...
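A robot has to locate the tag before it can act on it. The following is a minimal Python sketch of that lookup, not the code of any particular robot. It uses the standard library html.parser module; the names RobotsMetaFinder and find_robots_meta are made up for this illustration, and ignoring tags outside the HEAD section is an assumption of the sketch.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            # The "robots" name is case insensitive, so compare in lower case.
            if (attr.get("name") or "").lower() == "robots":
                self.contents.append(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html):
    """Return the content strings of all Robots META tags in the HEAD section."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.contents
```

Calling find_robots_meta() on the example page above would return a list holding the single string "noindex, nofollow".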
What to put into the Robots META tag
The content of the Robots META tag is a list of directives separated by commas.
The currently defined directives are [NO]INDEX and [NO]FOLLOW.
The INDEX directive specifies whether an indexing robot should index the page.
The FOLLOW directive specifies whether a robot should follow links on the page.
The defaults are INDEX and FOLLOW.
The values ALL and NONE set all directives on or off:
ALL = INDEX, FOLLOW and NONE = NOINDEX, NOFOLLOW.
Some examples:
<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
Note that the "robots" name of the tag and its content are
case-insensitive.
You should not specify conflicting or repeated
directives such as:
<meta name="robots" content="INDEX,NOINDEX,NOFOLLOW,FOLLOW,FOLLOW">
A formal syntax for the Robots META tag content is:
content = all | none | directives
all = "ALL"
none = "NONE"
directives = directive ["," directives]
directive = index | follow
index = "INDEX" | "NOINDEX"
follow = "FOLLOW" | "NOFOLLOW"
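The grammar above is small enough to check by hand. The following is a minimal Python sketch of a parser for the content value. The function name parse_robots_content is hypothetical, and three behaviours are assumptions of the sketch rather than part of the syntax: whitespace around commas is tolerated (as in the "noindex, nofollow" example), conflicting directives raise an error, and a repeated directive is silently accepted.

```python
def parse_robots_content(content):
    """Parse a Robots META content string into index/follow booleans."""
    # Directives are case insensitive; tolerating spaces is an assumption.
    tokens = [t.strip().upper() for t in content.split(",")]

    # ALL and NONE stand alone and switch every directive on or off.
    if tokens == ["ALL"]:
        return {"index": True, "follow": True}
    if tokens == ["NONE"]:
        return {"index": False, "follow": False}

    seen = {}
    for token in tokens:
        if token in ("INDEX", "NOINDEX"):
            key, value = "index", token == "INDEX"
        elif token in ("FOLLOW", "NOFOLLOW"):
            key, value = "follow", token == "FOLLOW"
        else:
            raise ValueError(f"unknown directive: {token!r}")
        # Assumption: INDEX together with NOINDEX (or FOLLOW with NOFOLLOW)
        # is rejected rather than resolved in favour of one of them.
        if seen.get(key, value) != value:
            raise ValueError(f"conflicting directives for {key}")
        seen[key] = value

    # Directives that are not mentioned keep the defaults INDEX and FOLLOW.
    return {"index": seen.get("index", True), "follow": seen.get("follow", True)}
```

With this sketch, "noindex" alone leaves FOLLOW at its default, while the conflicting example shown earlier raises ValueError.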
spiderhunter.com/spiderlist/
A list of 358 known spiders. At spiderhunter.com you can find the spider names and IP
addresses for Altavista, inktomi.com, hotbot, google, northernlight, lycos, excite and
infoseek.
http://www.tardis.ed.ac.uk/~sxw/robots/botwatch.html
BotWatch is a short Perl script
that analyses log files (in either the Common or NCSA Extended log file
formats) and produces an HTML page reporting on the robots seen.
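The kind of log analysis described here can be sketched in a few lines of Python. This is not BotWatch's code or its method, only an illustration under stated assumptions: the name count_robot_hits is hypothetical, robots are guessed from keywords in the user-agent field (the ROBOT_HINTS tuple is invented for this example), and lines in the Common format, which carry no user-agent field, are simply skipped.

```python
import re
from collections import Counter

# Common log format, optionally followed by the two quoted fields
# (referer and user agent) that the NCSA Extended format appends.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumption: these substrings are only a rough guess at what marks a robot.
ROBOT_HINTS = ("bot", "crawler", "spider")


def count_robot_hits(lines):
    """Count requests per user agent for agents that look like robots."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a recognisable log line
        agent = match.group("agent")
        # Common-format lines have no agent field, so agent is None there.
        if agent and any(hint in agent.lower() for hint in ROBOT_HINTS):
            hits[agent] += 1
    return hits
```

The returned Counter maps each suspected robot's user-agent string to its number of requests, which could then be written out as an HTML report.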