Wednesday, March 7, 2007

Nasty Google Spider

A spider is a program that visits web sites and reads their pages and other information in order to create entries for a search engine index. All the major search engines on the web have such a program, which is also known as a "crawler" or a "bot". Spiders are typically programmed to visit sites that their owners have submitted as new or updated. Entire sites or specific pages can be selectively visited and indexed. Spiders get their name because they usually visit many sites in parallel, their "legs" spanning a large area of the "web". Spiders can crawl through a site's pages in several ways. One way is to follow all the hypertext links in each page until all the pages have been read.
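To make the "follow all the links until every page has been read" idea concrete, here is a minimal sketch in Python. It is not how any real search engine spider works: the `SITE` dictionary is a made-up stand-in for a web site (a real spider would fetch pages over HTTP), and `LinkParser` and `crawl` are names invented for this example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site: path -> HTML. This stands in for real HTTP fetches.
SITE = {
    "/": '<a href="/about">about</a> <a href="/posts">posts</a>',
    "/about": '<a href="/">home</a>',
    "/posts": '<a href="/posts/1">first</a> <a href="/missing">gone</a>',
    "/posts/1": '<a href="/posts">back</a>',
}


def crawl(start, pages):
    """Follow links breadth-first until every reachable page has been read."""
    seen = {start}          # paths already queued, so no page is read twice
    queue = deque([start])  # paths waiting to be read
    order = []              # pages in the order they were actually read
    while queue:
        path = queue.popleft()
        html = pages.get(path)
        if html is None:
            continue  # dead link: nothing to read
        order.append(path)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl("/", SITE))
```

The `seen` set is what keeps the spider from going in circles when pages link back to each other, which almost every site does.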


Google spider

Google has two spiders so far:

    the normal Google spider: 66.249.64.47 - "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

    the additional Google spider: 66.249.66.129 - "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
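If you want to tell the two apart in your own logs, something like the following sketch could work. It assumes your log lines look exactly like the two shown above (your server's format may differ), and `LOG_LINE` and `classify` are names made up for this example. Keep in mind that a user agent is just text sent by the client, so anyone can claim to be Googlebot.

```python
import re

# Pattern for the shortened log format shown in this post (an assumption;
# adjust it to match your own server's log format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) - "(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def classify(line):
    """Return (kind, ip, protocol) for a log line, or None if it doesn't parse."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    agent = match.group("agent")
    if "Googlebot" not in agent:
        kind = "other"
    elif agent.startswith("Mozilla/5.0"):
        kind = "additional"  # the newer, Mozilla-compatible user agent
    else:
        kind = "normal"      # the plain "Googlebot/2.1" user agent
    return kind, match.group("ip"), match.group("protocol")


lines = [
    '66.249.64.47 - "GET /robots.txt HTTP/1.0" 404 1227 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.129 - "GET / HTTP/1.1" 200 38358 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for line in lines:
    print(classify(line))
```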



Please tell me the difference between these two Google spiders

The new Google spider uses a slightly different user agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".

The log lines also show the new spider making its request over HTTP/1.1, where the old one used HTTP/1.0, so Googlebot now speaks the HTTP 1.1 protocol as well. The new spider might be able to understand more content formats, including compressed HTML.
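"Compressed HTML" usually means the server gzips the page and labels it with a `Content-Encoding: gzip` header when the client says it can handle that. Here is a rough sketch of that exchange with no network involved. The `serve` and `read_body` functions are invented for this example and are not how Googlebot or any particular web server is implemented.

```python
import gzip

# A repetitive page, so the effect of compression is easy to see.
html = ("<html><body>" + "<p>hello spider</p>" * 200 + "</body></html>").encode("utf-8")


def serve(body, accept_encoding):
    """Hypothetical server: gzip the body only if the client accepts gzip."""
    if "gzip" in accept_encoding.lower():
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


def read_body(headers, payload):
    """Hypothetical client: undo the compression named in the response headers."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload


headers, payload = serve(html, "gzip")
print(len(html), "bytes uncompressed,", len(payload), "bytes on the wire")
print(read_body(headers, payload) == html)
```

A spider that accepts compressed pages downloads fewer bytes per page, which matters when it is crawling a very large number of sites.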


Why does Google do that?

Google hasn't revealed the reason for it yet.

But there are two main theories:

  • The first theory is that Google uses the new spider to spot web sites that use cloaking, JavaScript redirects and other dubious web site optimization techniques. As the new spider seems to be more powerful than the old spider, this sounds plausible.


  • The second theory is that Google's extensive crawling might be a panic reaction because the index needs to be rebuilt from the ground up in a short time period. The reason for this might be that the old index contains too many spam pages.



OK, I get it, but what the hell does that mean for my blog/webpage?

If you use questionable techniques such as cloaking or JavaScript redirects, you might get into trouble. If Google really uses the new spider to detect spamming web sites, it's likely that these sites will be banned from the index.

To obtain long-term results on search engines, it's better to use ethical search engine optimization methods. General information about Google's web page spider can be found here.

6 comments:

outlaw said...

though you're only 13 yrz old i have no idea why you talk like a geek :P (no offence) you're still young to be talking about pc's and spiders and coding,,, enjooooy your lifeeeeeeee

Kitty said...

this is my life

Anonymous said...

Well I was going to say the same thing. I don't think I came across a Kuwaiti female geek in my life. Reading your profile, you are a female, from Kuwait, geek, and 13!

if i was only 14 and still single, i would have proposed! Are you really 13? no offense :)

Anonymous said...

OK, I can see you've been cheating on few things :)

First, this is the original post you quoted from:
http://www.articlealley.com/article_130814_6.html

second, your IP resolves to Kuwait and I don't assume you should be out of school right now :)

Kitty said...

*if you read my blog well you know i already have a boyfriend :P

*and if you clicked on the post title you would know that i already linked to that page ^_^

* and if that makes you feel better, i am not 13 and i am not in france, i removed my age from my profile, maybe i will remove my location as well xD

*and one last thing, i'm a hord, not a geek 8-)

Anonymous said...

I'm not gonna read your whole blog to know you have a boyfriend. You are supposed to have a flash sign at the top saying so!

Post title linking to original post: Was not obvious to me :)

Well, you solved my mystery. I still can't find a kuwaiti female geek, and a 13 years old girl does not know more than me. Anyways, nice to see people know about those things. I think half my professors won't know what you posted.