
Search Engine Spider Identification

  
New msn robot spotted today
MSNPTC user-agent
jdMorgan


#:401118
 4:16 am on July 4, 2004 (utc 0)

The user-agent string contains no link or contact address to get more info, but this is a legitimate MS IP address.

131.107.x.xx - - [03/Jul/2004:18:57:43 -0400] "GET / HTTP/1.0" 403 862 "-" "MSNPTC/1.0"

I hope it's not an important robot, because unknown 'bots are blocked from this particular site to avoid abuse, and they just get a 403.

Robot authors/users: Please provide contact info, a link, and/or a meaningful user-agent name (MSN proxy tester/checker?), you know, like Google does... Thanks.

Jim
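As a rough illustration of the "unknown 'bots just get a 403" policy described above, here is a minimal Python sketch. The allowlist tokens and the `status_for` helper are hypothetical; the post does not say which agents are allowed or how the block is actually implemented on that site.

```python
# Hypothetical allowlist; the post does not say which agents are permitted.
ALLOWED_AGENT_TOKENS = ("Mozilla", "Googlebot", "msnbot")


def status_for(user_agent):
    """Return 200 for recognised agents and 403 for everything else."""
    ua = (user_agent or "").lower()
    if any(token.lower() in ua for token in ALLOWED_AGENT_TOKENS):
        return 200
    return 403


# "MSNPTC/1.0" matches none of the assumed tokens, so it is refused.
print(status_for("MSNPTC/1.0"))
print(status_for("msnbot/0.11 (+http://search.msn.com/msnbot.htm)"))
```

An allowlist keeps the default at "deny", which is presumably why a new agent with no contact info ends up blocked until someone looks into it.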

volatilegx


#:401119
 1:25 pm on July 5, 2004 (utc 0)

Jim, can you provide the complete IP address, please? That is now allowed on this heavily moderated forum :)

bobothecat


#:401120
 1:47 pm on July 5, 2004 (utc 0)

I've seen this several times as well, though using a different IP:

207.46.238.143 - - [30/May/2004:16:26:30 -0400] "GET / HTTP/1.0" 200 20684 "-" "MSNPTC/1.0"

*added* Never did check for robots.txt

Alternative Future


#:401121
 2:00 pm on July 5, 2004 (utc 0)

131.107.3.84 - - [02/Jul/2004:18:34:07 -0400] "GET / HTTP/1.0" 200 32079 "-" "MSNPTC/1.0"

Confirmed not checking for robots.txt

-George

mat


#:401122
 2:41 pm on July 5, 2004 (utc 0)

Also seen it on 207.46.228.98

Staffa


#:401123
 3:56 pm on July 5, 2004 (utc 0)

It has visited from 207.46.228.98 12 times since early May, requesting one page per visit and alternating between only two different pages. It never asked for robots.txt.

If it comes a few more times without changing its behaviour it'll get whacked.
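The kind of check reported in the posts above (how often an agent visited, which pages it took, and whether it ever asked for robots.txt) can be sketched in Python. This assumes logs in the combined log format shown in this thread; `summarize_agent` is an illustrative name, and the two sample lines are copied from earlier posts.

```python
import re
from collections import defaultdict

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_agent(lines, agent_substring):
    """Return per-IP hit counts, requested paths, and whether robots.txt was fetched."""
    hits = defaultdict(int)
    paths = set()
    fetched_robots = False
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or agent_substring not in m.group("agent"):
            continue
        hits[m.group("ip")] += 1
        paths.add(m.group("path"))
        if m.group("path").lower() == "/robots.txt":
            fetched_robots = True
    return dict(hits), paths, fetched_robots


sample = [
    '207.46.238.143 - - [30/May/2004:16:26:30 -0400] "GET / HTTP/1.0" 200 20684 "-" "MSNPTC/1.0"',
    '131.107.3.84 - - [02/Jul/2004:18:34:07 -0400] "GET / HTTP/1.0" 200 32079 "-" "MSNPTC/1.0"',
]
hits, paths, fetched_robots = summarize_agent(sample, "MSNPTC")
print(hits, paths, fetched_robots)
```

Run against a full access log, this would show at a glance whether the agent ever requested robots.txt from any of its addresses.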

jdMorgan


#:401124
 5:10 am on July 7, 2004 (utc 0)

> Jim, can you provide the complete IP address, please?

I had it visit from 131.107.3.84 and it did not fetch robots.txt on my sites, either. It fetched only the root index page.

Sorry for the slow response, I "timed out" on the premod, and forgot to re-check the thread.

Jim

Leosghost


#:401125
 11:36 am on July 7, 2004 (utc 0)

Doubtless M$ are in the process of implementing their own proprietary robots.txt and, true to form, will be ignoring all others ... send them a bill for bandwidth?

volatilegx


#:401126
 1:12 pm on July 7, 2004 (utc 0)

Any reason to think this bot has anything to do with the new MSN search engine that's in beta?

robotsdobetter


#:401127
 1:45 pm on July 7, 2004 (utc 0)

I've seen it too at my site, today and three days ago. I did some searching on the web for it, but found nothing.

Is it the MSN newsbot that's in BETA? I have yet to see that one.

volatilegx


#:401128
 6:29 pm on July 7, 2004 (utc 0)

I believe the MSN beta search spider is named "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

wilderness


#:401129
 2:44 pm on July 19, 2004 (utc 0)

Jim,
The new UA and bot was in my morning logs.
I've had an email in to Microsoft for over a week, awaiting some explanation of why their two bots are continuously crawling my sites simultaneously. (I'm glad I'm not holding my breath in anticipation of their reply.)
Add this third one to the mix, along with the IPs from which the original MS bot started crawling unidentified (see the old, old thread), and it still leaves plenty of questions concerning MS's methods and use unanswered. MS has not really updated their search capability while these crawls are ongoing; in fact, I read an article last week saying that the planned improvements or changes to MSN search had been put on hold.
As a result, as webmasters, we are no better off or better informed about their use (or crawling) of our data than when they began in either 2002 or 2003 (I don't recall which).

(There are two very extensive threads in the archives from when MS first began crawling. One, I recall, was in the 15-20 page range.)

volatilegx


#:401130
 3:26 pm on July 30, 2004 (utc 0)

Also seen this UA crawling from 131.107.3.74

BillyS


#:401131
 3:19 am on Aug. 10, 2004 (utc 0)

This one just hit my logs tonight. I saw msnbot nearly at the same time too, so maybe the robots.txt is relayed somehow. I allow msnbot.

207.46.238.142 - - [09/Aug/2004:22:29:02 -0400] "GET / HTTP/1.0" 200 22175 "-" "MSNPTC/1.0"

Took only the one page.

wilderness


#:401132
 4:34 am on Aug. 13, 2004 (utc 0)

A follow up on this.

I have some directories and pages on my largest site that have been in existence some five and a half years. These folder and page names are in mixed case, from when I began and didn't know otherwise.
Over time I've been able to use wrong-case requests for some of these directories and pages to identify unknown, malicious, or even badly programmed bots.
These malicious bots will frequently visit only a single page without reading robots.txt, and in most instances will have a blank referrer or UA.
When I log the details of such a visit for my own records, I refer to it as either a "snoop" or a "probe."

I suppose it's entirely possible MSN has begun with a badly programmed bot? (And after all their research and perhaps two years' worth of Mr. Gates's money?)

I'm more inclined to believe that this bot, however, is a fake.

Add to this that MSN is filling my logs daily and simultaneously from a variety of bots and IPs, with little chance of daybreak or benefit, and I'm not a happy camper.

I've only had two visits from this UA: one in July and another in August. Two of the three pages crawled were 404s as a result of case errors. Robots.txt was not read, nor were there any referrals.

July
207.46.238.143 - - [18/Jul/2004:21:58:52 -0700] "GET /folder/mypage.html HTTP/1.0" 404 - "-" "MSNPTC/1.0"
131.107.3.84 - - [18/Jul/2004:21:58:52 -0700] "GET /OthwerFolder/anotherPage.html HTTP/1.0" 200 31872 "-" "MSNPTC/1.0"

August
207.46.238.142 - - [12/Aug/2004:19:10:21 -0700] "GET /folder/differentPage.htm HTTP/1.0" 404 - "-" "MSNPTC/1.0"
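The wrong-case check described in this post could be approximated along these lines in Python. This is only a sketch: the `known` paths are hypothetical stand-ins for a site's real mixed-case URLs, and `classify_request` is an illustrative helper, not the poster's actual method.

```python
def classify_request(path, known_paths):
    """Classify a requested path against the site's real, case-sensitive paths."""
    if path in known_paths:
        return "exact"
    lowered = {p.lower() for p in known_paths}
    if path.lower() in lowered:
        # Same path apart from letter case: the pattern treated as a
        # "snoop or probe" signal in the post above.
        return "wrong-case"
    return "unknown"


# Hypothetical mixed-case paths, for illustration only.
known = {"/Folder/MyPage.html", "/OtherFolder/AnotherPage.html"}

print(classify_request("/folder/mypage.html", known))
print(classify_request("/Folder/MyPage.html", known))
print(classify_request("/nothing/here.html", known))
```

A "wrong-case" result on its own proves little, but combined with a blank referrer and no robots.txt request it matches the behaviour the post treats as suspicious.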

jdMorgan


#:401133
 5:10 am on Aug. 14, 2004 (utc 0)

> I'm more inclined to believe that this bot, however, is a fake.

If it's a fake, then MSN must have an open proxy -- that 207.46/16 IP range resolves to Microsoft.

I suppose it's possible, though. I had a badly-behaved 'bot show up from a leading anti-virus company one time. I emailed them, and their network admin replied that he shut it down because it was an unauthorized employee project -- I was surprised and grateful for the forthright reply.

Jim
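For anyone wanting to test logged addresses against the range mentioned above, here is a small Python sketch using the standard `ipaddress` module. Only 207.46.0.0/16 comes from the post; 131.107.0.0/16 is an assumption based on the other addresses reported in this thread, and both should be confirmed with a current whois lookup before relying on them.

```python
import ipaddress

# 207.46.0.0/16 is the range named in the post above; 131.107.0.0/16 is an
# assumption based on the other addresses reported in this thread.
ASSUMED_MS_NETWORKS = [
    ipaddress.ip_network("207.46.0.0/16"),
    ipaddress.ip_network("131.107.0.0/16"),
]


def in_assumed_ms_range(ip_text):
    """Return True if the address falls inside one of the assumed networks."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in net for net in ASSUMED_MS_NETWORKS)


print(in_assumed_ms_range("207.46.238.142"))  # address from this thread
print(in_assumed_ms_range("192.0.2.10"))      # documentation-only address
```

A range match only says where the request came from; as noted above, it cannot distinguish an official crawler from something running through a proxy or an unauthorized project on the same network.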

volatilegx


#:401134
 7:53 pm on Aug. 27, 2004 (utc 0)

I'm seeing this bot coming from dialup IPs owned by Microsoft.

I'm wondering if it is some kind of internet accelerator, filter, or similar?

 

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld ® is a Registered Trademark of WebmasterWorld Inc.
© WebmasterWorld Inc. / SearchEngineWorld 1996-2007 all rights reserved