|
New msn robot spotted today
MSNPTC user-agent |
jdMorgan
#:401118
| 4:16 am on July 4, 2004 (utc 0) |
The user-agent string contains no link or contact address for more information, but the request came from a legitimate Microsoft IP address:
131.107.x.xx - - [03/Jul/2004:18:57:43 -0400] "GET / HTTP/1.0" 403 862 "-" "MSNPTC/1.0"
I hope it's not an important robot, because unknown 'bots are blocked from this particular site to avoid abuse, and they just get a 403.
Robot authors/users: Please provide contact info, a link, and/or a meaningful user-agent name (MSN proxy tester/checker?), you know, like Google does... Thanks.
Jim
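
For anyone who wants to pull these fields out of their own logs, here is a minimal Python sketch. It assumes the server writes the Apache "combined" log format, which is what the quoted line looks like. The names `LOG_PATTERN` and `parse_log_line` are made up for illustration and are not from any particular tool.

```python
import re

# Assumed layout: Apache "combined" log format, as in the quoted line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields from one combined-format log line, or None."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None

# The log line quoted in the post above (IP partly masked by the poster).
sample = ('131.107.x.xx - - [03/Jul/2004:18:57:43 -0400] '
          '"GET / HTTP/1.0" 403 862 "-" "MSNPTC/1.0"')
entry = parse_log_line(sample)
print(entry["ip"], entry["status"], entry["agent"])
```

The size field is matched as `\S+` rather than digits because some servers log `-` when no body is sent.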
|
volatilegx
#:401119
| 1:25 pm on July 5, 2004 (utc 0) |
Jim, can you provide the complete IP address, please? That is now allowed on this heavily moderated forum :)
|
bobothecat
#:401120
| 1:47 pm on July 5, 2004 (utc 0) |
I've seen this several times as well, though using a different IP:
207.46.238.143 - - [30/May/2004:16:26:30 -0400] "GET / HTTP/1.0" 200 20684 "-" "MSNPTC/1.0"
*added* It never did check for robots.txt.
|
Alternative Future
#:401121
| 2:00 pm on July 5, 2004 (utc 0) |
131.107.3.84 - - [02/Jul/2004:18:34:07 -0400] "GET / HTTP/1.0" 200 32079 "-" "MSNPTC/1.0"
Confirmed: not checking for robots.txt.
-George
|
mat
#:401122
| 2:41 pm on July 5, 2004 (utc 0) |
Also seen it on 207.46.228.98
|
Staffa
#:401123
| 3:56 pm on July 5, 2004 (utc 0) |
From 207.46.228.98: it has visited 12 times since early May, calling one page per visit and alternating between only two different pages. It never asked for robots.txt.
If it comes a few more times without changing its behaviour, it'll get whacked.
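
The kind of tally described here (visits per IP, distinct pages, whether robots.txt was ever requested) can be sketched in Python roughly as follows. This assumes combined-format log lines. The function name `summarize_agent` and the record layout are hypothetical, and the sample lines are the ones quoted earlier in this thread.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_agent(lines, agent_substring):
    """Per IP: request count, distinct paths, and whether robots.txt was requested."""
    summary = defaultdict(lambda: {"hits": 0, "paths": set(), "asked_robots": False})
    for line in lines:
        m = LINE_RE.search(line)
        if not m or agent_substring not in m.group("agent"):
            continue  # unparseable line or a different user-agent
        record = summary[m.group("ip")]
        record["hits"] += 1
        record["paths"].add(m.group("path"))
        if m.group("path").lower().startswith("/robots.txt"):
            record["asked_robots"] = True
    return dict(summary)

# Log lines quoted earlier in this thread.
sample_lines = [
    '207.46.238.143 - - [30/May/2004:16:26:30 -0400] "GET / HTTP/1.0" 200 20684 "-" "MSNPTC/1.0"',
    '131.107.3.84 - - [02/Jul/2004:18:34:07 -0400] "GET / HTTP/1.0" 200 32079 "-" "MSNPTC/1.0"',
]
result = summarize_agent(sample_lines, "MSNPTC")
for ip, record in result.items():
    print(ip, record["hits"], sorted(record["paths"]), record["asked_robots"])
```

Run against a full log file, the same function would show at a glance whether any MSNPTC address ever requested robots.txt.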
|
jdMorgan
#:401124
| 5:10 am on July 7, 2004 (utc 0) |
> Jim, can you provide the complete IP address, please?
I had it visit from 131.107.3.84, and it did not fetch robots.txt on my sites, either. It fetched only the root index page.
Sorry for the slow response, I "timed out" on the premod, and forgot to re-check the thread.
Jim
|
Leosghost
#:401125
| 11:36 am on July 7, 2004 (utc 0) |
Doubtless M$ are in the process of implementing their own proprietary robots.txt and, true to form, will be ignoring all others... send them a bill for bandwidth?
|
volatilegx
#:401126
| 1:12 pm on July 7, 2004 (utc 0) |
Any reason to think this bot has anything to do with the new MSN search engine that's in beta?
|
robotsdobetter
#:401127
| 1:45 pm on July 7, 2004 (utc 0) |
I've seen it too at my site, today and three days ago. I did some searching on the web for it, but found nothing. Is it the MSN newsbot that's in beta? I have yet to see that one.
|
volatilegx
#:401128
| 6:29 pm on July 7, 2004 (utc 0) |
I believe the MSN beta search spider is named "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
|
wilderness
#:401129
| 2:44 pm on July 19, 2004 (utc 0) |
Jim,
The new UA and bot were in my morning logs.
I've had a mail in to Microsoft for over a week, awaiting some type of explanation as to why their two bots are continuously crawling my sites simultaneously. (I'm glad I'm not holding my breath in anticipation of their reply.)
Add this third to the mix, coming from the IPs which the original MS bot started crawling from unidentified (see old old thread), and it still leaves plenty of questions concerning MS methods and use unanswered.
MS has not really updated their search capability while these crawls are ongoing; in fact, I read an article last week saying that the planned improvements or changes to MSN search were put on hold.
As a result, as webmasters, we are no better off or informed of their use (or crawling) of our data than when they began in either 2002 or 2003 (I don't recall).
(There are two very extensive threads in the archives from when MS first began crawling. One I recall being in the 15-20 page range.)
|
volatilegx
#:401130
| 3:26 pm on July 30, 2004 (utc 0) |
Also seen this UA crawling from 131.107.3.74.
|
BillyS
#:401131
| 3:19 am on Aug. 10, 2004 (utc 0) |
This one just hit my logs tonight. I saw msnbot at nearly the same time too, so maybe the robots.txt is relayed somehow. I allow msnbot.
207.46.238.142 - - [09/Aug/2004:22:29:02 -0400] "GET / HTTP/1.0" 200 22175 "-" "MSNPTC/1.0"
It took only the one page.
|
wilderness
#:401132
| 4:34 am on Aug. 13, 2004 (utc 0) |
A follow up on this.
I have some directories and pages on my largest site that have been in existence some five and a half years. These folders and pages have mixed-case names from when I began and didn't know otherwise.
Over time I've been able to use wrong-case spidering of some of these directories and pages to identify unknown, malicious or even badly programmed bots.
These malicious bots will frequently visit only a solitary page without reading robots.txt, and in most instances will have a blank referrer or UA. When detailing the visit for my own records, I refer to it as either a "snoop" or a "probe."
I suppose it's entirely possible MSN has begun with a badly programmed bot? (And after all their research and perhaps two years' worth of Mr. Gates' money?)
I'm more inclined to believe that this bot, however, is a fake.
Add to this that MSN is filling my logs daily and simultaneously from a variety of bots and IPs with little chance of daybreak or benefit, and I'm not a happy camper.
I've only had two visits from this UA, one in July and another in August. Two of the three pages crawled were 404s as a result of case errors. Robots.txt was not read, nor were there any referrers.
July
207.46.238.143 - - [18/Jul/2004:21:58:52 -0700] "GET /folder/mypage.html HTTP/1.0" 404 - "-" "MSNPTC/1.0"
131.107.3.84 - - [18/Jul/2004:21:58:52 -0700] "GET /OthwerFolder/anotherPage.html HTTP/1.0" 200 31872 "-" "MSNPTC/1.0"
August
207.46.238.142 - - [12/Aug/2004:19:10:21 -0700] "GET /folder/differentPage.htm HTTP/1.0" 404 - "-" "MSNPTC/1.0"
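
The wrong-case check described in this post could be sketched like this in Python. It is only an illustration: the paths in `REAL_PATHS` are hypothetical stand-ins for a site's real mixed-case URLs, and `find_case_mismatch` is an invented helper name.

```python
def find_case_mismatch(requested_path, real_paths):
    """Return the real path if the request differs from it only by letter case, else None."""
    if requested_path in real_paths:
        return None  # exact match, nothing suspicious
    lowered = requested_path.lower()
    for real in real_paths:
        if real.lower() == lowered:
            return real  # same path apart from case: likely a mangled crawl
    return None

# Hypothetical mixed-case paths that actually exist on the site.
REAL_PATHS = {"/Folder/myPage.html", "/OtherFolder/anotherPage.html"}

wrong_case = find_case_mismatch("/folder/mypage.html", REAL_PATHS)
exact = find_case_mismatch("/OtherFolder/anotherPage.html", REAL_PATHS)
unknown = find_case_mismatch("/nothing/here.html", REAL_PATHS)
print(wrong_case, exact, unknown)
```

A request that 404s but matches a real path once case is ignored is the signature being described: the client lowercased or otherwise mangled a URL it picked up somewhere.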
|
jdMorgan
#:401133
| 5:10 am on Aug. 14, 2004 (utc 0) |
> I'm more inclined to believe that this bot however is a fake.
If it's a fake, then MSN must have an open proxy -- the 207.46.0.0/16 IP range resolves to Microsoft.
I suppose it's possible, though. I had a badly-behaved 'bot show up from a leading anti-virus company one time. I emailed them, and their network admin replied that he shut it down because it was an unauthorized employee project -- I was surprised and grateful for the forthright reply.
Jim
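
A quick way to test whether logged addresses fall inside the ranges mentioned in this thread is Python's standard `ipaddress` module. This is a rough sketch only: the 207.46.0.0/16 range comes from the post above, while treating 131.107 as a full /16 is an assumption, and range membership is no substitute for a WHOIS or reverse DNS lookup.

```python
import ipaddress

# Ranges as discussed in this thread; the /16 width for 131.107 is an assumption.
ASSUMED_MS_RANGES = [
    ipaddress.ip_network("207.46.0.0/16"),
    ipaddress.ip_network("131.107.0.0/16"),
]

def in_assumed_ms_range(ip_text):
    """True if the address falls inside one of the assumed Microsoft ranges."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in ASSUMED_MS_RANGES)

# The first two addresses are from the thread; 192.0.2.1 is a documentation-only address.
for ip in ["207.46.238.143", "131.107.3.84", "192.0.2.1"]:
    print(ip, in_assumed_ms_range(ip))
```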
|
volatilegx
#:401134
| 7:53 pm on Aug. 27, 2004 (utc 0) |
I'm seeing this bot coming from dialup IPs owned by Microsoft. I'm wondering if it is some kind of internet accelerator, filter, or similar?
|