Feedback

MSN & Bing

"We really don't have enemies. It's just that some of our best friends are trying to kill us." (Oscar Wilde)

MICROSOFT robots - Bing's secret life...

Webmasters let search-engine robots crawl their sites so that their pages get indexed and gain exposure (without exposure, you simply do not exist on the Web).

But if you are a bit technically inclined, you may want to check what search engines actually do when they visit your servers.

This is useful to distinguish between friends and foes:


From: TrustLeap Pierre [xxxxxx@trustleap.com]
Sent: Tuesday, January 19, 2010 1:08 AM
To: Bing Webmaster Center
Subject: MSNbot issues?

Hello MSN WebMaster,

Unlike others (CPAN, OpenWatcom, etc.) I do not complain: 
my server happily copes with extra loads (and timeouts) 
so your robots are not a threat to our operations.

However, I have 4 very simple questions for you:

1) why your robots are trying to repeatedly fetch files
   that just can't exist (such as  GET /' )?

2) why, despite several robots working concurrently
   every day of the week, links and pages that have
   been removed from the Web site for months are still
   queried, pointlessly inflating 404 error traffic?

3) why a large ZIP or PDF file is fetched by the same
   robot many times at one or two seconds of interval
   for days?

4) why robots need to open TCP connections and do
   nothing for increasing delays (accept timeouts),
   or start sending a request but never finish it (read
   timeouts), or start fetching the server reply but
   never complete it (send timeouts), or do it so
   slowly that it is ridiculous (slow timeouts)?

Failure to explain the legitimate needs for behaviors
such as those described above will inevitably push
Web site owners to form their own opinions on the
matter, and this might not benefit your company.

An official position from your organization is more
than welcome to dismiss any misplaced doubt.

Best regards,

Pierre. 

From: "Sultan Abdul Kader" [skader@microsoft.com]
To: "TrustLeap Pierre" [xxxxxx@trustleap.com]
Sent: Tuesday, January 19, 2010 10:12 PM
Subject: RE: MSNbot issues?

Dear Pierre,

Thank you very much for contacting us.  I will try to 
get the information you requested as soon as possible.

The questions you have span different teams within Bing, 
so I am going to be collecting the answers and getting 
back to you. 

While we are investigating, I will appreciate if you 
can help us with information from your server logs.

If you have any other questions, feel free to contact me.

Regards,
Sultan

From: "TrustLeap Pierre" [xxxxxx@trustleap.com]
To: "Sultan Abdul Kader" [skader@microsoft.com]
Sent: Wednesday, January 20, 2010 9:37 AM
Subject: Re: MSNbot issues?

Hello Sultan,

Our domain names (trustleap.* and gwan.*)
are all mapped to one single IP address:
87.106.145.65

The involved robots are using the following
MICROSOFT networks:

[points #1-4]:
65.52.0.0 - 65.55.255.255

[point #2 only]:
207.46.0.0 - 207.46.255.255

You will find many of the 4 points described
previously in your activity logs.

Best regards,

Pierre.

From: TrustLeap Pierre [xxxxxx@trustleap.com] 
Sent: Thursday, January 21, 2010 2:02 AM
To: "Sultan Abdul Kader" [skader@microsoft.com]
Subject: Re: MSNbot issues?

Hello Sultan,

I forgot this network:

[point #4]:
131.107.0.0 - 131.107.255.255

Best regards,

Pierre.

From: "TrustLeap Pierre" [xxxxxx@trustleap.com]
To: "Sultan Abdul Kader" [skader@microsoft.com]
Sent: Wednesday, February 10, 2010 9:25 AM
Subject: Re: MSNbot issues?

Hello Sultan,

> The first two issues are already identified by us 
> and we will be fixing them soon. 

Should I assume that these were considered as
"bugs" by MICROSOFT?

If so, can we now assume that as they were fixed,
these odd behaviors will no longer happen?

> Regard to the issue (3), can you let us know what 
> is the size of these zip files. 

ZIP: ~400 KB
PDF: 100KB-3.1MB

> Can you provide us an IP address of the robot 
> that is causing the issue (4)?

The list below may have duplicates, given its size, 
I did not bother to sort it by IP address (it is sorted
by date/time):

65.55.165.52 65.55.165.85 65.55.230.219 65.55.211.137 
65.55.208.203 65.55.211.125 65.55.110.170 65.55.165.14 
65.55.104.158 65.55.208.212 65.55.107.203 65.55.110.77 
65.55.107.211 65.55.109.108 65.55.230.224 65.55.110.90 
65.55.110.162 65.55.109.24 65.55.109.195 65.55.109.82
65.55.110.148 65.55.109.132 65.55.109.192 65.55.207.53 
65.55.110.17 65.55.110.20 65.55.109.188 65.55.110.184 
65.55.110.190 65.55.109.66 65.55.104.28 65.55.232.13 
65.55.110.233 65.55.207.53 65.55.110.208 65.55.109.155 
65.55.110.147 65.55.106.204 65.55.109.121 65.55.109.141
65.55.109.160 65.55.110.202 65.55.109.145 65.55.110.66 
65.55.104.27 65.55.110.64 65.55.109.10 65.55.109.86 
65.55.109.64 65.55.107.191 65.55.110.236 65.55.109.232 
131.107.0.75 131.107.0.103

> We do really appreciate your information.

I would really appreciate answers to my questions.

Best regards,

Pierre.

> Regards,
> Sultan
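
The address ranges and the unsorted, possibly duplicated IP list quoted in the emails above can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the helper names (`to_networks`, `in_quoted_ranges`, `unique_sorted`) are illustrative assumptions, and `SAMPLE` holds only a few addresses copied from the list, not the whole list.

```python
import ipaddress

# Address ranges exactly as quoted in the emails (start, end).
QUOTED_RANGES = [
    ("65.52.0.0", "65.55.255.255"),
    ("207.46.0.0", "207.46.255.255"),
    ("131.107.0.0", "131.107.255.255"),
]

def to_networks(ranges):
    """Collapse each start-end range into the CIDR blocks that cover it."""
    nets = []
    for start, end in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.ip_address(start), ipaddress.ip_address(end)))
    return nets

def in_quoted_ranges(ip, nets):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)

def unique_sorted(ips):
    """Drop duplicates and sort numerically rather than as text."""
    return sorted(set(ips), key=ipaddress.ip_address)

# A few addresses copied from the list above, including one repeated entry.
SAMPLE = ["65.55.165.52", "65.55.207.53", "65.55.110.17",
          "65.55.207.53", "131.107.0.75"]

if __name__ == "__main__":
    nets = to_networks(QUOTED_RANGES)
    print([str(n) for n in nets])
    for ip in unique_sorted(SAMPLE):
        print(ip, in_quoted_ranges(ip, nets))
```

Sorting with `ipaddress.ip_address` as the key avoids the usual pitfall of text sorting, where "65.55.110.17" and "65.55.110.170" would be ordered character by character instead of by value.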

Seven months later, MICROSOFT has still not explained why it uses its robots to disrupt the Web sites of its competitors with the kind of time-out attacks that kill Apache and IIS (yes, IIS too, thanks to a 120-second 'after accept' time-out).

While GOOGLE robots never attacked http://gwan.ch, MICROSOFT robots never stopped their attacks – even after my emails.

Had TrustLeap done the same at the expense of the World Leader of Software, I would have been in jail overnight.

I am not sure that applying such double standards at every possible occasion benefits 'fair competition' or end-users...
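
The four stalling patterns listed in the first email (accept, read, send and slow timeouts) can be told apart from per-connection timing data. The sketch below is only an illustration of that classification: the `Conn` record, its field names and the `MIN_RATE` threshold are assumptions made for the example, and do not describe how G-WAN or any other server actually tracks connections.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Conn:
    """Hypothetical per-connection record; times are seconds since accept."""
    first_byte_in: Optional[float] = None  # first request byte received
    request_done: Optional[float] = None   # complete request received
    reply_done: Optional[float] = None     # complete reply fetched by client
    reply_bytes: int = 0                   # size of the reply

MIN_RATE = 1024  # bytes/second; assumed cut-off for a "ridiculously slow" fetch

def classify(c: Conn) -> str:
    """Map a connection record to one of the four timeout kinds, or 'ok'."""
    if c.first_byte_in is None:
        return "accept timeout"  # connection opened, nothing ever sent
    if c.request_done is None:
        return "read timeout"    # request started but never finished
    if c.reply_done is None:
        return "send timeout"    # reply started but never fully fetched
    duration = c.reply_done - c.request_done
    if duration > 0 and c.reply_bytes / duration < MIN_RATE:
        return "slow timeout"    # reply fetched, but far too slowly
    return "ok"

if __name__ == "__main__":
    print(classify(Conn()))                                    # no bytes sent
    print(classify(Conn(first_byte_in=0.1)))                   # partial request
    print(classify(Conn(first_byte_in=0.1, request_done=0.2)))  # partial fetch
```

Tallying these labels per client address over a day would show whether a given robot produces mostly normal requests or mostly stalled connections.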


Now, what about Cyveillance (another MICROSOFT 'strategic partner')?

Cyveillance, based in Washington D.C., started to attack gwan.com in July 2009, only 3 days after G-WAN v1.0 was publicly released.

And Cyveillance sent hand-crafted attacks using the 3-day-old G-WAN API (not just off-the-shelf junk requests):

38.100.41.107 "/source/xbuf_frurl(&buf,          " 404 251
38.100.41.107 "/source/20;" 404 251
38.100.41.107 "/source/0;" 404 251
38.100.41.107 "/source/imgs/imgs/.../imgs/errors.css" 404 251
38.105.83.11 "/source/xbuf_frurl(&buf,    " 404 251
38.105.83.11 "/source/0,    " 404 251
38.100.8.50 "/?rate=" 404 251
38.100.8.50 "/?amount=" 404 251
38.100.8.50 "/?term=" 404 251
38.100.8.50 "/?csp?loan&name=" 404 251
...
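
Excerpts like the one above are easy to tally per client. Here is a minimal sketch that counts 404 responses per IP address, assuming every log line follows the abbreviated `IP "request" status size` layout shown in the excerpt (a real access log will usually have more fields). The names `LINE` and `count_404s` are illustrative.

```python
import re
from collections import Counter

# Matches the abbreviated layout used in the excerpt: IP "request" status size
LINE = re.compile(r'^(\S+) "(.*)" (\d{3}) (\d+)$')

def count_404s(lines):
    """Count 404 responses per client IP; lines that do not match are skipped."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group(3) == "404":
            hits[m.group(1)] += 1
    return hits

# A few lines copied from the excerpt above.
SAMPLE = [
    '38.100.41.107 "/source/20;" 404 251',
    '38.100.41.107 "/source/0;" 404 251',
    '38.105.83.11 "/source/0,    " 404 251',
    '38.100.8.50 "/?rate=" 404 251',
    '...',
]

if __name__ == "__main__":
    for ip, n in count_404s(SAMPLE).most_common():
        print(ip, n)
```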

And the very same unsuccessful attacks come on a regular basis. Why? They are used as a cover for (really dangerous) timeout attacks: the same deadly attacks that MICROSOFT Bing and MSN robots use under the cover of similar 'benign' junk traffic to crash Apache Web servers.

Its mission statement (proudly published by microsoft.com) tells it all:

"Cyveillance proactively eliminates threats enabling its customers to preserve their reputation and revenues".

No wonder Cyveillance says MICROSOFT is its largest customer:

With $6 billion invested in R&D each year, MICROSOFT is in serious need of 'protection' if it has to break international laws to attack challengers like G-WAN, a free product developed in one single year by a single person, without any budget.