Re: SOP for WebDNA talk - MSNBot Crashing

This WebDNA talk-list message is from 2004. It keeps the original formatting.
numero = 57812
interpreted = N
texte =
I had heard about web crawlers before but didn't investigate them or make any code changes for them.

At the times WebDNA crashed, I saved some netstat -a output to a file:

netstat -a > netstat.5-13-2004

I saved these as a reference for the times ../WebCatalogCtl restart or ./WebCatalogCtl stop/start produced another instance of WebDNA:

LISTENING /tmp/.webcatalog
LISTENING /tmp/.webcatalog

Looking at them now, I notice a lot of connections from msnbot.

I put robots.txt (to deny MSNBot) on all the sites except one. In 30 minutes, that site received 700 requests from msnbot.

All I can say is that there is a glitch in MSNBot. A web crawler should not cripple a site unless you put in some code specifically for the web crawler (i.e. to increase hits).

Eduardo

----- Original Message -----
From: "Alain Russell"
To: "WebDNA Talk"
Sent: Thursday, May 13, 2004 1:00 PM
Subject: Re: SOP for WebDNA talk - MSNBot Crashing

> Are you sure you're not redirecting the robot around the place .. So it
> ends up bouncing from one page to another ?
> We had a spider that went AWOL on our server once, we took about
> 127,000 page requests in the space of an hour .. no crashing.
>
> Micro$osft are a pain in the arse but I doubt the coders working on their
> spider are stupid ..
>
>
> On 14/05/2004, at 7:51 AM, wrote:
>
> > How about 10,000 page requests from MSNBot
> > in about 3 hours.
> >
> > I created robots.txt
> > ----------------
> > # MSNBot Search
> > User-agent: msnbot
> > Disallow: /
> > -------------
> >
> > on all the root directory of all our sites just to
> > stop webDNA from crashing.
> >
> > I will refine the file later. Right now w/out this file
> > it's like MSNBot is doing a DoS on us.
> >
> >
> > Eduardo
> >
> > ----- Original Message -----
> > From: "Alain Russell"
> > To: "WebDNA Talk"
> > Sent: Thursday, May 13, 2004 12:13 PM
> > Subject: Re: SOP for WebDNA talk - MSNBot Crashing
> >
> >
> >> I don't mean to cause trouble here but I can't remember the last time I
> >> saw WebDNA crash .. not even under heavy load ..
> >> The first thing I would look for here is some bad code .. code that
> >> expects a cart to be passed, eg - what happens if you call the page
> >> with no cart= ?
> >>
> >> I just don't buy the webDNA crashing line anymore ..
> >>
> >> Alain
> >>
> >>
> >> On 14/05/2004, at 6:59 AM, Paul Uttermohlen wrote:
> >>
> >>> Blocking the ranges of IP's used by the MSNBot servers via the firewall did
> >>> the trick.... BUT the client wants the sites indexed by MSN.
> >>>
> >>> Gotta let them in... Gotta keep webcat from crashing a thousand times an
> >>> hour...
> >>>
> >>> Paul
> >>>
> >>>
> >>> On 5/13/04 2:09 PM, "Gary Krockover" wrote:
> >>>
> >>>> Can you do a redirect for MSNBots, something like:
> >>>>
> >>>> [showif [browsername]^MSNBOT]
> >>>> [redirect (to a static sitemap that has no links & doesn't carry the [cart])]
> >>>> [/showif]
> >>>>
> >>>> Not sure if that would rectify the problem, just thinking out loud....
> >>>>
> >>>> GK
> >>>>
> >>>> At 11:37 AM 5/13/2004, you wrote:
> >>>>> Scott,
> >>>>>
> >>>>> This looks like what I experienced last week. I tracked the cart problems
> >>>>> down to MSNBot which was flooding the shopping cart folder with carts. Many
> >>>>> of the carts had file names that were hundreds of characters long. That was
> >>>>> partly due to the way I handle cart naming for repeat customers.
> >>>>>
> >>>>> MSNBot is beta. I think it is flawed. MSN insists that it's just
> >>>>> sophisticated. And perhaps it is WebDNA that is flawed.
> >>>>>
> >>>>> There may be some truth to that. Large volumes of carts or long file names
> >>>>> for carts should not cause Webcatalog to crash, but it does... And with
> >>>>> great frequency and predictability.
> >>>>>
> >>>>> So Scott, if there is anything that you can do to stop this crashing
> >>>>> behavior displayed when MSNBot floods our servers it would be greatly
> >>>>> appreciated.
> >>>>>
> >>>>> Crashes occur on 4.5.1, 5.0.1 and 6
> >>>>>
> >>>>> Thanks, Paul
> >>>>
> >>>>
> >>>> -------------------------------------------------------------
> >>>> This message is sent to you because you are subscribed to
> >>>> the mailing list .
> >>>> To unsubscribe, E-mail to:
> >>>> To switch to the DIGEST mode, E-mail to
> >>>>
> >>>> Web Archive of this list is at: http://webdna.smithmicro.com/
> >>>
> >>> ________________________________________________________
> >>> Paul Uttermohlen
> >>> http://www.Anoweb.com/
> >>> http://www.Uttermohlen.com/
> >>> Paul@Anoweb.com
> >>> Columbus, Ohio 43026
> >>> 614-529-8963
> >>> _______________________________________________________

Associated Messages, from the most recent to the oldest:

    
  1. Re: SOP for WebDNA talk - MSNBot Crashing ( Paul Uttermohlen 2004)
  2. Re: SOP for WebDNA talk - MSNBot Crashing ( 2004)
  3. Re: SOP for WebDNA talk - MSNBot Crashing ( "Scott Anderson" 2004)
  4. Re: SOP for WebDNA talk - MSNBot Crashing ( Frank Nordberg 2004)
  5. Re: SOP for WebDNA talk - MSNBot Crashing ( 2004)
  6. Re: SOP for WebDNA talk - MSNBot Crashing ( 2004)
  7. Re: SOP for WebDNA talk - MSNBot Crashing ( Paul Uttermohlen 2004)
  8. Re: SOP for WebDNA talk - MSNBot Crashing ( Donovan Brooke 2004)
  9. Re: SOP for WebDNA talk - MSNBot Crashing ( Donovan Brooke 2004)
  10. Re: SOP for WebDNA talk - MSNBot Crashing ( 2004)
  11. Re: SOP for WebDNA talk - MSNBot Crashing ( Alain Russell 2004)
  12. Re: SOP for WebDNA talk - MSNBot Crashing ( 2004)
  13. Re: SOP for WebDNA talk - MSNBot Crashing ( Alain Russell 2004)
  14. Re: SOP for WebDNA talk - MSNBot Crashing ( Paul Uttermohlen 2004)
  15. Re: SOP for WebDNA talk - MSNBot Crashing ( Gary Krockover 2004)
  16. Re: SOP for WebDNA talk - MSNBot Crashing ( Paul Uttermohlen 2004)
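
The robots.txt rule quoted in the message above can be sanity-checked offline before it is deployed. The following is only a minimal sketch using Python's standard urllib.robotparser; the helper name is_allowed and the test paths are illustrative assumptions, not anything from the thread, and it says nothing about whether MSNBot actually honoured the file in practice.

```python
from urllib.robotparser import RobotFileParser

# The rule as quoted in the thread (the dashed separator lines are not part of the file).
ROBOTS_TXT = """\
# MSNBot Search
User-agent: msnbot
Disallow: /
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch path.

    Hypothetical helper for illustration; it only evaluates the rules,
    it does not contact any server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Compare the crawler named in the rule with one the rule does not mention.
    for agent in ("msnbot", "Googlebot"):
        print(agent, is_allowed(agent, "/index.html"))
```

Under this rule set only agents matching msnbot are denied; crawlers not named in the file remain allowed, which fits the goal of keeping the sites indexed elsewhere while the crashes are investigated.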
