Re: OT: Google

This WebDNA talk-list message is from 2002. It keeps the original formatting.
Glenn Busbin wrote:

...
>> Well, it does now! Musica Viva has just been re-spidered, and Google now includes lots of pages it
>
> Good for you.

And for all other wecats too, I think. When you start playing around with WebDNA you soon end up with far more dynamically created pages than you thought possible. It's good to know that the best search engine on the internet has improved the way it deals with such pages!

> Google's algorithm is way ahead of other search engines

...well, there's ixqick.com too, of course, but I expect their approach is cheating. ;-)

> Unfortunately, there are a substantial number of other se's used by people, too. Those se's may not be able to handle the query strings.

That's true, but the old Big Ones don't seem to do much spidering at all these days, so from their perspective the only thing that matters is how the web pages looked a year or two ago. A bit too late to do anything about that...

> AllTheWeb spidered one of my sites recently. It did it more than once, because it evidently saw the cart number as part of a new page URL and hit the same pages again and again each time it returned to look for newly added pages.

Hmmm... I don't feel very proud to be a Norwegian right now ;-)

This is, of course, exactly the kind of thing Google tries to avoid by restricting their spidering of dynamic pages. Unfortunately it seems my fellow countrymen haven't got that point yet. I'm sure they'll eventually learn the hard way, though - just as Google had to do. No wonder AllTheWeb can brag about how many pages they've indexed. They've probably got a couple of million pages from Musica Viva alone ;-)

(Since I'm way off topic anyway: A good friend of mine, John Chambers, runs a music site on one of the Massachusetts Institute of Technology's servers. One of the first things Google's spiders tried to do, once they were let loose on the dynamic pages, was to spider the modest little music search engine John keeps on his site. It almost caused a complete overload breakdown of the university's network before they managed to stop it. Just imagine a power strong enough to throw *MIT* offline going wild on one of our wimpy little servers!)

Glenn Busbin also wrote:

> FWIW, I also said something about how trying to fool a se bot by having a page show one thing to real people and another to bots was a good way to get in trouble.

Yep. Unfortunately, although the spiders are pretty stupid, the guys behind them are not!

Frank Nordberg
http://www.musicaviva.com

P.S. Even more off-topic Google stuff: http://labs.google.com/

-------------------------------------------------------------
This message is sent to you because you are subscribed to the mailing list .
To unsubscribe, E-mail to:
To switch to the DIGEST mode, E-mail to
Web Archive of this list is at: http://webdna.smithmicro.com/

Associated Messages, from the most recent to the oldest:

    
  1. Re: OT: Google image search results (devaulw@onebox.com 2004)
  2. Re: OT: Google image search results (Donovan Brooke 2004)
  3. OT: Google image search results (Colin Sidwell 2004)
  4. OT: google (Brian B. Burton 2003)
  5. Re: OT: Google (Gary Krockover 2002)
  6. Re: OT: Google (Glenn Busbin 2002)
  7. Re: OT: Google (Frank Nordberg 2002)
  8. Re: OT: Google (Glenn Busbin 2002)
  9. Re: OT: Google (Glenn Busbin 2002)
  10. OT: Google (Frank Nordberg 2002)
