Re: Search Engine questions ...

This WebDNA talk-list message is from 2002. It keeps the original formatting.
numero = 44834
interpreted = N
>There is absolutely NO WAY a search engine is going to see any code in an
>HTML page BEFORE webcat interprets it. Think about it guys. The search
>engine is making page requests and your server is sending them the
>information EXACTLY THE SAME WAY AS NORMAL. It can't do it any other way.

I don't pretend to understand how they see pages, but I based my comments on personal experience with web logs and a small script, as well as the entries in AllTheWeb.

Try this: Write a script that records page access and writes this to a db: URL, Date, Time, IPAddress

Stick it at the top of any page on your site that is available to a spider. Wait until your web logs show that the page has been hit again by a spider. There will be no record in the db indicating that a spider has seen the page. The Inktomi spider is an exception, but Googlebot, FAST, the usual spam spiders, and others simply do not appear in the db.

>On the topic of what constitutes a dynamic page in terms of a search
>engine. A dynamic page's content will be different from one load to the next
>based on a series of variables passed to it.

Right, but... A page can have static (unchanging) content, such as title, headers, text, images, etc., and also have dynamic content, such as a search that returns price, quantities available, etc.

In my case, I recently put up a site that uses WC and, after keeping it blocked for a while with the robots.txt file, I opened it up for spidering and asked Google and AllTheWeb to take a look. Some of the content on the pages is static. Some is dynamic. AllTheWeb crawled the site and followed 100% of the links that were written manually (they do contain ?cart=[cart] and maybe some other variable). It did not follow any link that was written dynamically.

Searching AllTheWeb for words and phrases that were displayed dynamically produced zero results, even when I used the URL in the search.

Google has hit the robots.txt file several times, but has not yet spidered the site.

That's my experience with dynamic pages that are available to the spiders and search engines. Maybe yours is different.

Glenn

Associated Messages, from the most recent to the oldest:

    
  1. Re: Search Engine questions ... (Pedro Rivera 2002)
  2. Re: Search Engine questions ... (Wendell Kozak 2002)
  3. Re: Search Engine questions ... (dale's stuff 2002)
  4. Re: Search Engine questions ... (Gary Krockover 2002)
  5. Re: Search Engine questions ... (Gary Krockover 2002)
  6. Re: Search Engine questions ... (Kenneth Grome 2002)
  7. Re: Search Engine questions ... (Kenneth Grome 2002)
  8. Re: Search Engine questions ... (Kenneth Grome 2002)
  9. Re: Search Engine questions ... (Glenn Busbin 2002)
  10. Re: Search Engine questions ... (Kenneth Grome 2002)
  11. Re: Search Engine questions ... (Glenn Busbin 2002)
  12. Re: Search Engine questions ... (Kenneth Grome 2002)
  13. Re: Search Engine questions ... (Brian Fries 2002)
  14. Re: Search Engine questions ... (Glenn Busbin 2002)
  15. Re: Search Engine questions ... (Andrew Simpson 2002)
  16. Re: Search Engine questions ... (Glenn Busbin 2002)
  17. Re: Search Engine questions ... (Kenneth Grome 2002)
  18. Re: Search Engine questions ... (Glenn Busbin 2002)
  19. Re: Search Engine questions ... (Kenneth Grome 2002)
  20. Re: Search Engine questions ... (Glenn Busbin 2002)
  21. Re: Search Engine questions ... (dale's stuff 2002)
  22. Re: Search Engine questions ... (Glenn Busbin 2002)
  23. Re: Search Engine questions ... (John Peacock 2002)
  24. Re: Search Engine questions ... (Glenn Busbin 2002)
  25. Re: Search Engine questions ... (Alain Russell 2002)
  26. Re: Search Engine questions ... (Glenn Busbin 2002)
  27. Re: Search Engine questions ... (Dan Strong 2002)
  28. Re: Search Engine questions ... (Oleg Kremiansky 2002)
  29. Re: Search Engine questions ... (Glenn Busbin 2002)
  30. Re: Search Engine questions ... (Alain Russell 2002)
  31. Re: Search Engine questions ... (Kenneth Grome 2002)
  32. Re: Search Engine questions ... (Glenn Busbin 2002)
  33. Re: Search Engine questions ... (Glenn Busbin 2002)
  34. Re: Search Engine questions ... (Glenn Busbin 2002)
  35. Re: Search Engine questions ... (Glenn Busbin 2002)
  36. Re: Search Engine questions ... (Clayton Randall 2002)
  37. Re: Search Engine questions ... (Kenneth Grome 2002)
  38. Re: Search Engine questions ... (Glenn Busbin 2002)
  39. Re: Search Engine questions ... (Donovan Brooke 2002)
  40. Re: Search Engine questions ... (Alain Russell 2002)
  41. Re: Search Engine questions ... (Donovan Brooke 2002)
  42. Re: Search Engine questions ... (Alain Russell 2002)
  43. Search Engine questions ... (Kenneth Grome 2002)
Glenn Busbin
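
The logging experiment described in the message (record URL, Date, Time, and IPAddress on every page access, then compare against the web server log) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python with SQLite as a stand-in rather than WebDNA, and the table name `access`, the functions `open_db`, `record_access`, and `unrecorded_hits`, and the sample addresses are all hypothetical, not taken from the original script.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open (or create) the access database. The schema is an assumption:
    one row per page access with the four fields named in the message."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access "
        "(url TEXT, date TEXT, time TEXT, ip TEXT)"
    )
    return conn


def record_access(conn, url, ip, when=None):
    """Record one page access. In the experiment, this would be called by the
    page-generating code at the top of every page a spider can reach."""
    when = when or datetime.now()
    conn.execute(
        "INSERT INTO access VALUES (?, ?, ?, ?)",
        (url, when.strftime("%Y-%m-%d"), when.strftime("%H:%M:%S"), ip),
    )
    conn.commit()


def unrecorded_hits(conn, log_hits):
    """Given (url, ip) pairs taken from the web server log, return the ones
    that have no matching row in the access database."""
    recorded = set(conn.execute("SELECT url, ip FROM access"))
    return [hit for hit in log_hits if tuple(hit) not in recorded]


if __name__ == "__main__":
    conn = open_db()
    # Hypothetical data: one visit was recorded by the page script ...
    record_access(conn, "/catalog.html", "10.0.0.1")
    # ... while the web server log shows two hits on the same page.
    log_hits = [("/catalog.html", "10.0.0.1"), ("/catalog.html", "10.0.0.2")]
    print(unrecorded_hits(conn, log_hits))
```

Any hit that `unrecorded_hits` returns is a request that reached the web server without the page script recording it, which is the discrepancy the message reports for most spiders. The sketch only makes that comparison explicit; it does not explain why such a gap would occur.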
