Re: Search Engine questions ...

This WebDNA talk-list message is from 2002. It keeps the original formatting.
numero = 44848
interpreted = N
texte = I have this exact thing, and it is hit by spiders all the time.

I added an agents list database. Whenever anybody hits the page, the header sent by the browser is compared against it, and if the visitor is a spider on this agents list, it shows up.

For example, this is on my webstat page from yesterday, where all I'm recording is the URL, date, time, IP address *and* browser name, which you might be omitting and is why you aren't seeing spiders (agents):

Date  Time  IP Address  Browser Used  Came From  Page Visited
11/02/2002  14:44:27  216.122.066.074  SurveyBot/2.2 Whois Source http://www.whois.sc/  /index.html

That probably won't show very well in plain text, but you should be able to see that SurveyBot (agent) hit my index page yesterday. I get Googlebot and all the other ones each day.

GK

> >There is absolutely NO WAY a search engine is going to see any code in an
> >HTML page BEFORE webcat interprets it. Think about it guys. The search
> >engine is making page requests and your server is sending them the
> >information EXACTLY THE SAME WAY AS NORMAL. It can't do it any other way.
>
> I don't pretend to understand how they see pages, but I based my comments on personal experience with web logs and a small script, as well as the entries in AllTheWeb.
>
> Try this:
> Write a script that records page access and writes this to a db:
> URL, Date, Time, IPAddress
>
> Stick it at the top of any page on your site that is available to a spider. Wait until your web logs show that the page has been hit again by a spider. There will be no record in the db indicating that a spider has seen the page. The inktomi spider is an exception, but Googlebot, FAST, the usual spam spiders, and others simply do not appear in the db.
>
> >On the topic of what constitutes a dynamic page in terms of a search
> >engine. A dynamic page's content will be different from one load to the next
> >based on a series of variables passed to it.
>
> Right, but...
> A page can have static (unchanging) content, such as title, headers, text, images, etc., and also have dynamic content, such as a search that returns price, quantities available, etc.
>
> In my case, I recently put up a site that uses WC and, after keeping it blocked for a while with the robots.txt file, I opened it up for spidering and asked Google and AllTheWeb to take a look. Some of the content on the pages is static. Some is dynamic. AllTheWeb crawled the site and followed 100% of the links that were written manually (they do contain ?cart=[cart] and maybe some other variable). It did not follow any link that was written dynamically. Searching AllTheWeb for words and phrases that were displayed dynamically produced zero returns, even when I used the URL in the search.
>
> Google has hit the robots.txt file several times, but has not yet spidered the site.
>
> That's my experience with dynamic pages that are available for the spiders and search engines. Maybe yours is different.
>
> Glenn

-------------------------------------------------------------
This message is sent to you because you are subscribed to the mailing list .
To unsubscribe, E-mail to:
To switch to the DIGEST mode, E-mail to
Web Archive of this list is at: http://search.smithmicro.com/

Associated Messages, from the most recent to the oldest:

    
  1. Re: Search Engine questions ... (Pedro Rivera 2002)
  2. Re: Search Engine questions ... (Wendell Kozak 2002)
  3. Re: Search Engine questions ... (dale's stuff 2002)
  4. Re: Search Engine questions ... (Gary Krockover 2002)
  5. Re: Search Engine questions ... (Gary Krockover 2002)
  6. Re: Search Engine questions ... (Kenneth Grome 2002)
  7. Re: Search Engine questions ... (Kenneth Grome 2002)
  8. Re: Search Engine questions ... (Kenneth Grome 2002)
  9. Re: Search Engine questions ... (Glenn Busbin 2002)
  10. Re: Search Engine questions ... (Kenneth Grome 2002)
  11. Re: Search Engine questions ... (Glenn Busbin 2002)
  12. Re: Search Engine questions ... (Kenneth Grome 2002)
  13. Re: Search Engine questions ... (Brian Fries 2002)
  14. Re: Search Engine questions ... (Glenn Busbin 2002)
  15. Re: Search Engine questions ... (Andrew Simpson 2002)
  16. Re: Search Engine questions ... (Glenn Busbin 2002)
  17. Re: Search Engine questions ... (Kenneth Grome 2002)
  18. Re: Search Engine questions ... (Glenn Busbin 2002)
  19. Re: Search Engine questions ... (Kenneth Grome 2002)
  20. Re: Search Engine questions ... (Glenn Busbin 2002)
  21. Re: Search Engine questions ... (dale's stuff 2002)
  22. Re: Search Engine questions ... (Glenn Busbin 2002)
  23. Re: Search Engine questions ... (John Peacock 2002)
  24. Re: Search Engine questions ... (Glenn Busbin 2002)
  25. Re: Search Engine questions ... (Alain Russell 2002)
  26. Re: Search Engine questions ... (Glenn Busbin 2002)
  27. Re: Search Engine questions ... (Dan Strong 2002)
  28. Re: Search Engine questions ... (Oleg Kremiansky 2002)
  29. Re: Search Engine questions ... (Glenn Busbin 2002)
  30. Re: Search Engine questions ... (Alain Russell 2002)
  31. Re: Search Engine questions ... (Kenneth Grome 2002)
  32. Re: Search Engine questions ... (Glenn Busbin 2002)
  33. Re: Search Engine questions ... (Glenn Busbin 2002)
  34. Re: Search Engine questions ... (Glenn Busbin 2002)
  35. Re: Search Engine questions ... (Glenn Busbin 2002)
  36. Re: Search Engine questions ... (Clayton Randall 2002)
  37. Re: Search Engine questions ... (Kenneth Grome 2002)
  38. Re: Search Engine questions ... (Glenn Busbin 2002)
  39. Re: Search Engine questions ... (Donovan Brooke 2002)
  40. Re: Search Engine questions ... (Alain Russell 2002)
  41. Re: Search Engine questions ... (Donovan Brooke 2002)
  42. Re: Search Engine questions ... (Alain Russell 2002)
  43. Search Engine questions ... (Kenneth Grome 2002)
Gary Krockover
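
The logging approach described in the message (record each hit with its browser name, then check that name against an agents list) can be sketched roughly as follows. This is a minimal Python illustration, not WebDNA code from the thread. The names `KNOWN_AGENTS`, `Hit`, `match_agent`, and `record_hit` are hypothetical, the agents list holds only the two spider names mentioned in the message, and a plain in-memory list stands in for the database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical agents list: names matched as case-insensitive substrings
# of the User-Agent header. Only names from the message are used here.
KNOWN_AGENTS = ["Googlebot", "SurveyBot"]


@dataclass
class Hit:
    date: str
    time: str
    ip_address: str
    browser: str          # raw User-Agent header sent by the client
    came_from: str        # referrer, may be empty
    page: str
    agent: Optional[str]  # matched spider name, or None for ordinary browsers


def match_agent(user_agent: str, agents: List[str] = KNOWN_AGENTS) -> Optional[str]:
    """Return the first agents-list entry found in the User-Agent, else None."""
    ua = user_agent.lower()
    for name in agents:
        if name.lower() in ua:
            return name
    return None


def record_hit(log: List[Hit], when: datetime, ip_address: str,
               user_agent: str, referrer: str, page: str) -> Hit:
    """Append one page hit to the log, tagging it if it came from a known spider."""
    hit = Hit(
        date=when.strftime("%m/%d/%Y"),
        time=when.strftime("%H:%M:%S"),
        ip_address=ip_address,
        browser=user_agent,
        came_from=referrer,
        page=page,
        agent=match_agent(user_agent),
    )
    log.append(hit)
    return hit


if __name__ == "__main__":
    log: List[Hit] = []
    record_hit(log, datetime(2002, 11, 2, 14, 44, 27),
               "216.122.066.074", "SurveyBot/2.2", "", "/index.html")
    for hit in log:
        print(hit)
```

The point of the sketch is the one made in the message: if the browser name is not recorded, spider visits cannot be told apart from ordinary ones in the log.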
