Re: Search Engine questions ...
This WebDNA talk-list message is from 2002
It keeps the original formatting.
numero = 44848
interpreted = N
texte = I have this exact thing, and it is hit by spiders all the time. I added an agents list database that compares the header sent by the browser when the page is hit by anybody, and if it is a spider that is on this agents list, it shows up.

For example, this is on my webstat page from yesterday, where all I'm recording is the URL, date, time, IP address *and* browser name, which you might be omitting and is why you aren't seeing spiders (agents):

Date  Time  IP Address  Browser Used  Came From  Page Visited
11/02/2002  14:44:27  216.122.066.074  SurveyBot/2.2 Whois Source http://www.whois.sc/  /index.html

That probably won't show very well in plain text, but you should be able to see that SurveyBot (an agent) hit my index page yesterday. I get Googlebot and all the other ones each day.

GK

> >There is absolutely NO WAY a search engine is going to see any code in an
> >HTML page BEFORE webcat interprets it. Think about it, guys. The search
> >engine is making page requests and your server is sending them the
> >information EXACTLY THE SAME WAY AS NORMAL. It can't do it any other way.
>
> I don't pretend to understand how they see pages, but I based my comments on
> personal experience with web logs and a small script, as well as the entries
> in AllTheWeb.
>
> Try this:
> Write a script that records page access and writes this to a db:
> URL, Date, Time, IPAddress
>
> Stick it at the top of any page on your site that is available to a
> spider. Wait until your web logs show that the page has been hit again by a
> spider. There will be no record in the db indicating that a spider has seen
> the page. The Inktomi spider is an exception, but Googlebot, FAST, the usual
> spam spiders, and others simply do not appear in the db.
>
> >
> >On the topic of what constitutes a dynamic page in terms of a search
> >engine: a dynamic page's content will be different from one load to the next
> >based on a series of variables passed to it.
>
> Right, but...
> A page can have static (unchanging) content, such as title, headers, text,
> images, etc., and also have dynamic content, such as a search that returns
> price, quantities available, etc.
>
> In my case, I recently put up a site that uses WC and, after keeping it
> blocked for a while with the robots.txt file, I opened it up for spidering
> and asked Google and AllTheWeb to take a look. Some of the content on the
> pages is static. Some is dynamic. AllTheWeb crawled the site and followed
> 100% of the links that were written manually (they do contain ?cart=[cart]
> and maybe some other variable). It did not follow any link that was written
> dynamically. Searching AllTheWeb for words and phrases that were displayed
> dynamically produced zero results, even when I used the URL in the search.
>
> Google has hit the robots.txt file several times, but has not yet spidered
> the site.
>
> That's my experience with dynamic pages that are available for the spiders
> and SEs. Maybe yours is different.
>
> Glenn

-------------------------------------------------------------
This message is sent to you because you are subscribed to the mailing list.
To unsubscribe, E-mail to:
To switch to the DIGEST mode, E-mail to
Web Archive of this list is at: http://search.smithmicro.com/
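A minimal sketch of the logging approach Gary describes: record the URL, date, time, IP address *and* the browser name (the User-Agent header) for each page hit, and flag the hit when that header matches an entry in an agents list. This is written in Python rather than WebDNA, purely to illustrate the logic; the agent tokens, the `Hit` record, and the `match_agent` / `record_hit` names are hypothetical, and a real setup would append each row to a database instead of returning it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical agents list: substrings that identify known spiders in the
# User-Agent header. Illustrative only; Gary keeps his list in a database.
KNOWN_AGENTS = ("Googlebot", "SurveyBot", "Slurp", "FAST-WebCrawler")


@dataclass
class Hit:
    """One row of a hypothetical webstat log."""
    url: str
    date: str
    time: str
    ip_address: str
    browser_name: str        # raw User-Agent header sent with the request
    agent: Optional[str]     # matched spider token, or None for an ordinary browser


def match_agent(user_agent: str, agents: Sequence[str] = KNOWN_AGENTS) -> Optional[str]:
    """Return the first agents-list token found in the User-Agent (case-insensitive)."""
    ua = user_agent.lower()
    for token in agents:
        if token.lower() in ua:
            return token
    return None


def record_hit(url: str, ip_address: str, user_agent: str,
               when: Optional[datetime] = None) -> Hit:
    """Build the log row for one page request (the caller would store it)."""
    when = when or datetime.now()
    return Hit(
        url=url,
        date=when.strftime("%m/%d/%Y"),
        time=when.strftime("%H:%M:%S"),
        ip_address=ip_address,
        browser_name=user_agent,
        agent=match_agent(user_agent),
    )


# Usage, with the values from the webstat row quoted in the message:
hit = record_hit("/index.html", "216.122.066.074", "SurveyBot/2.2 Whois Source",
                 when=datetime(2002, 11, 2, 14, 44, 27))
print(hit.agent, hit.date, hit.time)  # SurveyBot 11/02/2002 14:44:27
```

Matching on substrings rather than exact strings is a deliberate choice in this sketch: User-Agent values carry version numbers and extra text (for example `SurveyBot/2.2`), so an exact-match lookup would miss them.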
Associated Messages, from the most recent to the oldest:
Gary Krockover
Related Readings:
fixed date problem (1997)
Cancel Subscription (1996)
redirect within a showif (1998)
[quantity] - spoke too soon (1997)
WebCat b13 CGI -shownext- (1997)
Multiple catalog databases and showcart (1997)
Custom Convertchars Database (2004)
Browser Check for WebDNA compatability (1999)
SiteGuard Admin Feature ? (1997)
Adding multiple items to Cart at one time, & append context problem (1998)
WC Database Format (1997)
Cart questions (1997)
Transferring textareas (1997)
Almost a there but..bye bye NetCloak (1997)
String manipulation in Webcatalog (2001)
Opinion: [input] should be called [output] ... (1997)
Fun with dates (1997)
format_to_days on NT (1997)
off topic - dna snipets (1997)
WebCat2b14MacPlugIn - [include] doesn't hide the search string (1997)