Re: Search Engine questions ...

This WebDNA talk-list message is from

2002

Associated Messages, from the most recent to the oldest:

Re: Search Engine questions ... (Pedro Rivera 2002)
Re: Search Engine questions ... (Wendell Kozak 2002)
Re: Search Engine questions ... (dale's stuff 2002)
Re: Search Engine questions ... (Gary Krockover 2002)
Re: Search Engine questions ... (Gary Krockover 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Brian Fries 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Andrew Simpson 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (dale's stuff 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (John Peacock 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Alain Russell 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Dan Strong 2002)
Re: Search Engine questions ... (Oleg Kremiansky 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Alain Russell 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Clayton Randall 2002)
Re: Search Engine questions ... (Kenneth Grome 2002)
Re: Search Engine questions ... (Glenn Busbin 2002)
Re: Search Engine questions ... (Donovan Brooke 2002)
Re: Search Engine questions ... (Alain Russell 2002)
Re: Search Engine questions ... (Donovan Brooke 2002)
Re: Search Engine questions ... (Alain Russell 2002)
Search Engine questions ... (Kenneth Grome 2002)

I have this exact thing, and it is hit by spiders all the time.

I added an agents list database that compares the header sent by the browser when the page is hit by anybody, and if it is a spider that is on this agents list, it shows up.

For example, this is on my webstat page from yesterday, where all I'm recording is the URL, Date, Time, IPAddress *and* browser name, which you might be omitting and is why you aren't seeing spiders (agents):

Date  Time  IP Address  Browser Used  Came From  Page Visited
11/02/2002  14:44:27  216.122.066.074  SurveyBot/2.2 Whois Source http://www.whois.sc/  /index.html

That probably won't show very well in plain text, but you should be able to see that SurveyBot (agent) hit my index page yesterday. I get Googlebot and all the other ones each day.

GK

> >There is absolutely NO WAY a search engine is going to see any code in an
> >HTML page BEFORE webcat interprets it. Think about it guys. The search
> >engine is making page requests and your server is sending them the
> >information EXACTLY THE SAME WAY AS NORMAL. It can't do it any other way.
>
> I don't pretend to understand how they see pages, but I based my comments on personal experience with web logs and a small script, as well as the entries in AllTheWeb.
>
> Try this:
> Write a script that records page access and writes this to a db:
> URL, Date, Time, IPAddress
>
> Stick it at the top of any page on your site that is available to a spider. Wait until your web logs show that the page has been hit again by a spider. There will be no record in the db indicating that a spider has seen the page. The inktomi spider is an exception, but Googlebot, FAST, the usual spam spiders, and others simply do not appear in the db.
>
> >On the topic of what constitutes a dynamic page in terms of a search
> >engine. A dynamic page's content will be different from one load to the next
> >based on a series of variables passed to it.
>
> Right, but...
> A page can have static (unchanging) content, such as title, headers, text, images, etc., and also have dynamic content, such as a search that returns price, quantities available, etc.
>
> In my case, I recently put up a site that uses WC and, after keeping it blocked for a while with the robots.txt file, I opened it up for spidering and asked Google and AllTheWeb to take a look. Some of the content on the pages is static. Some is dynamic. AllTheWeb crawled the site and followed 100% of the links that were written manually (they do contain ?cart=[cart] and maybe some other variable). It did not follow any link that was written dynamically. Searching AllTheWeb for words and phrases that were displayed dynamically produced zero returns, even when I used the URL in the search.
>
> Google has hit the robots.txt file several times, but has not yet spidered the site.
>
> That's my experience with dynamic pages that are available for the spiders and se's. Maybe yours is different.
>
> Glenn

Gary Krockover
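
The logging approach described in the message above (record the URL, date, time, IP address and browser name, then compare the browser header against an agents list) could be sketched roughly as follows. This is a Python illustration, not the WebDNA code actually used on the site; the names `KNOWN_AGENTS`, `match_agent` and `record_hit`, the entries in the agents list, and the in-memory list standing in for the database are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical agents list: lowercase substrings to look for in the
# User-Agent header. The real agents database is not shown in the message.
KNOWN_AGENTS = ["googlebot", "surveybot", "slurp", "fast-webcrawler"]


def match_agent(user_agent):
    """Return the first known agent name found in the header, or None."""
    ua = (user_agent or "").lower()
    for name in KNOWN_AGENTS:
        if name in ua:
            return name
    return None


def record_hit(log, url, ip_address, user_agent, referer="", when=None):
    """Append one page hit to the log, including the browser name.

    Recording the browser name is the step the message says is missing
    from a URL/Date/Time/IPAddress-only script.
    """
    when = when or datetime.now()
    row = {
        "date": when.strftime("%m/%d/%Y"),
        "time": when.strftime("%H:%M:%S"),
        "ip": ip_address,
        "browser": user_agent,
        "referer": referer,
        "url": url,
        "agent": match_agent(user_agent),  # None for ordinary browsers
    }
    log.append(row)
    return row


# Example hit modelled on the webstat row quoted in the message.
log = []
hit = record_hit(
    log,
    url="/index.html",
    ip_address="216.122.066.074",
    user_agent="SurveyBot/2.2 Whois Source",
    when=datetime(2002, 11, 2, 14, 44, 27),
)
```

Filtering the log for rows whose `agent` field is not `None` would then give the list of spider visits; rows from ordinary browsers are still recorded, just without an agent match.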
