sb_stefan wrote:
So that the search engine spider/bot/robot (or whatever it's called) knows that it's a link, not just some text.

If the text is clickable, the search engine knows it's a link.
AdamR wrote:
If the text is clickable, the search engine knows it's a link.

Ok, so if I put http://mysite.com and don't make it clickable, the search engine won't recognize it as a link?
Ger wrote:
Think like this: a search engine is designed to screen sites for information and present it to the person searching. The search engine can't figure out on its own that something is supposed to be one thing while it's displayed as another. Why would a search engine think that something is a link while you display it like plain text?

Nonsense. Search engines can easily scan for "http://" and then parse a URL out of the following text.
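The scanning described here can be sketched in a few lines of Python. This is purely illustrative: the pattern and the `extract_urls` helper are assumptions made for the example, not any search engine's actual code.

```python
import re

# Naive pattern: "http://" or "https://" followed by non-whitespace characters.
# A real crawler would also handle trailing punctuation, bare "www." hosts,
# and many other edge cases; this only shows the basic idea.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')

def extract_urls(text):
    """Pick URLs out of a stream of plain text; no HTML parsing involved."""
    return URL_PATTERN.findall(text)

print(extract_urls("Plain text with http://mysite.com in it."))
# → ['http://mysite.com']
```

Whether real engines bother to do this is exactly what the rest of the thread disputes; the point is only that it is technically easy.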
Pony99CA wrote:
Search engines can easily scan for "http://" and then parse a URL out of the following text.

They could, but as far as I know, they don't.
Pony99CA wrote:
Using your logic, this URL (http://phpbb.com) won't be treated as a hyperlink by phpBB because I didn't wrap it in URL tags. However, the phpBB developers were smart enough to scan for URLs and automatically convert them to links. They were even smart enough to auto-link things like http://www.phpbb.com (I didn't type http:// there).

Hey, I never said regexes don't exist. I just said search engines don't use them.
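phpBB's real auto-linker is more elaborate, but the idea can be sketched like this. The `auto_link` helper and its pattern are assumptions for illustration, not phpBB's actual code.

```python
import re

# Match a full URL, or a bare "www." host, which phpBB's auto-linker also catches.
LINK_PATTERN = re.compile(r'\b(?:https?://|www\.)[^\s<>"]+')

def auto_link(text):
    """Wrap plain-text URLs in anchor tags, adding http:// to bare www hosts."""
    def make_anchor(match):
        url = match.group(0)
        href = url if url.startswith('http') else 'http://' + url
        return '<a href="{0}">{1}</a>'.format(href, url)
    return LINK_PATTERN.sub(make_anchor, text)

print(auto_link("Try www.phpbb.com today"))
# → Try <a href="http://www.phpbb.com">www.phpbb.com</a> today
```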
Pony99CA wrote:
So why do you think search engine developers wouldn't do that, too? Maybe all search engines don't do it, but I'll bet that many do.

And I bet many don't. Why not? Simple. They are designed to interpret HTML in the purest form, so it's probably most like how your own browser would interpret it. Why would a search engine "change" the content of the document it reads? If the author meant it to be a link, he would have made it a link. The author clearly didn't make a link of it, but left it as plain text. He probably has a reason for this (think of spam control in things like blog comments and forum threads, or whatever else you can think of). If search engines still interpreted this as a link, it would become quite hard to let the search engine know it is not. It would require a special <nolink> tag, and that just for search engines? That's not very handy, is it?

You can check this here. I made a simple test page with the HTML code:
Code:
<html>
<body>
<p>http://www.example.com/</p>
<p>www.example.com/</p>
<p><a href="http://www.example.com/">http://www.example.com/</a></p>
</body>
</html>

The result is just 1 link, everything else is considered to be plain text, just as I meant it to be. That goes for every user agent that's available there.
Pony99CA wrote:
That said, I would always put an http:// in front of my URLs just to be sure. (See my signature, where I include it in the link text even though I used the URL tags.)

That doesn't help a thing here, because phpBB already parses it to be a valid link:
Code:
<a href="http://www.svpocketpc.com" class="postlink">http://www.svpocketpc.com</a>
Code:
<a href="http://www.svpocketpc.com" class="postlink">www.svpocketpc.com</a>
Actually, for SEO it's better to change it to:

Code:
<a href="http://www.svpocketpc.com" class="postlink">Silicon Valley Pocket PC</a>

Because now the search engine knows that this link is probably very relevant on the subject "Silicon Valley Pocket PC".
Pony99CA wrote:
So why do you think search engine developers wouldn't do that, too? Maybe all search engines don't do it, but I'll bet that many do.

Ger wrote:
And I bet many don't.

Whether "many" don't is irrelevant. The issue is whether any do, in which case putting the http:// in the text would have benefit, which is what the OP wanted to know.
Ger wrote:
Why not? Simple. They are designed to interpret HTML in the purest form so it's probably most like your own browser should interpret it.

Not necessarily. Search engines are designed to find Web pages, not render them. So finding URLs could have some benefit. (And what is HTML's "purest" form?)
Ger wrote:
Why would a search engine "change" the content of the document it reads?

It wouldn't be "changing" anything. It would just be picking URLs out of a stream of text.
Ger wrote:
If the author meant it to be a link, he would have made it a link. The author clearly didn't make a link of it, but left it as plain text. He probably has a reason for this (think of spam control in things like blog comments and forum threads, or whatever else you can think of). If search engines still interpreted this as a link, it would become quite hard to let the search engine know it is not. It would require a special <nolink> tag, and that just for search engines? That's not very handy, is it?

No special markup would be needed. The search engine would parse the URL and, if it wasn't valid, it would get an HTTP error or time out and the search engine would drop the link from its index.
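The cheap first half of that check, rejecting strings that merely look like URLs before spending a network request on them, might be sketched with Python's standard urllib.parse. The `is_plausible_url` name is an assumption, and the actual fetch-and-drop step is left out because it needs a live network.

```python
from urllib.parse import urlparse

def is_plausible_url(url):
    """Cheap structural filter a crawler could apply before fetching.
    A real engine would then request the URL and drop it from the index
    on an HTTP error or timeout; that network step is omitted here."""
    parts = urlparse(url)
    return parts.scheme in ('http', 'https') and '.' in parts.netloc

print(is_plausible_url('http://www.example.com/'))  # → True
print(is_plausible_url('http://x'))                 # → False (no dot in host)
```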
Ger wrote:
You can check this here. I made a simple test page with the HTML code:

Code:
<html>
<body>
<p>http://www.example.com/</p>
<p>www.example.com/</p>
<p><a href="http://www.example.com/">http://www.example.com/</a></p>
</body>
</html>

The result is just 1 link, everything else is considered to be plain text, just as I meant it to be. That goes for every user agent that's available there.

That's an interesting tool, but I'm not sure it proves what you think it does. I put my Web page in, which has multiple links to the same locations (in the header and footer nav tables), and it appears to have only counted those links once. So even if the tool did parse the text URLs, because your URLs were all pointing to the same "page", only one external link would be shown. You need code like the following:
Code:
<html>
<body>
<p>http://www1.example.com/</p>
<p>www2.example.com/</p>
<p><a href="http://www3.example.com/">http://www.example.com/</a></p>
</body>
</html>
Pony99CA wrote:
That said, I would always put an http:// in front of my URLs just to be sure. (See my signature, where I include it in the link text even though I used the URL tags.)

Ger wrote:
That doesn't help a thing here, because phpBB already parses it to be a valid link.

I know what phpBB does (I said that in my first post) and I didn't say that it helped; I just said that I did it.
Ger wrote:
Actually, for SEO it's better to change it to:

Code:
<a href="http://www.svpocketpc.com" class="postlink">Silicon Valley Pocket PC</a>

Because now the search engine knows that this link is probably very relevant on the subject "Silicon Valley Pocket PC".

And if I cared a whit about SEO, I might do that. But I don't. I wanted to show both the name of my site and the URL, and I thought two links (one like you showed and one auto-linked by phpBB) would be stupid. (Now can we drop that digression? I just mentioned my signature to indicate that I included the http://; I didn't want it deconstructed.)
Ger wrote:
Why not? Simple. They are designed to interpret HTML in the purest form so it's probably most like your own browser should interpret it.

Pony99CA wrote:
Not necessarily. Search engines are designed to find Web pages, not render them. So finding URLs could have some benefit.

True, but as an author you may simply not want search engines to find some URLs. That's also what robots.txt is for.
Pony99CA wrote:
(And what is HTML's "purest" form?)

I didn't know the right term (English isn't my mother tongue), but what I mean is that search engines simply read the HTML and nothing else. They don't interpret it, they don't read CSS or JS, etc. That's one of the reasons semantic HTML is so important for SEO.
Pony99CA wrote:
It wouldn't be "changing" anything. It would just be picking URLs out of a stream of text.
(...)
And, by the way, there already is at least one attribute designed to affect the behavior of search engines -- the nofollow attribute (which is used on A tags). That's very similar to what your NOLINK tag would do -- essentially tell browsers to ignore URLs.

And how do I add a nofollow to this:

Code:
<p>Hey dudes! Please don't check out this url, it's dangerous! http://www.evil-warez-site.com
I tell you, don't go there!!</p>

Since it has no (anchor-)tag, it can't have a (rel-)attribute. No rel-attribute can be set on a <p> or a <span>, so as an author you don't have any possibility to tell the search engine not to go there. Also, nofollow doesn't always work as one might expect.
Pony99CA wrote:
That's an interesting tool, but I'm not sure it proves what you think it does. I put my Web page in, which has multiple links to the same locations (in the header and footer nav tables), and it appears to have only counted those links once. So even if the tool did parse the text URLs, because your URLs were all pointing to the same "page", only one external link would be shown.

Now it's:
Code:
<html>
<body>
<p>http://www.example1.com/</p>
<p>www.example2.net/</p>
<p><a href="http://www.example3.org/">http://www.example3.org/</a></p>
</body>
</html>
Pony99CA wrote:
Also, you're assuming that the tool really works exactly like a search engine does. That remains to be proven.

Well, I don't know how to prove it. It's quite a respected tool as far as I know, and delivers expected results. You'd have to contact Google, MSN and Yahoo to be sure...
Pony99CA wrote:
(Now can we drop that digression? I just mentioned my signature to indicate that I included the http://; I didn't want it deconstructed.)

I was just building on your example - case closed.
Ger wrote:
Why not? Simple. They are designed to interpret HTML in the purest form so it's probably most like your own browser should interpret it.

Pony99CA wrote:
Not necessarily. Search engines are designed to find Web pages, not render them. So finding URLs could have some benefit.

Ger wrote:
True, but as an author you may simply not want search engines to find some URLs. That's also what robots.txt is for.

I don't know why I would use a URL that I wouldn't want a legitimate search engine (as opposed to a spam bot) to find. However, if you really don't want a search engine to find a URL, mung it like people do with E-mail addresses (for example, dubdubdub dot example dot com).
Pony99CA wrote:
(And what is HTML's "purest" form?)

Ger wrote:
I didn't know the right term (English isn't my mother tongue), but what I mean is that search engines simply read the HTML and nothing else. They don't interpret it, they don't read CSS or JS, etc. That's one of the reasons semantic HTML is so important for SEO.

Except, by your own admission, they do interpret the HTML. Finding URLs in my theoretical search engine is easier than in yours because yours has to look for A tags, then find the HREF attribute (or, for image search engines, the IMG tag and the SRC attribute). That's "interpretation".
Pony99CA wrote:
It wouldn't be "changing" anything. It would just be picking URLs out of a stream of text.
(...)
And, by the way, there already is at least one attribute designed to affect the behavior of search engines -- the nofollow attribute (which is used on A tags). That's very similar to what your NOLINK tag would do -- essentially tell browsers to ignore URLs.

Ger wrote:
And how do I add a nofollow to this:

Code:
<p>Hey dudes! Please don't check out this url, it's dangerous! http://www.evil-warez-site.com
I tell you, don't go there!!</p>

Since it has no (anchor-)tag, it can't have a (rel-)attribute. No rel-attribute can be set on a <p> or a <span>, so as an author you don't have any possibility to tell the search engine not to go there. Also, nofollow doesn't always work as one might expect.

I understand the limitations. I was just saying that somebody could define a NOLINK tag for such URLs and was citing the NOFOLLOW attribute value as a precedent. I think the NOFOLLOW attribute value was created to solve a problem (spammers linking to their Web sites in blogs), so somebody could equally well define a tag for URLs that should not be treated as links (or, as you pointed out, they could add the REL attribute to other tags).
Pony99CA wrote:
Also, you're assuming that the tool really works exactly like a search engine does. That remains to be proven.

Ger wrote:
Well, I don't know how to prove it. It's quite a respected tool as far as I know, and delivers expected results. You'd have to contact Google, MSN and Yahoo to be sure...

That's kind of my point. Unless that tool has intimate knowledge of every (major) search engine, they're just guessing how the search engines work.
Pony99CA wrote:
I'm sure there are some cases you could come up with (like the examples we're using here), but for the most part, if something is a URL, I'd want search engines to find it.

You want that, but can you expect that everyone wants that? I guess not, because if that's the case, why wouldn't you want the human visitors to be able to click it? That difference between writing for search engines and writing for humans makes no sense to me.
Pony99CA wrote:
Except, by your own admission, they do interpret the HTML. Finding URLs in my theoretical search engine is easier than in yours because yours has to look for A tags, then find the HREF attribute (or, for image search engines, the IMG tag and the SRC attribute).

Why would that be harder? You don't even need a regex for it.
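Extracting hrefs without a regex is indeed straightforward. Python's standard html.parser, chosen here purely for illustration, hands you every tag and its attributes; the `HrefCollector` class below is a hypothetical sketch, not any engine's actual crawler.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag; the parser does the interpreting."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by the parser.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

collector = HrefCollector()
collector.feed('<p>http://www.example.com/</p>'
               '<p><a href="http://www.example3.org/">link</a></p>')
print(collector.hrefs)
# → ['http://www.example3.org/'] (the plain-text URL is ignored)
```

Note that, as on the test page earlier in the thread, only the anchored URL is found; the plain-text one is invisible to this kind of parsing.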
Pony99CA wrote:
I understand the limitations. I was just saying that somebody could define a NOLINK tag for such URLs and was citing the NOFOLLOW attribute value as a precedent. I think the NOFOLLOW attribute value was created to solve a problem (spammers linking to their Web sites in blogs), so somebody could equally well define a tag for URLs that should not be treated as links (or, as you pointed out, they could add the REL attribute to other tags).

That would mean redefining HTML. It could be a solution for the future, but we're talking about the present day now.
Pony99CA wrote:
As for your example, I would never specify a full URL to a known dangerous site. I would say "Don't go to evil-warez.com!" And, if I were worried that even a domain name might somehow be indexed, I'd mung that, too.

You would, but you are probably an experienced web builder. 99% of the community visitors and posters are not. You can't expect them to mung it.
Pony99CA wrote:
That's kind of my point. Unless that tool has intimate knowledge of every (major) search engine, they're just guessing how the search engines work.

True, but then again: you could question everything. I know a couple of other SEO browsers and they all return similar (if not identical) results. Just google for "SEO browser" and try a few.
Pony99CA wrote:
Anyway, this is dragging on. The point I was trying to make to the OP was that you should include http:// in URLs. It doesn't hurt and it might help.

And the point I'm trying to make is that it has no use.