<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>ThinkRobot &#187; urls</title>
	<atom:link href="http://think-robot.com/tag/urls/feed/" rel="self" type="application/rss+xml" />
	<link>http://think-robot.com</link>
	<description>Design &#38; Development Blog</description>
	<pubDate>Sun, 05 Sep 2010 23:09:50 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Regex for Autolinking URLs</title>
		<link>http://think-robot.com/2009/04/regex-for-autolinking-urls/</link>
		<comments>http://think-robot.com/2009/04/regex-for-autolinking-urls/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 12:04:42 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
		
		<category><![CDATA[Code]]></category>

		<category><![CDATA[lookaround]]></category>

		<category><![CDATA[php]]></category>

		<category><![CDATA[regex]]></category>

		<category><![CDATA[tip]]></category>

		<category><![CDATA[urls]]></category>

		<guid isPermaLink="false">http://think-robot.com/?p=145</guid>
		<description><![CDATA[For a recent project I wanted to perform an exceptionally common task: converting things that look like URLs in the text into clickable links. However, wherever I&#8217;ve seen this implemented before, I&#8217;ve always run into the same annoying problem: the links break when the user types a URL and then adds punctuation at the [...]]]></description>
			<content:encoded><![CDATA[<p>For a recent project I wanted to perform an exceptionally common task: converting things that look like URLs in the text into clickable links. However, wherever I&#8217;ve seen this implemented before, I&#8217;ve always run into the same annoying problem: the links break when the user types a URL and then adds punctuation at the end, because the punctuation gets captured as part of the link.</p>
<p><span id="more-145"></span></p>
<p><strong>For example:</strong></p>
<p><em>If you like the Seattle Mariners and enjoy learning about baseball statistics you should really visit </em><strong><em>http://ussmariner.com.</em></strong></p>
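<p>To make the problem concrete, here is what a deliberately naive pattern does with that sentence. The <code>\S+</code> pattern below is an illustrative assumption, not code taken from any particular implementation: it accepts every non-whitespace character after the scheme, so the sentence&#8217;s full stop ends up inside the link.</p>

```php
<?php
// A deliberately naive pattern, used only to illustrate the problem:
// \S+ takes every non-whitespace character, including a trailing full stop.
$text = 'you should really visit http://ussmariner.com.';
$naive = preg_replace('%(https?://\S+)%i', '<a href="$1">$1</a>', $text);

echo $naive, "\n";
// you should really visit <a href="http://ussmariner.com.">http://ussmariner.com.</a>
```

<p>The resulting link points at <em>http://ussmariner.com.</em> (with the dot), which is exactly the breakage described above.</p>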
<p>It struck me that this should be solvable with <a href="http://www.regular-expressions.info/lookaround.html">lookahead</a>: we can check whether a punctuation character is directly followed by a character that could be part of the URL.</p>
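<p>The lookahead idea can be tried on its own before building it into a full URL pattern. This minimal sketch assumes the same punctuation list and the same &#8220;letter, digit or underscore&#8221; test used in the regex below; it simply reports which punctuation characters in a string would count as part of a URL.</p>

```php
<?php
// Find each punctuation character that is immediately followed by a letter,
// digit or underscore. The lookahead (?=[_a-z0-9]) inspects the next
// character without consuming it, so only the punctuation itself is matched.
$text = 'visit http://ussmariner.com.';
preg_match_all('/[.,!?;:](?=[_a-z0-9])/i', $text, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$char, $offset]) {
    echo "'$char' at offset $offset is followed by a URL character\n";
}
// Prints one line: '.' at offset 23 is followed by a URL character
// The final full stop (offset 27) has nothing after it, so it is skipped.
// The colon in "http:" is skipped too, because "/" is not in [_a-z0-9].
```

<p>Only the dot inside <em>ussmariner.com</em> qualifies; the sentence-ending full stop does not.</p>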
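<p>The following regex attempts to do this. It allows a URL to be followed by any of . , ! ? ; or : without that character being included in the match, while still accepting the same characters inside the URL, such as the dot in <em>ussmariner.com</em>.</p>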
<pre><code>// Link http(s) URLs, leaving a trailing . , ! ? ; or : outside the link
$str = preg_replace('%(https?://(([^ .,!?;:"\'\(\)\r\n\t])|((\.|,|!|\?|;|:)(?=[_a-z0-9])))+)%i', '&lt;a href="\\1"&gt;\\1&lt;/a&gt;', $str);</code></pre>
<p>It works using an <a href="http://www.regular-expressions.info/alternation.html" target="_blank">alternation</a>. The first side of the alternation matches any character that is neither in our punctuation list nor one of a few other characters we simply don&#8217;t want to allow in a URL (spaces, quote marks, parentheses, tabs and line breaks). The other side matches a character from the punctuation list, but uses lookahead to ensure that it&#8217;s followed by a letter, digit or underscore. A full stop at the very end of a URL fails both tests, so the match stops just before it.</p>
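<p>To see how the two sides of the alternation fit together, here is the same pattern laid out with PCRE&#8217;s <code>x</code> (extended) modifier so each part can carry a comment. This is a sketch: the <code>autolink()</code> name and the <code>URL_PATTERN</code> constant are hypothetical, <code>\x20</code> stands in for the literal space, and the inner <code>(\.|,|!|\?|;|:)</code> alternation is written as the equivalent character class <code>[.,!?;:]</code>. It should produce the same output as the one-liner above.</p>

```php
<?php
// The pattern from the post, spread over several lines with the x
// (extended) modifier. In this mode PCRE ignores whitespace and treats
// "#" as the start of a comment, except inside character classes.
const URL_PATTERN = '%
    (                                     # group 1: the whole URL
        https?://
        (
            [^\x20.,!?;:"\'()\r\n\t]      # side 1: any character not in the excluded set
            |
            [.,!?;:] (?=[_a-z0-9])        # side 2: punctuation, but only when a
                                          # letter, digit or underscore comes next
        )+
    )
%ix';

// Hypothetical helper: wraps each URL-like run of text in an anchor tag.
// Like the one-liner above, it does not HTML-escape the text it wraps.
function autolink(string $str): string
{
    return preg_replace(URL_PATTERN, '<a href="\1">\1</a>', $str);
}

echo autolink('Visit http://ussmariner.com.'), "\n";
// Visit <a href="http://ussmariner.com">http://ussmariner.com</a>.
echo autolink('Is it http://example.com/a?b=1, or not?'), "\n";
// Is it <a href="http://example.com/a?b=1">http://example.com/a?b=1</a>, or not?
```

<p>In the second example the <code>?</code> stays in the link because a letter follows it, while the comma is left outside because a space follows it.</p>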
<p>I&#8217;m sure there&#8217;s room for improvement here, but I&#8217;ve been pretty pleased with how this has worked so far.</p>
<div id="crp_related"><h2>Related Articles:</h2><ul><li><a href="http://think-robot.com/2008/11/nested-sortable-using-jtree-clickable-links/" rel="bookmark">Nested sortable using jTree - clickable links</a></li><li><a href="http://think-robot.com/2009/02/how-to-use-the-strong-ownership-list/" rel="bookmark">How To Use the Strong Ownership List</a></li><li><a href="http://think-robot.com/2009/02/firefox-ignores-tabs-but-not-spaces-in-a-pre-tag/" rel="bookmark">Firefox ignores tabs but not spaces in a pre tag</a></li><li><a href="http://think-robot.com/2008/12/strong-ownership-list-approach/" rel="bookmark">Strong ownership list approach</a></li><li><a href="http://think-robot.com/2009/12/zend_date-time-part-and-gmt/" rel="bookmark">Zend_Date time part and GMT</a></li></ul></div>]]></content:encoded>
			<wfw:commentRss>http://think-robot.com/2009/04/regex-for-autolinking-urls/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
