Friday, December 7, 2012

Detecting and removing web addresses from user entered text

Problem
I'm working on a website where users enter product descriptions. Some people put their own website address in the description. This is a problem because customers may then buy directly from that website, and our site loses revenue.

Solution

// ex: http://www.google.com
$txt = preg_replace('#https?://www\.[a-z\.0-9]+/?[\w/.]*#i', '', $txt);
// ex: www.google.com
$txt = preg_replace('#www\.[a-z\.0-9]+/?[\w/.]*#i', '', $txt);
// ex: http://google.com
$txt = preg_replace('#https?://[a-z\.0-9]+/?[\w/.]*#i', '', $txt);

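The part at the end of each pattern, an optional slash followed by a repeated character class, matches the path after the domain. It consumes word characters, slashes, and dots, and stops at the first character outside that set, such as whitespace. For example, in www.google.com/hello/hi/bye it picks up /hello/hi/bye.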
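
The three replacements above could be folded into one pattern and wrapped in a helper. This is only a rough sketch: the names URL_PATTERN, contains_url() and strip_urls() are made up for illustration and are not from the Stack Overflow answer, and collapsing the leftover double spaces is an extra step I am assuming is wanted.

```php
<?php
// Hypothetical constant: one pattern covering the three cases above.
// Either "http://" / "https://" or "www." starts a match, then the domain,
// then an optional path made of word characters, slashes, and dots.
const URL_PATTERN = '#(?:https?://|www\.)[a-z.0-9]+/?[\w/.]*#i';

// Hypothetical helper: true if the text appears to contain a web address.
function contains_url(string $txt): bool
{
    return preg_match(URL_PATTERN, $txt) === 1;
}

// Hypothetical helper: remove web addresses and tidy the gaps they leave.
function strip_urls(string $txt): string
{
    // preg_replace() returns null on error; fall back to the input in that case.
    $clean = preg_replace(URL_PATTERN, '', $txt) ?? $txt;
    // Collapse the runs of spaces left where an address was removed (assumed extra step).
    $clean = preg_replace('/ {2,}/', ' ', $clean) ?? $clean;
    return trim($clean);
}

echo strip_urls('Great lamp. Buy at www.example.com/shop/lamps today');
// prints "Great lamp. Buy at today"
```

Keep in mind the character classes are narrow. A domain with a hyphen (http://my-shop.com) is only partly removed, a bare domain with no http:// or www. (google.com) is not detected at all, and a query string after the path is left behind.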

References
1. http://regexpal.com/
2. http://stackoverflow.com/questions/13756938/remove-all-urls-from-text-with-php 
