Quick and Dirty Content Filtering with PHP
The PHP language includes lots of helpful functions for easily filtering, cleaning and manipulating content, all of which are excellent tools in the hands of a skilled developer. A solid knowledge of these filtering tools will help you achieve enhanced security and functionality in your projects.
Today, I’m going to give you a crash course on PHP’s basic filtering functions so that by the end of the tutorial you’ll be able to easily escape data, strip tags, remove words and more.
First up is string escaping, implemented with what is probably the most basic of PHP’s filtering functions: addslashes(). This function escapes single quotes, double quotes, backslashes and NUL bytes for you, allowing you to (more) safely accept form data, etc. Say, for example, you have an input field (named ‘title’) and someone types "Suzie's Blog". Those double and single quotes can cause some problems, but not for long:
$title = addslashes($_POST['title']); // quotes and backslashes in $title are now escaped
echo $title; // outputs \"Suzie\'s Blog\"
As you might guess, addslashes() has an inverse function: stripslashes(). On a side note, in case you ever find yourself developing a custom WordPress plugin, stripslashes() is incredibly useful for removing the slashes that WordPress adds to saved options values.
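To see the two functions undo each other, here is a minimal sketch of a round trip. The `$submitted` variable is a hypothetical stand-in for `$_POST['title']`, used only so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for $_POST['title'] so the sketch is self-contained.
$submitted = '"Suzie\'s Blog"';

// Escape quotes and backslashes.
$escaped = addslashes($submitted);
echo $escaped, PHP_EOL;   // \"Suzie\'s Blog\"

// Reverse the escaping.
$restored = stripslashes($escaped);
echo $restored, PHP_EOL;  // "Suzie's Blog"

// The round trip gives back the original string.
var_dump($restored === $submitted); // bool(true)
```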
So all this is pretty handy, but for MySQL queries it’s smart to use something a bit more powerful. Up next:
Escaping MySQL Queries
SQL injection attacks are a very real concern, making data sanitization a must for any web developer. Thankfully, mysql_real_escape_string() provides a way to easily escape dangerous characters in a value before it goes into a MySQL query. This is perhaps the most often used PHP sanitization function. Note that the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0; on current versions, use mysqli_real_escape_string() or prepared statements instead. Here’s an example:
$title = $_POST['title']; // $title could be anything, including an injection
$title = mysql_real_escape_string($title);
// It's now safe to place inside a quoted SQL string (double quotes so PHP interpolates $title):
mysql_query("INSERT INTO blogs(title) VALUES('$title')");
This function is one that anyone working with PHP and MySQL will use quite often – it’s elegant and potent (it even works on binary data).
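On PHP versions without the mysql_* functions, a different technique does the same job: prepared statements through PDO. The sketch below is an illustration under assumptions, not the method described above. It uses an in-memory SQLite database as a stand-in for MySQL so it can run without a server (it needs the pdo_sqlite extension), and the `blogs` table and `$title` value are made up for the example.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL so no server is needed.
// With MySQL, only the connection string (and credentials) would differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE blogs (title TEXT)');

// Hypothetical stand-in for $_POST['title'], including an injection attempt.
$title = "Suzie's Blog'); DROP TABLE blogs; --";

// The :title placeholder keeps the value separate from the SQL text,
// so no manual escaping is required.
$stmt = $pdo->prepare('INSERT INTO blogs (title) VALUES (:title)');
$stmt->execute(array(':title' => $title));

// The title is stored verbatim and the table is still there.
$stored = $pdo->query('SELECT title FROM blogs')->fetchColumn();
var_dump($stored === $title);
```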
Encoding HTML Entities
The htmlentities() function is another fun and useful one. It will automatically encode special characters as HTML entities, such as & (&amp;) and " (&quot;). It’s most useful for taking non-malicious user input that simply has special characters in it and formatting it for display. Here’s how you might use it, supposing someone submitted a title called Me & My Dog, "Buddy" > An Essay:
$title = $_POST['title'];
$title = htmlentities($title); // encode the string
echo $title; // outputs Me &amp; My Dog, &quot;Buddy&quot; &gt; An Essay
This function isn’t designed to be a security filter (for filtering out malicious data); it’s simply a convenient way to make sure user data is encoded correctly. It also has an inverse function, html_entity_decode().
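As a quick check that the two functions are inverses, here is a small sketch that encodes the example title and decodes it again. The hard-coded `$title` is a stand-in for the submitted form value.

```php
<?php
// Stand-in for the submitted title from the example above.
$title = 'Me & My Dog, "Buddy" > An Essay';

// Encode special characters as HTML entities.
$encoded = htmlentities($title);
echo $encoded, PHP_EOL; // Me &amp; My Dog, &quot;Buddy&quot; &gt; An Essay

// Decode the entities back into plain characters.
$decoded = html_entity_decode($encoded);
var_dump($decoded === $title); // bool(true)
```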
Sometimes you don’t want to just encode HTML tags; you want to strip them out completely. PHP’s strip_tags() is the perfect solution, doing just what the function name implies. Say someone sends in malicious data:
$title = $_POST['title']; // $title's value = "Happy <script src='http://evilsite.com/hack.js'></script> Birthday!"
$title = strip_tags($title); // remove dangerous tags
echo $title; // outputs "Happy  Birthday!" (the spaces on both sides of the tag remain)
That’s it – all tags are removed just like that. A useful function indeed. But what if you want to strip some tags (like script, img) but leave others (strong, a, p)? Read on!
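Before reaching for regular expressions, it’s worth knowing that strip_tags() also accepts an optional second argument listing the tags to keep. The sketch below uses a made-up `$html` string to show it; treat it as an illustration rather than a complete sanitizer, since attributes on the allowed tags (such as onclick) are left untouched.

```php
<?php
// Made-up input mixing harmless formatting with a script tag.
$html = '<p>Happy <strong>Birthday</strong>!</p>'
      . '<script src="http://evilsite.com/hack.js"></script>';

// Second argument: the tags that should survive.
$clean = strip_tags($html, '<p><strong><a>');

echo $clean; // <p>Happy <strong>Birthday</strong>!</p>
```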
Advanced Data Filtering
These functions that we’ve just been through will work the majority of the time, but there will be situations where they aren’t quite versatile or powerful enough. Thankfully, we have regular expressions. Using some regexp patterns and the powerful PHP function preg_replace(), we can filter, strip, replace, or remove pretty much anything we want without much trouble at all. Believe me, this thing is powerful.
You can read more about preg_replace() in the PHP manual, but the basic idea is that it accepts three arguments: the pattern to look for, the replacement to put in its place, and the subject to search in. The pattern, replacement and subject can each be a string or an array (if you have multiple phrases/words/patterns to search for). Every pattern must be wrapped in delimiters, such as a pair of forward slashes.
Here’s an example of how you’d set up preg_replace to strip all script tags and leave other tags:
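A minimal sketch of one way to do this follows. The pattern and the `$html` sample are assumptions chosen for illustration; regex-based HTML filtering is fragile against unusual markup, so treat this as a starting point rather than a complete defense.

```php
<?php
// Made-up input: a script tag sitting between harmless tags.
$html = "<p>Happy <script src='http://evilsite.com/hack.js'></script>"
      . "<strong>Birthday</strong>!</p>";

// ~ is used as the delimiter so "/" needs no escaping.
// i = case-insensitive, s = let "." match newlines inside the script body.
$pattern = '~<script\b[^>]*>.*?</script\s*>~is';

// Replace every match with an empty string; other tags are left alone.
$clean = preg_replace($pattern, '', $html);

echo $clean; // <p>Happy <strong>Birthday</strong>!</p>
```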
You could also set it up to strip out a series of forbidden words (profanity, spam words, etc.) like this:
$forbidden = array('/\bforbidden1\b/i', '/\bforbidden2\b/i', '/\bforbiddenN\b/i'); // patterns (with delimiters) for the words that will be stripped out
$fixed_content = preg_replace($forbidden, '', $_POST['comment_text']); // goodbye, forbidden words
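If the word list comes from somewhere else (a settings page, say), the patterns can be built on the fly. The helper below is a hypothetical sketch, not a built-in PHP function: `strip_forbidden_words()` and the sample comment are names invented for this example. It uses preg_quote() so that any regex metacharacters in a word are matched literally.

```php
<?php
// Hypothetical helper (not part of PHP): removes whole-word,
// case-insensitive matches of each forbidden word from $text.
function strip_forbidden_words($text, array $words)
{
    $patterns = array();
    foreach ($words as $word) {
        // preg_quote() escapes regex metacharacters; \b limits matches to whole words.
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/i';
    }
    return preg_replace($patterns, '', $text);
}

// Made-up comment text standing in for $_POST['comment_text'].
$comment = 'Buy cheap pills now, no spam here';
$fixed = strip_forbidden_words($comment, array('spam', 'pills'));

echo $fixed; // "Buy cheap  now, no  here" (the surrounding spaces remain)
```

Because only the words themselves are removed, doubled spaces are left behind; a follow-up preg_replace('/\s+/', ' ', $fixed) would collapse them if that matters.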
As you can see, it’s actually surprisingly easy to manipulate data with PHP and prepare it for use. Nothing stands in your way!
Find Out More
Before you go, here are some more great tutorials on PHP filtering, validation and sanitization:
- Create a Simple Input Sanitation Function with PHP
- Sanitize and Validate Data with PHP Filters
- Data Filtering with PHP
- Getting Started with Regular Expressions
Thanks for reading, and please share your feedback in the comments!