Michael Girouard has a post on his blog about something that’s becoming more and more widespread in the PHP community (thankfully): filtering input from users and escaping output to ensure the safety of your application. Previously he showed how, using an interceptor method in PHP5, you could build “collections of data”. He uses the same sort of method here, applying custom filters to the data based on the output call. Code is included for both the filtering interface and two example filters, one for SQL and the other for HTML.

Note: Before getting into this, it may be easier to just download the file, run the code, then read this post.

Thanks to the efforts of Chris Shiflett and other PHP security experts, Filter Input/Escape Output (FIEO) is now a commonplace technique to increase web application security.

The idea itself is simple. When data comes into your application, it must be filtered before it is used for any reason. This means all data: form values, URL values, and yes, even the values in the forever useful $_SERVER superglobal. If you expect an integer, cast it as such. If you are expecting plain text, anything that looks like HTML should be removed. Whitelists apply here, too.
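As a rough sketch of the filtering step described above, the snippet below casts an expected integer, strips markup from expected plain text, and checks a value against a whitelist. The helper names (filterInt, filterPlainText, filterWhitelist) and the sample input are hypothetical, made up for illustration; they are not part of the post's own code.

```php
<?php
// Hypothetical helpers illustrating input filtering; the names are made up for this sketch.

// Expecting an integer: cast it as such.
function filterInt($value)
{
    return (int) $value;
}

// Expecting plain text: remove anything that looks like HTML.
function filterPlainText($value)
{
    return trim(strip_tags((string) $value));
}

// Whitelist: accept only known-good values, otherwise fall back to a default.
function filterWhitelist($value, array $allowed, $default)
{
    return in_array($value, $allowed, true) ? $value : $default;
}

// Stand-in for tainted request data such as $_GET or $_POST.
$tainted = array(
    'page'  => '42abc',
    'name'  => '<b>Mike</b>',
    'order' => 'price; DROP TABLE users',
);

$clean = array(
    'page'  => filterInt($tainted['page']),
    'name'  => filterPlainText($tainted['name']),
    'order' => filterWhitelist($tainted['order'], array('price', 'title'), 'title'),
);

var_dump($clean);
```

Here the page value becomes the integer 42, the name loses its markup, and the unexpected sort order falls back to the default because it is not on the whitelist.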

Before leaving your application, data should be properly escaped with the specific output medium in mind. For example, if you are passing it off to a MySQL database, make sure quotes (or any other tricky characters) are prefixed with a backslash. Likewise, if you are rendering data out to a web page, htmlentities() (or something similar) should be used to transform HTML-specific characters into their proper entities.
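To make the "output medium" point concrete, here is a minimal sketch that escapes the same raw value two different ways. The sample string is made up, and addslashes() is used only as a stand-in to show the backslash prefixing without needing a database connection; it is not a recommendation for real database code, where the driver's own escaping function or prepared statements are the safer choice.

```php
<?php
// The same raw value, escaped differently depending on where it is going.
$raw = 'Someone said, "Hello, World!" & left';

// HTML context: turn HTML-specific characters into entities.
$forHtml = htmlentities($raw, ENT_COMPAT, 'UTF-8');

// SQL context (illustration only): addslashes() shows the backslash prefixing.
// It is a stand-in; real code should rely on the database driver's escaping
// function or on prepared statements.
$forSql = addslashes($raw);

echo $forHtml, "\n"; // Someone said, &quot;Hello, World!&quot; &amp; left
echo $forSql, "\n";  // Someone said, \"Hello, World!\" & left
```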

In my last post I showed how PHP 5’s interceptor methods can be used to build collections of data. That same technique can be applied to FIEO. This is what it looks like implemented:

<?php
$sampleData = array(
    'title' => 'Someone & Someone Else',
    'content' => 'Someone said to someone else, "Hello, World!"'
);
$sampleContexts = array(
    'html' => 'MikeG_HtmlFilter',
    'sql' => 'MikeG_SqlFilter'
);
$Filter = new Panda_DataFilter($sampleData, $sampleContexts);
$Filter->html->setCharset('ISO-8859-1');
?>
<h1><?php echo $Filter->html->title ?></h1>
<p><?php echo $Filter->html->content ?></p>
<p><?php echo $Filter->sql->content ?></p>

And the output is:

<h1>Someone &amp; Someone Else</h1>
<p>Someone said to someone else, &quot;Hello, World!&quot;</p>
<p>Someone said to someone else, \"Hello, World!\"</p>

Pretty neat, huh? I think so too.

The idea is simple: instances of Panda_DataFilter contain the tainted data. You load contexts into Panda_DataFilter, and they govern the input and output of the tainted data. In the example above, I output data using the html and sql contexts. Whenever you set a variable inside one of the loaded contexts, __set intercepts the assignment and runs its specific input filtering routine. Likewise, whenever you read a value from one of the contexts, __get intercepts the retrieval and runs its specific output escaping routine.
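The interception described above can be seen in isolation with a small stand-alone class. This is only a sketch: Example_PlainTextContext is a hypothetical name, not one of the classes from the post, and the choice of strip_tags() for input filtering is an assumption made for illustration.

```php
<?php
// Hypothetical, stripped-down context; not one of the classes from the post.
class Example_PlainTextContext
{
    private $data = array();

    // Intercepts writes such as $context->comment = '...';
    // Input filtering: strip anything that looks like HTML.
    public function __set($name, $value)
    {
        $this->data[$name] = strip_tags((string) $value);
    }

    // Intercepts reads such as echo $context->comment;
    // Output escaping: encode HTML-specific characters.
    public function __get($name)
    {
        return htmlentities($this->data[$name], ENT_COMPAT, 'UTF-8');
    }
}

$context = new Example_PlainTextContext();
$context->comment = '<em>Tom</em> & "Jerry"'; // goes through __set()

echo $context->comment, "\n"; // goes through __get(): Tom &amp; &quot;Jerry&quot;
```

Because the comment property is never declared, every assignment and every read is routed through the two interceptor methods, which is what lets a context filter on the way in and escape on the way out.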

Here’s the actual code:

<?php
final class Panda_DataFilter
{
    protected $data = array();

    public function __construct(array $data, array $contexts)
    {
        $this->data = $data;
        // Create one context object per name, e.g. $this->html, $this->sql.
        foreach ($contexts as $contextName => $context) {
            $this->{$contextName} = new $context($this->data);
        }
    }
}

abstract class Panda_DataFilter_Context
{
    protected $data = array();

    public function __construct(array $data)
    {
        // Each assignment is routed through the context's __set().
        foreach ($data as $key => $value) {
            $this->{$key} = $value;
        }
    }

    abstract public function __get($name);
    abstract public function __set($name, $value);
}
?>

And the html and sql contexts:

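<?php
class MikeG_HtmlFilter extends Panda_DataFilter_Context
{
    private $charset = 'UTF-8';
    private $quotes = ENT_COMPAT;

    // Output escaping: encode HTML-specific characters on every read.
    public function __get($name)
    {
        return htmlentities($this->data[$name], $this->quotes, $this->charset);
    }

    // Input hook: values are stored unchanged here.
    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function setCharset($charset)
    {
        $this->charset = $charset;
    }

    public function setQuoteStyle($quoteStyle)
    {
        $this->quotes = $quoteStyle;
    }
}

class MikeG_SqlFilter extends Panda_DataFilter_Context
{
    // Output escaping: backslash-escape characters that are unsafe in SQL strings.
    // Note: mysql_escape_string() belongs to the original MySQL extension,
    // which was removed in PHP 7.
    public function __get($name)
    {
        return mysql_escape_string($this->data[$name]);
    }

    // Input hook: values are stored unchanged here.
    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }
}
?>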
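Adding another output medium should follow the same pattern as the two contexts above. As a minimal sketch, here is a hypothetical third context for values placed inside a URL. Example_UrlFilter and the search.php path are made-up names for illustration, and the base class is repeated so the snippet runs on its own.

```php
<?php
// Base class repeated from the post so that this snippet runs on its own.
abstract class Panda_DataFilter_Context
{
    protected $data = array();

    public function __construct(array $data)
    {
        foreach ($data as $key => $value) {
            $this->{$key} = $value; // routed through __set()
        }
    }

    abstract public function __get($name);
    abstract public function __set($name, $value);
}

// Hypothetical third context: escapes values for use inside a URL.
class Example_UrlFilter extends Panda_DataFilter_Context
{
    // Output escaping: percent-encode the value on every read.
    public function __get($name)
    {
        return rawurlencode($this->data[$name]);
    }

    // Input hook: stored unchanged, as in the post's own contexts.
    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }
}

$url = new Example_UrlFilter(array('title' => 'Someone & Someone Else'));

echo 'search.php?q=' . $url->title, "\n"; // search.php?q=Someone%20%26%20Someone%20Else
```

Registering it would then presumably be a matter of adding an entry such as 'url' => 'Example_UrlFilter' to the contexts array passed to Panda_DataFilter.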
