CakePHP HTMLPurifier Component
I needed to use HTMLPurifier in my CakePHP application, so I saved it under the vendors folder inside the application folder. This is what the directory structure looked like:
==
+ myApplication
|-----+ config/
|-----+ controllers/
|-----+ models/
|-----+ plugins/
|-----+ tmp/
|-----+ vendors/
|     |----- HTMLPurifier/
|     |----- HTMLPurifier.php
|
|-----+ views/
|-----+ webroot/
|-----+ .htaccess
|-----+ index.php
==
But before including the vendor component, I needed to add the path to HTMLPurifier to PHP's include path so Cake can find it. So I added the following to HTMLPurifier.php, somewhere before the require_once() statements:
==
// START edit -dchx
// Add the path to the vendors folder where HTMLPurifier is located
if (function_exists('ini_set')) {
    ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . dirname(__FILE__));
}
// END edit -dchx

require_once 'HTMLPurifier/ConfigDef.php';
require_once 'HTMLPurifier/Config.php';
require_once 'HTMLPurifier/Lexer.php';
require_once 'HTMLPurifier/HTMLDefinition.php';
require_once 'HTMLPurifier/Generator.php';
require_once 'HTMLPurifier/Strategy/Core.php';
require_once 'HTMLPurifier/Encoder.php';
==
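The include-path edit can also be tried on its own, outside HTMLPurifier.php. The snippet below is only a rough sketch of the same idea: `addIncludePath()` is a hypothetical helper of mine (it is not part of HTML Purifier or CakePHP), and it uses PHP's built-in `get_include_path()` / `set_include_path()` instead of `ini_set()`. Unlike the edit shown earlier, it skips directories that are already listed.

```php
<?php
// Hypothetical helper (not part of HTML Purifier or CakePHP): appends a
// directory to PHP's include_path unless it is already listed.
function addIncludePath($dir)
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

// Inside HTMLPurifier.php, dirname(__FILE__) is the vendors folder.
// Here it is simply the folder this script lives in.
$vendorsDir = dirname(__FILE__);

addIncludePath($vendorsDir);
addIncludePath($vendorsDir); // second call changes nothing

echo get_include_path(), "\n";
```

Once the folder is on the include path, a relative `require_once 'HTMLPurifier/Config.php';` resolves against it, which is what the require_once lines in HTMLPurifier.php rely on.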
Now I’m all set. I just need to include the component using the CakePHP function uses().
*UPDATE*: When using HTMLPurifier inside CakePHP (or in any other app), make sure that the character encoding of the output page is UTF-8. I ran into a bug where a paragraph tag (p) containing only a non-breaking space was rendered as a different character, even though the Content-type meta tag on my HTML page was set to UTF-8 (and, of course, I’m using the XHTML 1.0 Transitional DocType). I fixed it by sending a *content-type header*. In CakePHP, you can do this inside the __beforeFilter()__ function of your controller.
==
class MyController extends AppController {
    //... the usual

    function beforeFilter() {
        header('Content-type:text/html;charset=UTF-8');
    }
}
==
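A plausible explanation for the stray character (I did not verify this on the original page) is that the browser decoded the UTF-8 output as ISO-8859-1, because a charset sent in the HTTP header takes precedence over the meta tag. The sketch below simulates that misreading for a single non-breaking space. It assumes the mbstring extension is available; the variable names are only for illustration.

```php
<?php
// A non-breaking space (U+00A0) encoded as UTF-8 is the two bytes C2 A0.
$nbspUtf8 = "\xC2\xA0";

// Simulate a client that reads those bytes as ISO-8859-1: each byte is
// treated as its own character. Requires the mbstring extension.
$asSeenByLatin1Client = mb_convert_encoding($nbspUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($nbspUtf8), "\n";             // c2a0
echo bin2hex($asSeenByLatin1Client), "\n"; // c382c2a0: "Â" followed by a non-breaking space
```

So one invisible character turns into a visible "Â" plus a space, which matches the kind of symptom described above.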
HTML Purifier can work with different character encodings; UTF-8 is simply the default. I recommend that you not blindly set the application to UTF-8, but first make sure that it is, indeed, UTF-8 aware. For instance, if you already have ISO-8859-1 characters, they will probably get mangled.
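To illustrate the point about existing ISO-8859-1 data, here is a minimal sketch of checking a string before treating it as UTF-8. `toUtf8()` is a hypothetical helper, not something HTML Purifier or CakePHP provides, and it assumes the mbstring extension and that any non-UTF-8 input really is ISO-8859-1.

```php
<?php
// Hypothetical helper: returns $text unchanged if it is already valid UTF-8,
// otherwise assumes it is in $assumedSource and converts it.
function toUtf8($text, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$latin1 = "caf\xE9";     // "café" stored as ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" stored as UTF-8

echo bin2hex(toUtf8($latin1)), "\n"; // 636166c3a9
echo bin2hex(toUtf8($utf8)), "\n";   // 636166c3a9
```

This is a heuristic only: a byte sequence can be valid UTF-8 by coincidence, so knowing the real encoding of your stored data is still the safer route.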
You might be interested in htmLawed, a 45-kb, single-file, non-OOP, GPLv3-licensed script with low basal memory usage (0.5 MB) to filter illegal/disallowed HTML (tags, attributes, etc.) from user input. It also reduces XSS vulnerabilities, balances tags, etc.
Visit the htmLawed website.