CakePHP HTMLPurifier Component
I needed to use HTMLPurifier in my CakePHP application, so I saved it under the vendors folder inside the application folder. This is how the directory structure looked:
==
+ myApplication
|-----+ config/
|-----+ controllers/
|-----+ models/
|-----+ plugins/
|-----+ tmp/
|-----+ vendors/
| |----- HTMLPurifier/
| |----- HTMLPurifier.php
|
|-----+ views/
|-----+ webroot/
|-----+ .htaccess
|-----+ index.php
==
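But before including the vendor component, I needed to add the path to the vendors folder to PHP's include path, so that the require_once() calls inside HTMLPurifier.php can find the HTMLPurifier/ files. So I added the following to HTMLPurifier.php, somewhere before the require_once() statements: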
==
// START edit -dchx
//Add the path to the vendors folder where HTMLPurifier is located
if (function_exists('ini_set')) {
    ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . dirname(__FILE__));
}
// END edit -dchx
require_once 'HTMLPurifier/ConfigDef.php';
require_once 'HTMLPurifier/Config.php';
require_once 'HTMLPurifier/Lexer.php';
require_once 'HTMLPurifier/HTMLDefinition.php';
require_once 'HTMLPurifier/Generator.php';
require_once 'HTMLPurifier/Strategy/Core.php';
require_once 'HTMLPurifier/Encoder.php';
==
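The include_path edit works because PHP searches each include_path entry for a relative require_once path such as 'HTMLPurifier/Config.php'. A path counts as relative here when it is neither absolute nor starts with ./ or ../. Below is a small, self-contained sketch of that mechanism. It is not the author's code. The temporary directory and the Demo/Hello.php file are hypothetical stand-ins for the vendors folder and HTMLPurifier's files. It uses set_include_path(), which does the same job as the ini_set() call above.

```php
<?php
// Sketch: a relative require_once is resolved against each include_path entry.
// The temp directory stands in for myApplication/vendors (hypothetical layout).
$vendorDir = sys_get_temp_dir() . '/vendors_demo_' . getmypid();
@mkdir($vendorDir . '/Demo', 0777, true);

// Hypothetical library file, playing the role of HTMLPurifier/Config.php.
file_put_contents(
    $vendorDir . '/Demo/Hello.php',
    '<?php function demo_hello() { return "found via include_path"; }'
);

// Same effect as the ini_set('include_path', ...) edit, via set_include_path().
set_include_path(get_include_path() . PATH_SEPARATOR . $vendorDir);

// The path is neither absolute nor ./-relative, so PHP searches include_path.
require_once 'Demo/Hello.php';
echo demo_hello(), "\n"; // found via include_path
```

Appending with PATH_SEPARATOR keeps the existing entries, so other includes in the application keep resolving as before.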
Now I’m all set. I just need to include the component using the CakePHP function uses().
*UPDATE*: A small update on this. When using HTMLPurifier inside CakePHP (or any other application), make sure the output page is actually served as UTF-8. I ran into a bug where a paragraph (p) containing only a non-breaking space was rendered as a different character. This happened even though the Content-Type meta tag in my HTML page was set to UTF-8, and I'm using the XHTML 1.0 Transitional doctype. I fixed it by sending a *Content-Type header* from PHP. In CakePHP, you can do this in your controller's __beforeFilter()__ method.
==
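class MyController extends AppController {
    //... the usual

    function beforeFilter()
    {
        // If AppController defines its own beforeFilter(), call parent::beforeFilter() here too.
        // Send the charset in the HTTP header; the meta tag alone was not enough.
        header('Content-Type: text/html; charset=UTF-8');
    }
}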
==
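Here is a likely explanation for the mangled non-breaking space. It assumes the purified output contained the literal character U+00A0 rather than the &nbsp; entity. In UTF-8 that character is two bytes. A charset sent in the HTTP Content-Type header takes precedence over the meta tag, so if the server sends a different default such as ISO-8859-1, the browser shows each byte as its own character. The self-contained sketch below only inspects those bytes; it does not call HTML Purifier.

```php
<?php
// Sketch: the bytes behind a non-breaking space, and how they look when a
// browser decodes the page as ISO-8859-1 instead of UTF-8.

// Assumption: the purifier emitted U+00A0 as a literal character, not &nbsp;.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');
echo strlen($nbsp), ' bytes: ', bin2hex($nbsp), "\n"; // 2 bytes: c2a0

// In ISO-8859-1 every byte is one character whose code point equals the byte,
// so the browser shows U+00C2 ("Â") followed by U+00A0 (a no-break space).
$misread = '';
foreach (str_split($nbsp) as $byte) {
    $misread .= sprintf('U+%04X ', ord($byte));
}
echo trim($misread), "\n"; // U+00C2 U+00A0
```

Once the header declares UTF-8, the browser decodes C2 A0 back into a single U+00A0, which matches the fix described above.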


HTML Purifier can work with other character encodings, although UTF-8 is the default. I recommend that you not blindly switch the application to UTF-8; first make sure that it is, indeed, UTF-8 aware. For instance, if you already have ISO-8859-1 characters stored, they will probably get mangled.
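One way to act on that advice before switching an existing application to UTF-8 is to check stored strings for valid UTF-8 and convert the rest. This is a sketch, not part of HTML Purifier. It makes the hypothetical assumption that any non-UTF-8 legacy data is ISO-8859-1. The function names are made up for illustration, and the type declarations need PHP 7 or later. If the mbstring extension is available, mb_check_encoding() and mb_convert_encoding() can do the same job.

```php
<?php
// Sketch: detect strings that are not valid UTF-8 and convert them,
// assuming (hypothetically) that such legacy data is ISO-8859-1.

function is_valid_utf8(string $s): bool
{
    // preg_match() with the /u modifier returns false on malformed UTF-8.
    return preg_match('//u', $s) === 1;
}

function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i]; // ASCII bytes are identical in both encodings
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

function ensure_utf8(string $s): string
{
    return is_valid_utf8($s) ? $s : latin1_to_utf8($s);
}

$legacy = "caf\xE9";           // "café" stored as ISO-8859-1
$fixed  = ensure_utf8($legacy);
echo bin2hex($fixed), "\n";    // 636166c3a9
```

A Latin-1 string can occasionally be valid UTF-8 by coincidence, so this check is only a heuristic. Knowing the real source encoding is safer.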
You might be interested in htmLawed, a 45 KB, single-file, non-OOP, GPLv3-licensed script with low basal memory usage (0.5 MB) that filters illegal or disallowed HTML (tags, attributes, etc.) from user input. It also reduces XSS vulnerabilities, balances tags, etc.
Visit the htmLawed website.