CakePHP HTMLPurifier Component

I needed to use HTMLPurifier in my CakePHP application, so I saved it under the vendors folder inside the application folder. This is what the directory structure looked like:

+ myApplication
     |-----+ config/
     |-----+ controllers/
     |-----+ models/
     |-----+ plugins/
     |-----+ tmp/
     |-----+ vendors/
     |       |----- HTMLPurifier/
     |       |----- HTMLPurifier.php
     |-----+ views/
     |-----+ webroot/
     |-----+ .htaccess
     |-----+ index.php

But before including the vendor library, I needed to add the path to HTMLPurifier to the include path so Cake can find it. So I added the following to HTMLPurifier.php, somewhere before the require_once() statements:

// START edit -dchx
// Add the path to the vendors folder where HTMLPurifier is located
if (function_exists('ini_set')) {
    ini_set('include_path', ini_get('include_path') . PATH_SEPARATOR . dirname(__FILE__));
}
// END edit -dchx

require_once 'HTMLPurifier/ConfigDef.php';
require_once 'HTMLPurifier/Config.php';
require_once 'HTMLPurifier/Lexer.php';
require_once 'HTMLPurifier/HTMLDefinition.php';
require_once 'HTMLPurifier/Generator.php';
require_once 'HTMLPurifier/Strategy/Core.php';
require_once 'HTMLPurifier/Encoder.php';
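
The edit above appends the folder every time the file is loaded. As a rough sketch of the same idea, here is a variant that uses PHP's built-in get_include_path() and set_include_path() and skips the append if the folder is already listed. The helper name add_include_dir() is my own invention for illustration; it is not part of HTMLPurifier or CakePHP.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier or CakePHP):
// append $dir to PHP's include_path unless it is already listed.
function add_include_dir($dir)
{
    $paths = explode(PATH_SEPARATOR, get_include_path());

    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }

    // Return the resulting include_path so callers can inspect it.
    return get_include_path();
}

// Same effect as the ini_set() edit: register the folder this file lives in.
add_include_dir(dirname(__FILE__));
```

Calling the helper twice with the same folder leaves the include path unchanged the second time, which keeps the path from growing if the file is pulled in through more than one route.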

Now I'm all set. I just need to include the component using the CakePHP function uses().

UPDATE: A small update on this. When using HTMLPurifier inside CakePHP (or in any other app), make sure that the character encoding of the output page really is UTF-8. I ran into a bug where a paragraph tag (p) containing only a non-breaking space was converted into another character, even though the Content-Type meta tag on my HTML page was set to UTF-8 (and of course I'm using the XHTML 1.0 Transitional DocType). I fixed it by sending a Content-Type HTTP header. In CakePHP, you can do this inside the beforeFilter() function of your controller.

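class MyController extends AppController {

    //... the usual

    function beforeFilter() {
        // Tell the browser the page is UTF-8, matching the meta tag
        header('Content-Type: text/html; charset=utf-8');
    }
}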
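
As for why the non-breaking space turned into another character, my best guess (an assumption, I did not dig into it at the time) is that the browser read the UTF-8 bytes as ISO-8859-1. The sketch below simulates that misreading with iconv(); the function name misread_as_latin1() is made up for this example.

```php
<?php
// Hypothetical helper: treat raw bytes as ISO-8859-1 and re-encode them
// as UTF-8, which mimics what a browser shows when it guesses the wrong charset.
function misread_as_latin1($bytes)
{
    return iconv('ISO-8859-1', 'UTF-8', $bytes);
}

// A non-breaking space (U+00A0) is two bytes in UTF-8: 0xC2 0xA0.
$nbsp = "\xC2\xA0";

// Read as ISO-8859-1, those two bytes become two characters:
// 0xC2 is "Â" and 0xA0 is a non-breaking space.
echo bin2hex(misread_as_latin1($nbsp)); // prints c382c2a0
```

So a paragraph that should contain one invisible space ends up showing a stray "Â", which would match the symptom if the server was announcing a different charset than the meta tag.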

Comments


HTML Purifier can work with different character encodings; UTF-8 is just the default. I recommend that you not blindly set the application to UTF-8, and instead make sure that it is, indeed, UTF-8 aware. For instance, if you already have ISO-8859-1 characters, they will probably get mangled.




You might be interested in htmLawed, a 45-kb, single-file, non-OOP, GPLv3-licensed script with low basal memory usage (0.5 MB) to filter illegal/disallowed HTML (tags, attributes, etc.) from user input. It also reduces XSS vulnerabilities, balances tags, etc.

Visit the htmLawed website.

