HTML Purifier

I'm currently building an article submission application. To give users more control over their articles, I planned to use a WYSIWYG text editor for the submission form. With that kind of editor, users can format their articles easily, even if they have little experience with HTML. I tried TinyMCE, an open-source WYSIWYG editor that runs on JavaScript, and I'm quite happy with the results. It provides an "MS Word"-like interface. It also has a mechanism that filters disallowed HTML tags that could make the application vulnerable to XSS attacks.

But what if the user has disabled JavaScript? If the application expects TinyMCE to process the input, it won't do any input checking of its own. With JavaScript disabled, TinyMCE can't do its job, so disallowed HTML passes through freely and the application is left open to attacks. PHP's built-in input filtering functions aren't much use here, since all they do is strip the tags or convert special characters like < and > into their equivalent entities, so that the formatting is no longer recognized as markup. I wanted some PHP functionality that could do the filtering for me on the server side.
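As a quick illustration of that limitation, here is a minimal sketch using `strip_tags()` and `htmlspecialchars()`. The sample input strings are made up for this example; they are not taken from the application.

```php
<?php
// Hypothetical submission: legitimate formatting plus an injected script.
$submitted = '<p>Hello <b>world</b></p><script>alert(1)</script>';

// strip_tags() removes every tag, so the user's formatting is lost.
// Note that the text inside the script element is kept.
$stripped = strip_tags($submitted);
echo $stripped, "\n"; // Hello worldalert(1)

// htmlspecialchars() escapes everything, so the formatting is shown
// as literal text instead of being rendered as markup.
$escaped = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";

// Allowing selected tags in strip_tags() keeps the formatting, but
// attributes on allowed tags are not inspected at all.
$withHandler = '<p onclick="evil()">Hi</p>';
$partial = strip_tags($withHandler, '<p>');
echo $partial, "\n"; // <p onclick="evil()">Hi</p>
```

The last case is the reason an allow-list of tag names alone isn't enough: an event-handler attribute survives untouched on an allowed tag.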

So I consulted Sir Google, and after looking through some possible solutions, I found HTML Purifier and gave it a test run. Yep, it worked. I first tried it with TinyMCE running, and the HTML formatting was still intact after purification. Then I tried it again with JavaScript disabled, so that TinyMCE did no filtering, and inserted some not-so-malicious code. The purifier caught it. Nice! If I have time, I'll test it further. I just need to make the application fully functional before doing detailed testing and debugging.
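For reference, a basic server-side HTML Purifier call looks roughly like the sketch below. The function name `purifyArticle`, the allow-list passed to `HTML.Allowed`, and the Composer autoload path are all assumptions made for illustration; they are not the setup used in this application.

```php
<?php
// Assumption: HTML Purifier was installed with Composer, so its classes
// are available through the autoloader when this file exists.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper: purify submitted article HTML on the server,
 * regardless of whether TinyMCE ran in the browser.
 */
function purifyArticle(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Illustrative allow-list of elements and attributes; adjust it to
    // match whatever formatting the editor is configured to produce.
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($dirtyHtml);
}
```

With the library loaded, the form handler would pass the submitted article body through `purifyArticle()` before storing or displaying it.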


Comments: 2


Hi, I'm glad to see that you found my library and liked it! If, during your usage, you have any recommendations or bug reports, feel free to drop me a line (user response has been lukewarm at best).


Of course! If I find something that would help improve your library, I'll inform you. I'm sure web developers will find their way to your library. Thanks for coming up with this great tool. 😀

