Parsing user input, creative data, and HTML is hard. I'm sure you've whipped up some pretty creative regular expressions that almost work most of the time when dealing with these things, but it's often easy for attackers, or even non-malicious end-users, to trip up your carefully crafted regex and make your pages look horrible (or worse). In this talk, we'll discuss a few practical examples of how taking a token-based approach to parsing code and markup can save you plenty of time in the long run and, more importantly, will actually prevent your replacements from failing.
I think you had the same problem in this session as you had in the "Stupid Browser Tricks" ... first session after lunch is about as bad as the first session in the morning! I was a bit confused (might have been related to food-induced sleepiness), but I'll do more digging as I can find time.
This talk mostly dealt with the PHP Tokenizer extension, which seems to be primarily useful for refactoring or other source code operations rather than the user input processing that the session description indicated it would be about.
The most useful gem I saw in this talk was the brief mention of HTMLPurifier, which I think will come in handy.
30.Sep.2009 at 14:01 by Benjamin Young
Well presented and good information, but the day-to-day applicability of parsing PHP tokens was harder to see. Knowing how to build parsers and tokenizers for any language (especially one's own DSLs) would be a good extension to this talk.