A highly interesting book (still in the making) by Simon Pieters on how HTML parsers work:
The HTML parser is a piece of software that processes HTML markup and produces an in-memory tree representation (known as the DOM).
The HTML parser has many strange behaviors. This book will highlight the ins and outs of the HTML parser, and contains almost-impossible quizzes.
Not for beginners!
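
To make the "markup in, tree out" idea concrete, here is a rough sketch using Python's standard `html.parser` module. It is not the algorithm the book describes: `html.parser` only reports tags and text, and the `Node`, `TreeBuilder`, `parse`, and `dump` names below are made up for illustration. It also skips all the error recovery and implied-tag rules a real HTML parser applies, so treat it as a toy.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical tree node; a real DOM node carries far more state."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Naive tree builder: nests elements exactly as the tags are written."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name;
        # a stray end tag with no match is simply ignored.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text for readability
            self.current.children.append(Node("#text: " + text, self.current))


def parse(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(dump(parse("<p>Hello <b>world</b></p>"))))
```

One place where this toy and a real parser disagree: for `<p>one<p>two`, the sketch nests the second `p` inside the first, whereas a spec-compliant HTML parser closes the first `p` automatically and makes the two paragraphs siblings. That kind of rule is presumably what the "strange behaviors" refers to.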