Lessons learned:
Think four times before doing stream-based XML processing, even though it appears to be more efficient than tree-based.
But if you have to do stream-based processing, make sure to use robust, fairly scalable tools like XML::Templates, not sgmlspl. Of course it cannot be as pleasant as tree-based XML processing, but db2x_manxml, db2x_texixml and docbook2manxml are worth examining.
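To make the trade-off concrete, here is a minimal sketch in Python (not the Perl tools named above) that extracts the same information both ways. The sample document and the function names are assumptions made up for illustration. The stream-based version has to track by hand whether it is inside the element of interest; the tree-based version simply queries the loaded document.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample input, loosely modelled on a DocBook refentry.
DOC = (
    "<refentry><refnamediv>"
    "<refname>ls</refname><refpurpose>list files</refpurpose>"
    "</refnamediv></refentry>"
)


def names_tree(xml_text):
    """Tree-based: load the whole document, then query it."""
    root = ET.fromstring(xml_text)
    return ["".join(e.itertext()) for e in root.iter("refname")]


class RefnameHandler(xml.sax.ContentHandler):
    """Stream-based: state must be carried between events by hand."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # a list while inside <refname>, else None

    def startElement(self, name, attrs):
        if name == "refname":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "refname":
            self.names.append("".join(self._buffer))
            self._buffer = None


def names_stream(xml_text):
    handler = RefnameHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names
```

Both functions return the same list for this input, but only the stream-based one needs explicit state, and that state grows quickly once the output depends on context (ancestors, siblings, or content that appears later in the document).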
Don't use XML::DOM directly for stylesheets. At least look at some of the XPath modules out there. Ideally, use a real stylesheet language like XSLT. A C-based implementation of XSLT is faster than any Perl hack you can come up with.
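The difference between walking a DOM by hand and using a path expression can be sketched as follows. This is an illustration in Python rather than Perl, using the standard library's `xml.dom.minidom` and the limited XPath subset supported by `xml.etree.ElementTree`; the sample document and function names are assumptions for the example.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample input; the nested refsect2 title must NOT be selected.
DOC = """<refentry>
  <refsect1><title>Description</title><para>Text.</para></refsect1>
  <refsect1><title>Options</title>
    <refsect2><title>Common</title></refsect2>
  </refsect1>
</refentry>"""


def titles_dom(xml_text):
    """Direct DOM access: every navigation step is spelled out."""
    dom = xml.dom.minidom.parseString(xml_text)
    titles = []
    for sect in dom.getElementsByTagName("refsect1"):
        for child in sect.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "title":
                text = "".join(
                    n.data for n in child.childNodes
                    if n.nodeType == n.TEXT_NODE
                )
                titles.append(text)
    return titles


def titles_xpath(xml_text):
    """Path expression: the selection is stated declaratively."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("./refsect1/title")]
```

Both functions select only the titles that are direct children of `refsect1`. The path-based version states that intent in one expression, which is the kind of selection a stylesheet performs constantly.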
Using XSLT extensions sounds bad for portability, but it is not difficult to keep most portions of a stylesheet portable between all XSLT processors.
Perl is not as good at XML as it's hyped to be. There are too many !@#$% characters when using objects in Perl. Its Unicode support is not thorough enough for some parts of DocBook processing. Its SAX support is not well maintained.
Don't be afraid to use XML intermediate formats for converting to other markup languages. The rules of those target languages are made for human consumption, not based on purely logical considerations. It is difficult for XML tools to write “perfect” output in them: standard stylesheet languages (XSLT, DSSSL) cannot be used for converting to other markup languages, and embedding the markup rules into the conversion tool increases its complexity to unmanageable proportions.
You might propose writing a separate class that hides all this complexity from the rest of the conversion program, but the result would be the same as an XML intermediate format, only without the plain-text representation. And being able to view the XML intermediate format as plain text makes the whole thing easier to debug.
Design the XML intermediate format to be easy to use from the standpoint of the conversion tool, and similar to how XML document types work in general. For example, abstract the paragraphs of a document rather than their paragraph breaks (the latter is typical of traditional markup languages, but not of XML).
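As an illustration of this design, consider a hypothetical intermediate format in which each paragraph is an element, and a small converter that turns it into roff. The element and attribute names (`manpage`, `para`, `title`, `section`) are invented for this sketch and are not the actual intermediate formats of any tool. The target-language rules (the `.PP` break before each paragraph, escaping of backslashes and leading control characters) live only in the converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate document: paragraphs are elements, not breaks.
INTERMEDIATE = """<manpage title="FOO" section="1">
  <para>First paragraph.</para>
  <para>Second paragraph,
  wrapped in the source.</para>
</manpage>"""


def escape_roff(text):
    """Apply roff's lexical rules; the XML side never has to know them."""
    text = text.replace("\\", "\\e")      # \e prints a literal backslash
    if text.startswith((".", "'")):       # a leading . or ' starts a request
        text = "\\&" + text               # \& is a zero-width guard
    return text


def to_roff(xml_text):
    root = ET.fromstring(xml_text)
    lines = [".TH %s %s" % (root.get("title"), root.get("section"))]
    for para in root.iter("para"):
        # Collapse source whitespace; line breaks in the XML are not meaningful.
        words = " ".join("".join(para.itertext()).split())
        lines.append(".PP")               # the paragraph break is derived here
        lines.append(escape_roff(words))
    return "\n".join(lines) + "\n"
```

Because the intermediate document is ordinary XML, it can be printed and inspected at any point, and the stylesheet that produces it never has to emit `.PP` or worry about escaping.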
Internationalize as soon as possible. That is much easier than adding it in later.
The same advice applies to the build system.
Writing good documentation takes skill. This manual has been revised substantially at least four times [2], with the author consciously trying to condense information each time.