Class XhtmlBaseParser

    • Field Detail

      • scriptBlock

        private boolean scriptBlock
        True if a <script></script> or <style></style> block is read. CDATA sections within are handled as rawText.
      • isLink

        private boolean isLink
        Used to distinguish <a href=""> from <a name="">.
      • isAnchor

        private boolean isAnchor
        Used to distinguish <a href=""> from <a name="">.
      • orderedListDepth

        private int orderedListDepth
        Used for nested lists.
      • sectionLevel

        private int sectionLevel
        Counts section level.
      • inVerbatim

        private boolean inVerbatim
        Verbatim flag, true whenever we are inside a <pre> tag.
      • inFigure

        private boolean inFigure
        Used to recognize the case of img inside figure.
      • hasDefinitionListItem

        boolean hasDefinitionListItem
        Used to wrap the definedTerm with its definition, even when one of the two is omitted.
      • warnMessages

        private java.util.Map<java.lang.String,​java.util.Set<java.lang.String>> warnMessages
        Map of warn messages, keyed by a String that describes the error type, with a Set of messages as value. Used to avoid logging duplicate warn messages.
    • Constructor Detail

      • XhtmlBaseParser

        public XhtmlBaseParser()
    • Method Detail

      • parse

        public void parse​(java.io.Reader source,
                          Sink sink)
                   throws ParseException
        Parses the given source model and emits Doxia events into the given sink.
        Specified by:
        parse in interface Parser
        Overrides:
        parse in class AbstractXmlParser
        Parameters:
        source - a non-null reader that provides the source document. You can use the newReader methods from ReaderFactory.
        sink - A sink that consumes the Doxia events.
        Throws:
        ParseException - if the model could not be parsed.
      • initXmlParser

        protected void initXmlParser​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser)
                              throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Initializes the parser with custom entities or other options. Adds all XHTML (HTML 4.0) entities to the parser so that they can be recognized and resolved without an additional DTD.
        Overrides:
        initXmlParser in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem initializing the parser
      • baseStartTag

        protected boolean baseStartTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                       Sink sink)

        Goes through a common list of possible HTML start tags. These include only tags that can go into the body of an XHTML document and so should be reusable by different XHTML-based parsers.

        The currently handled tags are:

        <h2>, <h3>, <h4>, <h5>, <h6>, <p>, <pre>, <ul>, <ol>, <li>, <dl>, <dt>, <dd>, <b>, <strong>, <i>, <em>, <code>, <samp>, <tt>, <a>, <table>, <tr>, <th>, <td>, <caption>, <br/>, <hr/>, <img/>.

        Parameters:
        parser - A parser.
        sink - the sink to receive the events.
        Returns:
        True if the event has been handled by this method, i.e. the tag was recognized, false otherwise.
      • baseEndTag

        protected boolean baseEndTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                     Sink sink)

        Goes through a common list of possible HTML end tags. These should be reusable by different XHTML-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).

        Parameters:
        parser - A parser.
        sink - the sink to receive the events.
        Returns:
        True if the event has been handled by this method, false otherwise.
      • handleStartTag

        protected void handleStartTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                      Sink sink)
                               throws org.codehaus.plexus.util.xml.pull.XmlPullParserException,
                                      MacroExecutionException
        Goes through the possible start tags. This implementation just calls baseStartTag(XmlPullParser,Sink); implementing parsers should override it to handle additional tags.
        Specified by:
        handleStartTag in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
        MacroExecutionException - if there's a problem executing a macro
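
        The override pattern described above can be sketched with stand-in types. Everything in this block is hypothetical: BaseTagsSketch and ExtendedTagsSketch are not Doxia classes, a tag is reduced to its name, the sink is a plain list of strings, and the <note> tag is invented for illustration. A real subclass would work with XmlPullParser and Sink instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a base parser; not the Doxia class. */
class BaseTagsSketch {
    // A small, assumed subset of the common body tags.
    private static final Set<String> BASE_TAGS =
            new HashSet<>(Arrays.asList("p", "pre", "ul", "ol", "li"));

    /** Returns true if the tag was recognized and an event was emitted. */
    protected boolean baseStartTag(String tag, List<String> sink) {
        if (!BASE_TAGS.contains(tag)) {
            return false;
        }
        sink.add("start:" + tag);
        return true;
    }

    /** Default behavior: only the common tags are handled. */
    protected void handleStartTag(String tag, List<String> sink) {
        baseStartTag(tag, sink);
    }
}

/** Hypothetical subclass that adds one tag on top of the common ones. */
class ExtendedTagsSketch extends BaseTagsSketch {
    @Override
    protected void handleStartTag(String tag, List<String> sink) {
        // Try the common tags first; fall through only if unrecognized.
        if (baseStartTag(tag, sink)) {
            return;
        }
        if ("note".equals(tag)) {
            sink.add("start:custom-note");
        }
        // Any other tag is silently ignored in this sketch.
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        ExtendedTagsSketch parser = new ExtendedTagsSketch();
        parser.handleStartTag("p", sink);
        parser.handleStartTag("note", sink);
        parser.handleStartTag("blink", sink);
        System.out.println(sink);
    }
}
```

        The boolean return value of baseStartTag is what makes this composable: the subclass only inspects a tag itself when the common list did not recognize it.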
      • handleEndTag

        protected void handleEndTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    Sink sink)
                             throws org.codehaus.plexus.util.xml.pull.XmlPullParserException,
                                    MacroExecutionException
        Goes through the possible end tags. This implementation just calls baseEndTag(XmlPullParser,Sink); implementing parsers should override it to handle additional tags.
        Specified by:
        handleEndTag in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
        MacroExecutionException - if there's a problem executing a macro
      • handleText

        protected void handleText​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                  Sink sink)
                           throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles text events.

        This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.

        Overrides:
        handleText in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • handleComment

        protected void handleComment​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                     Sink sink)
                              throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles comments.

        This is a default implementation: all data are emitted as comment events into the specified sink.

        Overrides:
        handleComment in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • handleCdsect

        protected void handleCdsect​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    Sink sink)
                             throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles CDATA sections.

        This is a default implementation: all data are emitted as text events into the specified sink.

        Overrides:
        handleCdsect in class AbstractXmlParser
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • consecutiveSections

        protected void consecutiveSections​(int newLevel,
                                           Sink sink)
        Make sure sections are nested consecutively.

        HTML doesn't have any sections, only sectionTitles (<h2> etc.), which means we have to open or close any sections that are missing in between.

        For instance, if the following sequence is parsed:

         <h3></h3>
         <h6></h6>
         

        we have to insert two section starts before we open the <h6>. In the following sequence

         <h6></h6>
         <h3></h3>
         

        we have to close two sections before we open the <h3>.

        The current level is set to newLevel afterwards.

        Parameters:
        newLevel - the new section level, all upper levels have to be closed.
        sink - the sink to receive the events.
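
        The nesting logic can be illustrated with a minimal sketch. It is an assumption about how such a method might work, not the actual Doxia implementation: SectionNestingSketch is a hypothetical class, levels are plain integers, and events are recorded as strings instead of being sent to a Sink. The sketch closes every open section at newLevel or deeper, opens the skipped levels in between, and leaves opening the section for newLevel itself to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that records section events as strings. */
class SectionNestingSketch {
    // Deepest section currently open; 0 means no section is open.
    private int sectionLevel;
    private final List<String> events = new ArrayList<>();

    SectionNestingSketch(int initialLevel) {
        this.sectionLevel = initialLevel;
    }

    void consecutiveSections(int newLevel) {
        // Close every open section at newLevel or deeper.
        while (sectionLevel >= newLevel) {
            events.add("close " + sectionLevel);
            sectionLevel--;
        }
        // Open the levels that were skipped, stopping just above newLevel.
        while (sectionLevel < newLevel - 1) {
            sectionLevel++;
            events.add("open " + sectionLevel);
        }
        // The caller is expected to open the section for newLevel itself.
        sectionLevel = newLevel;
    }

    List<String> events() {
        return events;
    }

    int level() {
        return sectionLevel;
    }

    public static void main(String[] args) {
        SectionNestingSketch deeper = new SectionNestingSketch(2);
        deeper.consecutiveSections(5);
        System.out.println(deeper.events());

        SectionNestingSketch shallower = new SectionNestingSketch(5);
        shallower.consecutiveSections(2);
        System.out.println(shallower.events());
    }
}
```

        Going from level 2 to level 5 records two intermediate opens (levels 3 and 4). Going from level 5 back to level 2 records four closes in this sketch, because the closing loop also closes the section of the previous heading and the open section at the new level, not only the two intermediate ones.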
      • closeOpenSections

        private void closeOpenSections​(int newLevel,
                                       Sink sink)
        Close open sections.
        Parameters:
        newLevel - the new section level, all upper levels have to be closed.
        sink - the sink to receive the events.
      • openMissingSections

        private void openMissingSections​(int newLevel,
                                         Sink sink)
        Open missing sections.
        Parameters:
        newLevel - the new section level, all lower levels have to be opened.
        sink - the sink to receive the events.
      • getSectionLevel

        protected int getSectionLevel()
        Return the current section level.
        Returns:
        the current section level.
      • setSectionLevel

        protected void setSectionLevel​(int newLevel)
        Set the current section level.
        Parameters:
        newLevel - the new section level.
      • verbatim_

        protected void verbatim_()
        Stop verbatim mode.
      • verbatim

        protected void verbatim()
        Start verbatim mode.
      • isVerbatim

        protected boolean isVerbatim()
        Checks if we are currently inside a <pre> tag.
        Returns:
        true if we are currently in verbatim mode.
      • isScriptBlock

        protected boolean isScriptBlock()
        Checks if we are currently inside a <script> tag.
        Returns:
        true if we are currently inside <script> tags.
        Since:
        1.1.1
      • validAnchor

        protected java.lang.String validAnchor​(java.lang.String id)
        Checks if the given id is a valid Doxia id and if not, returns a transformed one.
        Parameters:
        id - The id to validate.
        Returns:
        A transformed id or the original id if it was already valid.
        See Also:
        DoxiaUtils.encodeId(String)
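
        The validate-or-transform contract can be sketched as follows. The rules used here are assumptions chosen for illustration (an id must start with an ASCII letter and may continue with letters, digits, '_', '.', ':' or '-'); the real rules are defined by DoxiaUtils.encodeId(String) and may differ. AnchorSketch is a hypothetical class.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; the real rules live in DoxiaUtils and may differ. */
class AnchorSketch {
    // Assumed validity rule: a letter followed by letters, digits, '_', '.', ':' or '-'.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_.:-]*");

    static String validAnchor(String id) {
        if (VALID.matcher(id).matches()) {
            return id; // already valid, returned unchanged
        }
        // Assumed transformation: trim, turn spaces into underscores,
        // and drop every remaining character the rule does not allow.
        String cleaned = id.trim()
                .replace(' ', '_')
                .replaceAll("[^A-Za-z0-9_.:-]", "");
        // Prefix a letter if the result does not start with one.
        if (cleaned.isEmpty() || !Character.isLetter(cleaned.charAt(0))) {
            cleaned = "a" + cleaned;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(validAnchor("intro"));
        System.out.println(validAnchor("my section"));
        System.out.println(validAnchor("1st"));
    }
}
```

        Under these assumed rules a valid id passes through untouched, while an invalid one always comes back in a form that satisfies the same rule.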
      • handleAEnd

        private void handleAEnd​(Sink sink)
      • handleAStart

        private void handleAStart​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                  Sink sink,
                                  SinkEventAttributeSet attribs)
      • handleDivStart

        private boolean handleDivStart​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                       SinkEventAttributeSet attribs,
                                       Sink sink)
      • handleFigureCaptionEnd

        private void handleFigureCaptionEnd​(Sink sink)
      • handleImgStart

        private void handleImgStart​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    Sink sink,
                                    SinkEventAttributeSet attribs)
      • handleListItemEnd

        private void handleListItemEnd​(Sink sink)
      • handleOLStart

        private void handleOLStart​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                   Sink sink,
                                   SinkEventAttributeSet attribs)
      • handleTableStart

        private void handleTableStart​(Sink sink,
                                      SinkEventAttributeSet attribs,
                                      org.codehaus.plexus.util.xml.pull.XmlPullParser parser)
      • logMessage

        private void logMessage​(java.lang.String key,
                                java.lang.String msg)
        If debug mode is enabled, logs the msg as is; otherwise adds the msg to warnMessages unless it is already recorded there.
        Parameters:
        key - not null
        msg - not null
        Since:
        1.1.1
        See Also:
        parse(Reader, Sink)
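
        A minimal sketch of this de-duplication, assuming a Map from error type to a Set of messages as described for warnMessages. WarnCollectorSketch is a hypothetical class, the "logged" list stands in for a real logger, and the flushing done in logWarnings() is an assumption, since that method is not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of collecting unique warnings per error type. */
class WarnCollectorSketch {
    private final boolean debug;
    private final Map<String, Set<String>> warnMessages = new LinkedHashMap<>();
    // Stand-in for a real logger so the behavior can be inspected.
    private final List<String> logged = new ArrayList<>();

    WarnCollectorSketch(boolean debug) {
        this.debug = debug;
    }

    void logMessage(String key, String msg) {
        if (debug) {
            // Debug mode: every message is logged as is, duplicates included.
            logged.add(msg);
            return;
        }
        // Otherwise remember the message once per key; the Set drops repeats.
        warnMessages.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(msg);
    }

    /** Assumed behavior: emit each collected message once, then reset. */
    void logWarnings() {
        for (Map.Entry<String, Set<String>> entry : warnMessages.entrySet()) {
            for (String msg : entry.getValue()) {
                logged.add(entry.getKey() + ": " + msg);
            }
        }
        warnMessages.clear();
    }

    List<String> logged() {
        return logged;
    }
}
```

        Using insertion-ordered collections keeps the warnings in the order they were first seen, which is a design choice of this sketch rather than something the description above requires.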
      • logWarnings

        private void logWarnings()
        Since:
        1.1.1