javax.swing.text.html.parser
public class Parser extends Object implements DTDConstants
A simple error-tolerant HTML parser that uses a DTD document to access data on the possible tokens, arguments and syntax.
The parser reads HTML content from a Reader and calls various notification methods (which should be overridden in a subclass) when tags or data are encountered.
Some HTML elements need no opening or closing tags. This parser also invokes the tag handling methods when the tags are not explicitly specified and must be inferred from the information stored in the DTD. For example, parsing the document
<table><tr><td>a<td>b<td>c</tr>
will invoke the handling methods in exactly the same order
(and with the same parameters) as if parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>
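The behavior described above can be observed with a small subclass that records each notification. This is a minimal sketch, not a reference implementation: the class name `EventLoggingParser` and its helper methods are hypothetical, and it assumes that constructing a `ParserDelegator` registers the default `html32` DTD so that `DTD.getDTD("html32")` returns it populated. The exact sequence of implied tags depends on the DTD in use, so no output is shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

// Hypothetical subclass: records every tag and text notification it receives.
class EventLoggingParser extends Parser {
    final List<String> events = new ArrayList<>();

    EventLoggingParser(DTD dtd) {
        super(dtd);
    }

    // Assumption: creating a ParserDelegator loads the default "html32" DTD
    // as a side effect, so DTD.getDTD can then return it populated.
    static DTD defaultDtd() throws IOException {
        new ParserDelegator();
        return DTD.getDTD("html32");
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        // fictional() is true for tags the parser inferred rather than read.
        events.add("start:" + tag.getElement().getName()
                + (tag.fictional() ? " (implied)" : ""));
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        events.add("end:" + tag.getElement().getName());
    }

    @Override
    protected void handleText(char[] text) {
        events.add("text:" + new String(text));
    }

    // Parses the given HTML and returns the recorded notifications in order.
    static List<String> eventsFor(String html) throws IOException {
        EventLoggingParser parser = new EventLoggingParser(defaultDtd());
        parser.parse(new StringReader(html));
        return parser.events;
    }

    public static void main(String[] args) throws IOException {
        for (String event : eventsFor("<table><tr><td>a<td>b<td>c</tr>")) {
            System.out.println(event);
        }
    }
}
```

Running `main` with the abbreviated document and then with the fully tagged one allows the two event lists to be compared side by side.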
Modifier and Type | Field and Description |
---|---|
protected DTD |
dtd
The document type definition (DTD) that will be used to parse the documents.
|
protected boolean |
strict
The value of this field determines whether or not the Parser will be
strict in enforcing SGML compatibility.
|
ANY, CDATA, CONREF, CURRENT, DEFAULT, EMPTY, ENDTAG, ENTITIES, ENTITY, FIXED, GENERAL, ID, IDREF, IDREFS, IMPLIED, MD, MODEL, MS, NAME, NAMES, NMTOKEN, NMTOKENS, NOTATION, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, PARAMETER, PI, PUBLIC, RCDATA, REQUIRED, SDATA, STARTTAG, SYSTEM
Constructor and Description |
---|
Parser(DTD a_dtd)
Creates a new parser that uses the given DTD to access data on the
possible tokens, arguments and syntax.
|
Modifier and Type | Method and Description |
---|---|
protected void |
endTag(boolean omitted)
Called when an HTML end (closing) tag is found, or when
the parser concludes that one should be present at the
current position.
|
protected void |
error(String msg)
Invokes the error handler.
|
protected void |
error(String msg,
String invalid)
Invokes the error handler.
|
protected void |
error(String parm1,
String parm2,
String parm3)
Invokes the error handler.
|
protected void |
error(String parm1,
String parm2,
String parm3,
String parm4)
Invokes the error handler.
|
protected void |
flushAttributes()
In this implementation, this is never called and returns without action.
|
protected SimpleAttributeSet |
getAttributes()
Get the attributes of the current tag.
|
protected int |
getCurrentLine()
Get the number of the document line being parsed.
|
protected int |
getCurrentPos()
Get the current position in the document being parsed.
|
protected void |
handleComment(char[] comment)
Handle HTML comment.
|
protected void |
handleEmptyTag(TagElement tag)
Handle the tag with no content, like <br>.
|
protected void |
handleEndTag(TagElement tag)
Called when an HTML closing tag (like </table>)
is found, or when the parser concludes that one should be present
at the current position.
|
protected void |
handleEOFInComment()
This is additionally called when the HTML content terminates
without closing the HTML comment.
|
protected void |
handleError(int line,
String message) |
protected void |
handleStartTag(TagElement tag)
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position.
|
protected void |
handleText(char[] text)
Handle the text section.
|
protected void |
handleTitle(char[] title)
Handle HTML <title> tag.
|
protected TagElement |
makeTag(Element element)
Constructs the tag from the given element.
|
protected TagElement |
makeTag(Element element,
boolean isSupposed)
Constructs the tag from the given element.
|
protected void |
markFirstTime(Element element)
This is called when the tag representing the given element
occurs for the first time in the document.
|
void |
parse(Reader reader)
Parse the HTML text, calling various methods in response to the
occurrence of the corresponding HTML constructs.
|
String |
parseDTDMarkup()
Parses DTD markup declaration.
|
protected boolean |
parseMarkupDeclarations(StringBuffer strBuff)
Parse DTD document declarations.
|
protected void |
startTag(TagElement tag)
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position.
|
protected boolean strict
public Parser(DTD a_dtd)
HTMLEditorKit.getParser()
a_dtd
- A DTD to use.
public void parse(Reader reader) throws IOException
reader
- The reader to read the source HTML from.
IOException
- If the reader throws one.
public String parseDTDMarkup() throws IOException
IOException
protected boolean parseMarkupDeclarations(StringBuffer strBuff) throws IOException
strBuff
IOException
protected SimpleAttributeSet getAttributes()
protected int getCurrentLine()
protected int getCurrentPos()
protected void endTag(boolean omitted)
omitted
- True if the tag is not actually present in the document,
but is inferred by the parser (like </html> at the end of the
document).
protected void error(String msg)
protected void error(String msg, String invalid)
protected void error(String parm1, String parm2, String parm3)
protected void error(String parm1, String parm2, String parm3, String parm4)
protected void flushAttributes()
protected void handleComment(char[] comment)
comment
- The comment being handled.
protected void handleEOFInComment()
protected void handleEmptyTag(TagElement tag) throws ChangedCharSetException
tag
- The tag being handled.
ChangedCharSetException
protected void handleEndTag(TagElement tag)
tag
- The tag being handled.
protected void handleError(int line, String message)
protected void handleStartTag(TagElement tag)
tag
- The tag being handled.
protected void handleText(char[] text)
For non-preformatted sections, the parser replaces \t, \r and \n with spaces and then collapses multiple spaces into a single space. Additionally, all whitespace around tags is discarded.
For preformatted text (inside TEXTAREA and PRE), the parser preserves all tabs and spaces, but removes one bounding \r, \n or \r\n, if present. Additionally, it replaces each occurrence of \r or \r\n with a single \n.
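The whitespace rules above can be checked empirically by collecting the chunks passed to handleText and printing them with control characters made visible. The following is a sketch under assumptions: `TextCollectingParser`, `visible` and `textOf` are hypothetical names, and the DTD is obtained by assuming that constructing a `ParserDelegator` registers the default `html32` DTD. Results may differ between implementations, so no expected output is given.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical subclass: keeps each text section exactly as delivered.
class TextCollectingParser extends Parser {
    final List<String> chunks = new ArrayList<>();

    TextCollectingParser(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleText(char[] text) {
        chunks.add(new String(text));
    }

    // Makes tabs and line breaks visible so they can be compared by eye.
    static String visible(String s) {
        return s.replace("\t", "\\t").replace("\r", "\\r").replace("\n", "\\n");
    }

    // Parses the given HTML and returns the text sections in document order.
    static List<String> textOf(String html) throws IOException {
        // Assumption: this constructor loads the default "html32" DTD.
        new ParserDelegator();
        TextCollectingParser parser = new TextCollectingParser(DTD.getDTD("html32"));
        parser.parse(new StringReader(html));
        return parser.chunks;
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>one \n\t two</p><pre>  keep\n  this</pre>";
        for (String chunk : textOf(html)) {
            System.out.println("[" + visible(chunk) + "]");
        }
    }
}
```

The brackets in the printed output mark the boundaries of each chunk, which makes leading and trailing whitespace easy to spot.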
text
- A section text.
protected void handleTitle(char[] title)
title
- The title text.
protected TagElement makeTag(Element element)
element
- the base element of the tag.
protected TagElement makeTag(Element element, boolean isSupposed)
element
- the tag base Element
isSupposed
- true if the tag is not actually present in the
HTML input, but the parser infers that it should occur at
the current location.
protected void markFirstTime(Element element)
element
protected void startTag(TagElement tag) throws ChangedCharSetException
tag
- The tag
ChangedCharSetException
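As an illustration of how handleError, handleComment, handleEmptyTag, getAttributes, getCurrentLine and getCurrentPos might be combined, the sketch below gathers diagnostics while parsing. It is a hedged example only: `DiagnosticParser` and its fields are hypothetical, the DTD is obtained by assuming that constructing a `ParserDelegator` registers the default `html32` DTD, and the call to flushAttributes() after copying the attributes is an assumption made so that attributes do not carry over to later tags in implementations where that method does clear them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

// Hypothetical subclass: collects errors, comments and empty-tag attributes.
class DiagnosticParser extends Parser {
    final List<String> errors = new ArrayList<>();
    final List<String> comments = new ArrayList<>();
    final List<String> emptyTags = new ArrayList<>();

    DiagnosticParser(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleError(int line, String message) {
        errors.add("line " + line + ": " + message);
    }

    @Override
    protected void handleComment(char[] comment) {
        comments.add("line " + getCurrentLine() + ", pos " + getCurrentPos()
                + ": " + new String(comment).trim());
    }

    @Override
    protected void handleEmptyTag(TagElement tag) throws ChangedCharSetException {
        // Copy the attributes of the current tag into a sorted map so the
        // recorded string does not depend on enumeration order.
        SimpleAttributeSet attrs = getAttributes();
        Map<String, String> copy = new TreeMap<>();
        Enumeration<?> names = attrs.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            copy.put(name.toString(), String.valueOf(attrs.getAttribute(name)));
        }
        emptyTags.add(tag.getElement().getName() + " " + copy);
        // Assumption: clearing here keeps attributes from leaking to later tags.
        flushAttributes();
    }

    // Parses the given HTML and returns the parser with its collected data.
    static DiagnosticParser run(String html) throws IOException {
        // Assumption: this constructor loads the default "html32" DTD.
        new ParserDelegator();
        DiagnosticParser parser = new DiagnosticParser(DTD.getDTD("html32"));
        parser.parse(new StringReader(html));
        return parser;
    }

    public static void main(String[] args) throws IOException {
        DiagnosticParser p = run("<p>intro <img src=\"x.png\" alt=\"X\"></b>\n<!-- note -->\n");
        p.emptyTags.forEach(System.out::println);
        p.comments.forEach(System.out::println);
        p.errors.forEach(System.out::println);
    }
}
```

Keeping the three lists separate makes it straightforward to decide later whether reported errors should abort processing or merely be logged.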