public final class ScannerImpl extends java.lang.Object implements Scanner
Scanner produces tokens of the following types:
STREAM-START
STREAM-END
COMMENT
DIRECTIVE(name, value)
DOCUMENT-START
DOCUMENT-END
BLOCK-SEQUENCE-START
BLOCK-MAPPING-START
BLOCK-END
FLOW-SEQUENCE-START
FLOW-MAPPING-START
FLOW-SEQUENCE-END
FLOW-MAPPING-END
BLOCK-ENTRY
FLOW-ENTRY
KEY
VALUE
ALIAS(value)
ANCHOR(value)
TAG(value)
SCALAR(value, plain, style)
Read comments in the Scanner code for more details.
Modifier and Type | Class and Description |
---|---|
private static class |
ScannerImpl.Chomping
Chomping the tail may have 3 values - yes, no, not defined.
|
Modifier and Type | Field and Description |
---|---|
private boolean |
allowSimpleKey
A simple key is a key that is not denoted by the '?' indicator.
|
private boolean |
done |
static java.util.Map<java.lang.Character,java.lang.Integer> |
ESCAPE_CODES
A mapping from a character to a number of bytes to read-ahead for that escape sequence.
|
static java.util.Map<java.lang.Character,java.lang.String> |
ESCAPE_REPLACEMENTS
A mapping from an escaped character in the input stream to the string representation that they
should be replaced with.
|
private int |
flowLevel |
private int |
indent |
private ArrayStack<java.lang.Integer> |
indents |
private Token |
lastToken |
private LoaderOptions |
loaderOptions |
private static java.util.regex.Pattern |
NOT_HEXA
A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f).
|
private boolean |
parseComments |
private java.util.Map<java.lang.Integer,SimpleKey> |
possibleSimpleKeys |
private StreamReader |
reader |
private java.util.List<Token> |
tokens |
private int |
tokensTaken |
Constructor and Description |
---|
ScannerImpl(StreamReader reader) |
ScannerImpl(StreamReader reader,
LoaderOptions options) |
Modifier and Type | Method and Description |
---|---|
private void |
addAllTokens(java.util.List<Token> tokens) |
private boolean |
addIndent(int column)
Check if we need to increase indentation.
|
private void |
addToken(int index,
Token token) |
private void |
addToken(Token token) |
private boolean |
atEndOfPlain() |
private boolean |
checkBlockEntry()
Returns true if the next thing on the reader is a block token.
|
private boolean |
checkDirective()
Returns true if the next thing on the reader is a directive, given that the leading '%' has
already been checked.
|
private boolean |
checkDocumentEnd()
Returns true if the next thing on the reader is a document-end ("...").
|
private boolean |
checkDocumentStart()
Returns true if the next thing on the reader is a document-start ("---").
|
private boolean |
checkKey()
Returns true if the next thing on the reader is a key token.
|
private boolean |
checkPlain()
Returns true if the next thing on the reader is a plain token.
|
boolean |
checkToken(Token.ID... choices)
Check whether the next token is one of the given types.
|
private boolean |
checkValue()
Returns true if the next thing on the reader is a value token.
|
private java.lang.String |
escapeChar(java.lang.String chRepresentation)
This is implemented in CharConstants in SnakeYAML Engine
|
private void |
fetchAlias()
Fetch an alias, which is a reference to an anchor.
|
private void |
fetchAnchor()
Fetch an anchor.
|
private void |
fetchBlockEntry()
Fetch an entry in the block style.
|
private void |
fetchBlockScalar(char style)
Fetch a block scalar (literal or folded).
|
private void |
fetchDirective()
Fetch a YAML directive.
|
private void |
fetchDocumentEnd()
Fetch a document-end token ("...").
|
private void |
fetchDocumentIndicator(boolean isDocumentStart)
Fetch a document indicator, either "---" for "document-start", or else "..." for "document-end".
|
private void |
fetchDocumentStart()
Fetch a document-start token ("---").
|
private void |
fetchDouble()
Fetch a double-quoted (") scalar.
|
private void |
fetchFlowCollectionEnd(boolean isMappingEnd)
Fetch a flow-style collection end, which is either a sequence or a mapping.
|
private void |
fetchFlowCollectionStart(boolean isMappingStart)
Fetch a flow-style collection start, which is either a sequence or a mapping.
|
private void |
fetchFlowEntry()
Fetch an entry in the flow style.
|
private void |
fetchFlowMappingEnd() |
private void |
fetchFlowMappingStart() |
private void |
fetchFlowScalar(char style)
Fetch a flow scalar (single- or double-quoted).
|
private void |
fetchFlowSequenceEnd() |
private void |
fetchFlowSequenceStart() |
private void |
fetchFolded()
Fetch a folded scalar, denoted with a greater-than sign.
|
private void |
fetchKey()
Fetch a key in a block-style mapping.
|
private void |
fetchLiteral()
Fetch a literal scalar, denoted with a vertical-bar.
|
private void |
fetchMoreTokens()
Fetch one or more tokens from the StreamReader.
|
private void |
fetchPlain()
Fetch a plain scalar.
|
private void |
fetchSingle()
Fetch a single-quoted (') scalar.
|
private void |
fetchStreamEnd() |
private void |
fetchStreamStart()
We always add STREAM-START as the first token and STREAM-END as the last token.
|
private void |
fetchTag()
Fetch a tag.
|
private void |
fetchValue()
Fetch a value in a block-style mapping.
|
Token |
getToken()
Return the next token, removing it from the queue.
|
boolean |
isParseComments()
Deprecated.
|
private java.util.List<Token> |
makeTokenList(Token... tokens) |
private boolean |
needMoreTokens()
Returns true if more tokens should be scanned.
|
private int |
nextPossibleSimpleKey()
Return the number of the nearest possible simple key.
|
Token |
peekToken()
Return the next token, but do not delete it from the queue.
|
private void |
removePossibleSimpleKey()
Remove the saved possible key position at the current flow level.
|
private void |
savePossibleSimpleKey()
The next token may start a simple key.
|
private Token |
scanAnchor(boolean isAnchor)
The YAML 1.1 specification does not restrict characters for anchors and
aliases.
|
private java.util.List<Token> |
scanBlockScalar(char style) |
private java.lang.Object[] |
scanBlockScalarBreaks(int indent) |
private CommentToken |
scanBlockScalarIgnoredLine(Mark startMark)
Scan to the end of the line after a block scalar has been scanned; the only things that are
permitted at this time are comments and spaces.
|
private java.lang.Object[] |
scanBlockScalarIndentation()
Scans for the indentation of a block scalar implicitly.
|
private ScannerImpl.Chomping |
scanBlockScalarIndicators(Mark startMark)
Scan a block scalar indicator.
|
private CommentToken |
scanComment(CommentType type) |
private java.util.List<Token> |
scanDirective() |
private CommentToken |
scanDirectiveIgnoredLine(Mark startMark) |
private java.lang.String |
scanDirectiveName(Mark startMark)
Scan a directive name.
|
private Token |
scanFlowScalar(char style)
Scan a flow-style scalar.
|
private java.lang.String |
scanFlowScalarBreaks(Mark startMark) |
private java.lang.String |
scanFlowScalarNonSpaces(boolean doubleQuoted,
Mark startMark)
Scan some number of flow-scalar non-space characters.
|
private java.lang.String |
scanFlowScalarSpaces(Mark startMark) |
private java.lang.String |
scanLineBreak()
Scan a line break, transforming:
|
private Token |
scanPlain()
Scan a plain scalar.
|
private java.lang.String |
scanPlainSpaces()
See the specification for details.
|
private Token |
scanTag()
Scan a Tag property.
|
private java.lang.String |
scanTagDirectiveHandle(Mark startMark)
Scan a %TAG directive's handle.
|
private java.lang.String |
scanTagDirectivePrefix(Mark startMark)
Scan a %TAG directive's prefix.
|
private java.util.List<java.lang.String> |
scanTagDirectiveValue(Mark startMark)
Read a %TAG directive value:
|
private java.lang.String |
scanTagHandle(java.lang.String name,
Mark startMark)
Scan a Tag handle.
|
private java.lang.String |
scanTagUri(java.lang.String name,
Mark startMark)
Scan a Tag URI.
|
private void |
scanToNextToken()
We ignore spaces, line breaks and comments.
|
private java.lang.String |
scanUriEscapes(java.lang.String name,
Mark startMark)
Scan a sequence of %-escaped URI escape codes and convert them into a String representing the
unescaped values.
|
private java.lang.Integer |
scanYamlDirectiveNumber(Mark startMark)
Read a %YAML directive number: this is either the major or the minor part.
|
private java.util.List<java.lang.Integer> |
scanYamlDirectiveValue(Mark startMark) |
ScannerImpl |
setParseComments(boolean parseComments)
Deprecated.
|
private void |
stalePossibleSimpleKeys()
Remove entries that are no longer possible simple keys.
|
private void |
unwindIndent(int col)
Handle implicitly ending multiple levels of block nodes by decreased indentation.
|
private static final java.util.regex.Pattern NOT_HEXA
public static final java.util.Map<java.lang.Character,java.lang.String> ESCAPE_REPLACEMENTS
public static final java.util.Map<java.lang.Character,java.lang.Integer> ESCAPE_CODES
\xHH : escaped 8-bit Unicode character
\uHHHH : escaped 16-bit Unicode character
\UHHHHHHHH : escaped 32-bit Unicode character
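A minimal sketch of how a table like ESCAPE_CODES can drive decoding of the three numeric escapes listed above. The class name `EscapeSketch`, the `decode` helper, and the map contents (2, 4 and 8 hex digits) are assumptions made for illustration; this is not the library's code.

```java
import java.util.Map;

public class EscapeSketch {
    // Assumed table: escape marker -> number of hex digits that follow it.
    static final Map<Character, Integer> ESCAPE_CODES = Map.of('x', 2, 'u', 4, 'U', 8);

    /** Decodes one numeric escape; pos points at the marker just past the backslash. */
    static String decode(String text, int pos) {
        char marker = text.charAt(pos);
        Integer length = ESCAPE_CODES.get(marker);
        if (length == null) {
            throw new IllegalArgumentException("not a numeric escape marker: " + marker);
        }
        // Read exactly `length` hex digits and turn them into one code point.
        String hex = text.substring(pos + 1, pos + 1 + length);
        int codePoint = Integer.parseInt(hex, 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decode("x41", 0));                                        // A
        System.out.println(Integer.toHexString(decode("U0001F600", 0).codePointAt(0))); // 1f600
    }
}
```

Error handling is reduced to the exceptions thrown by `substring`, `parseInt` and `toChars`; a real scanner would report a positioned scanner error instead.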
private final StreamReader reader
private boolean done
private int flowLevel
private final java.util.List<Token> tokens
private Token lastToken
private int tokensTaken
private int indent
private final ArrayStack<java.lang.Integer> indents
private boolean parseComments
private final LoaderOptions loaderOptions
private boolean allowSimpleKey
A simple key is a key that is not denoted by the '?' indicator. Example of simple keys:
  ---
  block simple key: value
  ? not a simple key:
  : { flow simple key: value }
We emit the KEY token before all keys, so when we find a potential simple key, we try to locate the corresponding ':' indicator. Simple keys should be limited to a single line and 1024 characters.
Can a simple key start at the current position? A simple key may start:
- at the beginning of the line, not counting indentation spaces (in block context),
- after '{', '[', ',' (in the flow context),
- after '?', ':', '-' (in the block context).
In the block context, this flag also signifies if a block collection may start at the current position.
private final java.util.Map<java.lang.Integer,SimpleKey> possibleSimpleKeys
public ScannerImpl(StreamReader reader)
public ScannerImpl(StreamReader reader, LoaderOptions options)
@Deprecated public ScannerImpl setParseComments(boolean parseComments)
Deprecated. Sets whether the scanner parses comments into CommentToken or ignores them.
parseComments - true to parse; false to ignore
@Deprecated public boolean isParseComments()
public boolean checkToken(Token.ID... choices)
Specified by: checkToken in interface Scanner
choices - token IDs to match with
Returns: true if the next token is one of the given types. Returns false if no more tokens are available.
public Token peekToken()
Specified by: peekToken in interface Scanner
See also: Scanner.getToken()
public Token getToken()
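The contract of checkToken, peekToken and getToken can be sketched with a plain queue. Everything below (`TokenQueueSketch`, the reduced `Id` enum, and treating an empty `choices` list as "any token") is a hypothetical stand-in for illustration and does not use the library's classes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class TokenQueueSketch {
    // Reduced, hypothetical set of token IDs.
    enum Id { STREAM_START, SCALAR, STREAM_END }

    private final Deque<Id> tokens = new ArrayDeque<>();

    TokenQueueSketch(Id... ids) {
        tokens.addAll(Arrays.asList(ids));
    }

    /** True if the next token is one of the choices; false if no tokens are left. */
    boolean checkToken(Id... choices) {
        Id next = tokens.peekFirst();
        if (next == null) {
            return false;
        }
        // Assumption of this sketch: no choices means "is any token available?".
        return choices.length == 0 || Arrays.asList(choices).contains(next);
    }

    /** Returns the next token and leaves it in the queue. */
    Id peekToken() {
        return tokens.peekFirst();
    }

    /** Returns the next token and removes it from the queue. */
    Id getToken() {
        return tokens.pollFirst();
    }

    public static void main(String[] args) {
        TokenQueueSketch scanner = new TokenQueueSketch(Id.STREAM_START, Id.SCALAR, Id.STREAM_END);
        while (scanner.checkToken()) {
            System.out.println(scanner.getToken());
        }
    }
}
```

A consumer typically calls checkToken to decide which branch to take, peekToken to inspect without committing, and getToken to advance.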
private void addToken(Token token)
private void addToken(int index, Token token)
private void addAllTokens(java.util.List<Token> tokens)
private boolean needMoreTokens()
private void fetchMoreTokens()
private java.lang.String escapeChar(java.lang.String chRepresentation)
private int nextPossibleSimpleKey()
private void stalePossibleSimpleKeys()
Remove entries that are no longer possible simple keys. According to the YAML specification, simple keys
- should be limited to a single line,
- should be no longer than 1024 characters.
Disabling this procedure will allow simple keys of any length and height (may cause problems if indentation is broken though).
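The two limits above can be sketched as a pruning pass over the saved candidates. `StaleKeysSketch`, `Candidate` and `removeStale` are hypothetical names, and the sketch omits the error that a real scanner would raise when a required key goes stale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StaleKeysSketch {
    /** Hypothetical stand-in for SimpleKey: line and character index where a candidate began. */
    static final class Candidate {
        final int line;
        final int index;

        Candidate(int line, int index) {
            this.line = line;
            this.index = index;
        }
    }

    static final int MAX_KEY_LENGTH = 1024;

    /** Drops every candidate that began on another line or more than 1024 characters back. */
    static void removeStale(Map<Integer, Candidate> byFlowLevel, int line, int index) {
        byFlowLevel.values().removeIf(
            key -> key.line != line || index - key.index > MAX_KEY_LENGTH);
    }

    public static void main(String[] args) {
        Map<Integer, Candidate> keys = new LinkedHashMap<>();
        keys.put(0, new Candidate(0, 0)); // began on an earlier line
        keys.put(1, new Candidate(2, 5)); // began on the current line
        removeStale(keys, 2, 10);
        System.out.println(keys.keySet()); // [1]
    }
}
```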
private void savePossibleSimpleKey()
private void removePossibleSimpleKey()
private void unwindIndent(int col)
Handle implicitly ending multiple levels of block nodes by decreased indentation. Indentation decreases on lines 4 and 7 of this example:
  1) book one:
  2)   part one:
  3)     chapter one
  4)   part two:
  5)     chapter one
  6)     chapter two
  7) book two:
In flow context, tokens should respect indentation. Actually the condition should be `self.indent >= column` according to the spec. But this condition will prohibit intuitively correct constructions such as
  key : {
  }
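A minimal sketch of the indent stack behind addIndent and unwindIndent, for the block context only. The class `UnwindIndentSketch` is hypothetical, and instead of queuing BLOCK-END tokens it returns how many would be emitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UnwindIndentSketch {
    private final Deque<Integer> indents = new ArrayDeque<>();
    private int indent = -1; // -1 means no block collection is open yet

    /** Opens a new block level if column is deeper than the current indent. */
    boolean addIndent(int column) {
        if (indent < column) {
            indents.push(indent);
            indent = column;
            return true;
        }
        return false;
    }

    /** Closes every level deeper than col and returns the number of BLOCK-END tokens due. */
    int unwindIndent(int col) {
        int blockEnds = 0;
        while (indent > col) {
            indent = indents.pop();
            blockEnds++;
        }
        return blockEnds;
    }

    public static void main(String[] args) {
        UnwindIndentSketch scanner = new UnwindIndentSketch();
        scanner.addIndent(0);
        scanner.addIndent(2);
        scanner.addIndent(4);
        System.out.println(scanner.unwindIndent(0));  // 2: the levels at columns 4 and 2 close
        System.out.println(scanner.unwindIndent(-1)); // 1: the level at column 0 closes
    }
}
```

Unwinding to column -1 corresponds to closing all open block collections, for example at the end of the stream.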
private boolean addIndent(int column)
private void fetchStreamStart()
private void fetchStreamEnd()
private void fetchDirective()
private void fetchDocumentStart()
private void fetchDocumentEnd()
private void fetchDocumentIndicator(boolean isDocumentStart)
private void fetchFlowSequenceStart()
private void fetchFlowMappingStart()
private void fetchFlowCollectionStart(boolean isMappingStart)
private void fetchFlowSequenceEnd()
private void fetchFlowMappingEnd()
private void fetchFlowCollectionEnd(boolean isMappingEnd)
private void fetchFlowEntry()
private void fetchBlockEntry()
private void fetchKey()
private void fetchValue()
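private void fetchAlias()
Fetch an alias, which is a reference to an anchor. Aliases take the form: *(anchor name)
private void fetchAnchor()
Fetch an anchor. Anchors take the form: &(anchor name)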
private void fetchTag()
private void fetchLiteral()
private void fetchFolded()
private void fetchBlockScalar(char style)
private void fetchSingle()
private void fetchDouble()
private void fetchFlowScalar(char style)
private void fetchPlain()
private boolean checkDirective()
private boolean checkDocumentStart()
private boolean checkDocumentEnd()
private boolean checkBlockEntry()
private boolean checkKey()
private boolean checkValue()
private boolean checkPlain()
private void scanToNextToken()
We ignore spaces, line breaks and comments. If we find a line break in the block context, we set the flag `allow_simple_key` on.
The byte order mark is stripped if it's the first character in the stream. We do not yet support BOM inside the stream as the specification requires. Any such mark will be considered as a part of the document.
TODO: We need to make tab handling rules more sane. A good rule is: tabs cannot precede tokens BLOCK-SEQUENCE-START, BLOCK-MAPPING-START, BLOCK-END, KEY(block), VALUE(block), BLOCK-ENTRY. So the checking code is
  if <TAB>: self.allow_simple_keys = False
We also need to add the check for `allow_simple_keys == True` to `unwind_indent` before issuing BLOCK-END. Scanners for block, flow, and plain scalars need to be modified.
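The skipping loop described above can be sketched on a plain string. `SkipIgnoredSketch` and `nextTokenStart` are hypothetical names; the sketch recognises only '\n' and '\r' as line breaks and does not track the simple-key flag.

```java
public class SkipIgnoredSketch {
    /** Returns the index of the first character that is not a space, comment or line break. */
    static int nextTokenStart(String text) {
        int pos = 0;
        // A byte order mark is skipped only when it is the very first character.
        if (!text.isEmpty() && text.charAt(0) == 0xFEFF) {
            pos = 1;
        }
        boolean found = false;
        while (!found) {
            while (pos < text.length() && text.charAt(pos) == ' ') {
                pos++;
            }
            if (pos < text.length() && text.charAt(pos) == '#') {
                // A comment runs up to, but not including, the line break.
                while (pos < text.length() && !isBreak(text.charAt(pos))) {
                    pos++;
                }
            }
            if (pos < text.length() && isBreak(text.charAt(pos))) {
                pos++; // consume the break and look at the next line
            } else {
                found = true;
            }
        }
        return pos;
    }

    private static boolean isBreak(char ch) {
        return ch == '\n' || ch == '\r';
    }

    public static void main(String[] args) {
        System.out.println(nextTokenStart("  # note\n\nkey: value")); // 10
    }
}
```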
private CommentToken scanComment(CommentType type)
private java.util.List<Token> scanDirective()
private java.lang.String scanDirectiveName(Mark startMark)
private java.util.List<java.lang.Integer> scanYamlDirectiveValue(Mark startMark)
private java.lang.Integer scanYamlDirectiveNumber(Mark startMark)
private java.util.List<java.lang.String> scanTagDirectiveValue(Mark startMark)
Read a %TAG directive value:
s-ignored-space+ c-tag-handle s-ignored-space+ ns-tag-prefix s-l-comments
private java.lang.String scanTagDirectiveHandle(Mark startMark)
startMark - beginning of the handle
private java.lang.String scanTagDirectivePrefix(Mark startMark)
private CommentToken scanDirectiveIgnoredLine(Mark startMark)
private Token scanAnchor(boolean isAnchor)
The YAML 1.1 specification does not restrict characters for anchors and aliases. This may lead to problems; see https://bitbucket.org/snakeyaml/snakeyaml/issues/485/alias-names-are-too-permissive-compared-to. This implementation tries to follow https://github.com/yaml/yaml-spec/blob/master/rfc/RFC-0003.md
private Token scanTag()
Scan a Tag property. A Tag property may be specified in one of three ways: c-verbatim-tag, c-ns-shorthand-tag, or c-ns-non-specific-tag
c-verbatim-tag takes the form !<ns-uri-char+> and must be delivered verbatim (as-is) to the application. In particular, verbatim tags are not subject to tag resolution.
c-ns-shorthand-tag is a valid tag handle followed by a non-empty suffix. If the tag handle is a c-primary-tag-handle ('!') then the suffix must have all exclamation marks properly URI-escaped (%21); otherwise, the string will look like a named tag handle: !foo!bar would be interpreted as (handle="!foo!", suffix="bar").
c-ns-non-specific-tag is always a lone '!'; this is only useful for plain scalars, where its specification means that the scalar MUST be resolved to have type tag:yaml.org,2002:str.
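The three tag forms can be told apart with a few string checks. The sketch below is illustrative only: `TagFormSketch` and `parse` are hypothetical, the input is assumed to start with '!', and no character-level validation (URI characters, non-empty suffix) is performed.

```java
public class TagFormSketch {
    /** Returns {kind, handle, suffix}; handle is empty for verbatim and non-specific tags. */
    static String[] parse(String tag) {
        if (tag.equals("!")) {
            return new String[] {"c-ns-non-specific-tag", "", ""};
        }
        if (tag.startsWith("!<") && tag.endsWith(">")) {
            // Verbatim: everything between the angle brackets is taken as-is.
            return new String[] {"c-verbatim-tag", "", tag.substring(2, tag.length() - 1)};
        }
        // Shorthand: the handle ends at the second '!' if present, else it is the primary "!".
        int second = tag.indexOf('!', 1);
        String handle = second < 0 ? "!" : tag.substring(0, second + 1);
        return new String[] {"c-ns-shorthand-tag", handle, tag.substring(handle.length())};
    }

    public static void main(String[] args) {
        String[] parts = parse("!foo!bar");
        System.out.println(parts[1] + " " + parts[2]); // !foo! bar
    }
}
```

With this split, an unescaped '!' in the suffix of a primary-handle tag is read as the end of a named handle, which is why the text above requires it to be written as %21.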
TODO SnakeYaml incorrectly ignores c-ns-non-specific-tag right now.
private java.util.List<Token> scanBlockScalar(char style)
private ScannerImpl.Chomping scanBlockScalarIndicators(Mark startMark)
private CommentToken scanBlockScalarIgnoredLine(Mark startMark)
private java.lang.Object[] scanBlockScalarIndentation()
private java.lang.Object[] scanBlockScalarBreaks(int indent)
private Token scanFlowScalar(char style)
See the specification for details. Note that we loosen the indentation rules for quoted scalars. Quoted scalars don't need to adhere to indentation because " and ' clearly mark their beginning and end. Therefore we are less restrictive than the specification requires. We only need to check that document separators are not included in scalars.
private java.lang.String scanFlowScalarNonSpaces(boolean doubleQuoted, Mark startMark)
private java.lang.String scanFlowScalarSpaces(Mark startMark)
private java.lang.String scanFlowScalarBreaks(Mark startMark)
private Token scanPlain()
See the specification for details. We add an additional restriction for the flow context: plain scalars in the flow context cannot contain ',', ':' and '?'. We also keep track of the `allow_simple_key` flag here. Indentation rules are loosened for the flow context.
private boolean atEndOfPlain()
private java.lang.String scanPlainSpaces()
private java.lang.String scanTagHandle(java.lang.String name, Mark startMark)
Scan a Tag handle. A Tag handle takes one of three forms:
"!" (c-primary-tag-handle)
"!!" (ns-secondary-tag-handle)
"!(name)!" (c-named-tag-handle)
Where (name) must be formatted as an ns-word-char.
private java.lang.String scanTagUri(java.lang.String name, Mark startMark)
Scan a Tag URI. This scanning is valid for both local and global tag directives, because both appear to be valid URIs as far as scanning is concerned. The difference may be distinguished later, in parsing. This method will scan for ns-uri-char*, which covers both cases.
This method performs no verification that the scanned URI conforms to any particular kind of URI specification.
private java.lang.String scanUriEscapes(java.lang.String name, Mark startMark)
Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.
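A minimal sketch of the conversion: collect the bytes of consecutive %HH escapes, then decode them together. `UriEscapeSketch` and `decodeEscapes` are hypothetical names, and decoding the bytes as UTF-8 is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UriEscapeSketch {
    /** Decodes the run of %HH escapes at the start of text and ignores whatever follows. */
    static String decodeEscapes(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int pos = 0;
        // Each escape is a percent sign followed by exactly two hex digits.
        while (pos + 2 < text.length() && text.charAt(pos) == '%') {
            bytes.write(Integer.parseInt(text.substring(pos + 1, pos + 3), 16));
            pos += 3;
        }
        // Decode all collected bytes at once so multi-byte characters stay intact.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeEscapes("%21"));        // !
        System.out.println(decodeEscapes("%41%42rest")); // AB
    }
}
```

Collecting into a growable buffer avoids any fixed limit on the length of the escape run.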
FIXME This method fails for more than 256 bytes' worth of URI-encoded characters in a row. Is this possible? Is this a use-case?
private java.lang.String scanLineBreak()
Scan a line break, transforming:
'\r\n' : '\n'
'\r' : '\n'
'\n' : '\n'
'\x85' : '\n'
default : ''
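The mapping above can be sketched as two small helpers, one reporting how many characters the break occupies and one returning its normalized form. `LineBreakSketch`, `breakLength` and the two-argument `scanLineBreak` are hypothetical and cover only the cases in the table.

```java
public class LineBreakSketch {
    /** Number of characters the line break at pos occupies, or 0 if there is none. */
    static int breakLength(String text, int pos) {
        if (pos >= text.length()) {
            return 0;
        }
        char ch = text.charAt(pos);
        if (ch == '\r' && pos + 1 < text.length() && text.charAt(pos + 1) == '\n') {
            return 2; // CR LF counts as a single break
        }
        if (ch == '\r' || ch == '\n' || ch == 0x85) {
            return 1;
        }
        return 0;
    }

    /** Normalized break at pos: a single line feed, or the empty string if there is no break. */
    static String scanLineBreak(String text, int pos) {
        return breakLength(text, pos) > 0 ? "\n" : "";
    }

    public static void main(String[] args) {
        System.out.println(breakLength("a\r\nb", 1));           // 2
        System.out.println(scanLineBreak("a\r\nb", 0).isEmpty()); // true
    }
}
```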