sbt/src/sphinx/Detailed-Topics/Parsing-Input.rst

173 lines
6.3 KiB
ReStructuredText

==========================
Parsing and tab completion
==========================
This page describes the parser combinators in sbt. These parser
combinators are typically used to parse user input and provide tab
completion for :doc:`/Extending/Input-Tasks` and :doc:`/Extending/Commands`. If you are already
familiar with Scala's parser combinators, the methods are mostly the
same except that their arguments are strict. There are two additional
methods for controlling tab completion that are discussed at the end of
the section.
Parser combinators build up a parser from smaller parsers. A
`Parser[T]` in its most basic usage is a function
`String => Option[T]`. It accepts a `String` to parse and produces a
value wrapped in `Some` if parsing succeeds or `None` if it fails.
Error handling and tab completion make this picture more complicated,
but we'll stick with `Option` for this discussion.
The following examples assume the imports: ::
import sbt._
import complete.DefaultParsers._
Basic parsers
-------------
The simplest parser combinators match exact inputs:
::
// A parser that succeeds if the input is 'x', returning the Char 'x'
// and failing otherwise
val singleChar: Parser[Char] = 'x'
// A parser that succeeds if the input is "blue", returning the String "blue"
// and failing otherwise
val litString: Parser[String] = "blue"
In these examples, implicit conversions produce a literal `Parser`
from a `Char` or `String`. Other basic parser constructors are the
`charClass`, `success` and `failure` methods:
::
// A parser that succeeds if the character is a digit, returning the matched Char
// The second argument, "digit", describes the parser and is used in error messages
val digit: Parser[Char] = charClass( (c: Char) => c.isDigit, "digit")
// A parser that produces the value 3 for an empty input string, fails otherwise
val alwaysSucceed: Parser[Int] = success( 3 )
// Represents failure (always returns None for an input String).
// The argument is the error message.
val alwaysFail: Parser[Nothing] = failure("Invalid input.")
Built-in parsers
----------------
sbt comes with several built-in parsers defined in `sbt.complete.DefaultParsers <../../api/sbt/complete/DefaultParsers$.html>`_.
Some commonly used built-in parsers are:
* `Space`, `NotSpace`, `OptSpace`, and `OptNotSpace` for parsing spaces or non-spaces, required or not.
* `StringBasic` for parsing text that may be quoted.
* `IntBasic` for parsing a signed Int value.
* `Digit` and `HexDigit` for parsing a single decimal or hexadecimal digit.
* `Bool` for parsing a `Boolean` value
See the `DefaultParsers API <../../api/sbt/complete/DefaultParsers$.html>`_ for details.
Combining parsers
-----------------
We build on these basic parsers to construct more interesting parsers.
We can combine parsers in a sequence, choose between parsers, or repeat
a parser.
::
// A parser that succeeds if the input is "blue" or "green",
// returning the matched input
val color: Parser[String] = "blue" | "green"
// A parser that matches either "fg" or "bg"
val select: Parser[String] = "fg" | "bg"
// A parser that matches "fg" or "bg", a space, and then the color, returning the matched values.
// ~ is an alias for Tuple2.
val setColor: Parser[String ~ Char ~ String] =
select ~ ' ' ~ color
// Often, we don't care about the value matched by a parser, such as the space above
// For this, we can use ~> or <~, which keep the result of
// the parser on the right or left, respectively
val setColor2: Parser[String ~ String] = select ~ (' ' ~> color)
// Match one or more digits, returning a list of the matched characters
val digits: Parser[Seq[Char]] = charClass(_.isDigit, "digit").+
// Match zero or more digits, returning a list of the matched characters
val digits0: Parser[Seq[Char]] = charClass(_.isDigit, "digit").*
// Optionally match a digit
val optDigit: Parser[Option[Char]] = charClass(_.isDigit, "digit").?
Transforming results
--------------------
A key aspect of parser combinators is transforming results along the way
into more useful data structures. The fundamental methods for this are
`map` and `flatMap`. Here are examples of `map` and some
convenience methods implemented on top of `map`.
::
// Apply the `digits` parser and apply the provided function to the matched
// character sequence
val num: Parser[Int] = digits map { (chars: Seq[Char]) => chars.mkString.toInt }
// Match a digit character, returning the matched character or return '0' if the input is not a digit
val digitWithDefault: Parser[Char] = charClass(_.isDigit, "digit") ?? '0'
// The previous example is equivalent to:
val digitDefault: Parser[Char] =
charClass(_.isDigit, "digit").? map { (d: Option[Char]) => d getOrElse '0' }
// Succeed if the input is "blue" and return the value 4
val blue = "blue" ^^^ 4
// The above is equivalent to:
val blueM = "blue" map { (s: String) => 4 }
Controlling tab completion
--------------------------
Most parsers have reasonable default tab completion behavior. For
example, the string and character literal parsers will suggest the
underlying literal for an empty input string. However, it is impractical
to determine the valid completions for `charClass`, since it accepts
an arbitrary predicate. The `examples` method defines explicit
completions for such a parser:
::
val digit = charClass(_.isDigit, "digit").examples("0", "1", "2")
Tab completion will use the examples as suggestions. The other method
controlling tab completion is `token`. The main purpose of `token`
is to determine the boundaries for suggestions. For example, if your
parser is:
::
("fg" | "bg") ~ ' ' ~ ("green" | "blue")
then the potential completions on empty input are:
`console fg green fg blue bg green bg blue`
Typically, you want to suggest smaller segments or the number of
suggestions becomes unmanageable. A better parser is:
::
token( ("fg" | "bg") ~ ' ') ~ token("green" | "blue")
Now, the initial suggestions would be (with \_ representing a space):
`console fg_ bg_`
Be careful not to overlap or nest tokens, as in
`token("green" ~ token("blue"))`. The behavior is unspecified (and
should generate an error in the future), but typically the outer most
token definition will be used.