View source on GitHub |
A basic pattern matching system.
The following is a self-contained pattern matching system based on a pattern
matcher written for
MIT class 6.945 in 2009.
The API closesly mirrors the Scheme API found in
rules
by Alexey Radul. The star matching
implementation is based on the one found in
Autoconj
by Matthew D. Hoffman, Matthew J. Johnson, and Dustin Tran. Specifically, the
signature for matcher
resembles the corresponding signature in rules
and
autoconj
, but instead of returning a sequences matches from matcher
directly, this library uses generators to return an iterator over matches.
This library is related to the
matchpy
library, but additionally supports
the Choice
and Star
patterns which aren't yet supported in matchpy
, along
with a different approach to extensibility (single-dispatch functions). However,
matchpy
additionally supports faster many-to-one matchers and native support
for associative and commutative operators.
Usage
By default, expressions are matched by Python equality. To build more
sophisticated patterns that can match expressions that are not just equal
to each other, we provide match combinators, or "higher-order patterns", like
Not
, Star
and Choice
.
match
The match
method takes in a pattern and an expression to match.
If the expression does not match the pattern, a MatchError
is thrown.
match
returns a dictionary of "bindings", where a binding is a mapping of a
name to a value. The bindings can be substituted into the pattern in order to
make the pattern and expression equal. The Var
pattern described below
demonstrates how to create bindings in a match.
match(1, 1) # ==> {}
match('abc', 'abc') # ==> {}
match((1, 2), (1, 2)) # ==> {}
match({'a': 1, 'b': 2}, {'a': 1, 'b': 2}) # ==> {}
match(1, 2) # ==> MatchError!
A Sequence
pattern matches a Sequence
expression if each of its elements
matches.
Similarly, a dict
pattern matches a dict
expression if their keys are
the same and the corresponding values match.
Also provided are is_match(pattern, expr)
, which returns True
if the
expression matches the pattern and False
if not, and
match_all(pattern, expr)
, which returns an iterator over all possible
bindings that successfully make the pattern and expression equal.
Var
We can use Var
s to define wildcard values that can match any expression.
If a pattern containing Var
s matches an expression, match
will return a
dictionary of bindings that maps the name of the Var
s to the expressions that
makes the pattern and expression equal. If the same Var
appears multiple times
in a pattern, the value needs to be the same across each of the instances for a
successful match.
match(Var('x'), 1) # ==> {'x': 1}
match((Var('x'), Var('x')), (1, 1)) # ==> {'x': 1}
match((Var('x'), Var('x')), (1, 2)) # ==> MatchError!
match({'a': 1, 'b': Var('x'}), {'a': 1, 'b': 2}) # ==> {'x': 2}
We can apply restrictions to Var
s to condition the types of values they can
match.
is_even = lambda x: x % 2 == 0
x = Var('x', restrictions=[is_even])
match(x, 2) # ==> {'x': 2}
match(x, 3) # ==> MatchError!
Dot
is a Var
that will match any expression without binding a name
to a value. It is equivalent to Var(None)
.
match(Dot, 2) # ==> {}
match(Dot, (1, 2, 3)) # ==> {}
Not
Not
takes a pattern and successfully matches an expression if the provided
pattern does not match it.
match(Not(1), 2) # ==> {}
match(Not(2), 2) # ==> MatchError!
match(Not(Var('x')), 2) # ==> MatchError!
Choice
Choice
takes in a sequence of patterns and tries matching each of them to an
expression, and on first success, returns the successful match. Choice
also takes in an optional name that is used when binding the successful match.
match(Choice(1, 2), 1) # ==> {}
match(Choice(1, 2), 2) # ==> {}
match(Choice(Var('x'), 2), 2) # ==> {'x': 2}
match(Choice((1, 2), (2, 1), name='x'), (2, 1)) # ==> {'x': (2, 1)}
Star
, Plus
, and Segment
The Star
, Plus
and Segment
patterns are used when matching sequences.
Specifically, they can match variable-length slices of a sequence. Star
takes
in a pattern and matches a sequence slice if all of its elements match the
pattern.
Star
must be used inside of a sequence.
match((Star(1),), ()) # ==> {}
match((Star(1),), (1,)) # ==> {}
match((Star(1),), (1, 1)) # ==> {}
match((Star(1),), (1, 2)) # ==> MatchError!
match((3, Star(1), 2), (3, 1, 1, 1, 2)) # ==> {}
Plus
is exactly like Star
, but will not match zero-length slices.
match((Plus(1),), (1,)) # ==> {}
match((Plus(1),), ()) # ==> MatchError!
Star
and Plus
optionally take in a name
that will be used to bind the
matched slice. Like Var
, if the same name appears in multiple places, the same
bound value must be able to be substituted into the pattern in each element
of the slice to make the expression match.
match((Star(1, name='x'),), (1, 1, 1)) # ==> {'x': (1, 1, 1)}
match((Star(1, name='x'), 2), (1, 1, 2)) # ==> {'x': (1, 1)}
match((Star(Dot, name='x'), 4), (1, 2, 3, 4)) # ==> {'x': (1, 2, 3)}
If the pattern inside of a Star
has any names in it (from Var
s or
nested Star
s), the same value needs to match for the entire slice. If name
'x'
needs to be bound to 1
to make an element of a slice match, it needs to
be 1
for every subsequent element of the sequence. For example, in the
following
snippets, x
cannot be bound to multiple values to make the match succeed.
match((Star(Var('x')),), (1, 1)) # ==> {'x': 1}
match((Star(Var('x')),), (1, 2)) # ==> MatchError!
Alternatively, we can "accumulate" the bindings for a name inside of a Star
by providing the name to the accumulate
argument to Star
. Star
tries to
match each element of a slice to its pattern. For Var
s inside of a Star
,
this means that the value bound to its name must match across the slice.
When we accumulate for a name inside of a Star
, rather than enforcing that the
match is the same across the slice, we collect the individual matches into a
sequence and bind the name to the sequence.
match((Star(Var('x'), accumulate=['x']),), (1, 2, 3)) # ==> {'x': (1, 2, 3)}
match((Star((Var('x'), Var('y')), accumulate=['y']),), ((1, 2), (1, 3)))
# ==> {'x': 1, 'y': (2, 3)}
Another example of using an accumulating Star
is to match expressions that
exhibit the distributive property, i.e. a * b + a * c
.
Mul = lambda *args: ('*', *args)
Add = lambda *args: ('+', *args)
distributive = Add(Star(Mul(Var('x'), Var('y')), accumulate=['y']))
match(distributive, Add(Mul('a', 'b'), Mul('a', 'c')))
# ==> {'x': 'a', 'y': ('b', 'c')}
match(distributive, Add(Mul('a', 'b'), Mul('c', 'd'))) # MatchError!
A Segment
is shorthand for a named Star
that has the Dot
pattern,
meaning it matches slices of a sequence regardless of the individual values.
Specifically, Segment(name)
matches the same expressions as
Star(Dot, name=name)
.
match((Segment('a'),), (1, 1, 1)) # ==> {'a': (1, 1, 1)}
Extending the system
The pattern matching library is based on a single-dispatch function, matcher
.
matcher
takes in a "pattern", which can be any Python object. It returns a
function match(expr, bindings, succeed)
, where expr
is an expression to
match, bindings
is a dictionary mapping names to values, and succeed
is a function that takes in bindings
(that could be extended by the match)
and returns an iterator over results.
To extend the system to include new patterns, there are two options:
We can register a custom type with
matcher
.matcher
is a single-dispatch function, so we can override its behavior for custom types. By default,matcher
returns amatch
that only succeeds if the pattern and expression are equal according to Python equality. To extend it for a custom typeFoo
that we'd like to match against, we can callmatcher.register(Foo)(custom_matcher)
to define a custom matcher forFoo
objects. This is how we define matchers for sequences and dictionaries.Alternatively, we provide the
Pattern
class which is the parent class of the various combinators (Choice
,Not
,Star
, etc.). By subclassingPattern
and implementing thematch(self, expr, bindings, succeed)
method, the class is automatically registered withmatcher
.
Classes
class Choice
: Tries to match any of a sequence of patterns.
class MatchError
: Raised when unable to find a pattern match.
class Not
: Matches successfully if the input pattern is not matched.
class Pattern
: Objects that are auto-registered with matcher
.
class Plus
: A pattern for repeated sub-patterns inside of a sequence.
class Segment
: Matches any slice of a sequence.
class Star
: A pattern for repeated sub-patterns inside of a sequence.
class Var
: Adds a named "wildcard" pattern that can match anything.
Functions
is_match(...)
: Returns whether or not an expression matches a pattern.
match(...)
: Returns a single match for pattern and expression or errors otherwise.
match_all(...)
: Returns an iterator over all bindings matching a pattern to an expression.
matcher(...)
: Returns a function that determines if an expression matches the given pattern.
Other Members | |
---|---|
Dot |
Instance of oryx.experimental.matching.matcher.Var
|