Module: oryx.experimental.matching.matcher

A basic pattern matching system.

The following is a self-contained pattern matching system based on a pattern matcher written for MIT class 6.945 in 2009. The API closesly mirrors the Scheme API found in rules by Alexey Radul. The star matching implementation is based on the one found in Autoconj by Matthew D. Hoffman, Matthew J. Johnson, and Dustin Tran. Specifically, the signature for matcher resembles the corresponding signature in rules and autoconj, but instead of returning a sequences matches from matcher directly, this library uses generators to return an iterator over matches.

This library is related to the matchpy library, but additionally supports the Choice and Star patterns which aren't yet supported in matchpy, along with a different approach to extensibility (single-dispatch functions). However, matchpy additionally supports faster many-to-one matchers and native support for associative and commutative operators.

Usage

By default, expressions are matched by Python equality. To build more sophisticated patterns that can match expressions that are not just equal to each other, we provide match combinators, or "higher-order patterns", like Not, Star and Choice.

match

The match method takes in a pattern and an expression to match. If the expression does not match the pattern, a MatchError is thrown. match returns a dictionary of "bindings", where a binding is a mapping of a name to a value. The bindings can be substituted into the pattern in order to make the pattern and expression equal. The Var pattern described below demonstrates how to create bindings in a match.

match(1, 1)  # ==> {}
match('abc', 'abc')  # ==> {}
match((1, 2), (1, 2))  # ==> {}
match({'a': 1, 'b': 2}, {'a': 1, 'b': 2})  # ==> {}
match(1, 2) # ==> MatchError!

A Sequence pattern matches a Sequence expression if each of its elements matches. Similarly, a dict pattern matches a dict expression if their keys are the same and the corresponding values match.

Also provided are is_match(pattern, expr), which returns True if the expression matches the pattern and False if not, and match_all(pattern, expr), which returns an iterator over all possible bindings that successfully make the pattern and expression equal.

Var

We can use Vars to define wildcard values that can match any expression. If a pattern containing Vars matches an expression, match will return a dictionary of bindings that maps the name of the Vars to the expressions that makes the pattern and expression equal. If the same Var appears multiple times in a pattern, the value needs to be the same across each of the instances for a successful match.

match(Var('x'), 1)  # ==> {'x': 1}
match((Var('x'), Var('x')), (1, 1))  # ==> {'x': 1}
match((Var('x'), Var('x')), (1, 2))  # ==> MatchError!
match({'a': 1, 'b': Var('x'}), {'a': 1, 'b': 2})  # ==> {'x': 2}

We can apply restrictions to Vars to condition the types of values they can match.

is_even = lambda x: x % 2 == 0
x = Var('x', restrictions=[is_even])
match(x, 2) # ==> {'x': 2}
match(x, 3) # ==> MatchError!

Dot is a Var that will match any expression without binding a name to a value. It is equivalent to Var(None).

match(Dot, 2) # ==> {}
match(Dot, (1, 2, 3)) # ==> {}

Not

Not takes a pattern and successfully matches an expression if the provided pattern does not match it.

match(Not(1), 2) # ==> {}
match(Not(2), 2) # ==> MatchError!
match(Not(Var('x')), 2) # ==> MatchError!

Choice

Choice takes in a sequence of patterns and tries matching each of them to an expression, and on first success, returns the successful match. Choice also takes in an optional name that is used when binding the successful match.

match(Choice(1, 2), 1) # ==> {}
match(Choice(1, 2), 2) # ==> {}
match(Choice(Var('x'), 2), 2) # ==> {'x': 2}
match(Choice((1, 2), (2, 1), name='x'), (2, 1)) # ==> {'x': (2, 1)}

Star, Plus, and Segment

The Star, Plus and Segment patterns are used when matching sequences. Specifically, they can match variable-length slices of a sequence. Star takes in a pattern and matches a sequence slice if all of its elements match the pattern. Star must be used inside of a sequence.

match((Star(1),), ()) # ==> {}
match((Star(1),), (1,)) # ==> {}
match((Star(1),), (1, 1)) # ==> {}
match((Star(1),), (1, 2)) # ==> MatchError!
match((3, Star(1), 2), (3, 1, 1, 1, 2)) # ==> {}

Plus is exactly like Star, but will not match zero-length slices.

match((Plus(1),), (1,)) # ==> {}
match((Plus(1),), ()) # ==> MatchError!

Star and Plus optionally take in a name that will be used to bind the matched slice. Like Var, if the same name appears in multiple places, the same bound value must be able to be substituted into the pattern in each element of the slice to make the expression match.

match((Star(1, name='x'),), (1, 1, 1)) # ==> {'x': (1, 1, 1)}
match((Star(1, name='x'), 2), (1, 1, 2)) # ==> {'x': (1, 1)}
match((Star(Dot, name='x'), 4), (1, 2, 3, 4))  # ==> {'x': (1, 2, 3)}

If the pattern inside of a Star has any names in it (from Vars or nested Stars), the same value needs to match for the entire slice. If name 'x' needs to be bound to 1 to make an element of a slice match, it needs to be 1 for every subsequent element of the sequence. For example, in the following snippets, x cannot be bound to multiple values to make the match succeed.

match((Star(Var('x')),), (1, 1)) # ==> {'x': 1}
match((Star(Var('x')),), (1, 2)) # ==> MatchError!

Alternatively, we can "accumulate" the bindings for a name inside of a Star by providing the name to the accumulate argument to Star. Star tries to match each element of a slice to its pattern. For Vars inside of a Star, this means that the value bound to its name must match across the slice. When we accumulate for a name inside of a Star, rather than enforcing that the match is the same across the slice, we collect the individual matches into a sequence and bind the name to the sequence.

match((Star(Var('x'), accumulate=['x']),), (1, 2, 3)) # ==> {'x': (1, 2, 3)}
match((Star((Var('x'), Var('y')), accumulate=['y']),), ((1, 2), (1, 3)))
# ==> {'x': 1, 'y': (2, 3)}

Another example of using an accumulating Star is to match expressions that exhibit the distributive property, i.e. a * b + a * c.

Mul = lambda *args: ('*', *args)
Add = lambda *args: ('+', *args)
distributive = Add(Star(Mul(Var('x'), Var('y')), accumulate=['y']))
match(distributive, Add(Mul('a', 'b'), Mul('a', 'c')))
# ==> {'x': 'a', 'y': ('b', 'c')}
match(distributive, Add(Mul('a', 'b'), Mul('c', 'd'))) # MatchError!

A Segment is shorthand for a named Star that has the Dot pattern, meaning it matches slices of a sequence regardless of the individual values. Specifically, Segment(name) matches the same expressions as Star(Dot, name=name).

match((Segment('a'),), (1, 1, 1)) # ==> {'a': (1, 1, 1)}

Extending the system

The pattern matching library is based on a single-dispatch function, matcher. matcher takes in a "pattern", which can be any Python object. It returns a function match(expr, bindings, succeed), where expr is an expression to match, bindings is a dictionary mapping names to values, and succeed is a function that takes in bindings (that could be extended by the match) and returns an iterator over results.

To extend the system to include new patterns, there are two options:

  1. We can register a custom type with matcher. matcher is a single-dispatch function, so we can override its behavior for custom types. By default, matcher returns a match that only succeeds if the pattern and expression are equal according to Python equality. To extend it for a custom type Foo that we'd like to match against, we can call matcher.register(Foo)(custom_matcher) to define a custom matcher for Foo objects. This is how we define matchers for sequences and dictionaries.

  2. Alternatively, we provide the Pattern class which is the parent class of the various combinators (Choice, Not, Star, etc.). By subclassing Pattern and implementing the match(self, expr, bindings, succeed) method, the class is automatically registered with matcher.

Classes

class Choice: Tries to match any of a sequence of patterns.

class MatchError: Raised when unable to find a pattern match.

class Not: Matches successfully if the input pattern is not matched.

class Pattern: Objects that are auto-registered with matcher.

class Plus: A pattern for repeated sub-patterns inside of a sequence.

class Segment: Matches any slice of a sequence.

class Star: A pattern for repeated sub-patterns inside of a sequence.

class Var: Adds a named "wildcard" pattern that can match anything.

Functions

is_match(...): Returns whether or not an expression matches a pattern.

match(...): Returns a single match for pattern and expression or errors otherwise.

match_all(...): Returns an iterator over all bindings matching a pattern to an expression.

matcher(...): Returns a function that determines if an expression matches the given pattern.

Dot Instance of oryx.experimental.matching.matcher.Var