Calculates the fingerprint of files in a URI matching split patterns.
tfx.components.example_gen.utils.calculate_splits_fingerprint_span_and_version(
input_base_uri: Text,
splits: Iterable[example_gen_pb2.Input.Split],
range_config: Optional[range_config_pb2.RangeConfig] = None
) -> Tuple[Text, int, Optional[int]]
If a pattern contains the {SPAN} placeholder or the date spec placeholders
{YYYY}, {MM}, and {DD}, and optionally the {VERSION} placeholder, this function
attempts to find aligned values that result in all splits having the target
span and the most recent version for that span.
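The alignment described above can be illustrated with a small, self-contained sketch. It is not the TFX implementation: the helper names (`pattern_to_regex`, `latest_span_and_version`, `aligned_span_and_version`), the in-memory list of relative paths, and the rule that `*` matches within one path segment are all assumptions made for illustration. It handles only `{SPAN}` and `{VERSION}`, not the date placeholders.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turns a split pattern containing {SPAN}/{VERSION} into a regex (hypothetical helper)."""
    escaped = re.escape(pattern)
    # Swap the escaped placeholders for named digit groups.
    escaped = escaped.replace(re.escape("{SPAN}"), r"(?P<span>\d+)")
    escaped = escaped.replace(re.escape("{VERSION}"), r"(?P<version>\d+)")
    # Assumption: "*" matches any characters within a single path segment.
    escaped = escaped.replace(re.escape("*"), r"[^/]*")
    return re.compile(escaped)


def latest_span_and_version(
    paths: Iterable[str], pattern: str
) -> Tuple[int, Optional[int]]:
    """Returns the largest span for one split and the largest version within that span."""
    regex = pattern_to_regex(pattern)
    candidates = []
    for path in paths:
        match = regex.fullmatch(path)
        if match is None:
            continue
        groups = match.groupdict()
        # Span defaults to 0 and version to None when the placeholder is absent.
        span = int(groups["span"]) if "span" in groups else 0
        version = int(groups["version"]) if "version" in groups else None
        candidates.append((span, version))
    if not candidates:
        raise ValueError(f"No files match pattern {pattern!r}")
    span = max(s for s, _ in candidates)
    versions = [v for s, v in candidates if s == span and v is not None]
    return span, (max(versions) if versions else None)


def aligned_span_and_version(
    paths: Iterable[str], split_patterns: Dict[str, str]
) -> Tuple[int, Optional[int]]:
    """Requires every split to resolve to the same (span, version) pair."""
    paths = list(paths)
    resolved = {
        name: latest_span_and_version(paths, pattern)
        for name, pattern in split_patterns.items()
    }
    if len(set(resolved.values())) != 1:
        raise ValueError(f"Splits are not aligned: {resolved}")
    return next(iter(resolved.values()))


files = [
    "span-1/v1/train/data.csv", "span-1/v1/eval/data.csv",
    "span-2/v1/train/data.csv", "span-2/v1/eval/data.csv",
    "span-2/v2/train/data.csv", "span-2/v2/eval/data.csv",
]
patterns = {
    "train": "span-{SPAN}/v{VERSION}/train/*",
    "eval": "span-{SPAN}/v{VERSION}/eval/*",
}
print(aligned_span_and_version(files, patterns))  # (2, 2)
```

In this sketch, a split whose latest span or version differs from the others raises an error rather than silently picking mismatched data.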
Args |
input_base_uri
|
The base path from which files will be searched.
|
splits
|
An iterable collection of example_gen_pb2.Input.Split objects.
|
range_config
|
An instance of range_config_pb2.RangeConfig, which specifies
which spans to consider when finding the most recent span and version. If
unset, the search for the latest span number has no restrictions.
|
Returns |
A tuple of (fingerprint, select_span, select_version). select_span is the
value matched by the {SPAN} placeholder, the value mapped from the calendar
date matched by the date placeholders {YYYY}, {MM}, and {DD}, or 0 if neither
kind of placeholder was specified. select_version is the value matched by the
{VERSION} placeholder, or None if that placeholder was not specified. Note
that this function updates the {SPAN} or date tags, as well as the {VERSION}
tags, in the split configs to the actual span and version numbers.
|
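The two ideas in the return description, mapping a calendar date to a span number and rewriting placeholders with concrete values, can be sketched as follows. This is an illustration, not the TFX code: it assumes the date-to-span mapping is the number of days since the Unix epoch, and the helper names `date_to_span` and `resolve_pattern` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

UNIX_EPOCH = date(1970, 1, 1)


def date_to_span(year: int, month: int, day: int) -> int:
    """Maps a calendar date to a span, assuming span = days since the Unix epoch."""
    return (date(year, month, day) - UNIX_EPOCH).days


def resolve_pattern(pattern: str, span: int, version: Optional[int]) -> str:
    """Replaces placeholders in a split pattern with concrete values (hypothetical helper)."""
    # Recover the calendar date from the span under the same epoch assumption.
    span_date = UNIX_EPOCH + timedelta(days=span)
    resolved = (
        pattern.replace("{SPAN}", str(span))
        .replace("{YYYY}", f"{span_date.year:04d}")
        .replace("{MM}", f"{span_date.month:02d}")
        .replace("{DD}", f"{span_date.day:02d}")
    )
    if version is not None:
        resolved = resolved.replace("{VERSION}", str(version))
    return resolved


span = date_to_span(2020, 1, 1)
print(span)  # 18262
print(resolve_pattern("{YYYY}-{MM}-{DD}/v{VERSION}/train/*", span, 3))
# 2020-01-01/v3/train/*
```

After a call to the real function, the split patterns are expected to contain concrete values in a similar way, so later stages see fixed paths rather than placeholders.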