Prepares input for graph-based Neural Structured Learning and persists it.
nsl.tools.pack_nbrs( labeled_examples_path, unlabeled_examples_path, graph_path, output_training_data_path, add_undirected_edges=False, max_nbrs=None, id_feature_name='id' )
In particular, this function merges into each labeled training example the features from its out-edge neighbor examples according to a supplied similarity graph, and persists the resulting (augmented) training data.
Each tf.train.Example read from the files identified by labeled_examples_path and unlabeled_examples_path is expected to have a feature that contains its ID (represented as a singleton bytes_list value); the name of this feature is specified by the value of id_feature_name.
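To illustrate the ID requirement, here is a minimal sketch that indexes examples by their ID feature. It assumes, purely for illustration, that each tf.train.Example is modeled as a plain Python dict mapping feature names to value lists; `index_by_id` is a hypothetical helper, not part of the NSL API, and the assumption that labeled and unlabeled examples share one ID space is this sketch's own.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for tf.train.Example: feature name -> list of values.
Example = Dict[str, List[object]]


def index_by_id(examples: Iterable[Example],
                id_feature_name: str = "id") -> Dict[bytes, Example]:
    """Maps each example's ID (a singleton bytes value) to the example."""
    index: Dict[bytes, Example] = {}
    for ex in examples:
        values = ex.get(id_feature_name)
        # The ID feature must hold exactly one bytes value.
        if not values or len(values) != 1 or not isinstance(values[0], bytes):
            raise ValueError(
                f"expected a singleton bytes value in feature {id_feature_name!r}")
        index[values[0]] = ex
    return index


labeled = [{"id": [b"a"], "words": [b"hello"]}]
unlabeled = [{"id": [b"b"], "words": [b"world"]}]
# Assumption of this sketch: neighbors can be looked up among both sets.
all_examples = index_by_id(labeled + unlabeled)
print(sorted(all_examples))  # [b'a', b'b']
```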
Each edge in the graph specified by graph_path is identified by a source instance ID, a target instance ID, and an optional edge weight. These edges are specified by TSV lines of the following form:
source_id<TAB>target_id[<TAB>edge_weight]
If no edge_weight is specified, it defaults to 1.0. If the input graph is not symmetric and if add_undirected_edges is True, then all edges will be
treated as bi-directional. To build a graph based on the similarity of
instances' dense embeddings, see
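The edge format and the effect of add_undirected_edges can be sketched in plain Python as below. This is a hypothetical reader (`read_tsv_graph` is not an NSL function) that stores the graph as a nested dict of source ID to target ID to weight. How conflicting weights are resolved when a reverse edge is also listed explicitly is not specified above; this sketch simply lets explicit lines win.

```python
import io
from collections import defaultdict
from typing import Dict, TextIO

# graph[source_id][target_id] = edge_weight
Graph = Dict[str, Dict[str, float]]


def read_tsv_graph(f: TextIO, add_undirected_edges: bool = False) -> Graph:
    """Parses edge lines of the form source_id<TAB>target_id[<TAB>edge_weight]."""
    graph: Graph = defaultdict(dict)
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) not in (2, 3):
            raise ValueError(f"malformed edge line: {line!r}")
        source, target = fields[0], fields[1]
        # A missing edge weight defaults to 1.0.
        weight = float(fields[2]) if len(fields) == 3 else 1.0
        graph[source][target] = weight
        if add_undirected_edges:
            # Add the reverse edge unless the file specifies it explicitly
            # (explicit lines win; this is a choice of the sketch).
            graph[target].setdefault(source, weight)
    return graph


tsv = io.StringIO("a\tb\t0.9\na\tc\nb\tc\t0.5\n")
g = read_tsv_graph(tsv, add_undirected_edges=True)
print(dict(g["a"]))  # {'b': 0.9, 'c': 1.0}
print(dict(g["c"]))  # {'a': 1.0, 'b': 0.5}
```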
This function merges into each labeled example the features of that example's out-edge neighbors in the graph. If a value is specified for max_nbrs, then at most that many neighbors' features are merged into each labeled instance (based on which neighbors have the largest edge weights, with ties broken using instance IDs).
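The neighbor selection rule (largest edge weight first, ties broken by preferring the lexicographically larger neighbor ID, and no limit when max_nbrs is None) can be sketched as follows. `top_neighbors` is a hypothetical helper operating on a dict of out-edges, not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple


def top_neighbors(out_edges: Dict[str, float],
                  max_nbrs: Optional[int] = None) -> List[Tuple[str, float]]:
    """Returns (neighbor_id, weight) pairs, largest weight first.

    Ties in weight are broken by preferring the lexicographically larger ID.
    """
    # Sorting on (weight, id) in reverse gives descending weight, then
    # descending ID among equal weights.
    ranked = sorted(out_edges.items(), key=lambda kv: (kv[1], kv[0]),
                    reverse=True)
    # max_nbrs=None means every out-edge neighbor is kept.
    return ranked if max_nbrs is None else ranked[:max_nbrs]


edges = {"b": 0.9, "c": 0.5, "d": 0.9, "e": 0.1}
print(top_neighbors(edges, max_nbrs=2))  # [('d', 0.9), ('b', 0.9)]
print(len(top_neighbors(edges)))         # 4
```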
Here's how the merging process works. For each labeled example, the features of its i'th out-edge neighbor will be prefixed by NL_nbr_<i>_, with indexes i in the half-open interval [0, K), where K is the minimum of max_nbrs and the number of the labeled example's out-edges in the graph (or simply the number of out-edges if max_nbrs is None). A feature named NL_nbr_<i>_weight will also be merged into the labeled example; its value is the neighbor's corresponding edge weight. The top neighbors to use in this process are selected by consulting the input graph and selecting the labeled example's out-edge neighbors with the largest edge weights; ties are broken by preferring neighbor IDs with larger lexicographic order. Finally, a feature named NL_num_nbrs (an int64_list) is set on the result, denoting the number of neighbors K merged into the labeled example.
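The feature-naming scheme described above can be sketched with the same dict stand-in for tf.train.Example (an assumption of this sketch). `merge_neighbors` is hypothetical: it takes the already-selected neighbors in rank order and applies the NL_nbr_<i>_ prefix, the NL_nbr_<i>_weight feature, and NL_num_nbrs. It does not reproduce the library's handling of value types or name collisions.

```python
import copy
from typing import Dict, List, Tuple

# Hypothetical stand-in for tf.train.Example: feature name -> list of values.
Example = Dict[str, List[object]]


def merge_neighbors(labeled: Example,
                    neighbors: List[Tuple[Example, float]]) -> Example:
    """Returns a copy of `labeled` augmented with its K ranked neighbors."""
    merged = copy.deepcopy(labeled)  # leave the input example untouched
    for i, (nbr, weight) in enumerate(neighbors):
        prefix = f"NL_nbr_{i}_"
        # Copy every neighbor feature under the NL_nbr_<i>_ prefix.
        for name, values in nbr.items():
            merged[prefix + name] = list(values)
        # Note: a neighbor feature literally named "weight" would be
        # overwritten here; this sketch ignores that corner case.
        merged[f"NL_nbr_{i}_weight"] = [weight]
    merged["NL_num_nbrs"] = [len(neighbors)]  # K
    return merged


labeled = {"id": [b"a"], "words": [b"hello"]}
nbrs = [({"id": [b"d"], "words": [b"hi"]}, 0.9)]
out = merge_neighbors(labeled, nbrs)
print(sorted(out))
# ['NL_nbr_0_id', 'NL_nbr_0_weight', 'NL_nbr_0_words', 'NL_num_nbrs', 'id', 'words']
```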
The merged examples are then written to a TFRecord file named by output_training_data_path.
Note that this function can also be invoked as a binary from a shell. Sample usage:
python -m neural_structured_learning.tools.pack_nbrs [flags] labeled.tfr unlabeled.tfr graph.tsv output.tfr
For details about this program's flags, run:
python -m neural_structured_learning.tools.pack_nbrs --help
Args:
labeled_examples_path: Names a TFRecord file containing labeled tf.train.Example instances.
unlabeled_examples_path: Names a TFRecord file containing unlabeled tf.train.Example instances. This can be an empty string if there are no unlabeled examples.
graph_path: Names a TSV file that specifies a graph as a set of edges representing similarity relationships.
output_training_data_path: Path to a file where the resulting augmented training data, in the form of tf.train.Example instances, will be persisted in the TFRecord format.
add_undirected_edges: Boolean indicating whether or not to treat edges as bi-directional.
max_nbrs: The maximum number of neighbors to use to generate the augmented training data for downstream training. If None, no limit is applied.
id_feature_name: The name of the feature in the input labeled and unlabeled tf.train.Example objects representing the ID of examples.