Module: nsl.tools.pack_nbrs

Tool that prepares input for graph-based Neural Structured Learning.

In particular, this tool merges into each labeled training example the features from its out-edge neighbor examples according to a supplied similarity graph.

USAGE:

python pack_nbrs.py [flags] labeled.tfr unlabeled.tfr graph.tsv output.tfr

The labeled.tfr command-line argument is expected to name a TFRecord file containing labeled tf.train.Examples, while the unlabeled.tfr command-line argument is expected to name a TFRecord file containing unlabeled examples. The unlabeled.tfr argument can be an empty string ('' or "" as the shell command-line argument) if there are no unlabeled examples. Each example read from either of those files is expected to have a feature that contains its ID (represented as a singleton bytes_list value); the name of this feature is specified by the value of the --id_feature_name flag (default: 'id').

The graph.tsv command-line argument is expected to name a TSV file that specifies a graph as a set of edges representing similarity relationships between the labeled and unlabeled Examples. Each graph edge is identified by a source instance ID, a target instance ID, and an optional edge weight. These edges are specified by TSV lines of the following form:

source_id<TAB>target_id[<TAB>edge_weight]

If no edge_weight is specified, it defaults to 1.0. If your input graph is not symmetric and you'd like all of its edges to be treated as bi-directional, pass the --add_undirected_edges flag. To build a graph based on the similarity of your instances' dense embeddings, you can use the build_graph.py tool included in the Neural Structured Learning package.
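The edge format described above can be illustrated with a minimal parsing sketch. This is not the tool's own reader: the function name parse_edges is hypothetical, blank lines are skipped as an assumption, and the rule for reconciling two directions with different weights (keep the larger one) is likewise an assumption made only for this illustration.

```python
def parse_edges(lines, add_undirected_edges=False):
    """Parse 'source<TAB>target[<TAB>weight]' lines into {source: {target: weight}}.

    Hypothetical helper for illustration; not part of the nsl package.
    """
    graph = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # Assumption: blank lines are ignored.
        fields = line.split("\t")
        source, target = fields[0], fields[1]
        # A missing third column means the default edge weight of 1.0.
        weight = float(fields[2]) if len(fields) > 2 else 1.0
        graph.setdefault(source, {})[target] = weight

    if add_undirected_edges:
        # Add the reverse of every edge so the graph becomes symmetric.
        for source, nbrs in list(graph.items()):
            for target, weight in list(nbrs.items()):
                reverse = graph.setdefault(target, {})
                # Assumption: if both directions exist, keep the larger weight.
                reverse[source] = max(weight, reverse.get(source, weight))
    return graph


if __name__ == "__main__":
    sample = ["a\tb\t0.5\n", "b\tc\n"]
    print(parse_edges(sample))
    print(parse_edges(sample, add_undirected_edges=True))
```

In the sample, the edge from b to c has no weight column, so it receives 1.0; with add_undirected_edges=True each edge also appears in the reverse direction.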

This program merges into each labeled example the features of that example's out-edge neighbors, as determined by that example's out-edges in the graph. If a value is specified for the --max_nbrs flag, then at most that many neighbors' features are merged into each labeled instance (based on which neighbors have the largest edge weights, with ties broken using instance IDs).

Here's how the merging process works. For each labeled example, the features of its i-th out-edge neighbor are merged in with their names prefixed by NL_nbr_<i>_, with indexes i in the half-open interval [0, K), where K is the minimum of --max_nbrs and the number of the labeled example's out-edges in the graph (or just the number of out-edges when --max_nbrs is not specified). A feature named NL_nbr_<i>_weight, whose value is the weight of the edge to that neighbor, is also merged into the labeled example. The neighbors used in this process are the labeled example's out-edge neighbors with the largest edge weights in the input graph; ties are broken by preferring neighbor IDs with larger lexicographic order. In addition, a feature named NL_num_nbrs (a singleton int64_list) is set on the result, denoting the number of neighbors K merged into the labeled example.
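The selection and naming rules above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: plain Python dicts stand in for tf.train.Example features, the function name pack_neighbors is hypothetical, and every neighbor ID is assumed to resolve to an example in the input files.

```python
def pack_neighbors(seed, examples_by_id, out_edges, max_nbrs=None):
    """Merge neighbor features into a copy of `seed`.

    seed: {feature_name: value} for one labeled example.
    examples_by_id: {example_id: {feature_name: value}} for all examples.
    out_edges: {neighbor_id: edge_weight} for the seed's out-edges.
    Hypothetical helper for illustration; not part of the nsl package.
    """
    # Largest weight first; ties go to the lexicographically larger ID.
    ranked = sorted(out_edges.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    if max_nbrs is not None:
        ranked = ranked[:max_nbrs]  # K = min(max_nbrs, number of out-edges)

    merged = dict(seed)
    for i, (nbr_id, weight) in enumerate(ranked):
        # Assumption: every neighbor ID is present in examples_by_id.
        for name, value in examples_by_id[nbr_id].items():
            merged[f"NL_nbr_{i}_{name}"] = value
        merged[f"NL_nbr_{i}_weight"] = weight
    merged["NL_num_nbrs"] = len(ranked)
    return merged


if __name__ == "__main__":
    examples = {
        "a": {"id": "a", "words": [1]},
        "b": {"id": "b", "words": [2]},
        "c": {"id": "c", "words": [3]},
    }
    seed = {"id": "s", "label": 1}
    edges = {"a": 0.9, "b": 0.5, "c": 0.5}
    for name, value in sorted(pack_neighbors(seed, examples, edges, max_nbrs=2).items()):
        print(name, value)
```

In the sample, neighbors b and c tie at weight 0.5, so c is preferred because its ID sorts later; with max_nbrs=2 the merged example carries features for a (index 0) and c (index 1).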

Finally, the merged examples are written to a TFRecord file named by the output.tfr command-line argument.

For details about this program's flags, run python pack_nbrs.py --help.