Embeddings from a language model trained on the 1 Billion Word Benchmark.

## Overview

Computes contextualized word representations using character-based word representations and bidirectional LSTMs, as described in the paper "Deep contextualized word representations" [1].

This module supports input either as raw text strings or as tokenized text strings.

The module outputs fixed embeddings at each LSTM layer, a learnable aggregation of the 3 layers, and a fixed mean-pooled vector representation of the input.

The architecture is complex and achieves state-of-the-art results on several benchmarks [1]. It is far more computationally expensive than word embedding modules that only perform embedding lookups, so the use of an accelerator is recommended.

#### Trainable parameters

The module exposes 4 trainable scalar weights for layer aggregation.

#### Example use

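    import tensorflow_hub as hub

    # Raw sentences with the "default" signature.
    # hub.Module is the TF1-style TensorFlow Hub API.
    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
    embeddings = elmo(
        ["the cat is on the mat", "dogs are in the fog"],
        signature="default",
        as_dict=True)["elmo"]

    # Pre-tokenized sentences with the "tokens" signature.
    # Shorter sentences are padded with "" up to the longest sentence.
    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
    tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                    ["dogs", "are", "in", "the", "fog", ""]]
    tokens_length = [6, 5]  # number of real tokens per sentence
    embeddings = elmo(
        inputs={
            "tokens": tokens_input,
            "sequence_len": tokens_length
        },
        signature="tokens",
        as_dict=True)["elmo"]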

We set the trainable parameter to True when creating the module so that the 4 scalar weights (as described in the paper) can be trained. In this setting, the module still keeps all other parameters fixed.
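
To make the role of the 4 scalar weights concrete, here is a minimal plain-Python sketch of the layer mixing described in the paper [1]: three softmax-normalized layer weights plus one overall scale factor. The function names `softmax` and `mix_layers`, the toy vectors, and the weight values are hypothetical and chosen only for illustration. This is not the module's internal implementation.

```python
import math

def softmax(weights):
    """Normalize raw layer weights so they are positive and sum to 1."""
    peak = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def mix_layers(layers, layer_weights, gamma):
    """Combine per-layer vectors for one token: gamma * sum_j s_j * h_j."""
    s = softmax(layer_weights)
    dim = len(layers[0])
    return [gamma * sum(s_j * layer[i] for s_j, layer in zip(s, layers))
            for i in range(dim)]

# Three toy 4-dimensional layer vectors for a single token (hypothetical values).
layers = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 1.0],
    [2.0, 3.0, 2.0, 1.0],
]

# Equal raw weights and gamma = 1 reduce the mix to a plain average of the layers.
mixed = mix_layers(layers, layer_weights=[0.0, 0.0, 0.0], gamma=1.0)
print(mixed)
```

With equal weights the result is approximately the element-wise average of the three layers. Training the scalars lets a downstream task shift emphasis between layers while every other parameter stays fixed.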

### Inputs

The module defines two signatures: default and tokens.

With the default signature, the module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.

With the tokens signature, the module takes tokenized sentences as input. The inputs are a string tensor with shape [batch_size, max_length] holding the tokens and an int32 tensor with shape [batch_size] holding the length of each sentence. The length input is necessary to exclude padding when sentences vary in length.
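
As an illustration of how the two input forms relate, the sketch below turns raw sentences into the padded token lists and lengths that the tokens signature expects. The helper name `to_token_batch` is hypothetical, and the choice of the empty string as the padding token is an assumption taken from the example above. Note that `str.split()` splits on any whitespace, which only approximates the module's own splitting on spaces.

```python
def to_token_batch(sentences, pad_token=""):
    """Split sentences into tokens and pad each row to the longest sentence.

    Returns (padded_tokens, lengths), where lengths counts real tokens only.
    """
    tokenized = [sentence.split() for sentence in sentences]
    lengths = [len(tokens) for tokens in tokenized]
    max_length = max(lengths, default=0)
    padded = [tokens + [pad_token] * (max_length - len(tokens))
              for tokens in tokenized]
    return padded, lengths

tokens_input, tokens_length = to_token_batch(
    ["the cat is on the mat", "dogs are in the fog"])
print(tokens_input)
print(tokens_length)
```

The resulting `tokens_input` has shape [batch_size, max_length] and `tokens_length` has shape [batch_size], matching the two inputs of the tokens signature.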

### Outputs

The output dictionary contains:

• word_emb: the character-based word representations with shape [batch_size, max_length, 512].
• lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].
• lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].
• elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
• default: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
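
The relationship between the per-token outputs and the pooled default output can be sketched in plain Python. The helper `masked_mean_pool` and the toy 2-dimensional vectors are hypothetical. The sketch assumes that padding positions are left out of the average using the sentence length, which is how the length input is described above; the module's exact pooling code is not shown here.

```python
def masked_mean_pool(token_vectors, sequence_len):
    """Average the first `sequence_len` token vectors, ignoring padding rows."""
    if sequence_len <= 0:
        raise ValueError("sequence_len must be positive")
    valid = token_vectors[:sequence_len]
    dim = len(valid[0])
    return [sum(vec[i] for vec in valid) / sequence_len for i in range(dim)]

# Toy batch with shape [batch_size=2, max_length=3, dim=2].
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],  # last row stands in for padding
]
lengths = [3, 2]

# One pooled vector per sentence, shape [batch_size, dim].
pooled = [masked_mean_pool(vectors, n) for vectors, n in zip(batch, lengths)]
print(pooled)
```

The padded row of the second sentence does not affect its pooled vector, which is why the sentence lengths matter for batches of varying length.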

## Changelog

#### Version 1

• Initial release.

#### Version 2

• Restricted trainable variables to the 4 scalar weights as described in the paper.

## References

[1] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.