Python Modules

sockeye.arguments module

sockeye.average module

sockeye.callback module

sockeye.checkpoint_decoder module

sockeye.coverage module

Defines the dynamic source encodings (‘coverage’ mechanisms) for encoder/decoder networks as used in Tu et al. (2016).

class sockeye.coverage.ActivationCoverage(coverage_num_hidden, activation, layer_normalization)[source]

Bases: sockeye.coverage.Coverage

Implements a coverage mechanism whose updates are performed by a Perceptron with configurable activation function.

Parameters:
  • coverage_num_hidden (int) – Number of hidden units for coverage vectors.
  • activation (str) – Type of activation for Perceptron.
  • layer_normalization (bool) – If true, applies layer normalization before non-linear activation.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for updating coverage vectors in a sequence decoder.

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Coverage callable.

class sockeye.coverage.CountCoverage[source]

Bases: sockeye.coverage.Coverage

Coverage class that accumulates the attention weights for each source word.

on(source, source_length, source_seq_len)[source]

Returns callable to be used for updating coverage vectors in a sequence decoder.

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Coverage callable.
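The accumulation performed by CountCoverage can be illustrated outside of MXNet. The update below is a hypothetical pure-Python sketch of the idea, not sockeye code (the real class operates on mx.sym.Symbol batches):

```python
# Hypothetical sketch of the CountCoverage update rule: the coverage
# vector is the running sum of attention weights per source word.
def count_coverage_update(prev_coverage, attention_prob):
    """Both arguments are lists of per-source-word scores."""
    return [c + a for c, a in zip(prev_coverage, attention_prob)]

coverage = [0.0, 0.0, 0.0]
for attention in ([0.7, 0.2, 0.1], [0.1, 0.8, 0.1]):
    coverage = count_coverage_update(coverage, attention)
# coverage now holds the accumulated attention mass per source position
```

Each decoding step adds its attention distribution, so heavily attended source words accumulate high coverage.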

class sockeye.coverage.Coverage(prefix='cov_')[source]

Bases: object

Generic coverage class. Similar to the Attention classes, a coverage instance returns a callable update_coverage() function when self.on() is called.

on(source, source_length, source_seq_len)[source]

Returns callable to be used for updating coverage vectors in a sequence decoder.

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Coverage callable.

class sockeye.coverage.CoverageConfig(type, num_hidden, layer_normalization)[source]

Bases: sockeye.config.Config

Coverage configuration.

Parameters:
  • type (str) – Coverage name.
  • num_hidden (int) – Number of hidden units for coverage networks.
  • layer_normalization (bool) – Apply layer normalization to coverage networks.
class sockeye.coverage.GRUCoverage(coverage_num_hidden, layer_normalization)[source]

Bases: sockeye.coverage.Coverage

Implements a GRU whose state is the coverage vector.

TODO: This implementation is slightly inefficient since the source is fed in at every step. It would be better to pre-compute the mapping of the source but this will likely mean opening up the GRU.

Parameters:
  • coverage_num_hidden (int) – Number of hidden units for coverage vectors.
  • layer_normalization (bool) – If true, applies layer normalization for each gate in the GRU cell.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for updating coverage vectors in a sequence decoder.

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Coverage callable.

sockeye.coverage.get_coverage(config)[source]

Returns a Coverage instance.

Parameters:config (CoverageConfig) – Coverage configuration.
Return type:Coverage
Returns:Instance of Coverage.
sockeye.coverage.mask_coverage(coverage, source_length)[source]

Masks all coverage scores that are outside the actual sequence.

Parameters:
  • coverage (Symbol) – Input coverage vector. Shape: (batch_size, seq_len, coverage_num_hidden).
  • source_length (Symbol) – Source length. Shape: (batch_size,).
Return type:

Symbol

Returns:

Masked coverage vector. Shape: (batch_size, seq_len, coverage_num_hidden).
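The masking semantics of mask_coverage() can be sketched with NumPy in place of mx.sym: every coverage entry at a position at or beyond the true source length is zeroed. This is an illustrative re-implementation, not the sockeye symbol code:

```python
import numpy as np

# NumPy sketch of mask_coverage(): zero out coverage entries at
# positions >= the actual source length of each batch element.
def mask_coverage(coverage, source_length):
    """coverage: (batch_size, seq_len, num_hidden); source_length: (batch_size,)."""
    batch_size, seq_len, _ = coverage.shape
    # positions (1, seq_len) compared against lengths (batch_size, 1)
    mask = np.arange(seq_len)[None, :] < source_length[:, None]
    return coverage * mask[:, :, None]
```

For a batch with lengths (2, 4) and seq_len 4, the first example keeps only its first two positions while the second is untouched.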

sockeye.data_io module

sockeye.decoder module

Decoders for sequence-to-sequence models.

class sockeye.decoder.ConvolutionalDecoder(config, prefix='decoder_')[source]

Bases: sockeye.decoder.Decoder

Convolutional decoder similar to Gehring et al. 2017.

The decoder consists of an embedding layer, positional embeddings, and layers of convolutional blocks with residual connections.

Notable differences to Gehring et al. 2017:
  • Here the context vectors are created from the last encoder state (instead of using the last encoder state as the key and the sum of the encoder state and the source embedding as the value).
  • The encoder gradients are not scaled down by 1/(2 * num_attention_layers).
  • Residual connections are not scaled down by math.sqrt(0.5).
  • Attention is computed in the hidden dimension instead of the embedding dimension (removes the need for training several projection matrices).
Parameters:
  • config (ConvolutionalDecoderConfig) – Configuration for convolutional decoder.
  • prefix (str) – Name prefix for symbols of this decoder.
decode_sequence(source_encoded, source_encoded_lengths, source_encoded_max_length, target_embed, target_embed_lengths, target_embed_max_length)[source]

Decodes a sequence of embedded target words and returns sequence of last decoder representations for each time step.

Parameters:
  • source_encoded (Symbol) – Encoded source: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • target_embed (Symbol) – Embedded target sequence. Shape: (batch_size, target_embed_max_length, target_num_embed).
  • target_embed_lengths (Symbol) – Lengths of embedded target sequences. Shape: (batch_size,).
  • target_embed_max_length (int) – Maximum length of the embedded target sequence.
Return type:

Symbol

Returns:

Decoder data. Shape: (batch_size, target_embed_max_length, decoder_depth).

decode_step(step, target_embed_prev, source_encoded_max_length, *states)[source]

Decodes a single time step given the current step, the previous embedded target word, and previous decoder states. Returns decoder representation for the next prediction, attention probabilities, and next decoder states. Implementations can maintain an arbitrary number of states.

Parameters:
  • step (int) – Global step of inference procedure, starts with 1.
  • target_embed_prev (Symbol) – Previous target word embedding. Shape: (batch_size, target_num_embed).
  • source_encoded_max_length (int) – Length of encoded source time dimension.
  • states (Symbol) – Arbitrary list of decoder states.
Return type:

Tuple[Symbol, Symbol, List[Symbol]]

Returns:

logit inputs, attention probabilities, next decoder states.

get_num_hidden()[source]
Return type:int
Returns:The representation size of this decoder.
init_states(source_encoded, source_encoded_lengths, source_encoded_max_length)[source]

Returns a list of symbolic states that represent the initial states of this decoder. Used for inference.

Parameters:
  • source_encoded (Symbol) – Encoded source. Shape: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
Return type:

List[Symbol]

Returns:

List of symbolic initial states.

state_shapes(batch_size, target_max_length, source_encoded_max_length, source_encoded_depth)[source]

Returns a list of shape descriptions given batch size, encoded source max length and encoded source depth. Used for inference.

Parameters:
  • batch_size (int) – Batch size during inference.
  • target_max_length (int) – Current target sequence length.
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • source_encoded_depth (int) – Depth of encoded source.
Return type:

List[DataDesc]

Returns:

List of shape descriptions.

state_variables(target_max_length)[source]

Returns the list of symbolic variables for this decoder to be used during inference.

Parameters:target_max_length (int) – Current target sequence length.
Return type:List[Symbol]
Returns:List of symbolic variables.
class sockeye.decoder.ConvolutionalDecoderConfig(cnn_config, max_seq_len_target, num_embed, encoder_num_hidden, num_layers, positional_embedding_type, project_qkv=False, hidden_dropout=0.0, dtype='float32')[source]

Bases: sockeye.config.Config

Convolutional decoder configuration.

Parameters:
  • cnn_config (ConvolutionConfig) – Configuration for the convolution block.
  • max_seq_len_target (int) – Maximum target sequence length.
  • num_embed (int) – Target word embedding size.
  • encoder_num_hidden (int) – Number of hidden units of the encoder.
  • num_layers (int) – The number of convolutional layers.
  • positional_embedding_type (str) – The type of positional embedding.
  • project_qkv (bool) – Whether to project queries, keys, and values before attention.
  • hidden_dropout (float) – Dropout probability on next decoder hidden state.
  • dtype (str) – Data type.
class sockeye.decoder.Decoder(dtype)[source]

Bases: abc.ABC

Generic decoder interface. A decoder needs to implement code to decode a target sequence known in advance (decode_sequence), and code to decode a single word given its decoder state (decode_step). The latter is typically used for inference graphs in beam search. For the inference module to be able to keep track of the decoder’s states, a decoder provides methods to return initial states (init_states), state variables, and their shapes.

Parameters:dtype – Data type.
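The contract between init_states() and decode_step() during inference can be sketched with a toy decoder. ToyDecoder and its internals are hypothetical stand-ins (pure NumPy instead of mx.sym), showing only the calling convention: states are initialized once, then each step consumes and returns states:

```python
import numpy as np

# Hypothetical toy decoder illustrating the Decoder inference contract:
# decode_step() takes the step index, the previous target embedding, and
# the unpacked states, and returns (output, attention probs, next states).
class ToyDecoder:
    def init_states(self, source_encoded):
        return [np.zeros(source_encoded.shape[-1])]

    def decode_step(self, step, target_embed_prev, *states):
        hidden = states[0] + target_embed_prev      # fake recurrence
        attention = np.full(4, 0.25)                # uniform attention
        return hidden, attention, [hidden]

decoder = ToyDecoder()
source_encoded = np.zeros((4, 8))                   # (seq_len, depth)
states = decoder.init_states(source_encoded)
outputs = []
for step in range(1, 4):                            # step starts at 1
    out, attn, states = decoder.decode_step(step, np.ones(8), *states)
    outputs.append(out)
```

A real beam-search loop additionally scores the outputs and reorders the states per hypothesis; this sketch keeps only the state-threading pattern.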
decode_sequence(source_encoded, source_encoded_lengths, source_encoded_max_length, target_embed, target_embed_lengths, target_embed_max_length)[source]

Decodes a sequence of embedded target words and returns sequence of last decoder representations for each time step.

Parameters:
  • source_encoded (Symbol) – Encoded source: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • target_embed (Symbol) – Embedded target sequence. Shape: (batch_size, target_embed_max_length, target_num_embed).
  • target_embed_lengths (Symbol) – Lengths of embedded target sequences. Shape: (batch_size,).
  • target_embed_max_length (int) – Maximum length of the embedded target sequence.
Return type:

Symbol

Returns:

Decoder data. Shape: (batch_size, target_embed_max_length, decoder_depth).

decode_step(step, target_embed_prev, source_encoded_max_length, *states)[source]

Decodes a single time step given the current step, the previous embedded target word, and previous decoder states. Returns decoder representation for the next prediction, attention probabilities, and next decoder states. Implementations can maintain an arbitrary number of states.

Parameters:
  • step (int) – Global step of inference procedure, starts with 1.
  • target_embed_prev (Symbol) – Previous target word embedding. Shape: (batch_size, target_num_embed).
  • source_encoded_max_length (int) – Length of encoded source time dimension.
  • states (Symbol) – Arbitrary list of decoder states.
Return type:

Tuple[Symbol, Symbol, List[Symbol]]

Returns:

logit inputs, attention probabilities, next decoder states.

classmethod get_decoder(config, prefix)[source]

Creates decoder based on config type.

Parameters:
  • config – Decoder configuration (one of RecurrentDecoderConfig, TransformerConfig, or ConvolutionalDecoderConfig).
  • prefix (str) – Prefix for decoder symbols.
Return type:

Decoder

Returns:

Decoder instance.

get_max_seq_len()[source]
Return type:Optional[int]
Returns:The maximum length supported by the decoder if such a restriction exists.
get_num_hidden()[source]
Return type:int
Returns:The representation size of this decoder.
init_states(source_encoded, source_encoded_lengths, source_encoded_max_length)[source]

Returns a list of symbolic states that represent the initial states of this decoder. Used for inference.

Parameters:
  • source_encoded (Symbol) – Encoded source. Shape: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
Return type:

List[Symbol]

Returns:

List of symbolic initial states.

classmethod register(config_type, suffix)[source]

Registers decoder type for configuration. Suffix is appended to decoder prefix.

Parameters:
  • config_type (Type[Union[RecurrentDecoderConfig, TransformerConfig, ConvolutionalDecoderConfig]]) – Configuration type for decoder.
  • suffix (str) – String to append to decoder prefix.
Returns:

Class decorator.

reset()[source]

Reset decoder method. Used for inference.

state_shapes(batch_size, target_max_length, source_encoded_max_length, source_encoded_depth)[source]

Returns a list of shape descriptions given batch size, encoded source max length and encoded source depth. Used for inference.

Parameters:
  • batch_size (int) – Batch size during inference.
  • target_max_length (int) – Current target sequence length.
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • source_encoded_depth (int) – Depth of encoded source.
Return type:

List[DataDesc]

Returns:

List of shape descriptions.

state_variables(target_max_length)[source]

Returns the list of symbolic variables for this decoder to be used during inference.

Parameters:target_max_length (int) – Current target sequence length.
Return type:List[Symbol]
Returns:List of symbolic variables.
class sockeye.decoder.RecurrentDecoder(config, prefix='decoder_rnn_')[source]

Bases: sockeye.decoder.Decoder

RNN decoder with attention. The architecture is based on Luong et al., 2015: Effective Approaches to Attention-based Neural Machine Translation.

Parameters:
  • config (RecurrentDecoderConfig) – Configuration for recurrent decoder.
  • prefix (str) – Name prefix for symbols of this decoder.
decode_sequence(source_encoded, source_encoded_lengths, source_encoded_max_length, target_embed, target_embed_lengths, target_embed_max_length)[source]

Decodes a sequence of embedded target words and returns sequence of last decoder representations for each time step.

Parameters:
  • source_encoded (Symbol) – Encoded source: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • target_embed (Symbol) – Embedded target sequence. Shape: (batch_size, target_embed_max_length, target_num_embed).
  • target_embed_lengths (Symbol) – Lengths of embedded target sequences. Shape: (batch_size,).
  • target_embed_max_length (int) – Maximum length of the embedded target sequence.
Return type:

Symbol

Returns:

Decoder data. Shape: (batch_size, target_embed_max_length, decoder_depth).

decode_step(step, target_embed_prev, source_encoded_max_length, *states)[source]

Decodes a single time step given the current step, the previous embedded target word, and previous decoder states. Returns decoder representation for the next prediction, attention probabilities, and next decoder states. Implementations can maintain an arbitrary number of states.

Parameters:
  • step (int) – Global step of inference procedure, starts with 1.
  • target_embed_prev (Symbol) – Previous target word embedding. Shape: (batch_size, target_num_embed).
  • source_encoded_max_length (int) – Length of encoded source time dimension.
  • states (Symbol) – Arbitrary list of decoder states.
Return type:

Tuple[Symbol, Symbol, List[Symbol]]

Returns:

logit inputs, attention probabilities, next decoder states.

get_initial_state(source_encoded, source_encoded_length)[source]

Computes the initial states of the decoder: one hidden state and one state per RNN layer. Optionally, the initial states of the RNN layers are computed by a single non-linear fully-connected layer that takes the last encoder state as input.

Parameters:
  • source_encoded (Symbol) – Concatenated encoder states. Shape: (batch_size, source_seq_len, encoder_num_hidden).
  • source_encoded_length (Symbol) – Lengths of source sequences. Shape: (batch_size,).
Return type:

RecurrentDecoderState

Returns:

Decoder state.

get_num_hidden()[source]
Return type:int
Returns:The representation size of this decoder.
get_rnn_cells()[source]

Returns a list of RNNCells used by this decoder.

Return type:List[BaseRNNCell]
init_states(source_encoded, source_encoded_lengths, source_encoded_max_length)[source]

Returns a list of symbolic states that represent the initial states of this decoder. Used for inference.

Parameters:
  • source_encoded (Symbol) – Encoded source. Shape: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
Return type:

List[Symbol]

Returns:

List of symbolic initial states.

reset()[source]

Calls reset on the RNN cell.

state_shapes(batch_size, target_max_length, source_encoded_max_length, source_encoded_depth)[source]

Returns a list of shape descriptions given batch size, encoded source max length and encoded source depth. Used for inference.

Parameters:
  • batch_size (int) – Batch size during inference.
  • target_max_length (int) – Current target sequence length.
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • source_encoded_depth (int) – Depth of encoded source.
Return type:

List[DataDesc]

Returns:

List of shape descriptions.

state_variables(target_max_length)[source]

Returns the list of symbolic variables for this decoder to be used during inference.

Parameters:target_max_length (int) – Current target sequence length.
Return type:List[Symbol]
Returns:List of symbolic variables.
class sockeye.decoder.RecurrentDecoderConfig(max_seq_len_source, rnn_config, attention_config, hidden_dropout=0.0, state_init='last', context_gating=False, layer_normalization=False, attention_in_upper_layers=False, dtype='float32')[source]

Bases: sockeye.config.Config

Recurrent decoder configuration.

Parameters:
  • max_seq_len_source (int) – Maximum source sequence length
  • rnn_config (RNNConfig) – RNN configuration.
  • attention_config (AttentionConfig) – Attention configuration.
  • hidden_dropout (float) – Dropout probability on next decoder hidden state.
  • state_init (str) – Type of RNN decoder state initialization: zero, last, average.
  • context_gating (bool) – Whether to use context gating.
  • layer_normalization (bool) – Apply layer normalization.
  • attention_in_upper_layers (bool) – Pass the attention value to all layers in the decoder.
  • dtype (str) – Data type.
class sockeye.decoder.RecurrentDecoderState(hidden, layer_states)

Bases: tuple

RecurrentDecoder state.

Parameters:
  • hidden – Hidden state after attention mechanism. Shape: (batch_size, num_hidden).
  • layer_states – Hidden states for RNN layers of RecurrentDecoder. Shape: List[(batch_size, rnn_num_hidden)]
hidden

Alias for field number 0

layer_states

Alias for field number 1
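Since RecurrentDecoderState derives from tuple with the field aliases documented above, it behaves like a standard namedtuple; a minimal equivalent for experimentation (placeholder values, not real decoder states):

```python
from collections import namedtuple

# Minimal equivalent of RecurrentDecoderState: field 0 is `hidden`,
# field 1 is `layer_states`, matching the documented aliases.
RecurrentDecoderState = namedtuple("RecurrentDecoderState",
                                   ["hidden", "layer_states"])

state = RecurrentDecoderState(hidden="h", layer_states=["s1", "s2"])
```

Fields are accessible both by name and by position, which is what the alias notes above describe.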

class sockeye.decoder.TransformerDecoder(config, prefix='decoder_transformer_')[source]

Bases: sockeye.decoder.Decoder

Transformer decoder as in Vaswani et al., 2017: Attention is all you need. In training, scores for each position of the known target sequence are computed in parallel, yielding most of the speedup. At inference time, the decoder block is evaluated repeatedly over a maximum-length input sequence that is initially filled with zeros and grows with predicted tokens during beam search. Appropriate masking at every time step ensures correct self-attention scores and is updated with every step.

Parameters:
  • config (TransformerConfig) – Transformer configuration.
  • prefix (str) – Name prefix for symbols of this decoder.
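The masking described above boils down to a lower-triangular attention mask: at time step t, positions beyond t (still zero-filled) must receive no attention mass. A NumPy sketch of such a mask (illustrative; sockeye builds the equivalent inside the MXNet graph):

```python
import numpy as np

# Autoregressive self-attention mask: row t has ones only at
# positions <= t, so future (zero-filled) positions are blocked.
def autoregressive_mask(max_length):
    return np.tril(np.ones((max_length, max_length)))

mask = autoregressive_mask(4)
```

In practice the mask is applied to the attention logits (e.g. by setting masked positions to a large negative value before the softmax).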
decode_sequence(source_encoded, source_encoded_lengths, source_encoded_max_length, target_embed, target_embed_lengths, target_embed_max_length)[source]

Decodes a sequence of embedded target words and returns sequence of last decoder representations for each time step.

Parameters:
  • source_encoded (Symbol) – Encoded source: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • target_embed (Symbol) – Embedded target sequence. Shape: (batch_size, target_embed_max_length, target_num_embed).
  • target_embed_lengths (Symbol) – Lengths of embedded target sequences. Shape: (batch_size,).
  • target_embed_max_length (int) – Maximum length of the embedded target sequence.
Return type:

Symbol

Returns:

Decoder data. Shape: (batch_size, target_embed_max_length, decoder_depth).

decode_step(step, target_embed_prev, source_encoded_max_length, *states)[source]

Decodes a single time step given the current step, the previous embedded target word, and previous decoder states. Returns decoder representation for the next prediction, attention probabilities, and next decoder states. Implementations can maintain an arbitrary number of states.

Parameters:
  • step (int) – Global step of inference procedure, starts with 1.
  • target_embed_prev (Symbol) – Previous target word embedding. Shape: (batch_size, target_num_embed).
  • source_encoded_max_length (int) – Length of encoded source time dimension.
  • states (Symbol) – Arbitrary list of decoder states.
Return type:

Tuple[Symbol, Symbol, List[Symbol]]

Returns:

logit inputs, attention probabilities, next decoder states.

get_num_hidden()[source]
Return type:int
Returns:The representation size of this decoder.
init_states(source_encoded, source_encoded_lengths, source_encoded_max_length)[source]

Returns a list of symbolic states that represent the initial states of this decoder. Used for inference.

Parameters:
  • source_encoded (Symbol) – Encoded source. Shape: (batch_size, source_encoded_max_length, encoder_depth).
  • source_encoded_lengths (Symbol) – Lengths of encoded source sequences. Shape: (batch_size,).
  • source_encoded_max_length (int) – Size of encoder time dimension.
Return type:

List[Symbol]

Returns:

List of symbolic initial states.

state_shapes(batch_size, target_max_length, source_encoded_max_length, source_encoded_depth)[source]

Returns a list of shape descriptions given batch size, encoded source max length and encoded source depth. Used for inference.

Parameters:
  • batch_size (int) – Batch size during inference.
  • target_max_length (int) – Current target sequence length.
  • source_encoded_max_length (int) – Size of encoder time dimension.
  • source_encoded_depth (int) – Depth of encoded source.
Return type:

List[DataDesc]

Returns:

List of shape descriptions.

state_variables(target_max_length)[source]

Returns the list of symbolic variables for this decoder to be used during inference.

Parameters:target_max_length (int) – Current target sequence length.
Return type:List[Symbol]
Returns:List of symbolic variables.

sockeye.embeddings module

sockeye.encoder module

Encoders for sequence-to-sequence models.

class sockeye.encoder.AddLearnedPositionalEmbeddings(num_embed, max_seq_len, prefix, embed_weight=None, dtype='float32')[source]

Bases: sockeye.encoder.PositionalEncoder

Takes an encoded sequence and adds positional embeddings to it, which are learned jointly. Note that this limits the maximum sentence length during decoding.

Parameters:
  • num_embed (int) – Embedding size.
  • max_seq_len (int) – Maximum sequence length.
  • prefix (str) – Name prefix for symbols of this encoder.
  • embed_weight (Optional[Symbol]) – Optionally use an existing embedding matrix instead of creating a new one.
  • dtype (str) – Data type.
encode(data, data_length, seq_len)[source]
Parameters:
  • data (Symbol) – (batch_size, source_seq_len, num_embed)
  • data_length (Optional[Symbol]) – (batch_size,)
  • seq_len (int) – sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

(batch_size, source_seq_len, num_embed)

encode_positions(positions, data)[source]
Parameters:
  • positions (Symbol) – (batch_size,)
  • data (Symbol) – (batch_size, num_embed)
Return type:

Symbol

Returns:

(batch_size, num_embed)

class sockeye.encoder.AddSinCosPositionalEmbeddings(num_embed, prefix, scale_up_input, scale_down_positions, dtype='float32')[source]

Bases: sockeye.encoder.PositionalEncoder

Takes an encoded sequence and adds to it the fixed positional embeddings of Vaswani et al., 2017.

Parameters:
  • num_embed (int) – Embedding size.
  • prefix (str) – Name prefix for symbols of this encoder.
  • scale_up_input (bool) – If True, scales input data up by num_embed ** 0.5.
  • scale_down_positions (bool) – If True, scales positional embeddings down by num_embed ** -0.5.
  • dtype (str) – Data type.
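The fixed embeddings themselves are easy to compute outside the symbolic graph. The NumPy sketch below follows the Vaswani et al. 2017 formulation (even channels sin, odd channels cos, geometrically increasing wavelengths); the scale_up_input/scale_down_positions options documented above are omitted:

```python
import numpy as np

# NumPy sketch of sinusoidal positional embeddings (Vaswani et al. 2017).
def sincos_embeddings(seq_len, num_embed):
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    channels = np.arange(num_embed // 2)[None, :]     # (1, num_embed/2)
    angles = positions / np.power(10000, 2 * channels / num_embed)
    out = np.zeros((seq_len, num_embed))
    out[:, 0::2] = np.sin(angles)                     # even channels
    out[:, 1::2] = np.cos(angles)                     # odd channels
    return out
```

Because the embeddings are fixed functions of position, no maximum sequence length has to be chosen at training time, unlike the learned variant above.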
encode(data, data_length, seq_len)[source]
Parameters:
  • data (Symbol) – (batch_size, source_seq_len, num_embed)
  • data_length (Optional[Symbol]) – (batch_size,)
  • seq_len (int) – sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

(batch_size, source_seq_len, num_embed)

encode_positions(positions, data)[source]
Parameters:
  • positions (Symbol) – (batch_size,)
  • data (Symbol) – (batch_size, num_embed)
Return type:

Symbol

Returns:

(batch_size, num_embed)

class sockeye.encoder.BiDirectionalRNNEncoder(rnn_config, prefix='encoder_birnn_', layout='TNC', encoder_class=<class 'sockeye.encoder.RecurrentEncoder'>)[source]

Bases: sockeye.encoder.Encoder

An encoder that runs a forward and a reverse RNN over input data. States from both RNNs are concatenated together.

Parameters:
  • rnn_config (RNNConfig) – RNN configuration.
  • prefix – Prefix for variable names.
  • layout – Data layout.
  • encoder_class (Callable) – Recurrent encoder class to use.
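The forward/reverse combination can be sketched with NumPy. The cumulative-sum "RNN" below is a hypothetical stand-in for a real recurrent cell; the point is the reverse-run-restore pattern and the concatenation of hidden dimensions:

```python
import numpy as np

# Stand-in for a recurrent cell: each output depends on all prior inputs.
def fake_rnn(data):
    return np.cumsum(data, axis=0)          # (seq_len, num_hidden)

# Sketch of the bidirectional combination used by BiDirectionalRNNEncoder:
# one pass forward, one over the reversed sequence, concatenated.
def birnn_encode(data):
    forward = fake_rnn(data)
    backward = fake_rnn(data[::-1])[::-1]   # reverse, run, restore order
    return np.concatenate([forward, backward], axis=-1)
```

Each output position thus carries context from both directions, and the hidden size doubles relative to a single pass.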
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Symbol) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_num_hidden()[source]

Return the representation size of this encoder.

Return type:int
get_rnn_cells()[source]

Returns a list of RNNCells used by this encoder.

Return type:List[BaseRNNCell]
class sockeye.encoder.ConvertLayout(target_layout, num_hidden, dtype='float32')[source]

Bases: sockeye.encoder.Encoder

Converts data to the given target layout (batch-major or time-major) by swapping the first two dimensions and setting the __layout__ attribute.

Parameters:
  • target_layout (str) – The target layout to convert to (C.BATCH_MAJOR or C.TIME_MAJOR).
  • num_hidden (int) – The number of hidden units of the previous encoder.
  • dtype (str) – Data type.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Optional[Symbol]) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

class sockeye.encoder.ConvolutionalEmbeddingConfig(num_embed, output_dim=None, max_filter_width=8, num_filters=(200, 200, 250, 250, 300, 300, 300, 300), pool_stride=5, num_highway_layers=4, dropout=0.0, add_positional_encoding=False, dtype='float32')[source]

Bases: sockeye.config.Config

Convolutional embedding encoder configuration.

Parameters:
  • num_embed (int) – Input embedding size.
  • output_dim (Optional[int]) – Output segment embedding size.
  • max_filter_width (int) – Maximum filter width for convolutions.
  • num_filters (Tuple[int, …]) – Number of filters of each width.
  • pool_stride (int) – Stride for pooling layer after convolutions.
  • num_highway_layers (int) – Number of highway layers for segment embeddings.
  • dropout (float) – Dropout probability.
  • add_positional_encoding (bool) – If True, adds positional encodings to the segment embeddings.
  • dtype (str) – Data type.
class sockeye.encoder.ConvolutionalEmbeddingEncoder(config, prefix='encoder_char_')[source]

Bases: sockeye.encoder.Encoder

An encoder developed to map a sequence of character embeddings to a shorter sequence of segment embeddings using convolutional, pooling, and highway layers. More generally, it maps a sequence of input embeddings to a sequence of span embeddings.

Parameters:
  • config (ConvolutionalEmbeddingConfig) – Convolutional embedding config.
  • prefix (str) – Name prefix for symbols of this encoder.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Symbol) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data data, data_length, seq_len.

get_encoded_seq_len(seq_len)[source]

Returns the size of the encoded sequence.

Return type:int
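Assuming the encoded length is the ceiling of the input length divided by the pooling stride (consistent with the pool_stride parameter documented above; the actual computation lives in the MXNet graph), the shortening can be sketched as:

```python
import math

# Hypothetical sketch of how pooling with stride `pool_stride` shortens
# the sequence: one output segment per stride-sized window (last window
# may be partial).
def encoded_seq_len(seq_len, pool_stride):
    return int(math.ceil(seq_len / pool_stride))
```

For example, a 50-character input with pool_stride 5 would yield 10 segment embeddings under this assumption.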
get_num_hidden()[source]

Return the representation size of this encoder.

Return type:int
class sockeye.encoder.ConvolutionalEncoder(config, prefix='encoder_cnn_')[source]

Bases: sockeye.encoder.Encoder

Encoder that uses convolution instead of recurrent connections, similar to Gehring et al. 2017.

Parameters:
  • config (ConvolutionalEncoderConfig) – Convolutional encoder config.
  • prefix (str) – Name prefix for symbols of this encoder.
encode(data, data_length, seq_len)[source]

Encodes data with a stack of Convolution+GLU blocks given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data. Shape: (batch_size, seq_len, input_num_hidden).
  • data_length (Symbol) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded version of the data.

class sockeye.encoder.ConvolutionalEncoderConfig(num_embed, max_seq_len_source, cnn_config, num_layers, positional_embedding_type, dtype='float32')[source]

Bases: sockeye.config.Config

Convolutional encoder configuration.

Parameters:
  • num_embed (int) – Embedding size.
  • max_seq_len_source (int) – Maximum source sequence length.
  • cnn_config (ConvolutionConfig) – CNN configuration.
  • num_layers (int) – The number of convolutional layers on top of the embeddings.
  • positional_embedding_type (str) – The type of positional embedding.
  • dtype (str) – Data type.
class sockeye.encoder.Embedding(config, prefix, embed_weight=None, is_source=False)[source]

Bases: sockeye.encoder.Encoder

Thin wrapper around MXNet’s Embedding symbol. Works with both time- and batch-major data layouts.

Parameters:
  • config (EmbeddingConfig) – Embedding config.
  • prefix (str) – Name prefix for symbols of this encoder.
  • embed_weight (Optional[Symbol]) – Optionally use an existing embedding matrix instead of creating a new one.
  • is_source (bool) – Whether this is the source embedding instance. Default: False.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Optional[Symbol]) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_num_hidden()[source]

Return the representation size of this encoder.

Return type:int
class sockeye.encoder.Encoder(dtype)[source]

Bases: abc.ABC

Generic encoder interface.

Parameters:dtype – Data type.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Optional[Symbol]) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_encoded_seq_len(seq_len)[source]
Return type:int
Returns:The size of the encoded sequence.
get_max_seq_len()[source]
Return type:Optional[int]
Returns:The maximum length supported by the encoder if such a restriction exists.
get_num_hidden()[source]
Return type:int
Returns:The representation size of this encoder.
class sockeye.encoder.EncoderSequence(encoders, dtype='float32')[source]

Bases: sockeye.encoder.Encoder

A sequence of encoders is itself an encoder.

Parameters:
  • encoders (List[Encoder]) – List of encoders.
  • dtype (str) – Data type.
append(cls, infer_hidden=False, **kwargs)[source]

Extends the sequence with a new Encoder. ‘dtype’ is passed into the Encoder instance if it is not already present in the parameters and is supported by the specific Encoder type.

Parameters:
  • cls – Encoder type.
  • infer_hidden (bool) – Whether the number of hidden units should be inferred from the previous encoder.
  • kwargs – Arbitrary keyword parameters for the Encoder.
Return type:

Encoder

Returns:

Instance of Encoder.

encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Symbol) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_encoded_seq_len(seq_len)[source]

Returns the size of the encoded sequence.

Return type:int
get_max_seq_len()[source]
Return type:Optional[int]
Returns:The maximum length supported by the encoder if such a restriction exists.
get_num_hidden()[source]

Return the representation size of this encoder.

Return type:int
class sockeye.encoder.NoOpPositionalEmbeddings(num_embed, dtype='float32')[source]

Bases: sockeye.encoder.PositionalEncoder

Simple no-op positional embedding. It does not modify the data, but avoids the need for many if statements elsewhere in the code.

Parameters:dtype (str) – Data type.
class sockeye.encoder.RecurrentEncoder(rnn_config, prefix='encoder_rnn_', layout='TNC')[source]

Bases: sockeye.encoder.Encoder

Uni-directional (multi-layered) recurrent encoder.

Parameters:
  • rnn_config (RNNConfig) – RNN configuration.
  • prefix (str) – Prefix for variable names.
  • layout (str) – Data layout.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Optional[Symbol]) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_num_hidden()[source]

Return the representation size of this encoder.

get_rnn_cells()[source]

Returns RNNCells used in this encoder.

class sockeye.encoder.RecurrentEncoderConfig(rnn_config, conv_config=None, reverse_input=False, dtype='float32')[source]

Bases: sockeye.config.Config

Recurrent encoder configuration.

Parameters:
  • rnn_config (RNNConfig) – RNN configuration.
  • conv_config (Optional[ConvolutionalEmbeddingConfig]) – Optional configuration for convolutional embedding.
  • reverse_input (bool) – Reverse embedding sequence before feeding into RNN.
  • dtype (str) – Data type.
class sockeye.encoder.ReverseSequence(num_hidden, dtype='float32')[source]

Bases: sockeye.encoder.Encoder

Reverses the input sequence. Requires time-major layout.

Parameters:dtype (str) – Data type.
class sockeye.encoder.TransformerEncoder(config, prefix='encoder_transformer_')[source]

Bases: sockeye.encoder.Encoder

Non-recurrent encoder based on the transformer architecture in:

Attention Is All You Need, Figure 1 (left) Vaswani et al. (https://arxiv.org/pdf/1706.03762.pdf).

Parameters:
  • config (TransformerConfig) – Configuration for transformer encoder.
  • prefix (str) – Name prefix for operations in this encoder.
encode(data, data_length, seq_len)[source]

Encodes data given sequence lengths of individual examples and maximum sequence length.

Parameters:
  • data (Symbol) – Input data.
  • data_length (Symbol) – Vector with sequence lengths.
  • seq_len (int) – Maximum sequence length.
Return type:

Tuple[Symbol, Symbol, int]

Returns:

Encoded versions of input data (data, data_length, seq_len).

get_num_hidden()[source]

Return the representation size of this encoder.

Return type:int
sockeye.encoder.get_convolutional_encoder(config, prefix)[source]

Creates a convolutional encoder.

Parameters:
  • config (ConvolutionalEncoderConfig) – Configuration for the convolutional encoder.
  • prefix (str) – Prefix for variable names.
Return type:

Encoder

Returns:

Encoder instance.

sockeye.encoder.get_recurrent_encoder(config, prefix)[source]

Returns an encoder stack with a bi-directional RNN, and a variable number of uni-directional forward RNNs.

Parameters:
  • config (RecurrentEncoderConfig) – Configuration for the recurrent encoder.
  • prefix (str) – Prefix for variable names.
Return type:

Encoder

Returns:

Encoder instance.

sockeye.encoder.get_transformer_encoder(config, prefix)[source]

Returns a Transformer encoder, consisting of an embedding layer with positional encodings and a TransformerEncoder instance.

Parameters:
  • config (TransformerConfig) – Configuration for transformer encoder.
  • prefix (str) – Prefix for variable names.
Return type:

Encoder

Returns:

Encoder instance.

sockeye.inference module

sockeye.initializer module

sockeye.initializer.get_initializer(default_init_type, default_init_scale, default_init_xavier_rand_type, default_init_xavier_factor_type, embed_init_type, embed_init_sigma, rnn_init_type)[source]

Returns a mixed MXNet initializer.

Parameters:
  • default_init_type (str) – The default weight initializer type.
  • default_init_scale (float) – The scale used for default weight initialization (only used with uniform initialization).
  • default_init_xavier_rand_type (str) – Xavier random number generator type.
  • default_init_xavier_factor_type (str) – Xavier factor type.
  • embed_init_type (str) – Embedding matrix initialization type.
  • embed_init_sigma (float) – Sigma for normal initialization of embedding matrix.
  • rnn_init_type (str) – Initialization type for RNN h2h matrices.
Return type:

Initializer

Returns:

Mixed initializer.

sockeye.layers module

class sockeye.layers.LayerNormalization(num_hidden, prefix=None, scale=None, shift=None, scale_init=1.0, shift_init=0.0)[source]

Bases: object

Implements Ba et al, Layer Normalization (https://arxiv.org/abs/1607.06450).

Parameters:
  • num_hidden (int) – Number of hidden units of layer to be normalized.
  • prefix (Optional[str]) – Optional prefix of layer name.
  • scale (Optional[Symbol]) – Optional variable for scaling of shape (num_hidden,). Will be created if None.
  • shift (Optional[Symbol]) – Optional variable for shifting of shape (num_hidden,). Will be created if None.
  • scale_init (float) – Initial value of scale variable if scale is None. Default 1.0.
  • shift_init (float) – Initial value of shift variable if shift is None. Default 0.0.
static moments(inputs)[source]

Computes mean and variance of the last dimension of a Symbol.

Parameters:inputs (Symbol) – Shape: (d0, …, dn, hidden).
Return type:Tuple[Symbol, Symbol]
Returns:mean, var: Shape: (d0, …, dn, 1).
normalize(inputs, eps=1e-06)[source]

Normalizes hidden units of inputs as follows:

inputs = scale * (inputs - mean) / sqrt(var + eps) + shift

Normalization is performed over the last dimension of the input data.

Parameters:
  • inputs (Symbol) – Inputs to normalize. Shape: (d0, …, dn, num_hidden).
  • eps (float) – Variance epsilon.
Return type:

Symbol

Returns:

inputs_norm: Normalized inputs. Shape: (d0, …, dn, num_hidden).
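As a rough illustration, the normalize() computation above can be sketched in NumPy, independently of MXNet (the function name and example values here are illustrative; the real class operates on Symbols):

```python
import numpy as np

def layer_norm(inputs, scale, shift, eps=1e-6):
    # Normalize over the last dimension, then apply scale and shift:
    # scale * (inputs - mean) / sqrt(var + eps) + shift
    mean = inputs.mean(axis=-1, keepdims=True)   # Shape: (d0, ..., dn, 1)
    var = inputs.var(axis=-1, keepdims=True)     # Shape: (d0, ..., dn, 1)
    return scale * (inputs - mean) / np.sqrt(var + eps) + shift

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
normed = layer_norm(x, scale=np.ones(3), shift=np.zeros(3))
```

Before scale and shift take effect, each row of the output has approximately zero mean and unit variance.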

class sockeye.layers.MultiHeadAttention(prefix, depth_att=512, heads=8, depth_out=512, dropout=0.0)[source]

Bases: sockeye.layers.MultiHeadAttentionBase

Multi-head attention layer for queries independent from keys/values.

Parameters:
  • prefix (str) – Attention prefix.
  • depth_att (int) – Attention depth / number of hidden units.
  • heads (int) – Number of attention heads.
  • depth_out (int) – Output depth / number of output units.
  • dropout (float) – Dropout probability on attention scores.
class sockeye.layers.MultiHeadAttentionBase(prefix, depth_att=512, heads=8, depth_out=512, dropout=0.0)[source]

Bases: object

Base class for Multi-head attention.

Parameters:
  • prefix (str) – Attention prefix.
  • depth_att (int) – Attention depth / number of hidden units.
  • heads (int) – Number of attention heads.
  • depth_out (int) – Output depth / number of output units.
  • dropout (float) – Dropout probability on attention scores.
class sockeye.layers.MultiHeadSelfAttention(prefix, depth_att=512, heads=8, depth_out=512, dropout=0.0)[source]

Bases: sockeye.layers.MultiHeadAttentionBase

Multi-head self-attention. Independent linear projections of inputs serve as queries, keys, and values for the attention.

Parameters:
  • prefix (str) – Attention prefix.
  • depth_att (int) – Attention depth / number of hidden units.
  • heads (int) – Number of attention heads.
  • depth_out (int) – Output depth / number of output units.
  • dropout (float) – Dropout probability on attention scores.
class sockeye.layers.OutputLayer(hidden_size, vocab_size, weight, weight_normalization, prefix='target_output_')[source]

Bases: object

Defines the output layer of Sockeye decoders. Supports weight tying and weight normalization.

Parameters:
  • hidden_size (int) – Decoder hidden size.
  • vocab_size (int) – Target vocabulary size.
  • weight (Optional[Symbol]) – Optional existing weight matrix, e.g. for weight tying. Will be created if None.
  • weight_normalization (bool) – Whether to apply weight normalization.
  • prefix (str) – Prefix used for naming.
class sockeye.layers.PlainDotAttention[source]

Bases: object

Dot attention layer for queries independent from keys/values.

class sockeye.layers.ProjectedDotAttention(prefix, num_hidden)[source]

Bases: object

Dot attention layer for queries independent from keys/values.

Parameters:
  • prefix (str) – Attention prefix.
  • num_hidden – Attention depth / number of hidden units.
class sockeye.layers.WeightNormalization(weight, num_hidden, ndim=2, prefix='')[source]

Bases: object

Implements Weight Normalization, see Salimans & Kingma 2016 (https://arxiv.org/abs/1602.07868). For a given tensor the normalization is done per hidden dimension.

Parameters:
  • weight – Weight tensor of shape: (num_hidden, d1, d2, …).
  • num_hidden – Size of the first dimension.
  • ndim – The total number of dimensions of the weight tensor.
  • prefix (str) – The prefix used for naming.
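A minimal NumPy sketch of the per-hidden-dimension normalization w = g * v / ||v|| (names and shapes are illustrative; the real class builds MXNet Symbols):

```python
import numpy as np

def weight_norm(v, g):
    # Normalize each row (hidden dimension) of v to unit norm,
    # then rescale it by the learned scalar g: w = g * v / ||v||.
    norms = np.linalg.norm(v.reshape(v.shape[0], -1), axis=1)
    return g * v / norms.reshape(-1, *([1] * (v.ndim - 1)))

v = np.array([[3.0, 4.0],
              [6.0, 8.0]])     # Shape: (num_hidden, d1); row norms are 5 and 10
g = np.array([[1.0], [2.0]])   # One learned scale per hidden dimension
w = weight_norm(v, g)
```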
sockeye.layers.activation(data, act_type)[source]

Apply custom or standard activation.

Custom activation types include:
  • Swish-1, also called Sigmoid-weighted Linear Unit (SiLU): Ramachandran et al. (https://arxiv.org/pdf/1710.05941.pdf)

Parameters:
  • data (Symbol) – input Symbol of any shape.
  • act_type (str) – Type of activation.
Return type:

Symbol

Returns:

output Symbol with same shape as input.

sockeye.layers.broadcast_to_heads(x, num_heads, ndim, fold_heads=True)[source]

Broadcasts batch-major input of shape (batch, d1 … dn-1) to (batch*heads, d1 … dn-1).

Parameters:
  • x (Symbol) – Batch-major input. Shape: (batch, d1 … dn-1).
  • num_heads (int) – Number of heads.
  • ndim (int) – Number of dimensions in x.
  • fold_heads (bool) – Whether to fold heads dimension into batch dimension.
Return type:

Symbol

Returns:

Tensor with each sample repeated heads-many times. Shape: (batch * heads, d1 … dn-1) if fold_heads == True, (batch, heads, d1 … dn-1) else.
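The broadcast can be mimicked in NumPy (a sketch only; the original operates on Symbols):

```python
import numpy as np

def broadcast_to_heads(x, num_heads, fold_heads=True):
    # Repeat every batch sample num_heads times.
    expanded = np.repeat(x[:, np.newaxis, ...], num_heads, axis=1)
    if fold_heads:
        # (batch, heads, d1, ...) -> (batch * heads, d1, ...)
        return expanded.reshape((-1,) + x.shape[1:])
    return expanded

x = np.arange(6.0).reshape(2, 3)          # batch=2, d1=3
y = broadcast_to_heads(x, num_heads=4)    # Shape: (8, 3)
```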

sockeye.layers.combine_heads(x, depth_per_head, heads)[source]

Returns a symbol with both batch & length, and head & depth dimensions combined.

Parameters:
  • x (Symbol) – Symbol of shape (batch * heads, length, depth_per_head).
  • depth_per_head (int) – Depth per head.
  • heads (int) – Number of heads.
Return type:

Symbol

Returns:

Symbol of shape (batch, length, depth).

sockeye.layers.dot_attention(queries, keys, values, lengths=None, dropout=0.0, bias=None, prefix='')[source]

Computes dot attention for a set of queries, keys, and values.

Parameters:
  • queries (Symbol) – Attention queries. Shape: (n, lq, d).
  • keys (Symbol) – Attention keys. Shape: (n, lk, d).
  • values (Symbol) – Attention values. Shape: (n, lk, dv).
  • lengths (Optional[Symbol]) – Optional sequence lengths of the keys. Shape: (n,).
  • dropout (float) – Dropout probability.
  • bias (Optional[Symbol]) – Optional 3d bias tensor.
  • prefix (Optional[str]) – Optional prefix.
Returns:

‘Context’ vectors for each query. Shape: (n, lq, dv).
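A NumPy sketch of the computation, including length-based masking of padded key positions (unscaled and without dropout or bias; shapes follow the docstring, but the sketch is independent of the actual Symbol implementation):

```python
import numpy as np

def dot_attention(queries, keys, values, lengths=None):
    # queries: (n, lq, d), keys: (n, lk, d), values: (n, lk, dv)
    logits = queries @ keys.transpose(0, 2, 1)            # (n, lq, lk)
    if lengths is not None:
        # Mask key positions beyond each sequence's valid length.
        mask = np.arange(keys.shape[1])[None, None, :] >= lengths[:, None, None]
        logits = np.where(mask, -np.inf, logits)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)     # softmax over keys
    return probs @ values                                 # (n, lq, dv)

q = np.random.rand(1, 2, 4)
k = np.random.rand(1, 3, 4)
v = np.random.rand(1, 3, 5)
context = dot_attention(q, k, v, lengths=np.array([2]))   # Shape: (1, 2, 5)
```

With lengths = [1], all probability mass falls on the first key, so every context vector equals the first value vector.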

sockeye.layers.split_heads(x, depth_per_head, heads)[source]

Returns a symbol with head dimension folded into batch and depth divided by the number of heads.

Parameters:
  • x (Symbol) – Symbol of shape (batch, length, depth).
  • depth_per_head (int) – Depth per head.
  • heads (int) – Number of heads.
Return type:

Symbol

Returns:

Symbol of shape (batch * heads, length, depth_per_head).
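split_heads and combine_heads are inverse reshapes; a NumPy sketch of the round trip (illustrative, the originals operate on Symbols):

```python
import numpy as np

def split_heads(x, depth_per_head, heads):
    # (batch, length, depth) -> (batch * heads, length, depth_per_head)
    batch, length, _ = x.shape
    x = x.reshape(batch, length, heads, depth_per_head)
    return x.transpose(0, 2, 1, 3).reshape(batch * heads, length, depth_per_head)

def combine_heads(x, depth_per_head, heads):
    # (batch * heads, length, depth_per_head) -> (batch, length, depth)
    batch = x.shape[0] // heads
    length = x.shape[1]
    x = x.reshape(batch, heads, length, depth_per_head)
    return x.transpose(0, 2, 1, 3).reshape(batch, length, heads * depth_per_head)

x = np.arange(24.0).reshape(1, 2, 12)   # batch=1, length=2, depth=12
roundtrip = combine_heads(split_heads(x, depth_per_head=4, heads=3), 4, 3)
```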

sockeye.lexicon module

sockeye.log module

sockeye.log.setup_main_logger(name, file_logging=True, console=True, path=None)[source]

Return a logger that configures logging for the main application.

Parameters:
  • name (str) – Name of the returned logger.
  • file_logging (bool) – Whether to log to a file.
  • console (bool) – Whether to log to the console.
  • path (Optional[str]) – Optional path to write logfile to.
Return type:

Logger

sockeye.loss module

Functions to generate loss symbols for sequence-to-sequence models.

class sockeye.loss.CrossEntropyLoss(loss_config)[source]

Bases: sockeye.loss.Loss

Computes the cross-entropy loss.

Parameters:loss_config (LossConfig) – Loss configuration.
get_loss(logits, labels)[source]

Returns loss and softmax output symbols given logits and integer-coded labels.

Parameters:
  • logits (Symbol) – Shape: (batch_size * target_seq_len, target_vocab_size).
  • labels (Symbol) – Shape: (batch_size * target_seq_len,).
Return type:

List[Symbol]

Returns:

List containing the loss symbol.
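The loss corresponds to standard softmax cross-entropy over the target vocabulary; a NumPy sketch without label smoothing or normalization (names are illustrative, and the real implementation uses mx.sym.SoftmaxOutput):

```python
import numpy as np

def cross_entropy(logits, labels):
    # logits: (batch * seq_len, vocab), labels: (batch * seq_len,) int ids.
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs = probs / probs.sum(axis=-1, keepdims=True)      # softmax outputs
    losses = -np.log(probs[np.arange(len(labels)), labels])
    return losses, probs

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])
losses, probs = cross_entropy(logits, np.array([0, 1]))
```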

class sockeye.loss.Loss[source]

Bases: abc.ABC

Generic Loss interface. get_loss() method should return a loss symbol and the softmax outputs. The softmax outputs (named C.SOFTMAX_NAME) are used by EvalMetrics to compute various metrics, e.g. perplexity, accuracy. In the special case of cross_entropy, the SoftmaxOutput symbol provides softmax outputs for forward() AND cross_entropy gradients for backward().

create_metric()[source]

Create an instance of the EvalMetric that corresponds to this Loss function.

Return type:EvalMetric
get_loss(logits, labels)[source]

Returns loss and softmax output symbols given logits and integer-coded labels.

Parameters:
  • logits (Symbol) – Shape: (batch_size * target_seq_len, target_vocab_size).
  • labels (Symbol) – Shape: (batch_size * target_seq_len,).
Return type:

List[Symbol]

Returns:

List of loss and softmax output symbols.

class sockeye.loss.LossConfig(name, vocab_size, normalization_type, label_smoothing=0.0)[source]

Bases: sockeye.config.Config

Loss configuration.

Parameters:
  • name (str) – Loss name.
  • vocab_size (int) – Target vocab size.
  • normalization_type (str) – How to normalize the loss.
  • label_smoothing (float) – Optional smoothing constant for label smoothing.
sockeye.loss.get_loss(loss_config)[source]

Returns Loss instance.

Parameters:loss_config (LossConfig) – Loss configuration.
Return type:Loss

sockeye.lr_scheduler module

class sockeye.lr_scheduler.AdaptiveLearningRateScheduler(warmup=0)[source]

Bases: sockeye.lr_scheduler.LearningRateScheduler

Learning rate scheduler that implements new_evaluation_result() and adaptively adjusts the learning rate accordingly.

new_evaluation_result(has_improved)[source]

Returns true if the parameters should be reset to the ones with the best validation score.

Parameters:has_improved (bool) – Whether the model improved on held-out validation data.
Return type:bool
Returns:True if parameters should be reset to the ones with best validation score.
class sockeye.lr_scheduler.LearningRateSchedulerFixedStep(schedule, updates_per_checkpoint)[source]

Bases: sockeye.lr_scheduler.AdaptiveLearningRateScheduler

Use a fixed schedule of learning rate steps: lr_1 for N steps, lr_2 for M steps, etc.

Parameters:
  • schedule (List[Tuple[float, int]]) – List of learning rate step tuples in the form (rate, num_updates).
  • updates_per_checkpoint (int) – Updates per checkpoint.
new_evaluation_result(has_improved)[source]

Returns true if the parameters should be reset to the ones with the best validation score.

Parameters:has_improved (bool) – Whether the model improved on held-out validation data.
Return type:bool
Returns:True if parameters should be reset to the ones with best validation score.
static parse_schedule_str(schedule_str)[source]

Parse learning schedule string.

Parameters:schedule_str (str) – String in form rate1:num_updates1[,rate2:num_updates2,…]
Return type:List[Tuple[float, int]]
Returns:List of tuples (learning_rate, num_updates).
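The schedule string format can be parsed in a few lines of plain Python (a sketch mirroring the documented format; the example rates are made up):

```python
def parse_schedule_str(schedule_str):
    # "rate1:num_updates1[,rate2:num_updates2,...]" ->
    # [(rate1, num_updates1), (rate2, num_updates2), ...]
    schedule = []
    for step in schedule_str.split(","):
        rate, num_updates = step.split(":")
        schedule.append((float(rate), int(num_updates)))
    return schedule

schedule = parse_schedule_str("0.0005:16000,0.00025:16000,0.000125:16000")
```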
class sockeye.lr_scheduler.LearningRateSchedulerInvSqrtT(updates_per_checkpoint, half_life, warmup=0)[source]

Bases: sockeye.lr_scheduler.LearningRateScheduler

Learning rate schedule: lr / sqrt(1 + factor * t). Note: The factor is calculated from the half life of the learning rate.

Parameters:
  • updates_per_checkpoint (int) – Number of batches between checkpoints.
  • half_life (int) – Half life of the learning rate in number of checkpoints.
  • warmup (int) – Number of (linear) learning rate increases to warm-up.
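One consistent way to derive the factor from the half life: requiring lr / sqrt(1 + factor * t_half) = lr / 2 at t_half = half_life * updates_per_checkpoint gives factor = 3 / t_half. A small sketch under that assumption (the scheduler's internal bookkeeping may differ):

```python
import math

def inv_sqrt_lr(base_lr, t, updates_per_checkpoint, half_life):
    # Choose factor so that 1 + factor * t_half = 4, i.e. the rate
    # halves after half_life checkpoints.
    t_half = half_life * updates_per_checkpoint
    factor = 3.0 / t_half
    return base_lr / math.sqrt(1.0 + factor * t)

lr0 = inv_sqrt_lr(0.0003, t=0, updates_per_checkpoint=1000, half_life=10)
lr_half = inv_sqrt_lr(0.0003, t=10000, updates_per_checkpoint=1000, half_life=10)
```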
class sockeye.lr_scheduler.LearningRateSchedulerInvT(updates_per_checkpoint, half_life, warmup=0)[source]

Bases: sockeye.lr_scheduler.LearningRateScheduler

Learning rate schedule: lr / (1 + factor * t). Note: The factor is calculated from the half life of the learning rate.

Parameters:
  • updates_per_checkpoint (int) – Number of batches between checkpoints.
  • half_life (int) – Half life of the learning rate in number of checkpoints.
class sockeye.lr_scheduler.LearningRateSchedulerPlateauReduce(reduce_factor, reduce_num_not_improved, warmup=0)[source]

Bases: sockeye.lr_scheduler.AdaptiveLearningRateScheduler

Lower the learning rate as soon as the validation score plateaus.

Parameters:
  • reduce_factor (float) – Factor to reduce learning rate with.
  • reduce_num_not_improved (int) – Number of checkpoints with no improvement after which learning rate is reduced.
new_evaluation_result(has_improved)[source]

Returns true if the parameters should be reset to the ones with the best validation score.

Parameters:has_improved (bool) – Whether the model improved on held-out validation data.
Return type:bool
Returns:True if parameters should be reset to the ones with best validation score.
sockeye.lr_scheduler.get_lr_scheduler(scheduler_type, updates_per_checkpoint, learning_rate_half_life, learning_rate_reduce_factor, learning_rate_reduce_num_not_improved, learning_rate_schedule=None, learning_rate_warmup=0)[source]

Returns a learning rate scheduler.

Parameters:
  • scheduler_type (str) – Scheduler type.
  • updates_per_checkpoint (int) – Number of batches between checkpoints.
  • learning_rate_half_life (int) – Half life of the learning rate in number of checkpoints.
  • learning_rate_reduce_factor (float) – Factor to reduce learning rate with.
  • learning_rate_reduce_num_not_improved (int) – Number of checkpoints with no improvement after which learning rate is reduced.
  • learning_rate_schedule (Optional[List[Tuple[float, int]]]) – Optional fixed learning rate schedule.
  • learning_rate_warmup (Optional[int]) – Number of batches that the learning rate is linearly increased.
Raises:

ValueError – if scheduler_type is unknown.

Return type:

Optional[LearningRateScheduler]

Returns:

Learning rate scheduler.

sockeye.model module

sockeye.output_handler module

sockeye.rnn module

class sockeye.rnn.RNNConfig(cell_type, num_hidden, num_layers, dropout_inputs, dropout_states, dropout_recurrent=0, residual=False, first_residual_layer=2, forget_bias=0.0, dtype='float32')[source]

Bases: sockeye.config.Config

RNN configuration.

Parameters:
  • cell_type (str) – RNN cell type.
  • num_hidden (int) – Number of RNN hidden units.
  • num_layers (int) – Number of RNN layers.
  • dropout_inputs (float) – Dropout probability on RNN inputs (Gal, 2015).
  • dropout_states (float) – Dropout probability on RNN states (Gal, 2015).
  • dropout_recurrent (float) – Dropout probability on cell update (Semeniuta, 2016).
  • residual (bool) – Whether to add residual connections between multi-layered RNNs.
  • first_residual_layer (int) – First layer with a residual connection (1-based indexes). Default is to start at the second layer.
  • forget_bias (float) – Initial value of forget biases.
  • dtype (str) – Data type.
sockeye.rnn.get_stacked_rnn(config, prefix, parallel_inputs=False, layers=None)[source]

Returns (stacked) RNN cell given parameters.

Parameters:
  • config (RNNConfig) – rnn configuration.
  • prefix (str) – Symbol prefix for RNN.
  • parallel_inputs (bool) – Support parallel inputs for the stacked RNN cells.
  • layers (Optional[Iterable[int]]) – Specify which layers to create as a list of layer indexes.
Return type:

SequentialRNNCell

Returns:

RNN cell.

sockeye.rnn_attention module

Implementations of different attention mechanisms in sequence-to-sequence models.

class sockeye.rnn_attention.Attention(input_previous_word, dynamic_source_num_hidden=1, prefix='att_', dtype='float32')[source]

Bases: object

Generic attention interface that returns a callable for attending to source states.

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • dynamic_source_num_hidden (int) – Number of hidden units of dynamic source encoding update mechanism.
  • dtype (str) – Data type.
get_initial_state(source_length, source_seq_len)[source]

Returns initial attention state. Dynamic source encoding is initialized with zeros.

Parameters:
  • source_length (Symbol) – Source length. Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

AttentionState

make_input(seq_idx, word_vec_prev, decoder_state)[source]

Returns AttentionInput to be fed into the attend callable returned by the on() method.

Parameters:
  • seq_idx (int) – Decoder time step.
  • word_vec_prev (Symbol) – Embedding of previously predicted word.
  • decoder_state (Symbol) – Current decoder state.
Return type:

AttentionInput

Returns:

Attention input.

on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.

class sockeye.rnn_attention.AttentionConfig(type, num_hidden, input_previous_word, source_num_hidden, query_num_hidden, layer_normalization, config_coverage=None, num_heads=None, is_scaled=False, dtype='float32')[source]

Bases: sockeye.config.Config

Attention configuration.

Parameters:
  • type (str) – Attention name.
  • num_hidden (int) – Number of hidden units for attention networks.
  • input_previous_word (bool) – Feeds the previous target embedding into the attention mechanism.
  • source_num_hidden (int) – Number of hidden units of the source.
  • query_num_hidden (int) – Number of hidden units of the query.
  • layer_normalization (bool) – Apply layer normalization to MLP attention.
  • config_coverage (Optional[CoverageConfig]) – Optional coverage configuration.
  • num_heads (Optional[int]) – Number of attention heads. Only used for Multi-head dot attention.
  • is_scaled (Optional[bool]) – Whether ‘dot’ attention scores should be scaled.
  • dtype (str) – Data type.
class sockeye.rnn_attention.AttentionInput(seq_idx, query)

Bases: tuple

Input to attention callables.

Parameters:
  • seq_idx – Decoder time step / sequence index.
  • query – Query input to attention mechanism, e.g. decoder hidden state (plus previous word).
query

Alias for field number 1

seq_idx

Alias for field number 0

class sockeye.rnn_attention.AttentionState(context, probs, dynamic_source)

Bases: tuple

Results returned from attention callables.

Parameters:
  • context – Context vector (Bahdanau et al., 2015). Shape: (batch_size, encoder_num_hidden).
  • probs – Attention distribution over source encoder states. Shape: (batch_size, source_seq_len).
  • dynamic_source – Dynamically updated source encoding. Shape: (batch_size, source_seq_len, dynamic_source_num_hidden).
context

Alias for field number 0

dynamic_source

Alias for field number 2

probs

Alias for field number 1

class sockeye.rnn_attention.BilinearAttention(query_num_hidden, dtype='float32', prefix='att_')[source]

Bases: sockeye.rnn_attention.Attention

Bilinear attention based on Luong et al. 2015.

score(h_t, h_s) = h_t^T \mathbf{W} h_s

For implementation reasons we modify to:

score(h_t, h_s) = h_s^T \mathbf{W} h_t

Parameters:
  • query_num_hidden (int) – Number of hidden units the source will be projected to.
  • dtype (str) – data type.
  • prefix (str) – Name prefix.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.
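The modified score h_s^T W h_t can be computed for all source positions with one matrix-vector product; a NumPy sketch with illustrative shapes and random parameters:

```python
import numpy as np

np.random.seed(0)
encoder_num_hidden, query_num_hidden, seq_len = 4, 3, 5
W = np.random.rand(encoder_num_hidden, query_num_hidden)
h_t = np.random.rand(query_num_hidden)            # decoder hidden state
source = np.random.rand(seq_len, encoder_num_hidden)

scores = source @ (W @ h_t)                       # h_s^T W h_t per position
probs = np.exp(scores - scores.max())
probs = probs / probs.sum()                       # attention distribution
context = probs @ source                          # weighted sum of states
```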

class sockeye.rnn_attention.DotAttention(input_previous_word, source_num_hidden, query_num_hidden, num_hidden, is_scaled=False, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.Attention

Attention mechanism with dot product between encoder and decoder hidden states [Luong et al. 2015].

score(h_t, h_s) =  \langle h_t, h_s \rangle

a = softmax(score(*, h_s))

If the hidden size of the states does not equal num_hidden, states are projected to num_hidden with additional parameters.

score(h_t, h_s) = \langle \mathbf{W}_t h_t, \mathbf{W}_s h_s \rangle

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • source_num_hidden (int) – Number of hidden units in source.
  • query_num_hidden (int) – Number of hidden units in query.
  • num_hidden (int) – Number of hidden units.
  • is_scaled (bool) – Optionally scale query before dot product [Vaswani et al, 2017].
  • prefix (str) – Name prefix.
  • dtype (str) – data type.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.

class sockeye.rnn_attention.EncoderLastStateAttention(input_previous_word, dynamic_source_num_hidden=1, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.Attention

Always returns the last encoder state independent of the query vector. Equivalent to no attention.

on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.

class sockeye.rnn_attention.LocationAttention(input_previous_word, max_seq_len, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.Attention

Attends to locations in the source [Luong et al., 2015]

a_t = softmax(\mathbf{W}_a h_t) for decoder hidden state at time t.

Note:

\mathbf{W}_a is of shape (max_source_seq_len, decoder_num_hidden).

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • max_seq_len (int) – Maximum length of source sequences.
  • prefix (str) – Name prefix.
  • dtype (str) – data type.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.

class sockeye.rnn_attention.MlpAttention(input_previous_word, num_hidden, layer_normalization=False, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.Attention

Attention computed through a one-layer MLP with num_hidden units [Luong et al., 2015].

score(h_t, h_s) = \mathbf{W}_a tanh(\mathbf{W}_c [h_t, h_s] + b)

a = softmax(score(*, h_s))

Optionally, if attention_coverage_type is not None, attention uses dynamic source encoding (‘coverage’ mechanism) as in Tu et al. (2016): Modeling Coverage for Neural Machine Translation.

score(h_t, h_s) = \mathbf{W}_a tanh(\mathbf{W}_c [h_t, h_s, c_s] + b)

c_s is the decoder time-step dependent source encoding which is updated using the current decoder state.

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • num_hidden (int) – Number of hidden units.
  • layer_normalization (bool) – If true, normalizes hidden layer outputs before tanh activation.
  • prefix (str) – Name prefix
  • dtype (str) – data type.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.
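The MLP score above, without the coverage term, can be sketched in NumPy (all shapes and parameter names are illustrative; the final projection to a scalar score is written here as a vector w_a):

```python
import numpy as np

np.random.seed(0)
num_hidden, dec_hidden, enc_hidden, seq_len = 6, 4, 5, 3
W_c = np.random.rand(num_hidden, dec_hidden + enc_hidden)
b = np.random.rand(num_hidden)
w_a = np.random.rand(num_hidden)
h_t = np.random.rand(dec_hidden)                  # decoder state
source = np.random.rand(seq_len, enc_hidden)      # encoder states h_s

# score(h_t, h_s) = w_a^T tanh(W_c [h_t, h_s] + b) for each source position
concat = np.concatenate([np.tile(h_t, (seq_len, 1)), source], axis=1)
scores = np.tanh(concat @ W_c.T + b) @ w_a        # (seq_len,)
probs = np.exp(scores - scores.max())
probs = probs / probs.sum()                       # a = softmax(score(*, h_s))
```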

class sockeye.rnn_attention.MlpCovAttention(input_previous_word, num_hidden, layer_normalization=False, config_coverage=None, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.MlpAttention

MlpAttention with optional coverage config.

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • num_hidden (int) – Number of hidden units.
  • layer_normalization (bool) – If true, normalizes hidden layer outputs before tanh activation.
  • config_coverage (Optional[CoverageConfig]) – coverage config.
  • prefix (str) – Name prefix.
  • dtype (str) – data type.
class sockeye.rnn_attention.MultiHeadDotAttention(input_previous_word, source_num_hidden, num_heads, prefix='att_', dtype='float32')[source]

Bases: sockeye.rnn_attention.Attention

Dot product attention with multiple heads as proposed in Vaswani et al., Attention Is All You Need. Can be used with a RecurrentDecoder.

Parameters:
  • input_previous_word (bool) – Feed the previous target embedding into the attention mechanism.
  • source_num_hidden (int) – Number of hidden units.
  • num_heads (int) – Number of attention heads / independently computed attention scores.
  • prefix (str) – Name prefix.
  • dtype (str) – data type.
on(source, source_length, source_seq_len)[source]

Returns callable to be used for recurrent attention in a sequence decoder. The callable is a recurrent function of the form: AttentionState = attend(AttentionInput, AttentionState).

Parameters:
  • source (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • source_length (Symbol) – Shape: (batch_size,).
  • source_seq_len (int) – Maximum length of source sequences.
Return type:

Callable

Returns:

Attention callable.

sockeye.rnn_attention.get_attention(config, max_seq_len, prefix='att_')[source]

Returns an Attention instance based on attention_type.

Parameters:
  • config (AttentionConfig) – Attention configuration.
  • max_seq_len (int) – Maximum length of source sequences.
  • prefix (str) – Name prefix.
Return type:

Attention

Returns:

Instance of Attention.

sockeye.rnn_attention.get_context_and_attention_probs(values, length, logits, dtype)[source]

Returns context vector and attention probabilities via a weighted sum over values.

Parameters:
  • values (Symbol) – Shape: (batch_size, seq_len, encoder_num_hidden).
  • length (Symbol) – Shape: (batch_size,).
  • logits (Symbol) – Shape: (batch_size, seq_len, 1).
  • dtype (str) – data type.
Return type:

Tuple[Symbol, Symbol]

Returns:

context: (batch_size, encoder_num_hidden), attention_probs: (batch_size, seq_len).
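The masked softmax and weighted sum can be sketched in plain Python (an illustrative re-implementation; the real function builds MXNet symbols and operates on batched NDArrays):

```python
import math

def context_and_attention_probs(values, length, logits):
    """Masked softmax over logits, then a probability-weighted sum over values.

    values: list of seq_len vectors (lists of floats) for one sequence.
    length: number of valid positions; positions >= length are masked out.
    logits: one unnormalized score per position.
    """
    # Mask padded positions with -inf so they receive zero probability.
    masked = [l if i < length else float("-inf") for i, l in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Context vector: attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
    return context, probs
```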

sockeye.training module

sockeye.transformer module

class sockeye.transformer.TransformerDecoderBlock(config, prefix)[source]

Bases: object

A transformer decoder block consists of self-attention, encoder attention, and a feed-forward layer with pre/post process blocks in between.

class sockeye.transformer.TransformerEncoderBlock(config, prefix)[source]

Bases: object

A transformer encoder block consists of self-attention and a feed-forward layer with pre/post process blocks in between.

class sockeye.transformer.TransformerFeedForward(num_hidden, num_model, act_type, dropout, prefix)[source]

Bases: object

Position-wise feed-forward network with activation.

class sockeye.transformer.TransformerProcessBlock(sequence, num_hidden, dropout, prefix)[source]

Bases: object

Block to perform pre/post processing on layer inputs. The processing steps are determined by the sequence argument, which can contain any of three operations: n (layer normalization), r (residual connection), d (dropout).

sockeye.transformer.get_autoregressive_bias(max_length, name)[source]

Returns bias/mask to ensure position i can only attend to positions <i.

Parameters:
  • max_length (int) – Sequence length.
  • name (str) – Name of symbol.
Return type:

Symbol

Returns:

Bias symbol of shape (1, max_length, max_length).
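The effect of such a bias can be sketched in plain Python (illustrative only; the real function returns an MXNet Symbol): positions that may be attended to get 0, future positions get a large negative value so they vanish after softmax.

```python
def autoregressive_bias(max_length, neg_inf=-1e9):
    """Lower-triangular mask: 0.0 where key position j <= query position i
    (attending is allowed), a large negative value where j > i (the future)."""
    return [[0.0 if j <= i else neg_inf for j in range(max_length)]
            for i in range(max_length)]
```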

sockeye.transformer.get_variable_length_bias(lengths, max_length, num_heads=None, fold_heads=True, name='')[source]

Returns bias/mask for variable sequence lengths.

Parameters:
  • lengths (Symbol) – Sequence lengths. Shape: (batch,).
  • max_length (int) – Maximum sequence length.
  • num_heads (Optional[int]) – Number of attention heads.
  • fold_heads (bool) – Whether to fold heads dimension into batch dimension.
  • name (str) – Name of symbol.
Return type:

Symbol

Returns:

Bias symbol.
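The variable-length counterpart can be sketched the same way (illustrative pure Python; head folding and the Symbol API are omitted): each batch element gets a mask row that blanks out its padding positions.

```python
def variable_length_bias(lengths, max_length, neg_inf=-1e9):
    """One mask row per batch element: 0.0 for valid positions,
    a large negative value for padding positions (pos >= length)."""
    return [[0.0 if pos < length else neg_inf for pos in range(max_length)]
            for length in lengths]
```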

sockeye.utils module

A set of utility methods.

class sockeye.utils.GpuFileLock(candidates, lock_dir)[source]

Bases: object

Acquires a single GPU by locking a file (therefore this assumes that everyone using GPUs calls this method and shares the lock directory). Sets target to a GPU id or None if none is available.

Parameters:
  • candidates (List[~GpuDeviceType]) – List of candidate device ids to try to acquire.
  • lock_dir (str) – The directory for storing the lock file.
sockeye.utils.acquire_gpus(requested_device_ids, lock_dir='/tmp', retry_wait_min=10, retry_wait_rand=60, num_gpus_available=None)[source]

Acquire a number of GPUs in a transactional way. This method should be used inside a with statement. It will try to acquire all of the requested GPUs; if not enough are currently available, all locks are released and it waits before retrying. It retries until enough GPUs become available.

Parameters:
  • requested_device_ids (List[int]) – The requested device ids, each number is either negative indicating the number of GPUs that will be allocated, or positive indicating we want to acquire a specific device id.
  • lock_dir (str) – The directory for storing the lock file.
  • retry_wait_min (int) – The minimum number of seconds to wait between retries.
  • retry_wait_rand (int) – Randomly add between 0 and retry_wait_rand seconds to the wait time.
  • num_gpus_available (Optional[int]) – The number of GPUs available, if None we will call get_num_gpus().
Returns:

yields a list of GPU ids.

sockeye.utils.average_arrays(arrays)[source]

Takes a list of arrays of the same shape and computes their element-wise average.

Parameters:arrays (List[NDArray]) – A list of NDArrays with the same shape that will be averaged.
Return type:NDArray
Returns:The average of the NDArrays in the same context as arrays[0].
sockeye.utils.check_condition(condition, error_message)[source]

Check the condition and, if it is not met, exit with the given error message, similar to an assertion.

Parameters:
  • condition (bool) – Condition to check.
  • error_message (str) – Error message to show to the user.
sockeye.utils.check_version(version)[source]

Checks given version against code version and determines compatibility. Throws if versions are incompatible.

Parameters:version (str) – Given version.
sockeye.utils.chunks(some_list, n)[source]

Yield successive n-sized chunks from some_list.

Return type:Iterable[List[~T]]
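A minimal sketch of such a chunking helper (illustrative; the actual implementation may differ in details):

```python
def chunks(some_list, n):
    """Yield successive n-sized chunks from some_list;
    the final chunk may be shorter than n."""
    for i in range(0, len(some_list), n):
        yield some_list[i:i + n]
```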
sockeye.utils.cleanup_params_files(output_folder, max_to_keep, checkpoint, best_checkpoint)[source]

Deletes oldest parameter files from a model folder.

Parameters:
  • output_folder (str) – Folder where param files are located.
  • max_to_keep (int) – Maximum number of files to keep, negative to keep all.
  • checkpoint (int) – Current checkpoint (i.e. index of last params file created).
  • best_checkpoint (int) – Best checkpoint. The parameter file corresponding to this checkpoint will not be deleted.
sockeye.utils.compute_lengths(sequence_data)[source]

Computes sequence lengths of PAD_ID-padded data in sequence_data.

Parameters:sequence_data (Symbol) – Input data. Shape: (batch_size, seq_len).
Return type:Symbol
Returns:Length data. Shape: (batch_size,).
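The same computation on plain Python lists (illustrative; the real function operates on MXNet Symbols, and PAD_ID = 0 is an assumption about the toolkit's padding id):

```python
PAD_ID = 0  # assumed padding id

def compute_lengths(sequence_data):
    """Count non-PAD tokens per row of a (batch_size, seq_len) id matrix."""
    return [sum(1 for tok in row if tok != PAD_ID) for row in sequence_data]
```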
sockeye.utils.expand_requested_device_ids(requested_device_ids)[source]

Transforms a list of device id requests into concrete device ids. For example, on a host with 8 GPUs, requesting [-4, 3, 5] yields [0, 1, 2, 3, 4, 5]: you get devices 3 and 5, as well as 4 other available device ids (filling up from low to high device ids).

Parameters:requested_device_ids (List[int]) – The requested device ids, each number is either negative indicating the number of GPUs that will be allocated, or positive indicating we want to acquire a specific device id.
Return type:List[int]
Returns:A list of device ids.
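The expansion logic can be sketched as follows (illustrative; num_gpus is passed explicitly here, whereas the real function queries the host):

```python
def expand_requested_device_ids(requested, num_gpus):
    """Resolve negative entries (GPU counts) to concrete ids, filling from
    low to high while skipping ids that were requested explicitly."""
    explicit = sorted(d for d in requested if d >= 0)
    n_auto = sum(-d for d in requested if d < 0)
    auto = [d for d in range(num_gpus) if d not in explicit][:n_auto]
    if len(auto) < n_auto:
        raise ValueError("not enough GPUs available")
    return sorted(explicit + auto)
```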
sockeye.utils.get_alignments(attention_matrix, threshold=0.9)[source]

Yields hard alignments from an attention_matrix (target_length, source_length) given a threshold.

Parameters:
  • attention_matrix (ndarray) – The attention matrix.
  • threshold (float) – The threshold for including an alignment link in the result.
Return type:

Iterator[Tuple[int, int]]

Returns:

Generator yielding alignment points as (target, source) index tuples, e.g. (0, 0), (0, 1), (2, 1), (2, 2), (3, 4)…
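A pure-Python sketch of this thresholding (illustrative; the real function works on a numpy ndarray):

```python
def get_alignments(attention_matrix, threshold=0.9):
    """Yield (target, source) index pairs whose attention weight
    exceeds the threshold."""
    for t, row in enumerate(attention_matrix):
        for s, weight in enumerate(row):
            if weight > threshold:
                yield (t, s)
```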

sockeye.utils.get_gpu_memory_usage(ctx)[source]

Returns used and total memory for GPUs identified by the given context list.

Parameters:ctx (List[Context]) – List of MXNet context devices.
Return type:Dict[int, Tuple[int, int]]
Returns:Dictionary of device id mapping to a tuple of (memory used, memory total).
sockeye.utils.get_num_gpus()[source]

Gets the number of GPUs available on the host (depends on nvidia-smi).

Return type:int
Returns:The number of GPUs on the system.
sockeye.utils.get_tokens(line)[source]

Yields tokens from input string.

Parameters:line (str) – Input string.
Return type:Iterator[str]
Returns:Iterator over tokens.
sockeye.utils.get_validation_metric_points(model_path, metric)[source]

Returns tuples of value and checkpoint for the given metric from the metrics file at model_path.

Parameters:
  • model_path (str) – Model path containing a .metrics file.
  • metric (str) – Metric whose values to extract.
Returns:

List of tuples (value, checkpoint).

sockeye.utils.grouper(iterable, size)[source]

Collect data into fixed-length chunks or blocks without discarding underfilled chunks or padding them.

Parameters:
  • iterable (Iterable[+T_co]) – A sequence of inputs.
  • size (int) – Chunk size.
Return type:

Iterable[+T_co]

Returns:

Sequence of chunks.

sockeye.utils.load_params(fname)[source]

Loads parameters from a file.

Parameters:fname (str) – The file containing the parameters.
Return type:Tuple[Dict[str, NDArray], Dict[str, NDArray]]
Returns:Mapping from parameter names to the actual parameters for both the arg parameters and the aux parameters.
sockeye.utils.load_version(fname)[source]

Loads version from file.

Parameters:fname (str) – Name of file to load version from.
Return type:str
Returns:Version string.
sockeye.utils.log_basic_info(args)[source]

Log basic information like version number, arguments, etc.

Parameters:args – Arguments as returned by argparse.
Return type:None
sockeye.utils.metric_value_is_better(new, old, metric)[source]

Returns true if new value is strictly better than old for given metric.

Return type:bool
sockeye.utils.parse_version(version_string)[source]

Parse version string into release, major, minor version.

Parameters:version_string (str) – Version string.
Return type:Tuple[str, str, str]
Returns:Tuple of strings.
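Assuming version strings of the form release.major.minor (e.g. "1.18.57"), the parse can be sketched as:

```python
def parse_version(version_string):
    """Split e.g. '1.18.57' into ('1', '18', '57') as (release, major, minor)."""
    release, major, minor = version_string.split(".", 2)
    return release, major, minor
```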
sockeye.utils.plot_attention(attention_matrix, source_tokens, target_tokens, filename)[source]

Uses matplotlib for creating a visualization of the attention matrix.

Parameters:
  • attention_matrix (ndarray) – The attention matrix.
  • source_tokens (List[str]) – A list of source tokens.
  • target_tokens (List[str]) – A list of target tokens.
  • filename (str) – The file to which the attention visualization will be written.
sockeye.utils.print_attention_text(attention_matrix, source_tokens, target_tokens, threshold)[source]

Prints the attention matrix to standard out.

Parameters:
  • attention_matrix (ndarray) – The attention matrix.
  • source_tokens (List[str]) – A list of source tokens.
  • target_tokens (List[str]) – A list of target tokens.
  • threshold (float) – The threshold for including an alignment link in the result.
sockeye.utils.read_metrics_file(path)[source]

Reads the lines of a metrics file and returns them as a list of key-value mappings.

Parameters:path (str) – File to read metric values from.
Return type:List[Dict[str, Any]]
Returns:List of dictionaries, one per line, mapping metric names (e.g. perplexity-train) to values.
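A sketch of such a parser, assuming each line has the form "<checkpoint>\tname=value\tname=value…" (the exact on-disk format is an assumption here):

```python
def read_metrics_file(path):
    """Parse tab-separated 'name=value' metric lines, one dict per checkpoint."""
    metrics = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split("\t")
            entry = {"checkpoint": int(fields[0])}
            for field in fields[1:]:
                name, value = field.split("=", 1)
                entry[name] = float(value)
            metrics.append(entry)
    return metrics
```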
sockeye.utils.save_graph(symbol, filename, hide_weights=True)[source]

Dumps computation graph visualization to .pdf and .dot file.

Parameters:
  • symbol (Symbol) – The symbol representing the computation graph.
  • filename (str) – The filename to save the graphic to.
  • hide_weights (bool) – If true the weights will not be shown.
sockeye.utils.save_params(arg_params, fname, aux_params=None)[source]

Saves the parameters to a file.

Parameters:
  • arg_params (Mapping[str, NDArray]) – Mapping from parameter names to the actual parameters.
  • fname (str) – The file name to store the parameters in.
  • aux_params (Optional[Mapping[str, NDArray]]) – Optional mapping from parameter names to the auxiliary parameters.
sockeye.utils.seedRNGs(seed)[source]

Seed the random number generators (Python, NumPy and MXNet).

Parameters:seed (int) – The random seed.
Return type:None
sockeye.utils.smart_open(filename, mode='rt', ftype='auto', errors='replace')[source]

Returns a file descriptor for filename with UTF-8 encoding. If mode is “rt”, the file is opened read-only. If ftype is “auto”, gzip is used iff filename ends with .gz. If ftype is “gzip” or “gz”, gzip is always used.

Note: encoding error handling defaults to “replace”.

Parameters:
  • filename (str) – The filename to open.
  • mode (str) – Reader mode.
  • ftype (str) – File type. If ‘auto’ checks filename suffix for gz to try gzip.open
  • errors (str) – Encoding error handling during reading. Defaults to ‘replace’
Returns:

File descriptor
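The gzip-aware dispatch can be sketched like this (a minimal illustrative version of the behavior described above):

```python
import gzip

def smart_open(filename, mode="rt", ftype="auto", errors="replace"):
    """Open a possibly gzip-compressed text file with UTF-8 decoding."""
    if ftype in ("gzip", "gz") or (ftype == "auto" and filename.endswith(".gz")):
        return gzip.open(filename, mode=mode, encoding="utf-8", errors=errors)
    return open(filename, mode=mode, encoding="utf-8", errors=errors)
```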

sockeye.utils.topk(scores, t, k, batch_size, offset, use_mxnet_topk)[source]

Get the lowest k elements per sentence from a scores matrix.

Parameters:
  • scores (NDArray) – Vocabulary scores for the next beam step. (batch_size * beam_size, target_vocabulary_size)
  • t (int) – Time step in the beam search.
  • k (int) – The number of smallest scores to return.
  • batch_size (int) – Number of sentences being decoded at once.
  • offset (ndarray) – Array to add to the hypothesis indices for offsetting in batch decoding.
  • use_mxnet_topk (bool) – True to use the mxnet implementation or False to use the numpy one.
Return type:

Tuple[ndarray, ndarray, Union[ndarray, NDArray]]

Returns:

The row indices, column indices and values of the k smallest items in the matrix.
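The per-sentence selection can be sketched in pure Python (illustrative; the real function works on NDArrays and returns parallel index/value arrays rather than triples):

```python
import heapq

def smallest_k(scores, k, batch_size):
    """scores: (batch_size * beam_size) rows of vocabulary scores.
    For each sentence, return the k globally smallest (row, column, value)
    triples across that sentence's beam rows."""
    beam_size = len(scores) // batch_size
    results = []
    for b in range(batch_size):
        items = []
        for r in range(beam_size):
            row = b * beam_size + r
            for col, val in enumerate(scores[row]):
                items.append((val, row, col))
        # heapq.nsmallest sorts by value first, giving the k best hypotheses.
        results.append([(row, col, val)
                        for val, row, col in heapq.nsmallest(k, items)])
    return results
```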

sockeye.utils.write_metrics_file(metrics, path)[source]

Write metrics data to tab-separated file.

Parameters:
  • metrics (List[Dict[str, Any]]) – List of metric dictionaries, one per line to write.
  • path (str) – File to write metric values to.

sockeye.vocab module

sockeye.vocab.build_from_paths(paths, num_words=50000, min_count=1)[source]

Creates a vocabulary from paths to files in sentence-per-line format. A sentence is just a whitespace-delimited list of tokens. Note that special symbols like the beginning-of-sentence (BOS) symbol will be added to the vocabulary.

Parameters:
  • paths (List[str]) – List of paths to files with one sentence per line.
  • num_words (int) – Maximum number of words in the vocabulary.
  • min_count (int) – Minimum occurrences of words to be included in the vocabulary.
Return type:

Dict[str, int]

Returns:

Word-to-id mapping.

sockeye.vocab.build_vocab(data, num_words=50000, min_count=1)[source]

Creates a vocabulary mapping from words to ids. Increasing integer ids are assigned by word frequency, using lexical sorting as a tie breaker. The only exception to this are special symbols such as the padding symbol (PAD).

Parameters:
  • data (Iterable[str]) – Sequence of sentences containing whitespace delimited tokens.
  • num_words (int) – Maximum number of words in the vocabulary.
  • min_count (int) – Minimum occurrences of words to be included in the vocabulary.
Return type:

Dict[str, int]

Returns:

Word-to-id mapping.
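The id assignment can be sketched as follows (illustrative; the special-symbol inventory shown is an assumption about the toolkit's constants):

```python
from collections import Counter
from itertools import chain

SPECIAL_SYMBOLS = ["<pad>", "<unk>", "<s>", "</s>"]  # assumed symbol inventory

def build_vocab(data, num_words=50000, min_count=1):
    """Assign increasing ids by decreasing frequency, breaking ties lexically;
    special symbols occupy the first ids."""
    counts = Counter(chain.from_iterable(line.split() for line in data))
    # Sort by (-count, token): higher-frequency words first, ties lexical.
    words = [w for w, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
             if c >= min_count][:num_words]
    return {tok: i for i, tok in enumerate(chain(SPECIAL_SYMBOLS, words))}
```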

sockeye.vocab.get_ordered_tokens_from_vocab(vocab)[source]

Returns the list of tokens in a vocabulary, ordered by increasing vocabulary id.

Parameters:vocab (Dict[str, int]) – Input vocabulary.
Return type:List[str]
Returns:List of tokens.
sockeye.vocab.load_or_create_vocab(data, vocab_path, num_words, word_min_count)[source]

If the vocabulary path is defined, the vocabulary is loaded from the path. Otherwise, it is built from the data file. No writing to disk occurs.

Return type:Dict[str, int]
sockeye.vocab.load_or_create_vocabs(source_paths, target_path, source_vocab_paths, target_vocab_path, shared_vocab, num_words_source, word_min_count_source, num_words_target, word_min_count_target)[source]

Returns vocabularies for source files (including factors) and target. If the respective vocabulary paths are not None, the vocabulary is read from the path and returned. Otherwise, it is built from the data and saved to the path.

Parameters:
  • source_paths (List[str]) – The path to the source text (and optional token-parallel factor files).
  • target_path (str) – The target text.
  • source_vocab_paths (List[Optional[str]]) – The source vocabulary path (and optional factor vocabulary paths).
  • target_vocab_path (Optional[str]) – The target vocabulary path.
  • shared_vocab (bool) – Whether the source and target vocabularies are shared.
  • num_words_source (int) – Number of words in the source vocabulary.
  • word_min_count_source (int) – Minimum frequency of words in the source vocabulary.
  • num_words_target (int) – Number of words in the target vocabulary.
  • word_min_count_target (int) – Minimum frequency of words in the target vocabulary.
Return type:

Tuple[List[Dict[str, int]], Dict[str, int]]

Returns:

List of source vocabularies (for source and factors), and target vocabulary.

sockeye.vocab.load_source_vocabs(folder)[source]

Loads source vocabularies from folder. The first element in the list is the primary source vocabulary. Other elements correspond to optional additional source factor vocabularies found in folder.

Parameters:folder (str) – Source folder.
Return type:List[Dict[str, int]]
Returns:List of vocabularies.
sockeye.vocab.load_target_vocab(folder)[source]

Loads target vocabulary from folder.

Parameters:folder (str) – Model folder.
Return type:Dict[str, int]
Returns:Target vocabulary.
sockeye.vocab.reverse_vocab(vocab)[source]

Returns value-to-key mapping from key-to-value-mapping.

Parameters:vocab (Dict[str, int]) – Key to value mapping.
Return type:Dict[int, str]
Returns:A mapping from values to keys.
sockeye.vocab.save_source_vocabs(source_vocabs, folder)[source]

Saves source vocabularies (primary surface form vocabulary) and optional factor vocabularies to folder.

Parameters:
  • source_vocabs (List[Dict[str, int]]) – List of source vocabularies.
  • folder (str) – Destination folder.
sockeye.vocab.save_target_vocab(target_vocab, folder)[source]

Saves target vocabulary to folder.

Parameters:
  • target_vocab (Dict[str, int]) – Target vocabulary.
  • folder (str) – Destination folder.
sockeye.vocab.vocab_from_json(path, encoding='utf-8')[source]

Loads vocabulary from a json file.

Parameters:
  • path (str) – Path to json file containing the vocabulary.
  • encoding (str) – Vocabulary encoding.
Return type:

Dict[str, int]

Returns:

The loaded vocabulary.

sockeye.vocab.vocab_to_json(vocab, path)[source]

Saves vocabulary in human-readable json.

Parameters:
  • vocab (Dict[str, int]) – Vocabulary mapping.
  • path (str) – Output file path.
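The JSON round-trip of these two functions can be sketched as (illustrative; the real functions may differ in formatting details such as indentation):

```python
import json

def vocab_to_json(vocab, path):
    """Write a word-to-id mapping as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, indent=4, ensure_ascii=False)

def vocab_from_json(path, encoding="utf-8"):
    """Load a word-to-id mapping from a JSON file."""
    with open(path, encoding=encoding) as f:
        return json.load(f)
```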