partial_tagger.data.collators module

class partial_tagger.data.collators.Batch(tagger_inputs: dict[str, torch.Tensor], mask: torch.Tensor)

class partial_tagger.data.collators.BaseCollator

Base class for all collators.

abstract __call__(texts: tuple[str, ...]) → tuple[Batch, tuple[LabelAlignment, ...]]

Tokenizes the given texts and encodes them into tensors. Also returns LabelAlignment instances derived from the tokenization results.

Parameters:

texts – A tuple of strings where each item represents a text.

Returns:

A pair consisting of a Batch and a tuple of LabelAlignment instances.
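The contract above (texts in, a padded batch plus per-text alignments out) can be illustrated with a minimal, dependency-free sketch. Everything here is a stand-in rather than the library's code: `SketchBatch`, `SketchAlignment`, `SketchBaseCollator`, and `WhitespaceCollator` are hypothetical names, nested lists replace `torch.Tensor`, and the contents of the alignment object are an assumption, since this page does not describe `LabelAlignment`.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchBatch:
    # Stand-in for Batch: nested lists instead of torch.Tensor values.
    tagger_inputs: dict[str, list[list[int]]]
    mask: list[list[bool]]


@dataclass(frozen=True)
class SketchAlignment:
    # Stand-in for LabelAlignment (assumed shape): one (start, end)
    # character span per token.
    char_spans: tuple[tuple[int, int], ...]


class SketchBaseCollator(ABC):
    @abstractmethod
    def __call__(
        self, texts: tuple[str, ...]
    ) -> tuple[SketchBatch, tuple[SketchAlignment, ...]]:
        raise NotImplementedError


class WhitespaceCollator(SketchBaseCollator):
    """Toy collator: splits on whitespace and pads with id 0."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {"<pad>": 0}

    def __call__(self, texts):
        # Keep each token's regex match so its character span is available.
        matches = [list(re.finditer(r"\S+", text)) for text in texts]
        max_len = max((len(m) for m in matches), default=0)

        input_ids, mask, alignments = [], [], []
        for tokens in matches:
            # Assign a new id to every unseen token.
            ids = [self.vocab.setdefault(t.group(), len(self.vocab)) for t in tokens]
            pad = max_len - len(ids)
            input_ids.append(ids + [0] * pad)
            # True marks real tokens, False marks padding.
            mask.append([True] * len(ids) + [False] * pad)
            alignments.append(SketchAlignment(tuple(t.span() for t in tokens)))

        return SketchBatch({"input_ids": input_ids}, mask), tuple(alignments)


batch, alignments = WhitespaceCollator()(("Tokyo is big", "Hello"))
print(batch.mask)
print(alignments[1].char_spans)
```

The mask lets downstream code ignore padded positions, and the character spans are one possible way to relate token positions back to the original text.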

class partial_tagger.data.collators.TransformerCollator(tokenizer: PreTrainedTokenizerFast, tokenizer_args: dict[str, Any] | None = None)

A collator that tokenizes texts with a tokenizer from the transformers library.

Parameters:
  • tokenizer – A fast tokenizer from the transformers library (PreTrainedTokenizerFast).

  • tokenizer_args – Additional tokenizer arguments. Defaults to None.

__call__(texts: tuple[str, ...]) → tuple[Batch, tuple[LabelAlignment, ...]]

Tokenizes the given texts and encodes them into tensors. Also returns LabelAlignment instances derived from the tokenization results.

Parameters:

texts – A tuple of strings where each item represents a text.

Returns:

A pair consisting of a Batch and a tuple of LabelAlignment instances.
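Fast tokenizers in transformers can report a character offset pair for every token (via `return_offsets_mapping=True`), which is one plausible basis for building alignments from tokenization results. Whether TransformerCollator works this way is an assumption; this page does not say how LabelAlignment is computed. The sketch below uses a hypothetical helper, `char_span_to_token_span`, and hand-written offsets to show how a character-level span could be mapped to token indices.

```python
from __future__ import annotations


def char_span_to_token_span(
    offsets: list[tuple[int, int]], start: int, end: int
) -> tuple[int, int] | None:
    """Return the half-open token range overlapping characters [start, end)."""
    hits = [
        i
        for i, (tok_start, tok_end) in enumerate(offsets)
        # Skip zero-width offsets such as (0, 0), which typically mark
        # special or padding tokens, then test for overlap.
        if tok_start < tok_end and tok_start < end and start < tok_end
    ]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


# Hypothetical offsets for "Tokyo is big" with one special token on each side.
offsets = [(0, 0), (0, 5), (6, 8), (9, 12), (0, 0)]
print(char_span_to_token_span(offsets, 0, 5))   # (1, 2)
print(char_span_to_token_span(offsets, 6, 12))  # (2, 4)
```

A span that falls entirely between tokens, such as the single space at characters 5 to 6, yields `None` in this sketch.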