partial_tagger.data.collators module
- class partial_tagger.data.collators.Batch(tagger_inputs: dict[str, torch.Tensor], mask: torch.Tensor)
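`Batch` pairs the inputs fed to the tagger (`tagger_inputs`) with a `mask`. The sketch below shows one way variable-length token-id sequences could be padded into a rectangular batch with a matching mask. Plain Python lists stand in for `torch.Tensor`. The class `ToyBatch`, the key `input_ids`, the padding id `0`, and the convention that `True` marks a real (non-padding) token are all assumptions for illustration and do not come from partial_tagger.

```python
from dataclasses import dataclass


@dataclass
class ToyBatch:
    """Hypothetical stand-in for Batch, using nested lists instead of torch.Tensor."""

    tagger_inputs: dict[str, list[list[int]]]
    mask: list[list[bool]]


def pad_batch(sequences: list[list[int]], pad_id: int = 0) -> ToyBatch:
    """Right-pad token-id sequences to a common length and build a mask."""
    max_len = max((len(seq) for seq in sequences), default=0)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # Assumed convention: True marks a real token, False marks padding.
    mask = [[i < len(seq) for i in range(max_len)] for seq in sequences]
    return ToyBatch(tagger_inputs={"input_ids": input_ids}, mask=mask)


batch = pad_batch([[5, 6, 7], [8]])
print(batch.tagger_inputs["input_ids"])  # [[5, 6, 7], [8, 0, 0]]
print(batch.mask)  # [[True, True, True], [True, False, False]]
```

The mask records where padding was added, so downstream code can ignore padded positions.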
- class partial_tagger.data.collators.BaseCollator
Base class for all collators.
- abstract __call__(texts: tuple[str, ...]) → tuple[Batch, tuple[LabelAlignment, ...]]
Tokenizes the given texts and encodes them into tensors. Also builds a tuple of LabelAlignment instances from the tokenization results.
- Parameters:
texts – A tuple of strings where each item represents a text.
- Returns:
A pair consisting of a Batch instance and a tuple of LabelAlignment instances.
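Because `__call__` is abstract, `BaseCollator` only fixes the interface, and a concrete collator must implement it. The sketch below illustrates that pattern with a hypothetical base class declared via `abc.abstractmethod` and a toy whitespace collator that implements it. A plain dict of lists stands in for `Batch`, and per-token `(start, end)` character spans stand in for `LabelAlignment`, whose real interface is not shown on this page. `ToyBaseCollator`, `WhitespaceCollator`, and the on-the-fly vocabulary are all hypothetical.

```python
import re
from abc import ABC, abstractmethod


class ToyBaseCollator(ABC):
    """Hypothetical stand-in for BaseCollator's abstract interface."""

    @abstractmethod
    def __call__(self, texts: tuple[str, ...]) -> tuple[dict, tuple[list[tuple[int, int]], ...]]:
        raise NotImplementedError


class WhitespaceCollator(ToyBaseCollator):
    """Toy collator: splits on whitespace and assigns ids from a growing vocabulary."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {"<pad>": 0}

    def __call__(self, texts: tuple[str, ...]) -> tuple[dict, tuple[list[tuple[int, int]], ...]]:
        ids, spans = [], []
        for text in texts:
            matches = list(re.finditer(r"\S+", text))
            # Character span (start, end) of each token serves as a simple alignment.
            spans.append([m.span() for m in matches])
            # Unseen tokens get the next free id; "<pad>" keeps id 0.
            ids.append([self.vocab.setdefault(m.group(), len(self.vocab)) for m in matches])
        max_len = max((len(x) for x in ids), default=0)
        batch = {
            "input_ids": [x + [0] * (max_len - len(x)) for x in ids],
            "mask": [[i < len(x) for i in range(max_len)] for x in ids],
        }
        return batch, tuple(spans)


collator = WhitespaceCollator()
batch, alignments = collator(("New York", "Tokyo is big"))
print(batch["input_ids"])  # [[1, 2, 0], [3, 4, 5]]
print(alignments)  # ([(0, 3), (4, 8)], [(0, 5), (6, 8), (9, 12)])
```

Instantiating `ToyBaseCollator` directly raises `TypeError`, which is how `abc` enforces that subclasses provide `__call__`.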
- class partial_tagger.data.collators.TransformerCollator(tokenizer: PreTrainedTokenizerFast, tokenizer_args: dict[str, Any] | None = None)
A collator that uses a fast tokenizer from the Hugging Face transformers library.
- Parameters:
tokenizer – A fast tokenizer (PreTrainedTokenizerFast) from the transformers library.
tokenizer_args – Additional tokenizer arguments. Defaults to None.
- __call__(texts: tuple[str, ...]) → tuple[Batch, tuple[LabelAlignment, ...]]
Tokenizes the given texts and encodes them into tensors. Also builds a tuple of LabelAlignment instances from the tokenization results.
- Parameters:
texts – A tuple of strings where each item represents a text.
- Returns:
A pair consisting of a Batch instance and a tuple of LabelAlignment instances.
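A fast tokenizer such as `PreTrainedTokenizerFast` can report, for each token, the `(start, end)` character offsets it covers (for example when called with `return_offsets_mapping=True`), and special tokens such as `[CLS]` and `[SEP]` are typically reported as `(0, 0)`. The sketch below calls neither partial_tagger nor transformers. It shows one way such offsets could be turned into a character-to-token lookup of the kind an alignment needs. The function `char_to_token` and the `-1` sentinel for characters no token covers (such as whitespace) are hypothetical, and the offset list is hand-written rather than real tokenizer output.

```python
def char_to_token(text_length: int, offsets: list[tuple[int, int]]) -> list[int]:
    """Map each character index to the index of the token covering it, or -1."""
    mapping = [-1] * text_length
    for token_index, (start, end) in enumerate(offsets):
        # Special tokens have empty spans (start == end), so they cover no characters.
        for char_index in range(start, end):
            mapping[char_index] = token_index
    return mapping


# Hand-written offsets for "New York" tokenized as [CLS] New York [SEP].
offsets = [(0, 0), (0, 3), (4, 8), (0, 0)]
print(char_to_token(len("New York"), offsets))  # [1, 1, 1, -1, 2, 2, 2, 2]
```

Token indices here count the special tokens, so index 1 is the first real token; an implementation could equally choose to drop special tokens before building the lookup.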