
What is: CANINE?

Source: CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

CANINE is a pre-trained encoder for language understanding that operates directly on character sequences, without explicit tokenization or a vocabulary. It is paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context.
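The two ideas above, character-level input and downsampling, can be illustrated with a small sketch. This is not the model's actual implementation: the helper names (`text_to_codepoints`, `downsample_length`, `mean_pool`), the toy 2-dimensional "embeddings", and the rate of 4 are assumptions made for illustration, and simple mean pooling stands in for whatever learned downsampling the model uses.

```python
def text_to_codepoints(text):
    """Map each character to its Unicode code point; no vocabulary lookup is needed."""
    return [ord(ch) for ch in text]


def downsample_length(seq_len, rate):
    """Positions left after grouping every `rate` characters (ceiling division)."""
    return -(-seq_len // rate)


def mean_pool(vectors, rate):
    """Toy downsampling: average each run of `rate` consecutive vectors.

    A stand-in for a learned downsampling layer, used here only to show
    how the sequence gets shorter before the deep transformer stack.
    """
    pooled = []
    for start in range(0, len(vectors), rate):
        window = vectors[start:start + rate]
        dim = len(window[0])
        pooled.append([sum(v[i] for v in window) / len(window) for i in range(dim)])
    return pooled


text = "tokenization-free"
ids = text_to_codepoints(text)

# Hypothetical 2-dimensional "embedding" per character, for illustration only.
embeddings = [[float(cp % 7), float(cp % 3)] for cp in ids]

pooled = mean_pool(embeddings, rate=4)
print(len(ids), "character positions ->", len(pooled), "downsampled positions")
```

The point of the sketch is the length reduction: the expensive deep stack would see roughly one position per `rate` characters instead of one per character, which is what makes character-level input affordable.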