
What is: Baidu Dependency Parser?

Source: A Practical Chinese Dependency Parser Based on A Large-scale Dataset
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

DDParser, or Baidu Dependency Parser, is a Chinese dependency parser trained on a large-scale manually labeled dataset called Baidu Chinese Treebank (DuCTB).

For the input, the $i$-th word's input vector $e_i$ is the concatenation of its word embedding and its character-level representation:

$$e_i = e_i^{word} \oplus \mathrm{CharLSTM}(w_i)$$

where $\mathrm{CharLSTM}(w_i)$ is the output vector obtained by feeding the character sequence of $w_i$ into a BiLSTM layer. Experimental results on the DuCTB dataset show that replacing POS tag embeddings with $\mathrm{CharLSTM}(w_i)$ leads to an improvement.
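As a rough illustration of this input layer, the sketch below builds $e_i$ by concatenating a word embedding with a bidirectional character-level encoding. All sizes and weights are toy values invented for the example, and a simple tanh RNN stands in for each direction of the paper's CharLSTM:

```python
import math
import random

random.seed(0)

WORD_DIM, CHAR_DIM, HIDDEN = 4, 3, 3  # toy sizes, not from the paper

def vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

# Hypothetical lookup tables standing in for trained embeddings.
word_emb = {w: vec(WORD_DIM) for w in ["百度", "解析"]}
char_emb = {c: vec(CHAR_DIM) for c in "百度解析"}

def rnn_pass(word, reverse=False):
    """One direction over a word's characters (a simple tanh RNN
    standing in for one LSTM direction; weights are random here)."""
    W = [vec(CHAR_DIM) for _ in range(HIDDEN)]  # input weights
    U = [vec(HIDDEN) for _ in range(HIDDEN)]    # recurrent weights
    h = [0.0] * HIDDEN
    for c in (reversed(word) if reverse else word):
        x = char_emb[c]
        h = [math.tanh(sum(W[i][j] * x[j] for j in range(CHAR_DIM))
                       + sum(U[i][j] * h[j] for j in range(HIDDEN)))
             for i in range(HIDDEN)]
    return h

def input_vector(word):
    """e_i = word embedding ⊕ bidirectional character encoding."""
    char_repr = rnn_pass(word) + rnn_pass(word, reverse=True)
    return word_emb[word] + char_repr  # list concatenation plays the role of ⊕

e = input_vector("百度")
print(len(e))  # WORD_DIM + 2 * HIDDEN = 10
```

The concatenation doubles the character contribution because both directions' final states are kept, mirroring how a BiLSTM's forward and backward outputs are joined.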

For the BiLSTM encoder, three BiLSTM layers are applied over the input vectors for context encoding. Denote by $r_i$ the output vector of the top-layer BiLSTM for $w_i$.

The biaffine dependency parser of Dozat and Manning is used. Dimension-reducing MLPs are applied to each recurrent output vector $r_i$ before the biaffine transformation; applying smaller MLPs to the recurrent output states first has the advantage of stripping away information not relevant to the current decision. Biaffine attention is then used in both the dependency arc classifier and the relation classifier. The computations of the symbols in the figure are shown below:

$$\begin{aligned}
h_i^{d\text{-}arc} &= \mathrm{MLP}^{d\text{-}arc}(r_i) \\
h_i^{h\text{-}arc} &= \mathrm{MLP}^{h\text{-}arc}(r_i) \\
h_i^{d\text{-}rel} &= \mathrm{MLP}^{d\text{-}rel}(r_i) \\
h_i^{h\text{-}rel} &= \mathrm{MLP}^{h\text{-}rel}(r_i) \\
S^{arc} &= \left(H^{d\text{-}arc} \oplus I\right) U^{arc} H^{h\text{-}arc} \\
S^{rel} &= \left(H^{d\text{-}rel} \oplus I\right) U^{rel} \left(\left(H^{h\text{-}rel}\right)^{T} \oplus I\right)^{T}
\end{aligned}$$
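A minimal sketch of the arc-scoring path, in pure Python with toy random matrices: each $r_i$ is reduced by a small MLP (a single ReLU layer here for brevity), the "$\oplus\, I$" is interpreted as appending a bias column of ones to the dependent representations, and $H^{h\text{-}arc}$ is transposed in the final product so that $S^{arc}$ is an $N \times N$ score matrix (the formula leaves this implicit). Sizes and weights are assumptions for illustration only:

```python
import random

random.seed(1)

N, R, M = 3, 5, 2  # sentence length, BiLSTM dim, MLP dim (toy sizes)

def mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(A):
    return [[max(0.0, x) for x in row] for row in A]

# r_i: top-layer BiLSTM outputs for a 3-word sentence (random stand-ins).
R_mat = mat(N, R)

# Dimension-reducing MLPs: separate projections for dependent and head roles.
W_d, W_h = mat(R, M), mat(R, M)
H_d = relu(matmul(R_mat, W_d))  # H^{d-arc}, N x M
H_h = relu(matmul(R_mat, W_h))  # H^{h-arc}, N x M

# Append a 1 to each dependent vector: the bias augmentation written "⊕ I".
H_d_aug = [row + [1.0] for row in H_d]  # N x (M+1)

U_arc = mat(M + 1, M)  # U^{arc}
H_h_T = [list(col) for col in zip(*H_h)]  # transpose, M x N
S_arc = matmul(matmul(H_d_aug, U_arc), H_h_T)

# S_arc[i][j]: score of word j being the head of word i.
print(len(S_arc), len(S_arc[0]))  # 3 3
```

The separate dependent/head MLPs are what let one shared $r_i$ play both roles: row $i$ of $S^{arc}$ scores every candidate head for word $i$.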

For the decoder, the first-order Eisner algorithm is used to ensure that the output is a projective tree. Given the dependency tree built by the biaffine parser, a word sequence is obtained through an in-order traversal of the tree; the output is a projective tree only if this word sequence is in left-to-right order.
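The traversal-based projectivity test described above can be sketched as follows. The head-array convention (1-based words, `heads[i-1]` giving word `i`'s head, `0` for the root) is an assumption for the example, not the paper's notation:

```python
def inorder(heads):
    """In-order traversal of a dependency tree given as a head array:
    visit each node's left children (in index order), the node itself,
    then its right children."""
    children = {i: [] for i in range(len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)
    order = []

    def visit(node):
        for c in children[node]:
            if c < node:
                visit(c)
        if node != 0:  # the artificial root carries no word
            order.append(node)
        for c in children[node]:
            if c > node:
                visit(c)

    visit(0)
    return order

def is_projective(heads):
    # The tree is projective iff the in-order word sequence is already sorted.
    seq = inorder(heads)
    return seq == sorted(seq)

print(is_projective([2, 0, 2]))     # True: 1 <- 2 -> 3, no crossing arcs
print(is_projective([3, 4, 0, 3]))  # False: arcs 3->1 and 4->2 cross
```

For the non-projective example, the traversal yields `[1, 3, 2, 4]`, which is out of order, matching the criterion stated in the text.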