
What is: Cross-View Training?

Source: Semi-Supervised Sequence Modeling with Cross-View Training
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Cross-View Training, or CVT, is a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled examples.

CVT adds $k$ auxiliary prediction modules to the model, a Bi-LSTM encoder, which are used when learning on unlabeled examples. A prediction module is usually a small neural network (e.g., a hidden layer followed by a softmax layer). Each one takes as input an intermediate representation $h^j(x_i)$ produced by the model (e.g., the outputs of one of the LSTMs in a Bi-LSTM model) and outputs a distribution over labels $p_j^{\theta}(y \mid x_i)$.
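To make this concrete, here is a minimal PyTorch sketch of such a model for sequence tagging. It is not the paper's implementation: the class names (`PredictionModule`, `CVTTagger`) and all layer sizes are hypothetical. It shows a Bi-LSTM encoder with a primary module that sees both directions and two auxiliary modules that each see only the forward or backward LSTM outputs.

```python
import torch
import torch.nn as nn

class PredictionModule(nn.Module):
    """A small prediction head: one hidden layer followed by a softmax,
    as described above. Returns log-probabilities over labels."""
    def __init__(self, in_dim, hidden_dim, num_labels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, h):
        # h: (batch, seq_len, in_dim), an intermediate representation h^j(x_i)
        return torch.log_softmax(self.net(h), dim=-1)

class CVTTagger(nn.Module):
    """Bi-LSTM encoder with a primary prediction module (full view)
    and two auxiliary modules with restricted views (one direction each)."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128, num_labels=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                              bidirectional=True)
        self.primary = PredictionModule(2 * hid_dim, hid_dim, num_labels)
        self.aux_fwd = PredictionModule(hid_dim, hid_dim, num_labels)
        self.aux_bwd = PredictionModule(hid_dim, hid_dim, num_labels)

    def forward(self, tokens):
        out, _ = self.bilstm(self.embed(tokens))  # (batch, seq, 2 * hid_dim)
        h_fwd, h_bwd = out.chunk(2, dim=-1)       # split per-direction states
        return {
            "primary": self.primary(out),         # p_theta, full view
            "aux_fwd": self.aux_fwd(h_fwd),       # forward context only
            "aux_bwd": self.aux_bwd(h_bwd),       # backward context only
        }
```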

Each $h^j$ is chosen such that it only uses a part of the input $x_i$; the particular choice can depend on the task and model architecture. The auxiliary prediction modules are only used during training; test-time predictions come from the primary prediction module, which produces $p_\theta$.
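On unlabeled data, the primary module's output serves as a soft target that each auxiliary module learns to match from its restricted view. Below is a hedged sketch of that training step, continuing the hypothetical model above (the function name `cvt_unlabeled_loss` is ours; the paper also computes the teacher prediction without dropout, which is omitted here for brevity):

```python
import torch
import torch.nn.functional as F

def cvt_unlabeled_loss(model, unlabeled_tokens):
    """One CVT step on an unlabeled batch: train the auxiliary modules
    to agree with the (fixed) primary prediction p_theta(y | x_i)."""
    preds = model(unlabeled_tokens)
    with torch.no_grad():
        # Soft targets from the primary module; no gradients flow into them.
        target = preds["primary"].exp()
    loss = 0.0
    for name in ("aux_fwd", "aux_bwd"):
        # KL(p_theta || p_theta^j); F.kl_div takes log-probs and probs.
        loss = loss + F.kl_div(preds[name], target, reduction="batchmean")
    return loss
```

In the paper, training alternates between minibatches of labeled data, where the primary module is trained with the usual supervised loss, and minibatches of unlabeled data, where a consistency loss like the one above is minimized.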