What is: SC-GPT?
| Source | Few-shot Natural Language Generation for Task-Oriented Dialog |
| Year | 2020 |
| Data Source | CC BY-SA - https://paperswithcode.com |
SC-GPT (Semantically Conditioned GPT) is a multi-layer Transformer neural language model trained in three steps: (i) pre-trained on plain text, similar to GPT-2; (ii) continually pre-trained on a large corpus of dialog-act labeled utterances to acquire the ability of controllable generation; and (iii) fine-tuned for a target domain using very limited amounts of domain labels. Unlike GPT-2, SC-GPT generates semantically controlled responses conditioned on a given semantic form (a dialog act), similar to SC-LSTM, but it requires far fewer domain labels to generalize to new domains.
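To make the conditioning concrete, below is a minimal sketch of generating a response from a GPT-2-style checkpoint given a serialized dialog act, using the Hugging Face transformers library. The dialog-act string, the `&` separator, and the `gpt2` checkpoint are illustrative assumptions, not the paper's exact input format or released weights.

```python
# Minimal sketch of semantically conditioned generation in the spirit of
# SC-GPT: serialize a dialog act as text, then let a GPT-2-style model
# continue it into a natural-language response.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # stand-in checkpoint; SC-GPT
# would be this model after dialog-act pre-training and domain fine-tuning

# A dialog act: an intent plus slot-value pairs, serialized as plain text.
# This serialization and the "&" separator are illustrative assumptions.
dialog_act = "inform ( name = Blue Spice ; food = Chinese ; price = cheap )"
prompt = dialog_act + " & "  # condition generation on the semantic form

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```

With a checkpoint that has actually gone through steps (ii) and (iii), the output would be a realization of the dialog act, e.g. "Blue Spice is a cheap Chinese restaurant."; the plain `gpt2` checkpoint used here will only produce generic continuations.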