![]() The processed sequence is further passed to a few fixed-width 1-D convolutions, whose outputs are added with the original input sequence via residual connections. ![]() A stride of 1 is used to preserve the original time resolution. The convolution outputs are stacked together and further max pooled along time to increase local invariances. These filters explicitly model local and contextual information (akin to modeling unigrams, bigrams, up to K-grams). The input sequence is firstĬonvolved with $K$ sets of 1-D convolutional filters, where the $k$-th set contains $C\_$ filters of width $k$ (i.e. ![]() The module is used to extract representations from sequences. It consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (()). ![]() **CBHG** is a building block used in the () text-to-speech model. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |