From 4a99db684d406ea8cd1c76cb33c4515c34148c0c Mon Sep 17 00:00:00 2001
From: Aden Haussmann
Date: Wed, 27 Mar 2024 22:07:46 +0000
Subject: [PATCH 1/2] Change n to m for Y

---
 encoder-decoder.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/encoder-decoder.md b/encoder-decoder.md
index 8db8d2d936..ef72bec0f7 100644
--- a/encoder-decoder.md
+++ b/encoder-decoder.md
@@ -366,10 +366,10 @@ $$ f_{\theta_{\text{enc}}}: \mathbf{X}_{1:n} \to \mathbf{\overline{X}}_{1:n}. $$
 
 The transformer-based decoder part then models the conditional
 probability distribution of the target vector sequence
-\\(\mathbf{Y}_{1:n}\\) given the sequence of encoded hidden states
+\\(\mathbf{Y}_{1:m}\\) given the sequence of encoded hidden states
 \\(\mathbf{\overline{X}}_{1:n}\\):
 
-$$ p_{\theta_{dec}}(\mathbf{Y}_{1:n} | \mathbf{\overline{X}}_{1:n}).$$
+$$ p_{\theta_{dec}}(\mathbf{Y}_{1:m} | \mathbf{\overline{X}}_{1:n}).$$
 
 By Bayes\' rule, this distribution can be factorized to a product of
 conditional probability distribution of the target vector \\(\mathbf{y}_i\\)
@@ -377,7 +377,7 @@ given the encoded hidden states \\(\mathbf{\overline{X}}_{1:n}\\) and all
 previous target vectors \\(\mathbf{Y}_{0:i-1}\\):
 
 $$
-p_{\theta_{dec}}(\mathbf{Y}_{1:n} | \mathbf{\overline{X}}_{1:n}) = \prod_{i=1}^{n} p_{\theta_{\text{dec}}}(\mathbf{y}_i | \mathbf{Y}_{0: i-1}, \mathbf{\overline{X}}_{1:n}). $$
+p_{\theta_{dec}}(\mathbf{Y}_{1:m} | \mathbf{\overline{X}}_{1:n}) = \prod_{i=1}^{m} p_{\theta_{\text{dec}}}(\mathbf{y}_i | \mathbf{Y}_{0: i-1}, \mathbf{\overline{X}}_{1:n}). $$
 
 The transformer-based decoder hereby maps the sequence of encoded
 hidden states \\(\mathbf{\overline{X}}_{1:n}\\) and all previous target
 vectors

From 2f19d6a6fdafb27000db4b8c4584ad0794e9ad0d Mon Sep 17 00:00:00 2001
From: Aden Haussmann
Date: Wed, 27 Mar 2024 22:16:21 +0000
Subject: [PATCH 2/2] found another n

---
 encoder-decoder.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/encoder-decoder.md b/encoder-decoder.md
index ef72bec0f7..1a7b5f3aa2 100644
--- a/encoder-decoder.md
+++ b/encoder-decoder.md
@@ -352,7 +352,7 @@ mapping.
 
 Similar to RNN-based encoder-decoder models, the transformer-based
 encoder-decoder models define a conditional distribution of target
-vectors \\(\mathbf{Y}_{1:n}\\) given an input sequence \\(\mathbf{X}_{1:n}\\):
+vectors \\(\mathbf{Y}_{1:m}\\) given an input sequence \\(\mathbf{X}_{1:n}\\):
 
 $$ p_{\theta_{\text{enc}}, \theta_{\text{dec}}}(\mathbf{Y}_{1:m} | \mathbf{X}_{1:n}).