stan-dev · SteveBronder · Aug 20, 2024 · Aug 20, 2024 · Sep 16, 2024 · bob-carpenter
diff --git a/src/reference-manual/transforms.qmd b/src/reference-manual/transforms.qmd
@@ -15,7 +15,7 @@ constrained to be ordered, positive ordered, or simplexes.  Matrices
 may be constrained to be correlation matrices or covariance matrices.
 This chapter provides a definition of the transforms used for each
 type of variable.   For examples of how to declare and define these
-variables in a Stan program, see section 
+variables in a Stan program, see section
 [Variable declaration](types.qmd#variable-declaration.section).
 
 Stan converts models to C++ classes which define probability
@@ -39,7 +39,7 @@ the exact arithmetic:
     rounded to the boundary. This may cause unexpected warnings or
     errors, if in other parts of the code the boundary value is
     invalid. For example, we may observe floating-point value 0 for
-    a variance parameter that has been declared to be larger than 0. 
+    a variance parameter that has been declared to be larger than 0.
 	See more about [Floating point Arithmetic in Stan user's guide](../stan-users-guide/floating-point.qmd)).
   - CmdStan stores the output to CSV files with 6 significant digits
     accuracy by default, but the constraints are checked with 8
@@ -672,6 +672,72 @@ z_k
 .
 $$
 
+## Stochastic Matrix {#stochastic-matrix-transform.section}
+
+The `column_stochastic_matrix[N, M]` and `row_stochastic_matrix[N, M]` type in
+Stan represents an \(N \times M\) matrix where each column (row) is a unit simplex
+of dimension \(N\). In other words, each column (row) of the matrix is a vector
+constrained to have non-negative entries that sum to one.
+
+### Definition of a Stochastic Matrix {-}
+
+A column stochastic matrix \(X \in \mathbb{R}^{N \times M}\) is defined such
+that each column is a simplex. For column \(j\) (where \(1 \leq j \leq M\)):
+
+$$
+X_{i, j} \geq 0 \quad \text{for } 1 \leq i \leq N,
+$$
+
+and
+
+$$
+\sum_{i=1}^N X_{i, j} = 1.
+$$
+
+A row stochastic matrix is any matrix whose transpose is a column stochastic matrix
+(i.e. the rows of the matrix are simplexes)
+
+
+$$
+X_{i, j} \geq 0 \quad \text{for } 1 \leq j \leq N,
+$$
+
+and
+
+$$
+\sum_{j=1}^N X_{i, j} = 1.
+$$
+
+This definition ensures that each column (row) of the matrix \(X\) lies on the
+\(N-1\) dimensional unit simplex, similar to the `simplex[N]` type, but
+extended across multiple columns(rows).
+
+### Inverse Transform for Stochastic Matrix {-}
+
+For the column and row stochastic matrices the inverse transform is the same
+as simplex, but applied to each column (row).
+
+### Absolute Jacobian Determinant for the Inverse Transform {-}
+
+The Jacobian determinant of the inverse transform for each column \(j\) in
+the matrix is given by the product of the diagonal entries \(J_{i,i,j}\) of
+the lower-triangular Jacobian matrix. This determinant is calculated as:
+
+$$
+\left| \det J_j \right| = \prod_{i=1}^{N-1} \left( z_{i, j} (1 - z_{i, j}) \left( 1 - \sum_{i'=1}^{i-1} X_{i'j} \right) \right).
+$$
+
+Thus, the overall Jacobian determinant for the entire `column_stochastic_matrix` and `row_stochastic_matrix`
+is the product of the determinants for each column (row):
+
+$$
+\left| \det J \right| = \prod_{j=1}^{M} \left| \det J_j \right|.
+$$
+
+### Transform for Stochastic Matrix {-}
+
+For the column and row stochastic matrices the transform is the same
+as simplex, but applied to each column (row).
 
 ## Unit vector {#unit-vector.section}
 

diff --git a/src/reference-manual/types.qmd b/src/reference-manual/types.qmd
@@ -673,6 +673,83 @@ iterations, and in either case, with less dispersed parameter
 initialization or custom initialization if there are informative
 priors for some parameters.
 
+### Stochastic Matrices {-}
+
+A stochastic matrix is a matrix where each column, row, or both is a
+unit simplex, meaning that each column (row) vector has non-negative
+values that sum to 1. The following example is a \(3 \times 4\)
+column-stochastic matrix.
+
+$$
+\begin{bmatrix}
+0.2 & 0.5 & 0.1 & 0.3 \\
+0.3 & 0.3 & 0.6 & 0.4 \\
+0.5 & 0.2 & 0.3 & 0.3
+\end{bmatrix}
+$$
+
+An example of a \(3 \times 4\) row-stochastic matrix is the following.
+
+$$
+\begin{bmatrix}
+0.2 & 0.5 & 0.1 & 0.2 \\
+0.2 & 0.1 & 0.6 & 0.1 \\
+0.5 & 0.2 & 0.2 & 0.1
+\end{bmatrix}
+$$
+
+
+In the examples above, each column (or row) sums to 1, making the matrices
+valid `column_stochastic_matrix` and `row_stochastic_matrix` types.
+
+Column-stochastic matrices are often used in models where
+each column represents a probability distribution across a
+set of categories such as in multiple multinomial distributions,
+factor models, transition matrices in Markov models,
+or compositional data analysis.
+They can also be used in situations where you need multiple simplexes
+of the same dimensionality.
+
+The `column_stochastic_matrix` and `row_stochastic_matrix` types are declared
+with row and column sizes. For instance, a matrix `theta` with
+3 rows and 4 columns, where each
+column is a 3-simplex, is declared like a matrix with 3 rows and 4 columns.
+
+```stan
+column_stochastic_matrix[3, 4] theta;
+```
+
+A matrix `theta` with 3 rows and 4 columns, where each row is a 4-simplex,
+is similarly declared as a matrix with 3 rows and 4 columns.
+
+```stan
+row_stochastic_matrix[3, 4] theta;
+```
+
+As with simplexes, `column_stochastic_matrix` and `row_stochastic_matrix`
+variables are subject to validation, ensuring that each column (row)
+satisfies the simplex constraints. This validation accounts for
+floating-point imprecision, with checks performed up to a statically
+specified accuracy threshold \(\epsilon\).
+
+#### Stability Considerations {-}
+
+In high-dimensional settings, `column_stochastic_matrix` and `row_stochastic_matrix`
+types may require careful tuning of the inference
+algorithms. To ensure stability:
+
+- **Smaller Step Sizes:** In samplers like Hamiltonian Monte Carlo (HMC),
+smaller step sizes can help maintain stability, especially in high dimensions.
+- **Higher Target Acceptance Rates:** Setting higher target acceptance
+rates can improve the robustness of the sampling process.
+- **Longer Warmup Periods:** Increasing the warmup period allows the sampler
+to better explore the parameter space before the actual sampling begins.
+- **Tighter Optimization Tolerances:** For optimization-based inference,
+tighter tolerances with more iterations can yield more accurate results.
+- **Custom Initialization:** If prior information about the parameters is
+available, custom initialization or less dispersed initialization can lead
+to more efficient inference.
+
 ### Unit vectors {-}
 
 A unit vector is a vector with a norm of one.  For instance, $[0.5,