01
Why collapse needs a target.
Prediction alone can make related views agree while the representation becomes a point, a line, or a thin manifold. SIGReg gives the embedding cloud a specific target shape.
Sketched Isotropic Gaussian Regularization
One idea
Embeddings \( \mathbf z_n=f_\theta(\mathbf x_n) \) should look like \( Q=\mathcal N(\mathbf 0,I_K) \). SIGReg avoids judging the whole cloud at once: it inserts slice planes, projects the points, tests each one-dimensional shadow, and averages the resulting scores.
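Written as a formula, the recipe above reads roughly as follows. The symbols \(M\) (number of slices), \(\mathbf a_m\) (unit slice directions), and \(u_{m,n}\) (projected values) are notation introduced here for illustration, and \(\operatorname{EP}\) stands for the one-dimensional test score:

\[
u_{m,n}=\mathbf a_m^\top \mathbf z_n,\quad \lVert \mathbf a_m\rVert=1,
\qquad
\operatorname{SIGReg}(\mathbf z_1,\dots,\mathbf z_N)\;\approx\;\frac{1}{M}\sum_{m=1}^{M}\operatorname{EP}\bigl(u_{m,1},\dots,u_{m,N}\bigr).
\]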
02
The LeJEPA paper argues that, when the future downstream task is unknown, embeddings should spend their variance evenly across directions. A centered isotropic Gaussian is the cleanest version of that geometry.
When some directions are stretched and others are tiny, a linear readout becomes more sensitive to sampling and regularization.
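One standard way to see this sensitivity, assuming an ordinary least-squares readout with homoscedastic noise (an illustrative assumption, not necessarily the paper's exact setup): the total variance of the fitted weights scales with \(\sum_i 1/\lambda_i\), where \(\lambda_i\) are the eigenvalues of the embedding covariance. The sketch below compares two hypothetical spectra with the same total variance; the function name and the numbers are made up for illustration.

```python
import numpy as np

def total_ols_variance(eigenvalues, noise_var=1.0, n_samples=1000):
    """Approximate trace of Var(beta_hat) = (noise_var / n) * sum_i 1 / lambda_i.

    Assumes OLS on centered embeddings whose covariance has the given
    eigenvalues (a textbook approximation, used here only as an illustration).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    return noise_var / n_samples * np.sum(1.0 / lam)

# Two hypothetical spectra with the same variance budget (both sum to 4).
iso = np.ones(4)
aniso = np.array([3.0, 0.9, 0.09, 0.01])

var_iso = total_ols_variance(iso)
var_aniso = total_ols_variance(aniso)
print(f"isotropic: {var_iso:.4f}   anisotropic: {var_aniso:.4f}")
```

Under a fixed budget, the sum \(\sum_i 1/\lambda_i\) is smallest when all eigenvalues are equal, so the tiny directions in the anisotropic spectrum dominate the readout's variance.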
For k-NN and kernel-style predictors, the paper’s analysis points to the isotropic Gaussian as the unique bias-minimizing design under a fixed covariance budget.
If the full cloud is isotropic Gaussian, every one-dimensional projection has the same standard normal shape.
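The projection property can be checked numerically. The following is a minimal NumPy sketch, with sample size, dimension, and number of directions chosen arbitrarily: it projects an isotropic Gaussian cloud and a cloud collapsed onto a line (with the same total variance) onto random unit directions and compares the variance of each shadow.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 20_000, 16, 32  # points, embedding dim, slice directions (arbitrary)

# Random unit directions, one per row.
dirs = rng.standard_normal((M, K))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# Isotropic Gaussian cloud: covariance I_K, total variance K.
z_iso = rng.standard_normal((N, K))

# Cloud collapsed onto the first axis, scaled to the same total variance K.
line_dir = np.zeros(K)
line_dir[0] = 1.0
z_line = np.sqrt(K) * rng.standard_normal((N, 1)) * line_dir

# One-dimensional shadows: column m holds the projections onto dirs[m].
shadow_iso = z_iso @ dirs.T
shadow_line = z_line @ dirs.T

mean_iso = shadow_iso.mean(axis=0)
var_iso = shadow_iso.var(axis=0)
var_line = shadow_line.var(axis=0)

print("isotropic shadow variances:", var_iso.min(), "to", var_iso.max())
print("line shadow variances:     ", var_line.min(), "to", var_line.max())
```

Every isotropic shadow has mean near 0 and variance near 1, while the collapsed cloud's shadow variance depends strongly on the direction. That direction-dependence is what a per-slice test can detect.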
03
After projection, Epps-Pulley compares the shadow’s empirical characteristic function against the standard Gaussian one. The score is small when the shadow has the right shape.
At each frequency \(t\), each projected value \(u_n=\mathbf a^\top\mathbf z_n\) becomes a unit-circle point \((\cos(tu_n),\sin(tu_n))\). Their average is the empirical characteristic function \(\widehat{\phi}_{\mathbf a}(t)\).
Demo readouts at the selected \(t\): mean cos and mean sin (the coordinates of \(\widehat{\phi}_{\mathbf a}(t)\)), the target's real and imaginary parts, and EP(\(\mathbf a\)), computed as \(N\cdot\operatorname{trapz}\) over \(t_j\in[-5,5]\).
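The one-dimensional score can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the reference implementation: the Gaussian window \(e^{-t^2/2}\) inside the integral and the 17-point grid are choices made here, while the \([-5,5]\) range, the trapezoid rule, and the factor \(N\) follow the description above. The name `epps_pulley` is hypothetical.

```python
import numpy as np

def epps_pulley(u, t_max=5.0, num_t=17):
    """Epps-Pulley-style score of a 1-D sample u against N(0, 1)."""
    u = np.asarray(u, dtype=float)
    n = u.shape[0]
    t = np.linspace(-t_max, t_max, num_t)     # frequency grid t_j
    tu = np.outer(t, u)                       # shape (num_t, n)

    # Empirical characteristic function: average of unit-circle points.
    mean_cos = np.cos(tu).mean(axis=1)        # real part
    mean_sin = np.sin(tu).mean(axis=1)        # imaginary part

    # Standard normal characteristic function: real exp(-t^2/2), imaginary 0.
    target_real = np.exp(-0.5 * t**2)

    # ASSUMPTION: Gaussian window weighting the squared CF error.
    window = np.exp(-0.5 * t**2)
    integrand = window * ((mean_cos - target_real) ** 2 + mean_sin**2)

    # Trapezoid rule written out by hand to avoid NumPy version differences.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return n * integral

rng = np.random.default_rng(0)
score_gauss = epps_pulley(rng.standard_normal(2000))   # well-shaped shadow
score_collapsed = epps_pulley(np.zeros(2000))          # fully collapsed shadow
print(score_gauss, score_collapsed)
```

The collapsed shadow scores far higher than the Gaussian one, because its empirical characteristic function stays at 1 for every \(t\) instead of decaying like \(e^{-t^2/2}\).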
04
This toy demo directly optimizes the points \( \mathbf z_n \). In an encoder, the same gradients flow through \( \mathbf z_n=f_\theta(\mathbf x_n) \) into \(f_\theta\).
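A rough NumPy version of such a toy optimization is sketched below, with the gradient derived by hand rather than by autodiff. Several choices are assumptions made for illustration: the slice directions are fixed rather than resampled, the integral uses a Gaussian window, and the step size, point count, and iteration count are arbitrary. All function names are hypothetical.

```python
import numpy as np

def ep_value_and_grad(u, t, omega):
    """Windowed CF-matching score for 1-D values u, and its gradient w.r.t. u."""
    n = u.shape[0]
    tu = np.outer(t, u)                          # (T, N)
    cos_tu, sin_tu = np.cos(tu), np.sin(tu)
    c = cos_tu.mean(axis=1)                      # real part of empirical CF
    s = sin_tu.mean(axis=1)                      # imaginary part
    phi = np.exp(-0.5 * t**2)                    # standard normal CF
    value = n * np.sum(omega * ((c - phi) ** 2 + s**2))
    # d value / d u_n = 2 * sum_j omega_j t_j [-(c_j - phi_j) sin(t_j u_n) + s_j cos(t_j u_n)]
    coeff = 2.0 * omega * t
    grad = (coeff * -(c - phi)) @ sin_tu + (coeff * s) @ cos_tu
    return value, grad

def sliced_value_and_grad(z, dirs, t, omega):
    """Average the 1-D score over slice directions; gradient w.r.t. the points z."""
    total, grad_z = 0.0, np.zeros_like(z)
    for a in dirs:
        value, grad_u = ep_value_and_grad(z @ a, t, omega)
        total += value / len(dirs)
        grad_z += np.outer(grad_u, a) / len(dirs)   # chain rule through u = z @ a
    return total, grad_z

rng = np.random.default_rng(0)
N, K, M = 256, 2, 16
z = 0.1 * rng.standard_normal((N, K))            # nearly collapsed start

dirs = rng.standard_normal((M, K))               # ASSUMPTION: fixed directions
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

t = np.linspace(-5.0, 5.0, 17)
trap = np.full_like(t, t[1] - t[0])              # trapezoid weights
trap[[0, -1]] *= 0.5
omega = trap * np.exp(-0.5 * t**2)               # ASSUMPTION: Gaussian window

loss_start, _ = sliced_value_and_grad(z, dirs, t, omega)
for _ in range(300):
    _, grad = sliced_value_and_grad(z, dirs, t, omega)
    z -= 0.1 * grad                              # plain gradient descent on the points
loss_end, _ = sliced_value_and_grad(z, dirs, t, omega)
print(loss_start, loss_end)
```

In an encoder setting, `grad` would be the signal passed back through \(f_\theta\) by an autodiff framework instead of being applied to the points directly.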
Prediction loss makes related views agree. SIGReg keeps the embedding distribution close to \( \mathcal N(\mathbf 0,I_K) \). Together they encourage useful, non-collapsed features.