NeurIPS Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Poster in Workshop: Mathematics of Modern Machine Learning (M3L)

Fadhel Ayed · Francois Caron · Paul Jung · Juho Lee · Hoil Lee · Hongseok Yang

Abstract:

We consider gradient-based optimisation of wide, shallow neural networks with hidden-node outputs scaled by positive scale parameters. The scale parameters are non-identical, which differs from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large networks, with high probability, gradient flow converges to a global minimum and can learn features, unlike in the NTK regime.
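
To make the setup concrete, here is a minimal NumPy sketch of a shallow network whose hidden-node outputs are multiplied by positive, non-identical scales. The abstract does not give the exact parameterisation, so everything specific here is an assumption for illustration: the names `node_scales` and `shallow_net`, the mixing weight `gamma`, the power-law decay `alpha`, the ReLU activation, the square-root scaling, and the random ±1 output weights. Setting `gamma=1.0` yields identical scales `1/m`, which mirrors an NTK-style parameterisation; `gamma<1.0` makes the scales asymmetrical.

```python
import numpy as np

def node_scales(m, gamma=1.0, alpha=2.0):
    """Hypothetical positive per-node scales that sum to 1.

    gamma=1.0 gives identical scales 1/m (NTK-style); gamma<1.0 mixes in
    a normalised power-law sequence, so the scales become non-identical.
    """
    decay = np.arange(1, m + 1, dtype=float) ** (-alpha)
    return gamma / m + (1.0 - gamma) * decay / decay.sum()

def shallow_net(x, W, a, scales):
    """Assumed form: f(x) = sum_j sqrt(scales[j]) * a[j] * relu(W[j] @ x / sqrt(d))."""
    d = x.shape[-1]
    hidden = np.maximum(W @ x / np.sqrt(d), 0.0)  # ReLU hidden-node outputs
    return float(np.sum(np.sqrt(scales) * a * hidden))

rng = np.random.default_rng(0)
m, d = 512, 4                          # width and input dimension (illustrative)
W = rng.standard_normal((m, d))        # hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)    # output weights (assumed +/-1)
x = rng.standard_normal(d)

ntk_scales = node_scales(m, gamma=1.0)   # identical scales
asym_scales = node_scales(m, gamma=0.5)  # asymmetrical scales
out_ntk = shallow_net(x, W, a, ntk_scales)
out_asym = shallow_net(x, W, a, asym_scales)
```

With asymmetrical scales, a few nodes carry a non-vanishing share of the total scale even as `m` grows, which is one intuitive reading of how such a network might move away from the NTK regime; the sketch only evaluates the forward pass and does not reproduce the paper's gradient-flow analysis.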
