Just took another look at https://arxiv.org/pdf/1605.07146v1.pdf
To summarize:
• widening consistently improves performance across residual networks of different depth;
• increasing both depth and width helps until the number of parameters becomes too high and stronger regularization is needed;
Actually, the first conclusion contradicts the second in a way, but that's not the point.
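For context, "widening" in that paper just means multiplying the number of channels in each residual group by a factor k while keeping the depth fixed. A minimal sketch of the channel layout (the function name and parametrization here are my own illustration, not code from the paper):

```python
# Sketch of how a widening factor k scales a WRN-style channel layout.
# The base widths (16, 32, 64) follow the CIFAR ResNet convention used in the paper.
def wrn_channel_plan(depth: int, k: int):
    assert (depth - 4) % 6 == 0, "CIFAR-style WRN depth has the form 6n + 4"
    n = (depth - 4) // 6                    # residual blocks per group
    widths = [16, 16 * k, 32 * k, 64 * k]   # stem + three residual groups
    return n, widths

# e.g. WRN-28-10: 4 blocks per group, channel widths [16, 160, 320, 640]
print(wrn_channel_plan(28, 10))
```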
You may really want to try running tests on MNIST with absolutely no augmentation, since it's a task that is actually easy to overfit on. The lesson I learnt from MNIST was that there is an optimal width & depth, and it is actually rather low for such a task. Also, the standard block/activation scheduling (standard "preact") may not always be optimal, and groups (like in https://arxiv.org/pdf/1605.06489v1.pdf ) are hugely beneficial, at least up to a certain number of them.
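To make the "preact + groups" point concrete, here is a rough PyTorch sketch of a pre-activation residual block whose 3x3 convolutions are grouped. The block layout and the default groups=4 are assumptions for illustration, not the exact architecture from either linked paper:

```python
import torch
import torch.nn as nn

class PreactGroupedBlock(nn.Module):
    """Pre-activation residual block (BN -> ReLU -> conv) with grouped 3x3 convs.
    The choice of groups=4 is purely illustrative; channels must be divisible by it."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1,
                               groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1,
                               groups=groups, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity shortcut

# quick shape check
x = torch.randn(2, 64, 28, 28)
print(PreactGroupedBlock(64)(x).shape)  # torch.Size([2, 64, 28, 28])
```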
I was able to achieve a 0.25% peak error rate pretty easily, and my best architecture reached the same peak while also holding 0.26% error over many epochs, which was rather hard to get here: at this level of precision the across-epoch fluctuations are relatively large. This was without any parameter smoothing, such as a moving average of the weights.
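The parameter smoothing mentioned above would amount to keeping an exponential moving average of the weights and evaluating the averaged copy. A minimal sketch, with the decay value and helper name being my own choices (note it only averages parameters, not BatchNorm running statistics):

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay: float = 0.999):
    """Blend the live weights into the EMA copy after each optimizer step."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# usage sketch:
# ema_model = copy.deepcopy(model)
# for batch in loader:
#     ...train step...
#     update_ema(ema_model, model)
# evaluate ema_model instead of model to damp across-epoch fluctuations
```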