Wrong conclusions #23

Open
ibmua opened this issue Nov 7, 2016 · 1 comment

Comments


ibmua commented Nov 7, 2016

Just took another look at https://arxiv.org/pdf/1605.07146v1.pdf

To summarize:
• widening consistently improves performance across residual networks of different depth;
• increasing both depth and width helps until the number of parameters becomes too high and stronger regularization is needed;

Actually, the first conclusion contradicts the second in a way, but that's not the point.

You may really want to try running tests on MNIST with absolutely no augmentation, since that is a setting where it's actually easy to overfit. The lesson I learnt from MNIST was that there seems to be an optimal width & height, which is actually rather low (for such a task). Also, the standard blocks/activation scheduling (standard "preact") may not always be optimal, and groups (like in https://arxiv.org/pdf/1605.06489v1.pdf ) are hugely beneficial, at least up to some number of them.
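To make the point about groups concrete, here is a minimal sketch of how splitting a convolution into groups cuts its weight count. The helper name `conv_params` and the 64-channel, 3x3 layer are illustrative assumptions, not the architecture used in these experiments.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=False):
    """Weight count of a 2D convolution with a square kernel.

    Hypothetical helper for illustration: with `groups` groups, each output
    channel only sees c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


# Same layer shape, increasing number of groups (assumed example sizes).
for g in (1, 2, 4, 8):
    print(g, conv_params(64, 64, 3, groups=g))
```

The weight count drops in proportion to the number of groups, which is one way groups can act against overfitting on a small task; whether that is the reason they helped here is not something this sketch shows.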

I was able to reach a 0.25% peak error rate pretty easily. My best arch hit the same peak, but also held 0.26% error over lots of epochs, which was rather hard to get, considering that precision is already very high here, so the across-epoch fluctuations are relatively large. This was without any parameter smoothing, such as a moving average.
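For reference, the kind of parameter smoothing that was not used here can be sketched as an exponential moving average over the weights. The function name `ema_update`, the decay value, and the toy two-element "parameter vectors" are all assumptions made for illustration.

```python
def ema_update(avg, params, decay=0.99):
    """Return the exponential moving average of `params`, given the old average."""
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg, params)]


# Toy parameters that oscillate from epoch to epoch (made-up values).
params_per_epoch = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

avg = list(params_per_epoch[0])
for params in params_per_epoch[1:]:
    avg = ema_update(avg, params, decay=0.5)

print(avg)
```

Evaluating with the averaged weights instead of the raw ones would typically damp the across-epoch fluctuations, so numbers obtained without it are the noisier, harder case.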

Author

ibmua commented Nov 16, 2016

That arch performs pretty awfully on CIFAR, though.
