
General hyperparameters? #16

Open
zizhaozhang opened this issue Sep 8, 2016 · 7 comments

Comments

@zizhaozhang

Hi,

My question is a bit unrelated, but I am really curious, so sorry for the interruption.

WRN uses a weight decay and learning rate schedule quite different from the one fb.resnet.torch uses. As the WRN paper mentions, pre-activation ResNet trained with the WRN learning scheme achieves better accuracy, so I think this hyperparameter setting is quite good and generalizes well.

Recently I have been using the WRN code to train a new method, Densely Connected Convolutional Networks (DenseNet, http://arxiv.org/pdf/1608.06993v1.pdf), but the error rate is higher than what training with the fb.resnet.torch code gives (5.2 vs. the 4.1 reported in the original paper).

I understand that hyperparameters may vary from model to model. But given how extensively WRN's settings have been tested, I would not expect them to cost more than 1.0 in error rate. The WRN paper does not discuss much about how the hyperparameters were selected.

I am not sure whether you are familiar with this new method (DenseNet), but could you comment on this situation?
In addition, could you provide more detail on how you selected your hyperparameters instead of using the fb.resnet.torch settings? It would be very helpful for training modified architectures based on WRN and searching for better hyperparameter settings.

Thanks a lot!

@szagoruyko
Owner

@zizhaozhang that's probably related to #17: DenseNet doesn't use whitened data
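For anyone unfamiliar, ZCA whitening decorrelates the input features while staying close to the original pixel space, which is a different preprocessing from DenseNet's per-channel mean/std normalization. A minimal numpy sketch of the idea; the toy data, shapes, and eps value here are made up for illustration, not taken from either codebase:

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten a (n_samples, n_features) matrix: rotate into the
    eigenbasis of the covariance, rescale, and rotate back."""
    x = x - x.mean(axis=0)                 # center each feature
    cov = (x.T @ x) / x.shape[0]           # sample covariance
    s, u = np.linalg.eigh(cov)             # eigenvalues, eigenvectors
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T
    return x @ w

# Correlated toy data standing in for flattened image patches
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 32))
xw = zca_whiten(x)
print(xw.shape)  # covariance of xw is now close to the identity
```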

@ibmua

ibmua commented Sep 26, 2016

OMFG! Thanks for that paper, it's essentially the same concept that I recently figured out and have been testing. NNs evolve by days, not years, right now.

@ibmua

ibmua commented Sep 26, 2016

Yes, as far as I can see from their figures, it's nearly identical to my hoardNet, so you might as well just use my code, perhaps with a few small modifications.

https://github.com/ibmua/Breaking-Cifar

Check out the "hoard" models. The major parameters there are "sequences" and "depth"; I designed the "2-x"+ models to be run with depth=2, while the earlier models are more generic. The thing should be easily tweakable, and the code is clean. Mind that it uses 4-space tabs.
I haven't read the article yet, though, so I'm guessing some details may differ slightly. Also mind that you want to clone the whole thing; just importing a model into Sergey's WRN won't work.

BTW, hoard-2-x is the model I referred to as possibly being comparable to WRN in performance per parameter. IMHO HoardNet sounds like a more meaningful name for this thing =D The info is hoarded rather than discarded as in usual architectures. "Accumulation" would be a less fitting word, because it sounds more like what ResNets do. DenseNet doesn't seem like a very descriptive name to me.

https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt is a log from near the end of the training where I got 19.5% on my [0..1]-scaled CIFAR-100+.

@ibmua

ibmua commented Sep 26, 2016

Their code is also available, at https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua . They use pre-activation, which is different from what I used. I thought that might be beneficial, but it needed more testing and I don't have many resources. =) It's also a lot more expensive, though the difference may well be worth it. Some other things I designed differently may actually work better, I think. I took a bit from Inception and Inception-ResNet, while they only took from ResNet.
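As a loose illustration of the pre-activation composite function DenseNet uses (BN, then ReLU, then convolution, with each layer's output concatenated onto its input), here is a toy numpy sketch. The plain matmul stands in for the 3x3 convolution, the normalization is batch norm without learned scale/shift, and all shapes and the growth rate are made up:

```python
import numpy as np

def dense_layer(x, w, eps=1e-5):
    """One DenseNet-style layer in pre-activation order (BN -> ReLU -> "conv"),
    operating on a 2-D (batch, features) matrix for simplicity."""
    h = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)  # batch norm
    h = np.maximum(h, 0.0)                                   # ReLU
    h = h @ w                                                # stand-in for 3x3 conv
    # Dense connectivity: keep the input and append the new features
    return np.concatenate([x, h], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))            # batch of 8, 16 input features
w1 = 0.1 * rng.standard_normal((16, 12))    # growth rate 12
w2 = 0.1 * rng.standard_normal((28, 12))    # input is now 16 + 12 = 28 wide
out = dense_layer(dense_layer(x, w1), w2)
print(out.shape)                            # (8, 40): 16 + 12 + 12
```

The key point is that the feature width grows by the growth rate at every layer, since nothing is discarded.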

@zizhaozhang
Author

@szagoruyko I see, I will give that a try. It is really tricky. Thanks @ibmua for spotting that. I had run so many different tests using the WRN code to train DenseNet while overlooking this part.

One difference is that @ibmua uses [0,1]-scaled data, while DenseNet uses mean/std normalization as fb.resnet.torch does. Which do you think is better?

@ibmua

ibmua commented Sep 29, 2016

Mean+std is likely a little bit better.
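The two schemes differ only in the preprocessing step. A quick numpy sketch of both on stand-in image data (the shapes and pixel values are made up, and the statistics would normally come from the training set):

```python
import numpy as np

# Toy stand-in for CIFAR images: uint8 pixels, shape (n, height, width, channels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Option A: simple [0, 1] scaling
x_scaled = images.astype(np.float32) / 255.0

# Option B: per-channel mean/std normalization (the fb.resnet.torch scheme),
# with statistics computed over all pixels of all training images
mean = x_scaled.mean(axis=(0, 1, 2))
std = x_scaled.std(axis=(0, 1, 2))
x_norm = (x_scaled - mean) / std

print(x_norm.shape)  # after option B, each channel has mean ~0 and std ~1
```

Option B gives the network zero-centered, unit-variance inputs, which tends to interact better with weight initialization schemes that assume roughly standardized inputs.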

@zizhaozhang
Author

Cool. I will check your code.
