
General hyperparameters? #16

Open
zizhaozhang opened this issue Sep 8, 2016 · 7 comments

Comments

@zizhaozhang

Hi,

My question is a bit unrelated, but I am really curious, so sorry for the interruption.

WRN uses a weight decay and learning rate schedule quite different from the one fb.resnet.torch uses. As the WRN paper mentions, pre-activation ResNet trained with the WRN learning scheme achieves better accuracy, so I think this hyperparameter setting is quite good and generalizes well.

Recently I have been using the WRN code to train a new method, Densely Connected Convolutional Networks (DenseNet, http://arxiv.org/pdf/1608.06993v1.pdf), but the error rate is higher than what training with the fb.resnet.torch code gives (5.2 vs. the 4.1 reported in the original paper).

I understand that hyperparameters may vary from model to model. But given how extensively WRN's settings have been tested, I would not expect them to cost more than 1.0 in error rate. The WRN paper does not discuss much about how the hyperparameters were selected.

I am not sure whether you are familiar with this new method (DenseNet), but could you comment on this situation?
In addition, could you provide more detail on how you selected your hyperparameters instead of using the fb.resnet.torch settings? It would be very helpful for training modified architectures based on WRN and searching for better hyperparameter settings.

Thanks a lot!

@szagoruyko
Owner

@zizhaozhang that's probably related to #17: DenseNet doesn't use whitened data
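For anyone unfamiliar, ZCA whitening decorrelates the input features while staying close to the original pixel space, which is a different preprocessing from DenseNet's per-channel mean/std normalization. A minimal numpy sketch of the idea; the toy data, shapes, and eps value here are made up for illustration, not taken from either codebase:

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten a (n_samples, n_features) matrix: rotate into the
    eigenbasis of the covariance, rescale, and rotate back."""
    x = x - x.mean(axis=0)                 # center each feature
    cov = (x.T @ x) / x.shape[0]           # sample covariance
    s, u = np.linalg.eigh(cov)             # eigenvalues, eigenvectors
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T
    return x @ w

# Correlated toy data standing in for flattened image patches
rng = np.random.default_rng(0)
x = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 32))
xw = zca_whiten(x)
print(xw.shape)  # covariance of xw is now close to the identity
```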

@ibmua

ibmua commented Sep 26, 2016

OMFG! Thanks for that paper, it's essentially the same concept that I recently figured out and have been testing. NNs evolve by days, not years, right now.

@ibmua

ibmua commented Sep 26, 2016

Yes, as far as I can see from their figures, it's nearly identical to my hoardNet, so you might as well just use my code, perhaps with a few small modifications.

https://github.com/ibmua/Breaking-Cifar

Check out the "hoard" models. The major parameters there are "sequences" and "depth"; I designed the "2-x"+ models to be run with depth=2, while the earlier models are more generic. The thing should be easily tweakable, and the code is clean. Mind that it uses 4-space tabs.
I haven't read the article yet, though, so I'm guessing some details may differ slightly. Also mind that you want to clone the whole thing; just importing a model into Sergey's WRN won't work.

BTW, hoard-2-x is the model I referred to as possibly being comparable to WRN in performance per parameter. IMHO HoardNet sounds like a more meaningful name for this thing =D The info is hoarded rather than discarded as in usual architectures. "Accumulation" would be a less fitting word, because it sounds more like what ResNets do. DenseNet doesn't seem like a very descriptive name to me.

https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt is a log from near the end of the training where I got 19.5% on my [0..1]-scaled CIFAR-100+.

@ibmua

ibmua commented Sep 26, 2016

Their code is also available, at https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua . They use pre-activation, which is different from what I used. I thought that might be beneficial, but it needed more testing and I don't have many resources. =) It's also a lot more expensive, though the difference may well be worth it. Some other things I designed differently may actually work better, I think. I took a bit from Inception and Inception-ResNet, while they only took from ResNet.
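As a loose illustration of the pre-activation composite function DenseNet uses (BN, then ReLU, then convolution, with each layer's output concatenated onto its input), here is a toy numpy sketch. The plain matmul stands in for the 3x3 convolution, the normalization is batch norm without learned scale/shift, and all shapes and the growth rate are made up:

```python
import numpy as np

def dense_layer(x, w, eps=1e-5):
    """One DenseNet-style layer in pre-activation order (BN -> ReLU -> "conv"),
    operating on a 2-D (batch, features) matrix for simplicity."""
    h = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)  # batch norm
    h = np.maximum(h, 0.0)                                   # ReLU
    h = h @ w                                                # stand-in for 3x3 conv
    # Dense connectivity: keep the input and append the new features
    return np.concatenate([x, h], axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))            # batch of 8, 16 input features
w1 = 0.1 * rng.standard_normal((16, 12))    # growth rate 12
w2 = 0.1 * rng.standard_normal((28, 12))    # input is now 16 + 12 = 28 wide
out = dense_layer(dense_layer(x, w1), w2)
print(out.shape)                            # (8, 40): 16 + 12 + 12
```

The key point is that the feature width grows by the growth rate at every layer, since nothing is discarded.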

@zizhaozhang
Author

@szagoruyko I see, I will give that a try. It is really tricky. Thanks @ibmua for spotting that. I had run so many different tests using the WRN code to train DenseNet while overlooking this part.

One difference is that @ibmua uses [0,1]-scaled data, while DenseNet uses mean/std normalization as fb.resnet.torch does. Which do you think is better?

@ibmua

ibmua commented Sep 29, 2016

Mean+std is likely a little bit better.
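The two schemes differ only in the preprocessing step. A quick numpy sketch of both on stand-in image data (the shapes and pixel values are made up, and the statistics would normally come from the training set):

```python
import numpy as np

# Toy stand-in for CIFAR images: uint8 pixels, shape (n, height, width, channels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Option A: simple [0, 1] scaling
x_scaled = images.astype(np.float32) / 255.0

# Option B: per-channel mean/std normalization (the fb.resnet.torch scheme),
# with statistics computed over all pixels of all training images
mean = x_scaled.mean(axis=(0, 1, 2))
std = x_scaled.std(axis=(0, 1, 2))
x_norm = (x_scaled - mean) / std

print(x_norm.shape)  # after option B, each channel has mean ~0 and std ~1
```

Option B gives the network zero-centered, unit-variance inputs, which tends to interact better with weight initialization schemes that assume roughly standardized inputs.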

@zizhaozhang
Author

Cool. I will check your code.
