Finally multithreading! #3

Open
1 of 3 tasks
Akababa opened this issue Dec 17, 2017 · 34 comments

@Akababa
Owner

Akababa commented Dec 17, 2017

@benediamond @Zeta36
After many hours of hopeless debugging I discovered locks, which are amazing. The overall speedup on my machine is substantial: at least 2x, I would say.
That being said, I haven't tested it fully, and the code is almost completely rewritten/refactored by now, so please feel free to use it and tell me if I missed anything :)
https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py
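For anyone curious, here's a minimal sketch of the pattern, with hypothetical node fields (not the actual player_chess.py code): the lock only guards the shared counters, while the expensive NN evaluation stays outside the critical section.

```python
import threading

class Node:
    """Hypothetical MCTS node whose shared statistics are guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.n = 0    # visit count
        self.w = 0.0  # total value

    def backup(self, value):
        # Hold the lock only while touching shared state; the NN prediction
        # for the leaf happens outside any critical section.
        with self.lock:
            self.n += 1
            self.w += value
```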

TODO:

  • Testing
  • Ctrl+C stop it
  • C++ implementation, looks lock-bound
@benediamond

benediamond commented Dec 17, 2017

Hi @Akababa, this looks great, I'm going over it now.

In the meantime, I noticed your flip_policy step. Could you say more about it?

@Zeta36 I'm wondering if this is the crucial bug that appeared with the DeepMind-style board representation. When you flip the board to orient the features to the perspective of the current player... then the final NN mapping onto the policy vector must be flipped back as well!? Furthermore, it would be necessary to "preemptively" flip the visit-count information before feeding it into the neural network, e.g. during the convert_to_training_data step.
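For illustration only, a sketch of the kind of flip being discussed, assuming a 4096-dim policy indexed by from_square * 64 + to_square over 0-63 squares (the repo's actual move encoding may differ):

```python
import numpy as np

def flip_square(sq):
    """Mirror a 0-63 square index vertically (a1 <-> a8), keeping the file."""
    return sq ^ 56

def flip_policy(policy):
    """Re-index a 4096-dim (from, to) policy so it matches the flipped board."""
    flipped = np.zeros_like(policy)
    for frm in range(64):
        for to in range(64):
            flipped[flip_square(frm) * 64 + flip_square(to)] = policy[frm * 64 + to]
    return flipped
```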

@benediamond

Also, why did you remove the manual annealing of the learning rate?

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Hey @benediamond, thanks for commenting!
Btw, I'm pushing more optimizations to https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py now. It looks like it's working well.

I didn't train for 100,000 steps anyway (the first point at which the lr changes), so it doesn't really matter; I was just experimenting with different optimizers. At the scale of our testing, Google's annealing schedule isn't really applicable.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Yes, I think I did the flipping correctly here, but I would really appreciate it if you could take a quick look to see whether it checks out from your point of view.

If you have code you'd like a sanity check on too I'd be happy to help out :)

@Zeta36

Zeta36 commented Dec 17, 2017

@benediamond:

When you flip the board to orient the features to the perspective of the current player... then the final NN mapping onto the policy vector must be flipped back as well!?

Well, this is really a problem we didn't take into account. Certainly it may be the cause of the convergence failure.

@benediamond

Yes. I can't believe we didn't think of this. @Akababa, kudos! I'll be looking through your code and making sense of everything. I'll let you know how things work.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Yeah, that's always a worry in the back of my mind (hence the paranoid asserts). I'm a little confused by the conversation though: has a bug already been found in my code, or is it a previous one from before my implementation?

@benediamond Thank you! Please feel free to write some test cases and sanity checks.

@benediamond

@Akababa The point is that I had developed a "DeepMind-style" feature plane input on my own, but I hadn't realized (as you did) that the policy vector needed to be flipped for black. @Zeta36 and I were wondering why it didn't converge. I'll be updating it accordingly as soon as possible.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Doesn't the DeepMind input actually use an extra plane to encode the side to move?

The main reason I did this was for fun, but it might also make the network train faster, as I believe it's strictly better than keeping the color plane and using this transformation to augment the training data.

@benediamond

benediamond commented Dec 17, 2017

Yes, they did. You can see my current approach here.

Here is another quick question. It appears that you clear the move tables at the beginning of each call to action. Yet isn't this contrary to the DeepMind approach, where, as they say, the non-chosen portion of the tree is discarded after each move but the chosen portion is kept? Here, we will have to build visit counts from scratch each time a new move is chosen. Previously, memory was released only at the end of the game (when self.white and self.black are reassigned in self_play.py).

Also, what do you mean by "losing MCTS nodes...!"?
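(For reference, here's a minimal sketch of the subtree-reuse idea under discussion, assuming a hypothetical tree where each node keeps its children in a dict keyed by move; this is not the repo's actual data structure.)

```python
def advance_root(root, chosen_move):
    """Keep the subtree under the chosen move and discard its siblings.

    Returns the new root with its accumulated visit counts intact,
    or None if the chosen move was never expanded during search.
    """
    new_root = root.children.get(chosen_move)
    root.children.clear()  # drop references to the discarded siblings
    return new_root
```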

@Akababa
Owner Author

Akababa commented Dec 17, 2017

But why is there a need to flip the policy if you are feeding in the side to move?
Yes, that was before I read that part of the paper, but even then I'm not sure how move counts from previous transpositions would affect the table. In any case, I'm mostly doing this as a "functional" approach to make results and bugs reproducible and to make things easier to reason about for now.

@benediamond

Hmm, I see what you're saying. But that would be much harder for the network, no? The entire mapping from the convolved input stack to the policy vector would have to be re-learned from scratch for black, in a way that is a totally arbitrary scrambling of the first. At that point, there is no reason to place the side to move on top of the stack, orient the board from the player's perspective, etc... Right?

@Akababa
Owner Author

Akababa commented Dec 17, 2017

That might be true, especially at the beginning, before the model has had a chance to learn the rules of chess.

However, I think we are doing something similar with the flattened policy output layer anyway. Google's paper does mention that the final result (between the flattened policy and the huge stack of one-hot layers) was the same, but training was slightly slower with the "compressed" format, which for us, with our 0 TPUs, probably means we won't see significant results from scratch for a while.

One thing I considered doing is having two 64-unit FC outputs for the from-square and to-square (and maybe ignoring underpromotions for now); it might be a little easier for the network to use. BUT I don't know whether this would output a sensible probability distribution with respect to softmax and the ranking of chess moves.
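To make that idea concrete, here's a rough Keras sketch (toy body and hypothetical names, not part of the repo). Note that each head is an independent softmax over 64 squares, so the joint move distribution is only a product of two marginals, which is exactly the ranking concern above.

```python
from keras.layers import Conv2D, Dense, Flatten, Input
from keras.models import Model

board_input = Input(shape=(8, 8, 18))  # e.g. 18 feature planes
x = Conv2D(64, 3, padding="same", activation="relu")(board_input)
x = Flatten()(x)

# Separate softmax heads for the from-square and the to-square.
from_head = Dense(64, activation="softmax", name="from_square")(x)
to_head = Dense(64, activation="softmax", name="to_square")(x)
value_head = Dense(1, activation="tanh", name="value")(x)

model = Model(inputs=board_input, outputs=[from_head, to_head, value_head])
```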

@benediamond

By the way, do you know what the "alternative" to the flat approach is? I can't figure out what the "non-flat" approach they're referring to is.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Yeah I agree that's unclear. I don't even know how they came up with 4629 possible moves.

@benediamond

4672 comes from their 73×8×8 move representation, as described in the arXiv paper. They also mention that they tried various approaches, which all worked well.
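For reference, the plane count breaks down like this (per the arXiv paper):

```python
# 73 move-type planes, stacked over the 8x8 grid of from-squares.
queen_moves = 8 * 7       # 8 directions, up to 7 squares each
knight_moves = 8
underpromotions = 3 * 3   # knight/bishop/rook x {capture-left, push, capture-right}
planes = queen_moves + knight_moves + underpromotions  # 73
print(8 * 8 * planes)     # 4672
```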

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Yeah, my impression is that anything we understand won't matter anyway :) All we can do is ensure the inputs and outputs are correct and pray for the best.

BTW, are you able to access their Nature paper? If not, I got it from my university and can send it to you if you want.

@benediamond

On line 21 of your player_chess.py, you reference asyncio despite having deleted the import. Is this intentional?

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Try checking the new branch, I removed that part and optimized a lot of other stuff.

@benediamond

Didn't see that, thanks.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

@benediamond sorry, I didn't see your other comment. I think Python passes a reference to the [] into the prediction queue, so it's all good. You can uncomment the
#logger.debug(f"predicting {len(item_list)} items")
line to verify for yourself.
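(A tiny stand-alone illustration of the reference semantics being relied on here; toy names, not the repo's actual queue.)

```python
from queue import Queue

prediction_queue = Queue()
item_list = []

prediction_queue.put(item_list)    # the queue stores a reference, not a copy
item_list.append("leaf position")  # visible to whoever drains the queue later

assert prediction_queue.get() is item_list
assert len(item_list) == 1
```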

@benediamond

Yes, indeed! I deleted it because I figured that out myself just after posting. Thanks.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

By the way, I'm brainstorming a list of ways to fix the draws-by-repetition problem; hopefully we can figure this one out.

@benediamond

Hi @Akababa, the one thing that seemed to affect this most strongly for me was change_tau_turn. I would first try setting this value to a very large number (1000, etc.), so that tau never drops.

I've also experimented with a slowly (exponentially) decaying tau.

Using either of these two, I could essentially eliminate draws by repetition.

@Akababa
Owner Author

Akababa commented Dec 17, 2017

Thanks, if that works it's a much nicer solution than the stuff I came up with. Did you let tau=e^{-0.01*turn} ?

@benediamond

benediamond commented Dec 17, 2017

Yes, essentially. I replaced the parameter change_tau_turn with tau_decay_rate; 0.99 was a good value (very close to e^{-0.01}, lol). Then set

tau = np.power(tau_decay_rate, turn)
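For context, here's a sketch of how such a tau would typically enter the root move selection, with pi(a) proportional to N(a)^(1/tau); names are hypothetical, not the repo's exact code.

```python
import numpy as np

def move_probabilities(visit_counts, turn, tau_decay_rate=0.99):
    """Turn root visit counts into move-selection probabilities with a decaying tau."""
    tau = np.power(tau_decay_rate, turn)
    counts = np.asarray(visit_counts, dtype=np.float64)
    # pi(a) ~ N(a)^(1/tau); once tau gets tiny, switch to a plain argmax
    # to avoid numerical overflow.
    pi = counts ** (1.0 / tau)
    return pi / pi.sum()
```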

@benediamond

My TensorFlow install was broken by the latest CUDA update, so it'll be a bit before I can get it working again.

@Akababa
Owner Author

Akababa commented Dec 18, 2017

What version? I'm on CUDA 8 and cuDNN 6.

@benediamond

I've got CUDA 9.1 and cuDNN 7.0.5. Still no luck.

@Akababa
Owner Author

Akababa commented Dec 18, 2017

I just used the old versions from the TF site. Are the new ones faster?

@benediamond

Since my machine runs CUDA 9(.1), TF with GPU won't work out of the box. Rather than attempt to downgrade, I just built from source. That proved to be a good idea, until recently.

@benediamond

As for speed, I'm not sure, but using the new versions couldn't hurt, I think.

@apollo-time

Multithreading isn't truly parallel in Python due to the GIL, is it?

@Akababa
Owner Author

Akababa commented Dec 25, 2017

No, unfortunately not :( But locks are still faster than asyncio with an event loop.
