Gitteh needs a primary maintainer! #68
@FrozenCow @lushzero @jmendeth @ben @iamwilhelm - Who's it gonna be, fellas? :)
Does it need to be just one?
I have some overall experience in this area: I've mostly written Robotskirt (a Node.JS wrapper for the now-frozen Sundown library), as you can see in the contributors page; from that I derived the V8U set of microutilities for writing V8 addons. And some other small modules, like parport.js (high-level access to parallel ports from Node). But I wouldn't like to hold that responsibility alone. PS: @ben is already a maintainer. :)
I have experience with C++ and I have experience with Node/JS. However, I don't have any experience writing C++ addons for Node (apart from the PR I did here), so I don't think I'm fit for the job.
I agree, 2 or more would be best.
you'll get experience from gitteh! :D
@samcday That's good to hear. I'd hate to see it languish. I have a vested interest in making sure gitteh is up to date and up to par, as I use it in cubehero.com. However, I haven't written a node native module before, but I'm not against spending the time to learn. I think as one of the maintainers, I'd probably start off doing things like making sure documentation is up to date, and making sure PRs and issues don't just sit there.
That's the spirit! It's clear that we need several maintainers. 😃
I'm afraid I have to bow out of the maintainer role. I'm a bit overcommitted just now, and more projects are coming down the pipeline. I'll keep watching this, and feel free to summon me for libgit2 questions.
I understand. But don't worry, we'll still @ summon you. ;)
I'm pretty much out on gitteh at this point and working on sgit. To me, a lot of the fundamental technological decisions made in gitteh were sound at the time they were made, but not now. Libgit2 is very different now, so they don't make sense anymore, i.e. the C++ async, thread handling and thread locking that makes up 70% of gitteh's code.
@lushzero Fair enough, though I'm not sure I'd want someone working on gitteh if they make statements like that without backing them up with properly reasoned arguments ;) @jmendeth @FrozenCow @iamwilhelm Thanks for indicating interest. I wrote up the initial issue pretty fast, and didn't mean for it to sound like it precludes more than one active maintainer. What I meant is that I need someone (or someones, hehe) to be involved in gitteh enough that it keeps moving along, as it's not really something I realistically have time for. That said, I plan on remaining involved, and may be able to contribute some more features over the coming weeks / months. We'll see. @iamwilhelm I didn't actually realise you were using gitteh in anger somewhere like that. That's pretty cool! Gitteh is actually something I wrote for a project I was working on a while ago, but I've since scrapped that project, and I think that's one of the biggest reasons why I haven't had much time for gitteh. @ben Understandable! Okay, so for the guys who are interested, it sounds like we need to divvy up some responsibilities. Here's a quick rough cut of what might work:
Whatcha think?
Yeah, that sounds great to me. But I think that, before trying to start working, we should set a simpler … There are also things we can simplify in gitteh's code, now that LibUV is …
Since @lushzero was working on a fork, out of interest I thought I'd look around to see whether there were other git libraries for node based on libgit2. I found https://github.com/nodegit/nodegit/tree/wip . Does anyone know anything about that project? They seem to do things very similarly to what node-gitteh does, and they're still active. I'm not sure how far along they are, but they do have treebuilder stuff in there, which was something I wanted to put in gitteh too. With sgit, there are 3 projects doing the exact same thing with slight differences in their API and the way they work. I'm a bit torn about what to do here. Focus our efforts on one project? Keep developing multiple projects side-by-side? Collaborate a bit? I'm still interested in the new methods @lushzero is suggesting, because the current way to implement methods in gitteh is quite hard for a beginner. With the code in a better state, it would be easier for people to contribute. That being said (back on-topic), I don't mind being a maintainer, though I don't know how long it will be of interest to me. I'm working on a project for myself that uses gitteh, and for that I'm adding features that are needed, but I don't know how long that project will last. I'll likely lose interest/become less active once that project dies. So, short-term: yes. Long-term: I'm not sure.
My thoughts:
PS. Sorry for my ugly English.
I thought I might introduce myself, as I'm now making active contributions to nodegit. I'm the author of the … The wip branch is a rewrite of nodegit from the ground up. It uses codegen to generate the node bindings automatically, from a JSON description of the API. The wip branch is, I believe, already more comprehensive than both nodegit/master and node-gitteh. IMO, codegen is a better approach than writing bindings by hand because it requires less manual work to add support for a new function -- and libgit2's API is huge! Other advantages include: a bug fixed in one place is a bug fixed in all places, etc. This … All of that "convenience" is implemented in Javascript, so people can make … Even the 1:1 "raw" API is not as bad as it sounds. It's still object-oriented in style, and is asynchronous exactly and only where libgit2 (potentially) does blocking IO:

```js
git.Repo.open(path.resolve(__dirname, '../.git'), function(error, repo) {
  if (error) throw error;
  repo.getCommit('59b20b8d5c6ff8d09518454d4dd8b7b30f095ab5', function(error, commit) {
    if (error) throw error;
    commit.getEntry('README.md', function(error, entry) {
      if (error) throw error;
      entry.getBlob(function(error, blob) {
        if (error) throw error;
        console.log(entry.name(), entry.sha(), blob.size());
      });
    });
  });
});
```

Note that in the above code, nothing is part of the "convenience" API -- everything is 1:1 and automatically generated. Preliminary documentation of this … I'd like to explore the idea of whether we can merge these two projects. That is, whether we can just take the codegen rewrite of nodegit and, once it is stable and well documented, call it libgit2/WHATEVER and deprecate the previous two projects. One project has the benefits of: one implementation, more maintainers, fewer bugs, better documentation. So that's my hope.
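To make the codegen idea concrete for readers following along: below is a rough sketch of what a JSON-driven binding generator can look like. The schema fields and the emit step are hypothetical illustrations, not nodegit's actual format.

```js
// Hypothetical description of one libgit2 function. nodegit's real
// schema may differ; the point is that one entry carries enough type
// information for a template to emit the C++ glue automatically.
var api = {
  git_repository_open: {
    jsClass: 'Repo',
    jsName: 'open',
    async: true, // does blocking IO, so route through the libuv threadpool
    args: [
      { name: 'out',  type: 'git_repository **', isReturn: true },
      { name: 'path', type: 'const char *' }
    ],
    returns: { type: 'int', isErrorCode: true }
  }
};

// A generator walks every entry and stamps out the same wrapper
// pattern, so a fix to the template fixes every binding at once.
Object.keys(api).forEach(function(cName) {
  var desc = api[cName];
  console.log('// ' + desc.jsClass + '.' + desc.jsName +
              (desc.async ? ' (async)' : '') + ' wraps ' + cName);
});
```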
Hi @nkallen, that explains many things (I remember …). You see, Gitteh now wants to remodel itself and needs some people who care for it (the reason for this thread). So I think the best approach would be to let @samcday decide, then wait a bit so we can simplify and get things to stability again, and then we would all debate the fusion you're proposing (I'm serious about getting a single project with all our efforts combined). Conclusion: we need some time to get Gitteh itself back to normality, and then we can talk about that change. :) My thoughts.
@nkallen I find the codegen approach interesting. I guess it sort of leads into the main goals of a binding project and what individual developers want to use it for. I have been making pretty swift progress on node-sgit, https://github.com/DerpKitten/node-sgit, and ended up with something relatively high level. My needs are to be able to create repositories, commit files to them and then get history and revisions of files. What particular codegen tool are you using? I am really curious as to what everyone else's needs are, and whether a higher-level or lower-level approach would be a better fit for them. I do concur that combining efforts would make sense if it can be worked out, although I could definitely see a situation that results in 2 binding modules, one low level and one higher level.
@lushzero Oohh, I googled …
I'm also curious as to what benefits node-sgit gives and how much precision it offers over the generated bindings. Performance is also an important matter.
Me too.
@jmendeth I don't think this is something that needs me to make a decision, actually! For one, I'm putting gitteh out there for some new people to take ownership of, so my vote means a whole lot less anyway. Second of all, I think @nkallen's approach sounds very promising, as maintaining C++ bindings is tedious and error-prone. If the general consensus from those whose opinions I value (basically everyone in this thread) is that a codegen approach is better, then I think we should go for it. If we can deprecate gitteh and nodegit in favour of a unified codebase, vision and group of contributors, I think that's the best possible outcome for the Node community that just wants to load a couple of commits / blobs and walk a reflog or two :P
Don't get me wrong, I just would like to know, e.g., the performance impact, modularity change, API compatibility, etc. of such a big change before jumping into it. But if you guys see it clearly, let's go on! :)
EDIT: replies are going fast; this was in response to @samcday ;) I agree with what @jmendeth just said. Sounds good. We could also decide to start in a separate project (like nodegit's wip branch) and move it over under the libgit2 umbrella once we think things are stable/settled. We could also decide to stay API-compatible with gitteh and keep everything under the same name. People who were using gitteh could keep using gitteh that way. Anyway, this is a pretty substantial decision. I still need to try nodegit's wip branch by running it in an application. That should give me a better idea of what to do next.
That's one option. The other is to just deprecate this and focus our efforts on … If we were to deprecate gitteh, we could write a last gitteh version, a module that adapts …
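A minimal sketch of what such a last-version shim could look like, assuming nodegit's `Repo.open` from the examples in this thread; `openRepository` follows gitteh's existing entry point, and the mapping is illustrative only:

```js
// last-gitteh.js: keep gitteh's entry points, delegate to nodegit.
// The nodegit calls follow the examples in this thread; anything
// beyond that is an assumption about where the two APIs diverge.
var git = require('nodegit');

exports.openRepository = function(path, callback) {
  git.Repo.open(path, function(error, repo) {
    if (error) return callback(error);
    // Adapt method names/shapes here where gitteh and nodegit differ.
    callback(null, repo);
  });
};
```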
👍
Could anyone else chime in with their requirements? Mine are pretty simple but had little or no reliable support in any of the existing modules I tested. I settled on a very simple api that is probably closer to the command line git than anything else. Super simple, no nesting, calls are self contained. The result is easy to learn, use, extremely fast(?) and light on memory. https://github.com/DerpKitten/node-sgit/blob/master/example.js . I'm not sure if my use cases are anything like anyone else's though. This isn't a reflection of the qualitative value of nodegit, just the fact that it is much lower level, but the "log" case in nodegit is roughly 38 lines (see API example, https://github.com/nodegit/nodegit) versus only 6 in sgit. In both cases the timing is quite fast. Across ten runs against the same .git repo (sgit) with 479 commit log entries I get for sgit: 27541637 median nanoseconds, and for nodegit: 107501468 median nanoseconds. Sgit is faster (by about 80 million nanoseconds), but the difference is negligible for any real world use, and the benchmark methodology is imperfect (process.hrtime; see the sketch below). Mostly the difference is in the approach, I think: nodegit loops over each commit, sgit gets them in a large batch. If someone can provide a working example of the "log" case with gitteh, I can time that as well. To me it's sort of meaningless to talk about performance in the abstract without talking about actual numbers and applicable use cases. From my read, everything anyone here is talking about is pretty damn fast; there's not all that much road between the JS call, the C++ wrapping and the underlying C library no matter how you cut it. I don't think performance is going to be a big issue for anyone, but please explain how that thinking is wrong. My suggestion is that if we can all agree on the rough sketch of what the node API should look like, we can then decide what backend(s) configuration makes the most sense for most of the people most of the time.
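For anyone wanting to reproduce numbers like these, here is roughly what a median-of-N `process.hrtime` harness looks like; `getLog` is a placeholder for whichever library call is being measured, and the harness shares the imperfections mentioned above (it measures wall time, including event-loop scheduling noise).

```js
// Run the "log" case N times and report the median in nanoseconds.
function bench(getLog, runs, done) {
  var samples = [];
  (function next(i) {
    if (i === runs) {
      samples.sort(function(a, b) { return a - b; });
      return done(samples[Math.floor(runs / 2)]); // median sample
    }
    var start = process.hrtime();
    getLog(function() {
      var d = process.hrtime(start);    // [seconds, nanoseconds]
      samples.push(d[0] * 1e9 + d[1]);  // total nanoseconds
      next(i + 1);
    });
  })(0);
}

// Hypothetical usage; sgit.log here is a stand-in, not a confirmed API:
// bench(function(cb) { sgit.log('/path/to/.git', cb); }, 10, console.log);
```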
Hi guys, I'm excited you are going to consider the codegen approach as an option. I agree with the commenters above (@jmendeth, @FrozenCow) that you guys should evaluate it thoroughly before making a commitment. It's still a work in progress, and an outside evaluation is exactly what I need to get the quality to a high level. I'm sure you'll find bugs and have suggestions for improvements, but I'm confident that after I incorporate your feedback, you'll find the functionality comprehensive, stable, and performant. Whoever is willing to volunteer to beta-test it, I'm available to help. The documentation is a bit sparse, as I said, but you can read the examples linked to above and also the source code of this application I built, http://gitdb.elasticbeanstalk.com/repos/gitdb/refs/heads/master/tree/ , which uses it. Making the decision to adopt codegen won't be easy. The API will break. Yes, a compatibility layer in Javascript can be written -- and it's "easy" to do so, if incredibly tedious. If I may "chime in" with some requirements, I think the goal first and foremost should be to support as large a portion of libgit2 functionality as possible. That supports the widest variety of use-cases, obviously. A good reason not to have 3 separate projects is that libgit2's API is large, and getting decent coverage is a lot of effort. For example @lushzero, here is a pretty terse way to do a `git log`:

```js
git.Repo.open(path.resolve(__dirname, '../.git'), function(error, repo) {
  repo.getMaster(function(error, branch) {
    var history = branch.history();
    history.on('commit', function(commit) {
      console.log('commit ' + commit.sha());
      console.log('Author:', commit.author().name() + ' <' + commit.author().email() + '>');
      console.log('Date:', commit.date());
      console.log('\n ' + commit.message());
    }).start();
  });
});
```

Again, following the same approach: nearly all of libgit2's functionality is directly available in javascript-land, and the API can be as friendly as you want by writing wrappers in javascript. The reason libgit2 doesn't JUST have a simple … That said, your approach has, at least in theory, higher throughput, because it crosses the runtime boundary less often and has less context switching (because fewer items are put in the libuv event queue). On the other hand, your approach doesn't interleave well. It hogs the libuv thread while it looks up all of the commits -- each of which is potentially a random access on disk -- which means that a concurrent application (like a web server) will have extremely high latency variance as disproportionately large jobs clog up your thread pool. In summary, I think libgit2's API strikes the right balance of small, composable tools that provide maximum flexibility. Making it directly available in Javascript (in an OO way) provides the greatest functionality and flexibility. The only downside of the 1:1 mapping is that it means crossing the runtime barrier and context switching more often -- potentially a performance hit. However, I'm confident I can address performance issues as they arise, and my preliminary benchmarks suggest that preloading data/minimizing runtime boundary crossings provides a minimal benefit: …
Exactly. That's the big drawback I see in …
@nkallen Looking at the generated nodegit C files (because they're autogenerated?), I see it uses Boost to escape values, joins them into JSON and passes that to the JS side. It'll be interesting to see the performance of that against a manually written module such as Gitteh. :) I must have some benchmarker around; I'll try to adapt it to nodegit.
The generated code doesn't use boost. See https://github.com/nodegit/nodegit/blob/wip/src/revwalk.cc This uses a custom code generator, so the code can generate any style of C++ …
@samcday thanks! Gitteh did cause me to burst veins sometimes, when the lib didn't install or build on npm, but that's over now. Re: documentation, how did you plan on generating the documentation? Did you have a Cake task somewhere, or a particular preference? Should we move the documentation talk to a different issue, and leave this thread for handover talk?
@samcday I think we look to you, since you're the current maintainer and you actually have an informed opinion of what might work for gitteh. You may not have to make the final decision, but you will probably have to guide the decision, or else the project will languish from indecision.
When it comes to needs and requirements, my needs are the following:
@iamwilhelm I'm pretty much in line with this view; I'm looking for an API much closer to CLI git usage than to the actual plumbing of libgit2. After doing some additional testing, one of my concerns with the codegen approach seems to be true: it opens the door to a lot of memory leaks. Node itself is not awesome in that regard, but I don't see how, without a lot of manual coding on the C++ bindings and without additional attention on the JS side in relation to those, you can keep that under control, and that sort of defeats the purpose of using codegen in the first place. A cursory review with valgrind, which should not be taken as gospel, shows in the log example I noted earlier that nodegit leaks: 1 run: … 50 runs: … versus the sgit approach to the same log function: 1 run: … 50 runs: … It's not a completely fair head-to-head comparison, given that the two approaches are diametrically opposed, but it's the reason I chose the approach I did in sgit. My use case can't survive substantive memory leaking. It's not that the nodegit approach can't be fixed either; it's that it will take as much effort as simply writing quality C++ bindings in the first place and exposing release functions. That I also see as a problem, though: node and JS in general don't have much convention for handling the manual release of resources. I think that will be a problem to the extent that any references and pointers to libgit2 items are brought all the way into the JS side. I can see a few cases where very limited use would make sense, but I don't see how to avoid doing it wholesale in the codegen approach. I'm generally a big fan of "automatic" bindings and was hopeful here, but the data has convinced me. I'm a huge fan of SWIG and such, but I just don't think it makes sense in the node/git case, or at least as it relates to my use of it. If there is a compelling counter-argument, I'm all for hearing it. Libgit2 has roughly 500 functions, of which I could potentially see about 30 needing direct access from the JS side. Maybe even as few as a dozen. So to me my investigation leaves a couple of relatively straightforward technical items that can determine how best to proceed and combine efforts.
I'm not only advocating for sgit; I'd be just as happy to contribute to nodegit. Bottom line: I need a library that can create repos, commit to repos and pull revisions/history (log) ASAP, and I need it not to leak memory. I don't really care how that gets accomplished, and I would prefer a solution that has multiple contributors. On the performance level, I think all the solutions here can be made to perform more than adequately for normal use cases.
Hmm, I wonder what is causing the memory leak, since both sgit and nodegit are leaking. Could you share the test case? It seems that sgit doesn't do asynchronous calls, since it's using v8 types inside the same function where libgit2 is called. If that's indeed the case, I think it's skewed to compare the two projects on memory or performance. Asynchronous calls are vital for node libraries, even though they do cost more cpu/memory. From what I can tell, nodegit frees its libgit2 objects once the garbage collector of v8 cleans up its objects (in the destructors of its ObjectWrappers). I could see this as one potential reason why you're seeing memory leaks: v8 might not delete all objects cleanly if the process is killed before v8 cleans up the garbage. It seems nodegit does do its freeing consistently for all its objectwrappers, but only if a libgit2 free function is defined. This sounds like a good choice: it assures all objects are freed correctly. It might be a good idea to test the memory footprint by walking the tree a great number of times and giving the GC some time to clean up the garbage. I'm also interested in whether valgrind also shows memory leakage when creating a great bunch of objects in node without using any nodegit/sgit/gitteh libraries. Anyway, maybe we should move this memory leak discussion to nodegit's issue tracker. All that being said, I do think it's better to have a minimal layer between node and libgit2. That assures us we can use all the functionality that's in libgit2. The code generation allows keeping the bindings up-to-date with libgit2 more easily and assures everything works somewhat the same (no forgotten frees, for example). We can focus on getting these low-level bindings stable. My use-cases mostly require creating bare repositories, creating commits (based on previous commits/trees), retrieving history, retrieving trees, retrieving files, but all without an index/working directory. That's short-term, and I can use what's in gitteh for now. Long term I see use in merging, pushing/pulling and preferably streaming blobs (for saving and retrieving files directly, instead of buffering things first in memory).
@FrozenCow I would urge you to conduct your own tests; it's pretty straightforward. My examples were using the nodegit one from its API readme (Git Log Emulation) and the log example for node-sgit. The command line I was using for valgrind was "valgrind --leak-check=full --show-reachable=yes node test.js". The nodegit example does not appear to use async (on the JS side), and I was using sgit without async either. The scripts did nothing but run the example in a for loop, and neither printed the results out; only the api calls were made. I ran them against the node-sgit repo, which is roughly 500 commits. As I understand it, the node/v8 GC should not be depended upon to clean up external resources -- see https://code.google.com/p/v8-juice/wiki/CommentaryOnV8#No_GC_guaranty . In the test cases I pointed out, I'm pretty sure the GC wouldn't be able to clean up those items anyway. For node, my point is that you're going to have to write a bunch of C++ cleanup code -- or even a borderline custom "GC", unless you want to push all the release work to the JS end user -- to address the fact that the node GC may never run. That's just as much work for 12 to 30 api calls as just writing those non-automatically (not codegen) to begin with. I see your use cases encompass perhaps 12 or fewer api calls as well. Is anyone else in favor of a non-codegen approach? My mind is made up at this point. I'm happy to let others contribute to nodegit/gitteh/etc, but neither meets my needs today, and I foresee months of work to iron out issues that seem apparent to me. I'd be happy to start adding other people's api needs to sgit if you're willing to test and contribute. If you can point to a roughly working example of your need in C, it can be added to sgit in about 15 minutes.
So, I'm not getting you... the V8 garbage collector will free most JS objects, unless they have been explicitly persisted on the C++ side using a Persistent handle. Other than that, yes, the GC should be expected to remove any V8 objects which have no references.
If you aren't doing those two things correctly, you are leaking. That's it.
You don't need to do that. You can call Node with the `--expose-gc` flag …
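For reference, the flag being described works like this (a minimal sketch):

```js
// Run as: node --expose-gc test.js
// The flag makes V8 expose global.gc(), so a test can force a
// collection instead of waiting for memory pressure.
if (typeof global.gc !== 'function') {
  throw new Error('restart node with --expose-gc');
}
global.gc(); // force a collection right now
```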
I'm also interested, since I looked extensively at node-sgit's code and couldn't find any leaks in the log function. The ideal process would be:
@jmendeth I was using the code you identified, with the exception that I was pointing to an instance of the node-sgit repo instead of the test sandbox repo. I'm not sure valgrind is the absolute best tool to use for memory leak detection in node; something like this is probably better: https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/ but I'm not versed in it. Node has a tendency to leak memory nearly everywhere for many reasons, not least of which is closures. Here is perhaps the most famous example: vvo/node@e138f76 V8 has a fundamental tradeoff in that the GC is NOT called until there is memory pressure. You need to make sure that long-lived code will not unnecessarily "leak" prior to the GC being called, or you end up with a very disadvantageous performance profile I would refer to as "boom and bust". The effect is exaggerated on larger systems as the scale of memory begins to greatly outstrip the speed of the CPUs. I think you are oversimplifying the complexities of memory management a bit, across a stack that encompasses C, C++ and V8 (node); see https://developers.google.com/v8/embed as a start. I would say, as a general observation, that if you are depending on the node GC to clean up after your addon module, you are going to have a bad time.
Yes, similar to Java's GC.
I may have explained myself badly, but I had already read that page several times and I understand it (or at least I think so!) correctly. Yes, V8 values are freed when they have no references to them, and by "references" I mean JS and C++ handles. Local handles are deallocated when the scope exits, so you shouldn't have problems with that. Persistent handles have to be disposed explicitly. So if a single call to my API creates lots of V8 handles but doesn't leak them (i.e. they have no extra references that prevent them from being GC'd), everything should be okay?
Does that mean that I should, for example, have a …
Correction: I explained myself badly again. V8 values are ready to be freed on the next GC round when they have no references pointing at them.
I think I've "lost sight of the forest for the trees", so please forgive a few questions. There is no question that there are a few memory leaks in the generated code -- just as there is no question there are a few bugs -- this is a work in progress! So please bear with me. I have some valgrind data that I'll share below, but I wanted to make sure I understood a few of your criticisms first. Debugging memory leaks isn't super-obvious in node.js -- you definitely need to ensure that you call the garbage collector It's not categorically different to do memory management in generated versus hand-written code. The generated code generates @lushzero you are totally right that garbage collected runtimes have "boom and bust". It's the same in the JVM. However, this is a fundamental "limitation" of node.js -- I don't see how you can get around it. You can call
-- those strings need to be gc'd by v8, and if the user parses those strings with … Another thing I'm unclear about is the 'async' issues you refer to, @lushzero. As you know, there is no 'locking' or 'threading' in javascript -- this only applies to C++ bindings. libgit2 does blocking IO, and thus many functions need to be made async (i.e., pass the baton to the libuv threadpool). They need to be async because node.js is a single-threaded event loop; if you do blocking IO you will cause disastrous performance. Currently, … As a philosophical note, I'm interested in providing the majority of libgit2's functionality. I am building a library, like libgit2 itself, where, as @sam said:
This is not possible with a few dozen functions. This requires several dozen functions. I understand why you think you don't need this, but if you were to actually consider all the things that the git command line tool can do (like rebasing, merging with different strategies, creating tags, manipulating the index, making commits without an index, and on and on) you would realize you need a lot of functionality! The libgit2 people aren't just adding features for no good reason. True, 20% of the libgit2 functions are C-idiomatic, like … I'm also curious why you said, @lushzero:
Isn't it a bit premature to have already made up your mind when you (like me!) are still learning about memory management and concurrency in node.js, when your preliminary valgrind tests may not be accurate for reasons pointed out by @jmendeth, and when you haven't seen whether it takes a long time to fix memory leaks and bugs in generated code or not? Maybe it'll take forever! Maybe it'll take me a day or two. Are you sure you have enough information to make up your mind? In conclusion, I'd like to share some valgrind data. Check out this gist: https://gist.github.com/nkallen/6430565 -- this provides a repeatable valgrind testcase. You'll note, after 20 runs:
This isn't so bad, is it? And as far as I can tell, none of these "leaks" have to do with nodegit. They're internal caches of libgit2 and objects that are internal to node.js. But I did fix a couple of memory leaks today! Check out nodegit/nodegit@7f9f3c4 and nodegit/nodegit@79f1115 I think the thing to take away from these commits is that fixing memory leaks is not only fairly easy, but that it can actually be done systematically: fix it in one place, fix it in many places.
@jmendeth I don't want to sound like an expert on V8 memory management, because I'm definitely not. Also, the terminology here becomes a complete mess with a word like "reference", which means one thing in C, another in C++ and another in V8. My point about "boom and bust" may not apply to node 0.10, as incremental deallocation seems to be employed in some circumstances now as of 0.10, or maybe even 0.8. In short: closures, wrapped objects and failure to release memory allocated inside the addon all make it really hard, and in many cases impossible, to depend on the GC for deallocation. Valgrind is not the be-all and end-all here; it sounds like the tool I pointed to in my last post is much more of a gold standard for node. Valgrind just tells me enough to say that there appear to be some very difficult problems in the codegen approach with regard to memory management.
If you aren't calling the various git_repository_free(), git__free, etc., then definitely yes, those have to be called somewhere. Resources don't deallocate themselves, GC or not. Having a manual free method on the JS side is possible (see the sketch below) but very un-node-like. This is my point about having to create, for lack of a better term, a "virtual GC" in the addon code. I guarantee that will take more time and energy than simply coding a dozen or two API functions in C++. @nkallen Move the for loop to encompass the git.Repo.open call and move the global.gc call to run at the end of each loop, and that will match my numbers. The gc call doesn't make much difference. I tried both the regular and wip branches; wip was better, but not by a whole lot. It's just my opinion, but I would posit that this quote "...fixing memory leaks is not only fairly easy, but it can actually be done systematically: fix it in one place, fix it in many places." will haunt you to the ends of the earth ;) I recall a long time ago saying something like that about a project, and it haunts me still. My mind is made up in that I need to have something passable soonish, and I can easily see chasing edge cases for a long time. Yes, the benefit of codegen is that you "fix it in one place, fix it in many places", but then you also need to start branching out to address edge cases, and pretty soon the code complexity is through the roof. If libgit2 was a v1 final, well-established library and you said "let's use codegen once and patch on top of the result", I don't really see a problem with that. But libgit2 is due for plenty of changes itself, and I fear how that is going to mesh with a codegen approach. You said "... but 80% are useful in javascript land." regarding the libgit2 api calls. I think that is madness; I can see maybe 3-5% being reasonable to expose to JS/Node. Imagine if zlib exposed its 90-odd api calls instead of the two dozen it does in node. That just doesn't fundamentally make sense to me; Node/JS is a very high level language. We can agree to disagree about the level of abstraction. The command line git has 22 options including help; everything else is parameterized. That would be a rough approximation of the maximum API I would be looking at in sgit, mostly flat with parameterized options.
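For concreteness, the manual-release pattern being debated looks something like this from the JS side; `repo.free()` is a hypothetical explicit-release method, not an existing gitteh or nodegit call:

```js
git.Repo.open('/tmp/example/.git', function(error, repo) {
  if (error) throw error;
  // ... use the repository ...
  repo.free(); // hypothetical: release the underlying git_repository
               // now, rather than whenever the V8 GC gets around to it
});
```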
I think you meant another @sam. -Sam
@lushzero Here is the valgrind output for the loop moved outside the opening of the Repo, with gc in the same place, where it should be: https://gist.github.com/nkallen/6431776 As you can see, there is essentially no change. No memory leaks.
As far as I can tell, you have made a mistake in your profile. At this point, you must admit that … Again, I don't really understand your arguments against code-gen. You seem to be bringing irrelevant considerations into the discussion. What do closures have to do with codegen versus non? What does freeing locally malloc'd objects have to do with codegen? Everything code-gen … And you've provided no evidence nor arguments why wrapped objects don't work. They are garbage collected just as every other object in the js world is; they are in fact a standard way to do library bindings in the node.js world. I'm not sure what your "haunting" experience has to do with this case. You seem to be promulgating fear rather than an argument.
@nkallen Here is something substantially similar to the original code, which I did not save. Rerunning it just now produces results very close to what I posted earlier. https://gist.github.com/lushzero/6432272 The exact command line I used last was: valgrind --leak-check=full --show-reachable=yes node --nouse_idle_notification --expose_gc gist.js > log2 2>&1 I feel I've adequately expressed my concerns regarding the codegen approach and why I will be pursuing something else. You seem to be taking this personally, and that is in no way my intention. I'm essentially betting that the headaches from upstream libgit2 changes and chasing a variety of edge cases, including memory management, in the codegen case will be a lot more work than coding a flat set of self-contained API calls manually; that's it. Also, we don't seem to agree at all on what the user-facing API should look like; fair enough. In the simple example I tested, which reflects one small element of my immediate use case, nodegit leaked between 100K and 2.5MB, depending on the branch and other parameters (such as using the non-wip branch readme api example code), in a basic 50-loop cycle of the script. "Leak" might as well be substituted with "used" for my purposes, depending on the resolution or the need to depend on the GC. I may not have been choosing the optimal example code, but I went with what was posted on the nodegit readmes, which is why I chose that example to begin with. I'm not saying that it can't be fixed; I see you did commit several patches for memory management already today. I'm just saying that, among the other reasons I've noted above, I don't see nodegit as a workable solution to the needs I have. Node-gitteh also is not a good fit, as the binding code is a little clumsy, it lacks needed functionality, and it seems not to have been vetted for memory leaks or consumption/use either. Finally, let me say I started yesterday with hope that nodegit would be a good fit for my needs, but my own testing, however flawed, led me elsewhere. If you are insisting that there aren't any memory issues in nodegit as it stands, so be it; everyone can run their own benchmarks and decide for themselves, which is what I suggested in the first post in this thread. Valgrind appears not to be a good tool to hunt and resolve memory leaks in a node application; I used it because it was handy. Most people seem to think https://github.com/lloyd/node-memwatch is the best one (see the sketch below). I feel as if this thread has gone wildly off track, and a good deal of that is my fault, for which I apologize. I would suggest further discussion of memory specifics and benchmarking be moved to a thread on the nodegit issues list or a new thread in this forum.
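For anyone picking this up, node-memwatch usage is along these lines; `runSuspectCodePath` is a placeholder for the code under test:

```js
// Watch the V8 heap itself, rather than the whole process the way
// valgrind does.
var memwatch = require('memwatch');

memwatch.on('leak', function(info) {
  // Emitted when the heap has grown across several consecutive GCs.
  console.error('possible leak:', info);
});

function runSuspectCodePath(done) {
  // ... the code under test goes here ...
  done();
}

var hd = new memwatch.HeapDiff(); // heap snapshot: before
runSuspectCodePath(function() {
  var diff = hd.end();            // heap snapshot: after, with per-type deltas
  console.log(JSON.stringify(diff, null, 2));
});
```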
I apologize for taking this personally. I hadn't had enough to eat and my temper was acting up. FWIW, there is a bug in your benchmark, so you are being misled by your data. You have the following code:

```js
for (i = 0; i < 50; i++) {
  git.Repo.open('/root/lushzero/node-sgit/.git', function(error, repository) {
    ...
  }
  global.gc();
}
```

The place you put the `global.gc()` call is the problem: `Repo.open` is asynchronous, so the gc runs before the callbacks have completed. If you instead register `process.on('exit', global.gc)`, you'd ensure that the garbage collector runs just before valgrind takes its measurements, which is the only way to get accurate data out of valgrind. For future benchmarks, please skip any versions of nodegit other than the wip branch.
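Put together, the corrected benchmark would look roughly like this (assuming node is started with `--expose-gc`):

```js
var git = require('nodegit'); // the wip branch, per the discussion above

// Collect garbage once, just before exit, after all the async
// Repo.open callbacks have had a chance to run and drop their
// references; only then does valgrind see a settled heap.
process.on('exit', global.gc);

for (var i = 0; i < 50; i++) {
  git.Repo.open('/root/lushzero/node-sgit/.git', function(error, repository) {
    // ... walk the log as before ...
  });
}
```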
So I coded a little program that will asynchronously run that … You should totally try it out: https://gist.github.com/jmendeth/6434819 In my test, the repository is only opened one time. Everything … My results: I have tried to put …
You can see the results here: https://gist.github.com/jmendeth/6440106
@jmendeth -- This script is awesome though, thanks! I'd love it if we could continue this discussion over here: nodegit/nodegit#84 . As @lushzero rightly mentioned, at this level of technical detail we are derailing the broader discussion. Some quick observations:
I don't know how to force a full gc, but if you incrementally call gc() as in https://gist.github.com/nkallen/6442106#file-gistfile1-js-L4 , even for 5000 runs there are no nodegit objects left over.
Yes. :(
Hum, I thought that …
@lushzero, before we can use sgit, it first needs to be fully asynchronous. If it is not, weird behavior will crop up, and comparing it to other libraries wouldn't be fair (whether we're comparing memory, cpu or complexity does not matter). Or am I still missing something? Is sgit already async? I was under the impression that v8 objects couldn't be used outside of v8's main thread, so I presumed it was not. Anyway, I've dabbled a bit with nodegit. It seems to work pretty well, although I have found some minor issues (documentation slightly wrong and one usecase not correctly implemented), but all in all I'm pretty impressed. I like that it's just a 1-to-1 mapping of libgit2, which means you can just look up libgit2's documentation if you want details about a particular function. It might also be a good idea to include those mappings in the documentation where possible. I haven't used all the functions that I used for gitteh yet, but for now I'm very positive about the current work. Compared to gitteh it already has more up-to-date documentation, it is easier to find the function you need, and it was easy to find the function that was causing a bug and fix it. This is mostly due to the 1-to-1 mapping and the fact that there is documentation (although still minimal and wrong in places).
No, no, you're right. The functions don't return until all the LibGit2 operations (which are blocking) have been done. There are many potentially dangerous things done in SGit. Just to list an example:

```c
const git_oid *oid;
{...}
oid = git_commit_oid(wcommit);
{...}
git_oid *w_oid = (git_oid*) oid;
```

The first lines save the OID of a commit, which is a pointer to a … LibGit2 gives you … Another bad idea is that …

```js
{
  'OID of the tree of first commit': {
    'author': '...',
    'date': '...',
    'message': '...',
    '...': '...'
  },
  'OID of the tree of second commit': {
    'author': '...',
    'date': '...',
    'message': '...',
    '...': '...'
  },
  '...': '...'
}
```

This doesn't allow iterating over the commits in order; since keys in a hash don't have a concrete order, the original order in which the commits were walked is lost. It's not useful to put the OIDs of the trees as hash keys, because what uniquely identifies a commit object is the commit's OID! I want to list commits, not trees. Also, since two commits can share the same tree (which is what happens when you do a …
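A shape along these lines would avoid both problems, keeping each commit keyed by its own OID and preserving walk order (a sketch, not any library's actual output):

```js
// An array preserves walk order; each entry is identified by the
// commit's own OID, so shared trees are no longer a collision hazard.
var log = [
  { commit: 'OID of first commit',  author: '...', date: '...', message: '...' },
  { commit: 'OID of second commit', author: '...', date: '...', message: '...' }
];
```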
This is the particular problem with sgit: the API is too concrete. We need more abstract APIs like the ones provided in Gitteh or nodegit. If SGit had a …
@jmendeth you raise a valid bug regarding the keys being the tree OIDs rather than the commit OIDs. Sorting is intended to be a parameter, just as it is in the git CLI. The pointer cast you mention is ugly but not an immediate cause for concern, as the function is self contained. It should be cleaned up though. Sgit is definitely 0.1; I'm just starting to clean up some things now that I'm committed to the project. Things like removing the dependency on boost, getting travis builds to work, unit tests, documentation, etc. The code itself has had less than a half-day of coding on it. I think there are two very different approaches here; I absolutely do not want to expose the libgit2 api into JS space. I think something like the sqlite addon is an outstanding example to work towards. It's almost automagical, but also very fast and memory efficient. Having a large list of 1:1 C api mappings is, to me, just not node-like, and I don't like the memory "leaking/consumption" and context switching problems it raises. Try creating hundreds of repos, each with a thousand commits, and look at memory (which is very close to my project's needs) and performance with any of the options here. The only way nodegit could work for me would be to manually call the GC or implement additional cleanup in the runtime of the addon, and there are numerous problems with both approaches. My scenario involves creating lots of repositories, creating lots of commits and then doing data analysis of those. I'm focusing on an API that is very close to the git command line, both for ease of use and ease of documentation. That lends itself to about 20 main api functions and parameterized options arrays. @FrozenCow No doubt async support needs to be implemented. Having said that though, there are a couple of different ways to do it, each with various tradeoffs. I haven't determined what the best approach is yet. I might suggest trying it, as is, so that you can at least see what the impact looks like for your project. I see a lot of comments about the async issues but little data; I need to see more data to make a good decision on how best to implement it. For the scenario I have, the impact of sync is much less than I would have thought for the currently supported operations. If you're on NFS it's going to be potentially catastrophic. I would be very interested in feedback on semi-realworld cases to implement as benchmarks and tests, to see the impact of various tradeoffs in implementing async support. My next steps are to implement object gets, resolve the tree OID issue @jmendeth brought up and implement git_index_add_all; at that point I think I can construct a good benchmark that will inform as to the best means of implementing async.
Look up disk seek times (http://julianhyde.blogspot.com/2010/11/numbers-everyone-should-know.html) and you will have your answer.
@lushzero Doing things async in node shouldn't be a discussion. Say you're creating a http server in node that does things with git. Now, in that same script, you want to clone a repository like the Linux kernel. Cloning a repo of that scale takes a while, say a couple of minutes. If cloning is done synchronously, your server is not responding, since no other Javascript code can run on v8's thread: you're using that thread for cloning. Cloning is just an example that takes a long time, but any operation will suspend the whole node process if done synchronously. That's why the gitteh and nodegit implementations are so much more complex compared to sgit: they need to do everything in a separate thread.
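A runnable sketch of the failure mode; `clone` here is a stand-in that simulates long-running work, the way a real async binding would hand blocking IO to the libuv threadpool:

```js
var http = require('http');

// Stand-in for an async git binding: "clone" just burns five minutes
// on a background timer instead of doing real IO.
function clone(url, dest, callback) {
  setTimeout(function() { callback(null); }, 5 * 60 * 1000);
}

http.createServer(function(req, res) {
  res.end('ok'); // keeps answering while clone() is "running"
}).listen(8080);

// A synchronous cloneSync() would sit in this stack frame for those
// five minutes, and the server above could not answer anything.
clone('git://example.org/linux.git', '/tmp/linux', function(error) {
  console.log('clone finished');
});
```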
@FrozenCow Just to clarify, I'm not debating async versus not async; I'm debating the precise implementation of async: what level of abstraction is employed, what wraps which api function, the number of threads used, how the libgit2 threads interplay, etc. From a cursory review, both gitteh and nodegit seem to use the same approximate abstraction regarding async, and it's very simple. I just started experimenting and benchmarking for sgit async this afternoon.
Just an update for everyone following along: enough people are productively using the nodegit wip branch without segfaults or noticeable memory leaks that I've merged it to master. Furthermore, the documentation is now all up to date -- it correctly merges the "raw" and "convenience" layers together. Feel free to use master as the npm artifact for now. I will publish a …
I updated sgit to operate asynchronously and better reflect git CLI naming conventions. Best of luck to you all.
Just wondering, 8 months later: I'm about to start a project that needs git, and I'm having a rough time figuring out which of these projects to use. Has the view on that changed since 8 months back?
@walterheck this project is still unmaintained at this point. nodegit might be a better approach.
Had a feeling, yeah; already started looking at nodegit here: …
I had an epiphany this week: I've been selfish when it comes to Gitteh.
I've acknowledged several times in issues / emails that I don't really have enough time for Gitteh. But then, like most developers, I'm presented with a problem of some kind and want to solve it. Then suddenly Gitteh seems interesting again, and I feel all inspired and start making promises to do things that I generally don't end up doing.
Today, I think node-gitteh holds the most promise out of the git libraries for Node. Here's why:
I think the design choices made for gitteh are sound; there's been some quibbling regarding some implementation choices (such as CS), but I think these can be addressed in the short term easily enough.
SO. What I need is for someone to step up to the plate. I need someone who wants to become the primary maintainer for gitteh.
Please mention your interest here, and we'll hash it out. The person I pick to be the primary maintainer will ideally be someone who has shown a healthy amount of interest in gitteh over the past few months and has some Github projects demonstrating an understanding of Node and C++ (read: the best candidate would be someone who's written a Node native module before). Once I find the right person, I will ensure they are given contributor access to this repository.