this post was submitted on 20 Dec 2023
Nicely written! Going into my bookmarks.
I'm one of those who was unfortunate enough to use SVN. It was my first version control system too. Committing took forever, because it had to fetch data from the server to see if the 'trunk' (the branch) had been updated (I see why Torvalds hated it). Even committing caused conflicts sometimes. People also used to avoid branching, because merging back was hell! There were practically no merges that didn't end in merge conflicts. The biggest advance in merge workflows since those days was the introduction of 3-way merges. Practically all modern VCSs - including Git and Mercurial - use 3-way merges. 3-way merges cut down merge conflicts by a huge margin overnight. Git uses 3-way merges even for seemingly unrelated tasks like revert, cherry-pick, rebase, etc. (and it works well for them. We barely even notice it!). Surprisingly though, 3-way merges have been around since the 1970s (the diff3 program). Why CVS and SVN didn't use it is beyond me.
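For anyone who hasn't seen why the third input matters, here is a toy sketch of the 3-way idea. It is not how Git or diff3 are implemented: it assumes all three versions have the same number of lines (in-place edits only, no insertions or deletions), and the name `merge3` is made up for illustration.

```python
def merge3(base, ours, theirs):
    """Toy three-way merge over equal-length lists of lines.

    Assumption: only in-place line edits, so line i in each version
    corresponds to line i in the others. Real tools align lines first.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy only handles in-place line edits")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)   # both sides agree (changed or not)
        elif o == b:
            merged.append(t)   # only their side changed this line
        elif t == b:
            merged.append(o)   # only our side changed this line
        else:
            conflicts.append(i)  # both changed it, differently
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
    return merged, conflicts


base = ["a", "b", "c"]
ours = ["A", "b", "c"]     # we edited line 0
theirs = ["a", "b", "C"]   # they edited line 2
print(merge3(base, ours, theirs))
```

With only `ours` and `theirs` to compare, lines 0 and 2 would both look like disagreements. The common ancestor is what tells you which side actually changed each line, so both edits can be kept without a conflict.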
Anyway, my next VCS was Bazaar. It's still around as Breezy. It is a distributed VCS like Git, but was more similar to SVN in its interface. It was fun - but I moved on to Mercurial before settling on Git. Honestly, Mercurial was the sweetest VCS I ever tried. Git's interface really shows that it was created by kernel developers for kernel developers (more on this later). Mercurial's interface, on the other hand, is well thought out and easy to figure out. This is surprising because both Git and Mercurial share a similar model of revisions. Mercurial was born a few days after Git. It even stood a chance of winning the race to become the dominant VCS. But Mercurial lost kernel developers' mindshare due to Python - it simply wasn't as fast as Git. Then GitHub happened and the rest is history.
I was a relatively late adopter of Git. But the signs of what you say are still there in Git. Git is still unapologetically based on the idea of keeping versioned folders and patches. Git actually has a dual personality based on these! While it treats each commit as a snapshot of the tree (similar to versioned folders), many operations are actually based on patches (3-way merges, actually). That includes merge, rebase, revert, cherry-pick, etc. There's no getting around this fact. IMHO, the lack of understanding of this fact is what makes Git confusing for beginners.
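A rough sketch of that dual personality, under loud assumptions: snapshots are modelled as plain dicts of path to content, merging is done at whole-file granularity (real Git merges inside files too), and every function name here is hypothetical. The one thing it is meant to show is which snapshot plays the role of the merge base: for cherry-pick it is the picked commit's parent, for revert it is the commit itself.

```python
def merge3(base, ours, theirs):
    """Whole-file three-way merge of snapshots (dicts of path -> content)."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            pick = o            # they did not touch it, or both agree
        elif o == b:
            pick = t            # only they touched it
        else:
            raise ValueError(f"conflict in {path}")
        if pick is not None:    # None means the file is absent/deleted
            merged[path] = pick
    return merged


def cherry_pick(head, commit, parent):
    # Replay "parent -> commit" on top of head: base is the commit's parent.
    return merge3(base=parent, ours=head, theirs=commit)


def revert(head, commit, parent):
    # Replay "commit -> parent" on top of head: base is the commit itself.
    return merge3(base=commit, ours=head, theirs=parent)


parent = {"a.txt": "1", "b.txt": "x"}
commit = {"a.txt": "2", "b.txt": "x"}                 # the commit edits a.txt
head = {"a.txt": "1", "b.txt": "y", "c.txt": "new"}   # unrelated work on head

picked = cherry_pick(head, commit, parent)
print(picked)
print(revert(picked, commit, parent) == head)
```

The commits stay snapshots the whole time; the "patch" only exists implicitly as the difference between two of them.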
Perhaps this is nowhere more apparent than in the case of quilt. Quilt is software used to manage a 'stack of patches'. It gives you the ability to absorb changes to source code into a patch and apply or remove a set of patches. This is as close as you can get to a VCS without being a VCS. Kernel devs still use quilt sometimes and exchange quilt patch stacks. Git even has a command for importing quilt patch stacks - git-quiltimport. There are even tools that integrate patch stacks into Git - like stgit. If you haven't tried it yet, you should. It's hard to predict if you'll like it. But if you do, it becomes a powerful tool in your arsenal. It's like rebase on steroids. (aside: This functionality is built into Mercurial).
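To make the 'stack of patches' idea concrete, here is a minimal sketch. It is not quilt's file format or its implementation: a patch is modelled as a single text replacement, and the `Patch` and `PatchStack` classes are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical patch: replace one occurrence of `old` with `new`."""
    name: str
    old: str
    new: str

    def apply(self, text):
        if self.old not in text:
            raise ValueError(f"{self.name} does not apply")
        return text.replace(self.old, self.new, 1)

    def revert(self, text):
        if self.new not in text:
            raise ValueError(f"{self.name} cannot be reverted")
        return text.replace(self.new, self.old, 1)


@dataclass
class PatchStack:
    """An ordered series of patches, of which the first `applied` are live."""
    text: str
    series: list
    applied: int = 0

    def push(self):
        if self.applied == len(self.series):
            raise IndexError("no more patches to push")
        self.text = self.series[self.applied].apply(self.text)
        self.applied += 1

    def pop(self):
        if self.applied == 0:
            raise IndexError("no patches applied")
        self.text = self.series[self.applied - 1].revert(self.text)
        self.applied -= 1


stack = PatchStack(
    "hello world\n",
    [Patch("greet", "hello", "goodbye"), Patch("shout", "world", "WORLD")],
)
stack.push()
stack.push()
print(stack.text)   # both patches applied
stack.pop()
print(stack.text)   # top patch removed again
```

Pushing and popping walks up and down the series, which is roughly the workflow the real tools give you, just with proper diffs instead of string replacement.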
I recently got into packaging for Linux. Trust me - there's nothing as easy or convenient as dealing with patches. It's closer to plain vanilla files than any VCS ever was.
As I understand, the biggest problem was that not everyone was given equal access. Most significantly, many developers didn't have access to the repo metadata. The metadata that was necessary to perform things like blame, bisect or even diffs.
That sounds accurate. To add more context, it was Andrew Tridgell who 'reverse engineered' it. He became the target of Torvalds's ire due to this. He did reveal his 'reverse engineering' later: he telnetted into the server and typed 'help'.
I thought I should mention Junio Hamano. He was probably the second biggest contributor to Git back then. Torvalds practically handed over the development of Git to him a few months after its inception. Hamano has been the lead maintainer ever since. There is one aspect of his leadership that I really like. Git is by no means a simple or easy tool. There has been ample criticism of it. Yet, the Git team has tried sincerely to address it without hostility. Some of the earlier warts were satisfactorily resolved in later versions (for example, restore and switch are way nicer than checkout).
Same. I guess I'm an old guy, because I literally started with RCS, then the big step up that was CVS, and then used CVS for quite some time while it was the standard. SVN was always ass. I can't even really put my finger on what was so bad about it; I just remember it being an unpleasant experience, for all it was supposed to "fix" the difficulties with CVS. I much preferred CVS. Perforce was fine, and used basically the exact same model as SVN just with some polish, so I think the issue was the performance and interface.
Also, my god, you gave me flashbacks to the days when a merge conflict would dump the details of the conflict into your source file and you'd have to go in and clean it up manually in the editor. I'd forgotten about that. It wasn't pleasant.
Yeah, absolutely. I was going to talk about this a little but my thing was already long. The two most notable features of git are its high performance and its incredibly cryptic interface, and knowing the history makes it make a lot of sense why that is.
Yeah. I was present on the linux-kernel mailing list while all this was going on, purely as a fanboy, and I remember Linus's fanatical attention to performance as a key consideration at every stage. I actually remember there was some level of skepticism about the philosophy of "just download the whole history from the beginning of time to your local machine if you want to do anything" -- like the time and space requirements in order to do that probably wouldn't be feasible for a massive source tree with a long history. Now that it's reality, it doesn't seem weird, but at the time it seemed like a pretty outlandish approach, because with the VCS technologies that existed at the time it would have been murder. But, the kernel developers are not lacking in engineering capabilities, and clean design and several rounds of optimization to figure out clever ways to tighten things up made it work fine, and now it's normal.
That's cool. Yeah, I'll look into it; I have no need of it for any real work I'm doing right now but it sounds like a good tool to be familiar with.
I still remember the days of big changes to the kernel being sent to the mailing list as massive series of organized patchsets (like 20 or more messages with each one having a pretty nontrivial patchset to implement some piece of the change), with each patch set as a conceptually distinct change, so you could review them one at a time and at the end understand the whole huge change from start to finish and apply it to your tree if you wanted to. Stuff like that was why I read the mailing list; I just remember being in awe of the type of engineering chops and the diligence applied to everyone working together that was on display.
Agreed. I was a little critical-sounding of diff and patch as a system, but honestly patches are great; there's a reason they used that system for so long.
Sounds right. It sounds like your memory on it is better than mine, but I remember there being some sort of "export" where people who didn't want to use bk could look at the kernel source tree as a linear sequence of commits (i.e. not really making it clear what had happened if someone merged together two sequences of commits that had been developed separately for a while). It wasn't good enough to do necessary work, more just a stopgap if someone needed to check out the current development head or something, and that's it.
😆
I'll update my comment to reflect this history, since I didn't remember this level of detail.