Folks come to me to ask for help with Git.
Sometimes they can’t guess what git subcommand they need. (Git 2.37 has 169.)
Sometimes they know what subcommand they want, but don’t know what flags to use. (git log now has 149 flags and options.)
Sometimes they issued a command, and Git didn’t do what they expected 😱
Maybe you’ve had one of those problems yourself. Typically, their problem isn’t Git.
They usually want Git to do something that it can do, easily. They’re just asking for it the wrong way 🤯
Usually these folks just have the wrong mental model of how Git works. They’ve learned a bunch of commands, drawn mental cartoons of how Git works, and then typed in a command based on that model.
They’re frustrated because they’ve built the wrong mental model.
The questions sometimes seem like a student saying,
“Dr. Haemer, Dr. Haemer! I understood everything
you said, except the difference between a loop and a CPU.”
I almost never answer with, “Oh, you just need to add the flag --foobarmumble,” or “You need to use git frabitz instead of git zazzle,” or “Git just can’t do that.”
Instead, it’s, “Aha. You just need to understand how Git works … the big picture. Let’s start there.”
Q: “Wait. You’re telling me that the best way for me to solve my Git problems is to understand what it’s doing?”
A: “Yup.”
Git’s magic isn’t in pieces hidden from view; its magic is its simple, open design.
You can watch it work, under the hood, yourself. And should. I suspect making it easy to watch also made it easier for Linus Torvalds to debug.
I’ll show you.
Everything from here on out will be on the command line. GUIs for Git are just layered on top of shell-level equivalents.
Working on the command line removes an obfuscating layer.
Oh, and I’m using “Git” to mean the whole, distributed, version-control system, and git to mean the command.
I’m also going to assume you’re using Linux or something like it:
Unix, OS/X, Penguin, BSD, …
Linus designed and wrote both Linux and Git. Though Git is now pretty portable, guessing which OS you can expect it to make the most sense on is not much of a challenge.
Watching Git at Work
Begin by making a directory to work in:
$ mkdir /tmp/scratch
$ cd /tmp/scratch
$ ls -a
That’s empty all right. Now put it under Git control.
$ git init
$ ls # no files
$ ls -a # ah! a hidden directory
The .git directory is where Git will stuff everything it knows about.
You haven’t even created any files of your own, much less committed any. What did that git init command put into .git?
The most useful tool to explore this is the tree command, which lays out directory hierarchies for you to see.
If your operating system didn’t supply tree by default, stop for a second to install it with your favorite package manager: apt, brew, … whatever.
$ tree .git
Now you’re cooking.
Spend a few minutes looking through everything that’s there.
It’s mostly empty directories, plus a few files that are obviously boilerplate and templates.
Nothing useful.
Git knows it’s there, though. Try these:
$ git status # before removing .git
$ rm -rf .git
$ git status # after removing it
Ask Git a question, and it looks for answers in .git.
Want to wipe out a Git repo and start over? Just remove .git.
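Here’s that same experiment as a small script, if you’d rather see the answer than read an error message. It’s only a sketch: it works in a throwaway directory from mktemp (my choice, so /tmp/scratch stays untouched) and uses git rev-parse --git-dir, which prints the repository directory Git found.

```shell
# Work in a throwaway directory so nothing real is disturbed.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Ask Git where it is reading its repository data from.
git_dir=$(git rev-parse --git-dir)
echo "Git is reading from: $git_dir"

# Remove .git, then ask the same question again.
rm -rf .git
if git rev-parse --git-dir >/dev/null 2>&1; then
    still_a_repo=yes
else
    still_a_repo=no
fi
echo "Still a repository? $still_a_repo"
```

With .git gone, Git has nowhere to look, so the second question fails.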
Okay, now put it back.
$ git init
$ tree .git
That was easy. Next, make an empty file.
$ touch my-empty-file
Does that do anything to .git?
$ tree .git
Doesn’t look like it. What’s Git think?
$ git status
Now it sees a file outside of .git, but there’s no information about that file inside of .git. And that’s what “untracked” means.
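You can check both halves of that claim from a script. This is a sketch in a fresh throwaway directory (the mktemp directory is my assumption, not part of the walkthrough). It uses git status --porcelain, the script-friendly form of git status, where a leading “??” marks a path Git sees but isn’t tracking.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file

# Half one: Git sees the file, and flags it as untracked with "??".
status_line=$(git status --porcelain)
echo "$status_line"

# Half two: nothing about it has been stored under .git/objects yet.
object_count=$(find .git/objects -type f | wc -l)
echo "objects stored: $object_count"
```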
What would change if it were tracked? The status output tells you to use git add to track it, so try that. Why not? You know that if something goes wrong, you can just start over with rm -rf .git.
$ git add my-empty-file
$ tree .git
Oho!
There’s a new file here.
.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
And, since that’s in .git, Git sees it, too.
$ git status
Much of the work you do with Git adds and queries objects in .git/objects, and that’s where I’ll be pointing out things from now on.
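If your new object has exactly the same name as mine, that’s no coincidence. Git names an object after a hash of its contents, and an empty file is empty everywhere. Here’s a quick sketch with git hash-object, which computes the name a blob would get without storing anything. It assumes Git’s default SHA-1 object format; the “hello” content is just an illustration.

```shell
# The name Git gives a blob with no content at all.
empty_blob=$(printf '' | git hash-object --stdin)
echo "empty blob: $empty_blob"

# Any other content gets a different name.
other_blob=$(printf 'hello\n' | git hash-object --stdin)
echo "other blob: $other_blob"
```

The first name should match the file you just saw appear under .git/objects, once you drop the slash.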
Go ahead and commit it, and watch what changes.
$ git commit -m"My first commit: an empty file."
$ git status
$ tree .git/objects
The original .git/objects/e6/9de29* is still there, but you’ve created two new objects, so now you have three.
Two of the three are Git’s version of a file and of a directory.
Linus calls objects that hold files “blobs,” and objects that hold directories, “trees.”
– e6/9de29*: a blob for the empty file
– bb/216ad*: a tree for the directory containing that empty file
To stave off a potential mental mix-up, take a short pause to think that through carefully.
You’re juggling *two* filesystems here. One is your OS’s filesystem, which has a directory called .git/objects/, with subdirectories and files.
The second is Git’s filesystem, which stores all its pieces as objects in that first filesystem. You’re going to explore this second filesystem.
Notice, especially, that the blob object for the empty file isn’t stored under the tree object in your OS’s filesystem. That blob is “in” that tree only in Git’s view of the world, and you’re about to see how that’s done.
Calling the files and directories for Git’s filesystems “blobs” and “trees” will help you keep straight which of the two filesystems you’re talking about.
Trees
In the Unix filesystem, you look at a directory’s content with the ls command. In Linus’s Git filesystem, you can use git ls-tree.
Try that now.
$ git ls-tree bb216ad97a6d296d1feedbc3e097343ce93f8f43
Git sees that this tree has one blob (file), called `my-empty-file`, that it has permissions 100644, and that the blob is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
Linus sticks that blob in e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 so it doesn’t have to store every object in the same directory in your OS’s filesystem. That’s just an implementation detail.
If you’ve already guessed that the file bb/216ad* is where Git put the tree bb216ad97a6d296d1feedbc3e097343ce93f8f43, you’ve guessed correctly.
You’re already building a new, detailed, and *correct* mental model of what Git’s doing.
But what’s that third file?
Commits
To take your model to the next level, first take a peek inside those files.
$ cat .git/objects/e6/9de29*
$ cat .git/objects/bb/216ad*
Ugh. They’re encoded in some weird way, so cat isn’t useful.
Fortunately, Linus provides git cat-file -p, which decodes and shows the contents of objects in his filesystem.
$ git cat-file -p e69de29
Well, that doesn’t seem to do anything, right? Oh. Wait. That blob was the empty file. There are no contents to show. Duh.
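If an empty answer feels unsatisfying, you can ask about the object rather than its contents. Here’s a minimal sketch, rebuilt in a throwaway directory (my assumption) so it runs on its own. git cat-file -t reports an object’s type, and git cat-file -s reports its size in bytes.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file
git add my-empty-file

# What kind of object is it, and how big is its content?
blob_type=$(git cat-file -t e69de29)
blob_size=$(git cat-file -s e69de29)
echo "type: $blob_type, size: $blob_size bytes"
```

A blob of size zero: the object is really there, it just has nothing in it.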
I’ll pause to point out a piece of syntax: There’s no slash in that name.
It’s e69de29, not e6/9de29. Linus spread Git’s objects out across subdirectories, but Git still thinks of them without the slashes.
Again: those subdirectories are just an implementation detail.
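The mapping from a Git name to a file in your OS’s filesystem is simple enough to sketch in a few lines of shell. This assumes the object is still stored as its own file, as everything is in this small example, and it works in a throwaway directory of my choosing.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file
git add my-empty-file

name=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# The first two characters become the subdirectory; the rest is the file name.
subdir=$(printf '%s' "$name" | cut -c1-2)
rest=$(printf '%s' "$name" | cut -c3-)
object_path=".git/objects/$subdir/$rest"
echo "$object_path"

# The file really is there.
ls -l "$object_path"
```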
Luckily, Git also lets you abbreviate names to their first few characters, as long as the abbreviation is unambiguous. You can type git cat-file -p e69de29, not
git cat-file -p e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
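To see the abbreviation being expanded, you can hand it to git rev-parse, which prints the full name of whatever object an abbreviation identifies. This is a sketch in a throwaway directory (my assumption); rev-parse fails if the abbreviation matches nothing or matches more than one object.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file
git add my-empty-file

# Expand the short name into the full one.
full_name=$(git rev-parse e69de29)
echo "$full_name"
```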
Since there was nothing in the blob, let’s look at the tree.
$ git cat-file -p bb216ad
Now you’re cooking! That’s the tree all right. So, what’s the third object? Might as well peek at it.
$ git cat-file -p 659e774
(Your third object will be named something different from mine, but at this point, I bet you can work out what to type to see yours.)
You’re looking at the commit itself.
Notice what’s in it: your name, your email address, and your commit message. Here’s another way to see the same information:
$ git log
So, git log looks in .git for your commit object, reads the information, and formats it in a pretty way. And now you see what that leading line of the log entry, “commit …”, means. It’s Git’s name for that commit object.
You can also see that the first line of the commit object says,
tree bb216ad97a6d296d1feedbc3e097343ce93f8f43
So now you see how Git is connecting up all the pieces.
– A commit object keeps track of meta-information about the commit and points at the tree being committed.
– A tree keeps track of the blobs in it, and their human names.
– A blob stores the contents of a file.
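You can follow those pointers by hand, one hop at a time. The sketch below rebuilds the example in a throwaway directory and walks from the commit to its tree to the blob. The name and email passed with -c are placeholders, so the commit works even without a configured identity, and HEAD is Git’s name for the current commit.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file
git add my-empty-file
# Placeholder identity, only for this throwaway repository.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "My first commit: an empty file."

# Hop 1: the commit object itself.
commit=$(git rev-parse HEAD)

# Hop 2: the first line of the commit names its tree.
tree=$(git cat-file -p "$commit" | sed -n '1s/^tree //p')

# Hop 3: the tree lists its entries; the third column is the blob's name.
blob=$(git ls-tree "$tree" | awk '{print $3}')

echo "commit $commit"
echo "tree   $tree"
echo "blob   $blob"
```

Your commit name will differ from mine, because the commit records who made it and when. The blob name should be the familiar one.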
What You Now Know
– Git stores all its information in the directory .git/
– git init creates that directory
– Linus implements a user-level filesystem with the files under .git/objects/.
– Each Git object is stored in a subdirectory of .git/objects/. The subdirectory is the first two characters of the name. This is an implementation detail to keep Git from having to store every object in the same directory in your OS’s filesystem.
The name of the object in your OS’s file .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which Git lets you abbreviate as e69de29, thank goodness.
– There are three important flavors of objects: blobs, trees, and commits.
– Blobs are files, trees are directories, and commits are, um, commits.
– The objects are encoded, but you can use git cat-file -p to look inside them.
– You can use git cat-file -p to see the contents of a blob.
– You can use git cat-file -p to see the contents of a tree: a list of objects, with their types, permissions, and human names. git ls-tree serves up the same information in a slightly nicer format.
– You can use git cat-file -p to see the contents of a commit: the commit message, timestamp, committer, and the tree you committed. git log will format that information in loads of different, friendly ways.
– All these are bound together: commits point at trees, trees point at blobs and other trees.
– The Git filesystem is completely user-visible. You can see the whole thing. The Linux filesystem implements directories and files in a similar way (no surprise), but the implementation details are hidden. You can get a directory listing with ls, but you can’t actually open up a directory and look at its guts with cat.
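As a recap, you can ask Git to list every object it holds along with each one’s flavor. The sketch below rebuilds the one-commit example in a throwaway directory (placeholder identity again) and uses git cat-file with --batch-all-objects and --batch-check, which prints one line per object: name, type, and size.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch my-empty-file
git add my-empty-file
# Placeholder identity, only for this throwaway repository.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "My first commit: an empty file."

# One line per object: <name> <type> <size>
git cat-file --batch-all-objects --batch-check

# Just the types, sorted, on one line.
types=$(git cat-file --batch-all-objects --batch-check \
    | awk '{print $2}' | sort | tr '\n' ' ')
echo "types: $types"
```

Three objects, one of each flavor.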
There’s a lot more here to explore:
– tags & branches,
– SHA1s and object encoding,
– configuration files,
– remotes with their fetches, pushes & pulls,
– merges & rebases,
– indexes & packfiles,
– the git command and its subcommands,
…
Yes, Git is big. But its design is also simple.
Now you know you can watch and understand how all these work. You can see into Git’s secrets for yourself.
If you want some guidance along the way, I can recommend some resources:
– One is Ian Miell’s Learn Git the Hard Way, available on Kindle for $10.
I think it does a good job of teaching Git by teaching how it works.
– A second is Git Under the Hood, a set of videos that I did for Pearson, and available either directly, or through O’Reilly.
– You can even see what Git looked like at the beginning and every step of its evolution: Linus Torvalds started writing Git on April 3, 2005, released it three days later, and made it self-hosting the next day. You can clone the source with git clone https://github.com/git/git, and then check out the very first version, or any version after that 💡