What is in that .git directory?
Well, I think most of you reading this blog use git
more or less on a daily basis, but have you ever looked into what is in the .git
folder that git creates? Let's explore it together and understand what is going on in there.
This is a blog version of a talk that I recently gave. Unfortunately I can't link to the recording :(.
git
at a basic level is just a bunch of text files linked to each other by filenames.
Let's start init #
As you all know, we start our git journey with a git init
. This gives the message that we all are probably used to by now, especially if you start and abandon a lot of side projects.
Initialized empty Git repository in /home/meain/dev/src/git-talk/.git/
Let's look at what is in the .git
repo as of now.
$ tree .git
.git
├── config
├── HEAD
├── hooks
│ └── prepare-commit-msg.msample
├── objects
│ ├── info
│ └── pack
└── refs
├── heads
└── tags
It seems to create a bunch of files and folders. What are all these? Let's go over them one by one.
config
is a text file that contains your git configuration for the current repo. If you look into it, you would see some basic settings for you repo like the author, filemode etc.HEAD
contains the current head of the repo. Depending on what you have set your "default" branch to be, it will berefs/heads/master
orrefs/heads/main
or whatever else you had set to. As you might have guessed, this is pointing torefs/heads
folder that you can see below, and into a file calledmaster
which does not exist as of now. This filemaster
will only show up after you make your first commit.hooks
contain any scripts that can be run before/after git does anything. I have written another blog here which goes a bit more into how git hooks work.objects
contains the git objects, ie the data about the files, commits etc in your repo. We will go in depth into this in this blog.refs
as we previously mentioned, stores references(pointers).refs/heads
contains pointers to branches andrefs/tags
contains pointers to tags. We will go into what these files look like soon.
Now we add a file #
Now that you have an idea of what the initial set of files in .git
is, let's perform the first action that adds something into the .git
directory. Let's create a file and add it(we are not committing it yet).
echo 'meain.io' > file
git add file
This does the following:
--- init 2024-07-02 15:14:00.584674816 +0530
+++ add 2023-07-02 15:13:53.869525054 +0530
@@ -3,7 +3,10 @@
├── HEAD
├── hooks
│ └── prepare-commit-msg.msample
+├── index
├── objects
+│ ├── 4c
+│ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de
│ ├── info
│ └── pack
└── refs
This causes two main changes as you can see. The first thing it modifies is the index
file. The index is what stores the information about what is currently staged. This is used to signify that the file named file
has been added to the index.
The second and more important change is the addition of a new folder objects/4c
and a file 5b58f323d7b459664b5d3fb9587048bb0296de
inside it.
But what is in that file? #
Here is where we get into the details of how git
stores things. Let's start with looking at what kind of data is present in that.
$ file .git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de
.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de: zlib compressed data
Hmm, but what is the zlib compressed data?
$ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de
blob 9\0meain.io
Looks like it contains the type, size and data of the file named file
that we did a git add
on. In this case, the data says that it is a blob
of size 9
and the content is meain.io
.
OK, but what is with that filename? #
Well, good question. It comes from the sha1 of the content. If you take the zlib compressed data and pipe it through sha1sum
, you get the filename.
$ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de|sha1sum
4c5b58f323d7b459664b5d3fb9587048bb0296de
git
takes the sha1 of the content to be written, takes the first two characters, in this case 4c
, creates a folder and then uses the rest of it as the filename. git
creates folders from the first two chars to make sure we don't have too many files under the single objects
folder.
Say hello to git cat-file
#
In fact, since this is one of the more important parts of git, git also has a plumbing command to view the content of an object. You can use git cat-file
with -t
for type, -s
for size and -p
for content.
$ git cat-file -t 4c5b58f323d7b459664b5d3fb9587048bb0296de
blob
$ git cat-file -s 4c5b58f323d7b459664b5d3fb9587048bb0296de
9
$ git cat-file -p 4c5b58f323d7b459664b5d3fb9587048bb0296de
meain.io
Let's commit #
Now that we know what changes when we add a file, let's take this to the next level by committing.
$ git commit -m 'Initial commit'
[master (root-commit) 4c201df] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 file
Here is what changed:
--- init 2024-07-02 15:14:00.584674816 +0530
+++ commit 2023-07-02 15:33:28.536144046 +0530
@@ -1,11 +1,25 @@
.git
+├── COMMIT_EDITMSG
├── config
├── HEAD
├── hooks
│ └── prepare-commit-msg.msample
├── index
+├── logs
+│ ├── HEAD
+│ └── refs
+│ └── heads
+│ └── master
├── objects
+│ ├── 3c
+│ │ └── 201df6a1c4d4c87177e30e93be1df8bfe2fe19
│ ├── 4c
│ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de
+│ ├── 62
+│ │ └── 901ec0eca9faceb8fe0a9870b9b6cde75a9545
│ ├── info
│ └── pack
└── refs
├── heads
+ │ └── master
└── tags
Woah, looks like there are a bunch of changes. Let's walk through them one by one. The first one is a new file COMMIT_EDITMSG
. As the name might suggest, it contains the (last) commit message.
If you where to run the git commit
command without the -m
flag, the way git
gets a commit message is to open an editor with the COMMIT_EDITMSG
file to let the user edit the commit message and once the user has updated it and exited the editor, git
uses the contents of the file as the commit message.
It also added a whole new folder logs
. This is a way for git to log all the commits changes in a repo. You will be able to see the changes in commits for all refs and HEAD
here.
The object
dir also got some changes, but I want you to first look into the refs/heads
directory where we now have the file master
. This as you might have guessed is the reference to the master
branch. Let's see what is in it.
$ cat refs/heads/master
3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
Looks like it is pointing to one of the new objects. We know how to look at objects, let's do that.
$ git cat-file -t 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
commit
$ git cat-file -p 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
tree 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
author Abin Simon <mail@meain.io> 1688292123 +0530
committer Abin Simon <mail@meain.io> 1688292123 +0530
Initial commit
You could have also done
git cat-file -t refs/heads/master
Well, looks like that is new kind of object. This seems to be a commit
object. The contents of the commit
object tells us that it contains a tree
object with the hash 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
, which looks like the other object that got added when we did the commit. The commit
object also has the information about who the author and committer is, which in this case is both me. Lastly is also shows what the commit message for this commit was.
Now let's look at what the tree
object contains.
$ git cat-file -t 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
tree
$ git cat-file -p 62901ec0eca9faceb8fe0a9870b9b6cde75a9545
100644 blob 4c5b58f323d7b459664b5d3fb9587048bb0296de file
A tree
object will contain the state of working directory in the form of other tree and blob objects. In this case, since we just have a single file named file
, you will just see a single object. If you see, the file is pointing to the original object that got added when we did a git add file
.
Here is what a tree for a more mature repo look like. More tree
objects are used inside tree
object linked from the commit
object to denote folders.
$ git cat-file -p 2e5e84c3ee1f7e4cb3f709ff5ca0ddfc259a8d04
100644 blob 3cf56579491f151d82b384c211cf1971c300fbf8 .dockerignore
100644 blob 02c348c202dd41f90e66cfeb36ebbd928677cff6 .gitattributes
040000 tree ab2ba080c4c3e4f2bc643ae29d5040f85aca2551 .github
100644 blob bdda0724b18c16e69b800e5e887ed2a8a210c936 .gitignore
100644 blob 3a592bc0200af2fd5e3e9d2790038845f3a5cf9b CHANGELOG.md
100644 blob 71a7a8c5aacbcaccf56740ce16a6c5544783d095 CODE_OF_CONDUCT.md
100644 blob f433b1a53f5b830a205fd2df78e2b34974656c7b LICENSE
100644 blob 413072d502db332006536e1af3fad0dce570e727 README.md
100644 blob 1dd7ed99019efd6d872d5f6764115a86b5121ae9 SECURITY.md
040000 tree 918756f1a4e5d648ae273801359c440c951555f9 build
040000 tree 219a6e58af53f2e53b14b710a2dd8cbe9fea15f5 design
040000 tree 5810c119dd4d9a1c033c38c12fae781aeffeafc1 docker
040000 tree f09c5708676cdca6562f10e1f36c9cfd7ee45e07 src
040000 tree e6e1595f412599d0627a9e634007fcb2e32b62e5 website
Making a change #
Let's make a change to the file and see how that works.
$ echo 'blog.meain.io' > file
$ git commit -am 'Use blog link'
[master 68ed5aa] Use blog link
1 file changed, 1 insertion(+), 1 deletion(-)
Here is what it does:
--- commit 2024-07-02 15:33:28.536144046 +0530
+++ update 2023-07-02 15:47:20.841154907 +0530
@@ -17,6 +17,12 @@
│ │ └── 5b58f323d7b459664b5d3fb9587048bb0296de
│ ├── 62
│ │ └── 901ec0eca9faceb8fe0a9870b9b6cde75a9545
+│ ├── 67
+│ │ └── ed5aa2372445cf2249d85573ade1c0cbb312b1
+│ ├── 8a
+│ │ └── b377e2f9acd9eaca12e750a7d3cb345065049e
+│ ├── e5
+│ │ └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
│ ├── info
│ └── pack
└── refs
Well, we added 3 new objects. One of them would be a blob
object with the new contents of the file, one would be a tree
object and the last one will be a commit
object.
Let's trace them again from the HEAD
or refs/heads/master
.
$ git cat-file -p refs/heads/master
tree 9ab377e2f9acd9eaca12e750a7d3cb345065049e
parent 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
author Abin Simon <mail@meain.io> 1688292975 +0530
committer Abin Simon <mail@meain.io> 1688292975 +0530
Use blog link
$ git cat-file -p 9ab377e2f9acd9eaca12e750a7d3cb345065049e
100644 blob e5ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 file
$ git cat-file -p e6ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
blog.meain.io
Those paying attention might have noticed that the commit
object now has an additional key called parent
which links to the previous commit as this commit is created on top of the previous commit.
Creating a branch #
About time we created a branch. Let's do that with git branch fix-url
.
--- update 2024-07-02 15:47:20.841154907 +0530
+++ branch 2023-07-02 15:55:25.165204941 +0530
@@ -27,5 +28,6 @@
│ └── pack
└── refs
├── heads
+ │ ├── fix-url
│ └── master
└── tags
This adds a new file under the folder refs/heads
with a file as the branch name and the content as the id of the latest commit.
$ cat .git/refs/heads/fix-url
68ed5aa2372445cf2249d85573ade1c0cbb312b1
This is pretty much all there is to creating a branch. Branches in git
are really cheap. Tags also behave the same way, except that they are created under refs/tags
.
A file is also added under the logs
directory to store the commit history data similar to master
branch.
Checking out a branch #
Checking out in git
is git getting the tree
object of a commit and updating the files in your worktree to match the state recorded in it. In this case, since we are switching from master
to fix-url
, both of which point to the same commit
and underlying tree
object, git
does not have anything to do in the working tree.
git checkout fix-url
The only change that happens when you do a checkout inside .git
is that the .git/HEAD
file will now point to fix-url
.
$ cat .git/HEAD
ref: refs/heads/fix-url
Wile we are here, let me make a commit. I'm gonna need this to show what merging does later.
$ echo 'https://blog.meain.io'>file
$ git commit -am 'Fix url'
Merging #
There are primarily 3 ways of merging.
- The simplest and the most easiest is a fast forward merge. In this case you just update the commit a branch is pointing to a commit another branch is pointing to. This pretty much involves copying the hash in
refs/heads/fix-url
torefs/heads/master
. - The second one is rebase merge. In this case we first apply our changes on top of what main is currently pointing to one commit at a time and then perform something similar to a fast forward merge.
- The last one would be to just merge two branches using a separate merge commit. This is a bit different in that it will have two
parent
entries in its commit object. We will go a bit more into this towards the end.
First let's see what the graph looks like before a merge.
git log --graph --oneline --all
* 42c6318 (fix-url) Fix url
* 67ed5aa (HEAD -> master) Use blog link
* 3c201df Initial commit
Now to perform the merge:
$ git merge fix-url # updates refs/heads/master to the hash in refs/heads/fix-url
$ git log --graph --oneline --all
* 42c6318 (HEAD -> master) (fix-url) Fix url
* 67ed5aa Use blog link
* 3c201df Initial commit
Pushing #
Now that we have been playing around with our local git
repo for some time, let's see what happen when we push it. What is being sent to the git repo on the other side?
To show this, first let me create another git repo which can be used as remote for this repo.
$ mkdir git-talk-2
$ cd git-talk-2 && git init --bare
$ cd ../git-talk && git remote add origin ../git-talk-2
Btw, this change of adding a new remote is a config change and you can see that change in .git/config
file. I'm gonna let you go look what the change was on your own.
Now let's push.
$ git push origin master
Let's see what changed in our repo.
--- branch 2023-07-02 15:55:25.165204941 +0530
+++ remote 2023-07-02 17:41:05.170923141 +0530
@@ -22,12 +29,18 @@
│ ├── e5
│ │ └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
│ ├── info
│ └── pack
├── ORIG_HEAD
└── refs
├── heads
│ ├── fix-url
│ └── master
+ ├── remotes
+ │ └── origin
+ │ └── master
└── tags
It added a new refs/remotes
to store the information on what all is available in different remotes.
But what gets sent to the other git repo? It is everything that is in objects
, and all the branches and tags under refs
that you explicitly push. That is all the other git instance needs to get your entire git history.
References #
- https://git-scm.com/book/en/v3/Git-Internals-Git-Objects
- https://matthew-brett.github.io/curious-git/reading_git_objects.html
- https://blog.meain.io/2020/bunch-of-git-stuff/
Feel free to checkout the discussion at Hacker News and Lobsters.