meain/blog

Oct 06, 2023 . 8 min

What is in that .git directory?

Well, I think most of you reading this blog use git more or less on a daily basis, but have you ever looked into what is in the .git folder that git creates? Let's explore it together and understand what is going on in there.

This is a blog version of a talk that I recently gave. Unfortunately I can't link to the recording :(.

git at a basic level is just a bunch of text files linked to each other by filenames.

Let's start init #

As you all know, we start our git journey with a git init. This gives the message that we all are probably used to by now, especially if you start and abandon a lot of side projects.

Initialized empty Git repository in /home/meain/dev/src/git-talk/.git/

Let's look at what is in the .git directory as of now.

$ tree .git

.git
├── config
├── HEAD
├── hooks
│   └── prepare-commit-msg.msample
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

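It seems to create a bunch of files and folders. What are all these? Let's go over them one by one.

  - config: a text file that holds the git configuration for this repo, such as its remotes.
  - HEAD: the current head of the repo. Right after init it contains ref: refs/heads/master, even though that branch has no commits yet.
  - hooks: scripts that git can run before or after certain actions. git ships samples here.
  - objects: the git objects, i.e. the data for the files, commits and so on in your repo.
  - refs: references (pointers) to objects. Branches live under refs/heads and tags under refs/tags.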

Now we add a file #

Now that you have an idea of what the initial set of files in .git is, let's perform the first action that adds something into the .git directory. Let's create a file and add it (we are not committing it yet).

echo 'meain.io' > file
git add file

This does the following:

--- init       2023-07-02 15:14:00.584674816 +0530
+++ add 2023-07-02 15:13:53.869525054 +0530
@@ -3,7 +3,10 @@
 ├── HEAD
 ├── hooks
 │   └── prepare-commit-msg.msample
+├── index
 ├── objects
+│   ├── 4c
+│   │   └── 5b58f323d7b459664b5d3fb9587048bb0296de
 │   ├── info
 │   └── pack
 └── refs

This causes two main changes as you can see. The first is the new index file. The index is what stores the information about what is currently staged, and here it records that the file named file has been added.
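If you are curious about the index itself, here is a small Python sketch that reads just its header. It assumes the documented layout of a 12-byte header: the signature DIRC, a version number and the number of entries, all big-endian. The function name and the hand-built sample bytes are made up for illustration; it does not parse the entries themselves.

```python
import struct


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, count


# Hand-built header for illustration: version 2 with one staged entry.
sample = struct.pack(">4sII", b"DIRC", 2, 1)
print(parse_index_header(sample))  # (2, 1)
```

To try it on a real repo, you could pass in the bytes of .git/index read in binary mode.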

The second and more important change is the addition of a new folder objects/4c and a file 5b58f323d7b459664b5d3fb9587048bb0296de inside it.

But what is in that file? #

Here is where we get into the details of how git stores things. Let's start with looking at what kind of data is present in that.

$ file .git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de
.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de: zlib compressed data

Hmm, but what is the zlib compressed data?

$ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de
blob 9\0meain.io

Looks like it contains the type, size and data of the file named file that we did a git add on. In this case, the data says that it is a blob of size 9 and the content is meain.io.
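Here is a minimal Python sketch of that round trip, assuming a loose object is just the zlib-compressed bytes of type, size, a NUL byte and the data, as the output above suggests. It works on bytes in memory instead of reading from .git, and the helper names are made up for illustration.

```python
import zlib


def encode_object(obj_type: str, data: bytes) -> bytes:
    """Build '<type> <size>' + NUL + data and zlib-compress it."""
    header = f"{obj_type} {len(data)}".encode() + b"\x00"
    return zlib.compress(header + data)


def decode_object(compressed: bytes) -> tuple[str, int, bytes]:
    """Reverse of encode_object: returns (type, size, data)."""
    raw = zlib.decompress(compressed)
    header, _, data = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    return obj_type, int(size), data


# `echo 'meain.io'` writes the text plus a trailing newline: 9 bytes.
stored = encode_object("blob", b"meain.io\n")
print(decode_object(stored))  # ('blob', 9, b'meain.io\n')
```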

OK, but what is with that filename? #

Well, good question. It comes from the sha1 of the content. If you decompress the zlib data and pipe the result through sha1sum, you get the filename.

$ zlib-flate -uncompress <.git/objects/4c/5b58f323d7b459664b5d3fb9587048bb0296de|sha1sum
4c5b58f323d7b459664b5d3fb9587048bb0296de

git takes the sha1 of the content to be written, takes the first two characters, in this case 4c, creates a folder and then uses the rest of it as the filename. git creates folders from the first two chars to make sure we don't have too many files under the single objects folder.
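The same naming scheme as a rough Python sketch. It assumes the bytes being hashed are the uncompressed type, size, NUL byte and data, and object_path is a made-up helper name, not something git provides.

```python
import hashlib


def object_path(obj_type: str, data: bytes) -> str:
    """Return the path under .git/objects where this object would be stored."""
    store = f"{obj_type} {len(data)}".encode() + b"\x00" + data
    digest = hashlib.sha1(store).hexdigest()
    # First two hex chars become the folder, the remaining 38 the filename.
    return f"{digest[:2]}/{digest[2:]}"


print(object_path("blob", b"meain.io\n"))
```

You can compare what it prints against the folder and filename that showed up under objects after the git add.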

Say hello to git cat-file #

In fact, since this is one of the more important parts of git, git also has a plumbing command to view the content of an object. You can use git cat-file with -t for type, -s for size and -p for content.

$ git cat-file -t 4c5b58f323d7b459664b5d3fb9587048bb0296de
blob

$ git cat-file -s 4c5b58f323d7b459664b5d3fb9587048bb0296de
9

$ git cat-file -p 4c5b58f323d7b459664b5d3fb9587048bb0296de
meain.io

Let's commit #

Now that we know what changes when we add a file, let's take this to the next level by committing.

$ git commit -m 'Initial commit'
[master (root-commit) 3c201df] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 file

Here is what changed:

--- init        2023-07-02 15:14:00.584674816 +0530
+++ commit 2023-07-02 15:33:28.536144046 +0530
@@ -1,11 +1,25 @@
 .git
+├── COMMIT_EDITMSG
 ├── config
 ├── HEAD
 ├── hooks
 │   └── prepare-commit-msg.msample
 ├── index
+├── logs
+│   ├── HEAD
+│   └── refs
+│       └── heads
+│           └── master
 ├── objects
+│   ├── 3c
+│   │   └── 201df6a1c4d4c87177e30e93be1df8bfe2fe19
 │   ├── 4c
 │   │   └── 5b58f323d7b459664b5d3fb9587048bb0296de
+│   ├── 62
+│   │   └── 902ec0eca9faceb8fe0a9870b9b6cde75a9545
 │   ├── info
 │   └── pack
 └── refs
     ├── heads
+    │   └── master
     └── tags

Woah, looks like there are a bunch of changes. Let's walk through them one by one. The first one is a new file COMMIT_EDITMSG. As the name might suggest, it contains the (last) commit message.

If you were to run git commit without the -m flag, git would open an editor on the COMMIT_EDITMSG file so that you can write the commit message. Once you save and exit the editor, git uses the contents of that file as the commit message.

It also added a whole new folder, logs. This is where git logs every change to the commit that a ref points to. You will be able to see those changes for all refs and for HEAD here.

The objects dir also got some changes, but I want you to first look into the refs/heads directory where we now have the file master. This, as you might have guessed, is the reference to the master branch. Let's see what is in it.

$ cat .git/refs/heads/master
3c201df6a1c4d4c87177e30e93be1df8bfe2fe19

Looks like it is pointing to one of the new objects. We know how to look at objects, let's do that.

$ git cat-file -t 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
commit

$ git cat-file -p 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
tree 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
author Abin Simon <mail@meain.io> 1688292123 +0530
committer Abin Simon <mail@meain.io> 1688292123 +0530

Initial commit

You could have also done git cat-file -t refs/heads/master

Well, looks like that is a new kind of object. This seems to be a commit object. Its contents tell us that it points to a tree object with the hash 62902ec0eca9faceb8fe0a9870b9b6cde75a9545, which looks like the other object that got added when we did the commit. The commit object also records who the author and committer are, which in this case are both me. Lastly, it also shows what the commit message for this commit was.
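To make that layout concrete, here is a small Python sketch that splits the text printed by git cat-file -p into header fields and the message. It assumes one "key value" pair per header line followed by a blank line, and it ignores multi-line headers such as gpgsig. parse_commit is a made-up helper name.

```python
def parse_commit(text: str) -> tuple[dict[str, list[str]], str]:
    """Split a commit body into header fields and the commit message."""
    head, _, message = text.partition("\n\n")
    fields: dict[str, list[str]] = {}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        # A key can repeat: a merge commit has more than one parent line.
        fields.setdefault(key, []).append(value)
    return fields, message.strip()


raw = (
    "tree 62902ec0eca9faceb8fe0a9870b9b6cde75a9545\n"
    "author Abin Simon <mail@meain.io> 1688292123 +0530\n"
    "committer Abin Simon <mail@meain.io> 1688292123 +0530\n"
    "\n"
    "Initial commit\n"
)
fields, message = parse_commit(raw)
print(fields["tree"], fields.get("parent", []), message)
```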

Now let's look at what the tree object contains.

$ git cat-file -t 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
tree

$ git cat-file -p 62902ec0eca9faceb8fe0a9870b9b6cde75a9545
100644 blob 4c5b58f323d7b459664b5d3fb9587048bb0296de file

A tree object contains the state of the working directory as of that commit, in the form of other tree and blob objects. In this case, since we just have a single file named file, you will just see a single object. As you can see, the entry for file points to the original object that got added when we did a git add file.

Here is what a tree for a more mature repo looks like. Folders are represented by further tree objects nested inside the tree that the commit points to.
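What git cat-file -p prints is a prettified view. Here is a Python sketch of the raw form, assuming each entry in the tree body is the mode, a space, the name, a NUL byte and then the 20 raw bytes of the hash. parse_tree is a made-up helper, and the sample bytes are built by hand rather than read from .git.

```python
def parse_tree(raw: bytes) -> list[tuple[str, str, str]]:
    """Parse raw tree bytes into (mode, name, hex_hash) entries."""
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\x00", space)
        mode = raw[pos:space].decode()
        name = raw[space + 1:nul].decode()
        sha = raw[nul + 1:nul + 21].hex()  # 20 raw bytes, not 40 hex chars
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries


blob_hash = "4c5b58f323d7b459664b5d3fb9587048bb0296de"
raw_tree = b"100644 file\x00" + bytes.fromhex(blob_hash)
print(parse_tree(raw_tree))
```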

$ git cat-file -p 2e5e84c3ee1f7e4cb3f709ff5ca0ddfc259a8d04
100644 blob 3cf56579491f151d82b384c211cf1971c300fbf8    .dockerignore
100644 blob 02c348c202dd41f90e66cfeb36ebbd928677cff6    .gitattributes
040000 tree ab2ba080c4c3e4f2bc643ae29d5040f85aca2551    .github
100644 blob bdda0724b18c16e69b800e5e887ed2a8a210c936    .gitignore
100644 blob 3a592bc0200af2fd5e3e9d2790038845f3a5cf9b    CHANGELOG.md
100644 blob 71a7a8c5aacbcaccf56740ce16a6c5544783d095    CODE_OF_CONDUCT.md
100644 blob f433b1a53f5b830a205fd2df78e2b34974656c7b    LICENSE
100644 blob 413072d502db332006536e1af3fad0dce570e727    README.md
100644 blob 1dd7ed99019efd6d872d5f6764115a86b5121ae9    SECURITY.md
040000 tree 918756f1a4e5d648ae273801359c440c951555f9    build
040000 tree 219a6e58af53f2e53b14b710a2dd8cbe9fea15f5    design
040000 tree 5810c119dd4d9a1c033c38c12fae781aeffeafc1    docker
040000 tree f09c5708676cdca6562f10e1f36c9cfd7ee45e07    src
040000 tree e6e1595f412599d0627a9e634007fcb2e32b62e5    website

Making a change #

Let's make a change to the file and see how that works.

$ echo 'blog.meain.io' > file
$ git commit -am 'Use blog link'
[master 68ed5aa] Use blog link
1 file changed, 1 insertion(+), 1 deletion(-)

Here is what it does:

--- commit      2023-07-02 15:33:28.536144046 +0530
+++ update 2023-07-02 15:47:20.841154907 +0530
@@ -17,6 +17,12 @@
 │   │   └── 5b58f323d7b459664b5d3fb9587048bb0296de
 │   ├── 62
 │   │   └── 902ec0eca9faceb8fe0a9870b9b6cde75a9545
+│   ├── 68
+│   │   └── ed5aa2372445cf2249d85573ade1c0cbb312b1
+│   ├── 9a
+│   │   └── b377e2f9acd9eaca12e750a7d3cb345065049e
+│   ├── e5
+│   │   └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
 │   ├── info
 │   └── pack
 └── refs

Well, we added 3 new objects. One of them would be a blob object with the new contents of the file, one would be a tree object and the last one will be a commit object.

Let's trace them again from the HEAD or refs/heads/master.

$ git cat-file -p refs/heads/master
tree 9ab377e2f9acd9eaca12e750a7d3cb345065049e
parent 3c201df6a1c4d4c87177e30e93be1df8bfe2fe19
author Abin Simon <mail@meain.io> 1688292975 +0530
committer Abin Simon <mail@meain.io> 1688292975 +0530

Use blog link

$ git cat-file -p 9ab377e2f9acd9eaca12e750a7d3cb345065049e
100644 blob e5ec63cd761e6ab9d11e7dc2c4c2752d682b36e2 file

$ git cat-file -p e5ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
blog.meain.io

Those paying attention might have noticed that the commit object now has an additional key called parent which links to the previous commit as this commit is created on top of the previous commit.
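Those parent links are all that is needed to walk the history. A tiny Python sketch, using a made-up in-memory graph with placeholder ids instead of real hashes:

```python
# Hypothetical commit graph: commit id -> list of parent ids.
COMMITS = {
    "c1": [],      # root commit, has no parent line
    "c2": ["c1"],  # one parent line
    "c3": ["c2"],
}


def first_parent_history(start: str) -> list[str]:
    """Follow the first parent of each commit back to the root."""
    chain = []
    current = start
    while current is not None:
        chain.append(current)
        parent_ids = COMMITS[current]
        current = parent_ids[0] if parent_ids else None
    return chain


print(first_parent_history("c3"))  # ['c3', 'c2', 'c1']
```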

Creating a branch #

About time we created a branch. Let's do that with git branch fix-url.

--- update      2023-07-02 15:47:20.841154907 +0530
+++ branch 2023-07-02 15:55:25.165204941 +0530
@@ -27,5 +28,6 @@
 │   └── pack
 └── refs
     ├── heads
+    │   ├── fix-url
     │   └── master
     └── tags

This adds a new file under refs/heads, with the branch name as the filename and the id of the latest commit as its content.

$ cat .git/refs/heads/fix-url
68ed5aa2372445cf2249d85573ade1c0cbb312b1

This is pretty much all there is to creating a branch. Branches in git are really cheap. Tags also behave the same way, except that they are created under refs/tags.
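To show how little is involved, here is a Python sketch that "creates a branch" by writing that one file into a throwaway directory. create_branch is a made-up helper. The real git branch does more than this: it validates the name, takes a lock, writes the log entry and may keep refs in a packed-refs file instead.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_hash: str) -> Path:
    """Write refs/heads/<name> containing the commit hash and a newline."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_hash + "\n")
    return ref


# Use a temporary directory as a stand-in for a real .git folder.
with tempfile.TemporaryDirectory() as tmp:
    ref = create_branch(
        Path(tmp), "fix-url", "68ed5aa2372445cf2249d85573ade1c0cbb312b1"
    )
    branch_file_content = ref.read_text()

print(repr(branch_file_content))
```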

A file is also added under the logs directory to store the commit history data, similar to the one for the master branch.

Checking out a branch #

Checking out in git is git getting the tree object of a commit and updating the files in your worktree to match the state recorded in it. In this case, since we are switching from master to fix-url, both of which point to the same commit and underlying tree object, git does not have anything to do in the working tree.

git checkout fix-url

The only change that happens when you do a checkout inside .git is that the .git/HEAD file will now point to fix-url.

$ cat .git/HEAD
ref: refs/heads/fix-url
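Resolving HEAD to a commit is then a two-step lookup. A Python sketch, assuming HEAD either holds "ref: <path>" or a bare hash (a detached HEAD), and ignoring packed-refs. resolve_head is a made-up helper and the directory is a temporary stand-in for .git.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit hash that HEAD currently resolves to."""
    content = (git_dir / "HEAD").read_text().strip()
    if content.startswith("ref: "):
        # Symbolic ref: HEAD names a branch file that holds the hash.
        return (git_dir / content[len("ref: "):]).read_text().strip()
    # Detached HEAD: the file holds the hash directly.
    return content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)
    (heads / "fix-url").write_text("68ed5aa2372445cf2249d85573ade1c0cbb312b1\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/fix-url\n")
    resolved = resolve_head(git_dir)

print(resolved)
```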

While we are here, let me make a commit. I'm gonna need this to show what merging does later.

$ echo 'https://blog.meain.io'>file
$ git commit -am 'Fix url'

Merging #

There are primarily 3 ways of merging.

  1. The simplest is a fast-forward merge. In this case you just update the commit one branch points to so that it matches the commit another branch points to. This pretty much involves copying the hash in refs/heads/fix-url to refs/heads/master.
  2. The second one is a rebase merge. In this case we first apply our changes, one commit at a time, on top of what master is currently pointing to, and then perform something similar to a fast-forward merge.
  3. The last one is to merge two branches using a separate merge commit. This is a bit different in that the merge commit will have two parent entries in its commit object. We will go a bit more into this towards the end.
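The fast-forward case can be sketched in a few lines of Python. This assumes a fast-forward is allowed only when the target branch's commit is an ancestor of the source branch's commit. The graph, the ids and the function names are all made up for illustration.

```python
# Hypothetical data: commit id -> parent ids, and branch name -> commit id.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
refs = {"master": "c2", "fix-url": "c3"}


def is_ancestor(older: str, newer: str) -> bool:
    """True if `older` is reachable from `newer` by following parents."""
    stack, seen = [newer], set()
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return False


def fast_forward(target: str, source: str) -> bool:
    """Point `target` at `source`'s commit if no merge commit is needed."""
    if not is_ancestor(refs[target], refs[source]):
        return False  # histories diverged: needs a rebase or a merge commit
    refs[target] = refs[source]  # the "copy the hash" step
    return True


print(fast_forward("master", "fix-url"), refs)
```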

First let's see what the graph looks like before a merge.

$ git log --graph --oneline --all
* 42c6318 (fix-url) Fix url
* 68ed5aa (HEAD -> master) Use blog link
* 3c201df Initial commit

Now to perform the merge:

$ git merge fix-url # updates refs/heads/master to the hash in refs/heads/fix-url
$ git log --graph --oneline --all
* 42c6318 (HEAD -> master, fix-url) Fix url
* 68ed5aa Use blog link
* 3c201df Initial commit

Pushing #

Now that we have been playing around with our local git repo for some time, let's see what happens when we push it. What is being sent to the git repo on the other side?

To show this, first let me create another git repo which can be used as a remote for this repo.

$ mkdir git-talk-2
$ cd git-talk-2 && git init --bare

$ cd ../git-talk && git remote add origin ../git-talk-2

Btw, this change of adding a new remote is a config change and you can see that change in .git/config file. I'm gonna let you go look what the change was on your own.

Now let's push.

$ git push origin master

Let's see what changed in our repo.

--- branch	2023-07-02 15:55:25.165204941 +0530
+++ remote 2023-07-02 17:41:05.170923141 +0530
@@ -22,12 +29,18 @@
 │   ├── e5
 │   │   └── ec63cd761e6ab9d11e7dc2c4c2752d682b36e2
 │   ├── info
 │   └── pack
 ├── ORIG_HEAD
 └── refs
     ├── heads
     │   ├── fix-url
     │   └── master
+    ├── remotes
+    │   └── origin
+    │       └── master
     └── tags

It added a new refs/remotes to store the information on what all is available in different remotes.

But what gets sent to the other git repo? It is the objects reachable from the branches and tags that you explicitly push (which in this small repo is pretty much everything in objects), along with those refs themselves. That is all the other git instance needs to get your entire git history.
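As a rough mental model, a push only has to carry the objects that can be reached from the refs being pushed and that the other side does not already have. The Python sketch below illustrates that idea on a made-up object graph. It is a simplification: real git negotiates with the remote and sends the objects bundled up as a packfile.

```python
# Hypothetical object graph: object id -> ids it points at.
# Commits point at a tree and their parents, trees at blobs and subtrees.
OBJECTS = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blob2"],
    "tree1": ["blob1"],
    "blob1": [],
    "blob2": [],
    "stray-blob": [],  # staged with `git add` once but never committed
}


def reachable(tip: str) -> set[str]:
    """Collect every object reachable from `tip`."""
    seen = set()
    stack = [tip]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(OBJECTS[obj])
    return seen


def objects_to_send(local_tip: str, remote_has: set[str]) -> set[str]:
    """Objects the remote is missing for the ref being pushed."""
    return reachable(local_tip) - remote_has


print(sorted(objects_to_send("commit2", remote_has=set())))
```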

Feel free to check out the discussion at Hacker News and Lobsters.
