A Look at the .git Directory

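If you use git, you probably know it is a distributed version control system, which means a copy of the repository lives on your own machine. Careful readers, or those who have read a book about it, will also know that the repository normally sits in .git under the working directory. I was bored today, so I went poking around in that directory.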

Test environment

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.2 LTS
Release:	14.04
Codename:	trusty

$ git --version
git version 2.3.0

$ git remote show
origin
$ git remote show origin
* remote origin
  Fetch URL: https://github.com/pcman-bbs/pcmanx.git
  Push  URL: https://github.com/pcman-bbs/pcmanx.git
  HEAD branch: master
  Remote branches:
    gtk3                          tracked
    master                        tracked
    next-release                  tracked
    show_web_search_in_popup_menu tracked
    transparent_background        tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)

Listing of the .git directory

/tmp/pcmanx$ ls -gG --group-directories-first .git
total 56
drwxrwxr-x 2  4096 Mar  3 23:36 branches
drwxrwxr-x 2  4096 Mar  3 23:36 hooks
drwxrwxr-x 2  4096 Mar  3 23:36 info
drwxrwxr-x 3  4096 Mar  3 23:37 logs
drwxrwxr-x 4  4096 Mar  3 23:36 objects
drwxrwxr-x 5  4096 Mar  3 23:37 refs
-rw-rw-r-- 1   264 Mar  3 23:37 config
-rw-rw-r-- 1    73 Mar  3 23:36 description
-rw-rw-r-- 1    23 Mar  3 23:37 HEAD
-rw-rw-r-- 1 12256 Mar  4 23:11 index
-rw-rw-r-- 1    41 Mar  4 23:11 ORIG_HEAD
-rw-rw-r-- 1   829 Mar  3 23:37 packed-refs

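Background

Only the background needed for this post is listed here, and it may contain mistakes.

  • Git's basic data comes in four types: blob, tree, commit, and tag
  • A Git branch can be thought of as a file that stores the hash of some commit

Enough talk. Let's get hands-on and first see what on earth these types are.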

$ git log
commit fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
...

Let's see what is inside fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304.

$ git cat-file -p fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
tree 2f6db7ea7db6b66cb4c07a98e78b580f67872db6
parent 3fb419f40c10d67bd1b9b18152fb43cca58ca411
...

We can see it holds:

  • the hash of a tree object
  • the hash of the previous (parent) commit
  • the commit message

Next, let's look at the tree.

$ git cat-file -p 2f6db7ea7db6b66cb4c07a98e78b580f67872db6
...
100644 blob 5bc4b3f69d730535eb3a8aee28d7ef9273225d50	.gitignore
...
040000 tree b0b46378d75b1737257fef9d9eb433489b2d0ea0	build

Doesn't this look a lot like a Linux file system? A directory (tree) stores a table that maps hashes, rather than inodes, to file names.
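
To make the "table of hashes and names" idea concrete, here is a minimal Python sketch that turns the text printed by git cat-file -p for a tree into a dictionary. The function name parse_tree_listing is made up for illustration, and it only reads the pretty-printed text, not Git's binary tree format.

```python
def parse_tree_listing(text):
    """Map each entry name to (mode, type, hash) from `git cat-file -p <tree>` output."""
    entries = {}
    for line in text.splitlines():
        # Real entries separate the metadata from the name with a tab;
        # anything else (such as an elided "..." line) is skipped.
        if "\t" not in line:
            continue
        meta, name = line.split("\t", 1)
        mode, obj_type, sha = meta.split()
        entries[name] = (mode, obj_type, sha)
    return entries


# Two entries copied from the listing above.
sample = (
    "100644 blob 5bc4b3f69d730535eb3a8aee28d7ef9273225d50\t.gitignore\n"
    "040000 tree b0b46378d75b1737257fef9d9eb433489b2d0ea0\tbuild\n"
)
entries = parse_tree_listing(sample)
print(entries["build"])
```

A blob entry is a file and a tree entry is a subdirectory, so walking these tables recursively reproduces the whole directory hierarchy.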

The entries are typed as either tree or blob, so naturally the next thing to look at is a blob.

$ git cat-file -p 5bc4b3f69d730535eb3a8aee28d7ef9273225d50
## autotools generated
Doxyfile
INSTALL
Makefile
Makefile.in
...

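Clearly, this is where the actual file content is stored.

Now let's check whether a branch really is a file whose content is the hash of some commit. Straight to the example.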

$ git checkout -b br1
Switched to a new branch 'br1'
$ git checkout -b br2
Switched to a new branch 'br2'
$ tree .git/refs/
.git/refs/
├── heads
│   ├── br1
│   ├── br2
│   └── master
├── remotes
│   └── origin
│       └── HEAD
└── tags

$ git log
commit fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
...
$ cat .git/refs/heads/br1
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
$ cat .git/refs/heads/br2
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
$ cat .git/refs/heads/master
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304

Top-level files in .git

Running file on them shows that most are plain text:

/tmp/pcmanx$ find .git -maxdepth 1 -type f -exec file {} \;
.git/index: Git index, version 2, 144 entries
.git/packed-refs: ASCII text
.git/config: ASCII text
.git/ORIG_HEAD: ASCII text
.git/description: ASCII text
.git/HEAD: ASCII text

Let's go through them one by one.

index

Start with the commands below.

$ file .git/index
.git/index: Git index, version 2, 144 entries

$ find -type f | grep -v .git\/ | cut -c 3- | sort | wc -l
144

$ git ls-files | wc -l
144

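If you are curious, drop the | wc -l and play around. Basically the index describes the directory layout of the repository, that is, which file goes in which directory. More precisely, it records which blob hash belongs in which tree. According to the reference I read, this file helps with:

  • providing the data needed to generate a tree object
  • providing the information needed to compare the working directory with a tree
  • resolving merge conflicts, using the data stored in the index. I don't fully understand this part, so this is just a literal translation.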
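
The "version 2, 144 entries" that file reports comes from the first 12 bytes of the index. The Python sketch below reads that header. It assumes the documented layout of a 4-byte "DIRC" signature followed by two big-endian 32-bit integers. The function name read_index_header and the hand-built sample header are for illustration only.

```python
import struct


def read_index_header(data):
    """Return (version, entry_count) from the first 12 bytes of a Git index file."""
    if len(data) < 12:
        raise ValueError("an index header needs at least 12 bytes")
    # Signature "DIRC", then version and entry count as big-endian uint32.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Hand-built header mirroring the `file .git/index` output above (not read from disk).
sample_header = struct.pack(">4sII", b"DIRC", 2, 144)
print(read_index_header(sample_header))  # (2, 144)
```

Against a real repository the same function could be fed open(".git/index", "rb").read(12). The entries that follow the header are more involved and are not parsed here.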

packed-refs

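In short, for efficiency git can pack the contents of the refs directory into a single file. For what a ref is, see the background section above.

As usual, an example is quicker.

First, use git show-ref to see which references we have.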

$ git show-ref
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/heads/master
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/remotes/origin/HEAD
c86b440bafe24caecfe0dbfbbd73031a1bcd1178 refs/remotes/origin/gtk3
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/remotes/origin/master
22da092eb00b78946fd70de24427d702a77b925b refs/remotes/origin/next-release
98ff18f84fad4686012bf6183734eeb1ec5d7f46 refs/remotes/origin/show_web_search_in_popup_menu
9542889d17d276321d6da2de8d1f9d95f76fb483 refs/remotes/origin/transparent_background
7a3629ab3da7c609cf3698319ab606ebae0998e7 refs/tags/0.3.2@221
f57350c72987263c2b745690bdc8013fbbd6067c refs/tags/0.3.3@243
522d1dc2568b5c7c256e05ca69412480e1afcfb3 refs/tags/0.3.4
8e9d9520125f64a0b894a5d575f0c2d98c3ec06b refs/tags/0.3.4@301
7031115a8ea1151efc1536b43a509774ca2c0777 refs/tags/0.3.7
3f4dcbe6e1aa0045fbeb9f1f761ce035e65defea refs/tags/1.1
098d158c8d8a7ce7c6b2c5c47c967c6137beced2 refs/tags/1.2

So what is inside packed-refs? Notice how similar it looks to the output above?

$ cat .git/packed-refs
# pack-refs with: peeled fully-peeled
c86b440bafe24caecfe0dbfbbd73031a1bcd1178 refs/remotes/origin/gtk3
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/remotes/origin/master
22da092eb00b78946fd70de24427d702a77b925b refs/remotes/origin/next-release
98ff18f84fad4686012bf6183734eeb1ec5d7f46 refs/remotes/origin/show_web_search_in_popup_menu
9542889d17d276321d6da2de8d1f9d95f76fb483 refs/remotes/origin/transparent_background
7a3629ab3da7c609cf3698319ab606ebae0998e7 refs/tags/0.3.2@221
f57350c72987263c2b745690bdc8013fbbd6067c refs/tags/0.3.3@243
522d1dc2568b5c7c256e05ca69412480e1afcfb3 refs/tags/0.3.4
8e9d9520125f64a0b894a5d575f0c2d98c3ec06b refs/tags/0.3.4@301
7031115a8ea1151efc1536b43a509774ca2c0777 refs/tags/0.3.7
3f4dcbe6e1aa0045fbeb9f1f761ce035e65defea refs/tags/1.1
098d158c8d8a7ce7c6b2c5c47c967c6137beced2 refs/tags/1.2

Now look at the refs directory: most of the references above do not exist as files under .git/refs.

$ tree .git/refs/
.git/refs/
├── heads
│   └── master
├── remotes
│   └── origin
│       └── HEAD
└── tags

4 directories, 2 files

OK, let's create a branch.

$ git checkout -b I_got_a_new_ref
Switched to a new branch 'I_got_a_new_ref'

Oh! The ref we just created shows up under .git/refs.

$ tree .git/refs/
.git/refs/
├── heads
│   ├── I_got_a_new_ref
│   └── master
├── remotes
│   └── origin
│       └── HEAD
└── tags

4 directories, 3 files

Run git gc to tidy things up.

$ git gc
Counting objects: 5002, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1068/1068), done.
Writing objects: 100% (5002/5002), done.
Total 5002 (delta 3913), reused 5002 (delta 3913)

Huh? The branch we just created has disappeared from .git/refs (and so has master).

$ tree .git/refs/
.git/refs/
├── heads
├── remotes
│   └── origin
│       └── HEAD
└── tags

4 directories, 1 file

They have moved into packed-refs.

$ cat .git/packed-refs
# pack-refs with: peeled fully-peeled
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/heads/I_got_a_new_ref
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/heads/master
c86b440bafe24caecfe0dbfbbd73031a1bcd1178 refs/remotes/origin/gtk3
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/remotes/origin/master
22da092eb00b78946fd70de24427d702a77b925b refs/remotes/origin/next-release
98ff18f84fad4686012bf6183734eeb1ec5d7f46 refs/remotes/origin/show_web_search_in_popup_menu
9542889d17d276321d6da2de8d1f9d95f76fb483 refs/remotes/origin/transparent_background
7a3629ab3da7c609cf3698319ab606ebae0998e7 refs/tags/0.3.2@221
f57350c72987263c2b745690bdc8013fbbd6067c refs/tags/0.3.3@243
522d1dc2568b5c7c256e05ca69412480e1afcfb3 refs/tags/0.3.4
8e9d9520125f64a0b894a5d575f0c2d98c3ec06b refs/tags/0.3.4@301
7031115a8ea1151efc1536b43a509774ca2c0777 refs/tags/0.3.7
3f4dcbe6e1aa0045fbeb9f1f761ce035e65defea refs/tags/1.1
098d158c8d8a7ce7c6b2c5c47c967c6137beced2 refs/tags/1.2
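
Since packed-refs is just text, reading it takes only a few lines. Here is a minimal Python sketch, with the made-up name parse_packed_refs, that maps each ref name to its hash. It skips the "#" header and any line starting with "^", which records the object an annotated tag points to.

```python
def parse_packed_refs(text):
    """Map ref names to hashes, given the text of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# pack-refs with: ..." header,
        # and "^<hash>" peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


# A few lines copied from the packed-refs content above.
sample = (
    "# pack-refs with: peeled fully-peeled\n"
    "fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 refs/heads/master\n"
    "c86b440bafe24caecfe0dbfbbd73031a1bcd1178 refs/remotes/origin/gtk3\n"
    "098d158c8d8a7ce7c6b2c5c47c967c6137beced2 refs/tags/1.2\n"
)
refs = parse_packed_refs(sample)
print(refs["refs/heads/master"])
```

When a ref exists both as a loose file under .git/refs and in packed-refs, git uses the loose file, which is why a freshly created branch works even before it is packed.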

config

This is a big topic that deserves a chapter of its own, so I have written it up on a separate page. Please read it there.

ORIG_HEAD

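This file is interesting because it does not exist at first. It showed up later for no obvious reason. After asking Google and the man pages (man git reset), the answer is that reset saves the old HEAD hash here. Other commands that move HEAD drastically, such as git merge and git rebase, write it too. Why do this? So that you can change your mind, of course. Run man git commit and search for ORIG_HEAD to see an example that achieves the same effect as git commit --amend.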

description

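Too lazy to explain this one in detail, so I'll skip it. It just holds a description of the repository, used by GitWeb.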

HEAD

As the name suggests, it points to the commit you are currently on.

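Wait!

What does "points to the commit you are currently on" even mean? As usual, let's look at an example.

Right after cloning, HEAD is a plain file. What is in it? A reference to a branch, and that branch file in turn holds a commit hash.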

$ cat .git/HEAD
ref: refs/heads/master

$ cat .git/refs/heads/master
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304

$ git cat-file -t fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
commit

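Can I check out something that is not a branch? No problem at all. You can see that git writes the commit hash you checked out straight into .git/HEAD (the so-called detached HEAD state).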

$ strace git checkout 93f0addca5b2723ff0f1b744b032f359
...
open("/tmp/pcmanx/.git/HEAD.lock", O_RDWR|O_CREAT|O_EXCL, 0666) = 3
write(3, "93f0addca5b2723ff0f1b744b032f359"..., 40) = 40
write(3, "\n", 1)                       = 1
close(3)                                = 0
...
rename("/tmp/pcmanx/.git/HEAD.lock", "/tmp/pcmanx/.git/HEAD") = 0
...

$ cat .git/HEAD
93f0addca5b2723ff0f1b744b032f359938a286d
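
Putting the two cases together, here is a rough Python sketch of how HEAD could be resolved to a commit hash by hand. It follows "ref: ..." to a loose ref file, falls back to packed-refs, and otherwise treats the content as a hash. The function name resolve_head is hypothetical, and real git handles more cases (nested symbolic refs, worktrees and so on) that this sketch ignores.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return the commit hash HEAD resolves to, or None if the ref cannot be found."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds the hash itself
    ref_name = head[len("ref: "):]
    # A loose ref is a file whose path mirrors the ref name.
    loose_path = os.path.join(git_dir, *ref_name.split("/"))
    if os.path.isfile(loose_path):
        with open(loose_path) as f:
            return f.read().strip()
    # Otherwise look the name up in packed-refs ("<hash> <refname>" lines).
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed_path):
        with open(packed_path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[1] == ref_name:
                    return parts[0]
    return None


# Demo on a throwaway directory that imitates the layout shown above.
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304\n")
    print(resolve_head(git_dir))
```

In everyday use git rev-parse HEAD does this properly. The sketch is only meant to show that nothing magical is going on.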

Top-level directories

branches

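Something that is on its way out (Git's repository-layout documentation calls it slightly deprecated), so I'll skip it.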

hooks

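Hooks are basically a set of callbacks that git invokes at specific moments. I'm not running anything big enough to need them yet, so I'll look into them when the need arises. The default files are listed below.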

$ ll .git/hooks/
total 48
drwxrwxr-x 2 wen wen 4096 Mar  9 23:10 ./
drwxrwxr-x 8 wen wen 4096 Mar  9 23:21 ../
-rwxrwxr-x 1 wen wen  452 Mar  9 23:10 applypatch-msg.sample*
-rwxrwxr-x 1 wen wen  896 Mar  9 23:10 commit-msg.sample*
-rwxrwxr-x 1 wen wen  189 Mar  9 23:10 post-update.sample*
-rwxrwxr-x 1 wen wen  398 Mar  9 23:10 pre-applypatch.sample*
-rwxrwxr-x 1 wen wen 1642 Mar  9 23:10 pre-commit.sample*
-rwxrwxr-x 1 wen wen 1239 Mar  9 23:10 prepare-commit-msg.sample*
-rwxrwxr-x 1 wen wen 1348 Mar  9 23:10 pre-push.sample*
-rwxrwxr-x 1 wen wen 4898 Mar  9 23:10 pre-rebase.sample*
-rwxrwxr-x 1 wen wen 3611 Mar  9 23:10 update.sample*

info

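Stores information. Well, obviously. Right now it contains only a file named exclude, and its content is nothing but comments, though the documentation says other files can live here too. One thing worth mentioning: the documentation for exclude says these ignore patterns, like the ones in .gitignore, are consulted by git status, git rm, git add, and git clean, while the core Git commands do not look at them. Patterns in .git/info/exclude apply only to this local repository and are not committed.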

logs

This is where logs are stored. What kind of logs? Let's go straight to an example.

First we add a test file.

$ touch test2
$ git add test2
$ git commit test2 -m "test2"
[master 09580e4] test2
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 test2

Run git reflog and you can see that the action you just performed has been recorded, together with the commit hash HEAD pointed to at that time.

$ git reflog
09580e4 HEAD@{0}: commit: test2
fd5d6ab HEAD@{1}: checkout: moving from 93f0addca5b2723ff0f1b744b032f359938a286d to master
93f0add HEAD@{2}: checkout: moving from master to 93f0addca5b2723ff0f1b744b032f359938a286d
...

What is this good for? Changing your mind, of course. Here is another example.

First create a branch.

$ git checkout -b demo_reflog
Switched to a new branch 'demo_reflog'

Add and commit file1.

$ touch file1
$ git add file1
$ git commit file1 -m "file 1"
[demo_reflog c982c70] file 1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file1

Then add and commit file2.

$ touch file2
$ git add file2
$ git commit file2 -m "file 2"
[demo_reflog b85f4d4] file 2
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file2

Switch back to master and kill the branch we just made.

$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.

$ git branch -D demo_reflog
Deleted branch demo_reflog (was b85f4d4).

Suppose that last step was a slip of the finger. Can we take it back? YES YOU CAN!

First check the log.

$ git reflog
fd5d6ab HEAD@{0}: checkout: moving from demo_reflog to master
b85f4d4 HEAD@{1}: commit: file 2
c982c70 HEAD@{2}: commit: file 1
...

Now we know the hash of the last commit on that branch, so we can switch to it and create a new branch there.

$ git checkout -b i_am_back b85f4d4
Switched to a new branch 'i_am_back'

And the data is back.

$ git log --pretty=oneline --abbrev-commit
b85f4d4 file 2
c982c70 file 1
fd5d6ab Fix invisible cairo caret issue.
...

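Of course this is still a risky move, because git only keeps reflog entries and unreachable objects for a limited time. If you are curious, ask the man pages: man git gc.

Someone may ask: "Weren't you supposed to be introducing .git/logs?"

Good question! Guess which file git reflog just read its data from?

Once again, strace to the rescue.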

$ strace -e open git reflog
...
open("/tmp/pcmanx/.git/logs/HEAD", O_RDONLY) = 3
...

See .git/logs/HEAD there? Want to know what is inside? I do, at least.

$ cat .git/logs/HEAD
...
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 c982c70f00875b83e4845ca35ef0ca4c70f73e64 Wen.Liao <censored> 1425997414 +0800	commit: file 1
c982c70f00875b83e4845ca35ef0ca4c70f73e64 b85f4d4dca4012a2d1785750061fae9e03e817e1 Wen.Liao <censored> 1425997438 +0800	commit: file 2
b85f4d4dca4012a2d1785750061fae9e03e817e1 fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 Wen.Liao <censored> 1425997673 +0800	checkout: moving from demo_reflog to master
fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 b85f4d4dca4012a2d1785750061fae9e03e817e1 Wen.Liao <censored> 1425997732 +0800	checkout: moving from master to b85f4d4
b85f4d4dca4012a2d1785750061fae9e03e817e1 b85f4d4dca4012a2d1785750061fae9e03e817e1 Wen.Liao <censored> 1425997746 +0800	checkout: moving from b85f4d4dca4012a2d1785750061fae9e03e817e1 to i_am_back

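To be precise, .git/logs records not only the log for HEAD but also the logs for the individual refs. If you are interested, check the reference and go browse the directory yourself.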
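
Each line in these log files has the same shape: old hash, new hash, identity, timestamp, time zone, then a tab and the message. Here is a small Python sketch that splits one line into those fields. The name parse_reflog_line is made up, and the sample is the "commit: file 1" line from the output above.

```python
def parse_reflog_line(line):
    """Split one line of a file under .git/logs into its fields."""
    # The message is separated from the rest by a tab.
    head, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The identity may contain spaces, so peel timestamp and zone off the right.
    identity, timestamp, tz = rest.rsplit(" ", 2)
    return {
        "old": old,
        "new": new,
        "identity": identity,
        "timestamp": int(timestamp),
        "tz": tz,
        "message": message,
    }


sample = (
    "fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304 "
    "c982c70f00875b83e4845ca35ef0ca4c70f73e64 "
    "Wen.Liao <censored> 1425997414 +0800\tcommit: file 1"
)
entry = parse_reflog_line(sample)
print(entry["new"][:7], entry["message"])
```

Reading the file from the last line upward gives the same order as git reflog, where HEAD@{0} is the most recent entry.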

objects

First look at the directory structure right after cloning.

$ tree .git/objects/
.git/objects/
├── info
└── pack
    ├── pack-6042320de285747900cc3263a47f7bab429cabf8.idx
    └── pack-6042320de285747900cc3263a47f7bab429cabf8.pack

Let's create a directory and commit something into it.

$ mkdir tree
$ echo test > tree/file
$ git add tree/file
$ git commit tree/file -m "test tree and file"
[master 2ba01ef] test tree and file
 1 file changed, 1 insertion(+)
 create mode 100644 tree/file

Then see how this directory has changed.

$ tree .git/objects/
.git/objects/
├── 2b
│   └── a01ef5583f35dd57b5fb5bde732f73bc315b6d
├── 9d
│   └── aeafb9864cf43055ae93beb0afd6c7d144bfa4
├── de
│   └── f69c2c316574aaf328e546486fa750eb9c53a0
├── e1
│   └── b8ecbb1f19709f3a4867a0ffe08bb2e07acf19
├── info
└── pack
    ├── pack-6042320de285747900cc3263a47f7bab429cabf8.idx
    └── pack-6042320de285747900cc3263a47f7bab429cabf8.pack

6 directories, 6 files

Next, join each directory name with its file name to get a full hash, and use git cat-file -p to see what is inside.

$ git cat-file -p 2ba01ef5583f35dd57b5fb5bde732f73bc315b6d
tree def69c2c316574aaf328e546486fa750eb9c53a0
parent fd5d6abe34f41f8687c1fc5e2ab0a2f65c570304
author Wen.Liao <censored> 1425999641 +0800
committer Wen.Liao <censored> 1425999641 +0800

test tree and file
$ git cat-file -p 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
test
$ git cat-file -p def69c2c316574aaf328e546486fa750eb9c53a0
100644 blob 5bc4b3f69d730535eb3a8aee28d7ef9273225d50	.gitignore
...
040000 tree e1b8ecbb1f19709f3a4867a0ffe08bb2e07acf19	tree
$ git cat-file -p e1b8ecbb1f19709f3a4867a0ffe08bb2e07acf19
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4	file

It is not hard to guess: whatever you commit locally is processed into objects, which are then filed away according to their hashes.
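
The hash itself can be reproduced without git. In a default SHA-1 repository, the object id is the SHA-1 of a short header ("<type> <size>" plus a NUL byte) followed by the content. The Python sketch below recomputes the id of the blob created by echo test > tree/file and derives where the loose object lands. The helper names git_object_id and loose_object_path are invented for this example.

```python
import hashlib


def git_object_id(obj_type, content):
    """SHA-1 object id: hash of b'<type> <size>\\0' followed by the raw content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """Loose objects live at objects/<first two hex chars>/<remaining 38>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


blob_id = git_object_id("blob", b"test\n")  # same bytes as `echo test > tree/file`
print(blob_id)                     # 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
print(loose_object_path(blob_id))  # .git/objects/9d/aeafb9864cf43055ae93beb0afd6c7d144bfa4
```

The result matches the 9d/aeafb9... entry in the tree output above. The file on disk holds the same header and content compressed with zlib, which is why cat on it shows garbage while git cat-file -p shows the text.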

Next, let's run a test.

$ git gc
Counting objects: 5006, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1070/1070), done.
Writing objects: 100% (5006/5006), done.
Total 5006 (delta 3914), reused 5001 (delta 3913)

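As you can see, the directories from before have all disappeared, and the file names in the pack directory differ from the earlier ones. Judging from the git gc output above, git gc compressed these objects and packed them together into the files in the pack directory.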

$ tree .git/objects/
.git/objects/
├── info
│   └── packs
└── pack
    ├── pack-57d163604c430e1919a18e97c5b1312291b62721.idx
    └── pack-57d163604c430e1919a18e97c5b1312291b62721.pack

2 directories, 3 files

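So from these observations we can conclude that Git stores objects in two forms:

  • loose object: the [first two characters of the hash]/[rest of the hash] directory and file layout we just saw
  • packed object: objects compressed together into two files (a .pack file and its .idx index)

Loose objects may be converted into packed objects by git gc, and vice versa. For the exact rules, ask the man pages. Because git gc exists to clean up and reorganize the local repository, running it may delete data that is no longer needed. For details, again see man git gc.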

refs
