#8 `subbash/prompt`: Length of detached HEAD hash

Closed
opened 1 year ago by tukusejssirs · 2 comments

The default hash length in git is 7 characters; when a repository has many objects, git automatically uses the smallest length that keeps abbreviations unique (never fewer than 7).

This may be overridden (globally, per repo, or as a command option) with any length, and the same rule applies: if you set it to 3 chars and there are more than 4095 (fff in hex) objects, it outputs 4 chars for every hash.
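The three override levels mentioned above can be sketched as follows. This is a minimal sketch, assuming `core.abbrev` is the setting meant here; `configured_abbrev` is a hypothetical helper name, and the length 10 is only an example value.

```bash
# Ways to override the abbreviation length (example value: 10):
#   git config --global core.abbrev 10   # globally
#   git config core.abbrev 10            # per repo
#   git rev-parse --short=10 HEAD        # as a command option

# Hypothetical helper: print the configured length, or "auto" when
# core.abbrev is unset and git picks the length itself.
configured_abbrev() {
  git config --get core.abbrev 2>/dev/null || echo auto
}
```

Calling `configured_abbrev` inside a repo shows which length, if any, the prompt would have to respect.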

Now to the suggestion: create a variable hashSize (or a name of your choice) in the prompt function (e.g. right after PS1="") and then use it in line 138 as GBranch="($(echo ${GStatus} | awk 'match($0,/branch.oid [0-9a-fA-F]+/) {print substr($0,RSTART+11,RLENGTH-11)}' | cut -c1-$hashSize))".

However, there is one problem: possible collisions.

If I ask for a 4-character hash in one of my repos, I get 4 characters; but when I ask for 2, the result is still 4 characters long, because git never prints fewer than 4 characters and lengthens an abbreviation whenever it would otherwise be ambiguous:

$ git rev-parse --short=4 HEAD
ea4a

$ git rev-parse --short=2 HEAD
ea4a

Therefore, you should get the hash this way, via git rev-parse --short, rather than cutting it to a fixed width.

PS: If you want the hash at the default length (as configured locally or globally in git), you can use git rev-parse --short HEAD.
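As a rough illustration of that approach, the prompt could ask git for the abbreviated hash directly instead of trimming it with `cut`. This is only a sketch: `detached_head_label` is a hypothetical function name, not something from `subbash/prompt`, and how it would be wired into `GBranch` is left open.

```bash
# Hypothetical helper (not from subbash/prompt): print "(<short hash>)"
# for the current HEAD, letting git choose the length.
detached_head_label() {
  local hash
  # --short without a length uses git's default abbreviation length
  # and lengthens the result when a shorter prefix would be ambiguous.
  hash=$(git rev-parse --short HEAD 2>/dev/null) || return 1
  printf '(%s)' "$hash"
}
```

Used as `GBranch="$(detached_head_label)"`, the label would follow the user's git configuration rather than a hard-coded width.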

demure commented 1 year ago
Owner

So, I have a number of thoughts on this. Yes, it would be fairly simple to turn this into a variable. That being said, I really don't want to encourage anyone to use a hash truncated to fewer than 7 characters. The thought of such short 'hashes' makes me cringe, as such truncation can allow for collisions.

git -> git clone https://github.com/cirosantilli/test-many-commits-1m.git
Cloning into 'test-many-commits-1m'...
remote: Enumerating objects: 1000002, done.
remote: Counting objects: 100% (1000002/1000002), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 1000002 (delta 999937), reused 1000002 (delta 999937), pack-reused 0
Receiving objects: 100% (1000002/1000002), 65.91 MiB | 8.40 MiB/s, done.
Resolving deltas: 100% (999937/999937), done.
git -> cd test-many-commits-1m/
test-many-commits-1m [M] -> git log --pretty=format:"%h" | cut -c1-1 | uniq -d | wc -l
58861
test-many-commits-1m [M] -> git log --pretty=format:"%h" | cut -c1-2 | uniq -d | wc -l
3838
test-many-commits-1m [M] -> git log --pretty=format:"%h" | cut -c1-3 | uniq -d | wc -l
252
test-many-commits-1m [M] -> git log --pretty=format:"%h" | cut -c1-4 | uniq -d | wc -l
13
test-many-commits-1m [M] -> git log --pretty=format:"%h" | cut -c1-5 | uniq -d | wc -l
1

While I do acknowledge that this is a giant number of commits to test against, I really don't like the idea of making a feature that encourages the possibility of hash collisions.
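One caveat when reading the counts above: `uniq -d` only reports duplicates on adjacent lines, so unsorted input measures runs of neighbouring commits rather than all shared prefixes. A minimal sketch of a sorted variant follows; `count_shared_prefixes` is a hypothetical helper name, and it assumes full hashes, one per line, on stdin.

```bash
# Hypothetical helper: count distinct prefixes of length $1 that are
# shared by at least two of the hashes read from stdin.
count_shared_prefixes() {
  # sort first so identical prefixes become adjacent for uniq -d;
  # tr strips the padding some wc implementations add
  cut -c1-"$1" | sort | uniq -d | wc -l | tr -d ' '
}

# Possible usage inside a repo (full hashes from rev-list):
#   git rev-list HEAD | count_shared_prefixes 5
```

The numbers this prints will differ from the unsorted pipeline, since every repeated prefix is counted exactly once.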

tukusejssirs commented 1 year ago
Poster

I agree about hashes shorter than 7 hex digits. I always keep the length unmodified (i.e. at the default of 7 digits). I just wanted to note that (1) you hard-coded it to 8 digits; (2) the number of digits won't change if someone changes their config, or when the minimum length that avoids collisions is higher than 8 digits (as in the linux repo).
