First Name*

Last Name*

Email ID

Phone*

College - Where did you study?*

One of the IITs

One of the NITs

One of the BITs

One of the IIITs

One of the NIDs

Agnel Charities' FR. C. Rodrigues Institute of Technology, Vashi, Navi Mumbai

Atal Bihari Vajpayee Indian Institute of Information Technology & Management Gwalior (IIIT)

B M S College of Engineering Basavanagudi,Bangalore(BMSCE)

B.R.A.C.T's Vishwakarma Institute of Information Technology, Kondhwa(VIIT)

Bansilal Ramnath Agarawal Charitable Trust's Vishwakarma Institute of Technology, Bibwewadi, Pune (VIT Pune)

Bhartiya Vidya Bhavan's Sardar Patel Institute of Technology , Andheri, Mumbai (SPIT)

Bhilai Institute of Technology, Bhilai House, Durg(BIT)

Bhilai Institute of Technology.

Birla Institute of Technology, Goa

Birla Institute of Technology, Hydrabad

Birla Institute of Technology, Mesra, Ranchi

Birla Institute of Technology, Pilani, Rajasthan

CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY(CBIT)

Coimbatore Institute Of Technology(CIT) (Autonomous)

College of Engineering, Pune (COEP)

CV Raman Global University

Dayananda Sagar College of Engineering Bangalore (DSCE)

Delhi Technological University, DTU Delhi

Desai University, (DDU), Nadiad

Dhirubhai Ambani Institute of Info. & Comm. Tech.,(DA-IICT)

Don Bosco Institute of Technology, Mumbai

Dr. Ambedkar Institute Of Technology Bangalore

Faculty Of Technology & Engineering(MSU), Vadodara

Faculty Of Technology And Engineering(GIA), Dharmsinh

Fr. Conceicao Rodrigues College of Engineering, Bandra,Mumbai

Garv Institute of Management & Technology.

Government College of Engineering, Amravati

Govt Engineering College, Bilaspur.

Govt Engineering College, Raipur.

Govt. Engineering College, Raipur (GEC Raipur)

IIIT Hyderabad

Indian Institute of Art and Design(IIAD), Delhi

Indian Institute of Engineering Science and Technology, Shibpur (IIEST Shibpur)

Indian Institute of Information Technology (IIIT) Pune

Indian Institute of Information Technology (IIIT)Kota, Rajasthan

Indian Institute of Information Technology Surat (IIIT)

Indian Institute of Information Technology(IIIT) Kilohrad, Sonepat, Haryana

Indian Institute of Information Technology(IIIT), Vadodara, Gujrat

Indian Institute of Information Technology, Design & Manufacturing, Kancheepuram (IIIT)

Indian Institute of Technology (BHU) Varanasi

Indian Institute of Technology (ISM) Dhanbad

Indian Institute of Technology Bhilai

Indian Institute of Technology Bhubaneswar

Indian Institute of Technology Bombay

Indian Institute of Technology Delhi

Indian Institute of Technology Dharwad

Indian Institute of Technology Gandhinagar

Indian Institute of Technology Goa

Indian Institute of Technology Guwahati

Indian Institute of Technology Hyderabad

Indian Institute of Technology Indore

Indian Institute of Technology Jammu

Indian Institute of Technology Jodhpur

Indian Institute of Technology Kanpur

Indian Institute of Technology Kharagpur

Indian Institute of Technology Madras

Indian Institute of Technology Mandi

Indian Institute of Technology Palakkad

Indian Institute of Technology Patna

Indian Institute of Technology Roorkee

Indian Institute of Technology Ropar

Indian Institute of Technology Tirupati

Indraprastha Institute of Information Technology Delhi (IIIT-Delhi)

INSTITUTE OF ENGINEERING & TECHNOLOGY,LUCKNOW (0052)(IET Lucknow)

Institute of Engineering and Management, Kolkata

Institute of Engineering and Technology, DAVV, Indore (1996)

Institute Of Technology, Nirma University Of Science & Technology, Ahmedabad

International Institute of Information Technology, Bhubaneswar

International Institute of Information Technology, Naya Raipur

Jabalpur Engineering College, Jabalpur, (JEC) (1947)

Jadavpur Uni

Jadavpur University

JSS Science and Technology University(Formerly SJCE) Mysore

K J Somaiya Institute of Engineering and Information Technology, Sion, Mumbai

K.J.Somaiya College of Engineering, Vidyavihar, Mumbai

Kalinga Institute of Industrial Technology

L.D.College Of Engineering, Ahmedabad (LDCE)

M S Ramaiah Institute of Technology Bangalore (MSRIT)

Madhav Institute of Technology & Science, Gwalior (1957)

MAEER’S MIT, Pune

Maharashtra Academy of Engineering and Educational Research

Maharashtra Institute of Technology (MIT)

Malaviya National Institute of Technology Jaipur

Manipal Institute of Technology (MIT)

Maulana Abul Kalam Azad University of Technology, Kolkata

Maulana Azad National Institute of Tehnology Bhopal

MIT Academy of Engineering,Alandi, Pune

MKSSS's Cummins College of Engineering for Women, Karvenagar,Pune

Motilal Nehru National Institute of Technology Allahabad

National Institute of Design(NID)

National Institute of Technology Calicut

National Institute of Technology Delhi

National Institute of Technology Durgapur

National Institute of Technology Hamirpur

National Institute of Technology Jalandhar

National Institute of Technology Karnataka, Surathkal

National Institute of Technology Patna

National Institute of Technology Raipur

National Institute of Technology, Andhra Pradesh

National Institute of Technology, Jamshedpur

National Institute of Technology, Kurukshreta

National Institute of Technology, Rourkela

National Institute of Technology, Silchar

National Institute of Technology, Tiruchirappalli

National Institute of Technology, Warangal

Netaji Subhas University of Technology, New Delhi (NSUT Delhi)

O U COLLEGE OF ENGG HYDERABAD (UCE)

P E S University (Electronic City Campus) Bangalore(PES)

P E S University (Ring Road Campus) Bangalore(PES)

Pandit Deendayal Petroleum University ,Gandhinagar(PDPU)

Pimpri Chinchwad Education Trust, Pimpri Chinchwad College of Engineering, Pune(PCCOE)

PSG College of Engineering and Technology

Pt. Dwarka Prasad Mishra Indian Institute of Information Technology, Design & Manufacture Jabalpur

Pune Institute of Computer Technology, Dhankavdi, Pune(PICT)

Punjab Engineering College, Chandigarh (PEC)

R. V. College of Engineering Bangalore(RVCE)

Sardar Patel Institute of Technology, Andheri, Mumbai

Sardar Vallabhbhai National Institute of Technology, Surat

School of Engineering and Applied Science, Ahmedabad (SEAS)

Shri G.S. Institute of Technology & Science, Indore (M.P.) (1952)

Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded

Shri Shankaracharya Technical Campus,(Shri Shankaracharya Group of Institutions).

Shri Vile Parle Kelvani Mandal's Dwarkadas J. Sanghvi College of Engineering, Vile Parle,Mumbai (DJSCE)

Silicon Institute of Technology

Sir M.Visveswaraya Institute of Technology Hunasemaranahalli,Bangalore,

SOA ITER, Bhubaneshwar

Sri Jayachamarajendra College of Engineering(Const. of JSS Univ.) Mysore

Sri Sivasubramaniya Nadar College Of Engg (Autonomous) (SSN)

Srishti Institute of Art and Design, Bangaluru

SSN CoE, Kalavakkam

Symbiosis Institute of Design(SID),Pune

The National Institute of Engineering Mysore (NIE)

Thiagarajar College Of Engineering (Autonomous) (TCE)

University Institute of Technology RGPV, Bhopal (1986)

University of Kalyani, Kalyani

University Visveswariah College of Engineering Bangalore (UVCE)

VASAVI COLLEGE OF ENGINEERING (VCE)

Veer Surendra Sai University of Technology

Veermata Jijabai Technological Institute(VJTI), Matunga, Mumbai

Vellore Institute of Technology(VIT Vellore)

Vidyalankar Institute of Technology,Wadala, Mumbai

Vishwakarma Government Engineering College, Chandkheda,Gandhinagar (VGECG)

Visvesvaraya National Institute of Technology, Nagpur

Vivekanand Education Society's Institute of Technology, Chembur, Mumbai

Walchand College of Engineering, Sangli (WCE)

Field of Study (Graduation)*

BTech

BDES/MDES

BCA

BSc

Others

Upload your CV*

Yes, I would like Talentica Software to contact me. Click here to read our full Privacy Policy.

First Name*

Last Name*

Email ID

Phone*

Message

Yes, I would like Talentica Software to contact me. Click here to read our full Privacy Policy.

Explanation of Git design principles through Git internals

May 21, 2018

Vinod Iyer

Contributor

May 21, 2018

Vinod Iyer

Contributor

In this blog we’ll explore the internals of how Git works. Having some behind-the-scenes working knowledge of Git will help you understand why Git is so much faster than traditional version control systems. Git also helps you recover data from unexpected crash/ delete scenarios.

For a developer, it is quite useful to understand the design principles of Git and see how it manages both speed of access (traversing to previous commits) and small disk space for repository.

In this blog, we will cover the following topics:

Initializing a new repository
Working directory and local repository
Git objects
Blob
Tree
Commit
Tag
Packs

I’m using Ubuntu 16.04 LTS, Zsh terminal and Git v2.7.4 for this blog, but you can use any operating system and terminal to follow along.

Initializing a new repository

Initialize a new Git repository with ‘git init’ command.

Create a couple of files by running the following commands.

$ echo “First file”>>first.txt

$ echo “Second file”>>second.txt

Run ls –la to display all the contents of the folder.
It should show .git directory and the two files we created.

Working directory and local repository

Working directory comprises of the files and folder we want to manage with Git Version Control System (VCS). In this case ~/testRepo folder constitutes our working directory except for ‘.git’ folder. .git folder forms the local repository. ‘.git’ folders contain everything that Git stores to manage our data.

Git objects

Git doesn’t store diff of the contents of the file. It stores snapshots of each file, that is each version of the file is stored exactly as it is at the point it is staged and committed. This is done for faster access, one of the core principles for development of Git.

The latest version of the file is always stored as is in Git as it is most likely the one to be used. Also, storing it makes it much faster for retrieve operations.

As further versions of the file are added, Git automatically creates pack files. We’ll discuss more on pack files later.

There are 4 types of git objects.

Blob
Tree
Commit
Tag

Let’s understand these with an example. First let’s check the contents of .git folder.

There are 5 directories and 3 files. Let’s start with objects.

Find command returned no results, i.e. there are currently no files in objects folder. Let’s stage our two files and then check.

We can see that there are two files. To explore these objects, we would need to use the command ’git cat-file’

From the main pages of the Git cat-file, “Provides content or type and size information for repository objects”

-p argument pretty-prints the object contents and –t returns the type.

The objectId(SHA-1 Id) is subpath of the file from .git/objects folder without the slashes.

For example, object Id of .git/objects/20/d5b672a347112783818b3fc8cc7cd66ade3008 is 20d5b672a347112783818b3fc8cc7cd66ade3008.

The type that was returned for both the objects is blob.

So blobs are objects used for storing content of the file. Blobs just store the content, no other information like file name.

Let’s commit our code now with ‘git commit’ command.

Next, run ‘git log’ to retrieve the commit ID.

Copy the commit ID and then run cat-file commands on it.

The object type that is returned is ‘commit’. It contains reference to a tree object, author name, committer name and the commit message.

Now let’s check the tree object.

The tree object contains reference to the blob files we saw earlier, and also the reference to file names. We can summarize our Git repository state at this point with the following object diagram:

Let’s add a folder to our repository.

Run the following commands. We’ll use ‘git add’ to add folder from working directory to the local repository.

$ mkdir fol1

$ echo “Third File”>> fol1/third.txt

$ git add fol1

$ git commit -m “Second commit”

Inspect the second commit object.

Notice that the tree reference has changed. Also, there is a parent object property. So commit object also stores the parent commit’s id. Let’s inspect the tree.

The folder we just added, fol1, is stored as a tree and it contains reference to a blob referencing third.txt file. So tree objects reference blobs or other sub tree objects.

Now let’s discuss tags. Tags are used to provide an alias to a commit SHA ID for future reference/use. There are two types of tags: lightweight tags and annotated tags.

Lightweight tags contain only the SHA-ID information.

The command ‘git tag light’ creates a file under .git/refs/tags/light which contains the commit Id on which the tag was created. No separate tag object is created. This is mostly used for development purposes, to easily remember and traverse back to a commit.

Annotated tags are usually used for release. They contain extra information like message and the tagger name along with the commit ID. Annotated tags can be created with ‘ git tag –a –m “<message>” ‘ command.

A separate tag object is created for annotated tag. You can list the tags created with ‘git tag’ command

Packs

Although Git stores the contents of the latest versions of object intact, the older versions are stored as deltas in pack files. Let’s understand with an example. We would need a slightly larger file to see the difference in size of the delta file and original file. Download GNU license web page.
Run the following command to download the license HTML file.

$ curl -L -O -C – https://www.gnu.org/licenses/gpl-3.0.en.html

You should now have gpl-3.0.en.H file in your working directory.

Add and commit the file.

$ git add gpl-3.0.en.html

$ git commit -m “Added gpl file”

Inspect the commit and get the blob info of the added file.

git cat-file -p 53550a1c9325753eb44b1428a280bfb2cd5b90ef

The last command returns the size of the blob. That is the blob containing content of gpl-3.0.en.html is 49641 bytes.

4. Edit the file and commit it again.

A new blob is created with a slightly larger size. Now let’s check the original blob.

The original blog still exists. Thus for each change, a snapshot of the file is stored with its contents intact.

5. Let’s pack the files, with ‘git gc’ command.

As you can see, all of our existing blob and commit objects are gone and are now replaced with pack files.

6. We can run ‘git verify-pack’ command to inspect the pack file.

Now let’s check the highlighted entries.

The blob with commit ‘6be03f’ is the second version of gpl-3.0.en.HTML and the one with commit ‘0f6718’ is the first version.

The third column in output represents the blob size. As you can see, the first blob is now reduced to 9 bytes and references the second blob which maintains its size of 49652. Thus the first blob is stored as a delta of second blob although the first one is older. This is because the newer version is most likely to be the one to be used. Git automatically calls pack when pushing to remote repository.

Conclusion

In this blog we explored how Git stores the files internally and looked at the various types of Git objects, i.e, blob, tree and commit and how they are linked to each other. Then we also looked at packs, which explained how Git compresses older versions of a file and saves storage space.