Make sense of what Git actually is


A deep dive into Git fundamentals: understanding Git as a database of snapshots rather than a collection of commands. Covers DAG architecture, commits, branches as pointers, HEAD, checkout/reset/revert operations, rebasing strategies, and practical recovery using reflog. Essential knowledge for full-stack developers working with complex backend systems and CI/CD pipelines.

We have all been there. It is 11:00 PM, you are trying to push a critical fix, and suddenly Git throws an error you don't understand. You try a git pull, then maybe a git merge, and things get worse. Finally, you find yourself Googling "how to undo git rebase," mass-copying commands from Stack Overflow, and praying you don't delete three days of work.

The uncomfortable truth is that many of us, even experienced full-stack developers, don't actually understand Git. We memorize a handful of commands: commit, push, pull. That works until the moment it doesn't.

I used to treat Git like a black box, terrified of touching anything outside the standard workflow. But as I moved into more complex backend architectures and intense debugging sessions, I realized that memorizing commands wasn't enough. I needed to understand the data structure underneath.

In this post, I want to strip away the interface and look at Git for what it actually is: a database. Once you understand the graph architecture, you will never fear a merge conflict or a detached HEAD again.

The Database of Snapshots

To understand Git, you have to forget the idea that it stores "changes." It doesn't.

Git is a database, and its fundamental unit is the Commit.

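A commit is not a diff. It is a complete snapshot, a photograph, of your entire project at a specific moment in time. When you run git commit, Git records the state of every tracked file as it currently sits in the staging area. Files that have not changed are not stored again; the new snapshot simply points at the copy Git already has.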

The Anatomy of a Commit

Every commit object in the database contains three specific things:

  1. The Snapshot Pointer: A link to the complete state of the project.
  2. Metadata: The author, timestamp, and commit message.
  3. The Parent Pointer: A link to the commit that came directly before it. (The very first commit has no parent, and a merge commit has more than one.)

This parent pointer is crucial. When you make a new commit, Git saves the project state and links it back to where you were. This creates a chain. Children know their parents, but parents never know their future children.
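If you want to see those three parts for yourself, here is a minimal sketch. It assumes nothing beyond a working git install; the throwaway repository, the file name app.txt, and the "Demo Author" identity are placeholders I made up for the demo.

```shell
set -e
# Throwaway repository in a temp directory (all names here are placeholders).
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "first commit"
echo "world" >> app.txt
git commit -q -am "second commit"

# Print the raw commit object: a "tree" line (the snapshot pointer),
# a "parent" line, author/committer metadata, then the message.
git cat-file -p HEAD

tree=$(git rev-parse 'HEAD^{tree}')          # the snapshot this commit points to
parent=$(git rev-parse 'HEAD^')              # the commit that came before it
first=$(git rev-list --max-parents=0 HEAD)   # the root commit (it has no parent)
```

The first commit's object has no parent line at all, which is exactly the "children know their parents" rule in action.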

The Directed Acyclic Graph (DAG)

If you visualize this, you get a structure known in computer science as a Directed Acyclic Graph (DAG).

  • Directed: Relationships go one way (Child → Parent).
  • Acyclic: There are no loops (a commit cannot be its own ancestor).
  • Graph: It is a web of nodes and connections.

This graph is your project's history. Every branch, merge, and feature is just a path through this graph. Because every commit is a full snapshot, you can jump to any node in this graph and see the project exactly as it existed, without needing to "replay" changes.
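To make the "jump to any node" idea concrete, here is a small sketch that reads a file straight out of an older snapshot without touching the working directory. The repository, the file notes.txt, and its contents are assumptions for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

# Three commits, each overwriting the same (hypothetical) file.
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "write v1"
echo "v2" > notes.txt
git commit -q -am "write v2"
echo "v3" > notes.txt
git commit -q -am "write v3"

# Each output line is a commit followed by its parent(s): the edges run child -> parent.
git rev-list --parents HEAD

# Read the file directly from two different snapshots.
old=$(git show HEAD~2:notes.txt)   # content as of the first commit
new=$(git show HEAD:notes.txt)     # content as of the latest commit
echo "oldest snapshot: $old, newest snapshot: $new"
```

Git did not apply or undo any patches to answer the HEAD~2 question; it looked up that commit's snapshot and read the file from it.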

Here is an article to learn more about the Git DAG: https://medium.com/@a.kago1988/why-the-git-graph-is-a-directed-acyclic-graph-dag-f9052b95f97f

Exploration: Branches are Just Sticky Notes

The most common misconception I see is that branches are heavy, separate copies of the codebase. This is technically incorrect.

A branch in Git is nothing more than a pointer. It is a lightweight file containing 40 hexadecimal characters: the SHA-1 hash of a specific commit. You can think of a branch as a sticky note attached to a specific commit in the graph.

How Branching Actually Works

When you run git branch feature-login, Git simply creates a new sticky note pointing to your current commit. That is why branching is instantaneous—you aren't copying files; you are writing a checksum to a text file.

When you make a new commit while on a branch:

  1. Git creates the new commit object, whose parent is the commit the branch currently points to.
  2. Git moves the "sticky note" (the branch pointer) forward to the new commit.
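You can peek at the sticky note directly. This is a sketch under a couple of assumptions: the repository uses Git's default loose-file ref storage (so a freshly created branch lives at .git/refs/heads/<name>), and the file and branch names are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "first commit"
start=$(git rev-parse HEAD)

# Creating a branch writes one small file containing a commit hash.
git branch feature-login
branch_hash=$(cat .git/refs/heads/feature-login)
echo "feature-login -> $branch_hash"

# Committing on the branch moves only that branch's pointer forward.
git checkout -q feature-login
echo "login form" >> app.txt
git commit -q -am "work on login"

main_after=$(git rev-parse main)              # unchanged
feature_after=$(git rev-parse feature-login)  # now the new commit
feature_parent=$(git rev-parse 'feature-login^')
```

After the second commit, main still points at the first commit while feature-login has moved one step ahead of it.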

The HEAD Pointer

So how does Git know where you are? Enter HEAD.

HEAD is a special pointer that usually points to a branch name.

  • Normal State: HEAD → main → Commit C1.
  • Switching Branches: When you run git checkout feature, you are just moving the HEAD pointer to look at the feature sticky note.
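HEAD is just as easy to inspect. A minimal sketch, assuming a throwaway repository and a placeholder branch called feature:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "first commit"
git branch feature

# HEAD is a tiny file that names the branch you are on.
cat .git/HEAD                      # prints: ref: refs/heads/main
before=$(git symbolic-ref HEAD)

# Switching branches rewrites that one line.
git checkout -q feature
after=$(git symbolic-ref HEAD)
cat .git/HEAD                      # prints: ref: refs/heads/feature
```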

The "Detached HEAD" State

This often scares developers, but it shouldn't. If you check out a specific commit hash instead of a branch name (e.g., git checkout a1b2c3d), HEAD points directly to that commit.

There is no branch sticky note in between. You are in a "Detached HEAD" state. You can make commits here, but since no branch pointer is tracking them, they will be orphaned as soon as you switch away. They effectively float in space until Git's garbage collector eats them.
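Here is a sketch of the detached state and the usual way out of it: put a sticky note on the commit before you leave. The branch name rescue-experiment and the file names are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "one" > app.txt
git add app.txt
git commit -q -m "first commit"
echo "two" >> app.txt
git commit -q -am "second commit"

# Check out a commit hash instead of a branch name: HEAD is now detached.
first=$(git rev-parse HEAD~1)
git checkout -q "$first"
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD is $state"

# Commits made here belong to no branch.
echo "experiment" > exp.txt
git add exp.txt
git commit -q -m "experiment"
orphan=$(git rev-parse HEAD)

# Attach a branch to the commit before switching away, so it stays reachable.
git branch rescue-experiment
git checkout -q main
rescued=$(git rev-parse rescue-experiment)
```

Without the git branch line, the experiment commit would be reachable only through the reflog once you switched back to main.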

Implementation: Checkout vs. Reset vs. Revert

The biggest source of data loss in Git comes from confusing the three commands that "undo" things. They manipulate the three distinct layers of Git differently:

  1. Working Directory: The actual files on your disk.
  2. Staging Area (Index): The waiting room for the next commit.
  3. Repository: The permanent history database.

Here is how the operations differ:

1. Checkout: The Safe Observer

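git checkout <branch-or-commit> moves HEAD and updates your working directory and staging area to match the snapshot HEAD now points to. Crucially, it does not change history. No branches move. As far as the graph is concerned, it is a read-only operation. (The file form, git checkout -- <file>, is the exception to "safe": it overwrites uncommitted changes in that file.)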

2. Reset: The Time Traveler

git reset is dangerous because it moves the branch pointer backward. If you are on main and run git reset <commit-hash>, you are forcibly moving the main sticky note back to an older commit.

The commits you left behind become orphaned. But reset has three modes that determine what happens to your work in progress:

| Mode | Command | Effect on History | Effect on Staging | Effect on Working Dir | Use Case |
| --- | --- | --- | --- | --- | --- |
| Soft | git reset --soft | Moves branch back | Unchanged | Unchanged | Squashing multiple commits into one. |
| Mixed | git reset (default) | Moves branch back | Resets to match target | Unchanged | Unstaging work to commit it differently. |
| Hard | git reset --hard | Moves branch back | Resets to match target | DELETES CHANGES | Abandoning work completely. |

Warning: I have watched developers lose days of work with git reset --hard. It overwrites both the staging area and the working directory. Uncommitted changes to tracked files are gone for good, because Git never stored them anywhere.
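The three modes are easiest to understand by running them one after another against the same target and watching one more layer get reset each time. This is a sketch in a throwaway repository; the file f.txt and the commits "A" and "B" are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "A"
echo "two" > f.txt
git commit -q -am "B"
a=$(git rev-parse HEAD~1)

# --soft: only the branch pointer moves. B's change is still staged.
git reset -q --soft "$a"
soft_staged=$(git diff --cached --name-only)     # f.txt

# --mixed (the default): the staging area is reset too. The change survives on disk only.
git reset -q --mixed "$a"
mixed_staged=$(git diff --cached --name-only)    # empty
mixed_unstaged=$(git diff --name-only)           # f.txt
mixed_content=$(cat f.txt)                       # two

# --hard: the working directory is overwritten as well. The edit is gone from disk.
git reset -q --hard "$a"
hard_status=$(git status --porcelain)            # empty
hard_content=$(cat f.txt)                        # one
```

In this demo the "lost" content still exists inside commit B, which is why the reflog can bring it back. An edit that was never committed would have no such safety net.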

3. Revert: The Forward Fix

git revert is the safest way to undo changes in a shared environment. It doesn't move pointers backward or rewrite history. Instead, it creates a new commit that applies the exact inverse of a previous commit's changes.

If Commit A added 50 lines of code, git revert A creates Commit B that deletes those 50 lines. History only ever moves forward, and you haven't destroyed the timeline for your teammates.
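A quick sketch of that behaviour, again in a throwaway repository with placeholder file contents and commit messages:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

printf 'line1\n' > f.txt
git add f.txt
git commit -q -m "base"
printf 'line1\nline2\n' > f.txt
git commit -q -am "A: add line2"
a=$(git rev-parse HEAD)

# Revert A: Git adds a NEW commit on top that undoes A's change.
git revert --no-edit HEAD >/dev/null

count=$(git rev-list --count HEAD)   # 3 commits: base, A, and the revert
parent=$(git rev-parse 'HEAD^')      # the revert sits on top of A, which is still in history
content=$(cat f.txt)                 # back to just "line1"
git log --oneline
```

The snapshot after the revert is identical to the one before A, but nobody's history was rewritten to get there.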

Production Insights: Rebase and The Safety Net

The Truth About Rebase

git rebase is often sold as a way to "clean up" history, but you need to understand the mechanism.

When you rebase a feature branch onto main:

  1. Git finds the commits on your branch that are not yet on main.
  2. It calculates the changes (diffs).
  3. It creates brand new commits with new hashes, applying those changes on top of the new main.
  4. It moves your branch pointer to the new commits.
  5. The old commits are orphaned.
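The "brand new commits" part is easy to verify. In this sketch the branch name feature and the file names are placeholders; the point is only to compare the hash of the feature commit before and after the rebase.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# One commit on a feature branch...
git checkout -q -b feature
echo "feat" > feature.txt
git add feature.txt
git commit -q -m "feature work"
old=$(git rev-parse feature)

# ...while main moves on in the meantime.
git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "main work"

# Replay the feature commit on top of the new main.
git checkout -q feature
git rebase -q main
new=$(git rev-parse feature)

echo "before rebase: $old"
echo "after rebase:  $new"
new_parent=$(git rev-parse 'feature^')   # now the tip of main
old_type=$(git cat-file -t "$old")       # the old commit object still exists, just orphaned
```

Same message, same change, different hash. That difference is what a colleague who already pulled the old commit would trip over.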

The Golden Rule: Never rebase a branch that you have pushed to a shared repository.

If a colleague has pulled your branch, and you rebase it, you have effectively rewritten the history they are standing on. When they try to pull again, Git will see two divergent histories for the same code, leading to "merge conflict hell."

However, for local branches that haven't left my machine, I use rebase constantly. It keeps the history linear and makes debugging regressions much easier later on.

The Reflog: Your Undo Button

If you panicked and ran a git reset --hard or messed up a rebase, your committed work is likely still there.

Git rarely deletes anything immediately. It just hides it. You can access the Reference Log (Reflog):

```shell
git reflog
```

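This command shows a chronological list (newest first) of everywhere HEAD has pointed recently. Every commit, every checkout, every reset is logged here. If you can find the hash of the commit you "lost" in this list, you can check out that hash and create a new branch to save it. Keep in mind that the reflog is local to your clone and its entries expire eventually (after 30 to 90 days by default), so recover sooner rather than later.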
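Here is a sketch of a full "oops and recover" cycle. The repository, the file f.txt, and the branch name rescue are placeholders; in a real incident you would read the hash or the HEAD@{n} entry off your own reflog output rather than assume it is HEAD@{1}.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "keep"
echo "two" > f.txt
git commit -q -am "important work"
lost=$(git rev-parse HEAD)

# The mistake: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
git --no-pager reflog

# In this demo the previous position is HEAD@{1}; put a branch on it to save it.
recovered=$(git rev-parse 'HEAD@{1}')
git branch rescue "$recovered"
rescued_content=$(git show rescue:f.txt)   # two
```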

Official docs for git reflog: https://git-scm.com/docs/git-reflog
Git log vs. Git reflog: https://gitprotect.io/blog/how-to-use-git-reflog-reflog-vs-log/

Broader Context & Future

Why does this deep dive matter?

In modern DevOps environments, we rely heavily on CI/CD pipelines. These pipelines are usually triggered by events such as pushes, tags, or pull requests, and each of those events resolves to a specific commit. Understanding that a commit is a snapshot and a branch is a pointer helps you design better deployment strategies.

For example, "tagging" a release just creates a named pointer to a specific commit snapshot, one that by convention never moves. Depending on how you deploy, rolling back can be as simple as pointing the production environment at a previous snapshot.
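A tiny sketch of the tag-as-pointer idea. The tag name v1.0.0 and the file contents are placeholders, and this uses a lightweight tag; your release process may well use annotated tags instead.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "release build" > f.txt
git add f.txt
git commit -q -m "release candidate"
release=$(git rev-parse HEAD)

# A lightweight tag is a named pointer to one commit.
git tag v1.0.0
tag_before=$(git rev-parse v1.0.0)

# Unlike a branch, the tag stays put when new commits arrive.
echo "next feature" > f.txt
git commit -q -am "keep developing"
tag_after=$(git rev-parse v1.0.0)

# The released snapshot is still readable, exactly as it was.
released_content=$(git show v1.0.0:f.txt)   # release build
```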

As we move toward microservices (something I'm currently working on with backend architectures), keeping git history clean and understandable becomes critical for tracing bugs across distributed systems.

Actionable Summary

To master Git, stop thinking in commands and start thinking in graph topology:

  1. Commits are Snapshots: They are the immutable truth of your project at a point in time.
  2. Branches are Pointers: They are just mutable labels. Moving them is cheap.
  3. Visualize the Graph: Before running a complex command, picture the DAG. Are you moving HEAD (checkout) or moving the Branch (reset)?
  4. Rebase Locally, Merge Globally: Keep your local history clean, but never rewrite history that others are relying on.
  5. Reflog is King: If you mess up, check the reflog immediately.

The next time you are staring at a terminal at 11 PM, don't copy-paste blindly. Visualize the pointers, manipulate the graph, and fix the problem at the root. It sounds complicated, but you'll get used to it... sooner or later, I guess 😅

PS: Credit where it's due. I wrote this post after watching a video that explains Git the way I had come to understand it, and I thought I should write it up as well. Here is the link to the video: https://www.youtube.com/watch?v=Ala6PHlYjmw