Imagine if your database had the superpower of Git. What if every change to your data, every schema evolution, and every critical update was tracked, diffable, branchable, and mergeable, just like your application code? This isn’t a dream—it’s Dolt.

In the world of software development, Git has become an indispensable tool for managing code, collaborating with teams, and maintaining a complete history of changes. But what about data? Traditional relational databases offer some level of auditing through transaction logs or custom triggers, but they lack the native, powerful versioning capabilities that Git provides for code. This gap often leads to complex data management challenges, especially in collaborative environments or when dealing with critical data transformations.

This chapter introduces you to Dolt, the world’s first SQL database that natively supports Git-like version control. We’ll explore the fundamental concept of “Git for Data,” understand why it’s a game-changer for modern data workflows, and get your environment set up. By the end, you’ll have initialized your first Dolt database, made some changes, and seen how Dolt tracks your data’s evolution with familiar Git commands.

Why Version Control for Data Matters

When your data changes, understanding what changed, who changed it, and when it changed is crucial. In critical applications, data science projects, or regulatory environments, a clear, immutable history of your data isn’t just nice to have—it’s essential.

The Challenges of Unversioned Data

Without robust data versioning, you often face problems like:

  • Debugging Data Issues: Pinpointing when an erroneous data point was introduced or a schema change broke an application becomes a forensic nightmare. Imagine trying to roll back a specific data entry from weeks ago without a clear history!
  • Collaboration Headaches: Multiple data engineers or analysts working on the same dataset can inadvertently overwrite each other’s changes, leading to data inconsistencies and wasted effort. How do you merge independent changes to the same table?
  • Reproducibility Crisis: Data scientists struggle to reproduce machine learning model results because the underlying training data isn’t consistently versioned. Was the model trained on this version of the data or an older one?
  • Auditing and Compliance: Meeting regulatory requirements for data lineage and change tracking is cumbersome, often relying on custom, error-prone solutions.
  • Schema Evolution: Managing database schema changes across development, staging, and production environments is prone to errors and can cause downtime if not handled carefully.

Dolt’s Solution: Git for Your Data

Dolt addresses these challenges by embedding Git’s core principles directly into a SQL database. This means you can use familiar commands like commit, branch, merge, and diff on your data itself.

📌 Key Idea: Dolt treats your entire database—both schema and data—as a version-controlled repository, enabling the same robust workflows you use for code.

Understanding the Git-for-Data Paradigm

At its heart, Dolt is a relational database (MySQL-compatible, or PostgreSQL-compatible in its Doltgres flavor) that stores its data in a content-addressable storage system, much like Git. This architecture lets it track changes down to individual cells.

Let’s break down the core Git concepts and how they apply to Dolt, helping you build a mental model for versioning your data:

  • Commit: In Git, a commit captures a snapshot of your codebase at a specific point in time. In Dolt, a commit captures a snapshot of your entire database (schema and data). Each commit has a unique identifier (hash), an author, a timestamp, and a descriptive message.
    • Why it matters: Commits provide an immutable, atomic record of your database’s state, allowing you to “time travel” to any previous version.
  • Branch: Git branches allow you to develop features or experiments in isolation without affecting the main codebase. Dolt branches let you do the same for your data. You can create a branch to experiment with new data imports, test schema changes, or develop new features that require isolated data environments.
    • Why it matters: Branches enable parallel development and experimentation without risking your production data.
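  • Merge: When a feature branch is complete, you merge its changes back into the main branch. Dolt merges combine data and schema changes from different branches, automatically applying edits that don’t overlap and flagging conflicts (for example, the same cell changed on both branches) for you to resolve.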
    • Why it matters: Merging allows you to integrate validated changes from isolated branches back into your primary data stream.
  • Diff: Git’s diff command shows you the line-by-line differences between two versions of your code. Dolt’s diff shows you the cell-by-cell differences between two versions of your data or schema. This is incredibly powerful for auditing.
    • Why it matters: dolt diff provides precise visibility into what data (or schema) changed between any two points in history, simplifying debugging and auditing.
  • History (log): Just as git log shows your commit history, dolt log displays the chronological history of all commits made to your database.
    • Why it matters: The commit log provides a complete, auditable trail of every change, answering the who, what, and when for your data.
flowchart LR
    A[Initial Database] --> B{Branch or Direct}
    B -->|Branch| C[Modify on Branch]
    B -->|Direct| C
    C --> D[Commit Changes]
    D --> E{Review and Refine}
    E -->|Yes| C
    E -->|No| F{Merge Branch}
    F -->|Yes| G[Merge to Main]
    F -->|No| H[Final Database]
    G --> H

Figure 1.1: Simplified Git-for-Data Workflow in Dolt
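The Branch path in Figure 1.1 maps onto a handful of dolt CLI commands. The sketch below strings them together; try_on_branch is a hypothetical helper name, and it assumes you run it inside a Dolt repository whose default branch is main. Branching and merging get a full treatment in the next chapter.

```shell
# try_on_branch BRANCH SQL MESSAGE  (hypothetical helper, not part of Dolt)
# Runs the "Branch" path of Figure 1.1 with the dolt CLI. Assumes the current
# directory is a Dolt repository whose default branch is "main".
# The body runs in a subshell with `set -e`, so when called as a plain command
# a failing step stops the remaining ones.
try_on_branch() (
  set -e
  branch="$1" sql="$2" message="$3"
  dolt checkout -b "$branch"   # create an isolated branch and switch to it
  dolt sql -q "$sql"           # modify schema or data on the branch
  dolt add .                   # stage every changed table
  dolt commit -m "$message"    # snapshot the branch's state
  dolt checkout main           # go back to the main branch
  dolt merge "$branch"         # bring the committed changes into main
)
```

For example: try_on_branch add-phone "ALTER TABLE customers ADD COLUMN phone_number VARCHAR(20);" "Add phone_number column". In practice you would pause before the merge to review the branch with dolt diff, which is the "Review and Refine" loop in the figure.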

Dolt vs. Doltgres: Choosing Your Flavor

Dolt offers two primary flavors, catering to different SQL ecosystems. Understanding the distinction is crucial for selecting the right tool for your project.

  1. Dolt (MySQL Compatible): This is the original Dolt, providing a MySQL-compatible interface. If you’re familiar with MySQL syntax, tools, or have existing applications built for MySQL, Dolt is your go-to choice. It speaks the MySQL wire protocol, meaning most MySQL clients and connectors work out of the box.
    • Why choose Dolt: Seamless integration with existing MySQL tools, drivers, and applications.
  2. Doltgres (PostgreSQL Compatible): Introduced to support the growing PostgreSQL ecosystem, Doltgres offers PostgreSQL compatibility. It’s ideal for developers and data engineers who prefer PostgreSQL’s features, syntax, or have existing PostgreSQL-based applications. It supports the PostgreSQL wire protocol.
    • Why choose Doltgres: Leverage PostgreSQL’s rich feature set, data types, and tooling, especially for projects requiring advanced SQL capabilities or migrating from existing PostgreSQL systems.

Quick Note: The underlying Git-for-Data mechanics are the same in both flavors, but the SQL dialect and client tooling differ between Dolt (MySQL) and Doltgres (PostgreSQL). Doltgres runs as its own server program, so version-control operations there are generally invoked from SQL rather than through the dolt command line. This chapter uses the dolt CLI throughout, and we’ll highlight Doltgres specifics where relevant, especially given our beginner-friendly project focus on PostgreSQL-style data.

Setting Up Your Dolt Environment

To get started, you’ll need to install Dolt and a compatible SQL client. At the time of writing (2026-06-06), the commands below use v1.35.0 as a placeholder version; always check the official Dolt releases page on GitHub for the latest stable release and substitute that version number.

Step 1: Install Dolt

Dolt can be installed in several ways. We’ll cover Homebrew and a direct binary download (both suited to local development), plus Docker for containerized environments.

Option A: Install via Homebrew (macOS/Linux)

This is often the easiest way for macOS and many Linux distributions.

# Update Homebrew to ensure you get the latest packages
brew update

# Install Dolt
brew install dolt

# Verify the installation and check the version
dolt version

You should see output similar to dolt version 1.35.0.
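If you script your setup (for example in CI), you may want to fail early when the installed Dolt is older than the version your instructions assume. This sketch parses the dolt version output; require_dolt and version_ge are hypothetical helper names, and the comparison assumes a sort that supports -V (as GNU coreutils does).

```shell
# version_ge A B: succeed when version A >= version B (hypothetical helper).
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require_dolt MIN: print a confirmation, or fail if Dolt is older than MIN.
# Assumes `dolt version` prints a line of the form "dolt version X.Y.Z".
require_dolt() {
  local min="$1" have
  have=$(dolt version | awk '/^dolt version/ { print $3; exit }')
  if [ -z "$have" ]; then
    echo "could not determine the installed Dolt version" >&2
    return 1
  fi
  if ! version_ge "$have" "$min"; then
    echo "Dolt $have is older than the required $min" >&2
    return 1
  fi
  echo "Dolt $have OK (>= $min)"
}
```

Usage: require_dolt 1.35.0 || exit 1 at the top of a setup script.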

Option B: Download Binary (macOS/Linux/Windows)

Dolt’s release binaries are published on GitHub at https://github.com/dolthub/dolt/releases. Pick the latest stable release; the commands below use v1.35.0 as a placeholder.

For Linux (replace amd64 with arm64 if on an ARM-based system):

# Download the release archive (adjust version and architecture as needed;
# v1.35.0 is a placeholder for the latest stable version)
wget https://github.com/dolthub/dolt/releases/download/v1.35.0/dolt-linux-amd64.tar.gz

# Extract the archive; the dolt binary is in its bin/ subdirectory
tar -xzf dolt-linux-amd64.tar.gz

# Move it to a directory in your system's PATH (e.g., /usr/local/bin)
sudo mv dolt-linux-amd64/bin/dolt /usr/local/bin/dolt

# Verify the installation
dolt version

For Windows, download the .msi installer and follow the graphical instructions.
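The Linux commands above hardcode linux-amd64. A small helper can derive the download URL from uname instead. This is a sketch: dolt_asset_url is a hypothetical name, and it assumes release assets follow the dolt-<os>-<arch>.tar.gz naming pattern, so confirm the file names on the releases page before relying on it.

```shell
# dolt_asset_url VERSION [OS] [ARCH]  (hypothetical helper)
# OS and ARCH default to `uname -s` and `uname -m`. Assumes assets are named
# dolt-<os>-<arch>.tar.gz under releases/download/v<VERSION>/.
dolt_asset_url() {
  local version="$1" os="${2:-$(uname -s)}" arch="${3:-$(uname -m)}"
  case "$os" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$arch" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  echo "https://github.com/dolthub/dolt/releases/download/v${version}/dolt-${os}-${arch}.tar.gz"
}
```

Usage: wget "$(dolt_asset_url 1.35.0)" downloads the archive matching the current machine.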

Option C: Using Docker

For a containerized environment, Dolt provides official Docker images. This is great for isolated testing or CI/CD pipelines.

# Pull the latest Dolt image (using v1.35.0 as the placeholder)
docker pull dolthub/dolt:v1.35.0

# Verify by running a simple command within the container
docker run dolthub/dolt:v1.35.0 dolt version

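For Doltgres, use its own Docker image (published as dolthub/doltgresql at the time of writing; confirm the name and available tags on Docker Hub). Doltgres has its own version numbers and ships a doltgres server binary rather than the dolt CLI, so don’t reuse the Dolt tag or the dolt version check.

# Pull the Doltgres image
docker pull dolthub/doltgresql:latest

# Start a Doltgres server, publishing PostgreSQL's default port
docker run -p 5432:5432 dolthub/doltgresql:latest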

Step 2: Install a SQL Client

You’ll need a SQL client to interact with your Dolt database.

  • For Dolt (MySQL compatible): The standard mysql command-line client (often available via your OS package manager, e.g., sudo apt install mysql-client on Debian/Ubuntu, brew install mysql-client on macOS) or a GUI tool like DBeaver, MySQL Workbench, or DataGrip.
  • For Doltgres (PostgreSQL compatible): The psql command-line client (part of the postgresql-client package on many systems, e.g., sudo apt install postgresql-client or brew install libpq on macOS) or a GUI tool like DBeaver, pgAdmin, or DataGrip.

For this guide, we’ll primarily use dolt sql for simplicity, which provides a built-in SQL shell. However, understanding how to connect with external clients is valuable for real-world applications.
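When you do use an external client, the main things to get right are host, port, and user. The wrapper below is a sketch for the mysql client; dolt_mysql is a hypothetical name, and its defaults assume a local dolt sql-server with the default root user (no password) on port 3306, so adjust them to your server’s configuration.

```shell
# dolt_mysql [MYSQL_ARGS...]  (hypothetical wrapper around the mysql client)
# Defaults assume a local dolt sql-server on port 3306 with user "root" and
# no password; override with DOLT_HOST, DOLT_PORT, or DOLT_USER.
dolt_mysql() {
  mysql --host="${DOLT_HOST:-127.0.0.1}" \
        --port="${DOLT_PORT:-3306}" \
        --user="${DOLT_USER:-root}" \
        "$@"
}
```

Usage: dolt_mysql my_customer_data -e "SELECT * FROM customers;", assuming the server exposes the repository under the name my_customer_data (Dolt derives the database name from the directory name; SHOW DATABASES; confirms it).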

Step-by-Step Implementation: Your First Dolt Database

Let’s create our first version-controlled database. We’ll simulate a simple customer database, tracking changes to both its schema and data.

Step 1: Initialize a New Dolt Repository

Open your terminal and navigate to a directory where you want to create your database project.

# Create a dedicated directory for your project
mkdir my-customer-data
cd my-customer-data

# Initialize a Dolt repository within this directory
dolt init

You should see output similar to:

Successfully initialized dolt data repository.

This command creates a hidden .dolt directory, just like git init creates a .git directory. This special directory is where Dolt stores all its versioning information, allowing it to track every change.
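dolt init creates the repository’s first commit, so it needs an author name and email; on a fresh machine it can fail until you set them with dolt config. The sketch below bundles that check with the steps above. init_dolt_repo is a hypothetical helper name, and it assumes dolt config --get exits non-zero when a key is missing, so it only sets the global identity when none is configured.

```shell
# init_dolt_repo DIR NAME EMAIL  (hypothetical helper)
# Ensures a global Dolt identity exists, then creates DIR and runs `dolt init`.
# Runs in a subshell, so the `cd` and `set -e` don't leak into your shell.
init_dolt_repo() (
  set -e
  dir="$1" name="$2" email="$3"
  # Only set values that aren't configured yet.
  dolt config --global --get user.name >/dev/null 2>&1 ||
    dolt config --global --add user.name "$name"
  dolt config --global --get user.email >/dev/null 2>&1 ||
    dolt config --global --add user.email "$email"
  mkdir -p "$dir"
  cd "$dir"
  dolt init
)
```

Usage: init_dolt_repo my-customer-data "Ada Example" "ada@example.com" (substitute your own name and email).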

Step 2: Start the Dolt SQL Server

To let SQL clients connect to your database over the network, start the Dolt SQL server from inside the my-customer-data directory. It listens for SQL connections just like a traditional MySQL server.

dolt sql-server

By default, the server listens on port 3306, the standard MySQL port. (Doltgres is a separate program, started with its own doltgres command, and listens on PostgreSQL’s port 5432.) Keep this terminal window open; it is running your server process.
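If you’d rather not dedicate a terminal to the server, you can run it in the background and send its output to a log file. A minimal sketch, assuming it is called from inside the repository directory; start_dolt_server is a hypothetical helper, and binding to 127.0.0.1 keeps the server reachable only from your own machine.

```shell
# start_dolt_server [PORT] [LOGFILE]  (hypothetical helper)
# Starts `dolt sql-server` in the background, sends its output to LOGFILE,
# and prints the server's PID. Call it from inside the repository directory.
start_dolt_server() {
  local port="${1:-3306}" logfile="${2:-dolt-sql-server.log}"
  # 127.0.0.1 accepts connections from this machine only.
  dolt sql-server --host 127.0.0.1 --port "$port" >"$logfile" 2>&1 &
  echo "$!"
}
```

Usage: pid=$(start_dolt_server 3306), then kill "$pid" when you’re done.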

Step 3: Connect to Your Dolt Database and Create a Table

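Open a new terminal window and change into the same my-customer-data directory. We’ll use the dolt sql command, Dolt’s built-in SQL shell. Run inside a repository directory, it works with or without a running server, which makes it convenient for quick experiments.

# Open Dolt's built-in SQL shell (run from inside my-customer-data)
dolt sql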

You’ll now be at an interactive SQL prompt, much like the mysql> prompt, where you can execute SQL statements directly against your Dolt database.

Let’s create a simple customers table. Notice the TIMESTAMP DEFAULT CURRENT_TIMESTAMP for created_at—a common practice to track when a record was added.

CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Press Enter to execute the CREATE TABLE statement. You should see Query OK (or similar for Doltgres).

Now, let’s add some initial data to our new table. We’re inserting two customer records.

INSERT INTO customers (id, name, email) VALUES
(1, 'Alice Smith', 'alice.smith@example.com'),
(2, 'Bob Johnson', 'bob.johnson@example.com');

After inserting, let’s verify that the data is there.

SELECT * FROM customers;

You should see your two customer records listed.
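Typing statements into the shell is fine for exploring, but for repeatable setup you can feed the same SQL to dolt sql on standard input. This sketch wraps the CREATE TABLE and INSERT statements in a hypothetical seed_customers function, using illustrative example.com addresses; run it from inside the repository directory.

```shell
# seed_customers  (hypothetical helper)
# Pipes the schema and starter rows to `dolt sql` via a quoted heredoc,
# so nothing inside the SQL is expanded by the shell.
seed_customers() {
  dolt sql <<'SQL'
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO customers (id, name, email) VALUES
(1, 'Alice Smith', 'alice.smith@example.com'),
(2, 'Bob Johnson', 'bob.johnson@example.com');
SQL
}
```

Keeping setup in a script like this means a fresh repository can be rebuilt the same way every time, which helps with the reproducibility concerns from the start of the chapter.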

Step 4: Commit Your Changes

So far, we’ve created a table and added data, but these changes haven’t been “versioned” yet. They’re currently in your working set, much like changes you’ve made in your code editor before you git add and git commit.

Exit the dolt sql prompt by typing exit.

Back at the shell prompt in this terminal (not the one running the server), use dolt status to see your uncommitted changes:

dolt status

You’ll see customers listed as a new table that hasn’t been staged yet. Because Dolt versions schema and data together, that one entry covers both the table definition and the rows you inserted.
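dolt status is meant for people to read. In a script, you can ask the same question through SQL, since Dolt exposes the working set’s status as the dolt_status system table. A sketch, assuming dolt sql’s CSV output mode (-r csv) prints a header row followed by the count; has_uncommitted_changes is a hypothetical name.

```shell
# has_uncommitted_changes  (hypothetical helper)
# Succeeds when the dolt_status system table lists at least one changed table.
has_uncommitted_changes() {
  local count
  # CSV output: header row, then the value; keep only the last line.
  count=$(dolt sql -r csv -q "SELECT COUNT(*) FROM dolt_status;" | tail -n 1 | tr -d '\r')
  [ "${count:-0}" -gt 0 ]
}
```

Usage: if has_uncommitted_changes; then dolt status; fi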

Now, let’s commit these changes to your database’s history. This is the moment you create a permanent snapshot.

dolt add .
dolt commit -m "Initial commit: Created customers table and added two customers"

dolt add . stages all changed tables (schema and data) for the next commit, just as git add . stages files. dolt commit -m "..." then records a new snapshot in your database’s history along with your descriptive message.

You’ll see output confirming the commit, including a unique commit hash. Congratulations! You’ve just made your first data commit with Dolt.
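Any SQL client can do the same staging and committing through Dolt’s DOLT_ADD and DOLT_COMMIT stored procedures, whose arguments largely mirror the CLI flags. The sketch below sends them through dolt sql; commit_via_sql is a hypothetical helper, and it doubles single quotes in the message so that a message like “Alice Smith’s email” stays a valid SQL string.

```shell
# commit_via_sql MESSAGE  (hypothetical helper)
# Stages all changes and commits them using Dolt's SQL stored procedures,
# the same calls an external MySQL client could send.
commit_via_sql() {
  local msg
  # Double any single quotes so MESSAGE is a valid SQL string literal.
  msg=$(printf '%s' "$1" | sed "s/'/''/g")
  dolt sql <<SQL
CALL DOLT_ADD('-A');
CALL DOLT_COMMIT('-m', '$msg');
SQL
}
```

Usage: commit_via_sql "Added Charlie Brown and updated Alice Smith's email". From an external client such as mysql, you would send the two CALL statements directly.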

Exploring Your Data’s History: dolt log and dolt diff

Now that we have a commit, let’s see how Dolt helps us track history and inspect granular changes.

Step 1: Make More Changes

Let’s add another customer and update an existing one to create new changes to track.

Open the SQL shell again by running dolt sql from inside the my-customer-data directory.

-- Add a new customer
INSERT INTO customers (id, name, email) VALUES
(3, 'Charlie Brown', 'charlie.brown@example.com');

-- Update an existing customer's email
UPDATE customers SET email = 'alice.s@example.com' WHERE id = 1;

-- Verify the current state of the table
SELECT * FROM customers;

You should now see three customers, with Alice’s email updated.

Step 2: Inspect Uncommitted Changes with dolt diff

Exit the dolt sql prompt. Back at the shell prompt (not the server terminal), let’s review what changed before committing.

dolt diff

This command shows the differences between your current working set (the changes you just made) and the last commit. In the customers table you’ll see:

  • The new Charlie Brown row, marked as added (+).
  • Alice Smith’s row, marked as modified, with the old values (<) shown above the new values (>).

🧠 Important: dolt diff is incredibly powerful. It shows you cell-level changes (which values changed in which cells), making it easy to review exactly what data has been modified. This is far more granular than typical database audit logs.
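The same changes are available in two other forms: dolt diff --stat prints per-table counts, and the dolt_diff_<table> system table exposes each changed row with from_ and to_ columns plus a diff_type of added, modified, or removed; filtering on to_commit = 'WORKING' selects uncommitted changes. A sketch; review_customer_changes is a hypothetical name, and the column list assumes the customers table defined earlier.

```shell
# review_customer_changes  (hypothetical helper)
# Summarizes uncommitted changes, then lists changed customers rows from the
# dolt_diff_customers system table. Column names assume the customers table above.
review_customer_changes() {
  dolt diff --stat
  dolt sql -q "SELECT diff_type, from_id, to_id, from_email, to_email
               FROM dolt_diff_customers
               WHERE to_commit = 'WORKING';"
}
```

Because the diff is a table, you can filter, join, or aggregate it like any other query result, which is handy for audits.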

Step 3: Commit the New Changes

Now that we’ve reviewed the changes, let’s commit them to our history.

dolt add .
dolt commit -m "Added Charlie Brown and updated Alice Smith's email"

You’ll receive a new commit hash, marking this as the second snapshot in your database’s history.

Step 4: View the Commit History with dolt log

Now that we have two commits, let’s look at the database’s complete history.

dolt log

You’ll see a list of your commits, ordered from newest to oldest, each with its unique commit hash, author, date, and the descriptive message you provided. This is your database’s complete, auditable history, just like a Git log for your code!
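History is also queryable from SQL, which is how the “time travel” mentioned earlier works in practice. This sketch prints a compact log and then reads customers as it was one commit before HEAD using Dolt’s AS OF clause; show_history is a hypothetical helper name.

```shell
# show_history [N]  (hypothetical helper)
# Prints the last N commits (default 5) on one line each, then queries the
# customers table as of the commit before HEAD.
show_history() {
  local n="${1:-5}"
  dolt log -n "$n" --oneline
  dolt sql -q "SELECT * FROM customers AS OF 'HEAD~1';"
}
```

After the two commits above, the AS OF query should show Alice’s original email and no Charlie Brown row.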

Mini-Challenge

It’s your turn to practice and solidify your understanding of Dolt’s core versioning capabilities.

Challenge:

  1. Add a new column phone_number to your customers table.
  2. Update one of your existing customers to include a phone number in this new column.
  3. Add another new customer with both an email and a phone number.
  4. Use dolt status to see all your pending changes (both schema and data).
  5. Use dolt diff to review the exact changes you’ve made (schema modification and data insertions/updates).
  6. Commit your changes with a descriptive message like “Added phone_number column and updated customer details.”
  7. Finally, use dolt log to confirm your new commit is part of the database history.

Hint:

  • Run the SQL statements in the dolt sql shell, started from inside my-customer-data.
  • For adding a column: ALTER TABLE customers ADD COLUMN phone_number VARCHAR(20);
  • For updating data: UPDATE customers SET phone_number = '555-1234' WHERE id = 1;
  • Remember to dolt add . before dolt commit to stage your changes.

What to observe/learn: Pay close attention to how dolt diff displays not just data changes but also schema changes (e.g., adding a column). This clearly demonstrates Dolt’s ability to version everything in your database.

Common Pitfalls & Troubleshooting

As you get started with Dolt, you might encounter a few common issues. Here’s how to navigate them:

  1. Forgetting to Commit: Just like Git, changes you make in dolt sql (or via any external SQL client) live in the working set and are not part of the version history until you dolt add . and dolt commit. Closing your terminal doesn’t discard them, since the working set is saved on disk, but they won’t show up in dolt log. Commit after each meaningful change!
    • Troubleshooting: If you made changes and dolt log doesn’t show them, run dolt status to see uncommitted changes, then dolt add . and dolt commit.
  2. Confusing dolt and git Commands: While the commands are intentionally similar, remember you’re interacting with dolt for database operations, not git. For example, it’s dolt branch, dolt merge, dolt remote—not git branch, etc.
    • Troubleshooting: If a command isn’t working, double-check that you’re using dolt as the prefix.
  3. Dolt SQL Server Not Running: External clients (mysql, DBeaver, your application) can’t connect unless dolt sql-server is running. The built-in dolt sql shell is different: run inside the repository directory, it works with or without a server.
    • Troubleshooting: If an external client reports a refused connection, start dolt sql-server from the repository directory in its own terminal, and make sure the client uses the same host and port.
  4. Port Conflicts: By default, Dolt (MySQL compatible) listens on port 3306 and Doltgres (PostgreSQL compatible) on port 5432. If another MySQL or PostgreSQL server is already using the same port, the Dolt server will fail to start.
    • Troubleshooting: You can specify a different port using dolt sql-server --port <your_port_number>. For example, dolt sql-server --port 3307. Then, connect your client to that specified port.
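To sidestep port conflicts in scripts, you can probe for a free port before starting the server. This sketch relies on bash’s /dev/tcp pseudo-device (not available in every shell) and only checks the local machine; port_in_use and first_free_port are hypothetical helper names.

```shell
# port_in_use PORT [HOST]  (hypothetical helper, bash-only)
# Succeeds if something already accepts TCP connections on HOST:PORT.
port_in_use() {
  local port="$1" host="${2:-127.0.0.1}"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# first_free_port START: print the first port at or above START that nothing
# is listening on, checking at most 20 ports.
first_free_port() {
  local port="$1" limit=$(( $1 + 20 ))
  while [ "$port" -lt "$limit" ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}
```

Usage: dolt sql-server --port "$(first_free_port 3306)", then point your client at the printed port.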

Summary

In this introductory chapter, you’ve taken your first steps into the powerful world of Dolt:

  • We explored the critical need for data version control and how traditional databases often fall short in providing comprehensive historical tracking.
  • You learned about Dolt’s Git-for-Data paradigm, understanding how familiar concepts like commits, branches, merges, and diffs apply directly to your database schema and data.
  • We distinguished between Dolt (MySQL compatible) and Doltgres (PostgreSQL compatible), helping you understand which flavor to choose based on your ecosystem preferences.
  • You successfully installed Dolt and set up your local development environment.
  • Through hands-on exercises, you initialized your first Dolt database, created tables, inserted data, and, most importantly, committed your changes to create version snapshots.
  • You experienced the power of dolt log for viewing the complete history of your database and dolt diff for inspecting granular data and schema changes between versions.

You’ve now laid the foundation for treating your data with the same rigorous version control as your code. This paradigm shift will empower you to build more robust, auditable, and collaborative data workflows.

What’s Next? In the next chapter, we’ll dive deeper into Dolt’s advanced versioning capabilities, exploring how to use branching and merging to manage parallel data development, experiment with changes safely, and resolve conflicts, just like you would with source code.

References

  • DoltHub Documentation: https://docs.dolthub.com/
  • Dolt Installation Guide: https://docs.dolthub.com/introduction/installation
  • Dolt CLI Commands Reference: https://docs.dolthub.com/cli-reference/cli-commands
  • Dolt vs. Doltgres Comparison: https://docs.dolthub.com/introduction/dolt-vs-doltgres

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.