Imagine if your database had the superpower of Git. What if every change to your data, every schema evolution, and every critical update was tracked, diffable, branchable, and mergeable, just like your application code? This isn’t a dream—it’s Dolt.
In the world of software development, Git has become an indispensable tool for managing code, collaborating with teams, and maintaining a complete history of changes. But what about data? Traditional relational databases offer some level of auditing through transaction logs or custom triggers, but they lack the native, powerful versioning capabilities that Git provides for code. This gap often leads to complex data management challenges, especially in collaborative environments or when dealing with critical data transformations.
This chapter introduces you to Dolt, the world’s first SQL database that natively supports Git-like version control. We’ll explore the fundamental concept of “Git for Data,” understand why it’s a game-changer for modern data workflows, and get your environment set up. By the end, you’ll have initialized your first Dolt database, made some changes, and seen how Dolt tracks your data’s evolution with familiar Git commands.
Why Version Control for Data Matters
When your data changes, understanding what changed, who changed it, and when it changed is crucial. In critical applications, data science projects, or regulatory environments, a clear, immutable history of your data isn’t just nice to have—it’s essential.
The Challenges of Unversioned Data
Without robust data versioning, you often face problems like:
- Debugging Data Issues: Pinpointing when an erroneous data point was introduced or a schema change broke an application becomes a forensic nightmare. Imagine trying to roll back a specific data entry from weeks ago without a clear history!
- Collaboration Headaches: Multiple data engineers or analysts working on the same dataset can inadvertently overwrite each other’s changes, leading to data inconsistencies and wasted effort. How do you merge independent changes to the same table?
- Reproducibility Crisis: Data scientists struggle to reproduce machine learning model results because the underlying training data isn’t consistently versioned. Was the model trained on this version of the data or an older one?
- Auditing and Compliance: Meeting regulatory requirements for data lineage and change tracking is cumbersome, often relying on custom, error-prone solutions.
- Schema Evolution: Managing database schema changes across development, staging, and production environments is prone to errors and can cause downtime if not handled carefully.
Dolt’s Solution: Git for Your Data
Dolt addresses these challenges by embedding Git’s core principles directly into a SQL database. This means you can use familiar commands like commit, branch, merge, and diff on your data itself.
📌 Key Idea: Dolt treats your entire database—both schema and data—as a version-controlled repository, enabling the same robust workflows you use for code.
Understanding the Git-for-Data Paradigm
At its heart, Dolt is a relational database (MySQL compatible, with a PostgreSQL-compatible sibling called Doltgres) that stores its data in a content-addressable storage system, much like Git. This architecture allows it to track every change down to the individual cell.
Let’s break down the core Git concepts and how they apply to Dolt, helping you build a mental model for versioning your data:
- Commit: In Git, a commit captures a snapshot of your codebase at a specific point in time. In Dolt, a commit captures a snapshot of your entire database (schema and data). Each commit has a unique identifier (hash), an author, a timestamp, and a descriptive message.
- Why it matters: Commits provide an immutable, atomic record of your database’s state, allowing you to “time travel” to any previous version.
- Branch: Git branches allow you to develop features or experiments in isolation without affecting the main codebase. Dolt branches let you do the same for your data. You can create a branch to experiment with new data imports, test schema changes, or develop new features that require isolated data environments.
- Why it matters: Branches enable parallel development and experimentation without risking your production data.
- Merge: When a feature branch is complete, you merge its changes back into the main branch. Dolt merges combine data and schema changes from different branches, intelligently handling conflicts.
- Why it matters: Merging allows you to integrate validated changes from isolated branches back into your primary data stream.
- Diff: Git’s diff command shows you the line-by-line differences between two versions of your code. Dolt’s diff shows you the cell-by-cell differences between two versions of your data or schema. This is incredibly powerful for auditing.
- Why it matters: dolt diff provides precise visibility into what data (or schema) changed between any two points in history, simplifying debugging and auditing.
- History (log): Just as git log shows your commit history, dolt log displays the chronological history of all commits made to your database.
- Why it matters: The commit log provides a complete, auditable trail of every change, answering the who, what, and when for your data.
Figure 1.1: Simplified Git-for-Data Workflow in Dolt
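To make the mapping from concepts to commands concrete, here is a minimal sketch that walks through commit, branch, diff, merge, and log in a throwaway repository. The branch name add-customers, the demo identity, and the table contents are hypothetical, and the sketch assumes the default branch is named main and that dolt init accepts --name and --email to set the identity for the new repository.

```shell
# Sketch: one pass through commit, branch, diff, merge, and log.
# Runs in a scratch directory so it never touches an existing repository.
# Assumptions: default branch is "main"; identity and branch name are examples.
git_for_data_tour() (
  set -e
  cd "$(mktemp -d)"
  dolt init --name "Demo User" --email "demo@example.com"
  dolt sql -q "CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(255))"
  dolt add .
  dolt commit -m "Create customers table"      # commit: snapshot schema and data
  dolt checkout -b add-customers               # branch: isolated line of work
  dolt sql -q "INSERT INTO customers VALUES (1, 'Alice Smith')"
  dolt diff                                    # diff: working set vs. last commit
  dolt commit -a -m "Add Alice"
  dolt checkout main
  dolt merge add-customers                     # merge: bring the branch back
  dolt log                                     # history: who, what, and when
)

if command -v dolt >/dev/null 2>&1; then
  git_for_data_tour
else
  echo "dolt is not installed; skipping the tour"
fi
```

Because main does not change while the branch is being worked on, the merge in this sketch should be a simple fast-forward with no conflicts to resolve.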
Dolt vs. Doltgres: Choosing Your Flavor
Dolt offers two primary flavors, catering to different SQL ecosystems. Understanding the distinction is crucial for selecting the right tool for your project.
- Dolt (MySQL Compatible): This is the original Dolt, providing a MySQL-compatible interface. If you’re familiar with MySQL syntax, tools, or have existing applications built for MySQL, Dolt is your go-to choice. It speaks the MySQL wire protocol, meaning most MySQL clients and connectors work out of the box.
- Why choose Dolt: Seamless integration with existing MySQL tools, drivers, and applications.
- Doltgres (PostgreSQL Compatible): Introduced to support the growing PostgreSQL ecosystem, Doltgres offers PostgreSQL compatibility. It’s ideal for developers and data engineers who prefer PostgreSQL’s features, syntax, or have existing PostgreSQL-based applications. It supports the PostgreSQL wire protocol.
- Why choose Doltgres: Leverage PostgreSQL’s rich feature set, data types, and tooling, especially for projects requiring advanced SQL capabilities or migrating from existing PostgreSQL systems.
⚡ Quick Note: While the underlying Git-for-Data mechanics are the same, the SQL syntax and client tooling differ between Dolt (MySQL) and Doltgres (PostgreSQL). The dolt command-line examples in this guide target Dolt. Doltgres exposes its version-control operations through SQL rather than through the dolt CLI, so check the Doltgres documentation for the equivalent calls. We’ll highlight Doltgres specifics where relevant, especially given our beginner-friendly project focus on PostgreSQL-style data.
Setting Up Your Dolt Environment
To get started, you’ll need to install Dolt and a compatible SQL client. We’ll aim for the latest stable release. As of 2026-06-06, we’ll proceed assuming Dolt v1.35.0 is the current stable release. Always check the official DoltHub releases page for the absolute latest version.
Step 1: Install Dolt
Dolt can be installed in several ways. We’ll cover the binary installation (recommended for local development) and Docker for containerized environments.
Option A: Install via Homebrew (macOS/Linux)
This is often the easiest way for macOS and many Linux distributions.
# Update Homebrew to ensure you get the latest packages
brew update
# Install Dolt
brew install dolt
# Verify the installation and check the version
dolt version
You should see output similar to dolt version 1.35.0.
Option B: Download Binary (macOS/Linux/Windows)
You can download the latest release directly from DoltHub’s releases page.
Visit: https://github.com/dolthub/dolt/releases (As of 2026-06-06, look for v1.35.0 or the latest stable release).
For Linux (replace amd64 with arm64 if on an ARM-based system):
# Download the release archive (adjust version and architecture as needed)
# Using 'v1.35.0' as the placeholder for the latest stable version on 2026-06-06
wget https://github.com/dolthub/dolt/releases/download/v1.35.0/dolt-linux-amd64.tar.gz
# Extract it; release archives are expected to unpack to dolt-linux-amd64/bin/dolt
tar -xzf dolt-linux-amd64.tar.gz
# Move the binary to a directory in your system's PATH (e.g., /usr/local/bin)
sudo mv dolt-linux-amd64/bin/dolt /usr/local/bin/dolt
# Verify the installation
dolt version
For Windows, download the .msi installer and follow the graphical instructions.
Option C: Using Docker
For a containerized environment, Dolt provides official Docker images. This is great for isolated testing or CI/CD pipelines.
# Pull the Dolt image (use a specific version tag from Docker Hub to pin a release)
docker pull dolthub/dolt:latest
# Verify by running a simple command within the container
# (assumes the image's entrypoint is the dolt binary, so only the subcommand is passed)
docker run --rm dolthub/dolt:latest version
For Doltgres, the Docker image is dolthub/doltgres. Doltgres is versioned separately from Dolt, so do not reuse a Dolt version number as its tag.
# Pull the latest Doltgres image
docker pull dolthub/doltgres:latest
# Verify that the image is available locally
docker image ls dolthub/doltgres
Step 2: Install a SQL Client
You’ll need a SQL client to interact with your Dolt database.
- For Dolt (MySQL compatible): The standard mysql command-line client (often available via your OS package manager, e.g., sudo apt install mysql-client on Debian/Ubuntu, brew install mysql-client on macOS) or a GUI tool like DBeaver, MySQL Workbench, or DataGrip.
- For Doltgres (PostgreSQL compatible): The psql command-line client (part of the postgresql-client package on many systems, e.g., sudo apt install postgresql-client or brew install libpq on macOS) or a GUI tool like DBeaver, pgAdmin, or DataGrip.
For this guide, we’ll primarily use dolt sql for simplicity, which provides a built-in SQL shell. However, understanding how to connect with external clients is valuable for real-world applications.
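Before moving on, it can help to confirm which of these tools are actually on your PATH. The following is a small sketch; check_tool is a hypothetical helper name, and the tool list simply mirrors the clients mentioned above.

```shell
# Report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# dolt is required for this chapter; mysql and psql are optional external clients.
for tool in dolt mysql psql; do
  check_tool "$tool" || true
done
```

A missing mysql or psql is not a problem for this chapter, since the built-in dolt sql shell is enough to follow along.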
Step-by-Step Implementation: Your First Dolt Database
Let’s create our first version-controlled database. We’ll simulate a simple customer database, tracking changes to both its schema and data.
Step 1: Initialize a New Dolt Repository
Open your terminal and navigate to a directory where you want to create your database project.
# Create a dedicated directory for your project
mkdir my-customer-data
cd my-customer-data
# Initialize a Dolt repository within this directory
dolt init
You should see output similar to:
Successfully initialized dolt data repository.
This command creates a hidden .dolt directory, just like git init creates a .git directory. This special directory is where Dolt stores all its versioning information, allowing it to track every change.
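If dolt init complains that no user identity is configured, you likely need to set a name and email first, much as you would for Git. The sketch below sets them only when they are missing. ensure_dolt_identity is a hypothetical helper, the name and email are placeholders to replace with your own, and the sketch assumes dolt config --get exits with a non-zero status when a key is not set.

```shell
# Set a global Dolt identity only if one is not already configured.
# $1 = name, $2 = email (placeholders; substitute your own values)
ensure_dolt_identity() {
  dolt config --global --get user.name >/dev/null 2>&1 ||
    dolt config --global --add user.name "$1"
  dolt config --global --get user.email >/dev/null 2>&1 ||
    dolt config --global --add user.email "$2"
}

if command -v dolt >/dev/null 2>&1; then
  ensure_dolt_identity "Your Name" "you@example.com"
  dolt config --global --list
else
  echo "dolt is not installed; nothing to configure"
fi
```

The identity you configure here is what later appears as the author on each commit in dolt log.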
Step 2: Start the Dolt SQL Server
To interact with your Dolt database using SQL, you need to start the Dolt SQL server. This server will listen for SQL connections, just like a traditional MySQL or PostgreSQL server.
dolt sql-server
You’ll see output indicating the server is running, by default on port 3306 (the standard MySQL port). Doltgres runs as its own server and defaults to port 5432, the standard PostgreSQL port. Keep this terminal window open; this is your server process.
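Once the server is up, any MySQL client should be able to reach it. As a rough sketch, the snippet below connects with the mysql command-line client and lists databases. It assumes the server defaults (host 127.0.0.1, port 3306, user root with no password); adjust these if your setup differs.

```shell
# Connection settings; these are assumed defaults, not guaranteed values.
DOLT_HOST="127.0.0.1"
DOLT_PORT="3306"

if command -v mysql >/dev/null 2>&1; then
  # Using an IP address rather than "localhost" forces a TCP connection.
  mysql --host "$DOLT_HOST" --port "$DOLT_PORT" --user root \
        --connect-timeout=5 --execute "SHOW DATABASES;" ||
    echo "Could not connect; is dolt sql-server running on port $DOLT_PORT?"
else
  echo "mysql client not installed; skipping the connection check"
fi
```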
Step 3: Connect to Your Dolt Database and Create a Table
Open a new terminal window. We’ll use the dolt sql command, which acts as a simple, built-in SQL client to connect to your running server.
# Connect to the running Dolt SQL server
dolt sql
You’ll now be in a SQL prompt, similar to mysql> or psql>. This prompt allows you to execute SQL commands directly against your Dolt database.
Let’s create a simple customers table. Notice the TIMESTAMP DEFAULT CURRENT_TIMESTAMP for created_at—a common practice to track when a record was added.
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Press Enter to execute the CREATE TABLE statement. You should see Query OK (or similar for Doltgres).
Now, let’s add some initial data to our new table. We’re inserting two customer records.
-- Placeholder addresses on the reserved example.com domain; each must be distinct
-- because the email column is UNIQUE
INSERT INTO customers (id, name, email) VALUES
(1, 'Alice Smith', 'alice@example.com'),
(2, 'Bob Johnson', 'bob@example.com');
After inserting, let’s verify that the data is there.
SELECT * FROM customers;
You should see your two customer records listed.
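The same query can also be run without opening the interactive prompt, which is handy in scripts. Here is a minimal sketch using dolt sql -q; run_query is a hypothetical helper, and the my-customer-data path assumes you run it from the parent of the directory created earlier.

```shell
# Run one SQL statement against a Dolt repository directory.
# $1 = repository directory, $2 = SQL text
run_query() {
  ( cd "$1" && dolt sql -q "$2" )
}

if command -v dolt >/dev/null 2>&1 && [ -d my-customer-data/.dolt ]; then
  run_query my-customer-data "SELECT id, name, email FROM customers ORDER BY id"
  # -r csv switches the result format, which is easier to pipe into other tools
  ( cd my-customer-data && dolt sql -r csv -q "SELECT id, name FROM customers" )
else
  echo "Run this next to the my-customer-data directory with dolt on your PATH."
fi
```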
Step 4: Commit Your Changes
So far, we’ve created a table and added data, but these changes haven’t been “versioned” yet. They’re currently in your working set, much like changes you’ve made in your code editor before you git add and git commit.
Exit the dolt sql prompt by typing exit or \q (if using psql style).
Back in the terminal where you initialized Dolt (not the server terminal), use dolt status to see your uncommitted changes:
dolt status
You’ll see output listing customers as a new table that is not yet tracked. Because the table itself is new, its schema and its rows are both part of the pending change.
Now, let’s commit these changes to your database’s history. This is the moment you create a permanent snapshot.
dolt add .
dolt commit -m "Initial commit: Created customers table and added two customers"
dolt add . stages all changes (both schema and data) for the next commit, similar to git add .
dolt commit -m "..." creates a new snapshot in your database’s history, along with a descriptive message.
You’ll see output confirming the commit, including a unique commit hash. Congratulations! You’ve just made your first data commit with Dolt.
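Staging and committing can also be done from SQL, which is useful when an application only has a database connection. The sketch below uses Dolt's DOLT_ADD and DOLT_COMMIT stored procedures through dolt sql -q. sql_quote and commit_via_sql are hypothetical helpers, and the commit message is only an example.

```shell
# Double any single quotes so the text is safe inside a SQL string literal.
sql_quote() {
  printf "%s" "$1" | sed "s/'/''/g"
}

# Stage everything and commit through SQL procedures.
# $1 = commit message
commit_via_sql() {
  msg=$(sql_quote "$1")
  dolt sql -q "CALL DOLT_ADD('.')" &&
    dolt sql -q "CALL DOLT_COMMIT('-m', '$msg')"
}

if command -v dolt >/dev/null 2>&1 && [ -d .dolt ]; then
  commit_via_sql "Commit made through SQL"
else
  echo "Run this inside a Dolt repository with dolt on your PATH."
fi
```

If there is nothing to commit, expect the second call to report an error rather than create an empty commit.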
Exploring Your Data’s History: dolt log and dolt diff
Now that we have a commit, let’s see how Dolt helps us track history and inspect granular changes.
Step 1: Make More Changes
Let’s add another customer and update an existing one to create new changes to track.
Start the dolt sql-server again if it’s not running, and connect with dolt sql in a new terminal.
-- Add a new customer
INSERT INTO customers (id, name, email) VALUES
(3, 'Charlie Brown', 'charlie@example.com');
-- Update an existing customer's email
UPDATE customers SET email = 'alice.smith@example.com' WHERE id = 1;
-- Verify the current state of the table
SELECT * FROM customers;
You should now see three customers, with Alice’s email updated.
Step 2: Inspect Uncommitted Changes with dolt diff
Exit the dolt sql prompt. Back in your main terminal (where you run dolt commands), let’s see what changes we’ve made before committing them.
dolt diff
This command shows you the differences between your current working set (the changes you just made) and the last committed version of your database. You’ll see output indicating:
- An INSERT for the new Charlie Brown record.
- An UPDATE for Alice Smith’s email.
🧠 Important: dolt diff is incredibly powerful. It shows you cell-level changes (which values changed in which cells), making it easy to review exactly what data has been modified. This is far more granular than typical database audit logs.
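dolt diff accepts options that narrow or reshape its output. The following sketch shows a few views of the same uncommitted changes. review_changes is a hypothetical helper, and it assumes you run it from inside the my-customer-data repository.

```shell
# Different views of the uncommitted changes in the current repository.
review_changes() {
  dolt diff --stat              # summary counts of added, modified, deleted rows
  dolt diff --schema            # schema changes only
  dolt diff --data customers    # row changes only, limited to one table
  dolt diff -r sql              # the same changes expressed as SQL statements
}

if command -v dolt >/dev/null 2>&1 && [ -d .dolt ]; then
  review_changes
else
  echo "Run this inside a Dolt repository with dolt on your PATH."
fi
```

The SQL result format is a convenient way to see the pending changes as the INSERT and UPDATE statements that would reproduce them.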
Step 3: Commit the New Changes
Now that we’ve reviewed the changes, let’s commit them to our history.
dolt add .
dolt commit -m "Added Charlie Brown and updated Alice Smith's email"
You’ll receive a new commit hash, marking this as the second snapshot in your database’s history.
Step 4: View the Commit History with dolt log
Now that we have two commits, let’s look at the database’s complete history.
dolt log
You’ll see a list of your commits, ordered from newest to oldest, each with its unique commit hash, author, date, and the descriptive message you provided. This is your database’s complete, auditable history, just like a Git log for your code!
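History is also something you can query. As a rough sketch, the snippet below trims the log output, reads the customers table as it was one commit earlier using AS OF, and reads the commit history from the dolt_log system table. show_history is a hypothetical helper, and it assumes the repository has at least two commits that include the customers table.

```shell
# Inspect history from the CLI and from SQL.
show_history() {
  dolt log -n 2                 # only the two most recent commits
  dolt log --oneline            # one line per commit
  # Read the table as it was one commit before HEAD
  dolt sql -q "SELECT * FROM customers AS OF 'HEAD~1'"
  # The commit history is also exposed as a system table
  dolt sql -q "SELECT commit_hash, committer, message FROM dolt_log"
}

if command -v dolt >/dev/null 2>&1 && [ -d .dolt ]; then
  show_history
else
  echo "Run this inside a Dolt repository with dolt on your PATH."
fi
```

After the two commits made in this chapter, the AS OF query should show Alice's original email and no Charlie Brown row.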
Mini-Challenge
It’s your turn to practice and solidify your understanding of Dolt’s core versioning capabilities.
Challenge:
- Add a new column phone_number to your customers table.
- Update one of your existing customers to include a phone number in this new column.
- Add another new customer with both an email and a phone number.
- Use dolt status to see all your pending changes (both schema and data).
- Use dolt diff to review the exact changes you’ve made (schema modification and data insertions/updates).
- Commit your changes with a descriptive message like “Added phone_number column and updated customer details.”
- Finally, use dolt log to confirm your new commit is part of the database history.
Hint:
- Remember to start dolt sql-server and connect with dolt sql for SQL operations.
- For adding a column: ALTER TABLE customers ADD COLUMN phone_number VARCHAR(20);
- For updating data: UPDATE customers SET phone_number = '555-1234' WHERE id = 1;
- Remember to dolt add . before dolt commit to stage your changes.
What to observe/learn:
Pay close attention to how dolt diff displays not just data changes but also schema changes (e.g., adding a column). This clearly demonstrates Dolt’s ability to version everything in your database.
Common Pitfalls & Troubleshooting
As you get started with Dolt, you might encounter a few common issues. Here’s how to navigate them:
- Forgetting to Commit: Just like Git, changes you make in dolt sql (or via any external SQL client) live only in the working set until you dolt add . and dolt commit. If you close your terminal without committing, your changes are still physically present, but they aren’t part of the version history. Always remember to commit after making meaningful changes!
- Troubleshooting: If you made changes and dolt log doesn’t show them, run dolt status to see uncommitted changes, then dolt add . and dolt commit.
- Confusing dolt and git Commands: While the commands are intentionally similar, remember you’re interacting with dolt for database operations, not git. For example, it’s dolt branch, dolt merge, and dolt remote, not git branch and so on.
- Troubleshooting: If a command isn’t working, double-check that you’re using dolt as the prefix.
- Dolt SQL Server Not Running: An external client can’t connect if dolt sql-server isn’t running in a separate terminal, because the client needs a server to connect to. (dolt sql, when run from inside the repository directory, can also work directly against the database without a server.)
- Troubleshooting: Ensure you have dolt sql-server running in one terminal window before attempting to connect from another.
- Port Conflicts: By default, Dolt (MySQL compatible) runs on port 3306 and Doltgres (PostgreSQL compatible) on port 5432. If another MySQL or PostgreSQL server is already using the same port, Dolt might fail to start.
- Troubleshooting: You can specify a different port using dolt sql-server --port <your_port_number>. For example, dolt sql-server --port 3307. Then, connect your client to that specified port.
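For the port-conflict case, a quick check can tell you whether something is already listening before you start the server. The sketch below is one way to do it. port_in_use and pick_port are hypothetical helpers, and the check relies on bash's /dev/tcp feature, so on a system without bash it will report every port as free.

```shell
# Succeeds if something accepts TCP connections on 127.0.0.1:$1.
# Uses bash's /dev/tcp pseudo-device in a child process.
port_in_use() {
  bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" >/dev/null 2>&1
}

# Print the first port at or above $1 that nothing is listening on.
pick_port() {
  port="$1"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}

free_port=$(pick_port 3306)
echo "Suggested command: dolt sql-server --port $free_port"
```

The snippet only prints the suggested command instead of starting the server, since dolt sql-server keeps running in the foreground.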
Summary
In this introductory chapter, you’ve taken your first steps into the powerful world of Dolt:
- We explored the critical need for data version control and how traditional databases often fall short in providing comprehensive historical tracking.
- You learned about Dolt’s Git-for-Data paradigm, understanding how familiar concepts like commits, branches, merges, and diffs apply directly to your database schema and data.
- We distinguished between Dolt (MySQL compatible) and Doltgres (PostgreSQL compatible), helping you understand which flavor to choose based on your ecosystem preferences.
- You successfully installed Dolt and set up your local development environment.
- Through hands-on exercises, you initialized your first Dolt database, created tables, inserted data, and, most importantly, committed your changes to create version snapshots.
- You experienced the power of dolt log for viewing the complete history of your database and dolt diff for inspecting granular data and schema changes between versions.
You’ve now laid the foundation for treating your data with the same rigorous version control as your code. This paradigm shift will empower you to build more robust, auditable, and collaborative data workflows.
What’s Next? In the next chapter, we’ll dive deeper into Dolt’s advanced versioning capabilities, exploring how to use branching and merging to manage parallel data development, experiment with changes safely, and resolve conflicts, just like you would with source code.
References
- DoltHub Documentation: https://docs.dolthub.com/
- Dolt Installation Guide: https://docs.dolthub.com/introduction/installation
- Dolt CLI Commands Reference: https://docs.dolthub.com/cli-reference/cli-commands
- Dolt vs. Doltgres Comparison: https://docs.dolthub.com/introduction/dolt-vs-doltgres
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.