In my work, I have often found myself needing my files split into multiple Git repositories. With time and practice I have discovered an easy way of doing it, and in this blog post I give you a guide and the code for you to do the same for your Git projects.
Written by Alejandro Exojo
2021/09/28
Most of the time, we think of a version control system like Git as a way to keep in the repository the exact state of a directory. There are well known exceptions, like the files that we add in the .gitignore file. But sometimes there are situations where we might want to do something even fancier: have some files of a directory in one repository, and others in another repository. This is not only possible, it’s surprisingly easy and doesn’t require complex Git knowledge with figures showing graphs, branches, etc.
And if you thought that it’s not something you’ll ever want, you’ll be surprised. I’ll give you a few examples where such a thing would be useful to me, and I’m certain you’ll be able to extrapolate and think of some more cases that will suit your needs. But feel free to skip to the next section if you are already convinced.
Use cases
Example 1: Git repositories depend on each other
Some time ago I was somewhat involved in the making of deb packages for Debian, the Linux distribution. One thing that made me suffer often (and I saw others struggle as well, trust me) is that the files for the packaging depend on the project being packaged (because it’s a build that calls another build). But worse, the files for the packaging had to be placed inside the tree of the project to be packaged.
That is a huge nuisance, as the people who make the package and the people who develop the software are almost always disjoint teams who will use different Git repositories. I saw the workflow of some packagers and it involved fiddling a lot with branches and low level commands like git read-tree and git checkout-index. It wasn’t easy. Our solution will be much simpler.
Example 2: Keeping per-developer files besides the team-wide ones
You must have noticed that if you open a qmake or cmake project with Qt Creator, it will create a file named like the one you just opened with .user appended to it. You should not add that file to the repository, because, as the name implies, it’s only for the user who opened it (it might not be the same for two separate developers). But still you might want to keep that file safe, because often there are knobs that you need to turn to make a build happen on your computer, or to run the application with some environment variables or something similar.
I myself have also sometimes had a special file for Vim to store settings that would tweak the indentation or other code style settings that would apply to the project. I certainly can’t commit that, as it’s a file specific to me and my Vim setup. But again, if work is put into it, wouldn’t it be good to make sure it’s kept safe?
I’ve also sometimes wanted to keep a plain text file where I’m taking notes about the project. They might be notes to self, questions to ask, or just hours that I need to keep track of in order to report to the customer. This kind of file doesn’t have to be in the same place as the project code, but sometimes I have to admit it’s very convenient not having to change directory…
Example 3: Your UNIX configuration files
About ten years ago I made my very first commit to a Git repository: “A bunch of useful configurations”. This is a very common practice: your configuration for your shell, your editor, and many other classic UNIX tools is mostly code or can be treated as such, and keeping it in an online repository can help a lot in synchronizing between two workstations, or in setting up a new one quickly.
There are many, many, many tools to help you put your “dotfiles” (as those configuration files are often named) under version control. The one that deserves a special mention, because it works in a way very similar to what we are going to show today, is vcsh.
But if those tools exist so abundantly it’s because there is a problem to solve. A typical “dotfile” is in the very root of your user directory (your $HOME), like your ~/.bashrc. If you want to check out that file to the right place, you need to make the very root of your files a Git repository. That is problematic for many reasons. For example, all your files would then be inside a repository tree, at risk of being destroyed by an accidental git clean.
Concept recap
There are two paths that are important to Git and with which we are going to fiddle: GIT_DIR and GIT_WORK_TREE. Don’t worry if you’ve never seen those spelled out before; the concepts are so intuitive and simple that you’ll probably realize you’ve been thinking of those quite often.
GIT_DIR is where your repository data is. It’s where Git stores all the data, and it’s in files and file formats that are opaque to you 99% of the time. For example it’s where all the versions of your README files are stored, where your stash is saved, or where the current branch gets registered. By default, when you init or clone a repository it’s your .git directory.
GIT_WORK_TREE is where the files that you work with are. For example, it’s where the README file resides that your editor or IDE can open for you to edit and change, or where your compiler will find the files. By default it’s the parent directory of the .git directory.
But let’s look at an example. Say that you are in a directory with the following path (the ~ means the HOME directory in UNIX, and we will use it often in this text):
~/qt/qtbase/tests/auto/corelib/io/qfile
If you run git status, Git will start looking for a .git directory in the current directory, fail, then walk up through the parent directories until it finds one in ~/qt/qtbase/.git. Then it sets the Git directory to it, and the work tree to the parent directory ~/qt/qtbase.
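If you want to see what Git discovered, you can ask it directly. The following is a minimal sketch that uses a throwaway repository in a temporary directory (the directory names are only illustrative, not the real Qt tree), and two git rev-parse options that print the discovered locations:

```shell
# Start from a clean state so that discovery is not overridden.
unset GIT_DIR GIT_WORK_TREE

# Throwaway repository; "qtbase" here is only an illustrative name.
demo=$(mktemp -d)
git init --quiet "$demo/qtbase"
mkdir -p "$demo/qtbase/tests/auto"
cd "$demo/qtbase/tests/auto"

# Ask Git what it found by walking up from the current directory.
git_dir=$(git rev-parse --absolute-git-dir)   # the repository data
work_tree=$(git rev-parse --show-toplevel)    # the root of the working files

echo "Git directory: $git_dir"
echo "Work tree:     $work_tree"
```

Both commands should report paths inside the qtbase directory, even though we ran them two levels below it.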
We rarely need to set those variables ourselves because Git just figures out what we want very nicely. But if you have ever used submodules, you’ll likely have found that when you change directory from the main project to the submodule or vice versa, Git commands apply to a different repository (the “parent” or the “child”). If for some reason you want to operate on the superproject (“parent”) when your current directory is inside the submodule instead (“child”), you tell Git explicitly where you want the tool to work by setting those variables. We’ll look into this in the next section, but I’ve just spoiled 80% of how to achieve the workflow that this article is about.
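As a rough illustration of that override, here is a sketch with two nested throwaway repositories. The "child" is a plain nested repository standing in for a submodule checkout (an assumption made to keep the example short), and the override is applied to a single command only, either through the environment or through the equivalent --git-dir and --work-tree options:

```shell
unset GIT_DIR GIT_WORK_TREE

demo=$(mktemp -d)
git init --quiet "$demo/parent"
git init --quiet "$demo/parent/child"   # stands in for a submodule checkout
cd "$demo/parent/child"

# Default discovery finds the nearest repository: the "child".
found=$(git rev-parse --show-toplevel)

# One-off override through the environment, for this command only.
forced=$(GIT_DIR="$demo/parent/.git" GIT_WORK_TREE="$demo/parent" \
    git rev-parse --show-toplevel)

# The same override spelled as command-line options.
forced_opts=$(git --git-dir="$demo/parent/.git" --work-tree="$demo/parent" \
    rev-parse --show-toplevel)

echo "discovered: $found"
echo "overridden: $forced"
```

Prefixing one command like this leaves the rest of the shell session untouched, which is handy when you only need a quick look at the other repository.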
The solution
As I mentioned in the previous section, and as you probably have figured out, half of the solution is to set these two variables. There is still a bit more to learn to have a polished solution. But first, let’s see it in action.
We will use the second use case example (having some personal notes mixed in with a repository shared with others), where one repository is named “project” and another one is called “notes”. I will assume the project repository exists but the notes one does not, so you’ll be able to follow along if you want to copy and paste the commands. Just use any project you might have handy (we will not make changes to it).
In the snippets below I’m showing a simplified version of my bash prompt. It leverages a shell function that comes with the bash-completion support in the git package, and it prints a branch name (so you’ll notice when you enter and leave a repository), and a character that says something about the status of the repository. Nevertheless, I run git status several times to make clear which state we are in, so feel free to ignore everything before the dollar sign. But the fact that the shell prompt changes after the commands shows that things are working as expected: either the commands are changing the repository, or we are switching to a different repository.
First we create some file for private notes in the shared repository and create the second repository for them:
~$ cd project
~/project master=$ echo "Some private notes" > my-notes.txt
~/project master %=$ git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
my-notes.txt
nothing added to commit but untracked files present (use "git add" to track)
~/project master %=$ mkdir ../notes
~/project master %=$ cd ../notes
~/notes$ git init
Initialized empty Git repository in /home/alex/notes/.git/
~/notes master #$ cd -
/home/alex/project
~/project master %=$
Now we are ready to pull the trick that we were waiting for: setting GIT_DIR to the location of the repository for the notes. This will make all Git operations happen in the repository in ~/notes/.git instead of ~/project/.git, even when we are in ~/project. This only gets us halfway to the state that we want though, so bear with me. After setting GIT_DIR we can add the file and make a commit.
~/project master %=$ export GIT_DIR=~/notes/.git
~/project master #%$ git add my-notes.txt
~/project master +%$ git commit -m "Adding my notes"
[master (root-commit) e0a5132] Adding my notes
1 file changed, 1 insertion(+)
create mode 100644 my-notes.txt
~/project master %$ git status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
.clang-format
.clang-tidy
.gitignore
CHANGELOG.md
LICENSE.md
README.md
docs/
src/
test/
nothing added to commit but untracked files present (use "git add" to track)
We have the inconvenience of the project files showing up as untracked in the repository of the notes. This might be acceptable to some people, but we will improve it later. First we will see what the main repository looks like now. Earlier we switched Git repositories by setting the variable; this time we unset it so that the repository gets found automatically:
~/project master %$ unset GIT_DIR
~/project master %=$ git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
my-notes.txt
nothing added to commit but untracked files present (use "git add" to track)
Again, untracked files. This is more of a problem, as we surely want to avoid accidentally adding and pushing the private file to the shared repository, so let’s fix it.
Ignoring files in Git repositories
You’ll probably be well acquainted with the .gitignore file: a file that most Git repositories have to keep untracked files from showing up in git status, and to avoid adding them accidentally. The problem is that this file is (for most Git repositories) itself committed and shared, so you probably don’t want everyone in the world or in your team to know about (or be bothered with) one particular file on your disk.
Not a problem though, because when ignoring files Git also looks at a file which is private to each copy of the repository: $GIT_DIR/info/exclude. And remember that if GIT_DIR is unset, just look for the .git directory in the top of the repository (or ask Git with git rev-parse --git-dir or git rev-parse --absolute-git-dir).
In our example, I’ll just type .git because we are in the root of the files of the repository.
~/project master %=$ echo 'my-notes.txt' >> .git/info/exclude
~/project master=$ git st
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
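If you ever wonder which rule is hiding a file, git check-ignore -v reports the file and pattern responsible. Below is a small sketch on a throwaway repository (the temporary directory is an assumption for the demonstration; the mkdir is only a precaution in case your Git templates don’t create the info directory):

```shell
unset GIT_DIR GIT_WORK_TREE

demo=$(mktemp -d)
git init --quiet "$demo/project"
cd "$demo/project"

echo "Some private notes" > my-notes.txt
mkdir -p .git/info                      # normally created by git init already
echo 'my-notes.txt' >> .git/info/exclude

# Which ignore file and pattern match my-notes.txt?
rule=$(git check-ignore -v my-notes.txt)
echo "$rule"

# The file no longer shows up as untracked.
untracked=$(git status --porcelain)
```

The reported source should be the private info/exclude file rather than a shared .gitignore, which confirms that nothing about this rule will be pushed to anyone.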
Great! Now we have this main repository sorted out. Let’s work on the repository where we keep the notes. In this case we need to ignore a lot more instead of only one file, but it’s easy to ignore everything by default using an asterisk:
~/project master %$ export GIT_DIR=~/notes/.git
~/project master %$ echo '/*' >> $GIT_DIR/info/exclude
~/project master$ git st
On branch master
nothing to commit, working tree clean
Now we are ignoring everything, including the file we just added. This might seem like a problem when changing the file but it’s not: the ignore rule only prevents untracked files from being added, and Git still knows that our file is tracked. So even if it’s ignored, Git will be able to operate on it normally. If we want to add another file though, we can still do it without undoing what we did. Either git add will ask us to pass a flag to force the new file being added, or we can add an exception to the ignored files by using a special syntax to add a negated pattern:
~/project master$ echo "other notes" > other-notes.txt
~/project master$ git st
On branch master
nothing to commit, working tree clean
~/project master$ git add other-notes.txt
The following paths are ignored by one of your .gitignore files:
other-notes.txt
Use -f if you really want to add them.
~/project master$ echo '!other-notes.txt' >> $GIT_DIR/info/exclude
~/project master %$ git add other-notes.txt
~/project master +$ git commit -m "add other notes"
[master 191d6b4] add other notes
1 file changed, 1 insertion(+)
create mode 100644 other-notes.txt
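Both claims (that a tracked file keeps working while ignored, and that git add -f is an alternative to the negated pattern) can be checked with a short sketch. It uses a throwaway repository and a made-up committer identity, both of which are assumptions for the demonstration only:

```shell
unset GIT_DIR GIT_WORK_TREE

demo=$(mktemp -d)
git init --quiet "$demo/notes"
cd "$demo/notes"

echo "first" > my-notes.txt
git add my-notes.txt
# Demo identity passed with -c so the example does not depend on your config.
git -c user.name=Demo -c user.email=demo@example.com \
    commit --quiet -m "Adding my notes"

# Ignore everything at the top level, as in the article.
mkdir -p .git/info
echo '/*' >> .git/info/exclude

# A change to the already tracked file is still reported.
echo "more" >> my-notes.txt
changed=$(git status --porcelain)
echo "$changed"

# A new file is ignored, but -f forces it into the index.
echo "other" > other-notes.txt
git add -f other-notes.txt
staged=$(git diff --cached --name-only)
echo "$staged"
```

Forcing with -f is convenient for a one-off file, while the negated pattern is the better fit if you expect to keep adding files with a predictable name.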
The final touch
There is one last thing to polish right now. See what happens if we change directory:
~/project master$ cd src/
~/project/src master *$ git st
On branch master
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: my-notes.txt
deleted: other-notes.txt
no changes added to commit (use "git add" and/or "git commit -a")
Now Git can’t find the files because it’s looking at the current directory to find out the root directory of the working copy (where the files reside, remember?). We want to move around the tree, but we can tell Git to find the tree at the place where we have it by using the other variable/concept that we introduced at the start:
~/project/src master *$ export GIT_WORK_TREE=~/project
~/project/src master$ git st
On branch master
nothing to commit, working tree clean
Done!
And remember, you just need to set or unset these two variables when you want to change the “context” to a different repository. If it’s too tedious to use, give this simple shell function a shot (add to your .bashrc, .zshrc, etc.):
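altrepo()
{
    if [ $# -gt 0 ]; then
        # Make absolute the path in the first parameter.
        # The variable is not called "path" because zsh ties that name to PATH.
        local repo
        repo=$(cd "$1" && pwd)
        export GIT_DIR="$repo/.git"
        # The current directory becomes the root of the working files.
        export GIT_WORK_TREE="$PWD"
    else
        # No arguments: go back to Git's automatic discovery.
        unset GIT_DIR
        unset GIT_WORK_TREE
    fi
}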
Once it’s set in your configuration and you reload the config (or start a new shell), call altrepo /some/path to start using a repository in that path, and altrepo with no arguments to stop. But beware, the script is very simple and does not do any error checking for the parameters.
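If the lack of error checking bothers you, here is one possible variant with a couple of basic checks. It is only a sketch: the name altrepo_checked and the wording of the messages are my own invention for this example, and it assumes a shell that supports local (bash and zsh both do):

```shell
# Hypothetical variant of altrepo that validates its argument.
altrepo_checked()
{
    if [ $# -eq 0 ]; then
        # No arguments: go back to Git's automatic discovery.
        unset GIT_DIR GIT_WORK_TREE
        return 0
    fi

    # Resolve the argument to an absolute path, or fail if it is not a directory.
    local repo
    repo=$(cd "$1" 2>/dev/null && pwd) || {
        echo "altrepo_checked: no such directory: $1" >&2
        return 1
    }

    # Refuse to point GIT_DIR at something that does not exist.
    if [ ! -d "$repo/.git" ]; then
        echo "altrepo_checked: no .git directory in $repo" >&2
        return 1
    fi

    export GIT_DIR="$repo/.git"
    export GIT_WORK_TREE="$PWD"
}
```

The variables are only exported after both checks pass, so a typo in the path leaves your current context as it was.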
Wrap up and further reading
We’ve seen that we can do nifty tricks with Git, maybe even more than we thought was possible. We’ve also seen some concepts that are useful if we want to learn other, more common tools that come with Git, like git worktree, which has some similarities with what we have done.
With git worktree we can have the same repository checked out in two places or more, without making clones of it. This means we save a lot of space, have a common stash to pass changes from copy to copy, the same configuration, etc. This tool is useful to, for example, run two versions of an application side by side and check for regressions (or just to work in parallel on two tasks when one is waiting in code review, etc.). Work trees have a kind of “pointer file” instead of the usual .git directory, and the file contains the location where the actual .git directory can be found. We could have used one of those “pointers” exactly the same way instead of the environment variable.
One last thing: if we change to the ~/notes directory, where the repository for the notes lives, and we don’t have any variable set, Git will tell us that the notes are deleted. Why? Because we added the notes files to the repository, but they are stored in the working copy of the other one! You can ignore that if you only want to work on them in the ~/project directory, or you can check them out to have two working copies of those files. That would be a bit similar to what you can do with git worktree. Alternatively, you could make the repository with the notes a bare repository, which is just a repository without a working copy. But all of this is left as an exercise to the reader.
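As a starting point for the bare repository option, here is a sketch of how it could look. It assumes a throwaway location, a plain directory standing in for ~/project, and a made-up committer identity; note that with a bare repository GIT_DIR points at the repository directory itself, as there is no .git inside it:

```shell
unset GIT_DIR GIT_WORK_TREE

demo=$(mktemp -d)
mkdir "$demo/project"                       # stands in for ~/project
git init --quiet --bare "$demo/notes.git"   # repository without a working copy

cd "$demo/project"
echo "Some private notes" > my-notes.txt

# Point Git at the bare repository, and use this directory as the work tree.
export GIT_DIR="$demo/notes.git"
export GIT_WORK_TREE="$demo/project"

git add my-notes.txt
git -c user.name=Demo -c user.email=demo@example.com \
    commit --quiet -m "Adding my notes"

tracked=$(git ls-files)
echo "$tracked"
```

With this layout there is no second working copy that could report the notes as deleted, because the notes only ever exist in the project directory.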