Using a plain Linux server as a git host
Companies looking to train their LLMs are increasingly taking advantage of publicly hosted code as training material. Some developers (myself included) are protesting this by withdrawing from public code hosting platforms such as GitHub and GitLab.
On most of these platforms, you can make your repositories private, but if the platform's owner decides to scrape your code anyway, all bets are off.
In this post, I'll be describing how you can use your Linux server to host git repositories. This is a fairly basic setup with just SSH access (similar to how it works on GitHub), but no web-based interfaces for managing projects, pull requests, etc. The setup I'll describe will be usable for collaborative projects, though.
The whole setup costs a few USD a month to run. For some people this is still prohibitively expensive, but it is what it is. Since it's a generic VPS, you can also host other things besides the git repos on the same machine, such as your blog or small toy apps.
Note that all commands in this post should be executed with root permissions
unless otherwise noted.
Prerequisites
Other than SSH, you will need git. You'll also need to know a bit about
Linux and SSH, as I won't be covering those here.
Create a git user
To avoid issues with file/folder permissions, the simplest way to get going is
to create a separate user for git access. I won't be using this user to log into
the machine. I only want to be able to run git commands. Therefore, I'll change
the shell to /usr/bin/git-shell which comes with the git package.
useradd -m -s /usr/bin/git-shell git
For git-shell to work properly, I also need to create a folder called
git-shell-commands within the git user's home directory. This folder should
be accessible to the newly created git user.
cd /home/git
mkdir git-shell-commands
chown git:git git-shell-commands
Since I don't want an interactive shell, I can disable it by creating an
executable script, git-shell-commands/no-interactive-login, with the following
contents:
#!/bin/bash
echo "Hello there. You can look, but don't touch"
To test this setup, I use the following command:
su git
It should bounce me right back into the current shell and echo the message as per the script.
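The script only takes effect if it is executable, which is easy to forget when creating it by hand. The following is a minimal sketch that writes the script and sets the executable bit in one go. The function name install_no_login is made up for this example, and the home directory is passed as an argument so it can be tried against a scratch directory first.

```shell
# install_no_login <home-dir>
# Writes git-shell-commands/no-interactive-login under <home-dir>
# and marks it executable. The function name is hypothetical.
install_no_login() {
  local home_dir="$1"
  local cmd_dir="$home_dir/git-shell-commands"

  mkdir -p "$cmd_dir" || return 1

  # Quoted 'EOF' keeps the script body literal (no expansion here).
  cat > "$cmd_dir/no-interactive-login" <<'EOF' || return 1
#!/bin/bash
echo "Hello there. You can look, but don't touch"
EOF

  chmod 755 "$cmd_dir/no-interactive-login"
}

# Usage (as root):
#   install_no_login /home/git
#   chown -R git:git /home/git/git-shell-commands
```

The function does not change ownership on its own, so the chown from the usage comment is still needed when running it as root.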
Create a folder for git repos
Although I could technically create a folder for git repos under /home/git, I
want to be able to also work with the repos from other user accounts on the
machine (e.g., check it out to serve the contents). Therefore, I'll create the
folder in /git (in the root). Where exactly it's located isn't that
important. You could create it in /var/git or wherever you prefer.
The folder should have permissions such that the git user can access its contents.
mkdir /git
chown git:git /git
chmod 770 /git
I've restricted access to the git user and git group. If I want to give
another user access to this folder, I will add them to the git group.
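Adding an account to the group could look like the sketch below. It assumes the usermod and id tools found on most Linux distributions. The function names and the account name alice are placeholders, not part of the setup above.

```shell
# Add an existing account to the git group (run as root).
# -a appends, so the account keeps its other supplementary groups.
add_to_git_group() {
  usermod -aG git "$1"
}

# Succeeds if user $1 is a member of group $2.
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Usage; alice is a hypothetical account name:
#   add_to_git_group alice
#   in_group alice git && echo "alice can access /git"
```

The new membership only applies to sessions started after the change, so the user has to log in again before they can access /git.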
Creating repositories
To create a repository I ideally want to use the git user. However, this user
has no interactive login shell. Therefore, I will use a su command and
temporarily override the shell:
su git -s /bin/bash
Shared repositories are created much like normal ones, with one difference: I create so-called 'bare' repos, which contain only the git history and no working tree (no checked-out files). I can still re-create the files from them, but it saves a bit of storage if I'm only using these repos as a hub for communication. The --shared flag makes the repository group-writable, so every member of the git group can push to it.
git init --bare --shared /git/my-project
I can combine these commands into a one-liner and save it as a bash function in
the regular user's .bashrc:
function mkshrepo {
sudo su git -s /bin/bash -c "git init --bare --shared /git/$1"
}
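Because the repository name is pasted into a command string that runs as another user, a name with spaces, slashes, or shell metacharacters can do unexpected things. Below is a hedged variant of the same function that rejects such names first, plus a small check that a path really is a bare, shared repository. The names valid_repo_name and check_repo, and the allowed character set, are assumptions for this sketch.

```shell
# Accept only names made of letters, digits, dot, underscore and dash,
# and refuse empty names or names starting with a dot.
valid_repo_name() {
  case "$1" in
    ''|.*|*[!A-Za-z0-9._-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Same idea as the function above, but with the name validated first.
mkshrepo() {
  valid_repo_name "$1" || { echo "usage: mkshrepo <name>" >&2; return 1; }
  sudo su git -s /bin/bash -c "git init --bare --shared /git/$1"
}

# check_repo <path>: succeeds only for a bare repo with sharing enabled.
check_repo() {
  [ "$(git -C "$1" rev-parse --is-bare-repository 2>/dev/null)" = "true" ] &&
    [ -n "$(git -C "$1" config core.sharedRepository)" ]
}

# Usage:
#   mkshrepo my-project
#   sudo su git -s /bin/bash -c "$(declare -f check_repo); check_repo /git/my-project"
```

The check runs as the git user in the usage comment because /git is not readable by accounts outside the git group.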
Giving users access
To give users access, I simply add their SSH public key to the authorized_keys
file in /home/git/.ssh, one key per line.
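With its default StrictModes setting, sshd ignores an authorized_keys file when the file or the .ssh directory has loose permissions or the wrong owner, so it is worth setting both explicitly. The sketch below appends a key and tightens the permissions. The function name add_git_key and the file name alice.pub are made up for the example.

```shell
# add_git_key <git-home> <public-key-file>
# Appends the key to <git-home>/.ssh/authorized_keys and restricts
# the directory and file to their owner. The name is hypothetical.
add_git_key() {
  local home_dir="$1" key_file="$2"
  local ssh_dir="$home_dir/.ssh"

  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir" || return 1
  cat "$key_file" >> "$ssh_dir/authorized_keys" || return 1
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (as root); alice.pub is a hypothetical file name:
#   add_git_key /home/git alice.pub
#   chown -R git:git /home/git/.ssh
```

Ownership is left to the caller, hence the chown in the usage comment.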
To clone a repository, I would do it as usual:
git clone git@<IP_ADDRESS>:/git/my-project
I don't like typing IP addresses, though (what if I move to another host?), so I create an alias in my SSH config (~/.ssh/config on my own machine):
Host myserver
    HostName <IP_ADDRESS>
Now I can clone with:
git clone git@myserver:/git/my-project
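Cloning covers new projects, but a project that already exists locally can use the same address as a remote. A minimal sketch, assuming the local repository has at least one commit and no remote called origin yet. The helper name push_existing is hypothetical.

```shell
# push_existing <local-repo-dir> <remote-url>
# Registers <remote-url> as "origin" and pushes the current branch,
# setting it as the upstream for later pulls and pushes.
push_existing() {
  local repo_dir="$1" remote_url="$2"

  git -C "$repo_dir" remote add origin "$remote_url" || return 1
  git -C "$repo_dir" push -u origin HEAD
}

# Usage, run as a regular user on the local machine:
#   push_existing ~/code/my-project git@myserver:/git/my-project
```

Pushing HEAD sends whichever branch is currently checked out, so the sketch does not need to know whether the default branch is called main or master.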