Second Edition (2006-12-03)
Copyright © 2006 Ludovic Brenta ludovic@ludovic-brenta.org
This document is free; you can redistribute it and modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 or, at your option, any later version.
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this document; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston MA 02110-1301, USA.
I will evaluate each distributed version control system against the same scenario, which involves a “vendor branch”. The scenario goes like this (time grows to the right):
1 ------------ 2 ------------ * - 3 ----------------> Vendor branch
 \              \            /     \
  --- 1 -------- 2 -------- 3 ----- * ----- 4 ------> Private branch
In the diagram above, the asterisks denote merges. Along the way, I will want to review changes before committing them to my private branch or sending them upstream; similarly, the upstream author will want to review the changes I send before merging them into their repo.
In order to review upstream changes (e.g. from vendor 2 to vendor 3 in the above diagram), I'll want to either import successive tags made upstream, or tag each import myself. In the latter case, it is better if my local tags stay local; I don't want to have my working tags clutter upstream's repository.
By way of illustration, I will show this scenario in CVS and Meta-CVS as well as in the distributed version control systems considered. Both CVS and Meta-CVS handle the scenario quite well; why then would I look at distributed version control systems?
For each of the distributed version control systems under consideration, I assume that “upstream” uses that system, too. Therefore, I will no longer send patches by email, but rather “push” my changes with the DVCS. Similarly, I will no longer “import” from upstream, but “pull” from their repository.
The commands in the scenarios below are sometimes a bit long; I have deliberately used the long versions of all commands and option names, for the sake of readability. In practice, all the systems considered have shorthand versions of command and option names; for example, `co' stands for `checkout' in all systems that have a checkout command.
After the scenario, I evaluate each system's storage efficiency by creating a repository containing a few selected revisions of GCC's sources. The database will look like this:
r76006 -- gcc-3_4-branch --+
   |                       |
   |                    r80844  gcc_3_4_0_release
   |                       |
   |                    r83996  gcc_3_4_1_release
   |                       |
   |                    r87129  gcc_3_4_2_release
   |                       |
   |                    r90112  gcc_3_4_3_release
   |                       |
   |                    r99965  gcc_3_4_4_release
   |                       |
   |                    r107795 gcc_3_4_5_release
   |
r95538 -- gcc-4_0-branch --+
   |                       |
   |                    r98492  gcc_4_0_0_release
   |                       |
   |                    r101728 gcc_4_0_1_release
   |                       |
   |                    r104510 gcc_4_0_2_release
   |
r107209 -- gcc-4_1-branch --+
   |                        |
   |                     r111560 gcc_4_1_0_release
   |
The repository will therefore contain four branches (trunk, 3.4, 4.0 and 4.1), one revision and one tag for each official release of GCC since 3.4, plus one tag for each branch point. The first revision will correspond to revision 76006 in the upstream Subversion repository; this is the branch point for the 3.4 branch.
Storage efficiency matters because my laptop has a 20-GB hard disk with no more than 8 GB free at any one time. More storage also means more network bandwidth and heavier backups. Subversion is particularly bad in this respect: the size of the repository containing all revisions and branches exploded from 3.5 GB to 8.5 GB when GCC switched from CVS to Subversion.
If you would like to compare the repository size in your favourite version control system, please contact me; I may be able to provide the repository in one of the already-tested version control systems.
One repository per machine, shared by all branches. The repository contains many modules. Working copies are separate and contain one module each. This is good.
One file in the repository per file handled. The repository's directory structure matches that of the working copies.
Tags and branches are expensive: they require writing to every file in the repository.
Commits are not atomic.
Average built-in documentation.
Does not keep track of merges; I must do that manually using the expensive tags.
Does not handle file moves and renames.
Does not handle binary files well.
Not distributed: we must send changes by mail, in the form of patches. It is, however, possible to use CVSup to sync repositories.
The scenario requires one repository, one working copy, and one temporary working copy to import from. The repository contains two branches: the vendor branch and the trunk.
(create the working copy to import from)
$ cd upstream; cvs import -d the-module upstream UPSTREAM_1; cd ..
$ rm -rf upstream
$ cvs checkout -d private the-module; cd private
(edit)
$ cvs commit -m "private 1"
(recreate the working copy to import from)
$ cd ../upstream; cvs import -d the-module upstream UPSTREAM_2; cd ..
$ rm -rf upstream
(review upstream changes:)
$ cd private; cvs diff -u -r UPSTREAM_1 -r UPSTREAM_2 | less
(merge from upstream:)
$ cvs update -j UPSTREAM_1 -j UPSTREAM_2
(resolve conflicts)
$ cvs commit -m "private 2: merge from UPSTREAM_2"
(edit)
$ cvs commit -m "private 3"
(review my changes:)
$ cvs diff -Nu -r upstream | less
(send a patch upstream and let them merge manually into their repo:)
$ cvs diff -Nu -r upstream | mail maintainer@upstream.org
(recreate the working copy to import from)
$ cd ../upstream; cvs import -d the-module upstream UPSTREAM_3; cd ..
$ rm -rf upstream
(review upstream changes:)
$ cd private; cvs diff -u -r UPSTREAM_2 -r UPSTREAM_3 | less
(merge from upstream:)
$ cvs update -j UPSTREAM_2 -j UPSTREAM_3
(resolve conflicts)
(edit)
$ cvs commit -m "private 4: merged from UPSTREAM_3, and more edits."
I have not tried CVS itself, but the storage is very similar to that of Meta-CVS; see below.
One repository per machine, shared by all branches. The repository contains many modules. Working copies are separate and contain one module each. This is good.
Tags and branches are expensive: they require writing to every file in the repository.
One file in the repository per file handled, in one flat directory per module in the repository, plus one file called MAP describing the directory structure in the working copy. This meta-file is also versioned. The files in the repository have mangled names which do not change even when moving or renaming working files. Also, there is no Attic as in CVS; removed files are just removed from the MAP meta-file.
The working copy contains a directory called MCVS, where all the files with mangled names, and the working MAP file, reside. Then there are hard links from these mangled files to the demangled file names. The MCVS directory is a working copy from the underlying CVS repository, so it contains a CVS subdirectory. Meta-CVS re-syncs the files whenever hard links are broken. This is good.
Handles file moves and renames, implemented as simple changes to the MAP meta-file; also during imports performed with `mcvs grab'.
`mcvs grab' requires committing and tagging in two separate steps, for a total of 3 steps compared to 1 in CVS. This is bad.
Commits are not atomic.
Does not handle binary files well.
Excellent built-in documentation.
Keeps track of merges, using a pair of tags per branch automatically.
Not distributed: we must send changes by mail, in the form of patches. It is, however, possible to use CVSup to sync repositories.
Glitch: we must pipe the output of most commands through `mcvs filt' to demangle the file names.
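The hard-link arrangement described above can be imitated with plain shell commands. This is only a rough sketch of the idea, not Meta-CVS's actual implementation: the mangled name F-0001, the working file names and the one-line MAP format are all made up for illustration.

```shell
#!/bin/sh
# Sketch of the Meta-CVS layout: mangled files in MCVS/, hard links outside.
# All names below (F-0001, hello.c, src/greeting.c) are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir MCVS
echo 'int main(void) { return 0; }' > MCVS/F-0001
echo 'F-0001 hello.c' > MCVS/MAP     # simplified stand-in for the MAP meta-file
ln MCVS/F-0001 hello.c               # the working name is a hard link to the mangled file

# Rename: only the MAP entry and the link change; MCVS/F-0001 keeps its name.
mkdir src
mv hello.c src/greeting.c
echo 'F-0001 src/greeting.c' > MCVS/MAP

inode() { ls -i "$1" | awk '{print $1}'; }
links=$(ls -l MCVS/F-0001 | awk '{print $2}')   # hard-link count of the mangled file
if [ "$(inode MCVS/F-0001)" = "$(inode src/greeting.c)" ]; then same=yes; else same=no; fi
echo "links=$links same=$same"
```

After the rename, the mangled file and the new working name still share one inode, which is why the repository never sees a file move, only a change to the MAP file.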
The scenario requires one repository, one working copy, and one temporary working copy to import from. The repository contains the trunk and the vendor branch.
(create the working copy to import from)
$ cd upstream; mcvs grab -r upstream the-module
$ mcvs tag UPSTREAM_1
$ cd ..; rm -rf upstream
$ mcvs checkout -d private the-module; cd private
(edit)
$ mcvs commit -m "private 1"
(recreate the working copy to import from)
$ cd ../upstream; mcvs grab -r upstream the-module
$ mcvs commit -m "imported upstream 2"
$ mcvs tag UPSTREAM_2
$ cd ..; rm -rf upstream
(review upstream changes:)
$ cd private
$ mcvs diff -N -r UPSTREAM_1 -r UPSTREAM_2 | mcvs filt | less
(merge from upstream:)
$ mcvs merge upstream
(resolve conflicts)
$ mcvs commit -m "private 2: merge from upstream 2"
(edit)
$ mcvs commit -m "private 3"
(review my changes:)
$ mcvs diff -Nu -r upstream | mcvs filt | less
(send a patch upstream and let them merge into their repo:)
$ mcvs diff -Nu -r upstream | mcvs filt | mail maintainer@upstream.org
(recreate the working copy to import from)
$ cd ../upstream; mcvs grab -r upstream the-module
$ mcvs commit -m "imported upstream 3"
$ mcvs tag UPSTREAM_3
$ cd ..; rm -rf upstream
(review upstream changes:)
$ cd private
$ mcvs diff -N -r UPSTREAM_2 -r UPSTREAM_3 | mcvs filt | less
$ mcvs merge upstream
(resolve conflicts)
(edit)
$ mcvs commit -m "private 4: merged from upstream 3, and more edits."
The module in the repository takes 474 megabytes. Each tag and branch adds approximately 10 megabytes to that, because tags and branches require adding some information to all files in the module.
Moreover, it takes approximately one hour to add one tag.
Time to perform a checkout is 15 minutes.
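To get a feel for what these figures mean in practice, the arithmetic can be written as a small shell function. It is a back-of-the-envelope sketch that simply multiplies the approximate per-tag costs quoted above (10 megabytes and one hour); the function name tag_cost is my own, and 13 is the number of tags in the GCC test repository (ten releases plus three branch points).

```shell
#!/bin/sh
# Rough cost of adding N tags, using the approximate figures quoted above:
# about 10 MB and about one hour per tag.
tag_cost() {
    n=$1
    echo "$n tags: about $((n * 10)) MB extra, about $n hours"
}
tag_cost 13
```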
This chapter covers Bazaar-NG 0.11.
Starting with version 0.8, we can set up a “Shared Repository”, say
/var/lib/bzr, to contain history information for all branches.
Then we can perform “Lightweight checkouts” into other directories,
say ~/src/upstream and ~/src/private, much like with
other version control systems. Thus, the concept of “one working
copy = one branch = one repository” no longer applies.
No tags. This is bad. However, the revision selector syntax makes it possible to select the common ancestor revision, so the need for tags is greatly reduced.
The storage format has changed since 0.7, and will probably change again in the future.
Serves repositories over HTTP, FTP, NFS, rsync, etc. This is good.
Still unacceptably slow, despite the huge progress made since 0.7. Commit, for example, is very slow, and the larger the file, the longer it takes to commit changes to it. Large files in GCC include, among others, Makefile.in, which is over a megabyte, and several ChangeLog files; committing Makefile.in alone takes roughly 30 seconds on my reference machine. See also the performance measurement for a lightweight checkout, below.
Excellent built-in documentation.
The scenario requires two local branches: upstream and private, in the repository, and two lightweight checkouts.
Upstream cannot easily review my patches if I just push them to their branch; they need a special incoming branch where I can push.
(create the repository:)
$ mkdir /var/lib/bzr
$ bzr init-repository /var/lib/bzr
$ bzr branch http://bazaar.upstream.org/upstream /var/lib/bzr/upstream
$ bzr checkout --lightweight /var/lib/bzr/upstream
$ bzr branch /var/lib/bzr/upstream /var/lib/bzr/private
$ bzr checkout --lightweight /var/lib/bzr/private
$ cd private
(edit)
$ bzr commit -m "private 1"
$ cd ../upstream; bzr pull
(review upstream's changes:)
$ cd ../private; bzr diff -r ancestor:../upstream..branch:../upstream
$ bzr merge
(resolve conflicts, then tell Bazaar-NG about resolved conflicts:)
$ bzr resolved $(bzr conflicts)
$ bzr commit -m "private 2: merge from upstream 2"
(edit)
$ bzr commit -m "private 3"
(review my changes:)
$ bzr diff -r ancestor:../upstream
(push my changes upstream:)
$ bzr push --remember http://bazaar.upstream.org/incoming
(get upstream 3 into my copy of upstream:)
$ cd ../upstream; bzr pull
(merge upstream 3 into private:)
$ cd ../private; bzr merge
(resolve conflicts, then tell Bazaar-NG about resolved conflicts:)
$ bzr resolved $(bzr conflicts)
(edit)
$ bzr commit -m "private 4: merged from upstream, and more edits."
The repository uses 405 megabytes. It contains one knit file and one index file per file under version control. The knit file is compressed using zlib and contains the entire history for the file.
Doing a lightweight checkout of GCC 3.4.2 takes 13 minutes. The checkout time increases slowly with the size of the checkout (e.g. 14'13" for GCC 4.0.1, which is larger than 3.4.2).
One working copy = one branch = one repository; this is bad.
No concept of modules.
Tags are cheap. Tags can be local to a branch-repository, or global, i.e. replicated to other repositories by pulling or pushing. The default is global.
Cannot select the common ancestor of two revisions of a file on two branches.
Identifies changesets by their SHA1; records the ancestor of each changeset.
Efficient, append-only, compressed delta storage mechanism. Two files, data and index, for each managed file. Very fast, and the repository format is stable. When cloning a branch to another branch on the same filesystem, Mercurial uses hard links, thus sharing storage and making local cloning very cheap. A commit to any file breaks the hard links for the two repository files, thereby duplicating storage only for changed files.
Serves repositories over HTTP, FTP, NFS, rsync, etc. Built-in HTTP server to browse a repository graphically. This is good.
Very interesting patch manager, called “mq”, that manages a stack of patches (GNU diff format) applied on top of a baseline. The patches are under version control.
Average built-in documentation, on par with CVS but not better IMHO. Compensated for by the excellent online documentation.
Does not handle moves and renames well.
Requires an SSH account on remote repositories when pushing.
Does not support diff between branches; I have to resort to `diff -x .hg'. This is very bad.
Upstream will want to set up an incoming branch where I can push, so they can review my changes before merging them into the upstream branch.
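The hard-link sharing between local clones, and the way a commit breaks it, can be imitated with ordinary files. This sketch does not use Mercurial at all: the directory names and the foo.c.d/foo.c.i data and index files are stand-ins, and replacing a file by a modified copy is my simplification of how a link gets broken.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
mkdir upstream private
# Two files per managed file: data and index (file names are hypothetical).
echo 'revision data'  > upstream/foo.c.d
echo 'revision index' > upstream/foo.c.i
echo 'other data'     > upstream/bar.c.d
echo 'other index'    > upstream/bar.c.i

# "Clone": hard-link every repository file instead of copying it.
for f in upstream/*; do ln "$f" "private/$(basename "$f")"; done

inode() { ls -i "$1" | awk '{print $1}'; }

# Append to a private copy of the file, leaving the other clone untouched.
break_and_append() {   # $1 = file, $2 = line to append
    cp "$1" "$1.new"
    echo "$2" >> "$1.new"
    mv "$1.new" "$1"
}

# "Commit" to foo.c in the private clone: both of its repository files change.
break_and_append private/foo.c.d 'new revision data'
break_and_append private/foo.c.i 'new index entry'

if [ "$(inode upstream/foo.c.d)" = "$(inode private/foo.c.d)" ]; then foo=shared; else foo=separate; fi
if [ "$(inode upstream/bar.c.d)" = "$(inode private/bar.c.d)" ]; then bar=shared; else bar=separate; fi
echo "foo.c.d: $foo, bar.c.d: $bar"
```

Only the files touched by the commit stop sharing storage; bar.c.d is still one file under two names. In a project where almost every file eventually changes on every branch, that remaining sharing shrinks accordingly.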
The scenario requires two local working copies/repositories/branches.
$ hg clone http://hg.upstream.org/upstream upstream
$ cd upstream; hg tag --local UPSTREAM_1
$ cd ..; hg clone upstream private; cd private
(edit)
$ hg commit -m "private 1"
$ cd ../upstream; hg pull; hg tag --local UPSTREAM_2
(review upstream changes:)
$ hg diff -r UPSTREAM_1 -r UPSTREAM_2
$ cd ../private; hg pull; hg update --merge
(resolve conflicts)
$ hg commit -m "private 2: merge from upstream 2"
(edit)
$ hg commit -m "private 3"
(review my changes:)
$ diff -rNu -x .hg ../upstream . | less
(push my changes upstream:)
$ hg push http://hg.upstream.org/incoming
(get upstream 3 into my copy of upstream:)
$ cd ../upstream; hg pull
(merge upstream 3 into private:)
$ cd ../private; hg pull; hg update --merge
(resolve conflicts, then remove .orig files)
$ find . -name \*.orig -exec rm -i {} \;
(edit)
$ hg commit -m "private 4: merged from upstream, and more edits."
As explained, an initial clone of a Mercurial branch is very cheap, as it uses only hard links; but GCC exposes the problem that arises when many files diverge between branches, even if the divergence is small.
The four branches (four repositories) together use 686 megabytes of storage; this is 45% more than Meta-CVS.
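The percentage is measured against the 474 megabytes of the Meta-CVS module. As a small sketch of that arithmetic (the helper name pct_diff is hypothetical), the comparison can be reproduced with awk:

```shell
#!/bin/sh
# Percentage difference of a size relative to a baseline, rounded to the
# nearest whole percent. A positive result means larger than the baseline.
pct_diff() {
    awk -v size="$1" -v base="$2" 'BEGIN { printf "%.0f\n", (size - base) * 100 / base }'
}
pct_diff 686 474    # four Mercurial repositories against the Meta-CVS module
```

This prints 45. The same helper gives -61 for a 183-megabyte repository against the same baseline.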
Checkout time is however much faster, thanks to the delta-compressed storage: 5 minutes.
Tags are cheap in terms both of space and time: 27 seconds, and negligible space. A tag is just one line in the .hgtags file, consisting of the SHA1 of the tagged revision, and the name of the tag.
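Since the tag file is plain text, looking up a tag needs nothing more than a line-oriented tool. The sketch below assumes the layout described above (hash, then tag name, one pair per line) and tag names without spaces; the temporary file and the two 40-character hashes are invented for the example and do not correspond to real revisions.

```shell
#!/bin/sh
set -e
tags=$(mktemp)
# Made-up sample in the "<sha1> <tag name>" layout described above.
cat > "$tags" <<'EOF'
0123456789abcdef0123456789abcdef01234567 UPSTREAM_1
89abcdef0123456789abcdef0123456789abcdef UPSTREAM_2
EOF

# Print the hash recorded for a tag (nothing if the tag is unknown).
# $1 = tag file, $2 = tag name
tag_to_node() {
    awk -v tag="$2" '$2 == tag { node = $1 } END { if (node != "") print node }' "$1"
}
tag_to_node "$tags" UPSTREAM_2
```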
One repository (called a “database”) contains many branches. Working copies are separate; this is good.
No concept of modules; branches are modules.
Branches must have globally unique names. By convention, they use names similar to Java package names, but there is no hard requirement. The conventional dots in branch names have no special meaning.
Serves selected branches over an efficient, special-purpose protocol called netsync; uses up one TCP port per repository served. Access to the database is concurrent, but has not yet been stress-tested for large repositories with many branches and connections. It is also not currently possible to host repositories on general-purpose HTTP servers, nor on SourceForge, Tigris, Alioth, Berlios or the like.
There is a contributed tool, “usher”, that allows serving several databases on one port number. Usher comes bundled with the Monotone source tarballs, and as source in the monotone package for Debian.
The repository is a single file, containing an SQLite database. This is good for backups, and SQLite has a command-line tool that allows massaging the database in case of emergencies (I know SQL, so this doesn't scare me).
The storage format (database schema) is not very stable, and requires dumping and restoring in case of change. Such changes have happened several times already. This is bad, but will get better as Monotone matures.
Tags and branches are cheap. This is good.
No local tags, i.e. upstream will see all of my tags after I push.
Branching can only happen after making local changes; this is counter-intuitive to me and error-prone; it is all too easy to make changes and commit to the upstream branch, forgetting to create the new branch. To alleviate this, you can just edit _MTN/options in the working copy and change the branch before you make any changes.
Cannot select the common ancestor of two revisions of a file on two branches (see Bug #18302 on Savannah), so manual tagging is sometimes required.
Requires all tags and commits to be signed by an RSA key; thus all changes are authenticated. The key is specific to Monotone; this is good because it allows one to have many keys if desired, and does not use potentially sensitive keys for development. By convention, key names are email addresses, but this can change.
Requires an external editor that supports 3-way merge, such as emacs, to resolve conflicts. This is good because you cannot forget a conflict. When that editor exits, commits to the database immediately; there is no way to make more changes before the commit. This is bad. I would prefer it if, at least optionally, Monotone would allow additional edits before committing.
Merging is a database-only operation; it does not require a working copy and does not use the working copy you're in, if any. As a consequence, you always commit before merging. If you come from other version control systems, this model may be counter-intuitive, but it is actually very good as it always preserves your unmerged sources in the database. To illustrate why this is so good, here is an excerpt from a post by Brian May on comp.lang.ada:
Occasionally at my previous job (they used subversion) it was almost like a race to commit my changes first so I wouldn't have to deal with the conflicts. Didn't always work as planned though, as often the second person would accidently revert my changes by doing an update operation with the file still open in the editor. "Why did you revert my changes? The bug I fixed came back again. Did I break something?" "I didn't revert your change!" "Yes you did, in revision XYZ!"
Average built-in documentation, on par with CVS but not better. Compensated for by the excellent info manual.
Has built-in import commands from RCS and CVS that preserve history.
Has a built-in list of common file extensions to ignore, and ignores them by default. However, the GCC sources contain a number of .a files which are not archives; they are Ada source files that are part of the test suite. In the GCC scenario, I've had to override the built-in default. Thanks to the excellent documentation, it took me less than five minutes, even though I don't know Lua, the scripting language used to customise Monotone.
Notable changes between 0.24 and 0.26:
The scenario involves just one database and one working copy of the private branch. This is very good.
The upstream branch is named org.upstream, and the private branch is named org.upstream-private.
The first part of the scenario is one-time only; it consists of creating a local database and key pair, and sending the public key to upstream.
The scenario for Monotone 0.26 is exactly the same but `monotone' becomes `mtn'.
(create a local repository:)
$ monotone db init --db=/var/lib/monotone/local.db
(create a key pair:)
$ monotone genkey my.name@my.domain
(send my public key upstream so they accept my patches:)
$ monotone pubkey my.name@my.domain | mail maintainer@upstream.org
Meanwhile, upstream also creates a database and keypair, but also accepts my public key and grants me read and write permission on the `org.upstream' branch. Here is how to do this:
$ monotone db init --db=/var/lib/upstream.db
$ monotone genkey maintainer@upstream.org
(create a new branch and working copy:)
$ monotone setup --db=/var/lib/upstream.db --branch=org.upstream upstream
(enter passphrase)
$ cd upstream
(create, add and commit files)
(grant everyone read permission to that branch only:)
$ cat > ~/.monotone/read_permissions
pattern "org.upstream"
allow "*"
^D
(grant write permission to everyone, provided we have their public
key in our keystore:)
$ cat > ~/.monotone/write_permissions
*
^D
(place the contributor's public key in ~/.monotone/keys)
(serve the org.upstream branch:)
$ monotone serve --db=/var/lib/upstream.db org.upstream
(enter passphrase)
Now, back on my machine, I can start pulling from upstream's repository, and make commits to my local database.
$ monotone --db=/var/lib/monotone/local.db \
pull server.upstream.org org.upstream
$ monotone --db=/var/lib/monotone/local.db \
checkout --branch=org.upstream org.upstream-private
$ cd org.upstream-private; monotone tag h: UPSTREAM_1
(enter passphrase)
(edit)
(commit to a new branch:)
$ monotone commit -m "private 1" --branch=org.upstream-private
(enter passphrase)
$ monotone pull; monotone tag h:org.upstream UPSTREAM_2
(enter passphrase)
(review upstream changes:)
$ monotone diff -r UPSTREAM_1 -r UPSTREAM_2 | less
(merge into the private branch:)
$ EDITOR=emacs monotone propagate org.upstream org.upstream-private
(resolve conflicts)
(exit from emacs)
(enter passphrase)
(edit)
(review my changes:)
$ monotone diff -r h:org.upstream | less
$ monotone commit -m "private 3"
(enter passphrase)
(merge to my local copy of the vendor branch:)
$ monotone propagate org.upstream-private org.upstream
(enter passphrase)
(push and pull at the same time, tag:)
$ monotone sync; monotone tag h:org.upstream UPSTREAM_3
(enter passphrase)
$ monotone diff -r UPSTREAM_2 -r UPSTREAM_3 | less
(merge into the private branch:)
$ EDITOR=emacs monotone propagate org.upstream org.upstream-private
(enter passphrase)
(edit)
$ monotone commit -m "private 4"
(enter passphrase)
The database, initially empty, takes a mere 183 megabytes, or 61% less than Meta-CVS. This is very, very good indeed. A large part of the savings is the result of Monotone compressing all files and changesets using gzip.
Tagging takes 10 seconds: slower than Mercurial, but constant-time.
It takes 3.5 minutes to do a fresh checkout of GCC 3.4.2 (in the middle of a branch), and then less than 2 minutes to update to the tip of the branch. Here again, Monotone beats Meta-CVS hands down.
The storage format changed in 0.26.
The database takes 166 megabytes, or 65% less than Meta-CVS. This is very good. Due to a memory leak introduced in 0.25 but which became serious only in 0.26, it takes 6 minutes to check out from the database. During that time, Monotone eats 190 MB of RAM and uses all available CPU. See Bug #16601 on Savannah.
The storage format changed again in 0.30 but the changes only affect some cache structures; all versions above 0.26 are netsync-compatible with one another, i.e. they can replicate databases between them.
The database now takes 160 MB after conversion from the 0.26 format. Because 0.30 fixes the memory leak mentioned above, the checkout time drops to 2 minutes: even better than 0.24 which was already the best of all tested version control systems.
Cogito, by Petr Baudis and others, is a distributed version control system built on top of GIT, Linus Torvalds' tree history storage system. My thanks to Petr Baudis for this chapter; Petr sent me a long and detailed email with the full scenario below, and I was intrigued enough to fill in the missing parts. What follows applies to cogito 0.18.1 and git 1.4.3.3.
A working copy contains a repository, like in Mercurial or Bazaar-NG. This is bad; I like my repositories to be separate. In particular, this means that there can be only one working copy per repository (the working copy is the repository's parent directory). It is possible, like in Bazaar-NG, to create a repository without a working copy, but then, unlike in Bazaar-NG, every working copy will also contain a full repository. Cloning a working copy+repository on a single filesystem uses hard links where possible, so clones are “cheap”. Git does not have the same caveat as Mercurial: a change to a file does not break the links and destroy the benefits of shared storage; instead, it adds new files to the repository.
A repository can contain many branches; some branches are flagged as being mirrors of remote branches. Each branch remembers which remote branch it is a mirror of. A repository can contain mirrored branches from many other repositories. This is good. Of course, it is possible to "switch" the working copy from a branch to another.
By default, when you clone a repository you get two branches: “origin” is a mirror of the cloned branch of the upstream repository, and “master” is the private branch where you commit. Thus, Cogito supports my workflow directly by default. When you push to the upstream branch, Cogito makes sure that the head of “master” is a direct descendant of the head of “origin”, i.e. that no divergence has taken place.
Cogito can push and pull over many protocols: local files, HTTP, rsync, SSH, etc. This is good.
Supports tags, branches, and intelligent merge. This is good. By default, tags are local, which means they are not replicated when pushing or pulling repositories.
Can select the common ancestor of two revisions on different branches. This is very good, as it reduces the need for tagging in the first place.
When merging, if Cogito detects conflicts, it refrains from committing but instead places conflict markers in the files, just like CVS. This is bad, because you may forget to resolve a conflict and commit.
Git, the underlying storage engine, is a “content-addressed filesystem” where each revision (or delta) of each file, as well as each revision of the tree structure, is in a file whose name is the SHA-1 sum of the contents. This is similar to all the other systems, and guarantees integrity. Git can also crypto-sign tags, thereby providing authentication. The authorisation mechanism relies on Unix file permissions in the repository.
By default, git stores each new revision of every file in a separate file in the repository. This is very inefficient. For this reason, there is a concept of “packing” a repository, which replaces all these objects with one single, large “pack” object containing all file data and all deltas. Packing is very efficient, because git can calculate arbitrary deltas, i.e. between any versions of any files. Most other systems are restricted to forward or reverse diffs of a single file.
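The idea of a content-addressed store is easy to imitate in a few lines of shell. The sketch below is a deliberate simplification and not Git's actual on-disk format: it names each object after the plain SHA-1 of the file contents and performs no compression or packing. The store function and the objects directory are hypothetical names, and sha1sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
mkdir "$repo/objects"

# Store a file under the SHA-1 of its contents and print that name.
store() {
    sum=$(sha1sum < "$1" | awk '{print $1}')
    [ -e "$repo/objects/$sum" ] || cp "$1" "$repo/objects/$sum"
    echo "$sum"
}

echo 'version 1' > "$repo/a.txt"
cp "$repo/a.txt" "$repo/b.txt"      # same contents under another name
a=$(store "$repo/a.txt")
b=$(store "$repo/b.txt")            # identical contents: no new object
echo 'version 2' > "$repo/a.txt"
c=$(store "$repo/a.txt")            # changed contents: a new object is added
count=$(ls "$repo/objects" | wc -l | tr -d ' ')
echo "objects stored: $count"
```

Two properties fall out of this naming scheme: identical contents are stored only once, and any object can be checked for corruption by hashing it again and comparing the result with its file name. It also shows why an unpacked repository grows by one file per new revision of each file.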
Thanks to Petr Baudis for this section. I have edited Petr's scenario to remove the use of tagging, which is unnecessary with a system that keeps track of branches and merges.
Note that sometimes you might find something is omitted; e.g.
commit is missing -m or there's no | less; that's
intentional because less will be called automagically or the
commit message will be already prefilled (you'll still get to edit it
if you wish).
$ cg clone http://git.upstream.org/upstream
$ cd upstream
This creates a working copy and a new repository containing two branches:
origin, which is equivalent to your upstream; it always mirrors
exactly what was in the upstream repository when you last
fetched ("pulled" in hg/mtn language) from it.
master, which is equivalent to your private branch; it is
the branch that you have checked out and that you commit to.
(edit)
$ cg commit -m "private 1"
(fetch from upstream:)
$ cg fetch
(review changes in origin since its last merge to master:)
$ cg diff -m
(merge into the master branch:)
$ cg merge
(resolve conflicts)
If there were no conflicts, cg merge will automatically do a
commit. If there were no local changes in your master branch
relative to the origin branch, there will be actually no commit
at all.
$ cg commit
(edit)
$ cg commit -m "private 3"
(review my changes:)
$ cg diff -r origin
(push my changes upstream; Cogito remembers that 'origin' is
associated with http://git.upstream.org/upstream:)
$ cg push
(get upstream 3 and merge it to my private branch at the same
time; this is basically equiv. to cg-fetch && cg-merge:)
$ cg update
The repository, not counting the working copy, takes 409 megabytes
after initial creation. After running git repack -a -d as
recommended by Petr, which takes about half an hour, the repository
takes 81 megabytes.
If not packed, the repository eats inodes like crazy: one per commit per file, plus one per commit to the directory structure (add, remove, rename files or directories), plus one per branch, plus one per tag.
Git compresses all files using zlib, like Mercurial and Monotone.
Out of the four distributed version control systems, two fully support my current workflow: Monotone and Cogito. They also fully support the efficient storage afforded by multiple branches in a repository.
Bazaar-NG lacks tags, so it does not allow me to review past upstream changes between arbitrary points.
Mercurial lacks the ability to diff between branches. Also, it does not allow several branches per repository. The hard-linking trick reduces this problem only at a coarse level; in projects such as GCC, almost all files change, but share a long common history on the trunk; Mercurial's storage in this case is less efficient.
Git lacks the ability to have multiple working copies from the same repository (each copy would get a copy of the repo). Copies are cheap (using hard links) only if on the same filesystem. However Git has the best storage efficiency around, if you use packing.
In July 2006 I switched from Meta-CVS to Monotone 0.24, and later to 0.28, for my personal projects. I'd be reluctant to change to a system that doesn't use the commit-before-merge model; currently Monotone is the only system that uses that model.
| Version Control System | Size (MB) | tags         | branches | checkout | integ | sign | auth |
| CVS                    | 474       | expensive    | yes      | 15'      | no    | no   | no   |
| Meta-CVS               | 474       | expensive    | yes      | 15'      | no    | no   | no   |
| Bazaar-NG 0.11         | 405       | no           | yes      | 13'      | no    | yes  | no   |
| Mercurial 0.9          | 686       | cheap, local | no       | no       | yes   | no   | no   |
| Monotone 0.24-0.25     | 183       | cheap        | yes      | 2.5'     | yes   | yes  | yes  |
| Monotone 0.26-0.29     | 165       | cheap        | yes      | 6'       | yes   | yes  | yes  |
| Monotone 0.30-0.31     | 160       | cheap        | yes      | 2'       | yes   | yes  | yes  |
| Git 1.4.3.3 (unpacked) | 409       | cheap, local | yes      | no       | yes   | yes  | no   |
| Git 1.4.3.3 (packed)   | 81        | cheap, local | yes      | no       | yes   | yes  | no   |