Best Practices for Version Control

Source code version control systems have been around for decades, but sometimes I suspect people are using them just because everybody else is, or because their manager told them to do so, or because it’s company policy. Although most people will agree that using version control is a prerequisite for any serious software project, many programmers only utilize a small percentage of the possibilities and advantages such systems can provide.

Here are some of my thoughts on what I consider best practices for using version control systems. In short, they can be described with seven basic sentences:

  1. Put everything under version control.
  2. Create sandbox home folders.
  3. Use a common project structure and naming convention.
  4. Commit often and in logical chunks.
  5. Write meaningful commit messages.
  6. Do all file operations in the version control system.
  7. Set up change notifications.

These recommendations are based on my own experience and preferences with using CVS and Subversion over the years, but the principles should easily transfer to other systems as well.

1. Put Everything Under Version Control

Any files associated with any project you are working on that may be of interest to anyone else—or even only to yourself—should be put under version control. Note that this is not limited to source code and files related to the implementation of a project, but also includes documents such as meeting minutes, specifications, architecture and design documents, artwork, configuration files and install scripts. When doing research for a project and gathering information from external resources, I also like to add those to the repository. Some examples are product brochures, protocol specifications, book references and links to company web sites. E-mail correspondence, scans of whiteboard notes or a concept drawing on a napkin are also useful to store for later reference.

Although some people think it’s silly to archive files that never change in a version control system, I find great value in having every document related to a project stored in the same place. It makes finding things so much easier—which can save you a lot of time when you don’t have to dig through hundreds of e-mails to locate that specification you got six months ago but didn’t have time to start implementing until now. Also, in the area of software development, there is no such thing as a document that never changes (or at least, there shouldn’t be, because you always remember to update your documentation, right?). If you are working on a project where many documents are produced by non-technical or non-programming people (i.e. people who don’t use version control), consider setting up automatic synchronization between project file shares and the version control repository.
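
As a rough sketch of what such a synchronization could look like, assume the file share and a working copy of the repository are both reachable as local directories. The function below (its name and the directory layout are made up for illustration) copies new or changed files into the working copy and returns their relative paths, so that a scheduled job could add and commit them afterwards with whatever client your system provides.

```python
import filecmp
import shutil
from pathlib import Path


def sync_share_to_working_copy(share_dir, working_copy_dir):
    """Copy new or changed files from a file share into a working copy.

    Returns the sorted relative paths that were copied, so the caller can
    add and commit them with the version control client of their choice.
    """
    share = Path(share_dir)
    working_copy = Path(working_copy_dir)
    copied = []
    for source in share.rglob("*"):
        if not source.is_file():
            continue
        relative = source.relative_to(share)
        target = working_copy / relative
        # shallow=False compares file contents instead of only size and mtime.
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative.as_posix())
    return sorted(copied)
```

Deleted files are deliberately ignored here; whether a file removed from the share should also be removed from the repository is a policy decision this sketch does not try to make.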

When documentation is kept in a wiki, things might be a bit different. If the wiki itself keeps track of changes (which any decent wiki will do), there may be no need to store this data in a separate system. If your wiki is backed by a database, you may consider putting the database itself under version control, but some people will view this as redundant (after all, you have automated backups of all your databases, right?). I don’t have any preferences on how this should be solved, as long as all documents related to a project are stored on a central server with associated revision history.

For document formats that require processing before being readable, such as DocBook, LyX and LaTeX files, I prefer also committing them in a more readable form, like PDF or HTML. Some may argue this violates the DRY principle, but it also makes the documents easier to read for people who don’t have the required processing tools installed (or who are just lazy). This can be very useful when distributing documents by linking to them directly in the repository (i.e. via HTTP), but do take care to update both versions when making changes to such files—or even better, automate it.
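
One way to start automating this is a check that flags sources whose readable copy is missing or out of date. The sketch below assumes the rendered file sits next to its source with the same base name, and that it runs in the working copy where the edits were made (modification times are not reliable after a fresh checkout). The suffix mapping is an assumption to adapt to your own formats.

```python
from pathlib import Path

# Assumed mapping from source formats to the rendered format kept beside them.
RENDERED_SUFFIX = {".lyx": ".pdf", ".tex": ".pdf"}


def find_stale_renderings(root):
    """Return source files whose rendered copy is missing or older."""
    root = Path(root)
    stale = []
    for source in sorted(root.rglob("*")):
        rendered_suffix = RENDERED_SUFFIX.get(source.suffix.lower())
        if rendered_suffix is None or not source.is_file():
            continue
        rendered = source.with_suffix(rendered_suffix)
        # A missing or older rendered file means the source was changed last.
        if not rendered.exists() or rendered.stat().st_mtime < source.stat().st_mtime:
            stale.append(source.relative_to(root).as_posix())
    return stale
```

A script like this could run before each commit and refuse to continue (or trigger a rebuild) when the returned list is not empty.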

2. Create Sandbox Home Folders

To encourage developers to use the version control system also for their own documents, (experimental) projects and tools, I recommend creating home folders in the repository, giving each user a sandbox to play with. In my experience, many useful tools have started out as simple scripts in a developer’s home folder and evolved into powerful utilities over time, so why not keep the revision history from day one? This also allows less experienced developers to experiment with branching, tagging and merging, hopefully encouraging them to use those features in “real” projects as well.

3. Use a Common Project Structure and Naming Convention

I recommend a consistent naming convention for all files and folders in a project. Preferably, an effort should be made to maintain the convention between projects throughout the repository. This makes it easier to locate files by partially guessing their name or location. For example, finding the source code for a project with many sub-folders will be much easier if the folder containing source code is named src rather than something totally arbitrary.

Using a common project structure can also be valuable for automated tools. For example, if all projects have a readme.txt or readme.html in their root folder, one can easily implement a script to generate a web page with a brief description of each project in the repository. If you are using an automated build system, such as Apache Maven, some of this structure may already be defined for you. Ideally, the project structure and naming policies should be described in your coding conventions or similar guidelines.
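
To give an idea of how little code such a script needs, here is a minimal sketch. It assumes every project is a folder directly under one root directory and that the first line of its readme.txt is a one-line description; both are conventions I made up for the example, not something any tool requires.

```python
import html
from pathlib import Path


def build_project_index(repository_root):
    """Build an HTML list of projects from the first line of each readme.txt."""
    items = []
    for project in sorted(Path(repository_root).iterdir()):
        readme = project / "readme.txt"
        # Folders without a readme.txt are skipped rather than reported.
        if not project.is_dir() or not readme.is_file():
            continue
        lines = readme.read_text(encoding="utf-8").splitlines()
        summary = lines[0].strip() if lines else ""
        items.append(
            f"<li><b>{html.escape(project.name)}</b>: {html.escape(summary)}</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

The same loop could just as well report which projects are missing a readme file, which is a cheap way of enforcing the convention.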

4. Commit Often and in Logical Chunks

It’s better to have a broken build in your working repository than a working build on your broken hard drive.

I prefer to follow the basic work cycle described in the Subversion book. This means that you should always update your working copy before making any changes to files. In general, it’s preferable to commit changes in logical chunks: changes that belong together should be committed together, and changes that don’t belong together should be committed separately. On systems with atomic commits, this can make the resulting revision history significantly more useful when changes span multiple files.

If you are making many changes to a project at the same time, split them up into logical parts and commit them in multiple sessions. This makes it much easier to track the history of individual changes, which will save you a lot of time when trying to find and fix bugs later on. For example, if you are implementing features A, B and C and fixing bugs 1, 2 and 3, that should result in a total of at least six commits, one for each feature and one for each bug. If you are working on a big feature or doing extensive refactoring, consider splitting your work up into even smaller parts, and make a commit after each part is completed. Also, when implementing independent changes to multiple logical modules, commit changes to each module separately, even if they are part of a bigger change.
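
If you have lost track of what you changed, a small helper can at least suggest where to draw the lines between modules. The sketch below assumes the first folder in each path names the module, which is only a convention picked for this example; it takes a plain list of changed paths and groups them, so each group is a candidate for its own commit.

```python
from collections import defaultdict


def group_changes_by_module(changed_paths):
    """Group changed file paths by their first path component."""
    groups = defaultdict(list)
    for path in changed_paths:
        # Files directly in the project root get a group of their own.
        module = path.split("/", 1)[0] if "/" in path else "(root)"
        groups[module].append(path)
    return {module: sorted(paths) for module, paths in sorted(groups.items())}
```

This only separates changes by location. Deciding which changes within a module belong to the same feature or bug fix is still up to you.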

Ideally, you should never leave your office with uncommitted changes on your hard drive. If you are working on projects where changes will affect other people, consider using a branch to implement your changes and merge them back into the trunk when you are done. When committing changes to libraries or projects that other projects—and thus, other people—depend on, make sure you don’t break their builds by committing code that won’t compile. However, having code that doesn’t compile is not an excuse to avoid committing. Use branches instead.

5. Write Meaningful Commit Messages

If you have nothing to say about what you are committing, you have nothing to commit.

Always write a comment when committing something to the repository. Your comment should be brief and to the point, describing what was changed and possibly why. If you made several changes, write one line or sentence about each part. If you find yourself writing a very long list of changes, consider splitting your commit into smaller parts, as described earlier. Prefixing your comments with identifiers like Fix or Add is a good way of indicating what type of change you did. It also makes it easier to filter the content later, either visually, by a human reader, or automatically, by a program.

If you fixed a specific bug or implemented a specific change request, I also recommend referencing the bug or issue number in the commit message. Some tools may process this information and generate a link to the corresponding page in a bug tracking system or automatically update the issue based on the commit.

Here are some examples of good commit messages:

Changed paragraph separation from indentation to vertical space.
...
Fix: Extra image removed.
Fix: CSS patched to give better results when embedded in javadoc.
Add: A javadoc {@link} tag in the lyx, just to show it's possible.
...
- Moved third party projects to ext folder.
- Added lib folder for binary library files.
...
Fix: Fixed bug #1938.
Add: Implemented change request #39381.
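
To show how messages in this style can be processed automatically, here is a small sketch that splits one message line into its prefix, its text and any referenced issue numbers. The prefixes Fix and Add and the #number notation are taken from the examples above; the function name and the returned tuple are just choices made for this illustration.

```python
import re

# Prefixes used in the examples above; extend the alternation as needed.
PREFIX_PATTERN = re.compile(r"^(Fix|Add):\s*(.*)$")
ISSUE_PATTERN = re.compile(r"#(\d+)")


def parse_commit_line(line):
    """Split one commit message line into (change type, text, issue numbers)."""
    stripped = line.strip()
    match = PREFIX_PATTERN.match(stripped)
    if match:
        change_type, text = match.group(1), match.group(2)
    else:
        # Lines without a known prefix are kept as-is with no change type.
        change_type, text = None, stripped
    issues = [int(number) for number in ISSUE_PATTERN.findall(text)]
    return change_type, text, issues
```

A changelog generator or a bug tracker integration could use the change type to filter entries and the issue numbers to build links.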

Many developers are sloppy about commenting their changes, and some may feel that commit messages are not needed. Either they consider the changes trivial, or they argue that you can just inspect the revision history to see what was changed. However, the revision history only shows what was actually changed, not what the programmer intended to do, or why the change was made. This can be even more problematic when people don’t do fine-grained commits, but rather submit a week’s worth of changes to multiple modules in one large pile. With a fine-grained revision history, comments can be useful to distinguish trivial from non-trivial changes in the repository. In my opinion, if the changes you made are not important enough to comment on, they probably are not worth committing either.

6. Do All File Operations in the Version Control System

Whenever you need to copy, delete, move or rename files or folders in the repository, do so using the corresponding file operations in the version control system (see note 1). If this is done only on the local file system, the history of those changes will be lost forever. I consider structural changes just as important as changes to the files themselves, so there is no reason not to let the version control system keep track of them. Also, when people know all their changes can be undone, the threshold for doing radical restructuring and major refactoring will be lowered, which can have a significant impact on preventing the build-up of technical debt.
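
If you script structural changes, the same rule applies: call the version control client rather than the file system functions of your scripting language. As a minimal sketch for Subversion, the helper below builds the argument list for the svn copy, move and delete subcommands. The wrapper functions are hypothetical, and actually running them requires the svn client and a working copy.

```python
import subprocess

# Number of paths this sketch expects per subcommand.
# A rename is expressed as a move in Subversion.
SVN_OPERATIONS = {"copy": 2, "move": 2, "delete": 1}


def svn_command(operation, *paths):
    """Build the svn argument list for a structural change in a working copy."""
    expected = SVN_OPERATIONS.get(operation)
    if expected is None:
        raise ValueError(f"unsupported operation: {operation}")
    if len(paths) != expected:
        raise ValueError(f"{operation} expects {expected} path(s), got {len(paths)}")
    return ["svn", operation, *paths]


def run_svn(operation, *paths):
    """Run the command; needs the svn client installed and a working copy."""
    subprocess.run(svn_command(operation, *paths), check=True)
```

For example, `run_svn("move", "lib/parser.c", "src/parser.c")` would record the move in the working copy, so the file keeps its history once the change is committed.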

7. Set Up Change Notifications

To monitor changes in the repository as they happen, I recommend setting up change notifications to send out an e-mail or update an RSS feed whenever a commit is made. Some systems support notifications directly via event hooks—sometimes with default implementations provided—while others may require external cron jobs, daemons or custom scripts to provide this feature.

My recommendation is that all developers subscribe to change notifications, since they can have many advantages. Obviously, they are useful if you want to see what changes are being made to projects you are working on or have an interest in (e.g. a library your project is using), but they might also encourage (or scare) people into writing more useful commit messages, since they know someone might actually be reading them.

Typically the notifications will also contain extracts of the files that were changed, making them useful for light-weight code reviews. Programmers who monitor source code changes can keep an eye out for code smells or violations of the coding conventions, and if you are lucky, you might even learn something by reading other people’s code.

Here’s an example of what a commit notification e-mail can look like:

From: svn-commit@company.com
Sent: Wednesday, March 05, 2008 11:23 AM
To: svn-commit@company.com
Subject: [SVN:CompanyRepository] r6523 - trunk/documents/templates

Author: anders
Date: 2008-03-05 11:23:08 +0100 (Wed, 05 Mar 2008)
New Revision: 6523

Modified:
    trunk/documents/templates/document.lyx
Log:
Changed paragraph separation from indentation to vertical space.

Modified: trunk/documents/templates/document.lyx
===================================================================
--- trunk/documents/templates/document.lyx 2008-03-05 09:22:49 UTC (rev 6522)
+++ trunk/documents/templates/document.lyx 2008-03-05 10:23:08 UTC (rev 6523)
@@ -32,7 +32,7 @@
\footskip 1cm
\secnumdepth 3
\tocdepth 3
-\paragraph_separation indent
+\paragraph_separation skip
\defskip medskip
\quotes_language english
\papercolumns 1

If you are working on a large project or there are many active projects in your repository, you may find it useful to create separate notifications for each module or project. If notifications are sent via e-mail, you can also configure the subject field to indicate which module or repository the notification belongs to, so that they can be processed with standard e-mail filtering rules.
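
As an illustration of a filter-friendly subject line, the sketch below builds a notification message in roughly the same shape as the example above. It is not the script that produced that e-mail: the function and its parameters are assumptions, and fetching the commit details from a hook and sending the message are left out.

```python
import posixpath
from email.message import EmailMessage


def build_notification(repository, revision, author, log_message,
                       changed_paths, sender, recipient):
    """Build a commit notification whose subject can be matched by mail filters."""
    # Put the common parent folder in the subject so per-project
    # filtering rules have something to match on.
    folders = [posixpath.dirname(path) for path in changed_paths]
    common = posixpath.commonpath(folders) if folders and all(folders) else ""

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[SVN:{repository}] r{revision} - {common or '/'}"

    body = [f"Author: {author}", f"New Revision: {revision}", "", "Changed paths:"]
    body += [f"    {path}" for path in changed_paths]
    body += ["Log:", log_message]
    message.set_content("\n".join(body) + "\n")
    return message
```

With a subject in this form, a mail rule matching `[SVN:CompanyRepository]` or a specific folder name is enough to sort notifications per repository or project.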

Conclusion

If you are already doing all of the above, great for you! If not, adding even a few of these to your work habits can make a difference. Of course, not everyone is in a position to change the structure of their project or the repository configuration, but any programmer can make their life easier with logically grouped commits and meaningful commit messages. Consider giving it a try; you might like it.

Please share your thoughts.

Notes:

  1. CVS has very limited support for file operations, which is a good reason to switch to Subversion.

8 Comments

  1. Posted August 5, 2008 at 12:56 | Permalink

    I recommend taking a look at KDE’s commit policy for some good practical advice. Some of the rules are specific to KDE, but most are suitable for any project.

  2. Posted August 21, 2008 at 21:52 | Permalink

    You may also want to check out this interesting discussion on work cycles and commit frequencies (Coding Horror) and these notes on using version control with continuous integration systems (Tech Rock Guy).

  3. Posted October 30, 2008 at 12:35 | Permalink

    Very useful information.

    Also if possible, could you please share some links to effectively use ‘trunk’ and ‘branch’ in Subversion?

    Thanks.

  4. Posted January 7, 2009 at 19:19 | Permalink

    Anders, thanks for this list. I’ve combined this with some thoughts from others and just posted this: http://blog.bstpierre.org/version-control-habits

    Interested in any feedback you might have on the topic. Thanks!

    -Brian

  5. Dave Cantrell
    Posted December 12, 2009 at 01:58 | Permalink

    Anders,

    Just wanted to say this is a great post. I was looking for exactly this kind of information and you summed everything up very nicely. Thanks for the info!

  6. Posted January 25, 2010 at 12:32 | Permalink

    Very useful tips!
    Thanks Anders

  7. Posted May 18, 2010 at 03:49 | Permalink

    What do you think about Visual Source Safe in comparison to CVS? Thanks for the post.

  8. Andreas Wehler
    Posted July 4, 2012 at 11:24 | Permalink

    2. Create sandbox home folders
    =====
    Good idea. Perhaps additionally a true sandbox would be a good idea also. It holds temporary and void contents by definition within a separate repository. Commits to this sandbox will not increment the official revision number nor probably trigger any notifications. This may lower someone’s barrier to experiment with a new tool.

    Thanks for the great contribution!

    Andreas
