Chapter 6. The Difference Tools

Table of Contents

1. Binary Files
2. Interfacing
2.1. diff_command
2.2. merge_command
3. When No Diff is Required
4. Using diff and merge
4.1. diff_command
4.2. merge_command
5. Using fhist
5.1. diff_command
5.2. merge_command

This chapter describes the difference commands in the project configuration file. Usually these commands are used by the aegis -DIFFerence command when differencing files, but they may be used to accomplish some other things.

By default, Aegis rejects filenames which contain shell special characters. This ensures that filenames may be substituted into the commands without worrying about whether this is safe. If you set the shell_safe_filenames field of the project aegis.conf file to false, you will need to surround filenames with the ${quote filename} substitution. This only quotes filenames which actually need to be quoted, so users usually will not notice. This applies to all of the various filenames in the sections which follow.

1. Binary Files

Aegis doesn't particularly care whether your files are binary or text. However, your difference and merge tools certainly will. In general, you need format-specific difference and merge tools for each of the file formats used in your project. Unfortunately, most vendors of software which make use of proprietary file formats do not supply difference and merge tools.

The simplest compromise is to treat all files as text, with manual repairs for binary files.

A more elegant solution is to use a shell script invoked by the diff_command in the project aegis.conf file. This shell script examines the file to determine the file format, and then runs the appropriate difference tool. Similar considerations apply to the merge_command field.

Please note that this support is not present in Aegis itself because (a) it would cause code bloat, and (b) it is entirely possible to do with a shell script launched by diff_command.
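
The following is a minimal sketch of such a dispatching script, not something supplied with Aegis. The function names is_binary and diff_dispatch are hypothetical, and the test for binary content (the presence of a NUL byte) is an assumption that will not suit every format. A real project would replace the binary branch with its own format-specific tools.

```shell
#!/bin/sh
# Hypothetical helper for diff_command: choose a difference tool per file.
# The names is_binary and diff_dispatch are illustrative, not part of Aegis.

# Succeeds (status 0) if the file contains at least one NUL byte,
# a crude but portable test for "not a text file".
is_binary() {
    nul_count=$(tr -cd '\000' < "$1" | wc -c | tr -d ' ')
    [ "$nul_count" -gt 0 ]
}

# Usage: diff_dispatch ORIGINAL INPUT OUTPUT
diff_dispatch() {
    original=$1 input=$2 output=$3
    if is_binary "$original" || is_binary "$input"; then
        # No format-specific tool here: just record whether the bytes differ.
        if cmp -s "$original" "$input"; then
            echo "binary files are identical" > "$output"
        else
            echo "binary files differ" > "$output"
        fi
        return 0
    fi
    # Text files: an ordinary context diff.
    status=0
    diff -c "$original" "$input" > "$output" || status=$?
    # diff exits 0 (same) or 1 (different); anything else is a real error.
    [ "$status" -le 1 ]
}
```

Saved as a script ending with a call such as diff_dispatch "$@", this could be named in the diff_command field with $original, $input and $output passed as its three arguments.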

2. Interfacing

The diff command is configured by a field of the project configuration file (aegis.conf).

2.1. diff_command

This command is used by aed to produce a difference listing when a file in the development directory was originally copied from the current version in the baseline.[20]

All of the command substitutions described in aesub are available. In addition, the following substitutions are also available:

${ORiginal}

The absolute path name of a file containing the version originally copied. Usually in the baseline.

${Input}

The absolute path name of the edited version of the file. Usually in the development directory.

${Output}

The absolute path name of the file in which to write the difference listing. Usually in the development directory.

An exit status of 0 means successful, even if the files differ (and they usually do). An exit status which is non-zero means something is wrong.

The non-zero exit status may be used to overload this command with extra tests, such as line length limits. The difference files must be produced in addition to these extra tests.
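
As an illustration of overloading the command, the sketch below writes the difference file first and then applies a line length limit to the edited file. The function name diff_with_checks and the default limit of 80 characters are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical diff_command helper with an extra line length test.
# Usage: diff_with_checks ORIGINAL INPUT OUTPUT [MAX_WIDTH]
diff_with_checks() {
    original=$1 input=$2 output=$3 max_width=${4:-80}

    # Always produce the difference file first.
    status=0
    diff -c "$original" "$input" > "$output" || status=$?
    # 0 and 1 are normal for diff; anything else is a genuine failure.
    [ "$status" -le 1 ] || return "$status"

    # Extra test: report over-long lines on stderr and fail if any exist.
    awk -v max="$max_width" '
        length($0) > max {
            printf "%s: %d: line longer than %d characters\n", FILENAME, FNR, max
            bad = 1
        }
        END { exit bad + 0 }
    ' "$input" >&2
}
```

The difference file exists whether or not the extra test passes, which satisfies the requirement that the listing be produced in addition to the tests.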

2.2. merge_command

This command is used by aed to produce a difference listing when a file in the development directory is out of date compared to the current version in the baseline.

All of the command substitutions described in aesub are available. In addition, the following substitutions are also available:

${ORiginal}

The absolute path name of a file containing the version originally copied. Usually in a temporary file.

${Most_Recent}

The absolute path name of a file containing the most recent version. Usually in the baseline.

${Input}

The absolute path name of the edited version of the file. Usually in the development directory. Aegis usually moves the source file aside, so that the output can replace the source file.

${Output}

The absolute path name of the file in which to write the difference listing. Usually in the development directory. This is usually the name of a change source file.

An exit status of 0 means successful, even if the files differ (and they usually do). An exit status which is non-zero means something is wrong.

3. When No Diff is Required

It is possible to configure a project to omit the diff step as unnecessary, by the following setting:

diff_command = "exit 0";

This disables all generation, checking and validation of difference files for each change source file. The merge functions of the aediff(1) command are unaffected by this setting.

4. Using diff and merge

These two tools are available with most flavours of UNIX, but often in a very limited form. One severe limitation is the diff3 command, which often can only cope with 200 lines of differences. The best alternative is to use GNU diff, which has context differences available, and a far more robust diff3(1) implementation.

See the earlier Interfacing section for substitution details.

4.1. diff_command

The entry in the configuration file looks like this:

diff_command =
  "set +e; diff -c $original "
  "$input > $output; test $? -le 1";

This needs a little explanation:

  • This command is always executed with the shell's -e option enabled, causing the shell to exit on the first error. The "set +e" turns this off.

  • The diff command exits with a status of 0 if the files are identical, and a status of 1 if they differ. Any other status means something horrible happened. The "test" command is used to change this to the exit status Aegis expects.
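
The effect of these two points can be tried outside Aegis. The sketch below assumes that running a command under "sh -e -c" is a fair stand-in for the way Aegis runs it; the helper name run_like_aegis and the two variables are hypothetical, and the files are passed as positional parameters rather than Aegis substitutions.

```shell
#!/bin/sh
# run_like_aegis COMMAND ORIGINAL INPUT OUTPUT
# Runs COMMAND under "sh -e", with the three files as $1, $2 and $3.
# This only approximates how Aegis invokes diff_command.
run_like_aegis() {
    cmd=$1; shift
    sh -e -c "$cmd" sh "$@"
}

# The form used in the configuration entry: status 0 or 1 becomes success.
with_guard='set +e; diff -c "$1" "$2" > "$3"; test $? -le 1'

# The naive form: files which differ make diff, and so the command, fail.
without_guard='diff -c "$1" "$2" > "$3"'
```

With two files which differ, run_like_aegis "$with_guard" succeeds while run_like_aegis "$without_guard" fails. If one of the files is missing, diff exits with a status above 1 and the guarded form fails as well, which is the desired behaviour.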

The -c option says to produce a context diff. You may choose to use the -u option, to produce uni-diffs, if your diff command supports it.

You may also wish to consider ignoring white space in comparisons, as these tend to be cosmetic changes and not very interesting to code reviewers. The -b option of GNU Diff will ignore changes to the amount of white space, and the -w option will ignore white space altogether.

Binary files will often cause modern versions of GNU Diff to exit with an exit status of 2, which is probably reasonable most of the time. If your project contains binary files, you may want to treat all files as text files. Use the GNU Diff -a option in this case.

A useful alternative, available with more recent versions of GNU Diff, is the -U option. This is a more compact form than the -c option, and is able to give the whole file as context.

diff_command =
  "set +e; diff -U999999 $original "
  "$input > $output; test $? -le 1";

The exit status must once again be tailored, however the output will be the whole source for context, with changes marked by `+' and `-' in the left margin. Thus, reviewers need only search for /^[-+]/ in order to see all edits made to the file.
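
The search can also be done from the command line. The following sketch assumes the difference file holds a single whole-file unified diff, so that its first two lines are the `---' and `+++' file headers; these also match the pattern and are skipped. The function name list_edits is hypothetical.

```shell
#!/bin/sh
# Usage: list_edits DIFF_FILE
# Prints "line-number:text" for every added or removed line in a
# single-file unified diff, skipping the two header lines at the top.
list_edits() {
    awk 'NR > 2 && /^[-+]/ { print NR ":" $0 }' "$1"
}
```

The line numbers printed are positions within the difference file, which in a whole-file diff lets a reviewer jump straight to each edit in context.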

4.2. merge_command

Note: The merge(1) command is better than this use of the diff3(1) command. See the RCS chapter for more details.

The entry in the configuration file looks like this:

merge_command =
  "(diff3 -e $MostRecent $original  \
   $input | sed -e '/^w$$/d' -e     \
   '/^q$$/d'; echo '1,$$p' ) | ed - \
   $MostRecent > $output";

This needs a lot of explanation.

  • The diff3 command is used to produce an edit script that will incorporate into $MostRecent all the changes between $original and $input. You may want the -a option, to treat all files as ASCII.

  • The sed command is used to remove the "write" and "quit" commands from the generated edit script.

  • The ed command is used to apply the generated edit script to the $MostRecent file, and print the result on the standard output, which is redirected into the $output file.
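
For comparison, GNU diff3 can write the merged file directly with its -m option, avoiding the edit script and ed. The sketch below is an alternative under that assumption, not the configuration described above; the function name merge_three is hypothetical, and treating marked conflicts as success is a choice made for this example.

```shell
#!/bin/sh
# Usage: merge_three MOST_RECENT ORIGINAL INPUT OUTPUT
# Merges the changes between ORIGINAL and INPUT into MOST_RECENT,
# writing the merged text to OUTPUT. Requires GNU diff3 for -m.
merge_three() {
    status=0
    diff3 -m "$1" "$2" "$3" > "$4" || status=$?
    # GNU diff3 exits 0 for a clean merge, 1 when conflicts were marked
    # in the output, and 2 on trouble. Only trouble is reported as failure.
    [ "$status" -le 1 ]
}
```

The argument order matches the diff3 -e invocation in the configuration entry: the most recent version first, then the original, then the edited input.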

5. Using fhist

The fhist program by David I. Bell also comes with two other utilities, fcomp and fmerge, which use the same minimal difference algorithm.

See the earlier Interfacing section for substitution details.

5.1. diff_command

The entry in the configuration file looks like this:

diff_command =
  "fcomp -w $original $input "
  "-o $output";

The -w option produces an output of the entire file, with insertions and deletions marked by "change bars" in the left margin. This is superior to context difference, as it shows the entire file as context.

For more information, see the fcomp manual entry.

5.2. merge_command

The entry in the configuration file looks like this:

merge_command =
  "fmerge $original $MostRecent \
   $input -o $output -c /dev/null";

The output of this command is similar to the output of the merge_command in the last section. Conflicts are marked in the output. For more information, see the fmerge manual entry.



[20] Or this is logically the case.