Aegis is decoupled from the history mechanism. This allows you to use the history mechanism of your choice, SCCS or RCS, for example. You may even wish to write your own.
The intention of this is that you may use a history mechanism which suits your special needs, or the one that comes free with your flavour of Unix™ operating system.
Aegis uses the history mechanism for file history and so does not require many of the features of SCCS or RCS. This simplistic approach can sometimes make the interface to these utilities look a little strange.
In order to track project source file renames and yet preserve a continuous history, the name of each source file and the name of each corresponding history file have nothing in common. The history file will have the same name (both on the local repository and any remote repository it is in) no matter how many times the source file is renamed.
Each source file is assigned a universally unique identifier (UUID) when it is first created. This attribute, unlike the source file's name, is immutable and thus is suitable for use when forming the name of the history file.
The history mechanism interface is found in the
project configuration file called
aegis.conf
relative to the root of the baseline.
It is a source file and subject to the same controls
as any other source file.
The history fields of the file are described as follows.
history_create_command
This field is used to create a new history. The command is always executed as the project owner. Substitutions available for the command string are:
$input (or $i): absolute path of source file
$history (or $h): absolute path of history file
In addition, all substitutions described in aesub(5) are available.
This command should be identical to the history_put_command;
otherwise mysterious things can happen when branches are ended.
history_get_command
This field is used to get a file from history. The command may be executed by developers. Substitutions available for the command string are:
$history (or $h): absolute path of history file
$edit (or $e): edit number, as given by the history_query_command
$output (or $o): absolute path of destination file
In addition, all substitutions described in aesub(5) are available.
history_put_command
This field is used to add a new change to the history. The command is always executed as the project owner. Substitutions available for the command string are:
$input (or $i): absolute path of source file
$history (or $h): absolute path of history file
In addition, all substitutions described in aesub(5) are available.
This command should be identical to the history_create_command;
otherwise mysterious things can happen when branches are ended.
history_query_command
This field is used to query the topmost edit of a history file. The result is to be printed on the standard output. This command may be executed by developers. Substitutions available for the command string are:
$history (or $h): absolute path of history file
In addition, all substitutions described in aesub(5) are available.
history_content_limitation
This field describes the content style which the history tool is capable of working with.
ascii_text: The history tool can only cope with files which contain printable ASCII characters, plus space, tab and newline. The file must end with a newline. This is the default.
international_text: The history tool can only cope with files which do not contain the NUL character. The file must end with a newline.
binary_capable: The history tool can cope with files without any limitation on the form of the contents.
When a file is added to the history (by either the
history_create_command or the history_put_command
field) it is examined for conformance to this limitation. If there is
a problem, the file is encoded in either the MIME quoted printable or
the MIME Base 64 encoding (see RFC 1521), whichever is smaller, before
being given to the history tool. The file in the baseline is unchanged.
On extract (the history_get_command field) the encoding is reversed,
using information attached to the change file information. This is
because each put could use a different encoding (although in practice,
file contents rarely change that dramatically, and the same encoding is
likely to be deduced every time).
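The conformance test described above can be approximated with standard tools. The following is only a sketch of the idea, not how Aegis itself performs the check: the function name classify is hypothetical, the labels it prints are assumed to match the history_content_limitation value names, and treating an empty file as needing a binary capable tool is an arbitrary choice made here.

```shell
#!/bin/sh
# classify FILE
# Print the least capable content style that could store FILE unencoded.
classify() {
    file=$1
    # Bytes other than tab, newline, and printable ASCII (space through tilde).
    odd=$(LC_ALL=C tr -d '\011\012\040-\176' < "$file" | wc -c | tr -d ' ')
    # NUL bytes.
    nul=$(LC_ALL=C tr -cd '\000' < "$file" | wc -c | tr -d ' ')
    # Hex value of the final byte; "0a" means the file ends with a newline.
    last=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')

    if [ "$nul" -gt 0 ] || [ "$last" != 0a ]; then
        echo binary_capable
    elif [ "$odd" -eq 0 ]; then
        echo ascii_text
    else
        echo international_text
    fi
}
```

A file for which this prints a style stronger than the one configured is the kind of file that would be encoded before being handed to the history tool.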
Many history tools (e.g. RCS) can modify the contents of the file when it is committed. While there are usually options to turn this off, they are seldom used. The problem is: if the commit changes the file, the source in the repository now no longer matches the object file in the repository - i.e. the history tool has compromised the referential integrity of the repository.
By default, when this happens Aegis issues a fatal error (at integrate
pass time). You can turn this into a warning if you are convinced
this is irrelevant. This would only make sense if the substitution only
ever occurs in comments. See aepconf(5) for more information
on the values for the history_put_trashes_file field.
The default setting is for Aegis to reject filenames which contain shell
special characters. This ensures that filenames may be substituted
into the commands without worrying about whether this is safe. If you
set the shell_safe_filenames field of the project aegis.conf
file to false, you will need to surround filenames with the
${quote filename} substitution. This will only
quote filenames which actually need to be quoted, so users usually will
not notice. This applies to all of the various filenames in
the commands in the sections which follow.
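To illustrate what "only quote filenames which actually need to be quoted" means in practice, here is a small Bourne shell sketch. It is not the implementation of the ${quote} substitution; the function name quote_if_needed and the exact set of characters treated as safe are assumptions made for this example.

```shell
#!/bin/sh
# quote_if_needed WORD
# Print WORD unchanged when it contains only characters that are harmless in
# a Bourne shell command; otherwise wrap it in single quotes.
quote_if_needed() {
    case $1 in
        '' | *[!A-Za-z0-9_./,+:@%=-]*)
            # Escape each embedded single quote as '\'' and wrap the word.
            printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

quote_if_needed src/main.c
quote_if_needed 'my file.c'
```

Ordinary file names pass through untouched, which is why users of a project with shell_safe_filenames set to false will rarely see any quoting at all.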
The source distribution contains numerous configuration examples in a directory called lib/config.example/ which is installed into /opt/aegis/share/config.example/ by default. In the interests of accuracy, it may be best to copy configurations from there, rather than copy-type the ones below.
The aesvt(1) command is distributed with Aegis. It supports binary files, has very small history files, and has good end-to-end behaviour. The entries for the commands are listed below.
This command is used to create a new file history. This command is always executed as the project owner.
The following substitutions are available:
absolute path of the source file
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_create_command = "aesvt -checkin " "-history $history " "-f $input" ;
It is essential that the history_create_command and the
history_put_command are identical. It is a historical accident
that there are two separate commands: before Aegis supported branches,
this was not a requirement.
This command is used to get a specific edit back from history. The command may be executed by developers.
The following substitutions are available:
absolute path of the history file
edit number, as given by history_query_command
absolute path of the destination file
The entry in the
aegis.conf
file looks like this:
history_get_command = "aesvt -checkout " "-history $history " "-edit $edit " "-o $output" ;
This command is used to query what the history mechanism calls the top-most
edit of a history file.
The result may be any arbitrary string,
it need not be anything like a number,
just so long as it uniquely identifies the edit
for use by the
history_get_command
at a later date.
The edit number is to be printed on the standard output.
This command may be executed by developers.
The following substitutions are available:
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_query_command = "aesvt -query " "-history $history" ;
The lib/config.example/aesvt file in the Aegis distribution (installed as /opt/aegis/share/config.example/aesvt by default) contains all of the above commands, so that you may readily insert them into your project configuration file.
Also, there are some subtleties to writing the commands, which are not present in the above examples. In particular, supporting file names which contain characters special to the shell requires the use of the ${quote} substitution around all of the file names in the commands.
In addition, it is possible to store meta-data with each version.
For example: “Description=${quote ($version) ${change
description}}” inserts the version number and the brief description
into the file's log. This means that using the aesvt -list option
will provide quite useful summaries.
The aesvt(1) command is able to cope with binary files. Set
history_content_limitation =
binary_capable;
so that Aegis knows that no encoding is required.
The entries for the commands are listed below. SCCS uses a slightly different model than Aegis wants, so some maneuvering is required. The command strings in this section assume that the SCCS command sccs is in the command search PATH, but you may like to hard-wire the path, or set PATH at the start of each command. (It is also possible that you need to say “delta” instead of “sccs delta”; if this is the case, the delta command needs to be in the path.) You should also note that the strings are always handed to the Bourne shell to be executed, and are set to exit with an error as soon as a sub-command fails.
One further assumption is that the ae-sccs-put(1) command, which is distributed with Aegis, is in the command search path. This insulates some of the weirdness that SCCS carries on with, and makes the commands below comprehensible.
This command is used to create a new project history. The command is always executed as the project owner.
The following substitutions are available:
absolute path of the source file
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_create_command =
"ae-sccs-put -y$version -G$input "
" ${d $h}/s.${b $h}";
It is important that
the history_create_command and
the history_put_command be the same.
This is necessary for branching to work correctly.
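The SCCS commands in this section all name the history file as ${d $h}/s.${b $h}, because SCCS expects its files to carry an s. prefix. Assuming that the d and b substitutions yield the directory part and the final component of the path (as dirname(1) and basename(1) do), the resulting file name can be previewed in plain shell. The function name sccs_path and the sample path are hypothetical.

```shell
#!/bin/sh
# sccs_path HISTORY_PATH
# Print the SCCS file name that corresponds to an Aegis history path:
# same directory, with "s." prefixed to the final path component.
sccs_path() {
    printf '%s/s.%s\n' "$(dirname "$1")" "$(basename "$1")"
}

sccs_path /projects/example/history/ab/cdef0123
```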
This command is used to get a specific edit back from history. The command may be executed by developers.
The following substitutions are available:
absolute path of the history file
edit number, as given by history_query_command
absolute path of the destination file
The entry in the
aegis.conf
file looks like this:
history_get_command =
"get -r'$e' -s -p -k "
" ${d $h}/s.${b $h} > $o";
This command is used to add a new "top-most" entry to the history file. This command is always executed as the project owner.
The following substitutions are available:
absolute path of source file
absolute path of history file
The entry in the
aegis.conf
file looks like this:
history_put_command =
"ae-sccs-put -y$version -G$input "
" ${d $h}/s.${b $h}";
Note that the SCCS file is left in the not-edit state, and that the source file is left in the baseline.
It is important that
the history_create_command and
the history_put_command be the same.
This is necessary for branching to work correctly.
This command is used to query what the history mechanism calls the top-most
edit of a history file.
The result may be any arbitrary string,
it need not be anything like a number,
just so long as it uniquely identifies the edit
for use by the
history_get_command
at a later date.
The edit number is to be printed on the standard output.
This command may be executed by developers.
The following substitutions are available:
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_query_command =
"get -t -g ${d $h}/s.${b $h}";
Note that "get" reports the edit number on stdout.
The lib/config.example/sccs file in the Aegis
distribution contains all of the above commands (installed as
/opt/aegis/share/config.example/sccs by default) so that you may
readily insert them into your project configuration file (called
aegis.conf by default, see aepconf(5) for how to call it
something else).
Also, there are some subtleties to writing the commands, which are not
present in the above examples. In particular, supporting file
names which contain characters special to the shell requires
the use of the ${quote} substitution around all of the file names in
the commands.
In addition, it is possible to have a much more useful description for
the -y option. For example: “-y${quote ($version) ${change
description}}” inserts the version number and the brief description
into the file's log. This means that using the sccs prs(1) command
will provide quite useful summaries.
The entries for the commands are listed below. RCS uses a slightly different model than Aegis wants, so some maneuvering is required. The command strings in this section assume that the RCS commands ci, co, rcs and rlog are in the command search PATH, but you may like to hard-wire the paths, or set PATH at the start of each. You should also note that the strings are always handed to the Bourne shell to be executed, and are set to exit with an error as soon as a sub-command fails.
In these commands, the RCS file is kept unlocked, since only the owner will be checking changes in. The RCS functionality for coordinating shared access is not required.
One advantage of using RCS version 5.6 or later is that binary files are supported, should you want to have binary files in the baseline.
This command is used to create a new file history. This command is always executed as the project owner.
The following substitutions are available:
absolute path of the source file
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_create_command = "ci -u -d -M -m$c -t/dev/null " "$i $h,v; rcs -U $h,v";
The "ci -u" option is used to specify that
an unlocked copy will remain in the baseline.
The "ci -d" option is used to specify that
the file time rather than the current time is to be used for the new revision.
The "ci -M" option is used to specify that
the modification time of the original file is not to be altered.
The "ci -t" option is used to specify that
there is to be no description text for the new RCS file.
The "ci -m" option is used to specify that
the change number is to be stored in the file log
if this is actually an update (typically from
aenf
after
aerm
on the same file name).
The "rcs -U" option is used to specify that
the new RCS file is to have unstrict locking.
It is essential that the history_create_command and the
history_put_command are identical. It is a historical accident
that there are two separate commands: before Aegis supported branches,
this was not a requirement.
This command is used to get a specific edit back from history. The command may be executed by developers.
The following substitutions are available:
absolute path of the history file
edit number, as given by history_query_command
absolute path of the destination file
The entry in the
aegis.conf
file looks like this:
history_get_command = "co -r'$e' -p $h,v > $o";
The "co -r" option is used to specify
the edit to be retrieved.
The "co -p" option is used to specify that
the results be printed on the standard output;
this is because the destination filename will
never
look anything like the history source filename.
This command is used to add a new "top-most" entry to the history file. This command is always executed as the project owner.
The following substitutions are available:
absolute path of source file
absolute path of history file
The entry in the
aegis.conf
file looks like this:
history_put_command = "ci -u -d -M -m$c -t/dev/null " "$i $h,v; rcs -U $h,v";
Uses ci to deposit a new revision, using -d and -M as described
for history_create_command. The -m flag stores the change number
in the file log, which allows rlog(1) to be used to find the Aegis
change numbers to which each revision of the file corresponds.
The "ci -u" option is used to specify that
an unlocked copy will remain in the baseline.
The "ci -d" option is used to specify that
the file time rather than the current time is to be used for the new revision.
The "ci -M" option is used to specify that
the modification time of the original file is not to be altered.
The "ci -m" option is used to specify that
the change number is to be stored in the file log,
which allows
rlog
to be used to find the change numbers to which
each revision of the file corresponds.
You might want to use -m$p,$c instead which stores both the
project name and the change number. Or -m$version, which will
be composed of the branch and the delta. These make it much easier to
track changes across branches.
It is essential that the history_create_command and the
history_put_command are identical. It is a historical accident
that there are two separate commands: before Aegis supported branches,
this was not a requirement.
This command is used to query what the history mechanism calls the top-most
edit of a history file.
The result may be any arbitrary string,
it need not be anything like a number,
just so long as it uniquely identifies the edit
for use by the
history_get_command
at a later date.
The edit number is to be printed on the standard output.
This command may be executed by developers.
The following substitutions are available:
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_query_command =
"rlog -r $h,v | "
"awk '/^revision/ {print $$2}'";
RCS also provides a merge program, which can be used to provide a three-way merge.
All of the command substitutions described in aesub(5) are available. In addition, the following substitutions are also available:
$orig: The absolute path name of a file containing the version originally copied. Usually in a temporary file.
$mr: The absolute path name of a file containing the most recent version. Usually in the baseline.
$in: The absolute path name of the edited version of the file. Usually in the development directory. Aegis usually moves the original source file aside, so that the output may have the source file's name.
$out: The absolute path name of the file in which to write the difference listing. Usually in the development directory, usually the name of a change source file.
The entry in the
aegis.conf
file looks like this:
merge_command = "set +e; " "merge -p -L baseline -L C$c " " $mr $orig $in > $out; " "test $? -le 1";
The "merge -L" options are used to specify
labels for the baseline and the development directory,
respectively,
when conflict lines are inserted into the result.
The "merge -p" option is used to specify that
the results are to be printed on the standard output.
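The merge_command above starts with "set +e" and ends with "test $? -le 1" because RCS merge(1) exits with status 1 when it finds conflicts, and the shell would otherwise abort on that status. The same pattern can be written as a small wrapper. This is a sketch only; the function name tolerate_conflicts is hypothetical, and the stand-in commands simply produce chosen exit statuses.

```shell
#!/bin/sh
# tolerate_conflicts COMMAND [ARGS...]
# Run COMMAND and succeed when its exit status is 0 (clean merge) or
# 1 (merge with conflicts); fail for 2 or more (real trouble).
tolerate_conflicts() {
    status=0
    "$@" || status=$?
    test "$status" -le 1
}

# Stand-ins for merge(1): an exit status of 1 is accepted, 2 is not.
tolerate_conflicts sh -c 'exit 1' && echo "conflicts tolerated"
tolerate_conflicts sh -c 'exit 2' || echo "trouble reported"
```

Accepting status 1 lets Aegis continue so that the developer can resolve the conflict markers in the output file.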
It is important that this command does not move its input and output files around, otherwise this contradicts the warnings Aegis may issue to the user. (In previous versions of Aegis, this was necessary, however this is no longer the case.)
Warning: The version of diff3(1) available to RCS merge(1) has a huge impact on its performance and utility. You need to grab and install GNU diff to get the best results. Unfortunately, the diff tool used by RCS merge(1) is determined at compile time. This means that you need to build and install the GNU diff package before you build and install the GNU RCS package.
Many history tools (including RCS) can modify the contents of the file when it is committed. While there are usually options to turn this off, they are seldom used. The problem is: if the commit changes the file, the source in the repository now no longer matches the object file in the repository - i.e. the history tool has compromised the referential integrity of the repository.
history_put_trashes_file = warn;
If you use RCS keyword substitution, you will need this line. (The default is to report a fatal error.)
Another reason for this option is that it tells Aegis it needs to recalculate the file's fingerprint after a checkin.
The lib/config.example/rcs file in the Aegis distribution (installed as /opt/aegis/share/config.example/rcs by default) contains all of the above commands, so that you may readily insert them into your project configuration file.
Also, there are some subtleties to writing the commands, which are not present in the above examples. In particular, being able to support file names which contain characters which are special to the shell requires the use of the ${quote} substitution around all of the files names in the commands.
In addition, it is possible to have a much more useful description for
the -m option. For example: “-m${quote ($version) ${change
description}}” inserts the version number and the brief description
into the file's log. This means that using the rlog(1) command
will provide quite useful summaries.
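One further subtlety concerns the RCS history_query_command, which is written with $$2 in aegis.conf, presumably so that awk(1) receives a literal $2 once Aegis has performed its own substitutions. What that awk program does can be tried outside Aegis. In this sketch the rlog(1) output is canned and abbreviated rather than produced from a real RCS file, and both function names are hypothetical.

```shell
#!/bin/sh
# Canned, abbreviated text shaped like the output of "rlog -r file,v".
sample_rlog() {
    cat <<'EOF'
RCS file: example,v
head: 1.3
description:
----------------------------
revision 1.3
date: 2004/05/06 07:08:09;  author: owner;  state: Exp;  lines: +2 -1
42
=============================================================================
EOF
}

# Print the second field of each line that starts with "revision".
top_revision() {
    awk '/^revision/ {print $2}'
}

sample_rlog | top_revision
```

The string printed is what Aegis would later hand back to the history_get_command as the edit number.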
RCS (version 5.6 and later) is able to cope with binary files. It does so by saving a whole copy of the file at each check-in.
If you want Aegis
to transparently encode all such files, simply leave the
history_content_limitation field unset.
If you want to check-in binary files, add the -kb option to
each of the rcs -U commands in the fields above, and also set
history_content_limitation = binary_capable;
so that Aegis knows that no encoding is desired.
If you use RCS keywords, such as $Id$ or $Log$,
this will result in the file in the baseline being changed by RCS at
integrate pass. This is after the build. The result is that the
source files no longer match the object files. Oops.
While such mechanisms are essential when using only a simple history tool, far more information may be obtained using the file history report (aer file_history filename), rendering such crude methods unnecessary.
In addition to expected expansions in file header comments, this can also be very destructive if, for example, such a string appeared in a uuencoded or MIME base 64 encoded file.
If you wish to prevent RCS from performing keyword expansion, use the
rcs -kb option.
If, however, you wish to keep using keyword expansion, set
history_put_trashes_file = warn;
to cause Aegis to warn you, rather than fail.
The fhist program was written by David I. Bell and is admirably suited to providing a history mechanism without the "cruft" that SCCS and RCS impose.
Please note that the [# edit #] feature needs to be avoided,
or the
-Forced_Update
(-fu) flag needs to be used in addition to the
-Conditional_Update
(-cu) flag,
otherwise updates will complain that
“Input file "XXX" contains edit A
instead of B for module "YYY"”
The
history_create_command
and the
history_put_command
are intentionally identical.
This minimizes problems when using branches.
This command is used to create a new project history. The command is always executed as the project owner.
The following substitutions are available:
absolute path of the source file
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_create_command =
"fhist ${b $h} -create -cu "
"-i $i -p ${d $h} -r";
Note that the source file is left in the baseline.
This command is used to get a specific edit back from history. The command may be executed by developers.
The following substitutions are available:
absolute path of the history file
edit number, as given by history_query_command
absolute path of the destination file
The entry in the
aegis.conf
file looks like this:
history_get_command =
"fhist ${b $h} -e '$e' -o $o "
"-p ${d $h}";
Note that the destination filename will never look anything like the history source filename, so the -p is essential.
This command is used to add a new "top-most" entry to the history file. This command is always executed as the project owner.
The following substitutions are available:
absolute path of source file
absolute path of history file
The entry in the
aegis.conf
file looks like this:
history_put_command =
"fhist ${b $h} -create -cu "
"-i $i -p ${d $h} -r";
Note that the source file is left in the baseline.
This command is used to query what the history mechanism
calls the "top-most" edit of a history file.
The result may be any arbitrary string,
it need not be anything like a number,
just so long as it uniquely identifies the edit
for use by the
history_get_command
at a later date.
The edit number is to be printed on the standard output.
This command may be executed by developers.
The following substitutions are available:
absolute path of the history file
The entry in the
aegis.conf
file looks like this:
history_query_command =
"fhist ${b $h} -l 0 "
"-p ${d $h} -q";
The lib/config.example/fhist file in the Aegis distribution (installed as /opt/aegis/share/config.example/fhist by default) contains all of the above commands, so that you may readily insert them into your project configuration file.
By default, FHist is unable to cope with NUL characters in its input files; however, this is the only limitation. By default, Aegis expects that history tools are only able to cope with printable ASCII text. To tell it otherwise, set
history_content_limitation = international_text;
in the project aegis.conf file.
Aegis will transparently encode binary files (files which contain NUL characters) on entry and exit from the history tool. This means that you may have binary files in your project without configuring anything special.
FHist (version 1.7 and later) has support for binary files. The fhist
-binary option may be used to specify that the file is binary,
that it may contain NUL characters. It is essential that you have
consistent presence or absence of the -binary option for each file
when combined with the -CReate, -Update, -Conditional_Update and
-Extract options. Failure to do so will produce inconsistent results.
This means that you have to always use the -binary option in the
history_create_command and history_put_command fields. You
have to decide right at the very beginning if your project history
will ever have binary files, or will never have binary files. You can't
change your mind later. If you choose to use the -binary option, set
history_content_limitation = binary_capable;
However, Aegis
would transparently encode all such files, if you leave the
history_content_limitation field set for international text.
In some cases, Aegis' encoding will be more efficient than fhist's.
And you have the advantage of being able to change your mind later.
When you have files which exist for long periods of time, particularly files such as the ones typically used by history tools, which are generally appended to, without modification of the bulk of the file, there is a very real possibility that a block of the file could become corrupted over the years.[17] Unless you access the file versions contained within that block, you have no way of knowing whether or not the history file is OK. (Arguably, the operating system should check for this, but many do not, and in any case the error may not be detectable at that level.)
Using Aegis, you can add a simple checksum to your history files which will detect many cases of corruption such as this, for all of the commonly used history tools. Note: it cannot detect all corruptions (nothing can) but it will detect more than many operating systems will.
You don't need to use this technique with SCCS or aesvt(1); they already have checksums in their files.
In general, you need to do three things:
You need to create some kind of checksum of your history file each time
you modify it. Something like md5sum(1) from the GNU Fileutils
would be good. Store the checksum in a file next to the history file.
This would be done in the
history_create_command and
history_put_command fields of the project aegis.conf file.
Each time the file is read, you need to verify the file's checksum.
Use the same checksum utility as before, and then compare it using, say,
cmp(1); if it fails (either an IO error, or the checksum doesn't
compare equal) then don't proceed with the history file access. You may
need to repair or replace the disk. You will need to restore from backup
(yesterday's backup, see below). This would be done at the beginning
of the history_create_command, history_put_command,
history_get_command and history_query_command fields
of the project aegis.conf file.
Because you may not actually interact with the file for years at a time, you need to check the file fingerprints much more often. Daily or at least weekly is suggested. You do this with a cron(1) job run nightly which compares all of the history files with their md5sum(1) checksums. Email failures to the system administrator and the project administrators. By doing this nightly, you not only avoid backing-up corrupted files, you will always know on which backup tape the good copy resides - yesterday's.
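The three steps above can be sketched as a handful of shell functions. This is one possible arrangement under stated assumptions, not a recipe shipped with Aegis: all of the function names are hypothetical, the .md5sum suffix is simply a convenient naming choice, and the digests are compared as strings rather than with cmp(1).

```shell
#!/bin/sh
# Print the MD5 digest of a file, without the file name.
checksum_of() {
    md5sum < "$1" | cut -d ' ' -f 1
}

# Step 1: record the checksum next to the history file after modifying it.
update_checksum() {
    checksum_of "$1" > "$1.md5sum"
}

# Step 2: succeed only if the stored and the current checksums agree.
verify_checksum() {
    [ -f "$1.md5sum" ] || return 1
    [ "$(checksum_of "$1")" = "$(cat "$1.md5sum")" ]
}

# Step 3: nightly scan; print each history file under a directory whose
# checksum no longer matches (or whose history file has gone missing).
scan_history() {
    find "$1" -type f -name '*.md5sum' | while IFS= read -r sum; do
        file=${sum%.md5sum}
        verify_checksum "$file" || printf '%s\n' "$file"
    done
}
```

A cron(1) job could run scan_history over the project's history directory and mail any output to the administrators; no output means every checksum matched.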
In order to implement this,
you need to modify some fields of your project aegis.conf file as follows:
You need to test if the history file and its checksum file exist, and
check the checksum if this is the case. Then, use whichever history tool
you choose (see the previous sections of this chapter). If it succeeds,
run md5sum(1) over the history file (not the source file)
and store the checksum in a file next to the history tool's file.
Using the same filename plus a .md5sum extension makes the
cron(1) job easier to write.
You need to test if the file exists (it may, for example, be an old project to which you have recently added this technique) and check the checksum if this is the case. Then, use your history tool as normal. If it succeeds, run md5sum(1) over the history file (not the source file) as in the create case.
You need to test if the file exists (it may, for example, be an old project to which you have recently added this technique) and check the checksum if this is the case. Then use your history tool as normal.
This command is only used at aeipass time, immediately after one
of the history_create_command or history_put_command
commands. It is up to you whether you think you need to add a guard as
for the history_get_command field.
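The guard described for each field follows one pattern: check the checksum if both files exist, run the history tool, and refresh the checksum after a successful write. A minimal sketch of that pattern as a wrapper function follows. The name guarded and its argument convention are hypothetical, and the caller is assumed to pass the actual file the history tool maintains (for RCS that would be the ,v file).

```shell
#!/bin/sh
# guarded MODE HISTORY_FILE COMMAND [ARGS...]
# MODE is "write" for create and put, "read" for get and query.
guarded() {
    mode=$1
    hist=$2
    shift 2
    # Guard: when both files exist, refuse to continue on a mismatch.
    if [ -f "$hist" ] && [ -f "$hist.md5sum" ]; then
        now=$(md5sum < "$hist" | cut -d ' ' -f 1)
        [ "$now" = "$(cat "$hist.md5sum")" ] || {
            echo "$hist: checksum mismatch" >&2
            return 1
        }
    fi
    # Run the real history tool command.
    "$@" || return
    # After a successful write, refresh the checksum of the history file.
    if [ "$mode" = write ]; then
        md5sum < "$hist" | cut -d ' ' -f 1 > "$hist.md5sum"
    fi
}
```

Because the guard is skipped when either file is absent, the wrapper also works for an old project whose history files predate the checksums.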
Rather than run md5sum(1) on the history files each time you modify them, you could use gzip(1). It obtains some minor compression, and the gzip format also carries a CRC-32 checksum of the file. For files with long histories, it can be tedious to unpack the file every time you need to extract an old version, but such operations are frequently I/O bound, so the user may perceive no slowness.
In addition to your history files, Aegis maintains a database of file
meta-data. In order to add a checksum to the various file making up
the database, turn on the compressed_database project attribute.
In addition to compressing the database (a minor saving) it also adds
a checksum to each database file.
You can check this in the cron(1) job by using gzcat(1)
sent to /dev/null.
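The decompress-and-discard check can be wrapped in a function for use in the nightly job. The sketch below uses gzip -dc, which behaves like gzcat; the function name check_gz and the file names are hypothetical.

```shell
#!/bin/sh
# check_gz FILE
# Succeed only if FILE decompresses cleanly; the output is thrown away,
# so only the exit status of gzip matters.
check_gz() {
    gzip -dc "$1" > /dev/null 2>&1
}

# Demonstration on a scratch file and a deliberately truncated copy.
scratch=$(mktemp -d)
printf 'some database text\n' | gzip > "$scratch/ok.gz"
head -c 12 "$scratch/ok.gz" > "$scratch/bad.gz"
check_gz "$scratch/ok.gz" && echo "ok.gz is intact"
check_gz "$scratch/bad.gz" || echo "bad.gz is damaged"
```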
[17] See also Saltzer, J.H. et al (1981) End-to-end arguments in system design, http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf