Chapter 11. Geographically Distributed Development

Table of Contents

1. Introduction
1.1. Risk Reduction
1.2. What to Send
1.3. Methods and Topologies
1.4. The Rest of this Chapter
2. Manual Operation
2.1. Manual Send
2.2. Sending Baselines
2.3. Sending Branches
2.4. Manual Receive
2.5. Getting Started
3. Sneaker Net
4. Automatic Operation
4.1. Sending
4.2. Receiving
5. World Wide Web
5.1. Server
5.2. Browser
5.3. Hands-Free Tracking
6. Security
6.1. Trojan Horses
6.2. PGP
6.3. Sorcerer's Apprentice
7. Patches
7.1. Send
7.2. Receive
7.3. Limitations

This chapter describes various methods of collaboratively developing software using Aegis, where the collaborating sites are separated by administrative domains or even large physical distances.

While many Open Source projects on the Internet typify such development, this chapter will also describe techniques suitable for commercial enterprises that do not wish to compromise their intellectual property.

1. Introduction

The core of the distribution method is the aedist(1) command. In its simplest form, the command

aedist -send | aedist -receive

will clone a change set locally. This may appear less than useful (after all, the aeclone(1) command already exists) until you consider situations such as

aedist -send | e-mail | aedist -receive

where e-mail represents the sending, transport and receiving of e-mail. In this example, the change set would be reproduced on the e-mail recipient's system, rather than locally. Similar mechanisms are also possible for web distribution.

1.1. Risk Reduction

Receiving change sets in the mail, however, comes with a number of risks:

  • You can't just commit it to your repository, because it may not even compile.

  • Even if it does compile, you want to run some tests on it first, to make sure it is working and doesn't break anything.

  • Finally, you would always review it, to make sure it was appropriate and didn't do more subtle damage to the source.

While these are normal concerns for distributing source over the Internet, and also internally within companies, they are the heart of the process employed by Aegis. All of these checks and balances are already present. The receive side simply creates a normal Aegis change, and applies the normal Aegis process to it.

  • The change set format is unpacked into a private work area, not directly into the repository. This is a normal Aegis function.

  • The change set is then confirmed to build against the repository. All implications flowing from the change are exercised. Build inconsistencies will flag the change for attention by a human, and the change set will not be committed to the repository. This is a normal Aegis function.

  • The change set is tested. If it came accompanied by tests, these are run. Also, relevant tests from the repository are run. Test inconsistencies will flag the change for attention by a human, and the change set will not be committed to the repository. This is a normal Aegis function.

  • Once the change set satisfies these requirements, it must still be reviewed by a human before being committed, to validate the change set for suitability and completeness. This is a normal Aegis function.

1.2. What to Send

While there are many risks involved in receiving change sets, there are also problems in figuring out what to send.

At the core of Aegis' design is a transaction. Think of the source files as rows in a database table, and each change set as a transaction against that table. The build step represents maintaining referential integrity of the database, but also represents an input validation step, as does the review. And like database transactions, change sets are all-or-nothing affairs: it is not possible to commit “half” a transaction.

As you can see, Aegis changes are already elegantly validated, recorded and tracked, and ideally suited to being packaged and sent to remote repositories.

1.3. Methods and Topologies

In distributed systems such as described in this chapter, there are two clear methods of distribution:

  • The “push” method has the change set producer automatically send the change-set to a registered list of interested consumers. This is supported by Aegis and aedist.

  • The “pull” method has the change set producer make the change sets available for interested consumers to come and collect. This is supported by Aegis and aedist.

These are two ends of a continuum, and it is possible and common for a mix-and-match approach to be taken.

There are also many ways of arranging how distribution is accomplished, and many of the distribution arrangements (commonly called topologies, when you draw the graphs) are supported by Aegis and aedist:

  • The star topology has a central master repository, surrounded by contributing satellite repositories. The satellites are almost always “push” model; the central master, however, could be either “push” or “pull” model.

  • The snowflake topology is like a hierarchical star topology, with contributors feeding staging posts, which eventually feed the master repository. Common for large Open Source Internet projects. Towards the master repository is almost always “push” model, and away from the master is almost always “pull” model.

  • The network topology is your basic anarchic autonomous collective, with change sets flying about peer-to-peer with no particular structure. Often done as a “push” model through an e-mail mailing list.

All of these topologies, and any mixture you can dream up, are supported by Aegis and aedist. The choice of the right topology depends on your project and your team.

1.4. The Rest of this Chapter

Aegis is the ideal medium for hosting distributed projects, for all the above reasons, and the rest of this chapter describes a number of different ways of doing this:

  • Section 2 describes how to perform these actions manually, both send and receive, as this demonstrates the method efficiently, and represents a majority of the use made of the mechanism.

  • Section 3 describes how to carry change sets between sites on transportable media.

  • Section 4 shows how to automate e-mail distribution and receipt. Automated e-mail distribution is probably the next most common use.

  • Section 5 shows how to configure distribution and receipt using World Wide Web servers and browsers.

  • Section 6 deals with security issues, such as validating messages and coping with duplicate storms.

  • Section 7 describes how to exchange patches with developers who do not use Aegis.

2. Manual Operation

This section describes how to use aedist manually, in order to send and receive change sets.

2.1. Manual Send

In order to send a change set to another site, it must be packaged in a form which captures all of the change's attributes and the contents of the change's files. This package must be compressed and encoded in a form which will survive the various mail transport agents it must pass through, and then given to the local mail transport agent. This is done by a single command

% aedist -send -c number | \
  mail joe.blow@example.com
%

All of the usual Aegis command line options are available, so you could specify the project on the command line if you needed to.

This command will send the sources from the development directory, if the change is not yet completed. This is useful for collaboration between developers, but such a change has not been reviewed or integrated, so beware.

It is more usual to send a change which has been completed. In this case the version of each file which was committed is sent. If necessary, the history files will be consulted to reconstruct this information. See the “Automatic Operation” section, below, for more discussion of this.

There are many options for customizing the e-mail message sent to joe.blow@example.com; see aedist(1) for more information.
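
If you send change sets by hand often, the pipeline above can be wrapped in a small shell function. The following is only a sketch, written for a POSIX sh rather than the csh-style prompt used in this chapter; the function name send_change is hypothetical, and it uses only the -send, -p and -c options shown in this section.

```shell
# Send one change set by e-mail.
# usage: send_change project change-number recipient
# (send_change is a hypothetical helper, not part of Aegis.)
send_change() {
    if [ "$#" -ne 3 ]; then
        echo "usage: send_change project change-number recipient" >&2
        return 2
    fi
    # aedist packages, compresses and encodes the change set;
    # mail hands the result to the local mail transport agent.
    aedist -send -p "$1" -c "$2" | mail "$3"
}
```

For example, send_change example 10 joe.blow@example.com would send change 10 of the “example” project.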

2.2. Sending Baselines

In order to send the entire contents of the repository to someone, you use a very similar command:

% aedist -send -baseline | \
  mail joe.blow@example.com
%

This can be a very large change set, because it contains every file in the project.

2.3. Sending Branches

There are times when remote developers are not interested in a blow-by-blow update of your repository. Instead they want to have updates from time to time. In order to send them the current state of your active development branch, in this example “example.4.2”, you would use a command of the form

% aedist -send -p example.4 -c 2 | \
  mail joe.blow@example.com
%

Notice how the correspondence between branches and change sets is exploited. The baseline of a branch is the development directory of the “super change” it represents.

Branch change sets like this are smaller than whole baselines, because they include only the files altered by this branch, rather than the state of every file in the project.
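
The correspondence between a branch name and the project and change number used to send it can be computed mechanically. This sketch assumes branch names are dot-separated, as in the “example.4.2” example above; the function name branch_send_args is hypothetical.

```shell
# Turn a branch name such as "example.4.2" into the options that
# select its "super change": "-p example.4 -c 2".
# (branch_send_args is a hypothetical helper, not part of Aegis.)
branch_send_args() {
    bs_parent=${1%.*}     # text before the last dot, e.g. "example.4"
    bs_number=${1##*.}    # text after the last dot, e.g. "2"
    case "$1" in
        *.*) ;;
        *)
            echo "branch_send_args: '$1' has no parent project" >&2
            return 1
            ;;
    esac
    case "$bs_number" in
        '' | *[!0-9]*)
            echo "branch_send_args: '$1' does not end in a number" >&2
            return 1
            ;;
    esac
    if [ -z "$bs_parent" ]; then
        echo "branch_send_args: '$1' has an empty project name" >&2
        return 1
    fi
    printf '%s\n' "-p $bs_parent -c $bs_number"
}
```

The output is meant to be substituted, unquoted, into an aedist -send command line.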

2.4. Manual Receive

The simplest form of receiving a change set is to save it from your e-mail program into a file, and then

% aedist -receive -file filename
...lots of information...
%

where filename is where you saved the e-mail message. If your e-mail program is able to write to a pipe, you can use a simpler form. This example uses the Rand Mail Handler's show(1) command:

% show | aedist -receive
...lots of information...
%

Each of these examples assumes that you have used the same project name locally as that of the sender (it's stored in the package). If this isn't the case, you will need to use the -project option to tell aedist which project to apply the change to.

The actions performed by aedist on receive are not quite a mirror of what it does on send. In particular, send usually extracts its information from the repository, but receive does not put the change set directly into the repository.

On receipt of a change set, aedist creates a new change with its own development directory, and unpacks the change set into it, in much the same way as a change would normally be performed by a developer. (Indeed, the receiver must be an authorized developer.)

Once the change is unpacked, it goes through the usual development cycle of build, difference and test. If any portion of this fails, aedist will stop with a suitable error message. If all goes well, development of the change will end, and it will be left in the being reviewed state.

At this point, a local reviewer must examine the change, and it proceeds through the change integration process as normal.

If there is a problem with the change, it can be dealt with as you would with any other defective change - by developing it some more. Or you can email the sender telling them the problem and use aedbu(1) and aencu(1) to entirely discard the change.
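
The two forms of manual receive, with and without a project override, can be combined in one helper. This is a sketch for a POSIX sh; the name receive_change is hypothetical, and only the -receive, -file and -project options described above are used.

```shell
# Receive a saved change set, optionally into a differently named project.
# usage: receive_change filename [local-project-name]
# (receive_change is a hypothetical helper, not part of Aegis.)
receive_change() {
    if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
        echo "usage: receive_change filename [project]" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "receive_change: cannot read '$1'" >&2
        return 1
    fi
    if [ "$#" -eq 2 ]; then
        # The local project name differs from the one stored in the package.
        aedist -receive -file "$1" -project "$2"
    else
        aedist -receive -file "$1"
    fi
}
```

If aedist stops with an error, the function returns its exit status, so a calling script can tell that the change needs human attention.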

2.5. Getting Started

In order to receive a change, you must have a project to receive it into. Also, changes tend to be the difference between an existing repository and what it is to become. You need some way to get the starting point of the differences before you can apply any differences. This section describes one way of doing this.

You start by creating a normal Aegis project in the usual way. That is covered earlier in this User Guide. It helps greatly if you give your local project exactly the same name as the remote project. It doesn't need the same pathname for the project directory, just the same project name.

Once you have this project created, request that the remote repository send you a “baseline” change, as described above. Once you have received this, and it is integrated successfully, you are ready to receive and apply change sets. This is an inherently “pull” activity, as the source may never have heard of you before. The initial baseline may arrive by e-mail, or floppy disk, or you may retrieve it from the web; it all depends on how the project is being managed.

You will be warned about "potential trojan horse" files in the baseline change set. This is normal, because you are receiving all of the project configuration files, build files and test files. All of these contain commands that will be executed. Caveat emptor. Make sure you trust the source.

3. Sneaker Net

Another common method of transporting data, sometimes a quite large amount of it, is to write it onto transportable media and carry it.

To write a change set onto a floppy, you would use commands such as

% mount /mnt/floppy
% aedist -send -no-base64 \
  -o /mnt/floppy/change.set
% umount /mnt/floppy
%

The above commands assume the floppy is pre-formatted, and that there is a user-permitting line in the /etc/fstab file, as is common for many Linux distributions. The change.set can be any filename you like, but is usually project-name and change-number related.

It takes a very sizable change set to fail to fit on a 1.44MB floppy, because change sets are compressed (and change sets exceeding 8MB of source are rare, even for huge projects). The -no-base64 option is used to avoid the MIME base 64 encoding, which is necessary for e-mail but not necessary in this case. The receive side will automatically figure out that there is no MIME base 64 encoding.

Reading the change set is just as simple, as it closely follows the other commands for receiving change sets.

% mount /mnt/floppy
% aedist -rec -f /mnt/floppy/change.set
...lots of output...
% umount /mnt/floppy
%

This technique will work for any of the disks available these days including floppies, Zip, Jaz, etc.
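
If you want to check a change set against the capacity of the media before carrying it anywhere, a size test is enough. This sketch is for a POSIX sh; the name fits_on_media is hypothetical, and the default limit of 1457664 bytes is an assumption about the usable space on a FAT-formatted 1.44MB floppy, so pass your own limit for other media.

```shell
# Succeed if a file is no larger than the media it must fit on.
# usage: fits_on_media filename [limit-in-bytes]
# (fits_on_media is a hypothetical helper; the default limit is an
# assumed figure for a FAT-formatted 1.44MB floppy.)
fits_on_media() {
    if [ ! -r "$1" ]; then
        echo "fits_on_media: cannot read '$1'" >&2
        return 2
    fi
    fm_limit=${2:-1457664}
    # The arithmetic expansion strips any padding wc puts around the count.
    fm_size=$(( $(wc -c < "$1") ))
    [ "$fm_size" -le "$fm_limit" ]
}
```

One way to use it is to write the change set to a local file with aedist -send -no-base64 -o, test that file, and only then copy it to the mounted media.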

4. Automatic Operation

This section describes how to use aedist to automatically send and receive change sets via e-mail.

4.1. Sending

Change sets can be sent automatically when a change passes integration. You do this by setting the integrate_pass_notify_command field of the project attributes.

In this example, the “example” project sends all integrations to all the addresses on the example-developers mailing list. (The mailing list is maintained outside of Aegis, e.g. by Majordomo.) The relevant attribute (edited by using the aepa(1) command) looks like this:

integrate_pass_notify_command =
  "aedist -send -p $project -c $change | \
  mail example-developers";

Please note that project attributes are inherited by branches when they are created. If you don't want all branches to broadcast all changes, you need to remember to clear this project attribute from the branch once the branch has been created. Alternatively, use the $version substitution to decide who to send the change to.
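
One way to use the $version substitution is to pass it to a small wrapper which picks the mailing list. The sketch below is for a POSIX sh and rests on assumptions: the function name and both list names are hypothetical, and it assumes the version string begins with the branch number (for example “4.2.”).

```shell
# Choose a mailing list from the version string of the change.
# usage: recipients_for_version version
# (Hypothetical helper; both list names are invented for illustration,
# and the "4.2." prefix test assumes versions start with the branch number.)
recipients_for_version() {
    case "$1" in
        4.2 | 4.2.*)
            # Changes on the 4.2 branch go to a smaller audience.
            echo "example-4.2-testers"
            ;;
        *)
            echo "example-developers"
            ;;
    esac
}
```

A wrapper script named in the notify command could call this function with the value of $version, and pipe the output of aedist -send to mail with the resulting address.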

4.2. Receiving

You need to set up an e-mail alias, with a wrapper around it - you probably don't want "daemon" as a registered developer.

While aedist(1) makes every attempt to spot potential trojan attacks, you really, really want PGP validation (or similar industrial strength digital signatures) before you accept this kind of input.
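
The wrapper behind the e-mail alias might take the following shape. This is a sketch only, for a POSIX sh: the name receive_verified is hypothetical, and the verifier is deliberately left as a command you supply (for example a script around your digital signature tool), because this chapter does not prescribe one. It is called with the name of a file holding the message and must succeed only if the sender is trusted.

```shell
# Read a mail message on standard input and pass it to aedist only
# if the supplied verifier command accepts it.
# usage: receive_verified verifier-command
# (receive_verified and the verifier are hypothetical; the verifier is
# called with the name of a file holding the message.)
receive_verified() {
    rv_tmp=$(mktemp) || return 1
    cat > "$rv_tmp"
    rv_status=0
    if "$1" "$rv_tmp"; then
        aedist -receive -file "$rv_tmp" || rv_status=$?
    else
        echo "receive_verified: sender not validated, message dropped" >&2
        rv_status=1
    fi
    rm -f "$rv_tmp"
    return "$rv_status"
}
```

The wrapper should run as a user who is a registered developer of the project, not as the mail system's own user.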

5. World Wide Web

This section describes how to use aeget(1) and aedist(1) to transport change sets using the World Wide Web. This requires configuration of the web server to package and send the change sets, and configuration of the browser to receive and unpack the change sets. You can also automatically track a remote site, efficiently downloading and applying new change sets as they appear.

5.1. Server

Aegis has a read-only web interface to its database, implemented as a web server CGI program. If you are running Apache, or similar, all you have to do is copy (or symlink, if you have symlinks enabled) the /opt/aegis/bin/aeget file into the web server's cgi-bin directory. For example, the default Apache install would need the following command:

ln -s /opt/aegis/bin/aeget /var/www/cgi-bin/aeget

5.2. Browser

You need to set the appropriate mailcap entry, so that application/aegis-change-set is handled by aedist --receive.

Edit the /etc/mailcap file, and add the lines

# Aegis
application/aegis-change-set;/opt/aegis/bin/aedist -receive -f %s

You may need to restart your web browser for this to take effect.
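
If you configure many machines, the mailcap entry can be added by a script which does nothing when the entry is already present. This is a sketch for a POSIX sh; the function name add_aegis_mailcap is hypothetical, and the entry is the one shown above.

```shell
# Append the Aegis mailcap entry to a mailcap file, unless an entry
# for application/aegis-change-set is already there.
# usage: add_aegis_mailcap [mailcap-file]     (default: /etc/mailcap)
# (add_aegis_mailcap is a hypothetical helper, not part of Aegis.)
add_aegis_mailcap() {
    mc_file=${1:-/etc/mailcap}
    mc_entry='application/aegis-change-set;/opt/aegis/bin/aedist -receive -f %s'
    if [ -f "$mc_file" ] &&
        grep -q '^application/aegis-change-set;' "$mc_file"; then
        return 0    # already configured, leave the file alone
    fi
    printf '# Aegis\n%s\n' "$mc_entry" >> "$mc_file"
}
```

Editing /etc/mailcap usually requires root privileges; pass a per-user mailcap file as the argument if you do not have them.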

5.3. Hands-Free Tracking

For clients of sites using a web server, such as the various developers in an open source project, it is possible to automatically "replay" change sets on the server which have not yet been incorporated at your site.

The command

aedist --replay -f name-of-web-server

will automatically download any remote change sets not present in the local repository. It downloads them by using the aedist(1) command. It uses commands of the form

aedist --receive -f url-of-change-set

to download the change sets, which have to go through all of the usual Aegis process before becoming part of your local repository. This includes code review, unless you have configured the develop_end_action field of the project configuration to be goto_awaiting_integration.

If you add this command to a crontab(1) entry, you can check to see if there are change sets to synchronize with once a day, or however often you set the line to run.
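
When the replay is run from cron, a slow run can still be going when the next one starts. The sketch below, for a POSIX sh, guards against that with a lock directory. The function name replay_once and the default lock path are hypothetical; the only Aegis command used is the aedist --replay form shown above.

```shell
# Run one replay against a web server, unless a previous run still
# holds the lock.
# usage: replay_once name-of-web-server [lock-directory]
# (replay_once and the default lock path are hypothetical.)
replay_once() {
    ro_lock=${2:-/tmp/aedist-replay.lock}
    # mkdir either creates the directory or fails, so only one run
    # at a time can take the lock.
    if ! mkdir "$ro_lock" 2>/dev/null; then
        echo "replay_once: previous run still active, skipping" >&2
        return 0
    fi
    ro_status=0
    aedist --replay -f "$1" || ro_status=$?
    rmdir "$ro_lock"
    return "$ro_status"
}
```

If a run is killed before it removes the lock directory, later runs will keep skipping until the directory is removed by hand.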

6. Security

This section deals with security issues. Security isn't just “keep the bad guys out”, it actually covers availability, integrity and confidentiality.

Availability:

refers to the system being available for use by authorised users. Denial of service and crashes are examples of bad things in this area.

Integrity:

refers to the system being in a known good state. Corrupted change sets and un-buildable repositories are examples of bad things in this area.

Confidentiality:

refers to the system being available only to authorised users. For many Open Source projects, this isn't a large concern, but for corporate users of Aegis, non-disclosure of change-sets as they cross the Internet is a serious requirement.

As you can see, a strategy of “keep the bad guys out” is necessary, but not sufficient, to satisfy security.

This section covers the above security issues, as applied to the use of aedist to move change sets around.

6.1. Trojan Horses

“A Trojan horse is an apparently useful program containing hidden functions that can exploit the privileges of the user [running the program], with a resulting security threat. A Trojan horse does things that the program user did not intend.”[27]

In order to forestall this threat, aedist will cease development of the change if it detects the potential for a Trojan horse. The situations it looks for include...

  • Changing the project aegis.conf file. This file contains the build command and the difference commands, both of which would be run before a reviewer had a chance to confirm they were acceptable.

  • Changing any of the files named in the trojan_horse_suspect field of the project aegis.conf file. This lets you cover things like the build tool's configuration file (e.g. the Makefile or the cookbook), and any scripts or code generators which would be run by the build.

This isn't exhaustive protection, and it depends on keeping the trojan horse suspect list up-to-date. (It accepts patterns, so it's not too onerous.) For better protection, you need to validate the sender and the message.
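
To get a feel for how a short list of patterns can cover many files, the following POSIX sh sketch tests a file name against a list of shell patterns. It is an illustration of the idea only, not the matching code Aegis uses; the function name is_suspect is hypothetical, and shell patterns may differ in detail from the patterns Aegis accepts.

```shell
# Succeed if a file name matches any of the given shell patterns.
# usage: is_suspect filename pattern...
# (Illustration only; is_suspect is hypothetical and is not how
# Aegis itself evaluates the trojan_horse_suspect field.)
is_suspect() {
    is_file=$1
    shift
    for is_pattern in "$@"; do
        # $is_pattern is left unquoted on purpose, so that case
        # treats it as a pattern and not as a literal string.
        case "$is_file" in
            $is_pattern) return 0 ;;
        esac
    done
    return 1
}
```

For example, is_suspect script/gen.awk Makefile '*.awk' succeeds, while is_suspect src/main.c Makefile '*.awk' fails.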

6.2. PGP

PGP can be used to validate that the source of a change set distribution is really someone you trust.

6.3. Sorcerer's Apprentice

In a push system, with a central master server and a collection of contributors, all of which are using automatic send, as described above, a potential explosion of redundant messages is possible. Viz:

  • Contributor integrates a change, which is dispatched to the master server.

  • Maintainer integrates the change set into the master repository.

  • Master repository automatically dispatches the change set to all of the contributors.

  • Each of the contributors receives and integrates the change, and each of these integrations is dispatched to the master server.

  • The master server is inundated with change sets it already has.

  • If these change sets were to be integrated, the storm repeats, growing exponentially every time it goes around the loop.

To prevent this, aedist does several things...

  • Before the change is built, an aecpu --unchanged is run. If there is nothing left, the change is abandoned, because you already have it. (This doesn't always work, because propagation delays may try to reverse a subsequent local change.)

  • When a change set is sent, an RFC 822 style header is added to the description. This includes From and Date. When a change set is received, a Received line is added. Too many Received lines cause the change set to be dropped - for a star topology the maximum is 2. (This doesn't always work, because the description could be edited to rip it off again.) (This doesn't always work, because the maintainer may edit it in some ways before committing it, and forget to rip off (enough of) the header.) (This doesn't always work, because hierarchical topologies will group change sets together.) (This doesn't always work, because a baseline pull will group change sets together.)

  • Set the description to indicate it was received by aedist? Use this to influence the decision to send it off again at integrate pass? How?
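
The hop-counting idea can be sketched in a few lines of POSIX sh. This is an illustration, not the code aedist uses. It assumes the header lines begin with “Received:” and that the header ends at the first blank line of the description; the function name too_many_hops is hypothetical.

```shell
# Read a change description on standard input and succeed if its
# header carries more Received lines than allowed.
# usage: too_many_hops [maximum]      (default maximum: 2)
# (Illustration only; too_many_hops is hypothetical, and the header
# layout is an assumption based on RFC 822 conventions.)
too_many_hops() {
    th_max=${1:-2}
    # sed stops at the first blank line, so only the header is counted;
    # "|| true" keeps a count of zero from looking like a failure.
    th_count=$(sed '/^$/q' | grep -ci '^Received:' || true)
    [ "$th_count" -gt "$th_max" ]
}
```

A description with three Received lines would be dropped with the default maximum of 2, while one with two would be accepted.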

7. Patches

In the open source community, patches are a common way of sharing enhancements to software. This was particularly common before the World Wide Web, when usenet was the more common medium of distribution. Patches also have the advantage of being fairly small and usually transportable by email with few problems.

7.1. Send

If you are participating in an open source project, and using Aegis to manage your development, the aepatch -send command may be used to construct a patch to send to the other developers.

It is very similar in operation to the aedist(1) command, however it is intended for non-Aegis-using recipients.

To send a change to someone (a completed change, or one in progress) simply use a command such as

% aepatch -send -c number | \
  mail joe.blow@example.com
%

to send your change as a patch. Note that it will be compressed (using GNU Zip) and encoded (using MIME base 64), which produces small files which are going to survive email transport.

7.2. Receive

The simplest way of receiving a patch and turning it into a change set is to save it from your e-mail program into a file, and then

% aepatch -receive -file filename
...lots of information...
%

where filename is where you saved the e-mail message. If your e-mail program is able to write to a pipe, you can use a simpler form. This example uses the Rand Mail Handler's show(1) command:

% show | aepatch -receive
...lots of information...
%

Each of these examples assumes that you have already set the project name, either via aeuconf(5) or ae_p(1), or you could use the -project option.

The actions performed by aepatch on receive are not quite a mirror of what it does on send. In particular, send usually extracts its information from the repository, but receive does not put the change set directly into the repository.

On receipt of a change set, aepatch creates a new change with its own development directory, copies the files into it, and applies the patch to the files. The receiver must be an authorized developer.

Once the patch is applied, it goes through the usual development cycle of build, difference and test. If any portion of this fails, aepatch will stop with a suitable error message. If all goes well, development of the change will end, and it will be left in the being reviewed state.

At this point, a local reviewer must examine the change, and it proceeds through the change integration process as normal.

If there is a problem with the change, it can be dealt with as you would with any other defective change - by developing it some more. Or you can email the sender telling them the problem and use aedbu(1) and aencu(1) to entirely discard the change.

7.3. Limitations

Despite a great similarity of command line options and operation, the aepatch command should not be thought of as an equivalent for the aedist command, or a replacement for it.

The information provided by aedist -send is sufficiently complete to recreate the change set at the remote end. No information is lost. In contrast, the aepatch -send command is limited to the information a patch file can carry (see the patch(1) command, from the GNU Diff utils). There is no guarantee that the aepatch -send output will be given to aepatch -receive; it must work with patch(1), and similar tools.

Conversely, there is no guarantee that the input to aepatch -receive came from aepatch -send. It can and must be able to cope with the output of a simple diff -r -N -c command, with no additional information.
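
A non-Aegis developer might produce such a plain patch as follows. This is a sketch for a POSIX sh; the function name make_plain_patch is hypothetical, and it assumes the developer keeps an untouched copy of the source tree next to the modified one.

```shell
# Write a recursive context diff of two source trees to standard output.
# usage: make_plain_patch original-tree modified-tree
# (make_plain_patch is a hypothetical helper, not part of Aegis.)
make_plain_patch() {
    mp_status=0
    # diff exits 0 when the trees are identical, 1 when they differ,
    # and with a larger value when something went wrong.
    diff -r -N -c "$1" "$2" || mp_status=$?
    [ "$mp_status" -le 1 ]
}
```

The output carries only file names, timestamps and the changed lines; none of the change attributes which aedist preserves are present.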

All this means: use aedist wherever possible. The aepatch command exists to simplify and streamline communication with non-Aegis developers.



[27] Summers, Rita C., Secure Computing Threats and Safeguards, McGraw-Hill, 1997.