Managing Update Conflicts in Bayou,

Managing Update Conflicts in Bayou,

a Weakly Connected Replicated Storage System

Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers,

Mike J. Spreitzer and Carl H. Hauser

Computer Science Laboratory

Xerox Palo Alto Research Center

Palo Alto, California 94304 U.S.A.

Abstract

Bayou is a replicated, weakly consmtent storage system designed for a mobile computing environment that includes porta-

ble machines with less than ideal network connectivity. To maxi- mize availabdity, users can read and write any accessible replica.

Bayou’s design has focused on supporting apphcation-specific mechanisms to detect and resolve the update conflicts that natu-

rally arise in such a system, ensuring that replicas move towards

eventual consistency, and defining a protocol by which the resolu- tion of update conflicts stabilizes, It includes novel methods for confhct detection, called dependency checks, and per-write con- flict resolution based on client-provided merge procedures. To

guarantee eventual consistency, Bayou servers must be able to roll- back the effects of previously executed writes and redo them according to a global serialization order. Furthermore, Bayou per-

mits clients to observe the results of all writes received by a server, mchrding tentative writes whose conflicts have not been ultimately resolved. This paper presents the motivation for and design of these mechanisms and describes the experiences gained with an

initial implementation of the system.

1. Introduction

The Bayou storage system prowdes an mfrastrtrcture for col-

laborative applications that manages the conflicts introduced by concurrent activity while relying only on the weak connectivity available for mobile computing. The advent of mobile computers,

in the form of laptops and personal digital assistants (PDAs) enables the use of computational facilities away from the usual work setting of users. However, mobile computers do not enjoy the

connectivity afforded by local area networks or the telephone sys- tem. Even wireless media, such as cellular telephony, will not per-

mit continuous connectivity until per-mmute costs decline enough

to justify lengthy connections. Thus, the Bayou design requires

only occasional, patr-wise communication between computers.

This model takes into consideration characteristics of mobile com-

puting such as expensive connection time, frequent or occasional disconnections, and that collaborating computers may never be all connected simultaneously [1, 13, 16].

The Bayou architecture does not include the notion of a “dis- connected” mode of operation because, in fact, various degrees of

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notiea is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to recktribute to lists, requires prior specific permission aodlor a fee.

SIGOPS ’95 12/95 CO, USA 01995 ACM 0-89791-71 5-4/9510012…$3.50

“connectedness” are possible. Groups of computers may be pafii-

tioned away from the rest of the system yet remain connected to each other. Supporting disconnected workgroups is a central goal

of the Bayou system. By relying only on pair-wise communication

in the normal mode of operation, the Bayou design copes with arbitrary network connectivity.

A weak connectivity networking model can be accommodated only with weakly consistent, replicated data. Replication is

reqtured since a single storage site may not be reachable from

mobile clients or within disconnected workgroups. Weak consis-

tency is desired since any replication scheme providing one copy serializablhty [6], such as reqturing clients to access a quorum of replicas or to acquire exclusive locks on data that they wish to

update, yields unacceptably low write availability in par-btioned

networks [5]. For these reasons, Bayou adopts a model in which chents can read and write to any replica without the need for explicit coordination with other rephcas. Every computer eventu- ally receives updates from every other, either directly or indirectly, through a chain of pair-wise interactions.

Unhke many previous systems [12, 27], our goal m designing

the Bayou system was not to provide transparent rephcated data support for existing file system and database applications. We believe that applications must be aware that they may read weakly

consistent data and also that their wrrte operations may conflict with those of other users and applications. Moreover, applications

must be revolved m the detection and resolution of conflicts since

these naturally depend on the semantics of the application. To this end, Bayou provides system support for applicatlon-

specific confllct detection and resolution. Previous systems, such

as Locus [30] and Coda [17], have proven the value of semantic conflict detection and resolution for file directories, and several systems are exploring conflict resolution for file and database con- tents [8, 18, 26]. Bayou’s mechamsms extend this work by letting

applications exploit domain-specific knowledge to achieve auto-

matic conflict resolution at the granularity of individual update

operations without compromising security or eventual consistency. Automatic conflict resolution is highly desirable because lt

enables a Bayou replica to remam available. In a replicated system

with the weak connectiwt y model adopted by Bayou, conflicts may be detected arbitrarily far from the users who introduced the

conflicts. Moreover, conflicts may be detected when no user is present. Bayou does not take the approach of systems that mark conflicting data as unavadable until a person resolves the conflict. Instead, clients can read data at all times, including data whose

conflicts have not been fully resolved either because human inter- vention M needed or because other conflicting updates may be propagating through the system. Bayou provides interfaces that

make the state of a replica’s data apparent to the application. The contributions presented in this paper are as follows: we

introduce per-update dependency checks and merge procedures as

J.72

 

 

a general mechamsm for application-specific conflict detection and

resolution; we define two states of an update, committed and tenta-

tive, which relate to whether or not the conflicts potentially intro-

duced by the update have been ultimately resolved; we present

mechanisms for managing these two states of an update both from

the perspective of the clients and the storage management require-

ments of the replicas; we describe how replicas move towards

eventual consistency; and, finally, we discuss how security is pro- vided in a system like Bayou.

2. Bayou Applications

Tbe Bayou replicated storage system was designed to support a

variety of non-real-time collaborative applications, such as shared

calendars, mad and bibliographic databases, program develop- ment, and document editing for disconnected workgroups, as well

as applications that might be used by individuals at different hosts

at different times. To serve as a backdrop for the discussion in fol-

lowing sections, this section presents a quick overview of two applications that have been implemented thus far, a meeting room scheduler and a bibliographic database.

2.1 Meeting room scheduler

Our meeting room scheduling application enables users to reserve meeting rooms. At most one person (or group) can reserve

the room for any given period of time. This meeting room schedul- ing program M intended for use after a group of people have

already decided that they want to meet in a certain room and have determined a set of acceptable times for the meeting. It does not help them to determine a mutually agreeable place and time for the

meeting, it only allows them to reserve the room. Thus, it is a

much simpler application than one of general meeting scheduling. Users interact with a graphical interface for the schedule of a

room that indicates which times are already reserved, much like the display of a typical calendar manager. The meeting room

scheduling program periodically re-reads the room schedule and refreshes the user’s display. This refresh process enables the user to observe new entries added by other users. The user’s display might be out-of-date with respect to the confirmed reservations of

the room, for example when it is showing a local copy of the room

schedule on a disconnected laptop. Users reserve a time slot simply by selecting a free time period

and filling in a form describing the meeting that is being sched-

uled. Because the user’s display might be out-of-date, there is a

chance that the user could try to schedule a meeting at a time that was already reserved by someone else. To account for this possi-

bdity, users can select several acceptable meeting times rather than just one. At most one of the requested times will eventually be reserved.

A user’s reservation, rather than being immediately confirmed

(or rejected), may remain “tentative” for awhile. While tentative, a meeting may be rescheduled as other interfering reservations

become known. Tentative reservations are indicated as such on the display (by showing them grayed). The “outdatedness” of a calen-

dar does not prevent it from being useful, but simply increases the

likelihood that tentative room reservations will be rescheduled and

finally “committed” to less preferred meeting times. A group of users, although disconnected from the rest of the

system, can immediately see each other’s tentative room reserva-

tions if they are all connected to the same COPY of the meeting room schedule. If, instead, users are maintaining private copies on

their laptop computers, local communication between the machines will eventually synchronize all copies within the group.

2.2 Bibliographic database

Our second application allows users to cooperatively manage

databases of bibliographic entries. Users can add entries to a data-

base as they find papers in the library, in reference lists, via word

of mouth, or by other means. A user can freely read and write any

copy of the database, such as one that resides on his laptop. For the

most part, the database is append-only, though users occasionally

update entries to fix mistakes or add personal annotations.

As is common in bibliographic databases, each entry has a

unique, human-sensible key that is constructed by appending the

year in which the paper was published to the first author’s last

name and adding a character if necessary to distinguish between

multiple papers by the same author in the same year. Thus, the first paper by Jones et al, in 1995 might be identified as “Jones95” and subsequent papers as “Jones95b’, “Jones95c”, and so on,

An entry’s key 1s tentatively assigned when the entry is added.

A user must be aware that the assigned keys are only tentative and

may change when the entry is “committed.” In other words, a user

must be aware that other concurrent updaters could be trying to assign the same key to different entries. Only one entry can have

the key; the others will be assigned alternative keys by the system.

Thus, for example, if the user employs the tentatively assigned key

in some fashion, such as embedding it as a citation in a document, then he must also remember later to check that the key assigned when the entry was committed is in fact the expected one.

Because users can access inconsistent database copies, the same bibliograpblc entry may be concurrently added by different users with different keys. To the extent possible, the system detects

duplicates and merges their contents into a single entry with a sin-

gle key. Interestingly, this is an application where a user may choose to

operate in disconnected mode even if constant connectivity were

possible. Consider the case where a user is in a university library looking up some papers. He occasionally types bibliographic refer- ences into his laptop or PDA. He may spend hours m the library

but only enter a handful of references. He is not likely to want to keep a cellular phone connection open for the duration of his visit. Nor will he want to connect to the university’s local wireless net- work and subject himself to student hackers. He wdl more likely

be content to have his bibliographic entries integrated into his database stored by Bayou upon returning to his home or office.

3. Bayou’s Basic System Model

In the Bayou system, each data collection is replicated in full at

a number of servers. Applications running as clienrs interact with the servers through the Bayou application programming interface

(API), which is implemented as a client stub bound with the appli- cation. This API, as well as the underlying client-server RPC pro- tocol, supports two basic operations: Read and Wrife. Read operations permit queries over a data collection, while Write oper-

ations can insert, modify, and delete a number of data items in a collection. Figure 1 illustrates these components of the Bayou architecture. Note that a chent and a server may be co-resident on a

host, as would be typical of a laptop or PDA running in isolation.

Access to one server is sufficient for a client to perform useful work. The client can read the data held by that server and submit

Writes to the server. Once a Write is accepted by a server, the cli-

ent has no further responsibility for that Write. In particular, the client does not wait for the Write to propagate to other servers, In other words, Bayou presents a weakly consntent replication model with a read-uny/write-any style of access. Weakly consistent repli-

cation has been used previously for availability, simplicity and

scalability in a variety of systems [3, 7, 10, 12, 15, 19].

173

 

 

Anti-entropy

Figure 1. Bayou System Model

While individual Read and Write operations are performed at a

single server, clients need not confine themselves to interacting with a single server. Indeed, in a mobile computing environment,

switching bet ween servers is often desirable, and Bayou provides session guarantees to reduce client-observed inconsistencies when accessing different servers. The description of session guarantees has been presented elsewhere [29].

To support application-specific conflict detection and resolu-

tion, Bayou Writes must contain more than a typical file system write or database update. Along with the desired updates, a Bayou Write carries information that lets each server receiving the Write

decide if there is a conflict and if so, how to fix it. Each Bayou Write also contains a globally unique WriteID assigned by the server that first accepted the Write.

The storage system at each Bayou server conceptually consists of an ordered log of the Writes described above plus the data resulting from the execution of these Writes. Each server performs each Write locally with conflicts detected and resolved as they are

encountered during the execution. A server immediately makes the effects of all known Writes available for reading.

In keeping with the goal of requiring as little of the network as

possible, Bayou servers propagate Writes among themselves dur-

ing pair-wise contacts, called anti-entropy sessions [7]. The two

servers involved in a session exchange Write operations so that

when they are finished they agree on the set of Bayou Writes they

have seen and the order in which to perform them. The theory of epidemic algorithms assures that as long as the

set of servers is not permanently partitioned each Write will even-

tually reach all servers [7]. This holds even for communication patterns in which at most one pair of servers is ever connected at once. In the absence of new Writes from clients, all servers will eventually hold the same data. The rate at which servers reach con- vergence depends on a number of factors including network con- nectivity, the frequency of anti-entropy, and the policies by which

servers select anti-entropy partners. These policies may vary

according to the characteristics of the network, the data, and its servers. Developing optimal anti-entropy policies is a research topic in its own right and not further discussed in this paper.

4. Conflict Detection and Resolution

4.1 Accommodating application semantics

Supporting application-specific conflict detection and resolu- tion is a major emphasis in the Bayou design, A basic tenet of our

work is that storage systems must provide means for an application to specify its notion of a conflict along with its policy for resolving

conflicts. In return, the system implements the mechanisms for

reliably detecting conflicts, as specified by the application, and for

automatically resolving them when possible. This design goal fol- lows from the observation that different applications have different

notions of what it means for two updates to conflict, and that such conflicts cannot always be identified by simply observing conven- tional reads and writes submitted by the applications.

As an example of application-specific conflicts, consider the meeting room scheduling application discussed in Section 2.1. Observing updates at a coarse granularity, such as the whole-file level, the storage system might detect that two users have concur-

rently updated different replicas of the meeting room calendar and conclude that their updates conflict. Observing updates at a fine

granularity, such as the record level, the system might detect that

the two users have added independent records and thereby con-

clude that their updates do not conflict. Neither of these conclu-

sions are warranted. In fact, for this application, a conflict occurs

when two meetings scheduled for the same room overlap in time. Bibliographic databases provide another example of applica-

tion-specific conflicts. In this application, two bibliographic entries

conflict when either they describe different publications but have been assigned the same key by their submitters or else they describe the same publication and have been assigned distinct keys. Again, this definition of conflicting updates is specific to this

application. The steps taken to resolve conflicting updates once they have

been detected may also vary according to the semantics of the

application. In the case of the meeting room scheduling applica-

tion, one or more of a set of conflicting meetings may need to be

174

 

 

Bayou_Write (update, dependency_check, mergeproc) {

IF (DB_Eval (dependency_check. query) <> dependency_check.expected_result)

resolved_update = Interpret (mergeproc);

ELSE

resolved_update = update; DB_Apply (res~lved_up~ate);

}

Figure 2. Processing a Bayou Write Operation

Bayou_Write(

update = {insert, Meetings, 12/18/95, 1:30pm, 60min, “Budget Meeting”),

dependency_check = { query = “SELECT key FROM Meetings WHERE day= 12/18/95

AND start < 2:30pm AND end> l:30pm”,

expected_result = EMPTY},

mergeproc = {

alternates = {{12/18/95, 3:OOpm], {12/19/95, 9:30am]];

newupdate = {]; FOREACH a IN alternates {

# check fthere would be a conflict

IF (NOT EMPTY ( SELECT key FROM Meetings WHERE day = a.date AND start < a.time + 60min AND end > a.time))

CONTINUE; #no conflict, can schedule meeting at that time

newupdate = {insert, Meetings, a.date, a.time, 60min, “Budget Meeting”);

BREAK;

1 IF (newupdate = {]) #no alternate is acceptable

newupdate = {insert, ErrorLog, 12/18/95, 1:30pm, 60min, “Budget Meeting”];

) RETURN newupdate;}

Figure 3, A Bayou

moved to a different room or different time. In the bibliographic apphcation, an entry may need to be assigned a different unique key or two entries for the same publication may need to be merged

into one.

The Bayou system includes two mechanisms for automatic

conthct detection and resolution that are intended to support arbi-

trary applications: dependency checks and merge procedures. These mechanisms permit clients to indicate, for each individual

Write operation, how the system should detect conflicts involving

the Write and what steps should be taken to resolve any detected conflicts based on the semantics of the application. They were

designed to be flexlble since we expect that apphcations will differ appreciably in both the procedures used to handle conflicts, and, more generally, in their ability to deal with conflicts.

Techniques for semantic-based confhct detection and resolution have previously been incorporated into some systems to handle

special cases such as file directory updates. For example, the

Locus [30], FICUS [12], and Coda [17] distributed file systems all

include mechanisms for automatically resolving certain classes of

conflicting directory operations. More recently, some of these sys-

tems have also incorporated support for “resolver” programs that reduce the need for human intervention when rmnlvlng other types of file confllcts [18. 26]. Omcle’s symmetric rephcatlon product

also includes the notion of application-selected resolvers for rela- tional databases [8]. Other systems. llke Lotus Notes [ 15]. do not

Vrite Operation

provide application-specific mechanisms to handle conflicts, but rather create multiple versions of a document, file, or data object when conflicts arise. As wdl become apparent from the next cou-

ple of sections, Bayou’s dependency checks and merge procedures

are more general than these previous techniques.

4.2 Dependency checks

Application-specific conflict detection is accomplished in the

Bayou system through the use of dependency checks. Each Write

operation includes a dependency check consisting of an applica- tion-supplied query and its expected result. A conflict is detected if

the query, when run at a server against its current copy of the data, does not return the expected result. This dependency check 1s a precondition for performing the update that is mchtded in the

Write operation. If the check fails, then the requested update is not

performed and the server revokes a procedure to resolve the detected confllct as outlined in Figure 2 and discussed below.

As an example of apphcauon-detined conflicts, Figure 3 pre- sents a sample Bayou Write operat]on that might be submitted by

the meeting room scheduhng application. This Write ottempts to reserve an hour-long time slot. It includes a dependency check with a single query. written in an SQL-like language, that return>

information about any pre~ Iousl> reserved meetings th:~t o] erl:ip with this ume slot It expects the quer! to return m empt> wt

175

 

 

Bayou’s dependency checks, like the version vectors and times-

tamps traditionally used in distributed systems [12, 19, 25, 27], can

be used to detect Write-Write confhcts. That is, they can be used to

detect when two users update the same data item without one of them first observing the other’s update. Such conflicts can be

detected by having the dependency check query the current values of any data items being updated and ensure that they have not changed from the values they had at the time the Write was sub-

mitted, as M done in Oracle’s rephcated database [8]. Bayou’s dependency checking mechanism is more powerful

than the traditional use of version vectors since it can also be used

to detect Read-Write conflicts. Specifically, each Write operation

can explicitly specify the expected values of any data items on which the update depends, including data items that have been

read but are not being updated. Thtrs, Bayou chents can emulate

the optimistic style of concurrency control employed in some dis-

tributed database systems [4, 6]. For example, a Write operation that installs a new program binary file might only include a depen-

dency check of the sources, including version stamps, from which It was derived. Since the binary does not depend on lts previous value, this need not be included.

Moreover, because dependency queries can read any data in the

server’s replica, dependency checks can enforce arbitrary, muhl –

item integrity constraints on the data. For example, suppose a Write transfers $100 from account A to account B. The applica-

tion, before issuing the Write, reads the balance of account A and

discovers that it currently has $150. Traditional optimistic concur- rency control would check that account A still had $150 before

performing the requested Write operation. The real requirement, however, is that the account have at least $100, and this can easily be specified in the Write’s dependency check. Thus, only if con- current updates cause the balance in account A to drop below $100 will a conflict be detected.

4.3 Merge procedures

Once a conflict is detected, a merge procedure is run by the

Bayou server in an attempt to resolve the conflict. Merge proce-

dures, included with each Write operation, are general programs written m a high-level, interpreted language. They can have

embedded data, such as application-specific knowledge related to

the update that was being attempted, and can perform arbitrary Reads on the current state of the server’s replica. The merge proce-

dure associated with a Write M responsible for resolving any con- flicts detected by its dependency check and for producing a rewsed

update to apply. The complete process of detecting a conflict, nm- ning a merge procedure, and applying the revised update, shown in Figure 2, is performed atomically at each server as part of execut- ing a Write.

In principle, the algorrthm m Figure 2 could be imbedded in each merge procedure, thereby ehrninating any special mecha-

nisms for dependency checking. This approach would require

servers to create a new merge procedure interpreter to execute each Wrrte, which would be overly expensive. Supporting dependency

checks separately allows servers to avoid running the merge proce- dure in the expected case where the Write does not introduce a conflict.

The meeting room scheduling apphcatlon provides good exam-

ples of conflict resolution procedures that are specific not only to a

particular application but also to a patlcular Write operation. In this application, users, well aware that their reservations may be

invalidated by other concurrent users, can specify alternate sched-

uling choices as part of their original scheduling updates. These alternates are encoded in a merge procedure that attempts to

reserve one of the alternate meeting times if the original time is found to be in confhct with some other previously scheduled meet-

ing. An example of such a merge procedure is dlustrated m Figure

3. A different merge procedure altogether could search for the next

available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.

In practice, Bayou merge procedures are written by application

programmers m the form of templates that are instantiated with the appropriate details filled in for each Write. The users of apphca- tions do not have to know about merge procedures, and therefore

about the internal workings of the applications they use, except when automatic confhct resolution cannot be done.

In the case where automatic resolution is not possible, the

merge procedure wdl still run to completion, but is expected to produce a revised update that logs the detected conflict in some

fashion that will enable a person to resolve the conflict later. To

enable manual resolution, perhaps using an interactive merge tool

[22], the conflicting updates must be presented to a user in a man-

ner that allows him to understand what has happened. By conven- tion, most Bayou data collections include an error log for

unresolvable conflicts. Such conventions, however, are outside the domain of the Bayou storage system and may vary according to

the application. In contrast to systems like Coda [18] or Ficus [26] that lock

individual files or complete file volumes when contlcts have been

detected but not yet resolved, Bayou allows replicas to always remain accessible. This permits chents to continue to Read previ- ously written data and to continue to issue new Wrttes. In the

meeting room scheduling application, for example, a user who

only cares about Monday meetings need not concern himself with scheduling conflicts on Wednesday. Of course, the potential draw-

back of this approach is that newly issued Writes may depend on data that M in conflict and may lead to cascaded conflict resolution.

Bayou’s merge procedures resemble the previously mentioned

resolver programs, for which support has been added to a number

of replicated file systems [18, 26]. In these systems, a file-type- specific resolver program is run when a version vector mismatch is detected for a file. This program ]s presented with both the current and proposed tile contents and it can do whatever it wishes in order

to resolve the detected conflict. An example is a resolver program

for a binary file that checks to see if it can find a specltication for how to derive the file from its sources, such as a Unix makefile,

and then recompiles the program in order to obtain a new, “resolved” value for the file. Merge procedures are more general since they can vary for individual Write operations rather than

being associated with the type of the updated data, as illustrated

above for the meeting room scheduling application.

5. Replica Consistency

While the replicas held by two servers at any time may vary in

their contents because they have received and processed d] fferent Writes, a fundamental property of the Bayou design is that all serv-

ers move towards eventual consistency That is, the Bayou system guarantees that all servers eventually receive all Writes wa the pair-wise anti-entropy process and that two servers holding the

same set of Writes will have the same data contents, However, it cannot enforce strict bounds on Write propagation delays since these depend on network connectivity factors that are outside of

Bayou’s control. Two important features of the Bayou system design allows

servers to achieve eventual consistency. First, Writes are per- formed in the same, well-defined order at all servers. Second, the

conflict detection and merge procedures are deterministic so that servers resolve the same conflicts in the same manner.

as

In theory, the execution history at individual servers could vary

long as them execution was equivalent to some global Write

 

 

ordering. For example, Writes known to be commutative could be

performed in any order. In practice, because Bayou’s Write opera-

tions include arbitrary merge procedures, it is effectively impossi-

ble either to determine whether two Writes commute or to

transform two Writes so they can be reordered as has been sug-

gested for some systems [9].

When a Write is accepted by a Bayou server from a client, it is

initially deemed tentative. Tentative Writes are ordered according

to timestamps assigned to them by their accepting servers. Eventu- ally, each Write is committed, by a process described in the next

section. Committed Writes are ordered according to the times at

which they commit and before any tentative Writes.

The only requirement placed on timestamps for tentative Writes is that they be monotonically increasing at each server so

that the pair <timestamp, ID of server that assigned it> produce a total order on Write operations. There is no requirement that serv-

ers have synchronized clocks, which is crucial since trying to

ensure clock synchronization across portable computers is prob-

lematic. However, keeping servers’ clocks reasonably close is

desirable so that the induced Write order is consistent with a user’s

perception of the order in which Writes are submitted. Bayou serv- ers maintain logical clocks [20] to timestamp new Writes. A

server’s logical clock is generally synchronized with lts real-time

system clock, but, to preserve the causal ordering of Write opera-

tions, the server may need to advance its logical clock when Writes

are received during anh-entropy. Enforcing a global order on tentative, as well as committed,

Writes ensures that an isolated cluster of servers will come to

agreement on the tentative resolution of any conflicts that they

encounter. While this is not strictly necessary since clients must be prepared to deal with temporarily inconsistent servers in any case,

we believe it desirable to provide as much internal consistency as

possible. Moreover, clients can expect that the tentative resolution

of conflicts within their cluster will correspond to their eventual permanent resolution, provided that no further conflicts are intro- duced outside the cluster.

Because servers may receive Writes from clients and from other servers in an order that differs from the required execution

order, and because servers immediately apply all known Writes to their replicas, servers must be able to undo the effects of some pre- vious tentative execution of a Write operation and reapply it in a

different order. Interestingly, the number of times that a given Write operation is re-executed depends only on the order in which

Writes arrive via anti-entropy and not on the likelihood of conflicts

involving the Write.

Conceptually, each server maintains a log of all Write opera- tions that it has received, sorted by their committed or tentative timestamps, with committed Writes at the head of the log. The

server’s current data contents are generated by executing all of the Writes in the given order. Techniques for pruning a server’s Write

log and for efficiently maintaining the corresponding data contents

by undoing and redoing Write operations are given in Section 7.

Bayou guarantees that merge procedures, which execute inde- pendently at each server, produce consistent updates by restricting

them to depend only on the server’s current data contents and on any data supplied by the merge procedure itself. In particular, a

merge procedure cannot access time-varying or server-specific “environment” information such as the current system clock or

server’s name. Moreover, merge procedures that fail due to

exceeding their limits on resource usage must fail deterministi-

tally. This means that all servers must place uniform bounds on the CPU and memory resources allocated to a merge procedure and must consistently enforce these bounds during execution. Once these conditions are met, two servers that start with identical repli- cas will end up with identical rephcas after executing a Write.

6. Write Stability and Commitment

A Write is said to be stable at a server when it has been exe-

cuted for the last time by that server. Recall that as servers learn of

new updates by performing anti-entropy with other servers, the

effects of previously executed Write operations may need to be

undone and the Writes re-executed. Thus, a given Write operation

may be executed several times at a server and may produce differ-

ent results depending on the execution history of the server. A Write operation becomes stable when the set of Writes that precede

it in the server’s Write log is fixed. This means that the server has

already received and executed any Writes that could possibly be

ordered before the given Write. Bayou’s notion of stability is simi- lar to that in ordered mtdticast protocols, such as those provided in

the 1S1S toolkit [2]. In many cases, an application can be designed with a notion of

“confirmation” or “commitment” that corresponds to the Bayou

notion of stability. As an example, in the Bayou meeting room

scheduling application, two users may try to schedule separate

meetings for the same time in the same room. Only when one of

the users discovers that his Write has become stable and his sched- ule still shows that he has reserved the room for the desired time,

can he be sure that his tentative reservation has been confirmed. Since clients may want to know when a Write has stabilized,

the Bayou API provides means for inquiring about the stability of

a specific Write. Given a Write’s unique identifier, a client can ask a server whether the given Write is stable at the server. The answer may vary, of course, depending on which server is contacted.

Bayou also provides support for clients that may choose to access

only stable data. How does a server determine whether a Write is stable? One

approach would be to have each server include in the information passed during anti-entropy not only any Writes that have been

accepted by this server but also the current value of the clock that it uses to timestamp new Writes. With suitable assumptions about

the propagation order of Writes, a server could then determine that a Write is stable when it has a lower timestamp than all servers’ clocks. The main drawback of this approach is that a server that

remains disconnected can prevent Writes from stabilizing, which could cause a large number of Writes to be rolled back when the server reconnects.

To speed up the rate at which updates stabihze in an environ-

ment where communication with some servers may not be possible

for extended periods of time, the Bayou system uses a commit pro-

cedure. That is, a Write becomes stable when it is explicitly com-

mitted, and, in fact, we generally use the terms “stable” and “committed” interchangeably in the Bayou system. Committed

Writes, in commit order, are placed ahead of any tentative Writes

in each server’s Write log. This, along with Bayou’s anti-entropy protocol ensuring that servers learn of committed Writes in the

order that they were committed, provides stability. In the Bayou system, we use a primary commir scheme [28].

That M, one server designated as the primary takes responsibihty

for committing updates. Knowledge of which Writes have com- mitted and in which order they were committed then propagates to other servers during anti-entropy. In all other respects, the primary

behaves exactly like any other server. Each replicated data collec- tion can have a different server designated as its primary.

Any commit protocol that prevents different groups of servers

from committing updates in different orders would meet Bayou’s

needs. In our anticipated weak connectivity environment, using a primary to commit data is more attractive than the standard two- phase commit protocol since it alleviates the need to gather a

majority quorum of servers. Consider the case of data that is repli-

cated among laptops that are mostly disconnected. Requiring a majority of these laptops to be in communication with each other

177

 

 

Timestamp Vectors Write Log ,..

~. In Memory I /’ On Stable Storage ,

Figure 4. Bayou Database Organization

at the same time in order to commit updates would be unreason-

able.

The primary commit approach also enables updates to commit

on a disconnected laptop that acts as the primary server. For exam-

ple, suppose a user keeps the primary copy of his calendar with him on his laptop and allows others, such as a spouse or secretary, to keep secondary, mostly read-only copies. In this case, the user’s updates to his own calendar commit immediately. This example illustrates how one might choose the primary to coincide with the locus of update activity, thereby maximizing the rate at which Writes get committed.

Unlike other distributed storage systems in which the ability to commit data is of primary importance, the Bayou design readily

accommodates the temporary unavailability of the primary. The inability of a client to communicate with the primary server, for

instance if the primary crashes or is disconnected, does not prevent it from performing useful Read and Write operations. Writes

accepted by other servers simply remain tentative until they even- tually reach the primary.

Bayou tries to arrange, but cannot ensure, that the order in

which Writes are committed is consistent with the tentative order indicated by their timestamps. Writes from a given server are com-

mitted in timestamp order. Writes from different servers, however,

may commit in a different order based on when the servers per- form anti-entropy with the primary and with each other. Writes held on a disconnected non-primary server, for instance, will com-

mit only after the server reconnects to the rest of the system and could be committed after Writes with later timestamps.

7. Storage System Implementation Issues

The Bayou design places several demands on the underlying

storage system used by each server including the need for space-

efficlent Write logging, efficient undokedo of Write operations, separate views of committed and tentative data, and support for server-to-server anti-entropy. We implemented a storage system

tadored to these special needs. Our implementation is factored into three main components as

shown in Figure 4: the Write Log, the Tuple Store, and the Undo

Log. The Write Log contains Writes that have been received by a Bayou server, sorted by their global committed or tentative order. The server’s Tuple Store is a database that is obtained by executing

the Writes in order and is used to process Read requests. The Undo Log facilitates rolling back tentative Writes that have been applied

You didn't find what you were looking for? Upload your specific requirements now and relax as your preferred tutor delivers a top quality customized paper

Order Now