Managing Update Conflicts in Bayou,

a Weakly Connected Replicated Storage System

Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers,

Mike J. Spreitzer and Carl H. Hauser

Computer Science Laboratory

Xerox Palo Alto Research Center

Palo Alto, California 94304 U.S.A.

Abstract

Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment that includes portable machines with less than ideal network connectivity. To maximize availability, users can read and write any accessible replica. Bayou's design has focused on supporting application-specific mechanisms to detect and resolve the update conflicts that naturally arise in such a system, ensuring that replicas move towards eventual consistency, and defining a protocol by which the resolution of update conflicts stabilizes. It includes novel methods for conflict detection, called dependency checks, and per-write conflict resolution based on client-provided merge procedures. To guarantee eventual consistency, Bayou servers must be able to roll back the effects of previously executed writes and redo them according to a global serialization order. Furthermore, Bayou permits clients to observe the results of all writes received by a server, including tentative writes whose conflicts have not been ultimately resolved. This paper presents the motivation for and design of these mechanisms and describes the experiences gained with an initial implementation of the system.

1. Introduction

The Bayou storage system provides an infrastructure for collaborative applications that manages the conflicts introduced by concurrent activity while relying only on the weak connectivity available for mobile computing. The advent of mobile computers, in the form of laptops and personal digital assistants (PDAs), enables the use of computational facilities away from the usual work setting of users. However, mobile computers do not enjoy the connectivity afforded by local area networks or the telephone system. Even wireless media, such as cellular telephony, will not permit continuous connectivity until per-minute costs decline enough to justify lengthy connections. Thus, the Bayou design requires only occasional, pair-wise communication between computers. This model takes into consideration characteristics of mobile computing such as expensive connection time, frequent or occasional disconnections, and that collaborating computers may never be all connected simultaneously [1, 13, 16].

The Bayou architecture does not include the notion of a "disconnected" mode of operation because, in fact, various degrees of

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

SIGOPS '95 12/95 CO, USA © 1995 ACM 0-89791-715-4/95/0012...$3.50

"connectedness" are possible. Groups of computers may be partitioned away from the rest of the system yet remain connected to each other. Supporting disconnected workgroups is a central goal of the Bayou system. By relying only on pair-wise communication in the normal mode of operation, the Bayou design copes with arbitrary network connectivity.

A weak connectivity networking model can be accommodated only with weakly consistent, replicated data. Replication is required since a single storage site may not be reachable from mobile clients or within disconnected workgroups. Weak consistency is desired since any replication scheme providing one-copy serializability [6], such as requiring clients to access a quorum of replicas or to acquire exclusive locks on data that they wish to update, yields unacceptably low write availability in partitioned networks [5]. For these reasons, Bayou adopts a model in which clients can read and write to any replica without the need for explicit coordination with other replicas. Every computer eventually receives updates from every other, either directly or indirectly, through a chain of pair-wise interactions.

Unlike many previous systems [12, 27], our goal in designing the Bayou system was not to provide transparent replicated data support for existing file system and database applications. We believe that applications must be aware that they may read weakly consistent data and also that their write operations may conflict with those of other users and applications. Moreover, applications must be involved in the detection and resolution of conflicts since these naturally depend on the semantics of the application.

To this end, Bayou provides system support for application-specific conflict detection and resolution. Previous systems, such as Locus [30] and Coda [17], have proven the value of semantic conflict detection and resolution for file directories, and several systems are exploring conflict resolution for file and database contents [8, 18, 26]. Bayou's mechanisms extend this work by letting applications exploit domain-specific knowledge to achieve automatic conflict resolution at the granularity of individual update operations without compromising security or eventual consistency.

Automatic conflict resolution is highly desirable because it

enables a Bayou replica to remain available. In a replicated system with the weak connectivity model adopted by Bayou, conflicts may be detected arbitrarily far from the users who introduced the conflicts. Moreover, conflicts may be detected when no user is present. Bayou does not take the approach of systems that mark conflicting data as unavailable until a person resolves the conflict. Instead, clients can read data at all times, including data whose conflicts have not been fully resolved either because human intervention is needed or because other conflicting updates may be propagating through the system. Bayou provides interfaces that make the state of a replica's data apparent to the application.

The contributions presented in this paper are as follows: we

introduce per-update dependency checks and merge procedures as


a general mechanism for application-specific conflict detection and resolution; we define two states of an update, committed and tentative, which relate to whether or not the conflicts potentially introduced by the update have been ultimately resolved; we present mechanisms for managing these two states of an update both from the perspective of the clients and the storage management requirements of the replicas; we describe how replicas move towards eventual consistency; and, finally, we discuss how security is provided in a system like Bayou.

2. Bayou Applications

The Bayou replicated storage system was designed to support a variety of non-real-time collaborative applications, such as shared calendars, mail and bibliographic databases, program development, and document editing for disconnected workgroups, as well as applications that might be used by individuals at different hosts at different times. To serve as a backdrop for the discussion in following sections, this section presents a quick overview of two applications that have been implemented thus far, a meeting room scheduler and a bibliographic database.

2.1 Meeting room scheduler

Our meeting room scheduling application enables users to reserve meeting rooms. At most one person (or group) can reserve the room for any given period of time. This meeting room scheduling program is intended for use after a group of people have already decided that they want to meet in a certain room and have determined a set of acceptable times for the meeting. It does not help them to determine a mutually agreeable place and time for the meeting; it only allows them to reserve the room. Thus, it is a much simpler application than one of general meeting scheduling.

Users interact with a graphical interface for the schedule of a

room that indicates which times are already reserved, much like the display of a typical calendar manager. The meeting room

scheduling program periodically re-reads the room schedule and refreshes the user’s display. This refresh process enables the user to observe new entries added by other users. The user’s display might be out-of-date with respect to the confirmed reservations of

the room, for example when it is showing a local copy of the room

schedule on a disconnected laptop. Users reserve a time slot simply by selecting a free time period

and filling in a form describing the meeting that is being scheduled. Because the user's display might be out-of-date, there is a

chance that the user could try to schedule a meeting at a time that was already reserved by someone else. To account for this possibility, users can select several acceptable meeting times rather than just one. At most one of the requested times will eventually be reserved.

A user’s reservation, rather than being immediately confirmed

(or rejected), may remain "tentative" for a while. While tentative, a meeting may be rescheduled as other interfering reservations

become known. Tentative reservations are indicated as such on the display (by showing them grayed). The “outdatedness” of a calen-

dar does not prevent it from being useful, but simply increases the

likelihood that tentative room reservations will be rescheduled and

finally “committed” to less preferred meeting times. A group of users, although disconnected from the rest of the

system, can immediately see each other’s tentative room reserva-

tions if they are all connected to the same copy of the meeting room schedule. If, instead, users are maintaining private copies on

their laptop computers, local communication between the machines will eventually synchronize all copies within the group.

2.2 Bibliographic database

Our second application allows users to cooperatively manage

databases of bibliographic entries. Users can add entries to a data-

base as they find papers in the library, in reference lists, via word

of mouth, or by other means. A user can freely read and write any

copy of the database, such as one that resides on his laptop. For the

most part, the database is append-only, though users occasionally

update entries to fix mistakes or add personal annotations.

As is common in bibliographic databases, each entry has a unique, human-sensible key that is constructed by appending the year in which the paper was published to the first author's last name and adding a character if necessary to distinguish between multiple papers by the same author in the same year. Thus, the first paper by Jones et al. in 1995 might be identified as "Jones95" and subsequent papers as "Jones95b", "Jones95c", and so on.
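The key-construction rule described above might be sketched as follows; the function name and the exact suffix scheme are illustrative, not taken from the Bayou implementation.

```python
# Illustrative sketch of human-sensible key construction: last name plus
# two-digit year, with "b", "c", ... appended to break collisions.

def bib_key(last_name, year, existing_keys):
    base = f"{last_name}{year % 100}"
    key, suffix = base, "b"
    while key in existing_keys:
        key = base + suffix               # try "Jones95b", then "Jones95c", ...
        suffix = chr(ord(suffix) + 1)
    return key

keys = set()
for _ in range(3):
    keys.add(bib_key("Jones", 1995, keys))
assert keys == {"Jones95", "Jones95b", "Jones95c"}
```

Note that two disconnected users running this same rule can still tentatively assign the same key to different entries, which is exactly the conflict the system must later detect.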

An entry's key is tentatively assigned when the entry is added.

A user must be aware that the assigned keys are only tentative and

may change when the entry is “committed.” In other words, a user

must be aware that other concurrent updaters could be trying to assign the same key to different entries. Only one entry can have

the key; the others will be assigned alternative keys by the system.

Thus, for example, if the user employs the tentatively assigned key

in some fashion, such as embedding it as a citation in a document, then he must also remember later to check that the key assigned when the entry was committed is in fact the expected one.

Because users can access inconsistent database copies, the same bibliographic entry may be concurrently added by different users with different keys. To the extent possible, the system detects duplicates and merges their contents into a single entry with a single key.

Interestingly, this is an application where a user may choose to

operate in disconnected mode even if constant connectivity were

possible. Consider the case where a user is in a university library looking up some papers. He occasionally types bibliographic references into his laptop or PDA. He may spend hours in the library but only enter a handful of references. He is not likely to want to keep a cellular phone connection open for the duration of his visit. Nor will he want to connect to the university's local wireless network and subject himself to student hackers. He will more likely

be content to have his bibliographic entries integrated into his database stored by Bayou upon returning to his home or office.

3. Bayou’s Basic System Model

In the Bayou system, each data collection is replicated in full at

a number of servers. Applications running as clients interact with the servers through the Bayou application programming interface (API), which is implemented as a client stub bound with the application. This API, as well as the underlying client-server RPC protocol, supports two basic operations: Read and Write. Read operations permit queries over a data collection, while Write operations can insert, modify, and delete a number of data items in a collection. Figure 1 illustrates these components of the Bayou architecture. Note that a client and a server may be co-resident on a

host, as would be typical of a laptop or PDA running in isolation.

Access to one server is sufficient for a client to perform useful work. The client can read the data held by that server and submit

Writes to the server. Once a Write is accepted by a server, the cli-

ent has no further responsibility for that Write. In particular, the client does not wait for the Write to propagate to other servers. In other words, Bayou presents a weakly consistent replication model with a read-any/write-any style of access. Weakly consistent repli-

cation has been used previously for availability, simplicity and

scalability in a variety of systems [3, 7, 10, 12, 15, 19].
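As a rough illustration of this read-any/write-any model, the following sketch shows a client doing useful work against a single accessible server; the class and method names are ours, not Bayou's actual API.

```python
# Illustrative sketch only: names and structures are ours, not Bayou's API.

class Server:
    """A server holding a full replica of a data collection."""
    def __init__(self, server_id):
        self.server_id = server_id
        self.data = {}              # current data contents
        self.accepted_writes = []   # Writes this server has accepted

    def read(self, key):
        # Reads run against this server's current, possibly tentative, state.
        return self.data.get(key)

    def write(self, key, value):
        # Once a Write is accepted, the client has no further responsibility;
        # propagation to other servers happens later via anti-entropy.
        write_id = (self.server_id, len(self.accepted_writes))  # unique WriteID
        self.accepted_writes.append((write_id, key, value))
        self.data[key] = value
        return write_id

# Access to ONE server suffices for useful work, e.g. an isolated laptop
# on which client and server are co-resident.
laptop = Server("laptop")
laptop.write("room-231/1:30pm", "Budget Meeting")
assert laptop.read("room-231/1:30pm") == "Budget Meeting"
```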

Figure 1. Bayou System Model

While individual Read and Write operations are performed at a

single server, clients need not confine themselves to interacting with a single server. Indeed, in a mobile computing environment,

switching between servers is often desirable, and Bayou provides session guarantees to reduce client-observed inconsistencies when accessing different servers. The description of session guarantees has been presented elsewhere [29].

To support application-specific conflict detection and resolu-

tion, Bayou Writes must contain more than a typical file system write or database update. Along with the desired updates, a Bayou Write carries information that lets each server receiving the Write

decide if there is a conflict and if so, how to fix it. Each Bayou Write also contains a globally unique WriteID assigned by the server that first accepted the Write.

The storage system at each Bayou server conceptually consists of an ordered log of the Writes described above plus the data resulting from the execution of these Writes. Each server performs each Write locally with conflicts detected and resolved as they are

encountered during the execution. A server immediately makes the effects of all known Writes available for reading.

In keeping with the goal of requiring as little of the network as

possible, Bayou servers propagate Writes among themselves dur-

ing pair-wise contacts, called anti-entropy sessions [7]. The two

servers involved in a session exchange Write operations so that

when they are finished they agree on the set of Bayou Writes they

have seen and the order in which to perform them. The theory of epidemic algorithms assures that as long as the

set of servers is not permanently partitioned each Write will even-

tually reach all servers [7]. This holds even for communication patterns in which at most one pair of servers is ever connected at once. In the absence of new Writes from clients, all servers will eventually hold the same data. The rate at which servers reach con- vergence depends on a number of factors including network con- nectivity, the frequency of anti-entropy, and the policies by which

servers select anti-entropy partners. These policies may vary

according to the characteristics of the network, the data, and its servers. Developing optimal anti-entropy policies is a research topic in its own right and not further discussed in this paper.

4. Conflict Detection and Resolution

4.1 Accommodating application semantics

Supporting application-specific conflict detection and resolution is a major emphasis in the Bayou design. A basic tenet of our

work is that storage systems must provide means for an application to specify its notion of a conflict along with its policy for resolving

conflicts. In return, the system implements the mechanisms for

reliably detecting conflicts, as specified by the application, and for

automatically resolving them when possible. This design goal fol- lows from the observation that different applications have different

notions of what it means for two updates to conflict, and that such conflicts cannot always be identified by simply observing conven- tional reads and writes submitted by the applications.

As an example of application-specific conflicts, consider the meeting room scheduling application discussed in Section 2.1. Observing updates at a coarse granularity, such as the whole-file level, the storage system might detect that two users have concur-

rently updated different replicas of the meeting room calendar and conclude that their updates conflict. Observing updates at a fine

granularity, such as the record level, the system might detect that

the two users have added independent records and thereby con-

clude that their updates do not conflict. Neither of these conclu-

sions are warranted. In fact, for this application, a conflict occurs

when two meetings scheduled for the same room overlap in time. Bibliographic databases provide another example of applica-

tion-specific conflicts. In this application, two bibliographic entries

conflict when either they describe different publications but have been assigned the same key by their submitters or else they describe the same publication and have been assigned distinct keys. Again, this definition of conflicting updates is specific to this

application. The steps taken to resolve conflicting updates once they have

been detected may also vary according to the semantics of the

application. In the case of the meeting room scheduling applica-

tion, one or more of a set of conflicting meetings may need to be


Bayou_Write (update, dependency_check, mergeproc) {
    IF (DB_Eval (dependency_check.query) <> dependency_check.expected_result)
        resolved_update = Interpret (mergeproc);
    ELSE
        resolved_update = update;
    DB_Apply (resolved_update);
}

Figure 2. Processing a Bayou Write Operation

Bayou_Write(
    update = {insert, Meetings, 12/18/95, 1:30pm, 60min, "Budget Meeting"},
    dependency_check = {
        query = "SELECT key FROM Meetings WHERE day = 12/18/95
                 AND start < 2:30pm AND end > 1:30pm",
        expected_result = EMPTY},
    mergeproc = {
        alternates = {{12/18/95, 3:00pm}, {12/19/95, 9:30am}};
        newupdate = {};
        FOREACH a IN alternates {
            # check if there would be a conflict
            IF (NOT EMPTY (SELECT key FROM Meetings WHERE day = a.date
                           AND start < a.time + 60min AND end > a.time))
                CONTINUE;
            # no conflict, can schedule meeting at that time
            newupdate = {insert, Meetings, a.date, a.time, 60min, "Budget Meeting"};
            BREAK;
        }
        IF (newupdate = {})   # no alternate is acceptable
            newupdate = {insert, ErrorLog, 12/18/95, 1:30pm, 60min, "Budget Meeting"};
        RETURN newupdate; }
)

Figure 3. A Bayou Write Operation

moved to a different room or different time. In the bibliographic application, an entry may need to be assigned a different unique key or two entries for the same publication may need to be merged

into one.

The Bayou system includes two mechanisms for automatic conflict detection and resolution that are intended to support arbitrary applications: dependency checks and merge procedures. These mechanisms permit clients to indicate, for each individual Write operation, how the system should detect conflicts involving the Write and what steps should be taken to resolve any detected conflicts based on the semantics of the application. They were designed to be flexible since we expect that applications will differ appreciably in both the procedures used to handle conflicts and, more generally, in their ability to deal with conflicts.

Techniques for semantic-based conflict detection and resolution have previously been incorporated into some systems to handle special cases such as file directory updates. For example, the Locus [30], FICUS [12], and Coda [17] distributed file systems all include mechanisms for automatically resolving certain classes of conflicting directory operations. More recently, some of these systems have also incorporated support for "resolver" programs that reduce the need for human intervention when resolving other types of file conflicts [18, 26]. Oracle's symmetric replication product also includes the notion of application-selected resolvers for relational databases [8]. Other systems, like Lotus Notes [15], do not


provide application-specific mechanisms to handle conflicts, but rather create multiple versions of a document, file, or data object when conflicts arise. As will become apparent from the next couple of sections, Bayou's dependency checks and merge procedures are more general than these previous techniques.

4.2 Dependency checks

Application-specific conflict detection is accomplished in the Bayou system through the use of dependency checks. Each Write operation includes a dependency check consisting of an application-supplied query and its expected result. A conflict is detected if the query, when run at a server against its current copy of the data, does not return the expected result. This dependency check is a precondition for performing the update that is included in the Write operation. If the check fails, then the requested update is not performed and the server invokes a procedure to resolve the detected conflict as outlined in Figure 2 and discussed below.

As an example of application-defined conflicts, Figure 3 presents a sample Bayou Write operation that might be submitted by the meeting room scheduling application. This Write attempts to reserve an hour-long time slot. It includes a dependency check with a single query, written in an SQL-like language, that returns information about any previously reserved meetings that overlap with this time slot. It expects the query to return an empty set.


Bayou's dependency checks, like the version vectors and timestamps traditionally used in distributed systems [12, 19, 25, 27], can be used to detect Write-Write conflicts. That is, they can be used to detect when two users update the same data item without one of them first observing the other's update. Such conflicts can be detected by having the dependency check query the current values of any data items being updated and ensure that they have not changed from the values they had at the time the Write was submitted, as is done in Oracle's replicated database [8].

Bayou's dependency checking mechanism is more powerful

than the traditional use of version vectors since it can also be used to detect Read-Write conflicts. Specifically, each Write operation can explicitly specify the expected values of any data items on which the update depends, including data items that have been read but are not being updated. Thus, Bayou clients can emulate the optimistic style of concurrency control employed in some distributed database systems [4, 6]. For example, a Write operation that installs a new program binary file might only include a dependency check of the sources, including version stamps, from which it was derived. Since the binary does not depend on its previous value, this need not be included.

Moreover, because dependency queries can read any data in the server's replica, dependency checks can enforce arbitrary, multi-item integrity constraints on the data. For example, suppose a Write transfers $100 from account A to account B. The application, before issuing the Write, reads the balance of account A and discovers that it currently has $150. Traditional optimistic concurrency control would check that account A still had $150 before performing the requested Write operation. The real requirement, however, is that the account have at least $100, and this can easily be specified in the Write's dependency check. Thus, only if concurrent updates cause the balance in account A to drop below $100 will a conflict be detected.
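The account-transfer example above can be sketched in a few lines; the names (`bayou_write`, `dep_query`) and the use of Python callables in place of the SQL-like query language are ours, not Bayou's actual formats.

```python
# Hedged sketch of a dependency check as a precondition on a Write.

balances = {"A": 150, "B": 0}

def bayou_write(db, update, dep_query, expected_result):
    # Run the application-supplied query against the server's current data
    # and compare with the expected result; on mismatch, a merge procedure
    # would run instead of the update (see Figure 2).
    if dep_query(db) != expected_result:
        return "conflict"
    update(db)
    return "applied"

# The check encodes the REAL requirement -- at least $100 in account A --
# rather than "A still holds exactly $150", as a traditional optimistic
# concurrency check on the values read would.
transfer = lambda db: db.update(A=db["A"] - 100, B=db["B"] + 100)
check = lambda db: db["A"] >= 100

assert bayou_write(balances, transfer, check, True) == "applied"
assert balances == {"A": 50, "B": 100}
# A second $100 transfer now fails the check (balance below $100):
assert bayou_write(balances, transfer, check, True) == "conflict"
```

A concurrent $25 withdrawal elsewhere would leave $125 in the account and so would not trigger a conflict, whereas an exact-value check would.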

4.3 Merge procedures

Once a conflict is detected, a merge procedure is run by the Bayou server in an attempt to resolve the conflict. Merge procedures, included with each Write operation, are general programs written in a high-level, interpreted language. They can have embedded data, such as application-specific knowledge related to the update that was being attempted, and can perform arbitrary Reads on the current state of the server's replica. The merge procedure associated with a Write is responsible for resolving any conflicts detected by its dependency check and for producing a revised update to apply. The complete process of detecting a conflict, running a merge procedure, and applying the revised update, shown in Figure 2, is performed atomically at each server as part of executing a Write.

In principle, the algorithm in Figure 2 could be embedded in each merge procedure, thereby eliminating any special mechanisms for dependency checking. This approach would require servers to create a new merge procedure interpreter to execute each Write, which would be overly expensive. Supporting dependency checks separately allows servers to avoid running the merge procedure in the expected case where the Write does not introduce a conflict.

The meeting room scheduling application provides good examples of conflict resolution procedures that are specific not only to a particular application but also to a particular Write operation. In this application, users, well aware that their reservations may be invalidated by other concurrent users, can specify alternate scheduling choices as part of their original scheduling updates. These alternates are encoded in a merge procedure that attempts to reserve one of the alternate meeting times if the original time is found to be in conflict with some other previously scheduled meeting. An example of such a merge procedure is illustrated in Figure 3. A different merge procedure altogether could search for the next available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.

In practice, Bayou merge procedures are written by application programmers in the form of templates that are instantiated with the appropriate details filled in for each Write. The users of applications do not have to know about merge procedures, and therefore about the internal workings of the applications they use, except when automatic conflict resolution cannot be done.

In the case where automatic resolution is not possible, the

merge procedure will still run to completion, but is expected to produce a revised update that logs the detected conflict in some

fashion that will enable a person to resolve the conflict later. To

enable manual resolution, perhaps using an interactive merge tool

[22], the conflicting updates must be presented to a user in a man-

ner that allows him to understand what has happened. By conven- tion, most Bayou data collections include an error log for

unresolvable conflicts. Such conventions, however, are outside the domain of the Bayou storage system and may vary according to

the application.

In contrast to systems like Coda [18] or Ficus [26] that lock individual files or complete file volumes when conflicts have been detected but not yet resolved, Bayou allows replicas to always remain accessible. This permits clients to continue to Read previously written data and to continue to issue new Writes. In the meeting room scheduling application, for example, a user who only cares about Monday meetings need not concern himself with scheduling conflicts on Wednesday. Of course, the potential drawback of this approach is that newly issued Writes may depend on data that is in conflict and may lead to cascaded conflict resolution.

Bayou’s merge procedures resemble the previously mentioned

resolver programs, for which support has been added to a number

of replicated file systems [18, 26]. In these systems, a file-type-specific resolver program is run when a version vector mismatch is detected for a file. This program is presented with both the current and proposed file contents and it can do whatever it wishes in order to resolve the detected conflict. An example is a resolver program for a binary file that checks to see if it can find a specification for how to derive the file from its sources, such as a Unix makefile,

and then recompiles the program in order to obtain a new, “resolved” value for the file. Merge procedures are more general since they can vary for individual Write operations rather than

being associated with the type of the updated data, as illustrated

above for the meeting room scheduling application.

5. Replica Consistency

While the replicas held by two servers at any time may vary in their contents because they have received and processed different Writes, a fundamental property of the Bayou design is that all servers move towards eventual consistency. That is, the Bayou system guarantees that all servers eventually receive all Writes via the pair-wise anti-entropy process and that two servers holding the same set of Writes will have the same data contents. However, it cannot enforce strict bounds on Write propagation delays since these depend on network connectivity factors that are outside of

Bayou's control.

Two important features of the Bayou system design allow servers to achieve eventual consistency. First, Writes are performed in the same, well-defined order at all servers. Second, the conflict detection and merge procedures are deterministic so that servers resolve the same conflicts in the same manner.
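A toy illustration of why these two properties yield convergence (last-writer-wins updates stand in here for deterministic conflict resolution; this is our simplification, not Bayou's execution model):

```python
# Servers holding the same set of Writes, executed in the same global
# order by a deterministic procedure, end up with identical contents.

def apply_in_order(writes):
    """Replay Writes sorted by their global (timestamp, server-id) order."""
    db = {}
    for _, _, key, value in sorted(writes):
        db[key] = value          # deterministic: no randomness, no wall clocks
    return db

writes = [
    (10, "s1", "room-231", "Budget Meeting"),
    (12, "s2", "room-231", "Design Review"),
]
# Servers may RECEIVE these Writes in different orders, but both execute
# them in the same global order, so their contents agree.
assert apply_in_order(writes) == apply_in_order(list(reversed(writes)))
```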

In theory, the execution history at individual servers could vary as long as their execution was equivalent to some global Write

 

 

ordering. For example, Writes known to be commutative could be

performed in any order. In practice, because Bayou’s Write opera-

tions include arbitrary merge procedures, it is effectively impossi-

ble either to determine whether two Writes commute or to

transform two Writes so they can be reordered as has been sug-

gested for some systems [9].

When a Write is accepted by a Bayou server from a client, it is

initially deemed tentative. Tentative Writes are ordered according

to timestamps assigned to them by their accepting servers. Eventu- ally, each Write is committed, by a process described in the next

section. Committed Writes are ordered according to the times at

which they commit and before any tentative Writes.
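The resulting global order can be sketched as follows; the tuple encoding of a Write's state is ours, chosen only to make the ordering rule explicit.

```python
# Sketch of Bayou's Write ordering: committed Writes come first, ordered
# by commit time; tentative Writes follow, ordered by the <timestamp,
# ID of accepting server> pair, which is a total order.

def write_order(w):
    committed, timestamp, server_id = w
    return (0 if committed else 1, timestamp, server_id)

log = [
    (False, 17, "s2"),   # tentative
    (True,  20, "s1"),   # committed: orders before ALL tentative Writes,
    (False, 17, "s1"),   #   even those with smaller timestamps
    (True,  15, "s3"),
]
ordered = sorted(log, key=write_order)
assert ordered == [(True, 15, "s3"), (True, 20, "s1"),
                   (False, 17, "s1"), (False, 17, "s2")]
```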

The only requirement placed on timestamps for tentative Writes is that they be monotonically increasing at each server so that the pair <timestamp, ID of server that assigned it> produces a total order on Write operations. There is no requirement that servers have synchronized clocks, which is crucial since trying to ensure clock synchronization across portable computers is problematic. However, keeping servers’ clocks reasonably close is desirable so that the induced Write order is consistent with a user’s perception of the order in which Writes are submitted. Bayou servers maintain logical clocks [20] to timestamp new Writes. A server’s logical clock is generally synchronized with its real-time system clock, but, to preserve the causal ordering of Write operations, the server may need to advance its logical clock when Writes are received during anti-entropy.

Enforcing a global order on tentative, as well as committed,

Writes ensures that an isolated cluster of servers will come to

agreement on the tentative resolution of any conflicts that they

encounter. While this is not strictly necessary since clients must be prepared to deal with temporarily inconsistent servers in any case,

we believe it desirable to provide as much internal consistency as

possible. Moreover, clients can expect that the tentative resolution

of conflicts within their cluster will correspond to their eventual permanent resolution, provided that no further conflicts are intro- duced outside the cluster.
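The <timestamp, server ID> ordering and the anti-entropy clock advance described above can be sketched as follows. This is an illustrative sketch, not Bayou’s actual code; the class and method names are ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteStamp:
    """Ordering key for a Write: the pair <timestamp, ID of the
    accepting server> yields a total order on Write operations."""
    timestamp: int   # accepting server's logical clock value
    server_id: str   # breaks ties between servers

    def __lt__(self, other):
        return (self.timestamp, self.server_id) < (other.timestamp, other.server_id)

class Server:
    def __init__(self, server_id):
        self.server_id = server_id
        self.clock = 0  # logical clock, kept near real time in practice

    def accept_write(self):
        """Assign a monotonically increasing timestamp to a new Write."""
        self.clock += 1
        return WriteStamp(self.clock, self.server_id)

    def receive_via_anti_entropy(self, stamp):
        """Advance the logical clock past a received Write's timestamp so
        that later local Writes are ordered after it (causal ordering)."""
        self.clock = max(self.clock, stamp.timestamp)
```

Note that no clock synchronization is assumed: the server IDs alone guarantee a total order, and the `max` step is what preserves causality across anti-entropy sessions.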

Because servers may receive Writes from clients and from other servers in an order that differs from the required execution

order, and because servers immediately apply all known Writes to their replicas, servers must be able to undo the effects of some pre- vious tentative execution of a Write operation and reapply it in a

different order. Interestingly, the number of times that a given Write operation is re-executed depends only on the order in which

Writes arrive via anti-entropy and not on the likelihood of conflicts

involving the Write.

Conceptually, each server maintains a log of all Write opera- tions that it has received, sorted by their committed or tentative timestamps, with committed Writes at the head of the log. The

server’s current data contents are generated by executing all of the Writes in the given order. Techniques for pruning a server’s Write

log and for efficiently maintaining the corresponding data contents

by undoing and redoing Write operations are given in Section 7.
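The conceptual log can be sketched as below, with committed Writes sorting ahead of tentative ones. This is a simplification under assumed names (the real system prunes the log and undoes/redoes incrementally, as Section 7 describes); here the data contents are re-derived by replaying the whole log, which matches the conceptual model of executing all Writes in order.

```python
import bisect

class WriteLog:
    def __init__(self):
        self.log = []          # Writes sorted by their ordering key
        self.contents = {}     # current data contents

    @staticmethod
    def key(write):
        # Committed Writes (flag 0) sort before tentative ones (flag 1);
        # within each class, order by <timestamp, server ID>.
        return (0 if write["committed"] else 1, write["stamp"])

    def insert(self, write):
        keys = [self.key(w) for w in self.log]
        pos = bisect.bisect_right(keys, self.key(write))
        self.log.insert(pos, write)
        # Conceptually: undo every Write after the insertion point, then
        # redo them in the new order. Modeled here as a full replay.
        self.contents = {}
        for w in self.log:
            self.contents.update(w["update"])
```

Note how a Write arriving "late" via anti-entropy forces the tail of the log to be re-executed, independent of whether it conflicts with anything.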

Bayou guarantees that merge procedures, which execute inde- pendently at each server, produce consistent updates by restricting

them to depend only on the server’s current data contents and on any data supplied by the merge procedure itself. In particular, a

merge procedure cannot access time-varying or server-specific “environment” information such as the current system clock or

server’s name. Moreover, merge procedures that fail due to exceeding their limits on resource usage must fail deterministically. This means that all servers must place uniform bounds on the CPU and memory resources allocated to a merge procedure and must consistently enforce these bounds during execution. Once these conditions are met, two servers that start with identical replicas will end up with identical replicas after executing a Write.
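A minimal sketch of these restrictions, with names of our own invention: the merge procedure is a pure function of the data contents and its supplied data, and every server charges it against an identical step budget so that resource-limit failures are deterministic.

```python
MERGE_STEP_LIMIT = 10_000   # identical bound enforced at every server

def run_merge(merge_proc, data_contents, merge_data):
    """Execute a merge procedure under a uniform, deterministic budget.
    The procedure sees only the data contents and its own merge_data:
    no clock, no server name, no other environment information."""
    steps = 0
    def charge(n=1):
        nonlocal steps
        steps += n
        if steps > MERGE_STEP_LIMIT:
            # Same failure at every server that runs this merge.
            raise RuntimeError("merge failed: resource limit exceeded")
    return merge_proc(data_contents, merge_data, charge)

def pick_alternate(data_contents, merge_data, charge):
    """Example deterministic merge: choose the first alternate value
    not already present in the data contents."""
    for candidate in merge_data["alternates"]:
        charge()
        if candidate not in data_contents.values():
            return candidate
    return merge_data["default"]
```

Because the result depends only on its two inputs and the fixed budget, two servers with identical replicas compute identical merge outcomes.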

6. Write Stability and Commitment

A Write is said to be stable at a server when it has

Database Final Exam Spring 2023

When you have completed the assignment, submit your completed document with the code and answers to the assignment drop box. Do not create new permanent tables. I must be able to run your code using the three assigned tables. Only use these tables:

`classdata-324219.ppp.pppfull_geo_c`
`classdata-324219.ppp.census_income`
`bigquery-public-data.census_bureau_acs.censustract_2020_5yr`

The geo_id field connects the three tables. See the attachment to find your assigned bank and assessment areas. Answer each of the questions in the document directly under the question. Below all of the questions, include your code and indicate the question it addresses.

Part 1: Questions

Limit these results to loans that mapped to businesses with a good address in census tracts with populations >= 100.

For these questions, use the American Community Survey (ACS) values: total_pop, hispanic_pop, black_pop, white_pop. (`bigquery-public-data.census_bureau_acs.censustract_2020_5yr`)

1. For your bank in your bank’s designated counties, how many loans mapped to census tracts with populations >= 100?

2. For your bank in your bank’s designated counties, what percentage of those loans went to businesses in census tracts that were majority black?

3. For your bank in your bank’s designated counties, what percentage of those loans went to businesses in census tracts that were majority Hispanic?

4. For your bank in your bank’s designated counties, what percentage of those loans went to businesses in census tracts that were majority white?

(Note: Don’t break down the results by individual county.)

5. For all banks that made loans to businesses in your bank’s designated counties, how many loans went to census tracts with populations >= 100?

6. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts that were majority black?

7. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts that were majority Hispanic?

8. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts that were majority white?


Part 2: Questions

Using the income table: `classdata-324219.ppp.census_income` (see guide below). Limit these results to loans that mapped to businesses with a good address in tracts with populations >= 100.

9. For your bank, what percent of its loans went to businesses in census tracts in your bank’s designated counties where the median family income is > 0% and < 50% of the MSA/MD median family income (Income_Level_Ind =1)?

10. For your bank, what percent of its loans went to businesses in census tracts in your bank’s designated counties where the median family income is >= 50% and < 80% of the MSA/MD median family income?

11. For your bank, what percent of its loans went to businesses in census tracts in your bank’s designated counties where the median family income is >= 80% and < 120% of the MSA/MD median family income?

12. For your bank, what percent of its loans went to businesses in census tracts in your bank’s designated counties where the median family income is >= 120% of the MSA/MD median family income?

(Note: Don’t break down the results by individual county.)

13. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts whose median family income is > 0% and < 50% of the MSA/MD median family income?

14. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts whose median family income is >= 50% and < 80% of the MSA/MD median family income?

15. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts whose median family income is >= 80% and < 120% of the MSA/MD median family income?

16. For all banks that made loans to businesses in your bank’s designated counties, what percent of those loans (tract populations >= 100) went to businesses in census tracts whose median family income is >= 120% of the MSA/MD median family income?

 

References

Census Notes: `classdata-324219.ppp.census_income`

Field Name | Definition
Year | Year of data (2022)
MSA Code | MSA Code
State Code | State Code
County Code | County Code
Tract | Tract Code
MSA Med Family Income | Median Family Income (MFI) of the Metropolitan Statistical Area/Metropolitan Division (MSA/MD) (or statewide non-MSA/MD) in which the tract resides.
Tract Med Family Income | Tract level MFI
Income_Per | Tract level MFI divided by the MSA/MD level MFI; the result is truncated to two decimal places.
Income_Level_Ind | Income Level Indicator identifies each census tract as not available, low, moderate, middle, or upper. It is determined from the Income % using the following categories:
  0 (Not Available): tract median family income = 0
  1 (Low): tract median family income is > 0% and < 50% of the MSA/MD median family income
  2 (Moderate): tract median family income is >= 50% and < 80% of the MSA/MD median family income
  3 (Middle): tract median family income is >= 80% and < 120% of the MSA/MD median family income
  4 (Upper): tract median family income is >= 120% of the MSA/MD median family income
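The Income_Level_Ind categories above can be expressed as a small helper (the function name is ours, not part of the dataset; the input is the tract-to-MSA/MD MFI ratio as a percentage):

```python
def income_level_ind(income_pct: float) -> int:
    """Map tract MFI as a percent of MSA/MD MFI to the indicator code
    0..4 (Not Available, Low, Moderate, Middle, Upper)."""
    if income_pct <= 0:
        return 0   # Not Available: tract median family income = 0
    if income_pct < 50:
        return 1   # Low: > 0% and < 50%
    if income_pct < 80:
        return 2   # Moderate: >= 50% and < 80%
    if income_pct < 120:
        return 3   # Middle: >= 80% and < 120%
    return 4       # Upper: >= 120%
```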

ACS General Census

Area Type | GEOID Structure | Number of Digits | Example Geographic Area | Example GEOID
State | STATE | 2 | Texas | 48
County | STATE+COUNTY | 2+3=5 | Harris County, TX | 48201
County Subdivision | STATE+COUNTY+COUSUB | 2+3+5=10 | Pasadena CCD, Harris County, TX | 4820192975
Places | STATE+PLACE | 2+5=7 | Houston, TX | 4835000
Census Tract | STATE+COUNTY+TRACT | 2+3+6=11 | Census Tract 2231 in Harris County, TX | 48201223100
Block Group | STATE+COUNTY+TRACT+BLOCK GROUP | 2+3+6+1=12 | Block Group 1 in Census Tract 2231 in Harris County, TX | 482012231001
Block* | STATE+COUNTY+TRACT+BLOCK | 2+3+6+4=15 | Block 1050 in Census Tract 2231 in Harris County, TX | 482012231001050

(*Note: some blocks also contain a one-character suffix (A, B, C, etc.))
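Per the GEOID structure table above, a census-tract GEOID is built by zero-padding each component and concatenating left to right. A small illustrative helper (the function name is ours):

```python
def tract_geoid(state: int, county: int, tract: int) -> str:
    """Build an 11-digit census-tract GEOID: STATE(2) + COUNTY(3) + TRACT(6)."""
    return f"{state:02d}{county:03d}{tract:06d}"

# Census Tract 2231 in Harris County, TX: state 48, county 201, tract 223100
assert tract_geoid(48, 201, 223100) == "48201223100"
```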


For more information see the links below:

What is the Community Reinvestment Act (CRA)? https://www.federalreserve.gov/consumerscommunities/cra_about.htm
The Equal Credit Opportunity Act: https://www.justice.gov/crt/equal-credit-opportunity-act-3

 

The National Cybersecurity Strategy

The National Cybersecurity Strategy March 2023 provides a comprehensive and coordinated plan to address the growing threats to the United States’ digital ecosystem. It aims to shift the burden of cybersecurity away from individuals, small businesses, and local governments and onto the organizations that are most capable of reducing risks for everyone. The strategy seeks to build and enhance collaboration around five pillars: defending critical infrastructure, disrupting and dismantling threat actors, shaping market forces to drive security and resilience, investing in a resilient future, and forging international partnerships to pursue shared goals. Its implementation will protect investments in rebuilding America’s infrastructure, developing clean energy, and reshoring America’s technology and manufacturing base.

https://www.whitehouse.gov/wp-content/uploads/2023/03/National-Cybersecurity-Strategy-2023.pdf

For this assignment, you are required to analyze the National Cybersecurity Strategy March 2023 and write a 4-page paper, double-spaced, with a focus on one of the five pillars of the strategy.

Instructions:

General Review: Write a two-page general review of the National Cybersecurity Strategy March 2023.

Chosen Pillar Review: Select one of the five pillars from the strategy and write a two-page review of it.

Midterm Project due by Sunday, March 19

Requirements:
The assignment should be 4 pages double-spaced, excluding the title page and references page.
Use APA citation style to cite sources in the text and on the references page.

ParkingFinder.com

ParkingFinder.com is a type of e-business exchange that does business entirely on the Internet. The system connects parking space landlords and space tenants all over the world in an online market and manages the entire booking process and payments.

For a person to offer parking spaces (landlord), he/she must register with ParkingFinder.com. The person must provide a current physical address and telephone number as well as a current e-mail address. The system will then maintain an open account for this person. Access to the system as a landlord is through a secure, authenticated portal.

A landlord can list parking spaces on the system through a special Internet form. Information required includes all of the pertinent information about the parking space, the asking price, and documents showing that the landlord has the right to rent the space. A landlord may list as many spaces as desired. Parking spaces listed by a landlord must be verified as owned by that landlord before they become visible to the public. The system maintains an index of all parking spaces in the system so that tenants can use the search engine to search for parking spaces. The search engine allows searches of parking spaces by location. At the end of each month, a check is mailed to each landlord for the spaces that have been rented.

People wanting to rent parking spaces (potential tenants) come to the site and search for the space they want. When they decide to book, they must open an account with a credit card to pay for the parking space. They need to select the space they want to book, the time slot for the rental period, and the vehicle license plate number. The system maintains all of this information on secure servers. Tenants can, if they want, enter their reviews and ratings of the parking space. The same functionality can be accessed through a mobile app, which can be downloaded and installed from the Apple App Store and Google Play Store.

Week 3:

6. Develop a sequence diagram for the use case Book a Parking Space (10 points)

7. Develop a design class diagram (see Figure 10-19 from the textbook for example) that includes classes, attributes, methods, and navigation visibility for the use case Book a Parking Space (10 points)

Submit the sequence diagram and the design class diagram together with the SSD, Domain Model Class diagram that you use to develop the design models.

Live Migration of Virtual Machines

Live Migration of Virtual Machines

Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen†, Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield

University of Cambridge Computer Laboratory † Department of Computer Science 15 JJ Thomson Avenue, Cambridge, UK University of Copenhagen, Denmark

firstname.lastname@cl.cam.ac.uk {jacobg,eric}@diku.dk

Abstract Migrating operating system instances across distinct phys- ical hosts is a useful tool for administrators of data centers and clusters: It allows a clean separation between hard- ware and software, and facilitates fault management, load balancing, and low-level system maintenance.

By carrying out the majority of migration while OSes continue to run, we achieve impressive performance with minimal service downtimes; we demonstrate the migration of entire OS instances on a commodity cluster, recording service downtimes as low as 60ms. We show that our performance is sufficient to make live migration a practical tool even for servers running interactive loads.

In this paper we consider the design options for migrat- ing OSes running services with liveness constraints, fo- cusing on data center and cluster environments. We intro- duce and analyze the concept of writable working set, and present the design, implementation and evaluation of high- performance OS migration built on top of the Xen VMM.

1 Introduction

Operating system virtualization has attracted considerable interest in recent years, particularly from the data center and cluster computing communities. It has previously been shown [1] that paravirtualization allows many OS instances to run concurrently on a single physical machine with high performance, providing better use of physical resources and isolating individual OS instances.

In this paper we explore a further benefit allowed by virtualization: that of live OS migration. Migrating an entire OS and all of its applications as one unit allows us to avoid many of the difficulties faced by process-level migration approaches. In particular, the narrow interface between a virtualized OS and the virtual machine monitor (VMM) makes it easy to avoid the problem of ‘residual dependencies’ [2] in which the original host machine must remain available and network-accessible in order to service

certain system calls or even memory accesses on behalf of migrated processes. With virtual machine migration, on the other hand, the original host may be decommissioned once migration has completed. This is particularly valuable when migration is occurring in order to allow maintenance of the original host.

Secondly, migrating at the level of an entire virtual ma- chine means that in-memory state can be transferred in a consistent and (as will be shown) efficient fashion. This ap- plies to kernel-internal state (e.g. the TCP control block for a currently active connection) as well as application-level state, even when this is shared between multiple cooperat- ing processes. In practical terms, for example, this means that we can migrate an on-line game server or streaming media server without requiring clients to reconnect: some- thing not possible with approaches which use application- level restart and layer 7 redirection.

Thirdly, live migration of virtual machines allows a sepa- ration of concerns between the users and operator of a data center or cluster. Users have ‘carte blanche’ regarding the software and services they run within their virtual machine, and need not provide the operator with any OS-level access at all (e.g. a root login to quiesce processes or I/O prior to migration). Similarly the operator need not be concerned with the details of what is occurring within the virtual ma- chine; instead they can simply migrate the entire operating system and its attendant processes as a single unit.

Overall, live OS migration is an extremely powerful tool for cluster administrators, allowing separation of hardware and software considerations, and consolidating clustered hardware into a single coherent management domain. If a physical machine needs to be removed from service, an administrator may migrate OS instances, including the applications that they are running, to alternative machine(s), freeing the original machine for maintenance. Similarly, OS instances may be rearranged across machines in a cluster to relieve load on congested hosts. In these situations the combination of virtualization and migration significantly improves manageability.

NSDI ’05: 2nd Symposium on Networked Systems Design & Implementation, USENIX Association


We have implemented high-performance migration sup- port for Xen [1], a freely available open source VMM for commodity hardware. Our design and implementation ad- dresses the issues and tradeoffs involved in live local-area migration. Firstly, as we are targeting the migration of ac- tive OSes hosting live services, it is critically important to minimize the downtime during which services are entirely unavailable. Secondly, we must consider the total migra- tion time, during which state on both machines is synchro- nized and which hence may affect reliability. Furthermore we must ensure that migration does not unnecessarily dis- rupt active services through resource contention (e.g., CPU, network bandwidth) with the migrating OS.

Our implementation addresses all of these concerns, allow- ing for example an OS running the SPECweb benchmark to migrate across two physical hosts with only 210ms un- availability, or an OS running a Quake 3 server to migrate with just 60ms downtime. Unlike application-level restart, we can maintain network connections and application state during this process, hence providing effectively seamless migration from a user’s point of view.

We achieve this by using a pre-copy approach in which pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page- level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services. The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes exe- cution there. We eschew a ‘pull’ approach which faults in missing pages across the network since this adds a residual dependency of arbitrarily long duration, as well as provid- ing in general rather poor performance.

Our current implementation does not address migration across the wide area, nor does it include support for migrat- ing local block devices, since neither of these are required for our target problem space. However we discuss ways in which such support can be provided in Section 7.

2 Related Work

The Collective project [3] has previously explored VM migration as a tool to provide mobility to users who work on different physical hosts at different times, citing as an example the transfer of an OS instance to a home computer while a user drives home from work. Their work aims to optimize for slow (e.g., ADSL) links and longer time spans, and so stops OS execution for the duration of the transfer, with a set of enhancements to reduce the transmitted image size. In contrast, our efforts are concerned with the migration of live, in-service OS instances on fast networks with only tens of milliseconds of downtime. Other projects that

have explored migration over longer time spans by stop- ping and then transferring include Internet Suspend/Re- sume [4] and µDenali [5].

Zap [6] uses partial OS virtualization to allow the migration of process domains (pods), essentially process groups, us- ing a modified Linux kernel. Their approach is to isolate all process-to-kernel interfaces, such as file handles and sock- ets, into a contained namespace that can be migrated. Their approach is considerably faster than results in the Collec- tive work, largely due to the smaller units of migration. However, migration in their system is still on the order of seconds at best, and does not allow live migration; pods are entirely suspended, copied, and then resumed. Further- more, they do not address the problem of maintaining open connections for existing services.

The live migration system presented here has considerable shared heritage with the previous work on NomadBIOS [7], a virtualization and migration system built on top of the L4 microkernel [8]. NomadBIOS uses pre-copy migration to achieve very short best-case migration downtimes, but makes no attempt at adapting to the writable working set behavior of the migrating OS.

VMware has recently added OS migration support, dubbed VMotion, to their VirtualCenter management software. As this is commercial software and strictly disallows the publi- cation of third-party benchmarks, we are only able to infer its behavior through VMware’s own publications. These limitations make a thorough technical comparison impos- sible. However, based on the VirtualCenter User’s Man- ual [9], we believe their approach is generally similar to ours and would expect it to perform to a similar standard.

Process migration, a hot topic in systems research during the 1980s [10, 11, 12, 13, 14], has seen very little use for real-world applications. Milojicic et al [2] give a thorough survey of possible reasons for this, including the problem of the residual dependencies that a migrated process re- tains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. These are un- desirable because the original machine must remain avail- able, and because they usually negatively impact the per- formance of migrated processes.

For example Sprite [15] processes executing on foreign nodes require some system calls to be forwarded to the home node for execution, leading to at best reduced perfor- mance and at worst widespread failure if the home node is unavailable. Although various efforts were made to ame- liorate performance issues, the underlying reliance on the availability of the home node could not be avoided. A sim- ilar fragility occurs with MOSIX [14] where a deputy pro- cess on the home node must remain available to support remote execution.



We believe the residual dependency problem cannot easily be solved in any process migration scheme – even modern mobile run-times such as Java and .NET suffer from prob- lems when network partition or machine crash causes class loaders to fail. The migration of entire operating systems inherently involves fewer or zero such dependencies, mak- ing it more resilient and robust.

3 Design

At a high level we can consider a virtual machine to encap- sulate access to a set of physical resources. Providing live migration of these VMs in a clustered server environment leads us to focus on the physical resources used in such environments: specifically on memory, network and disk.

This section summarizes the design decisions that we have made in our approach to live VM migration. We start by describing how memory and then device access is moved across a set of physical hosts and then go on to a high-level description of how a migration progresses.

3.1 Migrating Memory

Moving the contents of a VM’s memory from one phys- ical host to another can be approached in any number of ways. However, when a VM is running a live service it is important that this transfer occurs in a manner that bal- ances the requirements of minimizing both downtime and total migration time. The former is the period during which the service is unavailable due to there being no currently executing instance of the VM; this period will be directly visible to clients of the VM as service interruption. The latter is the duration between when migration is initiated and when the original VM may be finally discarded and, hence, the source host may potentially be taken down for maintenance, upgrade or repair.

It is easiest to consider the trade-offs between these require- ments by generalizing memory transfer into three phases:

Push phase The source VM continues running while cer- tain pages are pushed across the network to the new destination. To ensure consistency, pages modified during this process must be re-sent.

Stop-and-copy phase The source VM is stopped, pages are copied across to the destination VM, then the new VM is started.

Pull phase The new VM executes and, if it accesses a page that has not yet been copied, this page is faulted in (“pulled”) across the network from the source VM.

Although one can imagine a scheme incorporating all three phases, most practical solutions select one or two of the

three. For example, pure stop-and-copy [3, 4, 5] involves halting the original VM, copying all pages to the destina- tion, and then starting the new VM. This has advantages in terms of simplicity but means that both downtime and total migration time are proportional to the amount of physical memory allocated to the VM. This can lead to an unaccept- able outage if the VM is running a live service.

Another option is pure demand-migration [16] in which a short stop-and-copy phase transfers essential kernel data structures to the destination. The destination VM is then started, and other pages are transferred across the network on first use. This results in a much shorter downtime, but produces a much longer total migration time; and in prac- tice, performance after migration is likely to be unaccept- ably degraded until a considerable set of pages have been faulted across. Until this time the VM will fault on a high proportion of its memory accesses, each of which initiates a synchronous transfer across the network.

The approach taken in this paper, pre-copy [11] migration, balances these concerns by combining a bounded itera- tive push phase with a typically very short stop-and-copy phase. By ‘iterative’ we mean that pre-copying occurs in rounds, in which the pages to be transferred during round n are those that are modified during round n− 1 (all pages are transferred in the first round). Every VM will have some (hopefully small) set of pages that it updates very frequently and which are therefore poor candidates for pre- copy migration. Hence we bound the number of rounds of pre-copying, based on our analysis of the writable working set (WWS) behavior of typical server workloads, which we present in Section 4.
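The bounded iterative pre-copy loop can be sketched as follows. All names and the round bound are illustrative, not Xen’s actual implementation; `dirtied_in_round` stands in for the shadow-paging dirty tracking described later.

```python
def precopy_migrate(pages, dirtied_in_round, max_rounds=5):
    """Return the per-round transfer sets plus the final stop-and-copy set.

    pages: the set of all page ids in the VM's memory image.
    dirtied_in_round(n): pages the VM dirties while round n is in flight.
    """
    transferred_rounds = []
    to_send = set(pages)              # round 1 transfers every page
    for n in range(1, max_rounds + 1):
        transferred_rounds.append(to_send)
        # Pages modified during round n must be re-sent in round n+1.
        to_send = dirtied_in_round(n)
        if not to_send:
            break
    # Whatever remains dirty (the writable working set) is transferred
    # during the short stop-and-copy phase while the VM is paused.
    return transferred_rounds, to_send
```

Bounding the number of rounds is what keeps frequently updated pages, which would otherwise be re-sent forever, from preventing the migration from terminating.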

Finally, a crucial additional concern for live migration is the impact on active services. For instance, iteratively scanning and sending a VM’s memory image between two hosts in a cluster could easily consume the entire bandwidth avail- able between them and hence starve the active services of resources. This service degradation will occur to some ex- tent during any live migration scheme. We address this is- sue by carefully controlling the network and CPU resources used by the migration process, thereby ensuring that it does not interfere excessively with active traffic or processing.

3.2 Local Resources

A key challenge in managing the migration of OS instances is what to do about resources that are associated with the physical machine that they are migrating away from. While memory can be copied directly to the new host, connec- tions to local devices such as disks and network interfaces demand additional consideration. The two key problems that we have encountered in this space concern what to do with network resources and local storage.



For network resources, we want a migrated OS to maintain all open network connections without relying on forward- ing mechanisms on the original host (which may be shut down following migration), or on support from mobility or redirection mechanisms that are not already present (as in [6]). A migrating VM will include all protocol state (e.g. TCP PCBs), and will carry its IP address with it.

To address these requirements we observed that in a clus- ter environment, the network interfaces of the source and destination machines typically exist on a single switched LAN. Our solution for managing migration with respect to network in this environment is to generate an unsolicited ARP reply from the migrated host, advertising that the IP has moved to a new location. This will reconfigure peers to send packets to the new physical address, and while a very small number of in-flight packets may be lost, the mi- grated domain will be able to continue using open connec- tions with almost no observable interference.

Some routers are configured not to accept broadcast ARP replies (in order to prevent IP spoofing), so an unsolicited ARP may not work in all scenarios. If the operating system is aware of the migration, it can opt to send directed replies only to interfaces listed in its own ARP cache, to remove the need for a broadcast. Alternatively, on a switched network, the migrating OS can keep its original Ethernet MAC address, relying on the network switch to detect its move to a new port.¹

In the cluster, the migration of storage may be similarly ad- dressed: Most modern data centers consolidate their stor- age requirements using a network-attached storage (NAS) device, in preference to using local disks in individual servers. NAS has many advantages in this environment, in- cluding simple centralised administration, widespread ven- dor support, and reliance on fewer spindles leading to a reduced failure rate. A further advantage for migration is that it obviates the need to migrate disk storage, as the NAS is uniformly accessible from all host machines in the clus- ter. We do not address the problem of migrating local-disk storage in this paper, although we suggest some possible strategies as part of our discussion of future work.

3.3 Design Overview

The logical steps that we execute when migrating an OS are summarized in Figure 1. We take a conservative approach to the management of migration with regard to safety and failure handling. Although the consequences of hardware failures can be severe, our basic principle is that safe mi- gration should at no time leave a virtual OS more exposed

¹Note that on most Ethernet controllers, hardware MAC filtering will have to be disabled if multiple addresses are in use (though some cards support filtering of multiple addresses in hardware), and so this technique is only practical for switched networks.

[Figure content: migration timeline showing Stage 0 (Pre-Migration: active VM on Host A; alternate physical host may be preselected for migration; block devices mirrored and free resources maintained), Stage 1 (Reservation: initialize a container on the target host), Stage 2 (Iterative Pre-copy: enable shadow paging; copy dirty pages in successive rounds), Stage 3 (Stop and copy: suspend VM on Host A; generate ARP to redirect traffic to Host B; synchronize all remaining VM state to Host B), Stage 4 (Commitment: VM state on Host A is released) and Stage 5 (Activation: VM starts on Host B; connects to local devices; resumes normal operation), with the downtime window (VM out of service) between the periods of normal operation on Host A and on Host B, and the overhead due to copying.]

Figure 1: Migration timeline

to system failure than when it is running on the original single host. To achieve this, we view the migration process as a transactional interaction between the two hosts involved:

Stage 0: Pre-Migration We begin with an active VM on physical host A. To speed any future migration, a target host may be preselected where the resources required to receive migration will be guaranteed.

Stage 1: Reservation A request is issued to migrate an OS from host A to host B. We initially confirm that the necessary resources are available on B and reserve a VM container of that size. Failure to secure resources here means that the VM simply continues to run on A unaffected.

Stage 2: Iterative Pre-Copy During the first iteration, all pages are transferred from A to B. Subsequent iterations copy only those pages dirtied during the previous transfer phase.

Stage 3: Stop-and-Copy We suspend the running OS instance at A and redirect its network traffic to B. As described earlier, CPU state and any remaining inconsistent memory pages are then transferred. At the end of this stage there is a consistent suspended copy of the VM at both A and B. The copy at A is still considered to be primary and is resumed in case of failure.

Stage 4: Commitment Host B indicates to A that it has successfully received a consistent OS image. Host A acknowledges this message as commitment of the migration transaction: host A may now discard the original VM, and host B becomes the primary host.

Stage 5: Activation The migrated VM on B is now activated. Post-migration code runs to reattach device drivers to the new machine and advertise moved IP addresses.

NSDI ’05: 2nd Symposium on Networked Systems Design & Implementation (USENIX Association), p. 276

[Figure: "Tracking the Writable Working Set of SPEC CINT2000". Number of pages (0-80000) versus elapsed time (0-12000 secs), with phases for the benchmarks gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2 and twolf.]

Figure 2: WWS curve for a complete run of SPEC CINT2000 (512MB VM)

This approach to failure management ensures that at least one host has a consistent VM image at all times during migration. It depends on the assumption that the original host remains stable until the migration commits, and that the VM may be suspended and resumed on that host with no risk of failure. Based on these assumptions, a migration request essentially attempts to move the VM to a new host, and on any sort of failure execution is resumed locally, aborting the migration.
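The transactional structure described above can be sketched as follows. This is an illustrative model, not the actual implementation; the Host class and the single-shot `receive` call stand in for the staged reservation, pre-copy and stop-and-copy of Section 3.3:

```python
class HostDown(Exception):
    pass

class Host:
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive
        self.image = None          # consistent VM image held by this host

    def receive(self, image):
        if not self.alive:
            raise HostDown(self.name)
        self.image = image

def migrate(vm_image, src, dst):
    """Transactional sketch: the source keeps its (suspended) copy as
    primary until the destination holds a consistent image; any failure
    aborts the migration and execution resumes on the source."""
    src.image = vm_image
    try:
        dst.receive(vm_image)      # Stages 1-3: reserve, pre-copy, stop-and-copy
        src.image = None           # Stage 4: commit; source discards original
        return dst                 # Stage 5: activation on the destination
    except HostDown:
        return src                 # abort: resume locally on the source
```

At every point in the `try` block at least one of the two hosts holds a consistent image, which is the invariant the paper's failure model relies on.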

4 Writable Working Sets

When migrating a live operating system, the most significant influence on service performance is the overhead of coherently transferring the virtual machine’s memory image. As mentioned previously, a simple stop-and-copy approach will achieve this in time proportional to the amount of memory allocated to the VM. Unfortunately, during this time any running services are completely unavailable.

A more attractive alternative is pre-copy migration, in which the memory image is transferred while the operating system (and hence all hosted services) continue to run. The drawback, however, is the wasted overhead of transferring memory pages that are subsequently modified, and hence must be transferred again. For many workloads there will be a small set of memory pages that are updated very frequently, and which it is not worth attempting to maintain coherently on the destination machine before stopping and copying the remainder of the VM.

The fundamental question for iterative pre-copy migration is: how does one determine when it is time to stop the pre-copy phase because too much time and resource is being wasted? Clearly if the VM being migrated never modifies memory, a single pre-copy of each memory page will suffice to transfer a consistent image to the destination. However, should the VM continuously dirty pages faster than the rate of copying, then all pre-copy work will be in vain and one should immediately stop and copy.

In practice, one would expect most workloads to lie somewhere between these extremes: a certain (possibly large) set of pages will seldom or never be modified and hence are good candidates for pre-copy, while the remainder will be written often and so should best be transferred via stop-and-copy – we dub this latter set of pages the writable working set (WWS) of the operating system by obvious extension of the original working set concept [17].
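Under simplifying assumptions (a constant dirty rate d and copy bandwidth b, both in pages per second) the convergence behaviour can be estimated: each round need only resend the pages dirtied during the previous round, so the resend set shrinks geometrically when d < b, and the expected downtime after stopping is the remaining set divided by b. A rough sketch, with a hypothetical stopping threshold of one second's worth of dirtying:

```python
def precopy_rounds(mem_pages, dirty_rate, bandwidth, max_rounds=10):
    """Estimate pre-copy behaviour; returns (rounds, expected_downtime_sec).

    mem_pages:  total pages in the VM image
    dirty_rate: pages dirtied per second while the guest runs (d)
    bandwidth:  pages copied per second (b)
    """
    if dirty_rate >= bandwidth:
        # Dirtying outpaces copying: pre-copy work is in vain, so stop
        # and copy immediately; downtime is the whole image over the link.
        return 0, mem_pages / bandwidth
    to_send = mem_pages
    rounds = 0
    # Stop once the resend set is within one second's worth of dirtying
    # (a stand-in for a real WWS-based threshold).
    while rounds < max_rounds and to_send > dirty_rate:
        duration = to_send / bandwidth      # time spent on this round
        to_send = dirty_rate * duration     # pages dirtied during the round
        rounds += 1
    return rounds, to_send / bandwidth      # last-round stop-and-copy time
```

For a 65536-page VM (256 MB at 4 KB/page) with b = 16384 and d = 1024 pages/sec, this predicts two pre-copy rounds and roughly 16 ms of downtime; with d >= b the estimate degenerates to the full stop-and-copy time.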

In this section we analyze the WWS of operating systems running a range of different workloads in an attempt to obtain some insight to allow us to build heuristics for an efficient and controllable pre-copy implementation.

4.1 Measuring Writable Working Sets

To trace the writable working set behaviour of a number of representative workloads we used Xen’s shadow page tables (see Section 5) to track dirtying statistics on all pages used by a particular executing operating system. This allows us to determine within any time period the set of pages written to by the virtual machine.
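The measurement amounts to periodically reading and clearing the dirty bits. A minimal sketch of the same bookkeeping applied to an offline write trace (the trace format here is hypothetical, not Xen's):

```python
def track_wws(write_trace, epoch):
    """Given an iterable of (time, page) write events, return the writable
    working set observed in each epoch of `epoch` seconds: the analogue of
    periodically snapshotting and clearing shadow-page-table dirty bits."""
    epochs = {}
    for t, page in write_trace:
        epochs.setdefault(int(t // epoch), set()).add(page)
    # Return one set per epoch, including empty epochs with no writes.
    return [epochs.get(i, set()) for i in range(max(epochs, default=-1) + 1)]
```

Plotting the size of each returned set against time yields a curve of the kind shown in Figure 2.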

Using the above, we conducted a set of experiments to sample


[Figure: "Effect of Bandwidth and Pre−Copy Iterations on Migration Downtime (Based on a page trace of Linux Kernel Compile)". Three panels for migration throughputs of 128, 256 and 512 Mbit/sec, each plotting rate of page dirtying (0-9000 pages/sec) and expected downtime (0-4 sec) against elapsed time (0-600 sec).]

Figure 3: Expected downtime due to last-round memory copy on traced page dirtying of a Linux kernel compile.

[Figure: "Effect of Bandwidth and Pre−Copy Iterations on Migration Downtime (Based on a page trace of OLTP Database Benchmark)". Three panels for migration throughputs of 128, 256 and 512 Mbit/sec, each plotting rate of page dirtying (0-8000 pages/sec) and expected downtime (0-4 sec) against elapsed time (0-1200 sec).]

Figure 4: Expected downtime due to last-round memory copy on traced page dirtying of OLTP.

[Figure: "Effect of Bandwidth and Pre−Copy Iterations on Migration Downtime (Based on a page trace of Quake 3 Server)". The 128 Mbit/sec panel plots rate of page dirtying (0-600 pages/sec) against elapsed time (0-500 sec).]

Algorithms in Social Media

 

part 1:

Question 1:

Select any ONE of the topics you would like to write about from the list below.

a. Algorithms in Social Media
b. Non-Fungible Tokens
c. Biases in Artificial Intelligence
d. 3D Printing

Question 2:

Write in point form – what you would like to cover for the topic you have chosen.
Provide a balanced view.

Question 3:

Review what you have written in question 1, and narrow down your title. This could include a statement of a problem or opportunity within the topic.

Question 4:

Research three journal articles in the Charles Sturt University Library Database related to your topic. The three journal articles must be no more than three years old.
Enter the references of these three journal articles here, in APA 7th edition format. This is the standard format for most Charles Sturt University subjects. You will use the contents of these journals to write your structured report.

Question 5:

Research three (or more) other articles (magazines, newspapers, reports, etc.) related to your topic. You may find them in the Charles Sturt University Database or Google. These articles may be older than three years. Enter the references of these three other articles here, in APA 7th edition format. You will use the contents of these articles to write your structured report.

Question 6:

After completing the first five questions, comment briefly on what you have discovered (through the research process) about the topic you have selected. This reflection should be no more than 200 words.

part 2:

 

Task

Write a report based on the preparation done in Assessment 2. Use the guidelines below to structure your report.

Title
The title must be narrow and focused.
You might want to review this title after you have finished the entire report.
The title should give the reader a good idea of what to expect in your report.
Do not format it as a question.

Introduction
About 170-200 words.
Clear introduction of the topic through background, statement of report’s purpose/argument, and a preview of main points and recommendations.
Correct use of third-person writing.
No references or quotes.

Body of Report

About 700-800 words.
Statements made must be backed up by literature.
Paraphrase or quote correctly to style.
Do not write in dot (or numbered) points.
Subheadings are optional.
Use third-person writing.
Arguments ‘for’ and ‘against’ are to be included to give a balanced outlook of the topic.
Use of at least 3 journal papers plus other sources.

Recommendations
About 350-400 words.
At least three evidence-based recommendations are informed by literature.
Ensure recommendations are logically supported in the body.
Clear and concise.
Do not introduce new topics/issues.
Do not write in dot (or numbered) points.
Correct use of third-person writing.

Conclusion
About 170-200 words.
Concluding the entire report only.
No new material (or thoughts) is to be introduced here.
No quotes or references.
Correct use of third-person writing.

Reference List
Conform to APA 7th edition style.
Check for punctuation, capitals, etc. in your reference list.
Due to the Interact2 Journal interface, the lecturer understands the difficulty in trying to use indentation in the Reference List. Not inserting indentation will not cause you to lose marks.


As part of the graduate component to Special Topics in Data Science, you will be asked to submit a term paper of at least 10 pages that draws on at least 5 scholarly research articles, government white papers, or reputable industry reports. This term paper will ask you to survey the ways in which Big Data has been utilized across a range of business activities, including the following aspects: product development, predictive maintenance, customer experience, fraud and compliance, machine learning, operational efficiency, and driving innovation. Further details defining the scope of each of these business activities are included in the uploaded image for clarification.


Purpose:  

This project will expose you to inference using Bayesian networks. Bayesian networks capture causal relationships and are widely used in fault diagnosis across a wide variety of applications. A Bayesian network can be represented by a directed graph which models causal relationships between variables. A useful tool to represent and traverse a graph is NetworkX (NetworkX — NetworkX documentation), which contains a comprehensive library of graph types and graph algorithms written in Python. The application that we will be targeting is car fault diagnosis, which was introduced in class. The fundamental issue in such diagnosis applications is to discover the causes or underlying reasons for the fault and to rank these reasons in terms of their importance.

In this application we will be exploring the reasons behind: a) the car not starting and the probability that this event takes place; and b) under what conditions the car battery becomes flat and the likelihood of this occurring.

Project Requirements:

R1

Use Networkx in Google Colab and represent the network as shown below:

Attach probability tables to each node as specified in the Project 3 discussion document. Visualize the network using NetworkX, showing the nodes and edges. You do not have to display the probability tables you created, but these will of course be embedded in your code. The coloring used in the figure above does not need to be reproduced; instead, shade the nodes a neutral blue. Make sure that your edges show directionality.
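A minimal sketch of R1 follows. The node names below are assumptions drawn from the classic car-diagnosis example and the variable abbreviations in R2-R5; the authoritative structure and probability tables are in the Project 3 discussion document:

```python
# Illustrative R1 sketch: build and draw the (assumed) car-fault DAG.
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; omit this line in Colab
import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("alternator broken", "not charging"),
    ("fanbelt broken", "not charging"),
    ("not charging", "battery flat"),
    ("battery dead", "battery flat"),
    ("battery flat", "car starts"),
])

# Neutral blue nodes; arrows=True makes edge directionality visible.
nx.draw_networkx(G, node_color="lightblue", arrows=True, with_labels=True)
plt.axis("off")
plt.savefig("car_fault_network.png")
```

Conditional probability tables would be attached as node attributes (for example via `G.nodes[n]["cpt"] = ...`) once taken from the discussion document.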

R2

Compute the probability P(-cs, +ab, +fb)

R3

Compute the probability P(-cs, +ab)

R4

Compute the probability P(-cs, +fb)

R5

For the battery going flat, which of the factors is more important, battery dead or not charging?

Note:

  1. Use the starter code provided for this project. This is essential as many of you will not be familiar with NetworkX.
  2. I strongly recommend that you read the Project 3 Discussion Document as it covers not just the probability table creation but also what formulae need to be used to compute the answers to requirements R2 to R5.
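To illustrate what R2-R4 involve, the sketch below computes a joint probability by enumeration: multiply the conditional probability tables along the network and sum out the unobserved variables. The CPT numbers, and the reading of ab/fb as alternator/fanbelt, are placeholders, not the project's actual tables:

```python
from itertools import product

# Hypothetical CPTs; the real numbers come from the Project 3 document.
P_ab = {True: 0.1, False: 0.9}            # alternator broken (ab)
P_fb = {True: 0.2, False: 0.8}            # fanbelt broken (fb)
P_bd = {True: 0.05, False: 0.95}          # battery dead
def P_nc(nc, ab, fb):                     # not charging | ab, fb
    p = 0.95 if (ab or fb) else 0.01
    return p if nc else 1 - p
def P_bf(bf, nc, bd):                     # battery flat | nc, bd
    p = 0.99 if (nc and bd) else 0.9 if nc else 0.8 if bd else 0.05
    return p if bf else 1 - p
def P_cs(cs, bf):                         # car starts | battery flat
    p = 0.01 if bf else 0.97
    return p if cs else 1 - p

def joint(cs, ab, fb, nc, bd, bf):
    """Full joint: product of all CPTs along the network."""
    return (P_ab[ab] * P_fb[fb] * P_bd[bd] *
            P_nc(nc, ab, fb) * P_bf(bf, nc, bd) * P_cs(cs, bf))

def query(cs, ab, fb):
    """P(cs, ab, fb) by summing out the hidden variables nc, bd, bf."""
    return sum(joint(cs, ab, fb, nc, bd, bf)
               for nc, bd, bf in product([True, False], repeat=3))

p = query(cs=False, ab=True, fb=True)     # P(-cs, +ab, +fb)
```

R3 and R4 are the same computation with one more variable summed out, and R5 compares conditional probabilities such as P(battery flat | battery dead) against P(battery flat | not charging).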

Managing Update Conflicts in Bayou,

a Weakly Connected Replicated Storage System

Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers,

Mike J. Spreitzer and Carl H. Hauser

Computer Science Laboratory

Xerox Palo Alto Research Center

Palo Alto, California 94304 U.S.A.

Abstract

Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment that includes portable machines with less than ideal network connectivity. To maximize availability, users can read and write any accessible replica. Bayou’s design has focused on supporting application-specific mechanisms to detect and resolve the update conflicts that naturally arise in such a system, ensuring that replicas move towards eventual consistency, and defining a protocol by which the resolution of update conflicts stabilizes. It includes novel methods for conflict detection, called dependency checks, and per-write conflict resolution based on client-provided merge procedures. To guarantee eventual consistency, Bayou servers must be able to roll back the effects of previously executed writes and redo them according to a global serialization order. Furthermore, Bayou permits clients to observe the results of all writes received by a server, including tentative writes whose conflicts have not been ultimately resolved. This paper presents the motivation for and design of these mechanisms and describes the experiences gained with an initial implementation of the system.

1. Introduction

The Bayou storage system provides an infrastructure for collaborative applications that manages the conflicts introduced by concurrent activity while relying only on the weak connectivity available for mobile computing. The advent of mobile computers, in the form of laptops and personal digital assistants (PDAs), enables the use of computational facilities away from the usual work setting of users. However, mobile computers do not enjoy the connectivity afforded by local area networks or the telephone system. Even wireless media, such as cellular telephony, will not permit continuous connectivity until per-minute costs decline enough to justify lengthy connections. Thus, the Bayou design requires only occasional, pair-wise communication between computers. This model takes into consideration characteristics of mobile computing such as expensive connection time, frequent or occasional disconnections, and that collaborating computers may never be all connected simultaneously [1, 13, 16].

The Bayou architecture does not include the notion of a “disconnected” mode of operation because, in fact, various degrees of

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

SIGOPS ’95 12/95 CO, USA © 1995 ACM 0-89791-715-4/95/0012…$3.50

“connectedness” are possible. Groups of computers may be partitioned away from the rest of the system yet remain connected to each other. Supporting disconnected workgroups is a central goal of the Bayou system. By relying only on pair-wise communication in the normal mode of operation, the Bayou design copes with arbitrary network connectivity.

A weak connectivity networking model can be accommodated only with weakly consistent, replicated data. Replication is required since a single storage site may not be reachable from mobile clients or within disconnected workgroups. Weak consistency is desired since any replication scheme providing one-copy serializability [6], such as requiring clients to access a quorum of replicas or to acquire exclusive locks on data that they wish to update, yields unacceptably low write availability in partitioned networks [5]. For these reasons, Bayou adopts a model in which clients can read and write to any replica without the need for explicit coordination with other replicas. Every computer eventually receives updates from every other, either directly or indirectly, through a chain of pair-wise interactions.

Unlike many previous systems [12, 27], our goal in designing the Bayou system was not to provide transparent replicated data support for existing file system and database applications. We believe that applications must be aware that they may read weakly consistent data and also that their write operations may conflict with those of other users and applications. Moreover, applications must be involved in the detection and resolution of conflicts since these naturally depend on the semantics of the application.

To this end, Bayou provides system support for application-specific conflict detection and resolution. Previous systems, such as Locus [30] and Coda [17], have proven the value of semantic conflict detection and resolution for file directories, and several systems are exploring conflict resolution for file and database contents [8, 18, 26]. Bayou’s mechanisms extend this work by letting applications exploit domain-specific knowledge to achieve automatic conflict resolution at the granularity of individual update operations without compromising security or eventual consistency.

Automatic conflict resolution is highly desirable because it enables a Bayou replica to remain available. In a replicated system with the weak connectivity model adopted by Bayou, conflicts may be detected arbitrarily far from the users who introduced the conflicts. Moreover, conflicts may be detected when no user is present. Bayou does not take the approach of systems that mark conflicting data as unavailable until a person resolves the conflict. Instead, clients can read data at all times, including data whose conflicts have not been fully resolved either because human intervention is needed or because other conflicting updates may be propagating through the system. Bayou provides interfaces that make the state of a replica’s data apparent to the application.

The contributions presented in this paper are as follows: we

introduce per-update dependency checks and merge procedures as a general mechanism for application-specific conflict detection and resolution; we define two states of an update, committed and tentative, which relate to whether or not the conflicts potentially introduced by the update have been ultimately resolved; we present mechanisms for managing these two states of an update both from the perspective of the clients and the storage management requirements of the replicas; we describe how replicas move towards eventual consistency; and, finally, we discuss how security is provided in a system like Bayou.

2. Bayou Applications

The Bayou replicated storage system was designed to support a variety of non-real-time collaborative applications, such as shared calendars, mail and bibliographic databases, program development, and document editing for disconnected workgroups, as well as applications that might be used by individuals at different hosts at different times. To serve as a backdrop for the discussion in following sections, this section presents a quick overview of two applications that have been implemented thus far, a meeting room scheduler and a bibliographic database.

2.1 Meeting room scheduler

Our meeting room scheduling application enables users to reserve meeting rooms. At most one person (or group) can reserve the room for any given period of time. This meeting room scheduling program is intended for use after a group of people have already decided that they want to meet in a certain room and have determined a set of acceptable times for the meeting. It does not help them to determine a mutually agreeable place and time for the meeting; it only allows them to reserve the room. Thus, it is a much simpler application than one of general meeting scheduling.

Users interact with a graphical interface for the schedule of a room that indicates which times are already reserved, much like the display of a typical calendar manager. The meeting room scheduling program periodically re-reads the room schedule and refreshes the user’s display. This refresh process enables the user to observe new entries added by other users. The user’s display might be out-of-date with respect to the confirmed reservations of the room, for example when it is showing a local copy of the room schedule on a disconnected laptop.

Users reserve a time slot simply by selecting a free time period and filling in a form describing the meeting that is being scheduled. Because the user’s display might be out-of-date, there is a chance that the user could try to schedule a meeting at a time that was already reserved by someone else. To account for this possibility, users can select several acceptable meeting times rather than just one. At most one of the requested times will eventually be reserved.

A user’s reservation, rather than being immediately confirmed (or rejected), may remain “tentative” for a while. While tentative, a meeting may be rescheduled as other interfering reservations become known. Tentative reservations are indicated as such on the display (by showing them grayed). The “outdatedness” of a calendar does not prevent it from being useful, but simply increases the likelihood that tentative room reservations will be rescheduled and finally “committed” to less preferred meeting times.

A group of users, although disconnected from the rest of the system, can immediately see each other’s tentative room reservations if they are all connected to the same copy of the meeting room schedule. If, instead, users are maintaining private copies on their laptop computers, local communication between the machines will eventually synchronize all copies within the group.
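The scheduling behaviour described above reduces to a simple rule: tentatively take the first of the user's acceptable times that does not overlap an existing reservation for the room. A minimal sketch (the data layout is hypothetical, not Bayou's):

```python
def reserve(schedule, room, requests):
    """Tentatively reserve the first acceptable slot that does not overlap
    an existing reservation for the same room. `schedule` holds tuples of
    (room, start, end, status); each request is a (start, end) slot."""
    for start, end in requests:
        overlaps = any(r == room and start < e and s < end
                       for r, s, e, _ in schedule)
        if not overlaps:
            entry = (room, start, end, "tentative")
            schedule.append(entry)
            return entry
    return None   # none of the acceptable times is free

schedule = [("purple-room", 9, 10, "committed")]
reserve(schedule, "purple-room", [(9, 10), (10, 11)])
```

A reservation stays "tentative" until the underlying Write commits; if an interfering reservation wins, the same request list lets the system fall back to a later acceptable time.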

2.2 Bibliographic database

Our second application allows users to cooperatively manage databases of bibliographic entries. Users can add entries to a database as they find papers in the library, in reference lists, via word of mouth, or by other means. A user can freely read and write any copy of the database, such as one that resides on his laptop. For the most part, the database is append-only, though users occasionally update entries to fix mistakes or add personal annotations.

As is common in bibliographic databases, each entry has a unique, human-sensible key that is constructed by appending the year in which the paper was published to the first author’s last name and adding a character if necessary to distinguish between multiple papers by the same author in the same year. Thus, the first paper by Jones et al. in 1995 might be identified as “Jones95” and subsequent papers as “Jones95b”, “Jones95c”, and so on.
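The key-assignment convention can be sketched directly (an illustrative helper, not code from Bayou):

```python
import string

def assign_key(existing, author_last, year):
    """Tentatively assign "Jones95", then "Jones95b", "Jones95c", ... for
    later entries by the same author in the same year."""
    base = f"{author_last}{year % 100:02d}"
    if base not in existing:
        return base
    for suffix in string.ascii_lowercase[1:]:   # 'b', 'c', ...
        if base + suffix not in existing:
            return base + suffix
    raise ValueError("more than 25 same-author papers in one year")

keys = set()
for _ in range(3):
    keys.add(assign_key(keys, "Jones", 1995))
```

As the next paragraphs explain, a key produced this way is only tentative: a concurrent updater on another replica may claim the same key first, in which case the system later assigns an alternative.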

An entry’s key is tentatively assigned when the entry is added. A user must be aware that the assigned keys are only tentative and may change when the entry is “committed.” In other words, a user must be aware that other concurrent updaters could be trying to assign the same key to different entries. Only one entry can have the key; the others will be assigned alternative keys by the system. Thus, for example, if the user employs the tentatively assigned key in some fashion, such as embedding it as a citation in a document, then he must also remember later to check that the key assigned when the entry was committed is in fact the expected one.

Because users can access inconsistent database copies, the same bibliographic entry may be concurrently added by different users with different keys. To the extent possible, the system detects duplicates and merges their contents into a single entry with a single key.

Interestingly, this is an application where a user may choose to operate in disconnected mode even if constant connectivity were possible. Consider the case where a user is in a university library looking up some papers. He occasionally types bibliographic references into his laptop or PDA. He may spend hours in the library but only enter a handful of references. He is not likely to want to keep a cellular phone connection open for the duration of his visit. Nor will he want to connect to the university’s local wireless network and subject himself to student hackers. He will more likely be content to have his bibliographic entries integrated into his database stored by Bayou upon returning to his home or office.

3. Bayou’s Basic System Model

In the Bayou system, each data collection is replicated in full at a number of servers. Applications running as clients interact with the servers through the Bayou application programming interface (API), which is implemented as a client stub bound with the application. This API, as well as the underlying client-server RPC protocol, supports two basic operations: Read and Write. Read operations permit queries over a data collection, while Write operations can insert, modify, and delete a number of data items in a collection. Figure 1 illustrates these components of the Bayou architecture. Note that a client and a server may be co-resident on a host, as would be typical of a laptop or PDA running in isolation.

Access to one server is sufficient for a client to perform useful work. The client can read the data held by that server and submit Writes to the server. Once a Write is accepted by a server, the client has no further responsibility for that Write. In particular, the client does not wait for the Write to propagate to other servers. In other words, Bayou presents a weakly consistent replication model with a read-any/write-any style of access. Weakly consistent replication has been used previously for availability, simplicity and scalability in a variety of systems [3, 7, 10, 12, 15, 19].


[Figure content: clients submit Reads and Writes to servers through the Bayou API; servers exchange Writes pair-wise via anti-entropy sessions.]

Figure 1. Bayou System Model

While individual Read and Write operations are performed at a single server, clients need not confine themselves to interacting with a single server. Indeed, in a mobile computing environment, switching between servers is often desirable, and Bayou provides session guarantees to reduce client-observed inconsistencies when accessing different servers. The description of session guarantees has been presented elsewhere [29].

To support application-specific conflict detection and resolution, Bayou Writes must contain more than a typical file system write or database update. Along with the desired updates, a Bayou Write carries information that lets each server receiving the Write decide if there is a conflict and, if so, how to fix it. Each Bayou Write also contains a globally unique WriteID assigned by the server that first accepted the Write.

The storage system at each Bayou server conceptually consists of an ordered log of the Writes described above plus the data resulting from the execution of these Writes. Each server performs each Write locally with conflicts detected and resolved as they are encountered during the execution. A server immediately makes the effects of all known Writes available for reading.

In keeping with the goal of requiring as little of the network as possible, Bayou servers propagate Writes among themselves during pair-wise contacts, called anti-entropy sessions [7]. The two servers involved in a session exchange Write operations so that when they are finished they agree on the set of Bayou Writes they have seen and the order in which to perform them.

The theory of epidemic algorithms assures that as long as the set of servers is not permanently partitioned, each Write will eventually reach all servers [7]. This holds even for communication patterns in which at most one pair of servers is ever connected at once. In the absence of new Writes from clients, all servers will eventually hold the same data. The rate at which servers reach convergence depends on a number of factors including network connectivity, the frequency of anti-entropy, and the policies by which servers select anti-entropy partners. These policies may vary according to the characteristics of the network, the data, and its servers. Developing optimal anti-entropy policies is a research topic in its own right and not further discussed in this paper.
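The effect of an anti-entropy session, and of indirect pair-wise propagation, can be sketched with Write logs modeled as dicts keyed by WriteID (ordering by WriteID below is a stand-in for Bayou's actual ordering rules, discussed later in the paper):

```python
def anti_entropy(server_a, server_b):
    """Pair-wise session: afterwards both servers hold the same set of
    Writes and agree on the order in which to perform them. Writes are
    dicts keyed by a globally unique WriteID."""
    merged = {**server_a, **server_b}
    server_a.clear(); server_a.update(merged)
    server_b.clear(); server_b.update(merged)
    return sorted(merged)     # agreed execution order of WriteIDs

# Three servers; only pair-wise contact, yet every Write reaches everyone.
s1 = {(1, "A"): "write-1"}
s2 = {(2, "B"): "write-2"}
s3 = {(3, "C"): "write-3"}
anti_entropy(s1, s2)          # s1 and s2 now agree
anti_entropy(s2, s3)          # s3 learns s1's Write indirectly via s2
anti_entropy(s1, s3)
```

Even though at most one pair is "connected" at a time, all three logs converge, which is the epidemic-propagation property the paper appeals to.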

4. Conflict Detection and Resolution

4.1 Accommodating application semantics

Supporting application-specific conflict detection and resolution is a major emphasis in the Bayou design. A basic tenet of our work is that storage systems must provide means for an application to specify its notion of a conflict along with its policy for resolving conflicts. In return, the system implements the mechanisms for reliably detecting conflicts, as specified by the application, and for automatically resolving them when possible. This design goal follows from the observation that different applications have different notions of what it means for two updates to conflict, and that such conflicts cannot always be identified by simply observing conventional reads and writes submitted by the applications.

As an example of application-specific conflicts, consider the meeting room scheduling application discussed in Section 2.1. Observing updates at a coarse granularity, such as the whole-file level, the storage system might detect that two users have concurrently updated different replicas of the meeting room calendar and conclude that their updates conflict. Observing updates at a fine granularity, such as the record level, the system might detect that the two users have added independent records and thereby conclude that their updates do not conflict. Neither of these conclusions is warranted. In fact, for this application, a conflict occurs when two meetings scheduled for the same room overlap in time.

Bibliographic databases provide another example of application-specific conflicts. In this application, two bibliographic entries conflict when either they describe different publications but have been assigned the same key by their submitters or else they describe the same publication and have been assigned distinct keys. Again, this definition of conflicting updates is specific to this application.

The steps taken to resolve conflicting updates once they have

been detected may also vary according to the semantics of the application. In the case of the meeting room scheduling application, one or more of a set of conflicting meetings may need to be moved to a different room or a different time. In the bibliographic application, an entry may need to be assigned a different unique key, or two entries for the same publication may need to be merged into one.

    Bayou_Write (update, dependency_check, mergeproc) {
        IF (DB_Eval (dependency_check.query) <> dependency_check.expected_result)
            resolved_update = Interpret (mergeproc);
        ELSE
            resolved_update = update;
        DB_Apply (resolved_update);
    }

Figure 2. Processing a Bayou Write Operation

    Bayou_Write(
        update = {insert, Meetings, 12/18/95, 1:30pm, 60min, "Budget Meeting"},
        dependency_check = {
            query = "SELECT key FROM Meetings WHERE day = 12/18/95
                     AND start < 2:30pm AND end > 1:30pm",
            expected_result = EMPTY},
        mergeproc = {
            alternates = {{12/18/95, 3:00pm}, {12/19/95, 9:30am}};
            newupdate = {};
            FOREACH a IN alternates {
                # check if there would be a conflict
                IF (NOT EMPTY (SELECT key FROM Meetings WHERE day = a.date
                               AND start < a.time + 60min AND end > a.time))
                    CONTINUE;
                # no conflict, can schedule meeting at that time
                newupdate = {insert, Meetings, a.date, a.time, 60min, "Budget Meeting"};
                BREAK;
            }
            IF (newupdate = {})  # no alternate is acceptable
                newupdate = {insert, ErrorLog, 12/18/95, 1:30pm, 60min, "Budget Meeting"};
            RETURN newupdate;
        })

Figure 3. A Bayou Write Operation

The Bayou system includes two mechanisms for automatic conflict detection and resolution that are intended to support arbitrary applications: dependency checks and merge procedures. These mechanisms permit clients to indicate, for each individual Write operation, how the system should detect conflicts involving the Write and what steps should be taken to resolve any detected conflicts based on the semantics of the application. They were designed to be flexible since we expect that applications will differ appreciably in both the procedures used to handle conflicts and, more generally, in their ability to deal with conflicts.

Techniques for semantic-based conflict detection and resolution have previously been incorporated into some systems to handle special cases such as file directory updates. For example, the Locus [30], Ficus [12], and Coda [17] distributed file systems all include mechanisms for automatically resolving certain classes of conflicting directory operations. More recently, some of these systems have also incorporated support for “resolver” programs that reduce the need for human intervention when resolving other types of file conflicts [18, 26]. Oracle's symmetric replication product also includes the notion of application-selected resolvers for relational databases [8]. Other systems, like Lotus Notes [15], do not provide application-specific mechanisms to handle conflicts, but rather create multiple versions of a document, file, or data object when conflicts arise. As will become apparent from the next couple of sections, Bayou's dependency checks and merge procedures are more general than these previous techniques.

4.2 Dependency checks

Application-specific conflict detection is accomplished in the Bayou system through the use of dependency checks. Each Write operation includes a dependency check consisting of an application-supplied query and its expected result. A conflict is detected if the query, when run at a server against its current copy of the data, does not return the expected result. This dependency check is a precondition for performing the update that is included in the Write operation. If the check fails, then the requested update is not performed and the server invokes a procedure to resolve the detected conflict, as outlined in Figure 2 and discussed below.

As an example of application-defined conflicts, Figure 3 presents a sample Bayou Write operation that might be submitted by the meeting room scheduling application. This Write attempts to reserve an hour-long time slot. It includes a dependency check with a single query, written in an SQL-like language, that returns information about any previously reserved meetings that overlap with this time slot. It expects the query to return an empty set.


Bayou's dependency checks, like the version vectors and timestamps traditionally used in distributed systems [12, 19, 25, 27], can be used to detect Write-Write conflicts. That is, they can be used to detect when two users update the same data item without one of them first observing the other's update. Such conflicts can be detected by having the dependency check query the current values of any data items being updated and ensure that they have not changed from the values they had at the time the Write was submitted, as is done in Oracle's replicated database [8].

Bayou's dependency checking mechanism is more powerful

than the traditional use of version vectors since it can also be used to detect Read-Write conflicts. Specifically, each Write operation can explicitly specify the expected values of any data items on which the update depends, including data items that have been read but are not being updated. Thus, Bayou clients can emulate the optimistic style of concurrency control employed in some distributed database systems [4, 6]. For example, a Write operation that installs a new program binary file might only include a dependency check of the sources, including version stamps, from which it was derived. Since the binary does not depend on its previous value, this need not be included.
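The Read-Write style of checking described above can be sketched in Python. This is not Bayou's actual API; the helper names and the dictionary database are illustrative. The Write's check expects the items it read to still hold the values observed when it was submitted:

```python
# Hypothetical sketch of emulating optimistic concurrency control with a
# dependency check: conflict iff any observed item changed since the
# Write was submitted.
db = {"sources_version": "v7", "binary": "old-build"}

def make_write(expected, update):
    return {
        "check": lambda db: all(db[k] == v for k, v in expected.items()),
        "update": update,
    }

def apply_write(db, w):
    if not w["check"](db):
        return "conflict"        # a merge procedure would run here
    db.update(w["update"])
    return "applied"

# Read-Write check: the new binary depends on the sources it was built
# from, not on its own previous value.
w = make_write(expected={"sources_version": "v7"},
               update={"binary": "new-build"})
assert apply_write(db, w) == "applied"

db["sources_version"] = "v8"     # a concurrent source update arrives
w2 = make_write(expected={"sources_version": "v7"},
                update={"binary": "another-build"})
assert apply_write(db, w2) == "conflict"
```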

Moreover, because dependency queries can read any data in the server's replica, dependency checks can enforce arbitrary, multi-item integrity constraints on the data. For example, suppose a Write transfers $100 from account A to account B. The application, before issuing the Write, reads the balance of account A and discovers that it currently has $150. Traditional optimistic concurrency control would check that account A still had $150 before performing the requested Write operation. The real requirement, however, is that the account have at least $100, and this can easily be specified in the Write's dependency check. Thus, a conflict will be detected only if concurrent updates cause the balance in account A to drop below $100.
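The bank-transfer example above can be made concrete with a small Python sketch (helper names are illustrative, not Bayou's API): the check states the real precondition, a balance of at least $100, so a concurrent withdrawal that leaves enough money causes no conflict.

```python
# Sketch of the bank-transfer dependency check: require balance >= $100
# rather than the exact value read by the application.
accounts = {"A": 150, "B": 0}

def transfer_update(db):
    db["A"] -= 100
    db["B"] += 100

transfer = {
    "check": lambda db: db["A"] >= 100,   # the real precondition
    "update": transfer_update,
}

accounts["A"] -= 20                        # concurrent withdrawal: A = $130
if transfer["check"](accounts):            # still >= $100, so no conflict
    transfer["update"](accounts)

assert accounts == {"A": 30, "B": 100}
```

An exact-value check (`db["A"] == 150`) would have flagged a spurious conflict here; the weaker, application-stated precondition avoids it.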

4.3 Merge procedures

Once a conflict is detected, a merge procedure is run by the Bayou server in an attempt to resolve the conflict. Merge procedures, included with each Write operation, are general programs written in a high-level, interpreted language. They can have embedded data, such as application-specific knowledge related to the update that was being attempted, and can perform arbitrary Reads on the current state of the server's replica. The merge procedure associated with a Write is responsible for resolving any conflicts detected by its dependency check and for producing a revised update to apply. The complete process of detecting a conflict, running a merge procedure, and applying the revised update, shown in Figure 2, is performed atomically at each server as part of executing a Write.

In principle, the algorithm in Figure 2 could be embedded in each merge procedure, thereby eliminating any special mechanisms for dependency checking. This approach would require servers to create a new merge procedure interpreter to execute each Write, which would be overly expensive. Supporting dependency checks separately allows servers to avoid running the merge procedure in the expected case where the Write does not introduce a conflict.

The meeting room scheduling application provides good examples of conflict resolution procedures that are specific not only to a particular application but also to a particular Write operation. In this application, users, well aware that their reservations may be invalidated by other concurrent users, can specify alternate scheduling choices as part of their original scheduling updates. These alternates are encoded in a merge procedure that attempts to reserve one of the alternate meeting times if the original time is found to be in conflict with some other previously scheduled meeting. An example of such a merge procedure is illustrated in Figure 3. A different merge procedure altogether could search for the next available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.
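The alternate-times idea from Figure 3 can be sketched as runnable Python (simplified and hypothetical: times are minutes since midnight, and the schema is a plain list rather than Bayou's tuple store). The merge procedure tries each alternate slot in turn; if none is free, it logs the conflict for later manual resolution.

```python
# Illustrative merge procedure: try each alternate slot; if all conflict,
# record the failure in an error log instead.
def merge_proc(meetings, alternates, duration, title):
    def conflicts(day, start):
        return any(d == day and s < start + duration and s + dur > start
                   for (d, s, dur, _) in meetings)
    for day, start in alternates:
        if not conflicts(day, start):
            # no conflict, can schedule the meeting at this time
            return ("Meetings", (day, start, duration, title))
    # no alternate is acceptable: log the conflict for a person to resolve
    day, start = alternates[0]
    return ("ErrorLog", (day, start, duration, title))

meetings = [("12/18/95", 900, 60, "Staff Meeting")]      # 3:00pm-4:00pm
table, row = merge_proc(meetings,
                        alternates=[("12/18/95", 900), ("12/19/95", 570)],
                        duration=60, title="Budget Meeting")
assert table == "Meetings" and row == ("12/19/95", 570, 60, "Budget Meeting")
```

The first alternate (3:00pm on 12/18) overlaps the existing meeting, so the procedure falls through to the 9:30am slot on 12/19.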

In practice, Bayou merge procedures are written by application programmers in the form of templates that are instantiated with the appropriate details filled in for each Write. The users of applications do not have to know about merge procedures, and therefore about the internal workings of the applications they use, except when automatic conflict resolution cannot be done.

In the case where automatic resolution is not possible, the merge procedure will still run to completion, but is expected to produce a revised update that logs the detected conflict in some fashion that will enable a person to resolve the conflict later. To enable manual resolution, perhaps using an interactive merge tool [22], the conflicting updates must be presented to a user in a manner that allows him to understand what has happened. By convention, most Bayou data collections include an error log for unresolvable conflicts. Such conventions, however, are outside the domain of the Bayou storage system and may vary according to

the application.

In contrast to systems like Coda [18] or Ficus [26] that lock individual files or complete file volumes when conflicts have been detected but not yet resolved, Bayou allows replicas to remain accessible at all times. This permits clients to continue to Read previously written data and to continue to issue new Writes. In the meeting room scheduling application, for example, a user who only cares about Monday meetings need not concern himself with scheduling conflicts on Wednesday. Of course, the potential drawback of this approach is that newly issued Writes may depend on data that is in conflict and may lead to cascaded conflict resolution.

Bayou's merge procedures resemble the previously mentioned resolver programs, for which support has been added to a number of replicated file systems [18, 26]. In these systems, a file-type-specific resolver program is run when a version vector mismatch is detected for a file. This program is presented with both the current and proposed file contents and can do whatever it wishes in order to resolve the detected conflict. An example is a resolver program for a binary file that checks to see if it can find a specification for how to derive the file from its sources, such as a Unix makefile, and then recompiles the program in order to obtain a new, “resolved” value for the file. Merge procedures are more general since they can vary for individual Write operations rather than being associated with the type of the updated data, as illustrated above for the meeting room scheduling application.

5. Replica Consistency

While the replicas held by two servers at any time may vary in their contents because they have received and processed different Writes, a fundamental property of the Bayou design is that all servers move towards eventual consistency. That is, the Bayou system guarantees that all servers eventually receive all Writes via the pair-wise anti-entropy process and that two servers holding the same set of Writes will have the same data contents. However, it cannot enforce strict bounds on Write propagation delays since these depend on network connectivity factors that are outside of Bayou's control.

Two important features of the Bayou system design allow servers to achieve eventual consistency. First, Writes are performed in the same, well-defined order at all servers. Second, the conflict detection and merge procedures are deterministic, so that servers resolve the same conflicts in the same manner.

In theory, the execution history at individual servers could vary as long as their execution was equivalent to some global Write

ordering. For example, Writes known to be commutative could be performed in any order. In practice, because Bayou's Write operations include arbitrary merge procedures, it is effectively impossible either to determine whether two Writes commute or to transform two Writes so they can be reordered, as has been suggested for some systems [9].

When a Write is accepted by a Bayou server from a client, it is initially deemed tentative. Tentative Writes are ordered according to timestamps assigned to them by their accepting servers. Eventually, each Write is committed, by a process described in the next section. Committed Writes are ordered according to the times at which they commit and before any tentative Writes.

The only requirement placed on timestamps for tentative Writes is that they be monotonically increasing at each server so that the pair <timestamp, ID of server that assigned it> produces a total order on Write operations. There is no requirement that servers have synchronized clocks, which is crucial since trying to ensure clock synchronization across portable computers is problematic. However, keeping servers' clocks reasonably close is desirable so that the induced Write order is consistent with a user's perception of the order in which Writes are submitted. Bayou servers maintain logical clocks [20] to timestamp new Writes. A server's logical clock is generally synchronized with its real-time system clock, but, to preserve the causal ordering of Write operations, the server may need to advance its logical clock when Writes are received during anti-entropy.

Enforcing a global order on tentative, as well as committed, Writes ensures that an isolated cluster of servers will come to agreement on the tentative resolution of any conflicts that they encounter. While this is not strictly necessary since clients must be prepared to deal with temporarily inconsistent servers in any case, we believe it desirable to provide as much internal consistency as possible. Moreover, clients can expect that the tentative resolution of conflicts within their cluster will correspond to their eventual permanent resolution, provided that no further conflicts are introduced outside the cluster.
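The logical-clock timestamping described above can be sketched in a few lines of Python (class and method names are illustrative): a server stamps each new Write with its clock and advances the clock past any timestamp received during anti-entropy, so the <timestamp, server-id> pairs respect causal order.

```python
# Minimal sketch of logical clocks (after Lamport [20]) as Bayou uses
# them to timestamp tentative Writes.
class Clock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Stamp a new Write accepted from a client.
        self.time += 1
        return self.time

    def observe(self, ts):
        # Advance past a timestamp received via anti-entropy,
        # preserving causal ordering of Writes.
        self.time = max(self.time, ts)

a, b = Clock(), Clock()
t1 = a.tick()            # Write accepted at server A: timestamp 1
b.observe(t1)            # B learns of it via anti-entropy
t2 = b.tick()            # B's next Write is ordered after A's
assert (t1, "A") < (t2, "B")   # <timestamp, server-id> total order
```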

Because servers may receive Writes from clients and from other servers in an order that differs from the required execution order, and because servers immediately apply all known Writes to their replicas, servers must be able to undo the effects of some previous tentative execution of a Write operation and reapply it in a different order. Interestingly, the number of times that a given Write operation is re-executed depends only on the order in which Writes arrive via anti-entropy and not on the likelihood of conflicts involving the Write.

Conceptually, each server maintains a log of all Write operations that it has received, sorted by their committed or tentative timestamps, with committed Writes at the head of the log. The server's current data contents are generated by executing all of the Writes in the given order. Techniques for pruning a server's Write log and for efficiently maintaining the corresponding data contents by undoing and redoing Write operations are given in Section 7.
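The log-replay view of a server's data can be sketched as follows (illustrative only; a real server undoes and redoes incrementally rather than replaying from scratch). When a Write with an earlier timestamp arrives late via anti-entropy, re-sorting the log and replaying it models the required rollback and re-execution:

```python
# Sketch: data contents = the result of executing the Write log in
# <timestamp, server-id> order.
def replay(log):
    db = {}
    for ts, server, (key, value) in sorted(log):
        db[key] = value          # execute each Write in global order
    return db

log = [(2, "s2", ("slot", "staff mtg"))]
assert replay(log) == {"slot": "staff mtg"}

# An earlier Write arrives late via anti-entropy: it must be ordered
# first, so the later Write is conceptually undone and redone after it.
log.append((1, "s1", ("slot", "budget mtg")))
assert replay(log) == {"slot": "staff mtg"}   # the later Write still wins
```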

Bayou guarantees that merge procedures, which execute independently at each server, produce consistent updates by restricting them to depend only on the server's current data contents and on any data supplied by the merge procedure itself. In particular, a merge procedure cannot access time-varying or server-specific “environment” information such as the current system clock or the server's name. Moreover, merge procedures that fail due to exceeding their limits on resource usage must fail deterministically. This means that all servers must place uniform bounds on the CPU and memory resources allocated to a merge procedure and must consistently enforce these bounds during execution. Once these conditions are met, two servers that start with identical replicas will end up with identical replicas after executing a Write.

6. Write Stability and Commitment

A Write is said to be stable at a server when it has been executed for the last time by that server. Recall that as servers learn of new updates by performing anti-entropy with other servers, the effects of previously executed Write operations may need to be undone and the Writes re-executed. Thus, a given Write operation may be executed several times at a server and may produce different results depending on the execution history of the server. A Write operation becomes stable when the set of Writes that precede it in the server's Write log is fixed. This means that the server has already received and executed any Writes that could possibly be ordered before the given Write. Bayou's notion of stability is similar to that in ordered multicast protocols, such as those provided in the ISIS toolkit [2].

In many cases, an application can be designed with a notion of

“confirmation” or “commitment” that corresponds to the Bayou notion of stability. As an example, in the Bayou meeting room scheduling application, two users may try to schedule separate meetings for the same time in the same room. Only when one of the users discovers that his Write has become stable and his schedule still shows that he has reserved the room for the desired time can he be sure that his tentative reservation has been confirmed.

Since clients may want to know when a Write has stabilized,

the Bayou API provides means for inquiring about the stability of a specific Write. Given a Write's unique identifier, a client can ask a server whether the given Write is stable at the server. The answer may vary, of course, depending on which server is contacted. Bayou also provides support for clients that may choose to access only stable data.

How does a server determine whether a Write is stable? One

approach would be to have each server include in the information passed during anti-entropy not only any Writes that have been accepted by this server but also the current value of the clock that it uses to timestamp new Writes. With suitable assumptions about the propagation order of Writes, a server could then determine that a Write is stable when it has a lower timestamp than all servers' clocks. The main drawback of this approach is that a server that remains disconnected can prevent Writes from stabilizing, which could cause a large number of Writes to be rolled back when the server reconnects.
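The clock-based stability test just described, and its drawback, can be sketched in Python (names are hypothetical): a Write is stable once its timestamp is below every server's clock, so one disconnected server with a stale clock blocks stabilization for everyone.

```python
# Sketch of clock-based stability: under suitable propagation-order
# assumptions, no server can still introduce a Write earlier than its
# own clock, so a timestamp below every clock is stable.
def is_stable(write_ts, server_clocks):
    return write_ts < min(server_clocks.values())

clocks = {"s1": 10, "s2": 7, "s3": 12}
assert is_stable(5, clocks)          # below all clocks: stable
assert not is_stable(8, clocks)      # s2 might still hold earlier Writes

# The drawback: a disconnected server's stale clock blocks stabilization.
clocks["s4"] = 2
assert not is_stable(5, clocks)
```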

To speed up the rate at which updates stabilize in an environment where communication with some servers may not be possible for extended periods of time, the Bayou system uses a commit procedure. That is, a Write becomes stable when it is explicitly committed, and, in fact, we generally use the terms “stable” and “committed” interchangeably in the Bayou system. Committed Writes, in commit order, are placed ahead of any tentative Writes in each server's Write log. This, along with Bayou's anti-entropy protocol ensuring that servers learn of committed Writes in the

order that they were committed, provides stability.

In the Bayou system, we use a primary commit scheme [28]. That is, one server designated as the primary takes responsibility for committing updates. Knowledge of which Writes have committed and in which order they were committed then propagates to other servers during anti-entropy. In all other respects, the primary behaves exactly like any other server. Each replicated data collection can have a different server designated as its primary.
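A primary-commit log ordering can be sketched as follows (class and helper names are illustrative): the primary assigns monotonically increasing commit sequence numbers, and every server places committed Writes, in commit order, ahead of tentative Writes ordered by <timestamp, server-id>.

```python
# Sketch of a primary commit scheme and the resulting Write-log order.
class Primary:
    def __init__(self):
        self.next_csn = 0
        self.committed = {}          # write-id -> commit sequence number

    def commit(self, write_id):
        self.committed[write_id] = self.next_csn
        self.next_csn += 1

def log_order(writes, committed):
    # Committed Writes first, in commit order; then tentative Writes
    # in <timestamp, server-id> order.
    def key(w):
        ts, server, wid = w
        if wid in committed:
            return (0, committed[wid])
        return (1, ts, server)
    return sorted(writes, key=key)

p = Primary()
writes = [(3, "s1", "w-a"), (1, "s2", "w-b"), (2, "s1", "w-c")]
p.commit("w-a")                  # commits first despite its later timestamp
order = [wid for _, _, wid in log_order(writes, p.committed)]
assert order == ["w-a", "w-b", "w-c"]
```

This also illustrates the point made below: commit order need not agree with tentative timestamp order when a late-timestamped Write reaches the primary first.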

Any commit protocol that prevents different groups of servers from committing updates in different orders would meet Bayou's needs. In our anticipated weak connectivity environment, using a primary to commit data is more attractive than the standard two-phase commit protocol since it alleviates the need to gather a majority quorum of servers. Consider the case of data that is replicated among laptops that are mostly disconnected. Requiring a majority of these laptops to be in communication with each other

Figure 4. Bayou Database Organization (Timestamp Vectors and Write Log; in-memory and stable-storage components)

at the same time in order to commit updates would be unreasonable.

The primary commit approach also enables updates to commit on a disconnected laptop that acts as the primary server. For example, suppose a user keeps the primary copy of his calendar with him on his laptop and allows others, such as a spouse or secretary, to keep secondary, mostly read-only copies. In this case, the user's updates to his own calendar commit immediately. This example illustrates how one might choose the primary to coincide with the locus of update activity, thereby maximizing the rate at which Writes get committed.

Unlike other distributed storage systems in which the ability to commit data is of primary importance, the Bayou design readily accommodates the temporary unavailability of the primary. The inability of a client to communicate with the primary server, for instance if the primary crashes or is disconnected, does not prevent it from performing useful Read and Write operations. Writes accepted by other servers simply remain tentative until they eventually reach the primary.

Bayou tries to arrange, but cannot ensure, that the order in which Writes are committed is consistent with the tentative order indicated by their timestamps. Writes from a given server are committed in timestamp order. Writes from different servers, however, may commit in a different order based on when the servers perform anti-entropy with the primary and with each other. Writes held on a disconnected non-primary server, for instance, will commit only after the server reconnects to the rest of the system and could be committed after Writes with later timestamps.

7. Storage System Implementation Issues

The Bayou design places several demands on the underlying storage system used by each server, including the need for space-efficient Write logging, efficient undo/redo of Write operations, separate views of committed and tentative data, and support for server-to-server anti-entropy. We implemented a storage system tailored to these special needs.

Our implementation is factored into three main components, as shown in Figure 4: the Write Log, the Tuple Store, and the Undo Log. The Write Log contains Writes that have been received by a Bayou server, sorted by their global committed or tentative order. The server's Tuple Store is a database that is obtained by executing the Writes in order and is used to process Read requests. The Undo Log facilitates rolling back tentative Writes that have been applied
