Transaction, Concurrency: ACID Properties: November 2007

DATABASE TRANSACTION

A database transaction is a unit of interaction with a database management system or similar system that is treated in a coherent and reliable way independent of other transactions. In general, a database transaction must be atomic, meaning that it must be either entirely completed or aborted. Ideally, a database system will guarantee the properties of Atomicity, Consistency, Isolation and Durability (ACID) for each transaction. In practice, these properties are often relaxed somewhat to provide better performance.

In some systems, transactions are also called LUWs for Logical Units of Work.

Purpose of Transaction

In database products the ability to handle transactions allows the user to ensure that integrity of a database is maintained.

A single transaction might require several queries, each reading and/or writing information in the database. When this happens it is usually important to be sure that the database is not left with only some of the queries carried out. For example, when doing a money transfer, if the money was debited from one account, it is important that it also be credited to the depositing account. Also, transactions should not interfere with each other. For more information about desirable transaction properties, see ACID.

A simple transaction is usually issued to the database system in a language like SQL in this form:

Begin the transaction
Execute several queries (although any updates to the database aren't actually visible to the outside world yet)
Commit the transaction (updates become visible if the transaction is successful)

If one of the queries fails the database system may rollback either the entire transaction or just the failed query. This behavior is dependent on the DBMS in use and how it is set up. The transaction can also be rolled back manually at any time before the commit.

CONCURRENCY CONTROL

In computer science, especially in the fields of computer programming (see also concurrent programming, parallel programming), operating systems (see also parallel computing) , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Concurrency control in databases

Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of a database. Transactions should be executed safely and follow the ACID rules, as described below. The DBMS must guarantee that only serializable (unless Serializability is relaxed), recoverable schedules are generated, and also that no committed actions are lost while undoing aborted transactions.

Transaction ACID rules

Atomicity - Either the effects of all or none of its operations remain when a transaction is completed - in other words, to the outside world the transaction appears to be indivisible, atomic.

Consistency - Every transaction must leave the database in a consistent state.

Isolation - Transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.

Durability - Successful transactions must persist through crashes.

Concurrency control mechanism

The main categories of concurrency control mechanisms are:

Optimistic - Delay the synchronization for a transaction until its end without blocking (read, write) operations, and then abort transactions that violate desired synchronization rules.

Pessimistic - Block operations of transaction that would cause violation of synchronization rules.

There are several methods for concurrency control. Among them:

Two-phase locking

Strict two-phase locking

Conservative two-phase locking

Index locking

Multiple granularity locking

A Lock is a database system object associated with a database object (typically a data item) that prevents undesired (typically synchronization rule violating) operations of other transactions by blocking them. Database system operations check for lock existence, and halt when noticing a lock type that is intended to block them.

There are also non-lock concurrency control methods, among them:

Conflict (serializability, precedence) graph checking

Timestamp ordering

commitment ordering

Almost all currently implemented lock-based and non-lock-based concurrency control mechanisms guarantee schedules that are conflict serializable (unless relaxed forms of serializability are needed). However, there are many research texts encouraging view serializable schedules for possible gains in performance, especially when not too many conflicts exist (and not too many aborts of completely executed transactions occur), due to reducing the considerable overhead of blocking mechanisms.

ACID

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.

An example of a transaction is a transfer of funds from one account to another, even though it might consist of multiple individual operations (such as debiting one account and crediting another).

Atomicity

Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. The transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited as well.

Consistency

Consistency refers to the database being in a legal state when the transaction begins and when it ends. This means that a transaction cannot break the rules, or integrity constraints, of the database. If an integrity constraint states that all accounts must have a positive balance, then any transaction violating this rule will be aborted.

Isolation

Isolation refers to the ability of the application to make operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; a bank manager can see the transferred funds on one account or the other, but never on both — even if he ran his query while the transfer was still being processed. More formally, isolation means the transaction history (or schedule) is serializable. For performance reasons, this ability is the most often relaxed constraint.

Durability

Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before the failure. A transaction can only be deemed committed after it is safely in the log.

MULTIVALUED DEPENDENCY

A multivalued dependency is a full constraint between two sets of attributes in a relation.

In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of tuple-generating dependency. The multivalued dependency plays a role in the 4NF database normalization.

Consider this example of a database of teaching courses, the books recommended for the course, and the lecturers who will be teaching the course:

Teaching databaseCourse Book Lecturer

AHA Silberschatz John D

AHA Nederpelt John D

AHA Silberschatz William M

AHA Nederpelt William M

AHA Silberschatz Christian G

AHA Nederpelt Christian G

OSO Silberschatz John D

OSO Silberschatz William M

Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a multivalued dependency; if we were to add a new book to the AHA course, we would have to add one record for each of the lecturers on that course, and vice versa.

Put formally, there are two in this relation: {course} {book} and equivalently {course} {lecturer}.

Databases with multivalued dependencies thus exhibit redundancy. In database normalization, fourth normal form requires that databases have no multivalued dependencies.

FOURTH NORMAL FORM (4NF)

Fourth normal form (4NF) is a normal form used in database normalization. 4NF ensures that independent multivalued facts are correctly and efficiently represented in a database design. 4NF is the next level of normalization after Boyce-Codd normal form (BCNF).

The definition of 4NF relies on the notion of a multivalued dependency. A table with a multivalued dependency is one where the existence of two or more independent many-to-many relationships in a table causes redundancy; and it is this redundancy which is removed by fourth normal form.

Consider the following example:

Pizza Delivery PermutationsRestaurant Pizza Variety Delivery Area

Vincenzo's Pizza Thick Crust Springfield

Vincenzo's Pizza Thick Crust Shelbyville

Vincenzo's Pizza Thin Crust Springfield

Vincenzo's Pizza Thin Crust Shelbyville

Elite Pizza Thin Crust Capital City

Elite Pizza Stuffed Crust Capital City

A1 Pizza Thick Crust Springfield

A1 Pizza Thick Crust Shelbyville

A1 Pizza Thick Crust Capital City

A1 Pizza Stuffed Crust Springfield

A1 Pizza Stuffed Crust Shelbyville

A1 Pizza Stuffed Crust Capital City

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.

Notice that because the table has a unique key and no non-key attributes, it does not violate any normal form up to BCNF. But because the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers, there is redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza start producing Cheese Crust pizzas then we will need to add multiple records, one for each of A1 Pizza's delivery areas. In formal terms, this is described as Pizza Variety having a multivalued dependency on Restaurant.

To satisfy 4NF, we must place the facts about varieties offered into a different table from the facts about delivery areas:

Varieties By RestaurantRestaurant Pizza Variety

Vincenzo's Pizza Thick Crust

Vincenzo's Pizza Thin Crust

Elite Pizza Thin Crust

Elite Pizza Stuffed Crust

A1 Pizza Thick Crust

A1 Pizza Stuffed Crust

Delivery Areas By RestaurantRestaurant Delivery Area

Vincenzo's Pizza Springfield

Vincenzo's Pizza Shelbyville

Elite Pizza Capital City

A1 Pizza Springfield

A1 Pizza Shelbyville

A1 Pizza Capital City

In contrast, if the pizza varieties offered by a restaurant sometimes did vary from one delivery area to another, the original three-column table would satisfy 4NF.

Ronald Fagin demonstrated that it is always possible to achieve 4NF (but not always desirable). Rissanen's theorem is also applicable on multivalued dependencies.

Transaction, Concurrency: ACID Properties

Tuesday, November 6, 2007

Blog Archive

About Me