SQL 표준안에서는 네 종류의 트랜잭션 격리 수준을 정의하고 있다. 가장 엄격한 격리 수준은 직렬화가 가능한 스냅숏 격리인데, 이것은 마치 여러 세션의 같은 트랜잭션 작업을 한 줄로 세워 차례 대로 진행하는 것과 같은 결과를 보장한다. 나머지 세 종류의 격리는 동시에 진행 되는 트랜잭션들 사이 허용 되는 작업의 범위에 따라서 구분된다. 표준안에서는 직렬화 가능한 트랜잭션 격리 수준은 동시에 진행되는 트랜잭션 사이의 상호 관계가 전혀 없어야 한다고 정의한다. (이것은 놀라운 이야기다 -- 실 세계에서는 당연히 동시에 여러 트랜잭션들이 발생할 것이고, 이것들이 어떻게 서로 상호 관계를 안 할 수 있을까?)
표준안에서 소개하고 있는 각 수준별 이름과 상호 작용 범위는 다음과 같다:
한 트랜잭션은 다른 트랜잭션에 아직 커밋하지 않은 자료도 읽을 수 있다.
한 트랜잭션은 다른 트랜잭션에서 커밋한 자료를 읽을 수 있다. (처음 어떤 자료를 읽고, 다시 읽으려고 하는데, 그 사이 다른 트랜잭션이 자료를 변경하고 커밋했다면, 다음 읽는 값이 커밋된 값으로 읽을 수 있다.)
위와 같은 상황에서 다른 트랜잭션에 의한 커밋된 자료가 있다 하더라도, 항상 자신의 트랜잭션에서 조회 했던 그 자료값 그대로 보여준다.
The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
SQL 표준과 PostgreSQL에서 구현한 트랜잭션 격리수준은 표 13.1에서 설명한다.
표 13.1. 트랜잭션 격리 수준
격리 수준 | Dirty Read | Nonrepeatable Read | Phantom Read | Serialization Anomaly |
---|---|---|---|---|
Read uncommitted | 허용, PG에서는 없음 | 가능 | 가능 | 가능 |
Read committed | 불가능 | 가능 | 가능 | 가능 |
Repeatable read | 불가능 | 불가능 | 허용, PG에서는 없음 | 가능 |
Serializable | 불가능 | 불가능 | 불가능 | 불가능 |
In PostgreSQL, you can request any of the four standard transaction isolation levels, but internally only three distinct isolation levels are implemented, i.e. PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL's multiversion concurrency control architecture.
The table also shows that PostgreSQL's Repeatable Read implementation does not allow phantom reads. Stricter behavior is permitted by the SQL standard: the four isolation levels only define which phenomena must not happen, not which phenomena must happen. The behavior of the available isolation levels is detailed in the following subsections.
트랜잭션 격리 수준을 지정하려면, SET TRANSACTION 명령을 사용한다.
PostgreSQL 몇몇 자료형과 함수는
트랜잭션 내 특별한 형태로 읽기 특성을 제공한다.
특히, 자동 증가 칼럼으로 사용하는 serial
자료형과, 시퀀스 같은 객체는 롤백이 없으며,
자료 변경 즉시 다른 모든 세션에서도 그 변경된 값을
볼 수 있다. 9.17절,
8.1.4절 참조.
Read Committed 수준은
PostgreSQL 기본 격리 수준이다.
이 격리 수준을 사용하면,
SELECT
쿼리
(FOR UPDATE/SHARE
절을 사용하지 않는)
는 이것을 실행되기 전에 커밋된 자료를 보여 준다;
It never sees either uncommitted
data or changes committed during query execution by concurrent
transactions. In effect, a SELECT
query sees
a snapshot of the database as of the instant the query begins to
run. However, SELECT
does see the effects
of previous updates executed within its own transaction, even
though they are not yet committed. Also note that two successive
SELECT
commands can see different data, even
though they are within a single transaction, if other transactions
commit changes during execution of the first SELECT
.
UPDATE
, DELETE
, SELECT
FOR UPDATE
, and SELECT FOR SHARE
commands
behave the same as SELECT
in terms of searching for target rows: they will only find target rows
that were committed as of the command start time. However, such a target
row might have already been updated (or deleted or locked) by
another concurrent transaction by the time it is found. In this case, the
would-be updater will wait for the first updating transaction to commit or
roll back (if it is still in progress). If the first updater rolls back,
then its effects are negated and the second updater can proceed with
updating the originally found row. If the first updater commits, the
second updater will ignore the row if the first updater deleted it,
otherwise it will attempt to apply its operation to the updated version of
the row. The search condition of the command (the WHERE
clause) is
re-evaluated to see if the updated version of the row still matches the
search condition. If so, the second updater proceeds with its operation
using the updated version of the row. In the case of
SELECT FOR UPDATE
and SELECT FOR
SHARE
, this means it is the updated version of the row that is
locked and returned to the client.
Because of the above rules, it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database. This behavior makes Read Committed mode unsuitable for commands that involve complex search conditions; however, it is just right for simpler cases. For example, consider updating bank balances with transactions like:
BEGIN; UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345; UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534; COMMIT;
If two such transactions concurrently try to change the balance of account 12345, we clearly want the second transaction to start with the updated version of the account's row. Because each command is affecting only a predetermined row, letting it see the updated version of the row does not create any troublesome inconsistency.
More complex usage can produce undesirable results in Read Committed
mode. For example, consider a DELETE
command
operating on data that is being both added and removed from its
restriction criteria by another command, e.g., assume
website
is a two-row table with
website.hits
equaling 9
and
10
:
BEGIN; UPDATE website SET hits = hits + 1; -- run from another session: DELETE FROM website WHERE hits = 10; COMMIT;
The DELETE
will have no effect even though
there is a website.hits = 10
row before and
after the UPDATE
. This occurs because the
pre-update row value 9
is skipped, and when the
UPDATE
completes and DELETE
obtains a lock, the new row value is no longer 10
but
11
, which no longer matches the criteria.
Because Read Committed mode starts each command with a new snapshot that includes all transactions committed up to that instant, subsequent commands in the same transaction will see the effects of the committed concurrent transaction in any case. The point at issue above is whether or not a single command sees an absolutely consistent view of the database.
The partial transaction isolation provided by Read Committed mode is adequate for many applications, and this mode is fast and simple to use; however, it is not sufficient for all cases. Applications that do complex queries and updates might require a more rigorously consistent view of the database than Read Committed mode provides.
The Repeatable Read isolation level only sees data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the query does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is a stronger guarantee than is required by the SQL standard for this isolation level, and prevents all of the phenomena described in 표 13.1 except for serialization anomalies. As mentioned above, this is specifically allowed by the standard, which only describes the minimum protections each isolation level must provide.
This level is different from Read Committed in that a query in a
repeatable read transaction sees a snapshot as of the start of the
first non-transaction-control statement in the
transaction, not as of the start
of the current statement within the transaction. Thus, successive
SELECT
commands within a single
transaction see the same data, i.e., they do not see changes made by
other transactions that committed after their own transaction started.
Applications using this level must be prepared to retry transactions due to serialization failures.
UPDATE
, DELETE
, SELECT
FOR UPDATE
, and SELECT FOR SHARE
commands
behave the same as SELECT
in terms of searching for target rows: they will only find target rows
that were committed as of the transaction start time. However, such a
target row might have already been updated (or deleted or locked) by
another concurrent transaction by the time it is found. In this case, the
repeatable read transaction will wait for the first updating transaction to commit or
roll back (if it is still in progress). If the first updater rolls back,
then its effects are negated and the repeatable read transaction can proceed
with updating the originally found row. But if the first updater commits
(and actually updated or deleted the row, not just locked it)
then the repeatable read transaction will be rolled back with the message
ERROR: could not serialize access due to concurrent update
because a repeatable read transaction cannot modify or lock rows changed by other transactions after the repeatable read transaction began.
When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.
Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts.
The Repeatable Read mode provides a rigorous guarantee that each transaction sees a completely stable view of the database. However, this view will not necessarily always be consistent with some serial (one at a time) execution of concurrent transactions of the same level. For example, even a read only transaction at this level may see a control record updated to show that a batch has been completed but not see one of the detail records which is logically part of the batch because it read an earlier revision of the control record. Attempts to enforce business rules by transactions running at this isolation level are not likely to work correctly without careful use of explicit locks to block conflicting transactions.
The Repeatable Read isolation level is implemented using a technique known in academic database literature and in some other database products as Snapshot Isolation. Differences in behavior and performance may be observed when compared with systems that use a traditional locking technique that reduces concurrency. Some other systems may even offer Repeatable Read and Snapshot Isolation as distinct isolation levels with different behavior. The permitted phenomena that distinguish the two techniques were not formalized by database researchers until after the SQL standard was developed, and are outside the scope of this manual. For a full treatment, please see [berenson95].
Prior to PostgreSQL version 9.1, a request for the Serializable transaction isolation level provided exactly the same behavior described here. To retain the legacy Serializable behavior, Repeatable Read should now be requested.
The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution for all committed transactions; as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures. In fact, this isolation level works exactly the same as Repeatable Read except that it monitors for conditions which could make execution of a concurrent set of serializable transactions behave in a manner inconsistent with all possible serial (one at a time) executions of those transactions. This monitoring does not introduce any blocking beyond that present in repeatable read, but there is some overhead to the monitoring, and detection of the conditions which could cause a serialization anomaly will trigger a serialization failure.
As an example,
consider a table mytab
, initially containing:
class | value -------+------- 1 | 10 1 | 20 2 | 100 2 | 200
Suppose that serializable transaction A computes:
SELECT SUM(value) FROM mytab WHERE class = 1;
and then inserts the result (30) as the value
in a
new row with class
= 2
. Concurrently, serializable
transaction B computes:
SELECT SUM(value) FROM mytab WHERE class = 2;
and obtains the result 300, which it inserts in a new row with
class
= 1
. Then both transactions try to commit.
If either transaction were running at the Repeatable Read isolation level,
both would be allowed to commit; but since there is no serial order of execution
consistent with the result, using Serializable transactions will allow one
transaction to commit and will roll the other back with this message:
ERROR: could not serialize access due to read/write dependencies among transactions
This is because if A had executed before B, B would have computed the sum 330, not 300, and similarly the other order would have resulted in a different sum computed by A.
When relying on Serializable transactions to prevent anomalies, it is important that any data read from a permanent user table not be considered valid until the transaction which read it has successfully committed. This is true even for read-only transactions, except that data read within a deferrable read-only transaction is known to be valid as soon as it is read, because such a transaction waits until it can acquire a snapshot guaranteed to be free from such problems before starting to read any data. In all other cases applications must not depend on results read during a transaction that later aborted; instead, they should retry the transaction until it succeeds.
To guarantee true serializability PostgreSQL
uses predicate locking, which means that it keeps locks
which allow it to determine when a write would have had an impact on
the result of a previous read from a concurrent transaction, had it run
first. In PostgreSQL these locks do not
cause any blocking and therefore can not play any part in
causing a deadlock. They are used to identify and flag dependencies
among concurrent Serializable transactions which in certain combinations
can lead to serialization anomalies. In contrast, a Read Committed or
Repeatable Read transaction which wants to ensure data consistency may
need to take out a lock on an entire table, which could block other
users attempting to use that table, or it may use SELECT FOR
UPDATE
or SELECT FOR SHARE
which not only
can block other transactions but cause disk access.
Predicate locks in PostgreSQL, like in most
other database systems, are based on data actually accessed by a
transaction. These will show up in the
pg_locks
system view with a mode
of SIReadLock
. The
particular locks
acquired during execution of a query will depend on the plan used by
the query, and multiple finer-grained locks (e.g., tuple locks) may be
combined into fewer coarser-grained locks (e.g., page locks) during the
course of the transaction to prevent exhaustion of the memory used to
track the locks. A READ ONLY
transaction may be able to
release its SIRead locks before completion, if it detects that no
conflicts can still occur which could lead to a serialization anomaly.
In fact, READ ONLY
transactions will often be able to
establish that fact at startup and avoid taking any predicate locks.
If you explicitly request a SERIALIZABLE READ ONLY DEFERRABLE
transaction, it will block until it can establish this fact. (This is
the only case where Serializable transactions block but
Repeatable Read transactions don't.) On the other hand, SIRead locks
often need to be kept past transaction commit, until overlapping read
write transactions complete.
Consistent use of Serializable transactions can simplify development.
The guarantee that any set of successfully committed concurrent
Serializable transactions will have the same effect as if they were run
one at a time means that if you can demonstrate that a single transaction,
as written, will do the right thing when run by itself, you can have
confidence that it will do the right thing in any mix of Serializable
transactions, even without any information about what those other
transactions might do, or it will not successfully commit. It is
important that an environment which uses this technique have a
generalized way of handling serialization failures (which always return
with a SQLSTATE value of '40001'), because it will be very hard to
predict exactly which transactions might contribute to the read/write
dependencies and need to be rolled back to prevent serialization
anomalies. The monitoring of read/write dependencies has a cost, as does
the restart of transactions which are terminated with a serialization
failure, but balanced against the cost and blocking involved in use of
explicit locks and SELECT FOR UPDATE
or SELECT FOR
SHARE
, Serializable transactions are the best performance choice
for some environments.
While PostgreSQL's Serializable transaction isolation level only allows concurrent transactions to commit if it can prove there is a serial order of execution that would produce the same effect, it doesn't always prevent errors from being raised that would not occur in true serial execution. In particular, it is possible to see unique constraint violations caused by conflicts with overlapping Serializable transactions even after explicitly checking that the key isn't present before attempting to insert it. This can be avoided by making sure that all Serializable transactions that insert potentially conflicting keys explicitly check if they can do so first. For example, imagine an application that asks the user for a new key and then checks that it doesn't exist already by trying to select it first, or generates a new key by selecting the maximum existing key and adding one. If some Serializable transactions insert new keys directly without following this protocol, unique constraints violations might be reported even in cases where they could not occur in a serial execution of the concurrent transactions.
For optimal performance when relying on Serializable transactions for concurrency control, these issues should be considered:
Declare transactions as READ ONLY
when possible.
Control the number of active connections, using a connection pool if needed. This is always an important performance consideration, but it can be particularly important in a busy system using Serializable transactions.
Don't put more into a single transaction than needed for integrity purposes.
Don't leave connections dangling “idle in transaction” longer than necessary. The configuration parameter idle_in_transaction_session_timeout may be used to automatically disconnect lingering sessions.
Eliminate explicit locks, SELECT FOR UPDATE
, and
SELECT FOR SHARE
where no longer needed due to the
protections automatically provided by Serializable transactions.
When the system is forced to combine multiple page-level predicate locks into a single relation-level predicate lock because the predicate lock table is short of memory, an increase in the rate of serialization failures may occur. You can avoid this by increasing max_pred_locks_per_transaction, max_pred_locks_per_relation, and/or max_pred_locks_per_page.
A sequential scan will always necessitate a relation-level predicate lock. This can result in an increased rate of serialization failures. It may be helpful to encourage the use of index scans by reducing random_page_cost and/or increasing cpu_tuple_cost. Be sure to weigh any decrease in transaction rollbacks and restarts against any overall change in query execution time.
The Serializable isolation level is implemented using a technique known in academic database literature as Serializable Snapshot Isolation, which builds on Snapshot Isolation by adding checks for serialization anomalies. Some differences in behavior and performance may be observed when compared with other systems that use a traditional locking technique. Please see [ports12] for detailed information.