15.1. 병렬 쿼리 작동 원리

15.1. 병렬 쿼리 작동 원리
이전	위로	15장. 병렬 쿼리	처음으로	다음

옵티마이저가 특정 쿼리의 빠른 실행을 위해 병렬 처리를 할 때, 아래와 같이 Gather 또는 Gather Merge를 포함하여 실행계획을 세운다:

EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)

위 실행 계획 경우, Gather 또는 Gather Merge노드는 병렬로 실행되는 계획에 정확히 하나의 하위 계획을 갖는다. Gather 노드가 실행 계획 트리의 맨 위에 있는 경우 모든 쿼리는 병렬로 실행된다. If the Gather or Gather Merge node is at the very top of the plan tree, then the entire query will execute in parallel. If it is somewhere else in the plan tree, then only the portion of the plan below it will run in parallel. In the example above, the query accesses only one table, so there is only one plan node other than the Gather node itself; since that plan node is a child of the Gather node, it will run in parallel.

Using EXPLAIN, you can see the number of workers chosen by the planner. When the Gather node is reached during query execution, the process which is implementing the user's session will request a number of background worker processes equal to the number of workers chosen by the planner. The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this occurrence is frequent, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.

Every background worker process which is successfully started for a given parallel query will execute the parallel portion of the plan. The leader will also execute that portion of the plan, but it has an additional responsibility: it must also read all of the tuples generated by the workers. When the parallel portion of the plan generates only a small number of tuples, the leader will often behave very much like an additional worker, speeding up query execution. Conversely, when the parallel portion of the plan generates a large number of tuples, the leader may be almost entirely occupied with reading the tuples generated by the workers and performing any further processing steps which are required by plan nodes above the level of the Gather node or Gather Merge node. In such cases, the leader will do very little of the work of executing the parallel portion of the plan.

When the node at the top of the parallel portion of the plan is Gather Merge rather than Gather, it indicates that each process executing the parallel portion of the plan is producing tuples in sorted order, and that the leader is performing an order-preserving merge. In contrast, Gather reads tuples from the workers in whatever order is convenient, destroying any sort order that may have existed.