PostgreSQL's implementation of the TABLESAMPLE
clause supports custom table sampling methods, in addition to
the BERNOULLI
and SYSTEM
methods that are required
by the SQL standard. The sampling method determines which rows of the
table will be selected when the TABLESAMPLE
clause is used.
At the SQL level, a table sampling method is represented by a single SQL function, typically implemented in C, having the signature
method_name(internal) RETURNS tsm_handler
The name of the function is the same method name appearing in the
TABLESAMPLE
clause. The internal
argument is a dummy
(always having value zero) that simply serves to prevent this function from
being called directly from a SQL command.
The result of the function must be a palloc'd struct of
type TsmRoutine
, which contains pointers to support functions for
the sampling method. These support functions are plain C functions and
are not visible or callable at the SQL level. The support functions are
described in 57.1절.
In addition to function pointers, the TsmRoutine
struct must
provide these additional fields:
List *parameterTypes
This is an OID list containing the data type OIDs of the parameter(s)
that will be accepted by the TABLESAMPLE
clause when this
sampling method is used. For example, for the built-in methods, this
list contains a single item with value FLOAT4OID
, which
represents the sampling percentage. Custom sampling methods can have
more or different parameters.
bool repeatable_across_queries
If true
, the sampling method can deliver identical samples
across successive queries, if the same parameters
and REPEATABLE
seed value are supplied each time and the
table contents have not changed. When this is false
,
the REPEATABLE
clause is not accepted for use with the
sampling method.
bool repeatable_across_scans
If true
, the sampling method can deliver identical samples
across successive scans in the same query (assuming unchanging
parameters, seed value, and snapshot).
When this is false
, the planner will not select plans that
would require scanning the sampled table more than once, since that
might result in inconsistent query output.
The TsmRoutine
struct type is declared
in src/include/access/tsmapi.h
, which see for additional
details.
The table sampling methods included in the standard distribution are good
references when trying to write your own. Look into
the src/backend/access/tablesample
subdirectory of the source
tree for the built-in sampling methods, and into the contrib
subdirectory for add-on methods.