This module implements a data type ltree
for representing
labels of data stored in a hierarchical tree-like structure.
Extensive facilities for searching through label trees are provided.
This module is considered “trusted”, that is, it can be
installed by non-superusers who have CREATE
privilege
on the current database.
A label is a sequence of alphanumeric characters
and underscores (for example, in C locale the characters
A-Za-z0-9_
are allowed).
Labels must be less than 256 characters long.
Examples: 42
, Personal_Services
A label path is a sequence of zero or more
labels separated by dots, for example L1.L2.L3
, representing
a path from the root of a hierarchical tree to a particular node. The
length of a label path cannot exceed 65535 labels.
Example: Top.Countries.Europe.Russia
The ltree
module provides several data types:
ltree
stores a label path.
lquery
represents a regular-expression-like pattern
for matching ltree
values. A simple word matches that
label within a path. A star symbol (*
) matches zero
or more labels. These can be joined with dots to form a pattern that
must match the whole label path. For example:
foo Match the exact label pathfoo
*.foo.* Match any label path containing the labelfoo
*.foo Match any label path whose last label isfoo
Both star symbols and simple words can be quantified to restrict how many labels they can match:
*{n
} Match exactlyn
labels *{n
,} Match at leastn
labels *{n
,m
} Match at leastn
but not more thanm
labels *{,m
} Match at mostm
labels — same as *{0,m
} foo{n
,m
} Match at leastn
but not more thanm
occurrences offoo
foo{,} Match any number of occurrences offoo
, including zero
In the absence of any explicit quantifier, the default for a star symbol
is to match any number of labels (that is, {,}
) while
the default for a non-star item is to match exactly once (that
is, {1}
).
There are several modifiers that can be put at the end of a non-star
lquery
item to make it match more than just the exact match:
@ Match case-insensitively, for examplea@
matchesA
* Match any label with this prefix, for examplefoo*
matchesfoobar
% Match initial underscore-separated words
The behavior of %
is a bit complicated. It tries to match
words rather than the entire label. For example
foo_bar%
matches foo_bar_baz
but not
foo_barbaz
. If combined with *
, prefix
matching applies to each word separately, for example
foo_bar%*
matches foo1_bar2_baz
but
not foo1_br2_baz
.
Also, you can write several possibly-modified non-star items separated with
|
(OR) to match any of those items, and you can put
!
(NOT) at the start of a non-star group to match any
label that doesn't match any of the alternatives. A quantifier, if any,
goes at the end of the group; it means some number of matches for the
group as a whole (that is, some number of labels matching or not matching
any of the alternatives).
Here's an annotated example of lquery
:
Top.*{0,2}.sport*@.!football|tennis{1,}.Russ*|Spain a. b. c. d. e.
This query will match any label path that:
begins with the label Top
and next has zero to two labels before
a label beginning with the case-insensitive prefix sport
then has one or more labels, none of which
match football
nor tennis
and then ends with a label beginning with Russ
or
exactly matching Spain
.
ltxtquery
represents a full-text-search-like
pattern for matching ltree
values. An
ltxtquery
value contains words, possibly with the
modifiers @
, *
, %
at the end;
the modifiers have the same meanings as in lquery
.
Words can be combined with &
(AND),
|
(OR), !
(NOT), and parentheses.
The key difference from
lquery
is that ltxtquery
matches words without
regard to their position in the label path.
Here's an example ltxtquery
:
Europe & Russia*@ & !Transportation
This will match paths that contain the label Europe
and
any label beginning with Russia
(case-insensitive),
but not paths containing the label Transportation
.
The location of these words within the path is not important.
Also, when %
is used, the word can be matched to any
underscore-separated word within a label, regardless of position.
Note: ltxtquery
allows whitespace between symbols, but
ltree
and lquery
do not.
Type ltree
has the usual comparison operators
=
, <>
,
<
, >
, <=
, >=
.
Comparison sorts in the order of a tree traversal, with the children
of a node sorted by label text. In addition, the specialized
operators shown in 표 F.13 are available.
표 F.13. ltree
Operators
Operator Description |
---|
Is left argument an ancestor of right (or equal)? |
Is left argument a descendant of right (or equal)? |
Does |
Does |
Does |
Concatenates |
Converts text to |
Does array contain an ancestor of |
Does array contain a descendant of |
Does array contain any path matching |
Does |
Does array contain any path matching |
Returns first array entry that is an ancestor of |
Returns first array entry that is a descendant of |
Returns first array entry that matches |
Returns first array entry that matches |
The operators <@
, @>
,
@
and ~
have analogues
^<@
, ^@>
, ^@
,
^~
, which are the same except they do not use
indexes. These are useful only for testing purposes.
The available functions are shown in 표 F.14.
표 F.14. ltree
Functions
ltree
supports several types of indexes that can speed
up the indicated operators:
B-tree index over ltree
:
<
, <=
, =
,
>=
, >
GiST index over ltree
(gist_ltree_ops
opclass):
<
, <=
, =
,
>=
, >
,
@>
, <@
,
@
, ~
, ?
gist_ltree_ops
GiST opclass approximates a set of
path labels as a bitmap signature. Its optional integer parameter
siglen
determines the
signature length in bytes. The default signature length is 8 bytes.
Valid values of signature length are between 1 and 2024 bytes. Longer
signatures lead to a more precise search (scanning a smaller fraction of the index and
fewer heap pages), at the cost of a larger index.
Example of creating such an index with the default signature length of 8 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path);
Example of creating such an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path gist_ltree_ops(siglen=100));
GiST index over ltree[]
(gist__ltree_ops
opclass):
ltree[] <@ ltree
, ltree @> ltree[]
,
@
, ~
, ?
gist__ltree_ops
GiST opclass works similarly to
gist_ltree_ops
and also takes signature length as
a parameter. The default value of siglen
in
gist__ltree_ops
is 28 bytes.
Example of creating such an index with the default signature length of 28 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
Example of creating such an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path gist__ltree_ops(siglen=100));
Note: This index type is lossy.
This example uses the following data (also available in file
contrib/ltree/ltreetest.sql
in the source distribution):
CREATE TABLE test (path ltree); INSERT INTO test VALUES ('Top'); INSERT INTO test VALUES ('Top.Science'); INSERT INTO test VALUES ('Top.Science.Astronomy'); INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics'); INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology'); INSERT INTO test VALUES ('Top.Hobbies'); INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy'); INSERT INTO test VALUES ('Top.Collections'); INSERT INTO test VALUES ('Top.Collections.Pictures'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts'); CREATE INDEX path_gist_idx ON test USING GIST (path); CREATE INDEX path_idx ON test USING BTREE (path);
Now, we have a table test
populated with data describing
the hierarchy shown below:
Top / | \ Science Hobbies Collections / | \ Astronomy Amateurs_Astronomy Pictures / \ | Astrophysics Cosmology Astronomy / | \ Galaxies Stars Astronauts
We can do inheritance:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science'; path ------------------------------------ Top.Science Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (4 rows)
Here are some examples of path matching:
ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*'; path ----------------------------------------------- Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Galaxies Top.Collections.Pictures.Astronomy.Astronauts (7 rows) ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.Astronomy.*'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (3 rows)
Here are some examples of full text search:
ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Hobbies.Amateurs_Astronomy (4 rows) ltreetest=> SELECT path FROM test WHERE path @ 'Astro* & !pictures@'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (3 rows)
Path construction using functions:
ltreetest=> SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test WHERE path <@ 'Top.Science.Astronomy'; ?column? ------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology (3 rows)
We could simplify this by creating a SQL function that inserts a label at a specified position in a path:
CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);' LANGUAGE SQL IMMUTABLE; ltreetest=> SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top.Science.Astronomy'; ins_label ------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology (3 rows)
Additional extensions are available that implement transforms for
the ltree
type for PL/Python. The extensions are
called ltree_plpythonu
, ltree_plpython2u
,
and ltree_plpython3u
(see 45.1절 for the PL/Python naming
convention). If you install these transforms and specify them when
creating a function, ltree
values are mapped to Python lists.
(The reverse is currently not supported, however.)
It is strongly recommended that the transform extensions be installed in
the same schema as ltree
. Otherwise there are
installation-time security hazards if a transform extension's schema
contains objects defined by a hostile user.
All work was done by Teodor Sigaev (<teodor@stack.net>
) and
Oleg Bartunov (<oleg@sai.msu.su>
). See
http://www.sai.msu.su/~megera/postgres/gist/ for
additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.