28.1.4 Plain Text Formats
The graph server (PGX) supports the following plain-text formats:
- Comma-Separated Values (CSV)
- Adjacency List (ADJ_LIST)
- Edge List (EDGE_LIST)
- Two Tables (TWO_TABLES)
- Flat File (FLAT_FILE)
Note:
Starting from Graph Server and Client Release 25.1, the Adjacency List (ADJ_LIST), Edge List (EDGE_LIST), Two Tables (TWO_TABLES), and Flat File (FLAT_FILE) formats are deprecated.
Graphs can be loaded only from files encoded in UTF-8 without a Byte Order Mark (BOM). To load a graph from files successfully, ensure that all text-based provider files are UTF-8 encoded without a BOM.
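To check the encoding requirement before loading, a file can be tested for a leading BOM. The following is a minimal sketch, not part of PGX: the class name BomCheck and its methods are hypothetical helpers, and the only fact relied on is that the UTF-8 BOM is the byte sequence EF BB BF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not a PGX API): detects a UTF-8 byte order mark.
class BomCheck {
    // The UTF-8 BOM is the three-byte sequence EF BB BF at the start of a file.
    static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
    }

    // Reads at most the first three bytes of the file and checks them for a BOM.
    static boolean fileStartsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithUtf8Bom(in.readNBytes(3));
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as a path to a provider file to inspect.
        for (String arg : args) {
            boolean hasBom = fileStartsWithUtf8Bom(Path.of(arg));
            System.out.println(arg + (hasBom ? ": has a UTF-8 BOM" : ": no BOM found"));
        }
    }
}
```

A file reported as having a BOM would need to be re-saved without it before being used as a provider file.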
Parsing of Vertices
PGX supports three types of vertex identifiers (IDs): integer, long, and string. The type defaults to integer, but can be configured through the vertex_id_type option in the graph configuration.
Parsing of Edges
Of the various formats and protocols supported by the graph server (PGX), only CSV and Flat File parsing support edge identifiers. For all other data sources, the ID of an edge is PGX's internal ID, which is an integer from zero to num_edges - 1.
Parsing of Properties
string properties, spatial properties (currently only point2d), and temporal properties (date, local_date, time, timestamp, time_with_timezone, and timestamp_with_timezone) must be quoted ("<string>") only if they contain a separator character (usually , for CSV and ' ' for Edge List and Adjacency List) or if they contain " or \n.
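The quoting rule above can be expressed as a small predicate. This is an illustrative sketch only: QuotingRule and needsQuoting are hypothetical names, the separator is assumed to be a non-empty string, and the sketch deliberately does not escape embedded quote characters because this section does not describe how that is done.

```java
// Hypothetical helper (not a PGX API): decides whether a string, spatial,
// or temporal value has to be wrapped in double quotes when written out.
class QuotingRule {
    // A value needs quotes if it contains the separator, a double quote,
    // or a newline. The separator is assumed to be non-empty.
    static boolean needsQuoting(String value, String separator) {
        return value.contains(separator)
            || value.contains("\"")
            || value.contains("\n");
    }

    public static void main(String[] args) {
        // With "," as the CSV separator, only the second value needs quotes.
        System.out.println(needsQuoting("New York", ","));
        System.out.println(needsQuoting("Doe, Jane", ","));
        // With " " as the separator, the same first value now needs quotes.
        System.out.println(needsQuoting("New York", " "));
    }
}
```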
date properties are parsed using Java's SimpleDateFormat utility, instantiated with the format string yyyy-MM-dd HH:mm:ss unless specified otherwise in the graph configuration. All other types of temporal properties are parsed using Java's DateTimeFormatter utility.
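To see what the default date format accepts, the same format string can be tried directly with SimpleDateFormat. The sketch below is not PGX code: DateParsing is a hypothetical class, the result depends on the default time zone of the JVM, and the ISO pattern used for the local_date example is an assumption chosen for illustration, not a documented PGX default.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper (not a PGX API): mirrors the parsing utilities named above.
class DateParsing {
    // Default format string for date properties, as stated in this section.
    static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss";

    // Parses a date value with SimpleDateFormat and the default format string.
    static Date parseDate(String text) throws ParseException {
        return new SimpleDateFormat(DEFAULT_DATE_FORMAT).parse(text);
    }

    // Parses a local_date value with DateTimeFormatter.
    // ASSUMPTION: ISO_LOCAL_DATE (yyyy-MM-dd) is used only as an example pattern.
    static LocalDate parseLocalDate(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseDate("2024-03-15 13:45:30"));
        System.out.println(parseLocalDate("2024-03-15"));
    }
}
```

A value such as 2024-03-15 without a time part does not match the default format string and is rejected with a ParseException, so either the data or the configured format has to change in that case.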
A point2d value is specified by its longitude followed by its latitude, separated by a space. Both longitude and latitude are doubles. For example, "-74.0445 40.6892" is the representation of a point2d instance for the location of the Statue of Liberty.
Boolean values are interpreted as true if the value is true (ignoring case), Y (ignoring case), or 1, and as false otherwise. The suggested notation for false is false (ignoring case), N (ignoring case), or 0. All other types are parsed using the parseXXX() function of the corresponding Java type, for example, Integer.parseInt(...) for integer types.
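The Boolean rule has one consequence that is easy to miss: any value that is not one of the accepted true notations is read as false, including values such as yes. The following sketch restates the rule in Java. BooleanParsing is a hypothetical class, not the PGX implementation, and it assumes that values are compared without trimming surrounding whitespace.

```java
// Hypothetical helper (not a PGX API): restates the Boolean parsing rule.
class BooleanParsing {
    // true for "true" or "Y" in any letter case, or for "1"; false for anything else.
    // ASSUMPTION: the raw value is compared as is, without trimming whitespace.
    static boolean parseBooleanValue(String raw) {
        return raw.equalsIgnoreCase("true")
            || raw.equalsIgnoreCase("Y")
            || raw.equals("1");
    }

    public static void main(String[] args) {
        String[] samples = {"true", "TRUE", "y", "1", "false", "N", "0", "yes"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + parseBooleanValue(sample));
        }
        // Other types use the parse function of their Java type, for example:
        System.out.println(Integer.parseInt("42") + 1);
    }
}
```

Using the suggested notations for false (false, N, or 0) keeps the data unambiguous, even though other values would also be read as false.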
Vector properties are supported in the Adjacency List (ADJ_LIST), Comma-Separated Values (CSV), Edge List (EDGE_LIST), and Two Tables text (TWO_TABLES) formats. Vector properties with components of type integer, long, float, and double can be loaded from these formats. To specify that a vertex or edge property is a vector property, set the dimension field of the graph property configuration to the dimension of the vector, which must be a strictly positive integer. In the supported text formats, a vector value is represented by the list of its component values, separated by the vector component delimiter. The default vector component delimiter is ; and it can be changed through the vector_component_delimiter graph configuration entry. For example, with ; as the vector component delimiter, a 3-dimensional vector of doubles could appear in the text file as 0.1;0.0004;3.14.
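The vector representation can be illustrated with a short parser for vectors of doubles. This is a sketch under stated assumptions rather than the PGX loader: VectorParsing and parseDoubleVector are hypothetical names, and rejecting a value whose component count differs from the configured dimension is an assumption about reasonable behavior, not documented behavior.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper (not a PGX API): parses one vector value of doubles.
class VectorParsing {
    static double[] parseDoubleVector(String text, String delimiter, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be strictly positive");
        }
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty parts.
        String[] parts = text.split(Pattern.quote(delimiter), -1);
        // ASSUMPTION: a component count other than the dimension is an error.
        if (parts.length != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " components but found " + parts.length);
        }
        double[] vector = new double[dimension];
        for (int i = 0; i < dimension; i++) {
            vector[i] = Double.parseDouble(parts[i]);
        }
        return vector;
    }

    public static void main(String[] args) {
        // The example value from the text, with the default delimiter ";".
        double[] vector = parseDoubleVector("0.1;0.0004;3.14", ";", 3);
        System.out.println(Arrays.toString(vector));
    }
}
```

If the delimiter is changed through vector_component_delimiter, the same value would be passed to the sketch as the second argument.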
Separators
When using single-file formats, IDs and properties are separated by a tab or a single space ("\t ") by default. For multiple-file formats, a comma (",") is used instead. In both cases, the separator string can be configured in PGX.
Parallel Loading
The following formats support parallel loading from multiple files:
- CSV (specify multiple files in vertex_uris and/or edge_uris)
- Adjacency List (specify multiple files in uris)
- Edge List (specify multiple files in uris)
- Two Tables (specify multiple files in vertex_uris and/or edge_uris)
- Flat File (specify multiple files in vertex_uris and/or edge_uris)
Legend
The following abbreviations are used to specify text formats:
- V = Vertex Key
- VG = Neighbor Vertex
- VL = Vertex Labels
- VP = Vertex Property
- VPK = Vertex Property Key
- VPT = Vertex Property Type
- EL = Edge Label
- EP = Edge Property
- EPK = Edge Property Key
- EPT = Edge Property Type
For example, <V-2, VG-4> denotes the 4th neighbor of the 2nd vertex.