23.1 Configuration Parameters for the Graph Server (PGX) Engine

You can configure the graph server (PGX) engine parameters in the /etc/oracle/graph/pgx.conf JSON file.

By default, the graph server (PGX) reads the settings in the /etc/oracle/graph/pgx.conf file during startup.

The following tables describe the different graph server (PGX) runtime configuration options.

Graph Server (PGX) Engine Parameters

The graph server (PGX) engine parameters are described in the following table:

Table 23-1 Runtime Parameters for the Graph Server (PGX) Engine

Parameter	Type	Description	Default
`admin_request_cache_timeout`	`integer`	After how many seconds admin request results get removed from the cache. Requests which are not done or not yet consumed are excluded from this timeout. Note: This is only relevant if PGX is deployed as a webapp.	`60`
`allow_idle_timeout_overwrite`	`boolean`	If true, sessions can overwrite the default idle timeout.	`true`
`allow_lazy_loading_for_database_graphs`	`boolean`	If true, the graph server (PGX) will automatically load the graphs from the database when they are first referenced in the graph queries.	`false`
`allow_override_scheduling_information`	`boolean`	If true, allow all users to override scheduling information like task weight, task priority, and number of threads.	`true`
`allow_task_timeout_overwrite`	`boolean`	If true, sessions can overwrite the default task timeout.	`true`
`allow_user_auto_refresh`	`boolean`	If true, users may enable auto refresh for graphs they load. If false, only graphs mentioned in `preload_graphs` can have auto refresh enabled.	`false`
`allowed_remote_loading_locations`	`array of string`	Allow loading graphs into the PGX engine from remote locations (http, https, ftp, ftps, s3). If empty, as by default, no remote location is allowed. If `"*"` is specified in the array, all remote locations are allowed. Only the value `"*"` is currently supported. Note that pre-loaded graphs are loaded from any location, regardless of the value of this setting. This parameter reduces security, so use it only when needed.	`[]`
`authorization`	`array of object`	Mapping of users and roles to resources and permissions for authorization.	`[]`
`authorization_session_create_allow_all`	`boolean`	If `true`, allow all users to create a PGX session regardless of permissions granted to them.	`false`
`basic_scheduler_config`	`object`	Configuration parameters for the fork join pool backend.	`null`
`bfs_iterate_que_task_size`	`integer`	Task size for BFS iterate QUE phase.	`128`
`bfs_threshold_parent_read_based`	`number`	Threshold of BFS traversal level items to switch to parent-read-based visiting strategy.	`0.05`
`bfs_threshold_read_based`	`integer`	Threshold of BFS traversal level items to switch to read-based visiting strategy.	`1024`
`bfs_threshold_single_threaded`	`integer`	Until what number of BFS traversal level items vertices are visited single-threaded.	`128`
`character_set`	`string`	Standard character set to use throughout PGX. UTF-8 is the default. Note: Some formats may not be compatible.	`utf-8`
`cni_diff_factor_default`	`integer`	Default diff factor value used in the common neighbor iterator implementations.	`8`
`cni_small_default`	`integer`	Default value used in the common neighbor iterator implementations, to indicate below which threshold a subarray is considered small.	`128`
`cni_stop_recursion_default`	`integer`	Default value used in the common neighbor iterator implementations, to indicate the minimum size where the binary search approach is applied.	`96`
`data_memory_limits`	`object`	Memory limits configuration parameters.	`null`
`dfs_threshold_large`	`integer`	Value that determines at which number of visited vertices the DFS implementation will switch to data structures that are optimized for larger numbers of vertices.	`4096`
`enable_csrf_token_checks`	`boolean`	If true, the PGX webapp will verify the Cross-Site Request Forgery (CSRF) token cookie and request parameters sent by the client exist and match. This is to prevent CSRF attacks.	`true`
`enable_gm_compiler`	`boolean`	If `true`, enable dynamic compilation of PGX Algorithm API (or Green-Marl code) during runtime.	`true`
`enable_graph_loading_cache`	`boolean`	If `true`, activate the graph loading cache that will accelerate loading of graphs that were previously loaded (can only be disabled in embedded mode).	`true`
`enable_graph_sharing`	`boolean`	Indicates if a user is allowed to grant `read` permission on its published graphs to other users. This flag is only relevant for a remote server.	`true`
`enable_memory_limits_checks`	`boolean`	If `true`, the graph server will enforce the configured memory limits.	`true`
`enable_ml_accelerators`	`boolean`	If `true`, the graph server will utilize the available ML accelerators to run faster machine learning trainings.	`true`
`enable_shutdown_cleanup_hook`	`boolean`	If `true`, PGX will add a JVM shutdown hook that will automatically shutdown PGX at JVM shutdown. Notice: Having the shutdown hook deactivated and not explicitly shutting down PGX may result in pollution of your temp directory.	`true`
`enable_snapshot_properties_publish_state_propagation`	`boolean`	If `true`, properties in a new snapshot will inherit the publishing state of properties in the parent snapshot.	`true`
`enterprise_scheduler_config`	`object`	Configuration parameters for the enterprise scheduler. See Table 23-3 and Table 23-4 for more information.	`null`
`enterprise_scheduler_flags`	`object`	[relevant for enterprise_scheduler] Enterprise scheduler-specific settings.	`null`
`explicit_spin_locks`	`boolean`	`true` means spin explicitly in a loop until lock becomes available. `false` means using JDK locks which rely on the JVM to decide whether to context switch or spin. Setting this value to `true` usually results in better performance.	`true`
`file_locations`	`array of object`	The file locations that can be used in the authorization-config.	`[]`
`graph_algorithm_language`	`enum[GM, JAVA]`	Front-end compiler to use.	`JAVA`
`graph_sharing_option`	`enum[allow_data_sharing, disallow_data_sharing, allow_traceable_data_sharing_for_same_user]`	This is to manage if a graph can be published and shared with other users.	`allow_data_sharing`
`graph_validation_level`	`enum[low, high]`	Level of validation performed on newly loaded or created graphs.	`low`
`ignore_incompatible_backend_operations`	`boolean`	If `true`, only log when encountering incompatible operations and configuration values in RTS or FJ pool. If `false`, throw exceptions.	`false`
`in_place_update_consistency_model`	`enum[ALLOW_INCONSISTENCIES, CANCEL_TASKS]`	Consistency model used when in-place updates occur. Only relevant if in-place updates are enabled. Currently, updates are applied in place only if they are not structural (that is, they modify only properties). Two models are currently implemented: one delays only new tasks when an update occurs; the other also delays running tasks.	`ALLOW_INCONSISTENCIES`
`init_pgql_on_startup`	`boolean`	If `true`, PGQL is directly initialized on start-up of PGX. Otherwise, it is initialized during the first use of PGQL.	`true`
`interval_to_poll_max`	`integer`	Upper bound (in ms) of the exponential backoff. Once this bound is reached, the job status polling interval stays fixed.	`1000`
`java_home_dir`	`string`	The path to Java's home directory. If set to `<system-java-home-dir>`, use the `java.home` system property.	`<system-java-home-dir>`
`large_array_threshold`	`integer`	Threshold when the size of an array is too big to use a normal Java array. This depends on the used JVM. (Defaults to `Integer.MAX_VALUE - 3`)	`2147483644`
`max_active_sessions`	`integer`	Maximum number of sessions allowed to be active at a time.	`1024`
`max_distinct_strings_per_pool`	`integer`	[only relevant if string_pooling_strategy is indexed] Number of distinct strings per property after which to stop pooling. If the limit is reached, an exception is thrown.	`65536`
`max_http_client_request_size`	`long`	Maximum size in bytes of any `http` request sent to the PGX server over the REST API. Setting it to `-1` allows requests of any size.	`10485760`
`max_off_heap_size`	`integer`	Maximum amount of off-heap memory (in megabytes) that PGX is allowed to allocate before an OutOfMemoryError is thrown. Note that this limit is not guaranteed never to be exceeded, because of rounding and synchronization trade-offs. It serves only as the threshold at which PGX starts to reject new memory allocation requests.	`<available-physical-memory>`
`max_on_heap_memory_usage_ratio`	`number`	Maximum ratio of on-heap memory that PGX is allowed to use, between 0 and 1.	`0.9`
`max_queue_size_per_session`	`integer`	The maximum number of pending tasks allowed in the queue, per session. If a session reaches the maximum, new incoming requests of that session are rejected. A negative value means infinity or unlimited.	`-1`
`max_snapshot_count`	`integer`	Number of snapshots that may be loaded in the engine at the same time. New snapshots can be created via auto or forced update. If the number of snapshots of a graph reaches this threshold, no more auto-updates will be performed, and a forced update will result in an exception until one or more snapshots are removed from memory. A value of zero means that an unlimited number of snapshots is supported.	`0`
`memory_allocator`	`enum[basic_allocator, enterprise_allocator]`	The memory allocator to use.	`basic_allocator`
`memory_cleanup_interval`	`integer`	Memory cleanup interval in seconds.	`5`
`min_array_compaction_threshold`	`number`	Minimum value (only relevant for graphs optimized for updates) that can be used for the `array_compaction_threshold` value in graph configuration. If a graph configuration attempts to use a value lower than the one specified by `min_array_compaction_threshold`, it will use `min_array_compaction_threshold` instead.	`0.2`
`min_fetch_interval_sec`	`integer`	For delta-refresh (only relevant if the graph format supports delta updates), the lowest interval at which a graph source is queried for changes. You can tune this value to prevent PGX from hanging due to too frequent graph delta-refreshing.	`2`
`min_update_interval_sec`	`integer`	For auto-refresh, the lowest interval after which a new snapshot is created, either by reloading the entire graph or if the format supports delta-updates, out of the cached changes (only relevant if the format supports delta updates). You can tune this value to prevent PGX from hanging due to too frequent graph auto-refreshing.	`2`
`ms_bfs_frontier_type_strategy`	`enum[auto_grow, short, int]`	The type strategy to use for MS-BFS frontiers.	`auto_grow`
`num_spin_locks`	`integer`	Number of spin locks each generated app will create at instantiation. Trade-off: a small number implies less memory consumption; a large number implies faster execution (if algorithm uses spin locks).	`1024`
`parallelism`	`integer`	Number of worker threads to be used in thread pool. Note: If the caller thread is part of another thread-pool, this value is ignored and the parallelism of the parent pool is used.	`<number-of-cpus>`
`pattern_matching_supernode_cache_threshold`	`integer`	Minimum number of neighbors a node must have to be considered a supernode. This is for the pattern matching engine.	`1000`
`permission_checks_interval`	`integer`	Interval in seconds to perform permission checks on source graphs.	`60`
`pgx_realm`	`object`	Configuration parameters for the realm. See Table 23-2.	`null`
`pgx_server_base_url`	`string`	This is used when deploying the graph server behind a load balancer to make clients before 21.3 backward compatible. The value should be set to the load balancer address.	`null`
`pooling_factor`	`number`	[only relevant if string_pooling_strategy is on_heap] This value prevents the string pool from growing as big as the property size, which could render the pooling ineffective.	`0.25`
`preload_graphs`	`array of object`	List of graph configs to be registered at start-up. Each item includes path to a graph config, the name of the graph and whether it should be published.	`[]`
`random_generator_strategy`	`enum[non_deterministic, deterministic]`	Method of generating random numbers in PGX.	`non_deterministic`
`random_seed`	`long`	[relevant for deterministic random number generator only] Seed for the deterministic random number generator used in PGX. The default is -24466691093057031.	`-24466691093057031`
`readiness_memory_usage_ratio`	`number`	Memory limit ratio that should be considered to detect if PGX server is ready. This is used by the `isReady` API and the default value is 1.0.	`1.0`
`release_memory_threshold`	`number`	Threshold percentage (decimal fraction) of used memory after which the engine starts freeing unused graphs. Examples: A value of 0.0 means graphs get freed as soon as their reference count becomes zero. That is, all sessions which loaded that graph were destroyed/timed out. A value of 1.0 means graphs never get freed, and the engine will throw OutOfMemoryErrors as soon as a graph is needed which does not fit in memory anymore. A value of 0.7 means the engine keeps all graphs in memory as long as total memory consumption is below 70% of total available memory, even if there is currently no session using them. When consumption exceeds 70% and another graph needs to get loaded, unused graphs get freed until memory consumption is below 70% again.	`0.0`
`revisit_threshold`	`integer`	Maximum number of matched results from a node to be cached.	`4096`
`running_memory_usage_ratio`	`number`	Memory limit ratio that should be considered to detect if PGX server is running. This is used by the `isRunning` API and the default value is 1.0.	`1.0`
`scheduler`	`enum[basic_scheduler, enterprise_scheduler, low_latency_scheduler]`	The scheduler to use. `basic_scheduler`: uses a scheduler with basic features. `enterprise_scheduler`: uses a scheduler with advanced enterprise features for running multiple tasks concurrently and providing better performance. `low_latency_scheduler`: uses a scheduler that prioritizes the latency of tasks over throughput or fairness across multiple sessions. The `low_latency_scheduler` is only available in embedded mode.	`enterprise_scheduler`
`session_idle_timeout_secs`	`integer`	Timeout of idling sessions in seconds. Zero (0) means infinity or no timeout.	`14400`
`session_task_timeout_secs`	`integer`	Timeout in seconds to interrupt long-running tasks submitted by sessions (algorithms, I/O tasks). Zero (0) means infinity or no timeout.	`0`
`small_task_length`	`integer`	Task length if the total amount of work is smaller than default task length (only relevant for task-stealing strategies).	`128`
`strict_mode`	`boolean`	If true, exceptions are thrown and logged with ERROR level whenever the engine encounters configuration problems, such as invalid keys, mismatches, and other potential errors. If false, the engine logs problems with ERROR/WARN level (depending on severity) and makes best guesses and uses sensible defaults instead of throwing exceptions.	`true`
`string_pooling_strategy`	`enum[indexed, on_heap, none]`	The string pooling strategy to use.	`on_heap`
`task_length`	`integer`	Default task length (only relevant for task-stealing strategies). Should be between 100 and 10000. Trade-off: a small number implies more fine-grained tasks are generated, higher stealing throughput; a large number implies less memory consumption and GC activity.	`4096`
`tmp_dir`	`string`	Temporary directory to store compilation artifacts and other temporary data. If set to `<system-tmp-dir>`, uses the standard tmp directory of the underlying system (`/tmp` on Linux).	`"/tmp"`
`udf_config_directory`	`string`	Directory path containing UDF config files.	`null`
`use_index_for_reachability_queries`	`enum[auto, off]`	Create index for reachability queries.	`auto`
`use_memory_mapper_for_reading_pgb`	`boolean`	If true, use memory mapped files for reading graphs in PGB format if possible; if false, always use a stream-based implementation.	`true`
`use_memory_mapper_for_storing_pgb`	`boolean`	If true, use memory mapped files for storing graphs in PGB format if possible; if false, always use a stream-based implementation.	`true`
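
As an illustration of how the parameters in Table 23-1 are set, the following fragment is a minimal sketch of a pgx.conf that adjusts session and memory behavior. The values are arbitrary and chosen only for illustration, not as recommendations. The parameter names come from Table 23-1, and their top-level placement is assumed from the way `max_active_sessions` and `release_memory_threshold` appear in Example 23-2.

```json
{
  "session_idle_timeout_secs": 3600,
  "session_task_timeout_secs": 900,
  "max_active_sessions": 64,
  "max_queue_size_per_session": 16,
  "release_memory_threshold": 0.7,
  "max_on_heap_memory_usage_ratio": 0.8
}
```

Under these assumed values, idle sessions would time out after one hour, tasks running longer than 15 minutes would be interrupted, and, following the description of `release_memory_threshold`, unused graphs would stay in memory until total consumption exceeds 70%.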

The default values of the runtime configuration fields are optimized to deliver the best performance across a wide set of algorithms. Depending on your workload you may be able to improve performance further by experimenting with different strategies, sizes, and thresholds.
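
One way to run such an experiment is to change a small group of related settings at a time and compare the results on your own workload. The fragment below is only a sketch of such a trial. Every value in it is an assumption made for illustration and has not been tuned or measured.

```json
{
  "parallelism": 16,
  "task_length": 2048,
  "small_task_length": 64,
  "num_spin_locks": 2048,
  "string_pooling_strategy": "indexed",
  "max_distinct_strings_per_pool": 131072,
  "random_generator_strategy": "deterministic",
  "random_seed": 42
}
```

Selecting the `deterministic` random generator with a fixed seed is included here on the assumption that it makes repeated runs easier to compare. Note that `task_length` stays within the range of 100 to 10000 given in Table 23-1.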

Advanced Access Configuration

The following table lists the fields in the pgx_realm object that can be used to customize login behavior.

Table 23-2 Advanced Access Configuration Options

Parameter	Type	Description	Default
`token_expiration_seconds`	`integer`	After how many seconds the generated bearer token will expire.	`3600` (1 hour)
`refresh_time_before_token_expiry_seconds`	`integer`	How many seconds before its expiry a token is automatically refreshed. Note that this value must always be less than the `token_expiration_seconds` value.	`1800`
`connect_timeout_milliseconds`	`integer`	After how many milliseconds a connection attempt to the specified JDBC URL will time out, resulting in the login attempt being rejected.	`10000`
`max_pool_size`	`integer`	Maximum number of JDBC connections allowed per user. If the number is reached, attempts to read from the database will fail for the current user. Starting from 23.4 onwards, a new dedicated pool with one connection is provided for token refresh. This new dedicated pool does not affect the `max_pool_size` value.	`64`
`max_num_users`	`integer`	Maximum number of active, signed in users to allow. If this number is reached, the graph server will reject login attempts.	`512`
`max_num_token_refresh`	`integer`	Maximum number of times a token can be automatically refreshed before requiring a login again.	`24`
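
The following fragment sketches how some of these fields might be set. This section does not show the exact layout of the `pgx_realm` object, so the nested `options` block used here is an assumption. Other realm settings that a real deployment needs are omitted, and the values are illustrative only.

```json
{
  "pgx_realm": {
    "options": {
      "token_expiration_seconds": 7200,
      "refresh_time_before_token_expiry_seconds": 900,
      "max_num_users": 256,
      "max_num_token_refresh": 12
    }
  }
}
```

The refresh time (900) is kept below the token expiration (7200), as the table requires.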

Enterprise Scheduler Parameters

The following parameters are relevant only if the enterprise scheduler is used. (They are ignored if the basic scheduler is used.)

Table 23-3 Enterprise Scheduler Parameters

Parameter	Type	Description	Default
`analysis_task_config`	`object`	Configuration for analysis tasks	`weight`: `<no-of-cpus>`, `priority`: `MEDIUM`, `max_threads`: `<no-of-cpus>`
`fast_analysis_task_config`	`object`	Configuration for fast analysis tasks	`weight`: `1`, `priority`: `HIGH`, `max_threads`: `<no-of-cpus>`
`max_num_concurrent_io_tasks`	`integer`	Maximum number of concurrent I/O tasks at a time	`3`
`num_io_threads_per_task`	`integer`	Number of I/O threads to use per task	`<no-of-cpus>`
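
As a sketch of how these parameters could be combined, the fragment below sets all four entries of Table 23-3. The nesting of `analysis_task_config` under `enterprise_scheduler_config` follows Example 23-1. Placing the two I/O parameters at the same level, and using `weight`, `priority`, and `max_threads` as the key names, are assumptions based on the table. The numbers are illustrative only.

```json
{
  "scheduler": "enterprise_scheduler",
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "weight": 8,
      "priority": "MEDIUM",
      "max_threads": 8
    },
    "fast_analysis_task_config": {
      "weight": 1,
      "priority": "HIGH",
      "max_threads": 4
    },
    "max_num_concurrent_io_tasks": 2,
    "num_io_threads_per_task": 4
  }
}
```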

Basic Scheduler Parameters

The following parameters are relevant only if the basic scheduler is used. (They are ignored if the enterprise scheduler is used.)

Table 23-4 Basic Scheduler Parameters

Field	Type	Description	Default
`num_workers_analysis`	`integer`	This specifies how many worker threads to use for analysis tasks.	`<no-of-cpus>`
`num_workers_fast_track_analysis`	`integer`	This specifies how many worker threads to use for fast-track analysis tasks.	`1`
`num_workers_io`	`integer`	This specifies how many worker threads to use for I/O tasks (load/refresh/write from/to disk). This value does not impact file-based loaders, as they are always single-threaded. Database loaders will open a new connection for each I/O worker.	`<no-of-cpus>`
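
A configuration that selects the basic scheduler and sizes its worker pools might look like the following sketch. It assumes that the fields of Table 23-4 are set inside the `basic_scheduler_config` object listed in Table 23-1. The worker counts are arbitrary examples.

```json
{
  "scheduler": "basic_scheduler",
  "basic_scheduler_config": {
    "num_workers_analysis": 16,
    "num_workers_fast_track_analysis": 2,
    "num_workers_io": 4
  }
}
```

Per the description of `num_workers_io`, a database loader would open one connection for each of the four I/O workers in this sketch.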

Example 23-1 Minimal Graph Server (PGX) Configuration

The following example causes the graph server (PGX) to initialize its analysis thread pool with 32 workers. (Default values are used for all other parameters.)

{
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    }
  }
}

Example 23-2 Two Pre-loaded Graphs

This example sets more fields and specifies two fixed graphs for loading into memory during the graph server (PGX) startup.

{ 
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "max_threads": 32
    },
    "fast_analysis_task_config": {
      "max_threads": 32
    }
  }, 
  "memory_cleanup_interval": 600,
  "max_active_sessions": 1, 
  "release_memory_threshold": 0.2, 
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    },
    {
      "path": "graph-configs/my-other-graph.adj.json",
      "name": "my-other-graph",
      "publish": false
    }
  ],
  "authorization": [{
    "pgx_role": "GRAPH_DEVELOPER",
    "pgx_permissions": [{
      "preloaded_graph": "my-graph",
      "grant": "read"
    },
    {
      "preloaded_graph": "my-other-graph",
      "grant": "read"
    }]
  },
    ....
  ]
}

Relative paths in parameter values are always resolved relative to the parent directory of the configuration file in which they are specified. For example, if the preceding JSON is in /pgx/conf/pgx.conf, then the file path graph-configs/my-graph.bin.json inside that file would be resolved to /pgx/conf/graph-configs/my-graph.bin.json.
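
To make the resolution rule concrete, consider the following sketch. The directory name `udfs` is hypothetical and is used only to show a second parameter that takes a path.

```json
{
  "udf_config_directory": "udfs",
  "preload_graphs": [
    {
      "path": "graph-configs/my-graph.bin.json",
      "name": "my-graph"
    }
  ]
}
```

If this fragment were saved as /pgx/conf/pgx.conf, the rule above implies that `udfs` would resolve to /pgx/conf/udfs and the graph configuration path would resolve to /pgx/conf/graph-configs/my-graph.bin.json.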
