5 Cassandra Message Store Pre-Installation Tasks
This chapter describes the pre-installation tasks you must complete on Cassandra nodes before you install Messaging Server software.
Summary of General Pre-Installation Tasks
The following list summarizes the general pre-installation tasks you must complete before installing any Messaging Server component.
-
Create a UNIX system user and group for Messaging Server, and set permissions for the directories and files owned by that user.
-
Check that DNS is running and configured properly for the Messaging Server host.
-
On Linux, check the maximum number of file descriptors. If the value is less than 16384, increase it.
-
Install Oracle Directory Server Enterprise Edition, if your site does not currently have Directory Server deployed.
See the chapter titled "Messaging Server Pre-Installation Tasks" in Messaging Server Installation and Configuration Guide for detailed information.
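The file descriptor check in the list above can be sketched as a small shell function. This is a minimal sketch, not part of the Messaging Server tooling: the function name fd_limit_status is hypothetical, and the script only inspects the soft limit of the shell that runs it, which may differ from the limit that applies to the Messaging Server user.

```shell
#!/bin/sh
# Minimum number of file descriptors named in the task list above.
REQUIRED_FDS=16384

# Print "ok" when the given limit meets the minimum, "increase" otherwise.
# A limit of "unlimited" is treated as sufficient.
# fd_limit_status is a hypothetical helper name used for illustration.
fd_limit_status() {
    limit=$1
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$REQUIRED_FDS" ]; then
        echo "ok"
    else
        echo "increase"
    fi
}

# Report the soft limit of the current shell.
current=$(ulimit -n)
echo "current soft limit: $current -> $(fd_limit_status "$current")"
```

How you raise the limit persistently depends on your distribution. On many Linux systems this is done through /etc/security/limits.conf, but check your platform documentation.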
The remaining sections of this chapter describe the pre-installation tasks you must complete on Cassandra nodes.
Installing Java
To install Java, see "Prerequisites" on the Cassandra web site at:
http://cassandra.apache.org/doc/latest/getting_started/installing.html
Note:
The JAVA_HOME/bin directory must be in the PATH environment variable.
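One way to confirm the PATH requirement in the note above is a short shell check. This is a sketch under the assumption that JAVA_HOME is already exported in the current shell; path_contains is a hypothetical helper name.

```shell
#!/bin/sh
# Return success if directory $1 is an entry in the colon-separated list $2.
# path_contains is a hypothetical helper name used for illustration.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether JAVA_HOME/bin is present in the current PATH.
if [ -n "${JAVA_HOME:-}" ] && path_contains "$JAVA_HOME/bin" "$PATH"; then
    echo "JAVA_HOME/bin is in PATH"
else
    echo "JAVA_HOME/bin is not in PATH; for example: export PATH=\"\$JAVA_HOME/bin:\$PATH\""
fi
```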
Installing Python
To install Python, see the Python documentation at:
https://docs.python.org/2/installing/
Be sure to use the version of Python that is supported by the version of Cassandra that you are installing.
Installing Apache Cassandra
The tasks to install Apache Cassandra are:
Downloading the Apache Cassandra Software
To download the Cassandra software:
-
Download the Cassandra software from the Cassandra download site, located at:
-
Copy the installer file to your Cassandra message store hosts.
Installing the Apache Cassandra Software
To install Cassandra software:
-
On each Cassandra node, install the Cassandra software, and verify that Cassandra is running.
For more information, see the Cassandra installation documentation at:
http://cassandra.apache.org/doc/latest/getting_started/installing.html#
-
On Oracle Linux 6.x and later, ensure that the 32-bit versions of the glibc libraries are installed.
For more information, see the Cassandra documentation at:
http://cassandra.apache.org/doc/latest/getting_started/installing.html#
-
Optionally, install msstatbot, a monitoring solution for Cassandra. For more information, see the Messaging Server System Administrator's Guide.
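To verify that Cassandra is running on a node, you can use the nodetool status command that ships with Cassandra. Each node line in its output begins with a two-letter code, where UN means the node is up and in the normal state. The following sketch counts such lines; count_up_normal is a hypothetical helper name, and the script assumes nodetool is on the PATH.

```shell
#!/bin/sh
# Count the nodes reported as Up/Normal ("UN") in `nodetool status`
# output read from standard input.
# count_up_normal is a hypothetical helper name used for illustration.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Run the check only when nodetool is available on this host.
if command -v nodetool >/dev/null 2>&1; then
    echo "nodes up and normal: $(nodetool status 2>/dev/null | count_up_normal)"
fi
```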
Installing Elasticsearch Cluster
To install an Elasticsearch cluster, see the documentation at: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html
Setting Up the Cassandra Cluster
To set up the Cassandra cluster, see the Cassandra documentation:
-
For a single data center, see:
http://cassandra.apache.org/doc/latest/configuration/index.html
-
For multiple data centers, see:
http://cassandra.apache.org/doc/latest/configuration/index.html
When setting up multiple data centers, Messaging Server recommends that you configure four data centers in three clusters, with keyspaces arranged as shown in Table 5-1. This arrangement minimizes the overhead of replicating and repairing Cassandra keyspaces across all data centers.
Table 5-1 Recommended Multiple Data Centers and Clusters Configuration
Data Center Name and Node Types | Keyspaces | Cluster Configuration |
---|---|---|
DC_MSG, Cassandra nodes | ms_msg | Cluster Content |
DC_META, Cassandra nodes | ms_mbox, ms_index | Combined with DC_INDEX into Cluster Metadata |
DC_CACHE, Cassandra nodes | ms_cache | Cluster Cache |
Cluster settings, such as the cluster name and seed nodes, are defined in the cassandra.yaml file. See the following section for more information.
To support more concurrent index updates, the ratio of DC_META nodes to DC_INDEX nodes should be at least 1 to 2.
Changing Initial Cassandra Settings
On each Cassandra node, optimize the Cassandra installation by following the recommendations in the Cassandra documentation.
Changing Initial Tuning Settings
On each Cassandra node, change the configuration files described in this section so that the node operates correctly in the Cassandra message store deployment.
cassandra.yaml File
Make the changes in this section to the /etc/cassandra/cassandra.yaml file.
For all nodes, to enable separate clusters for better performance, specify cluster_name.
Make the following changes to the num_tokens setting:
-
DC_MSG, DC_META, and DC_CACHE nodes:
num_tokens: 256
To improve performance, locate data on SSD drives:
-
data_file_directories:
/var/lib/cassandra/data
-
commitlog_directory:
/var/lib/cassandra/commitlog
-
saved_caches_directory:
/var/lib/cassandra/saved_caches
-
hints_directory:
/var/lib/cassandra/hints
To support large mailboxes and messages, increase the commit log segment size:
commitlog_segment_size_in_mb: 256
To specify seed nodes, use two nodes from each data center in the cluster, preferably located on different racks. Each cluster has its own set of seeds, for example:
-
DC_MSG cluster:
seeds: "192.0.2.12,192.0.2.24"
-
DC_META cluster:
seeds: "192.0.2.1,192.0.2.2,192.0.2.10,192.0.2.3"
-
DC_CACHE cluster:
seeds: "192.0.2.14,192.0.2.7"
For all nodes, make the following change to improve performance:
memtable_flush_writers: 8
For all nodes, specify listen_address, rpc_address, native_transport_address, and so on, according to your deployment.
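After editing cassandra.yaml, a quick script can confirm that the scalar settings match the values recommended in this section. The following is a minimal sketch: yaml_value and check_setting are hypothetical helper names, and the lookup is a plain text match on top-level "key: value" lines, not a YAML parser, so nested or list-valued settings such as data_file_directories and seeds are out of scope.

```shell
#!/bin/sh
# Print the value of a top-level "key: value" line in a YAML file.
# Trailing comments and whitespace are stripped. Text match only.
# yaml_value is a hypothetical helper name used for illustration.
yaml_value() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" |
        sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' |
        head -n 1
}

# Compare one setting ($1) in file $3 against the expected value ($2).
check_setting() {
    actual=$(yaml_value "$1" "$3")
    if [ "$actual" = "$2" ]; then
        echo "OK       $1: $actual"
    else
        echo "MISMATCH $1: expected $2, found ${actual:-<unset>}"
    fi
}

# Path taken from this section; override CASSANDRA_YAML if yours differs.
CASSANDRA_YAML=${CASSANDRA_YAML:-/etc/cassandra/cassandra.yaml}
if [ -r "$CASSANDRA_YAML" ]; then
    check_setting num_tokens 256 "$CASSANDRA_YAML"
    check_setting commitlog_segment_size_in_mb 256 "$CASSANDRA_YAML"
    check_setting memtable_flush_writers 8 "$CASSANDRA_YAML"
fi
```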
cassandra-env.sh File
For all nodes, to specify the location of the heap dump, make the following change to the /etc/cassandra/cassandra-env.sh file:
export CASSANDRA_HEAPDUMP_DIR=/scratch/heapdump
jvm.options File
For DC_MSG, DC_META, and DC_CACHE nodes, make the following heap size changes:
-Xms32G -Xmx32G
For DC_META nodes, make the following heap size changes to improve performance:
-Xms16G -Xmx16G
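Both heap recommendations above set the initial heap size (-Xms) and the maximum heap size (-Xmx) to the same value. The following sketch checks that a jvm.options file does so. The helper names heap_flag and heap_sizes_match are hypothetical, and the default path /etc/cassandra/jvm.options is an assumption based on the location of the other configuration files in this chapter.

```shell
#!/bin/sh
# Print the value of a JVM size flag ($1, for example -Xms) from the
# options file $2. Comment lines are ignored; the last occurrence wins.
# heap_flag is a hypothetical helper name used for illustration.
heap_flag() {
    grep -v '^[[:space:]]*#' "$2" |
        tr -s '[:space:]' '\n' |
        sed -n "s/^$1\([0-9][0-9]*[kKmMgG]\{0,1\}\)$/\1/p" |
        tail -n 1
}

# Succeed only when -Xms is set and equals -Xmx in file $1.
heap_sizes_match() {
    xms=$(heap_flag -Xms "$1")
    xmx=$(heap_flag -Xmx "$1")
    [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Assumed location; override JVM_OPTIONS if your installation differs.
JVM_OPTIONS=${JVM_OPTIONS:-/etc/cassandra/jvm.options}
if [ -r "$JVM_OPTIONS" ]; then
    if heap_sizes_match "$JVM_OPTIONS"; then
        echo "heap: -Xms and -Xmx are both $(heap_flag -Xms "$JVM_OPTIONS")"
    else
        echo "heap: -Xms and -Xmx are missing or differ"
    fi
fi
```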
For all nodes, make the following changes to improve performance:
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=12
-XX:ConcGCThreads=12
For all nodes, print garbage collection measurements, which are useful for monitoring system performance:
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
cassandra-rackdc.properties File
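For all nodes, make the following changes so that each node reports its data center and rack.
-
Configure the endpoint snitch. Note that endpoint_snitch is a setting in the cassandra.yaml file. It is listed here because GossipingPropertyFileSnitch reads the data center and rack names from the cassandra-rackdc.properties file:
endpoint_snitch: GossipingPropertyFileSnitch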
-
Set the data center and rack names as appropriate in the cassandra-rackdc.properties file:
dc=mydc
rack=myrack
For example, for a node in DC_CACHE in a physical rack in one location, set dc=DC_CACHE and rack=RAC1. For another node in DC_CACHE in a physical rack in a different location, set dc=DC_CACHE and rack=RAC2.
Note:
Data center and rack names are case sensitive.
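Because the names are case sensitive, generating the file content from a script can help keep the spelling identical across nodes. The following is a minimal sketch: rackdc_properties is a hypothetical helper name, and the values DC_CACHE and RAC1 are taken from the example above.

```shell
#!/bin/sh
# Print cassandra-rackdc.properties content for data center $1 and rack $2.
# rackdc_properties is a hypothetical helper name used for illustration.
rackdc_properties() {
    printf 'dc=%s\nrack=%s\n' "$1" "$2"
}

# Example: the first DC_CACHE node from the text above.
rackdc_properties DC_CACHE RAC1
```

Redirect the output to the cassandra-rackdc.properties file on each node, passing the rack name that applies to that node.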