30 Monitoring the Message Store
This chapter describes message store monitoring tasks. See "Managing the Message Store and Mailboxes" for conceptual information.
For more information about monitoring, see the following chapters:
General Message Store Monitoring Procedures
This section outlines standard monitoring procedures for the message store. These procedures are helpful for general message store checks, testing, and standard maintenance.
Checking Hardware Space
A message store should have adequate spare disk space and hardware resources. When the message store approaches its disk space or hardware limits, problems can occur within the message store.
Inadequate disk space is one of the most common causes of mail server problems and failures. Without space to write to the message store, the mail server fails. In addition, when available disk space falls below a certain threshold, problems occur with message delivery, logging, and so forth. Disk space can be depleted rapidly when the cleanup function of the stored process fails and deleted messages are not expunged from the message store.
See "Monitoring Disk Space" for information on monitoring disk space.
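As a rough illustration of a routine space check, the following Python sketch reports the free space on a file system as a percentage and compares it with a threshold. The function names and the 10 percent threshold are hypothetical choices for illustration, not Messaging Server settings; point the check at the partition that holds your message store.

```python
import shutil
from typing import Tuple


def percent_free(path: str) -> float:
    """Return the space available to unprivileged users on the file system
    containing `path`, as a percentage of its total size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def check_partition(path: str, min_percent_free: float = 10.0) -> Tuple[bool, float]:
    """Return (ok, percent_free); ok is False when free space is below the threshold.

    The 10 percent default is a hypothetical threshold, not a product setting.
    """
    pct = percent_free(path)
    return pct >= min_percent_free, pct


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the partition holding your message store.
    ok, pct = check_partition("/")
    print(f"free: {pct:.1f}% ok: {ok}")
```

A scheduled job could call check_partition on the store partition and raise an alert whenever the first element of the result is False.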
Checking Log Files
Check the log files to make sure the message store processes are running as configured. Oracle Communications Messaging Server creates a separate set of log files for each of the major protocols, or services, it supports: SMTP, IMAP, POP, and HTTP. You can look at the log files in the DataRoot/log/ directory. You should monitor the log files on a routine basis.
Be aware that logging can impact server performance. The more verbose the logging you specify, the more disk space your log files will occupy for a given amount of time. You should define effective but realistic log rotation, expiration, and backup policies for your server. See "Using Message Store Log Messages" for information about defining logging policies for your server.
Checking User IMAP/POP/Webmail Sessions by Using Telemetry
Messaging Server provides a feature called telemetry that can capture a user's entire IMAP, POP or HTTP session into a file. This feature is useful for debugging client problems. For example, if users complain that their message access client is not working as expected, this feature can be used to trace the interaction between the access client and Messaging Server.
To capture a session, create a directory of the following form:
DataRoot/telemetry/pop_or_imap_or_http/userid
To capture a POP session, create the following directory:
DataRoot/telemetry/pop/userid
To capture an IMAP session, create the following directory:
DataRoot/telemetry/imap/userid
To capture a Webmail session, create the following directory:
DataRoot/telemetry/http/userid
Note: userid is "uid" for the default domain and "uid@domain" for hosted domains.
The directory must be owned by, or writable by, the Messaging Server user ID.
Messaging Server will create one file per session in that directory. Example output is shown below.
LOGIN redb 2003/11/26 13:03:21
>0.017>1 OK User logged in
<0.047<2 XSERVERINFO MANAGEACCOUNTURL MANAGELISTSURL MANAGEFILTERSURL
>0.003>* XSERVERINFO MANAGEACCOUNTURL {67}
http://redb@cuisine.blue.planet.com:800/bin/user/admin/bin/enduser MANAGELISTSURL NIL MANAGEFILTERSURL NIL
2 OK Completed
<0.046<3 select "INBOX"
>0.236>* FLAGS (\Answered flagged draft deleted \Seen $MDNSent Junk)
* OK [PERMANENTFLAGS (\Answered flag draft deleted \Seen $MDNSent Junk \*)]
* 1538 EXISTS
* 0 RECENT
* OK [UNSEEN 23]
* OK [UIDVALIDITY 1046219200]
* OK [UIDNEXT 1968]
3 OK [READ-WRITE] Completed
<0.045<4 UID fetch 1:* (FLAGS)
>0.117>* 1 FETCH (FLAGS (\Seen) UID 330)
* 2 FETCH (FLAGS (\Seen) UID 331)
* 3 FETCH (FLAGS (\Seen) UID 332)
* 4 FETCH (FLAGS (\Seen) UID 333)
* 5 FETCH (FLAGS (\Seen) UID 334)
<etc>
You can gather command telemetry that does not include end-user information by using the imap.logcommands msconfig option (local.imap.logcommands in legacy configuration). See Messaging Server Reference for additional information.
To disable the telemetry logging, move or remove the directory that you created.
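The directory handling described above can be scripted. The following Python sketch builds the DataRoot/telemetry/service/userid path, creates it to start capturing, and removes it to stop. The function names are hypothetical, and the sketch assumes you pass your installation's actual DataRoot. It does not change ownership, so run it as the Messaging Server user (or adjust ownership afterwards) so that the server can write to the directory.

```python
import shutil
from pathlib import Path
from typing import Optional

# Service directory names used under DataRoot/telemetry.
SERVICES = ("pop", "imap", "http")


def telemetry_dir(data_root: str, service: str, uid: str,
                  domain: Optional[str] = None) -> Path:
    """Build DataRoot/telemetry/<service>/<userid>.

    userid is the bare uid for the default domain and uid@domain for a hosted domain.
    """
    if service not in SERVICES:
        raise ValueError(f"service must be one of {SERVICES}, got {service!r}")
    userid = uid if domain is None else f"{uid}@{domain}"
    return Path(data_root) / "telemetry" / service / userid


def enable_telemetry(data_root: str, service: str, uid: str,
                     domain: Optional[str] = None) -> Path:
    """Create the telemetry directory; the server then writes one file per session."""
    path = telemetry_dir(data_root, service, uid, domain)
    path.mkdir(parents=True, exist_ok=True)
    return path


def disable_telemetry(data_root: str, service: str, uid: str,
                      domain: Optional[str] = None) -> None:
    """Stop capturing by removing the directory (captured session files are deleted too)."""
    shutil.rmtree(telemetry_dir(data_root, service, uid, domain), ignore_errors=True)
```

Removing the directory also deletes any captured session files; if you want to keep them, move the directory instead, which the text above also allows.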
Checking stored Processes
The stored process performs a variety of important tasks, such as deadlock and transaction operations on the message database, enforcement of aging policies, and expunging and erasing of messages stored on disk. If stored stops running, Messaging Server eventually runs into problems. If stored does not start when start-msg is run, no other processes start.
-
Check that the stored process is running. See "imcheck" for more information.
-
Check for log file buildup in store_root/mboxlist.
-
Check for stored messages in the default log file DataRoot/log/default/default.
-
Check that the timestamps of the files listed in Table 30-1 (in the MessagingServer_home/config/ directory) are updated whenever the stored process attempts the corresponding operation:
Table 30-1 stored Operations
File | Function |
---|---|
stored.ckp | Touched when a database checkpoint is initiated. Timestamped approximately every minute. |
stored.lcu | Touched at every database log cleanup. Timestamped approximately every 5 minutes. |
stored.per | Touched at every spawn of peruser db write-out. Timestamped once an hour. |
See "stored" and "Monitoring the stored Process" for more information on the stored process.
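The timestamp check in Table 30-1 lends itself to automation. This Python sketch compares each file's modification time with its expected refresh interval. The slack factor of 2 (a file counts as stale only when it is more than twice its interval old) is an assumption chosen to tolerate small delays, and the function name is hypothetical.

```python
import os
import time
from typing import Dict, Optional

# Expected refresh intervals in seconds, taken from Table 30-1.
EXPECTED_INTERVALS = {
    "stored.ckp": 60,        # database checkpoint, about every minute
    "stored.lcu": 5 * 60,    # database log cleanup, about every 5 minutes
    "stored.per": 60 * 60,   # peruser db write-out, once an hour
}


def stale_stamp_files(config_dir: str, slack: float = 2.0,
                      now: Optional[float] = None) -> Dict[str, float]:
    """Return {filename: age_in_seconds} for stamp files that are missing or older
    than slack * expected interval. A missing file is reported with an infinite age."""
    now = time.time() if now is None else now
    stale = {}
    for name, interval in EXPECTED_INTERVALS.items():
        path = os.path.join(config_dir, name)
        try:
            age = now - os.path.getmtime(path)
        except FileNotFoundError:
            age = float("inf")
        if age > slack * interval:
            stale[name] = age
    return stale
```

Called with your actual MessagingServer_home/config path, the function returns an empty dictionary while stored is keeping all three files current.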
Checking Database Log Files
Database log files are the Sleepycat (Berkeley DB) transaction checkpointing log files in the store_root/mboxlist directory. If log files accumulate, database checkpointing is not occurring. Normally there are two or three database log files at any given time; more than that can be a sign of a problem.
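A minimal Python sketch of this check counts the log files in store_root/mboxlist and reports a buildup when there are more than three. It assumes the Berkeley DB naming convention of log. followed by a 10-digit sequence number (for example, log.0000000001); confirm the names in your own mboxlist directory before relying on it. The function names are hypothetical.

```python
import os
import re
from typing import List

# Assumption: Berkeley DB transaction logs are named log.NNNNNNNNNN.
LOG_NAME = re.compile(r"^log\.\d{10}$")


def db_log_files(mboxlist_dir: str) -> List[str]:
    """Return the database log file names, oldest first (names are zero-padded)."""
    return sorted(n for n in os.listdir(mboxlist_dir) if LOG_NAME.match(n))


def log_buildup(mboxlist_dir: str, normal_max: int = 3) -> bool:
    """True when more log files exist than the two or three normally present."""
    return len(db_log_files(mboxlist_dir)) > normal_max
```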
Checking User Folders
To check user folders, you can run reconstruct -r -n (recursive, no fix), which reviews the user folders and reports any errors. See "Repairing Mailboxes and the Mailboxes Database (reconstruct Command)" for more information on the reconstruct command.
Monitoring imapd, popd and httpd
These processes provide access to IMAP, POP, and Webmail services. If any of them is not running or not responding, the corresponding service does not function properly. If a service is running but overloaded, monitoring lets you detect this and adjust its configuration accordingly.
Symptoms of imapd, popd and httpd Problems
Connections are refused, or the system is too slow to connect. For example, if IMAP is not running and you try to connect to the IMAP port directly, you see something like this:
telnet 0 143
Trying 0.0.0.0...
telnet: Unable to connect to remote host: Connection refused
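The telnet test above can be scripted. The following Python sketch attempts a TCP connection and classifies the result as open, refused, timed out, or another error. The function name and the 5-second default timeout are assumptions. An open port shows only that something is listening; it does not prove that imapd, popd, or httpd is answering correctly, so use the msprobe and watcher functions described in this section for that.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection and classify the result.

    Returns "open" if the connection is accepted, "refused" if nothing is
    listening (what telnet reports as "Connection refused"), "timeout" if the
    host does not answer in time, or "error" for other socket failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, probe_port("localhost", 143) returns "refused" in the situation shown by the telnet output above.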
If you try to connect with a client, you will get a message such as:
"Client is unable to connect to the server at the location you have specified. The server may be down or busy."
To Monitor imapd, popd and httpd
-
Can be monitored with watcher and msprobe. See "Automatic Restart of Failed or Unresponsive Services" and "Monitoring Using msprobe and watcher Functions" for more information.
-
Can be monitored with SNMP. If you have SNMP set up, this is a very good way to monitor these processes (see "SNMP Support"). The server information is in the Network Services Monitoring MIB.
-
Check log files. Look in the MessagingServer_home/log/service directory, where service is HTTP, IMAP, or POP. One file is named after the service itself (imap, pop, or http); the others have a sequence number and a date appended to the service name. For example:
imap
imap.29.1010221593
imap.31.1010394412
imap.33.1010567224
The file with just the service name is the latest log. The other files are ordered by sequence number (here 29, 31, and 33), and the one with the highest sequence number is the next newest (see "Using Message Store Log Messages").
If a server was shut down you might see something like this:
imap.12.1065431243:[07/Oct/2003:01:15:43 -0700] gotmail-2 imapd[20525]: General Warning: Sun Java System Messaging Server IMAP4 6.1 (built Sep 24 2003) shutting down
-
Can be checked with "counterutil".
See "Gathering Message Store Counter Statistics by Using counterutil".
-
Run the platform-specific command to verify that the imapd, popd and httpd processes are running. For example, in Oracle Solaris you can use the ps command and look for imapd, popd and mshttpd.
-
You can set alarms for specified server performance thresholds by setting the server response configuration options described in "Alarm Messages".
-
See "immonitor-access".
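The log file naming scheme described in this list can be used to order a service's logs. This Python sketch, with a hypothetical function name, puts the file named after the service first and then sorts the rotated files by sequence number, newest first. It compares sequence numbers as integers, so imap.10 sorts ahead of imap.9.

```python
import re
from typing import List


def order_service_logs(names: List[str], service: str) -> List[str]:
    """Order a service's log files from newest to oldest.

    The file named exactly after the service (for example, imap) is the current
    log; rotated files are named service.<sequence>.<date>, and a higher
    sequence number means a newer file. Names for other services are ignored.
    """
    rotated = re.compile(rf"^{re.escape(service)}\.(\d+)\.(\d+)$")
    current = [n for n in names if n == service]
    older = []
    for n in names:
        m = rotated.match(n)
        if m:
            older.append((int(m.group(1)), n))
    older.sort(reverse=True)
    return current + [n for _, n in older]
```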
Monitoring the stored Process
"stored" performs a variety of important tasks such as deadlock and transaction operations of the message database, enforcing aging policies, and expunging and erasing messages stored on disk. If stored stops running, the messaging server will eventually run into problems. If stored does not start when start-msg is run, no other processes will start. See "stored" for more information.
To Monitor stored
-
Check that the stored process is running. stored creates and updates a pid file in DataRoot/proc called store. The pid file shows an init state when recovering and a ready state when ready. For example:
231: cat store
28250
ready
The number on the first line is the process ID of stored.
232: ps -eaf | grep stored
inetuser 28250 1 0 Jan 05 ? 8:44 /opt/sun/comms/messaging64/lib/stored -d
-
Check for log file buildup in MessagingServer_home/store/mboxlist. Not every log file buildup is caused directly by stored problems; log files may also build up if imapd dies or there is a database problem.
-
Check the timestamp on the following files in MessagingServer_home/config:
stored.ckp - Touched when an attempt at checkpointing is made. Should be timestamped every minute.
stored.lcu - Touched at every database log cleanup. Should be timestamped every 5 minutes.
stored.per - Touched at every spawn of peruser db write-out. Should be timestamped every 60 minutes.
-
Check for stored messages in the default log file DataRoot/log/default/default.
-
Can be monitored with watcher and msprobe. See "Automatic Restart of Failed or Unresponsive Services" and "Monitoring Using msprobe and watcher Functions" for more information.
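The first check in this list can be scripted. This Python sketch reads the DataRoot/proc/store file, treats the first value as the process ID and the second as the state (init or ready), and then uses signal 0 to test whether that process exists. The function names are hypothetical, and the signal-0 test is POSIX-specific, so it suits Oracle Solaris and Linux hosts.

```python
import os
from typing import Tuple


def read_store_pidfile(path: str) -> Tuple[int, str]:
    """Parse DataRoot/proc/store: the first value is the stored process ID and
    the second is its state (init while recovering, ready when ready)."""
    with open(path) as f:
        tokens = f.read().split()
    if len(tokens) < 2:
        raise ValueError(f"unexpected pid file contents in {path}")
    return int(tokens[0]), tokens[1]


def process_exists(pid: int) -> bool:
    """POSIX-only: signal 0 checks that the process exists without affecting it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stored_status(pidfile: str) -> str:
    pid, state = read_store_pidfile(pidfile)
    if not process_exists(pid):
        return f"stale pid file: process {pid} is not running"
    return f"stored pid {pid} is running, state={state}"
```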
Monitoring the State of Message Store Database Locks
Database locks are held by various server processes, and their state can affect message store performance. When deadlocks occur, messages are not inserted into the store at reasonable speeds, and the ims-ms channel queue grows as a result. Because there are legitimate reasons for a queue to back up, keeping a history of the queue length helps you diagnose problems.
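To keep the queue-length history suggested above, a fixed-size buffer of samples is enough. This Python sketch stores timestamped samples and reports sustained growth when each of the last few samples is larger than the one before. How you measure the ims-ms queue length is outside the sketch; the class name, the 288-sample buffer (one day at 5-minute intervals), and the 6-sample window are all assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class QueueHistory:
    """Keep recent queue-length samples and flag sustained growth.

    Feed it whatever periodic queue-length measurement your site already collects.
    """

    def __init__(self, max_samples: int = 288):
        # Oldest samples are discarded automatically once the buffer is full.
        self.samples: Deque[Tuple[float, int]] = deque(maxlen=max_samples)

    def record(self, timestamp: float, length: int) -> None:
        self.samples.append((timestamp, length))

    def growing(self, window: int = 6) -> bool:
        """True if each of the last `window` samples is larger than the one before it."""
        if len(self.samples) < window:
            return False
        recent = [length for _, length in list(self.samples)[-window:]]
        return all(a < b for a, b in zip(recent, recent[1:]))
```

A queue that grows steadily across many samples is more suspicious than a short burst, which a history like this helps you tell apart.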
Symptoms of Message Store Database Lock Problems
Transactions accumulate and are not resolved.
To Monitor Message Store Database Locks
Use the "imcheck" -s command (formerly counterutil -o db_lock).
To Monitor Mailbox Quotas and Usage
You can monitor mailbox quota usage and limits by using the "imquotacheck" utility. The imquotacheck utility generates a report that lists defined quotas and limits, and provides information on quota usage.
For example, the following command lists all user quota information:
imquotacheck
-------------------------------------------------------------------------
Domain red.example.com (diskquota = not set msgquota = not set)
quota usage
-------------------------------------------------------------------------
diskquota size(K) %use msgquota msgs %use user
# of domains = 1
# of users = 705
no quota 50418 no quota 4392 ajonk
no quota 5 no quota 2 andrt
no quota 355518 no quota 2500 ansri
...
The following example shows the quota usage for user sorook:
imquotacheck -u sorook
-------------------------------------------------------------------------
quota usage for user sorook
-------------------------------------------------------------------------
diskquota size(K) %use msgquota msgs %use user
no quota 1487 no quota 305 sorook
To list the usage of all users whose quota usage exceeds the lowest threshold in the rule file:
imquotacheck
To list quota information for the domain example.com:
imquotacheck -d example.com
To send a notification to all users in accordance with the default rule file:
imquotacheck -n
To send a notification to all users in accordance with a specified rule file, myrulefile, and a specified mail template file, mytemplate.file (for more information, see "imquotacheck"):
imquotacheck -n -r myrulefile -t mytemplate.file
To list per-folder usage for the user user1 (the rule file is ignored):
imquotacheck -u user1 -e
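If you run these reports from a script, it can help to build the command line in one place. This Python sketch assembles imquotacheck arguments using only the options shown in the examples above (-u, -d, -n, -r, -t, and -e) and runs the command with subprocess. The function names are hypothetical; see "imquotacheck" for the full option list and for which combinations are valid.

```python
import subprocess
from typing import List, Optional


def imquotacheck_argv(user: Optional[str] = None, domain: Optional[str] = None,
                      notify: bool = False, rulefile: Optional[str] = None,
                      template: Optional[str] = None, per_folder: bool = False,
                      binary: str = "imquotacheck") -> List[str]:
    """Assemble an imquotacheck command line from the options shown above."""
    argv = [binary]
    if user:
        argv += ["-u", user]
    if domain:
        argv += ["-d", domain]
    if notify:
        argv.append("-n")
    if rulefile:
        argv += ["-r", rulefile]
    if template:
        argv += ["-t", template]
    if per_folder:
        # The examples above show -e only together with -u.
        if not user:
            raise ValueError("per-folder usage (-e) is shown only with -u")
        argv.append("-e")
    return argv


def run_imquotacheck(**options) -> str:
    """Run imquotacheck and return its report text (the utility must be on PATH)."""
    result = subprocess.run(imquotacheck_argv(**options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, imquotacheck_argv(notify=True, rulefile="myrulefile", template="mytemplate.file") yields the same arguments as the notification example above.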
To Monitor Message Store Database Statistics with imcheck
Use imcheck -s to monitor database statistics including logs and transactions. See "imcheck" for more information.
Note:
The imcheck -s command is only valid for the classic message store.
Gathering Message Store Counter Statistics by Using counterutil
This section describes how to use the counterutil utility to gather message store statistics.
To Get a Current List of Available Counter Objects
This utility provides statistics acquired from different system counters (see "counterutil").
Here is how to get a current list of available counter objects:
counterutil -l
Listing registry (/opt/sun/comms/messaging64/data/counter/counter)
numobjects = 7
refcount = 20
created = 17/Mar/2015:14:10:03 +0000
modified = 24/Aug/2015:13:00:24 +0000
counterobjects:
imapstat
popstat
alarm
serverresponse
diskusage
httpstat
mmpstat
Each entry represents a counter object and supplies a variety of useful counts for that object. This section discusses only the alarm, diskusage, serverresponse, popstat, imapstat, and httpstat counter objects. See "counterutil" for details on counterutil command usage.
counterutil Output
"counterutil" has a variety of flags. A command format for this utility may be as follows:
counterutil -o CounterObject -i 5 -n 10
where:
-o CounterObject specifies the counter object: alarm, diskusage, serverresponse, popstat, imapstat, or httpstat.
-i 5 specifies a 5-second interval.
-n 10 specifies the number of iterations (default: infinity).
An example of counterutil usage is as follows:
counterutil -o imapstat -i 5 -n 10
Monitor counteroobject (imapstat)
registry /gotmail/iplanet/server5/msg-gotmail/counter/counter opened
counterobject imapstat opened
count = 1 at 972082466
rh = 0xc0990 oh = 0xc0968
global.currentStartTime [4 bytes]: 17/Oct/2000:12:44:23 -0700
global.lastConnectionTime [4 bytes]: 20/Oct/2000:15:53:37 -0700
global.maxConnections [4 bytes]: 69
global.numConnections [4 bytes]: 12480
global.numCurrentConnections [4 bytes]: 48
global.numFailedConnections [4 bytes]: 0
global.numFailedLogins [4 bytes]: 15
global.numGoodLogins [4 bytes]: 10446
...
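To feed counterutil output into other tools, you can collect the counter lines into a dictionary. This Python sketch assumes the name [N bytes]: value layout shown in the example above; the function name is hypothetical, and lines in any other layout are skipped.

```python
import re
from typing import Dict, Union

# Matches counter lines such as "global.maxConnections [4 bytes]: 69".
COUNTER_LINE = re.compile(r"^\s*(?P<name>[\w.]+)\s+\[\d+ bytes\]:\s*(?P<value>.*?)\s*$")


def parse_counters(text: str) -> Dict[str, Union[int, str]]:
    """Collect counter values from counterutil output into a dictionary.

    Purely numeric values become ints; anything else (such as timestamps) stays
    a string. If the text contains several iterations, later values overwrite
    earlier ones.
    """
    counters: Dict[str, Union[int, str]] = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            value = m.group("value")
            counters[m.group("name")] = int(value) if value.isdigit() else value
    return counters
```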
Gathering Alarm Statistics by Using counterutil
These alarm statistics refer to the alarms sent by stored. Table 30-2 shows the statistics provided by the alarm counter.
Table 30-2 counterutil alarm Statistics
Suffix | Description |
---|---|
alarm.countoverthreshold | Number of times the threshold was crossed. |
alarm.countwarningsent | Number of warnings sent. |
alarm.current | Current monitored value. |
alarm.high | Highest value ever recorded. |
alarm.low | Lowest value ever recorded. |
alarm.timelastset | Last time the current value was set. |
alarm.timelastwarning | Last time a warning was sent. |
alarm.timereset | Last time a reset was performed. |
alarm.timestatechanged | Last time the alarm state changed. |
alarm.warningstate | Warning state (yes(1) or no(0)). |
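Table 30-2 lists suffixes, which suggests that each alarm's counters share a common name prefix. Assuming counter names of the form alarm-name.alarm.suffix (an assumption; check the names that counterutil -o alarm prints on your system), this Python sketch lists the alarms currently in the warning state. The function name and the alarm names in the usage example are hypothetical.

```python
from typing import Dict, List, Union


def alarms_in_warning(counters: Dict[str, Union[int, str]]) -> List[str]:
    """Return the names of alarms whose warningstate counter is 1 (yes).

    Assumes counters are named <alarm-name>.alarm.<suffix>, using the
    suffixes from Table 30-2.
    """
    suffix = ".alarm.warningstate"
    return sorted(
        name[: -len(suffix)]
        for name, value in counters.items()
        if name.endswith(suffix) and str(value).strip() == "1"
    )
```

For example, alarms_in_warning({"diskavail.alarm.warningstate": 1, "serverresponse.alarm.warningstate": 0}) returns ["diskavail"].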
IMAP, POP, HTTP, and MMP Connection Statistics by Using counterutil
To get information on the number of current IMAP, POP, HTTP, and MMP connections, the number of failed logins, the total number of connections since the start time, and so forth, use the command counterutil -o CounterObject -i 5 -n 10, where CounterObject is popstat, imapstat, httpstat, or mmpstat. Because the MMP proxies both IMAP and POP, the mmpstat counter names are modified to distinguish the two services. Table 30-3 shows the meaning of the imapstat suffixes. The popstat and httpstat objects provide the same information in the same format and structure.
Table 30-3 counterutil imapstat Statistics
Suffix | Description |
---|---|
currentStartTime | Start time of the current IMAP server process. |
lastConnectionTime | Last time a new client was accepted. |
maxConnections | Highest recorded number of concurrent TCP connections handled by the IMAP server since the last counter reset. |
numConnections | Total number of TCP connections successfully accepted by the current IMAP server. numConnections sometimes, but not always, includes failed connections. |
numCurrentConnections | Current number of active TCP connections. |
numFailedConnections | Total number of failed TCP connections for the current IMAP server. This number accumulates until the server restarts or the counter is reset by "counterutil". numFailedConnections counts abnormally terminated connections, including unsuccessful accepts and connections that were successfully accepted but later had an error. An error message is logged when a connection fails with an expected error. You can check your IMAP log files for error messages such as "Unable to accept client connection: <error message>" and "Socket error : <error message>". |
numFailedLogins | Number of failed system logins served by the current IMAP server. |
numGoodLogins | Number of successful system logins served by the current IMAP server. |
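Ratios derived from these counters are often easier to track than raw totals. This Python sketch computes the share of failed logins and the ratio of current connections to the recorded maximum, using the values from the imapstat example earlier in this chapter. The function names are hypothetical. Note that maxConnections is the highest count recorded since the last reset, not a configured limit, so the second ratio shows how close the current load is to the observed peak.

```python
def login_failure_rate(num_failed_logins: int, num_good_logins: int) -> float:
    """Fraction of logins that failed since the counters were last reset."""
    total = num_failed_logins + num_good_logins
    return num_failed_logins / total if total else 0.0


def current_to_peak(num_current_connections: int, max_connections: int) -> float:
    """Current connections as a fraction of the highest count recorded since the last reset."""
    return num_current_connections / max_connections if max_connections else 0.0


# Values from the sample imapstat output: numFailedLogins=15, numGoodLogins=10446,
# numCurrentConnections=48, maxConnections=69.
print(f"{login_failure_rate(15, 10446):.2%}")  # prints 0.14%
print(f"{current_to_peak(48, 69):.0%}")        # prints 70%
```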
Disk Usage Statistics by Using counterutil
Table 30-4 shows the information generated by the counterutil -o diskusage command.
Table 30-4 counterutil diskusage Statistics
Suffix | Description |
---|---|
diskusage.availSpace | Space available in the disk partition. Values are scaled to fit in the 4-byte counter: on a very large file system, the actual number is divided by 1024 until it is small enough to fit in a 32-bit integer. |
diskusage.lastStatTime | Last time the statistic was taken. |
diskusage.mailPartitionPath | Mail partition path. |
diskusage.percentAvail | Percentage of disk partition space available. |
diskusage.totalSpace | Total space in the disk partition. Values are scaled to fit in the 4-byte counter: on a very large file system, the actual number is divided by 1024 until it is small enough to fit in a 32-bit integer. |
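The scaling rule for diskusage.availSpace and diskusage.totalSpace can be sketched as follows. This Python code assumes an unsigned 32-bit limit and truncating integer division; the server's exact limit and rounding are not stated here, and the function names are hypothetical. Because none of the counters in Table 30-4 records how many divisions were applied, diskusage.percentAvail is likely the simpler value to alert on.

```python
from typing import Tuple


def scale_to_counter(value: int, limit: int = 2**32 - 1) -> Tuple[int, int]:
    """Divide by 1024 until the value fits under `limit` (assumed unsigned 32-bit).

    Returns (scaled_value, number_of_divisions).
    """
    divisions = 0
    while value > limit:
        value //= 1024
        divisions += 1
    return value, divisions


def unscale(scaled: int, divisions: int) -> int:
    """Approximate the original value; precision lost to integer division is not recovered."""
    return scaled * 1024 ** divisions
```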
Server Response Statistics
Table 30-5 shows the information generated by the counterutil -o serverresponse command. This information is useful for checking whether the servers are running and how quickly they are responding.
Table 30-5 counterutil serverresponse Statistics
Suffix | Description |
---|---|
http.laststattime | Last time the HTTP server response was checked. |
http.responsetime | Response time of the HTTP server. |
imap.laststattime | Last time the IMAP server response was checked. |
imap.responsetime | Response time of the IMAP server. |
pop.laststattime | Last time the POP server response was checked. |
pop.responsetime | Response time of the POP server. |
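If you poll these counters, you may want to flag a laststattime that has not advanced recently, because the corresponding responsetime may then be out of date. This Python sketch assumes laststattime is printed in the same format that counterutil uses for the imapstat timestamps shown earlier (for example, 17/Oct/2000:12:44:23 -0700); that format, the function name, and the choice of maximum age are assumptions.

```python
from datetime import datetime, timedelta

# Timestamp format seen in counterutil output, e.g. "17/Oct/2000:12:44:23 -0700".
COUNTER_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def is_stale(laststattime: str, now: datetime, max_age: timedelta) -> bool:
    """True if the last response check is older than max_age.

    `now` must be timezone-aware, because the parsed timestamp carries a UTC offset.
    """
    checked = datetime.strptime(laststattime, COUNTER_TIME_FORMAT)
    return now - checked > max_age
```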