Monitoring Recommendations
The following recommendations help you monitor your Jive environment effectively. To implement them, use a monitoring tool such as check_MK, Zenoss, Zyrion, or IBM/Tivoli, with a polling interval of five minutes.
If you connect Jive to external resources such as an LDAP server, SSO system, SharePoint, or NetApp storage, we recommend monitoring these external and shared resources as well. This is especially important if Jive synchronizes with an LDAP server or authenticates against an SSO system, because it helps you troubleshoot login issues. In our hosted environments, we have observed outages caused by LDAP server availability problems.
Monitoring Items
1. All Nodes
What to Monitor:
- Memory utilization
- CPU load
- Disk space
- Disk I/O activity
- Network traffic
- Clock accuracy
Why Monitor: Regular checks on these metrics assist in troubleshooting and ensuring optimal performance:
- Memory Utilization: If utilization is consistently near 75%, consider increasing memory.
- CPU Load: Healthy nodes typically show a load between 0 and 10. If the load is consistently above 5, consider capturing thread dumps with the jive snap command and then opening a case with Support.
- Disk Space: Ensure sufficient space for search indexes, attachments, images, and binary content caching. The default limit for the binstore cache is 512 MB.
- Network Traffic: Tracking traffic helps you understand traffic patterns and drop-offs.
- Clock Accuracy: Ensure clocks are synchronized in clustered environments, preferably using NTP.
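The memory and CPU load guidance above can be expressed as a simple threshold evaluation. The following Python sketch is only an illustration: the function names are hypothetical, the samples are assumed to come from your monitoring tool, and treating "consistently" as three consecutive five-minute samples is an assumption rather than a Jive recommendation.

```python
from typing import Callable, List, Sequence

MEMORY_WARN_PCT = 75.0  # threshold taken from the guidance above
LOAD_WARN = 5.0         # threshold taken from the guidance above
# Assumption: "consistently" means this many consecutive polling samples.
CONSECUTIVE_SAMPLES = 3


def consistently(samples: Sequence[float],
                 predicate: Callable[[float], bool],
                 count: int = CONSECUTIVE_SAMPLES) -> bool:
    """Return True if the most recent `count` samples all satisfy `predicate`."""
    if len(samples) < count:
        return False
    return all(predicate(value) for value in samples[-count:])


def evaluate_node(memory_pct: Sequence[float],
                  load_avg: Sequence[float]) -> List[str]:
    """Return alert messages for one node, oldest sample first in each series."""
    alerts = []
    if consistently(memory_pct, lambda v: v >= MEMORY_WARN_PCT):
        alerts.append("memory: consistently at or above 75%; consider increasing memory")
    if consistently(load_avg, lambda v: v > LOAD_WARN):
        alerts.append("cpu: load consistently above 5; consider capturing thread dumps")
    return alerts
```

For example, `evaluate_node([80, 76, 78], [6, 7, 8])` returns both alerts, while a single spike inside otherwise normal samples returns none.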
2. Jive Web Applications
What to Monitor:
- Synthetic health checks using tools such as WebInject both for individual web application servers and through the load balancer's virtual IP address.
Why Monitor: WebInject verifies the application's functionality:
- Request the login page and the homepage to check the service status and database connectivity.
- Run the checks every five minutes initially, then adjust the interval if you see false alarms.
Example Checks:
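As one illustration of a synthetic check, the Python sketch below fetches a page and verifies both the HTTP status and a marker string in the response body. It is not a WebInject test case; the function name, the marker text, and any URLs you pass in are assumptions, and it covers only an unauthenticated page request, not a full login.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple


def check_page(url: str, expected_text: str, timeout: float = 10.0,
               opener: Callable = urllib.request.urlopen) -> Tuple[bool, str]:
    """Fetch `url`; succeed only on HTTP 200 with `expected_text` in the body.

    `opener` defaults to urllib.request.urlopen and can be swapped for a stub.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection failures, timeouts, and HTTP error responses.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        return False, "expected text not found"
    return True, "ok"
```

A check like this would be run once per web application node and once against the load balancer's virtual IP address, so that a failure on a single node can be told apart from a failure of the whole service.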
3. Cache Server
What to Monitor:
- JMX hooks (heap)
- Disk space (logs)
Why Monitor:
- Heap: Monitor for excessive garbage collection; if consistently near 75%, increase heap size. For details, see Adjusting Java Virtual Machine (JVM) settings.
- Ensure sufficient disk space for logging.
4. Databases (Activity Engine, Analytics, and Web Application)
What to Monitor:
- Stats for connections, transactions, longest query time, slow queries
- Verify ETLs are running
- Disk space
- Disk I/O activity
Why Monitor: These checks can indicate potential resource issues:
- Connections: Monitor the number to manage memory usage appropriately.
- Transactions: Measure overall traffic volume, though secondary to CPU and memory usage monitoring.
- Queries: Slow queries should be logged and monitored for optimization.
- ETLs: Verify that ETLs are running to ensure data accuracy; check the jivedw_etl_job table.
- Disk Space: Keep at least 50% of disk space available on the database server to avoid complications.
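The 50% disk space guideline for the database server can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the path you pass in (for example, the database data directory) depends on your installation.

```python
import shutil

MIN_FREE_FRACTION = 0.50  # guideline from the list above


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the share of the volume that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_ok(path: str, min_free: float = MIN_FREE_FRACTION) -> bool:
    """Return True if the volume holding `path` has at least `min_free` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_fraction(usage.total, usage.free) >= min_free
```

Keeping the ratio calculation separate from the `shutil.disk_usage` call makes the threshold logic easy to verify without touching a real file system.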
5. Document Conversion
What to Monitor:
- Tomcat I/O
- Heap
- Queue statistics (average length and wait times)
- Running OpenOffice service statistics
- Overall conversion success rate
Why Monitor: Service statistics accessible via JMX help verify conversion processes.
6. Activity Engine
What to Monitor:
- Activity Engine service
- JMX hooks (heap) and ports
- Queue statistics (average length and wait times)
Why Monitor:
- Ensure service health via JMX metrics.
- Manage memory effectively based on heap usage. For queue details, see Configuring Activity Engine.
Advanced Monitoring Data Points
JMX Data Points
Node | Data Type | JMX Object Name | JMX Attribute Name | Data Point |
---|---|---|---|---|
Jive Web Applications | JVM heap memory | java.lang:type=Memory | HeapMemoryUsage | max / used |
Cache Server | JVM heap memory | java.lang:type=Memory | HeapMemoryUsage | max / used |
Activity Engine | JVM heap memory | java.lang:type=Memory | HeapMemoryUsage | max / used |
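The max and used values of HeapMemoryUsage can be combined into a single utilization figure and compared with the 75% guideline mentioned for heap sizing. The sketch below assumes the two values have already been read by whatever JMX client your monitoring tool provides; the function names are hypothetical. Note that the JVM reports a max of -1 when no maximum is defined, so that case is handled explicitly.

```python
from typing import Optional

HEAP_WARN_PCT = 75.0  # guideline used elsewhere in this document


def heap_used_pct(used: int, max_bytes: int) -> Optional[float]:
    """Return heap usage as a percentage of max, or None if max is undefined."""
    if max_bytes <= 0:
        # The JVM reports max = -1 when the maximum heap size is undefined.
        return None
    return 100.0 * used / max_bytes


def heap_needs_attention(used: int, max_bytes: int) -> bool:
    """Return True when usage is at or above the warning threshold."""
    pct = heap_used_pct(used, max_bytes)
    return pct is not None and pct >= HEAP_WARN_PCT
```

A single reading above the threshold is not necessarily a problem; the guidance concerns usage that stays near 75% across polling intervals.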
PostgreSQL Data Points
Collect PostgreSQL data points for the core application and Activity Engine databases, and optionally for the Analytics database:
Query Method | Type | Data Points |
---|---|---|
poll_postgres.py script. This script makes one query to the database. The query returns all data points at once. | Connections | Total, Active, Idle |
 | Locks | Total, Granted, Waiting, Exclusive, Access Exclusive |
 | Latencies | Connection latency, SELECT query latency |
 | Tuple Rates | Returned, Fetched, Inserted, Updated, Deleted |
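Tuple rates are derived from cumulative counters, so each rate is the difference between two polls divided by the polling interval. The sketch below is not the poll_postgres.py script, whose contents are not shown here; it assumes the counters are read from PostgreSQL's pg_stat_database view, and the SQL constant and function names are illustrative only.

```python
from typing import Dict

# Assumption: counters are read from the standard pg_stat_database view.
# Run this with the database client of your choice; it is not executed here.
TUPLE_COUNTERS_SQL = """
SELECT tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = current_database()
"""

COUNTERS = ("returned", "fetched", "inserted", "updated", "deleted")


def tuple_rates(previous: Dict[str, int], current: Dict[str, int],
                interval_seconds: float) -> Dict[str, float]:
    """Convert two cumulative counter snapshots into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name in COUNTERS:
        delta = current[name] - previous[name]
        # A negative delta means the statistics were reset between polls;
        # report zero instead of a misleading negative rate.
        rates[name] = max(delta, 0) / interval_seconds
    return rates
```

With a five-minute polling interval, `interval_seconds` would be 300, and each snapshot is a dictionary keyed by the names in `COUNTERS`.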