Contributed Content
Carl Waldbieser, an active member of the CAS community, was kind enough to share this analysis.

Lafayette College has an active user base of XXX and records 78 CAS authentication events/minute on average, with peaks of 220 events/minute. In preparation for deploying CAS 5.1.x, locust.io was used to put CAS through load, soak, and stress tests. Results indicate that CAS 5.1.x, deployed on reasonable hardware in a multi-node architecture using Nginx+ and Hazelcast, handles event rates well above those seen in production. The deployment architecture, testing scenarios, and results are detailed in the rest of this blog post.

In preparation for a service upgrade from CAS server version 5.0.x to version 5.1.x, load testing trials were conducted on the CAS stage environment. All trials were carried out against the same deployment architecture, with all nodes configured identically. The deployment architecture and nodes have not changed since the last load test was conducted around April 25, 2017.

Overview

The deployment architecture itself consists of 3 virtual machine nodes:

cas3.stage.lafayette.edu
cas4.stage.lafayette.edu
cas5.stage.lafayette.edu

[Figure: deployment architecture diagram]

Each node has 2 CPUs and 3.7 GiB of real memory available to it. The characteristics of the CPUs are as follows:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel Xeon E312xx (Sandy Bridge)
Stepping: 1
CPU MHz: 1899.999
BogoMIPS: 3799.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0,1

The nodes are deployed behind an Nginx+ proxy in an active-active-active configuration. The nodes share ticket information using encrypted Hazelcast messages, so application state is shared across all three nodes.

The Test Swarm

The testing framework used was locust.io, a Python-based load testing framework. The test suite deploys a fixed number of “locusts” against a web site. The initial population ramps up at a configurable “hatch rate”. In the tests, locusts were conceptually divided into 3 “lifetime” categories:

Short-lived locusts live approximately 60 seconds.
Medium-lived locusts last for approximately 5 minutes.
Long-lived locusts exist for approximately 2 hours.

The category to which a given locust is assigned is determined randomly, with a short : medium : long ratio of 7:2:1. Ideally, 70% of the population is short-lived, 20% is medium-lived, and 10% is long-lived.

The lifetime of a locust determines how long it will retain and make use of a single web SSO session. Short-lived locusts discard their sessions quickly. Long-lived locusts hold on to them for a considerable time. Throughout their lives, all locusts request and validate a service ticket every 5-15 seconds.

Each locust has only a 25% chance of logging out when it dies. The CAS service must continue to track the TGTs of locusts that have not logged out until those tickets expire, so this behavior can put pressure on the memory storage resources of the nodes.

Each locust uses credentials taken randomly from one of 9 test accounts. Each locust has a 1% chance of entering an erroneous password for an account. Locusts that fail to authenticate will die immediately.

When a locust dies, it is reborn immediately. Its lifetime category remains the same, but its SSO session and all other random parameters are reset.
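The behavior rules above can be summarized in a short sketch. This is not the test suite used in the trials; it is plain Python with no dependency on locust.io, and every name in it (`LocustLife`, `pick_category`, `new_life`, `next_wait_seconds`) is hypothetical. It only shows how the stated probabilities and lifetimes might be sampled.

```python
import random
from dataclasses import dataclass

# Values taken from the description above; the names are illustrative only.
LIFETIME_SECONDS = {"short": 60, "medium": 5 * 60, "long": 2 * 60 * 60}
CATEGORY_WEIGHTS = {"short": 7, "medium": 2, "long": 1}
LOGOUT_PROBABILITY = 0.25
BAD_PASSWORD_PROBABILITY = 0.01
TEST_ACCOUNT_COUNT = 9


@dataclass
class LocustLife:
    category: str
    lifetime_seconds: int
    account_index: int
    bad_password: bool        # a failed login ends this life immediately
    logs_out_on_death: bool   # otherwise the TGT lingers until it expires


def pick_category(rng: random.Random) -> str:
    """Assign a lifetime category once, using the 7:2:1 ratio."""
    names = list(CATEGORY_WEIGHTS)
    weights = list(CATEGORY_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def new_life(category: str, rng: random.Random) -> LocustLife:
    """Reset the random parameters at (re)birth; the category is kept."""
    return LocustLife(
        category=category,
        lifetime_seconds=LIFETIME_SECONDS[category],
        account_index=rng.randrange(TEST_ACCOUNT_COUNT),
        bad_password=rng.random() < BAD_PASSWORD_PROBABILITY,
        logs_out_on_death=rng.random() < LOGOUT_PROBABILITY,
    )


def next_wait_seconds(rng: random.Random) -> float:
    """Pause between service ticket request/validate cycles."""
    return rng.uniform(5, 15)
```

A real locust.io test would wrap these choices in the framework's own user and task classes; the sketch leaves that out so the sampling logic stays visible.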

SSO Session Tracking

SSO sessions are tracked by the TGTs they produce. Any event that creates or destroys a TGT is logged, and these observations are plotted after the fact. Because only 25% of locusts explicitly end a session, many sessions accumulate and consume storage in the CAS ticket registry until they time out. Using the proportions of long-, medium-, and short-lived locusts in the population, the actual number of active sessions at any time is estimated. The charts produced should provide a reasonable estimate of how many simultaneous sessions are being managed by the CAS service at any given time.
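The counting step described above, tallying TGT creation and destruction events over time, might look like the following sketch. The event format (an ISO 8601 timestamp paired with `"created"` or `"destroyed"`) and the function name `net_sessions_by_minute` are assumptions for illustration; the post does not show the actual log format or the plotting code.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate


def net_sessions_by_minute(events):
    """Return [(minute, running net TGT count)] from (timestamp, kind) pairs.

    `kind` is assumed to be "created" or "destroyed" (hypothetical labels).
    """
    deltas = Counter()
    for stamp, kind in events:
        # Truncate each event to the start of its minute.
        minute = datetime.fromisoformat(stamp).replace(second=0, microsecond=0)
        if kind == "created":
            deltas[minute] += 1
        elif kind == "destroyed":
            deltas[minute] -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    minutes = sorted(deltas)
    # The running sum of per-minute deltas is the net number of open sessions.
    running = accumulate(deltas[m] for m in minutes)
    return list(zip(minutes, running))
```

The running totals are what a chart of simultaneous sessions would plot against time.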

Trial	01
Date / duration	2017-09-05 from 09:30:00-04:00 until 16:44:00-04:00 (7h 14m)
Number of locusts	150
Hatch rate	10/s

The first trial produced authentication events at a rate of 1,800.11 events/minute. The majority of these were service ticket creation and validation events. The trial was concluded with no noticeable degradation in performance.

Net SSO sessions increased at a rate of 73.5 sessions per minute until the idle session timeout duration was reached.

Trial	02
Date / duration	2017-09-20, 09:00:00-04:00 - 17:00:00-04:00 (8 hours)
Number of locusts	50
Hatch rate	10/s

The CAS service handled an average of 600.46 events per minute under load during this trial. There were no noticeable service disruptions.

Net SSO sessions increased at a rate of 27.4 sessions per minute, until the session idle timeout duration was reached.

Trial	03
Date / duration	2017-09-22, 09:05:00-04:00 - 09:33:00-04:00 (28 minutes)
Number of locusts	175
Hatch rate	10/s

Net SSO sessions increased at a rate of 82.9 sessions per minute.

Trial	04
Date / duration	2017-09-22, 11:49:00-04:00 - 12:30:00-04:00 (41 minutes)
Number of locusts	200
Hatch rate	10/s

Net SSO sessions increased at a rate of 93.0 sessions per minute.

Trial	05
Date / duration	2017-09-22, 15:10:00-04:00 - 15:47:00-04:00 (37 minutes)
Number of locusts	125
Hatch rate	10/s

Net SSO sessions increased at a rate of 64.0 sessions per minute.

Trial	06
Date / duration	2017-09-22, 16:35:00-04:00 - 16:50:00-04:00 (15 minutes)
Number of locusts	250
Hatch rate	10/s

Net SSO sessions increased at a rate of 124.6 sessions per minute.

Effect of Number of Locusts on Mean Rate of Events

Observations from the previous and current trials were plotted to show how the number of locusts in the test swarm influences the mean rate of events the service processes each minute. The data suggest that each additional locust generates approximately 12 more events per minute.

Observed and Predicted Mean Rates

locusts	mean_rate (predicted, events/minute)	mean_rate_observed (events/minute)
0	1.09	N/A
25	300.51	N/A
50	599.93	600.46
75	899.35	N/A
100	1,198.76	N/A
125	1,498.18	1,496.84
150	1,797.60	1,800.11
175	2,097.02	2,095.36
200	2,396.43	2,395.12
225	2,695.85	N/A
250	2,995.27	2,996.53
275	3,294.69	N/A
300	3,594.10	N/A
325	3,893.52	N/A
350	4,192.94	N/A
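The roughly 12 events per minute per locust can be checked against the six observed values in the table. The sketch below assumes an ordinary least-squares line; the post does not say which method produced the predicted column, so this is an illustration rather than the author's procedure. The function names are hypothetical.

```python
# (locusts, observed mean events/minute) taken from the table above.
OBSERVED = [
    (50, 600.46),
    (125, 1496.84),
    (150, 1800.11),
    (175, 2095.36),
    (200, 2395.12),
    (250, 2996.53),
]


def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rate(locusts, slope, intercept):
    """Predicted mean events/minute for a given swarm size."""
    return intercept + slope * locusts


slope, intercept = least_squares_line(OBSERVED)
print(f"slope={slope:.2f} events/minute per locust, intercept={intercept:.2f}")
```

Under that assumption the fitted slope comes out close to 12 events per minute per locust, in line with the predicted column.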

Effect of Number of Locusts on Increase in SSO Sessions

The rate at which net new SSO sessions are created, measured from the beginning of a trial until the discarded TGTs begin to time out, is also useful. Since it appears to be a linear function of the number of locusts, this figure can be used to predict the number of SSO sessions that would be present if a trial ran until the session timeout mark.
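As an illustration of that prediction, the per-trial session growth rates reported earlier can be reduced to a single per-locust rate. The sketch below fits a line through the origin, which is a simplifying assumption made here (zero locusts create zero sessions), not a method stated in the post. The session timeout is not given in the post, so `timeout_minutes` is a placeholder the reader must supply.

```python
# (locusts, net new SSO sessions per minute) from Trials 01-06.
SESSION_RATES = [
    (150, 73.5),
    (50, 27.4),
    (175, 82.9),
    (200, 93.0),
    (125, 64.0),
    (250, 124.6),
]


def per_locust_rate(points):
    """Least-squares slope of a line forced through the origin."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)


def projected_sessions(locusts, timeout_minutes, rate_per_locust):
    """Sessions accumulated by the time the first TGTs start timing out."""
    return locusts * rate_per_locust * timeout_minutes


rate = per_locust_rate(SESSION_RATES)
print(f"about {rate:.2f} net new sessions per minute per locust")
```

With the trial data this comes to a little under half a session per minute per locust; multiplying by swarm size and the configured timeout gives the projected session count.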

Conclusions

Measurements¹ taken from the production CAS service from September 1-22, 2017 during normal business hours (9am to 5pm) have the following characteristics:

mean	78 events / minute
median	75 events / minute
mode	59 events / minute
max	220 events / minute
min	8 events / minute
standard deviation	28

The data suggests that the production CAS service is operating well under the maximum sustainable load, and should have plenty of capacity to spare for temporary spikes in utilization.
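For a rough sense of the margin, the production figures can be set against the highest mean rate exercised in the trials (2,996.53 events/minute with 250 locusts, from the observed-rates table). The arithmetic below assumes that rate is a fair stand-in for a load the service can sustain; the post does not state a measured maximum.

```python
PRODUCTION_MEAN = 78          # events/minute
PRODUCTION_PEAK = 220         # events/minute
HIGHEST_TRIAL_RATE = 2996.53  # events/minute, 250 locusts


def headroom(test_rate, production_rate):
    """How many times larger the tested rate is than the production rate."""
    return test_rate / production_rate


print(f"vs. production mean: {headroom(HIGHEST_TRIAL_RATE, PRODUCTION_MEAN):.1f}x")
print(f"vs. production peak: {headroom(HIGHEST_TRIAL_RATE, PRODUCTION_PEAK):.1f}x")
```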

Carl Waldbieser

¹ Splunk query for Sep 1-21, 2017:

index=auth_cas (sourcetype=cas OR sourcetype=cas5) action=* date_hour >= 9 date_hour <= 16 date_wday!="saturday" date_wday!="sunday" | bin _time span=1m | stats count by _time | stats min(count) max(count)  mean(count) mode(count) median(count) stdev(count)
