CAS 5 Load Tests by Lafayette College


Contributed Content
Carl Waldbieser (<waldbiec [at] lafayette.edu>), an active member of the CAS community, was kind enough to share this analysis.

Overview

Load testing trials were conducted on the CAS stage environment in order to provide insight into what kind of sustained load the production CAS service will be able to carry. All trials were carried out against the same deployment architecture, with all nodes configured identically.

Deployment Architecture

The deployment architecture itself consists of 3 virtual machine nodes. Each node has 3.7 GiB real memory available to it and 2 CPUs.

The characteristics of the CPUs are as follows:

  • Architecture: x86_64
  • CPU op-mode(s): 32-bit, 64-bit
  • Byte Order: Little Endian
  • CPU(s): 2
  • On-line CPU(s) list: 0,1
  • Thread(s) per core: 1
  • Core(s) per socket: 2
  • Socket(s): 1
  • NUMA node(s): 1
  • Vendor ID: GenuineIntel
  • CPU family: 6
  • Model: 42
  • Model name: Intel Xeon E312xx (Sandy Bridge)
  • Stepping: 1
  • CPU MHz: 1899.999
  • BogoMIPS: 3799.99
  • Hypervisor vendor: KVM
  • Virtualization type: full
  • L1d cache: 32K
  • L1i cache: 32K
  • L2 cache: 4096K
  • NUMA node0 CPU(s): 0,1

The nodes are deployed behind an Nginx+ proxy in an active-active-active configuration. The nodes share ticket information using encrypted Hazelcast messages, a feature built into the CAS software, so any application state is shared.

The Test Swarm

The testing framework used was locust.io, a Python-based load testing framework. The test suite deploys a fixed number of “locusts” against a web site. To learn more about locust, please see this guide.

The initial population ramps up with a configurable “hatch rate”. In the tests, locusts were conceptually divided into 3 “lifetime” categories:

  • Short-lived locusts live approximately 60 seconds.
  • Medium-lived locusts last for approximately 5 minutes.
  • Long-lived locusts exist for approximately 2 hours.

The category to which a given locust is assigned is determined randomly, with a short : medium : long ratio of 7:2:1. Ideally, 70% of the population is short-lived, 20% is medium-lived, and 10% is long-lived.
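The 7:2:1 assignment can be sketched as a weighted random choice. This is a minimal illustration using only Python's standard library, not the test suite's actual code; the names `LIFETIMES`, `WEIGHTS`, `pick_category`, and `sample_population` are hypothetical.

```python
import random
from collections import Counter

# Approximate lifetimes in seconds, from the three categories above.
LIFETIMES = {"short": 60, "medium": 5 * 60, "long": 2 * 60 * 60}

# Relative weights for short : medium : long (7:2:1).
WEIGHTS = {"short": 7, "medium": 2, "long": 1}


def pick_category(rng):
    """Return one lifetime category, chosen with 7:2:1 weighting."""
    names = list(WEIGHTS)
    weights = [WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def sample_population(size, seed=None):
    """Assign categories to `size` locusts and count each category."""
    rng = random.Random(seed)
    return Counter(pick_category(rng) for _ in range(size))


# Example: a swarm of 300 locusts, as in trials 1 and 2.
counts = sample_population(300, seed=1)
for name in WEIGHTS:
    print(f"{name:>6}: {counts[name]:3d} locusts, ~{LIFETIMES[name]} s each")
```

Because the assignment is random, any particular swarm only approximates the 70/20/10 split, which is why the split is described as the ideal.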

The lifetime of a locust determines how long it will retain and make use of a single web SSO session. Short-lived locusts discard their sessions quickly. Long-lived locusts hold on to them for considerable time.

All locusts are only 25% likely to log out upon their deaths. The CAS service must continue to track TGTs of locusts that have not logged out until the ticket expires, so this behavior can put pressure on the memory storage resources of the nodes.
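To get a feel for that memory pressure, the arithmetic below gives a rough upper bound on how many abandoned TGTs a swarm could leave in the ticket registry. It is a back-of-the-envelope sketch, not a measurement: it assumes the ideal 7:2:1 mix, ignores ramp-up and failed logins, and assumes every abandoned TGT survives for the full 8-hour maximum lifetime cited for trial 4. All names are hypothetical.

```python
# Ideal population share and approximate lifetime (seconds) per category.
CATEGORIES = {
    "short": (0.7, 60),
    "medium": (0.2, 5 * 60),
    "long": (0.1, 2 * 60 * 60),
}
NO_LOGOUT_CHANCE = 0.75         # locusts log out only 25% of the time
TGT_MAX_LIFETIME = 8 * 60 * 60  # seconds; assumed to apply to every abandoned TGT


def sessions_per_second(locusts):
    """New SSO sessions per second: each locust starts one per lifetime."""
    return sum(locusts * share / lifetime for share, lifetime in CATEGORIES.values())


def abandoned_tgts_upper_bound(locusts):
    """Abandoned TGTs held at steady state if none expire early."""
    return sessions_per_second(locusts) * NO_LOGOUT_CHANCE * TGT_MAX_LIFETIME


for swarm in (300, 150, 50):
    bound = abandoned_tgts_upper_bound(swarm)
    print(f"{swarm:3d} locusts: up to ~{bound:,.0f} abandoned TGTs")
```

Under these assumptions a 300-locust swarm could leave on the order of 80,000 abandoned TGTs in the registry. The real figure depends on the deployment's ticket expiration policy, which may expire idle tickets sooner.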

Each locust uses credentials taken randomly from one of 9 test accounts. Each locust has a 1% chance of entering an erroneous password for an account. Locusts that fail to authenticate will die immediately.

When a locust dies, it is reborn immediately. Its lifetime category remains the same, but its SSO session and all other random parameters are reset.
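The life cycle described above can be sketched as a small simulation. It deliberately avoids the Locust API and models only the random decisions; the account names and every identifier are hypothetical, and treating a failed login as a death without logout is an assumption (a locust that never authenticated has no session to end).

```python
import random
from dataclasses import dataclass

# Hypothetical names standing in for the 9 test accounts.
ACCOUNTS = [f"testuser{i}" for i in range(1, 10)]
BAD_PASSWORD_CHANCE = 0.01  # 1% of logins use a wrong password
LOGOUT_CHANCE = 0.25        # 25% of locusts log out when they die


@dataclass
class Life:
    account: str
    authenticated: bool
    logged_out: bool


def live_once(rng):
    """Simulate the random choices made during one life of a locust."""
    account = rng.choice(ACCOUNTS)
    if rng.random() < BAD_PASSWORD_CHANCE:
        # Failed authentication: the locust dies immediately.
        return Life(account, authenticated=False, logged_out=False)
    # ... the SSO session would be used here for the locust's lifetime ...
    logged_out = rng.random() < LOGOUT_CHANCE
    return Life(account, authenticated=True, logged_out=logged_out)


def simulate(lives, seed=None):
    """A reborn locust resets its random parameters, so lives are independent."""
    rng = random.Random(seed)
    return [live_once(rng) for _ in range(lives)]


history = simulate(1000, seed=7)
abandoned = sum(1 for life in history if life.authenticated and not life.logged_out)
print(f"{abandoned} of {len(history)} lives left a TGT behind without logging out")
```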

The Trials

Trial 1

  • # of Locusts: 300
  • Hatch Rate: 10/s

The first trial produced authentication events at a rate of over 3,500 events/minute. The majority of these were service ticket creation and validation events. By 13:40 (~5 hours into the trial), degraded performance became noticeable. By 15:50 (~7 hours in), the nodes were swamped. The trial was discontinued shortly thereafter.

Trial 2

  • # of Locusts: 300
  • Hatch Rate: 10/s

The 2nd trial was similar to the first, but the number of locusts was briefly increased to 500 for 3 minutes.

The characteristics of trial 2 are very similar to those of trial 1. After responding to ~3,500 events/minute for close to 6 hours, the nodes were overwhelmed.

Trial 3

  • # of Locusts: 150
  • Hatch Rate: 10/s

The 3rd trial used half the number of locusts used in the first 2 trials. The sustained event rate was ~1,700 authentication events per minute. Unlike the previous 2 trials, this trial was concluded before the nodes became overwhelmed. It is unknown whether the nodes could have sustained this rate indefinitely.

Trial 4

  • # of Locusts: 50
  • Hatch Rate: 10/s

The 4th trial showed the nodes were capable of sustaining a rate of ~600 authentication events/minute for a full 24 hours. Because the maximum lifetime of a TGT is 8 hours, the trial ran long enough for tickets created early on to expire several times over, so there is some reason to believe this rate could have been sustained indefinitely.

Trial 5

  • # of Locusts: 25
  • Hatch Rate: 10/s

This trial was useful in establishing a mean rate of 291 events per minute for a swarm of 25 locusts. This trial and trial 6 are notably shorter in duration than the others, as it was assumed at this point that the nodes could sustain these loads indefinitely.

Trial 6

  • # of Locusts: 10
  • Hatch Rate: 10/s

The mean rate of authentication events for this trial is 118 events per minute.
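Dividing each trial's event rate by its swarm size gives a per-locust rate, which helps when comparing trials. The sketch below uses the figures quoted above; the rates for trials 1 to 4 are the approximate values from the text, and the names are hypothetical.

```python
# (locusts, approximate authentication events per minute) for each trial.
TRIALS = {
    1: (300, 3500),  # "over 3,500"
    2: (300, 3500),  # "~3,500"
    3: (150, 1700),  # "~1,700"
    4: (50, 600),    # "~600"
    5: (25, 291),    # reported mean
    6: (10, 118),    # reported mean
}


def events_per_locust(trial):
    """Authentication events per minute generated by a single locust."""
    locusts, events_per_minute = TRIALS[trial]
    return events_per_minute / locusts


for trial in sorted(TRIALS):
    rate = events_per_locust(trial)
    print(f"Trial {trial}: {rate:.1f} events/minute per locust")
```

Every trial works out to roughly 11 to 12 events per minute per locust, which suggests the generated load scales about linearly with the number of locusts.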

Conclusions

Measurements taken from the production CAS service from April 17-21, 2017 during normal business hours (9am to 5pm) have the following characteristics:

  • Mean: 149 events/minute
  • Median: 139 events/minute
  • Mode: 126 events/minute
  • Max: 494 events/minute
  • Min: 2 events/minute
  • Std Dev: 68 events/minute

While there appears to be some “burstiness” in the rate of authentication events, all 3 types of averages are well below the threshold that the trials suggest is indefinitely sustainable. Even the maximum rate of 494 events per minute is well below the sustained rate of trial 4 (~600 events/minute).

The data suggests that the production CAS service is operating well under the maximum sustainable load, and should have plenty of capacity to spare for temporary spikes in utilization.
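The headroom can be made concrete by expressing each production figure as a fraction of the rate sustained in trial 4. This is a simple sketch of that arithmetic; treating ~600 events/minute as the capacity is an assumption drawn from trial 4, and the names are hypothetical.

```python
# Approximate rate sustained for 24 hours in trial 4 (events/minute).
SUSTAINED_RATE = 600

# Production measurements from April 17-21, 2017 (events/minute).
PRODUCTION = {"mean": 149, "median": 139, "mode": 126, "max": 494}


def utilization(rate, capacity=SUSTAINED_RATE):
    """Fraction of the sustainable rate consumed by `rate`."""
    return rate / capacity


for name, rate in PRODUCTION.items():
    share = utilization(rate)
    print(f"{name:>6}: {rate:3d} events/minute = {share:.0%} of trial 4's rate")
```

The averages come to about a quarter of the trial 4 rate or less, and the observed maximum to roughly 82% of it.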
