CAS 5 Load Tests by Lafayette College


Contributed Content
Carl Waldbieser (<waldbiec [at] lafayette.edu>), an active member of the CAS community, was kind enough to share this analysis.

Overview

Load testing trials were conducted on the CAS stage environment in order to provide insight into what kind of sustained load the production CAS service will be able to carry. All trials were carried out against the same deployment architecture, with all nodes configured identically.

Deployment Architecture

The deployment architecture itself consists of 3 virtual machine nodes. Each node has 3.7 GiB real memory available to it and 2 CPUs.

The characteristics of the CPUs are as follows:

  • Architecture: x86_64
  • CPU op-mode(s): 32-bit, 64-bit
  • Byte Order: Little Endian
  • CPU(s): 2
  • On-line CPU(s) list: 0,1
  • Thread(s) per core: 1
  • Core(s) per socket: 2
  • Socket(s): 1
  • NUMA node(s): 1
  • Vendor ID: GenuineIntel
  • CPU family: 6
  • Model: 42
  • Model name: Intel Xeon E312xx (Sandy Bridge)
  • Stepping: 1
  • CPU MHz: 1899.999
  • BogoMIPS: 3799.99
  • Hypervisor vendor: KVM
  • Virtualization type: full
  • L1d cache: 32K
  • L1i cache: 32K
  • L2 cache: 4096K
  • NUMA node0 CPU(s): 0,1

The nodes are deployed behind an Nginx+ proxy in an active-active-active configuration. The nodes share ticket information using encrypted Hazelcast messages, a feature built into the CAS software, so any application state is shared.

The Test Swarm

The testing framework used was locust.io, a Python based load testing framework. The test suite deploys a fixed number of “locusts” against a web site. To lean more about locust, please see this guide.

The initial population ramps up with a configurable “hatch rate”. In the tests, locusts were conceptually divided into 3 “lifetime” categories:

  • Short-lived locusts live approximately 60 seconds.
  • Medium-lived locusts last for approximately 5 minutes.
  • Long-lived locusts exist for approximately 2 hours.

The category to which a given locust is assigned is randomly determined with a ratio of short : medium : long being 7:2:1. Ideally, 70% of the population is short-lived, 20% is medium lived, and 10% is long-lived.

The lifetime of a locust determines how long it will retain and make use of a single web SSO session. Short-lived locusts discard their sessions quickly. Long-lived locusts hold on to them for considerable time.

All locusts are only 25% likely to log out upon their deaths. The CAS service must continue to track TGTs of locusts that have not logged out until the ticket expires, so this behavior can put pressure on the memory storage resources of the nodes.

Each locust uses credentials taken randomly from one of 9 test accounts. Each locust has a 1% chance of entering an erroneous password for an account. Locusts that fail to authenticate will die immediately.

When a locust dies, it is reborn immediately. Its lifetime category remains the same, but its SSO session and all other random parameters are reset.

The Trials

Trial 1

# of Locusts Hatch Rate
300 10/s

The first trial produced authentication events at a rate of over 3,500 event/minute. The majority of these were service ticket creation and validation events. By 13:40 (~5 hours into the trial), degraded performance became noticeable. By 15:50 (~7 hours in), the nodes were swamped. The trial was discontinued shortly thereafter.

Trial 2

# of Locusts Hatch Rate
300 10/s

The 2nd trial was similar to the first, but the number of locusts was briefly increased to 500 for a 3 minute duration.

The characteristics of trial 2 are very similar to those of trial 1. After responding to ~3,500 / minute for close to 6 hours, the nodes were overwhelmed.

Trial 3

# of Locusts Hatch Rate
150 10/s

3

The 3rd trial used half the number of locusts used in the first 2 trials. The sustained event rate was ~1,700 authentication events per minute. Unlike the previous 2 trials, this trial was concluded prior to the nodes becoming overwhelmed. It is unknown whether the nodes could continue to sustain responding to events at this rate indefinitely.

Trial 4

# of Locusts Hatch Rate
50 10/s

4

The 4th trial showed the nodes were capable of sustaining a ~600 authentication events/minute rate for a full 24 hours. Because the maximum lifetime of a TGT is 8 hours, there is some reason to believe this rate could have been sustained indefinitely.

Trial 5

# of Locusts Hatch Rate
25 10/s

5

This trial was useful in establishing the mean rate of 291 events per minute given a swarm of 25 locusts. This test and test 6 are notably shorter in duration than the other tests, as it is assumed at this point the nodes can sustain the loads indefinitely.

Trial 6

# of Locusts Hatch Rate
10 10/s

6

The mean rate of authentication events for this trial is 118 events per minute.

Conclusions

Measurements taken from the production CAS service from April 17-21, 2017 during normal business hours (9am to 5pm) have the following characteristics:

Mean Median Mode Max Min Std Dev
149 events/minute 139 events/minute 126 events/minute 494 events/minute 2 events/minute 68

While there appears to be some “burstiness” in the rate of authentication events, all 3 types of averages are well below the the expected threshold which the trials suggest are indefinitely sustainable. Even the maximum rate of 494 events per minute is well below the sustained rate of trial 4 (~ 600 events / minute).

The data suggests that the production CAS service is operating well under the maximum sustainable load, and should have plenty of capacity to spare for temporary spikes in utilization.

Related Posts

CAS 5.3.0 RC4 Feature Release

...in which I present an overview of CAS 5.3.0 RC4 release.

Apereo CAS - Test-Driving Feature Modules

An overview of how various CAS features modules today can be changed and tested from the perspective of a CAS contributor working on the codebase itself to handle a feature request, bug fix, etc.

Feedback on draft Apereo strategy

Focus on revenue to achieve sustainability. Defer everything else.

Apereo CAS Best [Mal]Practice - Supercharged Overlays

An overview of how a CAS overlay prepped for deployment can tap into internal components, altering logic and behavior for good and evil...but mostly evil.

CAS 5.3.0 RC3 Feature Release

...in which I present an overview of CAS 5.3.0 RC3 release.

Apereo CAS - REFEDS MFA Profile with shib-cas-authn3

An overview of the shib-cas-authn3 project and its support for the REFEDS MFA profile with both the Shibboleth Identity Provider and Apereo CAS lending a hand.

Forced Authentication with Apereo CAS

Discourse on supporting forced authentication with the Apereo CAS server from the perspective of an application protected with mod-auth-cas, the Apache httpd module for CAS.

Apereo CAS - Dances with Protocols

A short overview of how Apereo CAS may support multiple authentication protocols simultaneously while acting as both the primary identity provider or proxying another. Two Socks could not be reached for comments.

Why does uPortal use Apache 2 license?

Apache2 chose uPortal.

Apereo CAS - Attribute-based Application Authorization

A walkthrough to demonstrate how one might fetch attributes from a number of data sources, turning them into roles that could then be used to enforce application access and authorization.