CAS 5 Load Tests by Lafayette College


Contributed Content
Carl Waldbieser (<waldbiec [at] lafayette.edu>), an active member of the CAS community, was kind enough to share this analysis.

Overview

Load testing trials were conducted on the CAS stage environment in order to provide insight into what kind of sustained load the production CAS service will be able to carry. All trials were carried out against the same deployment architecture, with all nodes configured identically.

Deployment Architecture

The deployment architecture itself consists of 3 virtual machine nodes. Each node has 3.7 GiB real memory available to it and 2 CPUs.

The characteristics of the CPUs are as follows:

  • Architecture: x86_64
  • CPU op-mode(s): 32-bit, 64-bit
  • Byte Order: Little Endian
  • CPU(s): 2
  • On-line CPU(s) list: 0,1
  • Thread(s) per core: 1
  • Core(s) per socket: 2
  • Socket(s): 1
  • NUMA node(s): 1
  • Vendor ID: GenuineIntel
  • CPU family: 6
  • Model: 42
  • Model name: Intel Xeon E312xx (Sandy Bridge)
  • Stepping: 1
  • CPU MHz: 1899.999
  • BogoMIPS: 3799.99
  • Hypervisor vendor: KVM
  • Virtualization type: full
  • L1d cache: 32K
  • L1i cache: 32K
  • L2 cache: 4096K
  • NUMA node0 CPU(s): 0,1

The nodes are deployed behind an Nginx+ proxy in an active-active-active configuration. The nodes share ticket information using encrypted Hazelcast messages, a feature built into the CAS software, so any application state is shared.

The Test Swarm

The testing framework used was locust.io, a Python-based load testing framework. The test suite deploys a fixed number of “locusts” against a web site. To learn more about locust, please see this guide.

The initial population ramps up with a configurable “hatch rate”. In the tests, locusts were conceptually divided into 3 “lifetime” categories:

  • Short-lived locusts live approximately 60 seconds.
  • Medium-lived locusts last for approximately 5 minutes.
  • Long-lived locusts exist for approximately 2 hours.

Each locust is randomly assigned to a category, with a short : medium : long ratio of 7:2:1. Ideally, 70% of the population is short-lived, 20% is medium-lived, and 10% is long-lived.
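The weighted assignment can be illustrated with a minimal sketch. This is not the original test suite: the names `LIFETIMES`, `WEIGHTS`, and `assign_category` are hypothetical, and only the standard library is used.

```python
import random
from collections import Counter

# Approximate lifetimes in seconds for each category, as described in the text.
LIFETIMES = {"short": 60, "medium": 5 * 60, "long": 2 * 60 * 60}
# Relative weights for short : medium : long.
WEIGHTS = {"short": 7, "medium": 2, "long": 1}


def assign_category(rng):
    """Pick a lifetime category at random according to the 7:2:1 ratio."""
    names = list(WEIGHTS)
    return rng.choices(names, weights=[WEIGHTS[n] for n in names], k=1)[0]


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
swarm = Counter(assign_category(rng) for _ in range(300))
for name in LIFETIMES:
    print(f"{name:>6}: {swarm[name]:3d} locusts ({swarm[name] / 300:.0%})")
```

Because the assignment is random, the observed split in any given swarm only approximates 70/20/10, which is why the text describes those percentages as the ideal.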

The lifetime of a locust determines how long it will retain and make use of a single web SSO session. Short-lived locusts discard their sessions quickly. Long-lived locusts hold on to them for considerable time.

All locusts are only 25% likely to log out upon their deaths. The CAS service must continue to track TGTs of locusts that have not logged out until the ticket expires, so this behavior can put pressure on the memory storage resources of the nodes.

Each locust uses credentials taken randomly from one of 9 test accounts. Each locust has a 1% chance of entering an erroneous password for an account. Locusts that fail to authenticate will die immediately.

When a locust dies, it is reborn immediately. Its lifetime category remains the same, but its SSO session and all other random parameters are reset.
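The per-life random parameters described above can be sketched as follows. This is an illustration rather than the actual test code: the account names, the `Life` class, and `new_life` are hypothetical, and it assumes that a locust whose authentication fails has no session to log out of.

```python
import random
from dataclasses import dataclass

# Hypothetical names standing in for the 9 test accounts.
ACCOUNTS = [f"testuser{i}" for i in range(1, 10)]
BAD_PASSWORD_CHANCE = 0.01  # 1% chance of an erroneous password
LOGOUT_CHANCE = 0.25        # 25% chance of logging out at death


@dataclass
class Life:
    account: str
    bad_password: bool  # True: authentication fails and the locust dies at once
    logs_out: bool      # True: the locust ends its SSO session when it dies


def new_life(rng):
    """Draw fresh random parameters for one life of a locust."""
    bad = rng.random() < BAD_PASSWORD_CHANCE
    # Assumption: a locust that never authenticated has nothing to log out of.
    logs_out = (not bad) and rng.random() < LOGOUT_CHANCE
    return Life(account=rng.choice(ACCOUNTS), bad_password=bad, logs_out=logs_out)


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
category = "short"      # the lifetime category survives rebirth; the rest is redrawn
lives = [new_life(rng) for _ in range(10_000)]
abandoned = sum(1 for life in lives if not life.bad_password and not life.logs_out)
print(f"{abandoned / len(lives):.1%} of lives leave a TGT behind without logging out")
```

Under these assumptions, roughly three quarters of all lives leave a TGT on the server that must be tracked until it expires.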

The Trials

Trial 1

  • # of Locusts: 300
  • Hatch Rate: 10/s

The first trial produced authentication events at a rate of over 3,500 events/minute. The majority of these were service ticket creation and validation events. By 13:40 (~5 hours into the trial), degraded performance became noticeable. By 15:50 (~7 hours in), the nodes were swamped. The trial was discontinued shortly thereafter.

Trial 2

  • # of Locusts: 300
  • Hatch Rate: 10/s

The 2nd trial was similar to the first, but the number of locusts was briefly increased to 500 for 3 minutes.

The characteristics of trial 2 are very similar to those of trial 1. After responding to ~3,500 events/minute for close to 6 hours, the nodes were overwhelmed.

Trial 3

  • # of Locusts: 150
  • Hatch Rate: 10/s

The 3rd trial used half the number of locusts used in the first 2 trials. The sustained event rate was ~1,700 authentication events per minute. Unlike the previous 2 trials, this trial was concluded before the nodes became overwhelmed. It is unknown whether the nodes could have sustained this event rate indefinitely.

Trial 4

  • # of Locusts: 50
  • Hatch Rate: 10/s

The 4th trial showed the nodes were capable of sustaining a ~600 authentication events/minute rate for a full 24 hours. Because the maximum lifetime of a TGT is 8 hours, there is some reason to believe this rate could have been sustained indefinitely.
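The reasoning about the 8-hour TGT lifetime can be made concrete with a rough estimate. The sketch below assumes the ideal 7:2:1 population split, that each locust creates one new session per lifetime, that 75% of sessions are never logged out, and that abandoned TGTs are only removed when they reach the 8-hour maximum. It ignores failed logins and any shorter idle timeout, so it is an upper bound rather than a measurement. All names are hypothetical.

```python
# Approximate lifetimes in minutes and ideal population shares from the text.
LIFETIME_MIN = {"short": 1.0, "medium": 5.0, "long": 120.0}
SHARE = {"short": 0.7, "medium": 0.2, "long": 0.1}
LOGOUT_CHANCE = 0.25
TGT_MAX_LIFETIME_MIN = 8 * 60


def abandoned_tgts_per_minute(locusts):
    """New TGTs per minute that are never explicitly logged out."""
    sessions_per_minute = sum(locusts * SHARE[c] / LIFETIME_MIN[c] for c in SHARE)
    return sessions_per_minute * (1 - LOGOUT_CHANCE)


def tgt_ceiling(locusts):
    """Abandoned TGTs held at once, once expirations balance new arrivals."""
    return abandoned_tgts_per_minute(locusts) * TGT_MAX_LIFETIME_MIN


print(f"50 locusts: about {tgt_ceiling(50):,.0f} abandoned TGTs at steady state")
```

Under these assumptions the ticket population stops growing after the first 8 hours, because every newly abandoned TGT is offset by one that expires. A 24-hour run covers three such windows, which is consistent with the expectation that the rate is sustainable.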

Trial 5

  • # of Locusts: 25
  • Hatch Rate: 10/s

This trial was useful in establishing a mean rate of 291 events per minute for a swarm of 25 locusts. This trial and trial 6 are notably shorter in duration than the others because, by this point, the nodes were assumed to be able to sustain these loads indefinitely.

Trial 6

  • # of Locusts: 10
  • Hatch Rate: 10/s

The mean rate of authentication events for this trial is 118 events per minute.
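As a quick cross-check of the six trials, the reported event rates can be divided by the swarm sizes. The figures below are the approximate values quoted in the text (for example, "over 3,500" is entered as 3,500); the `TRIALS` table and the helper name are illustrative.

```python
# trial number -> (locusts, approximate events per minute reported in the text)
TRIALS = {
    1: (300, 3500),
    2: (300, 3500),
    3: (150, 1700),
    4: (50, 600),
    5: (25, 291),
    6: (10, 118),
}


def events_per_locust(locusts, events_per_minute):
    """Average authentication events generated per locust per minute."""
    return events_per_minute / locusts


per_locust = {trial: events_per_locust(*row) for trial, row in TRIALS.items()}
for trial, rate in per_locust.items():
    print(f"Trial {trial}: {rate:.1f} events/minute per locust")
```

Every trial comes out between roughly 11 and 12 events per minute per locust, which suggests the event rate scaled approximately linearly with swarm size over the range tested.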

Conclusions

Measurements taken from the production CAS service from April 17-21, 2017 during normal business hours (9am to 5pm) have the following characteristics:

  • Mean: 149 events/minute
  • Median: 139 events/minute
  • Mode: 126 events/minute
  • Max: 494 events/minute
  • Min: 2 events/minute
  • Std Dev: 68

While there appears to be some “burstiness” in the rate of authentication events, all 3 averages (mean, median, and mode) are well below the threshold that the trials suggest is indefinitely sustainable. Even the maximum rate of 494 events per minute is below the sustained rate of trial 4 (~600 events/minute).

The data suggests that the production CAS service is operating well under the maximum sustainable load, and should have plenty of capacity to spare for temporary spikes in utilization.
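The headroom can be quantified with a small calculation. It treats the ~600 events/minute of trial 4 as the reference sustainable rate, which is an approximation taken from the text rather than a hard limit, and the variable names are illustrative.

```python
SUSTAINED_RATE = 600  # events/minute held for 24 hours in trial 4 (approximate)
PRODUCTION = {"mean": 149, "median": 139, "mode": 126, "max": 494}
STD_DEV = 68

# Fraction of the trial 4 rate used by each production statistic.
utilization = {name: rate / SUSTAINED_RATE for name, rate in PRODUCTION.items()}
for name, share in utilization.items():
    print(f"{name:>6}: {share:.0%} of the trial 4 rate")

# Distance from the production mean to the trial 4 rate, in standard deviations.
sigmas_to_limit = (SUSTAINED_RATE - PRODUCTION["mean"]) / STD_DEV
print(f"Trial 4 rate is {sigmas_to_limit:.1f} standard deviations above the mean")
```

On these numbers the production mean is about a quarter of the trial 4 rate, and the observed peak is about 82% of it.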
