Service Level Objectives
05-30-2020, 12:46 AM
(This post was last modified: 06-06-2020 11:02 PM by Bill Duncan.)
Post: #1
Background
This will mostly be of interest to systems operations folk: system administrators (sysadmins), these days often called SREs (Site Reliability Engineers) or DevOps.

Things fail. Systems fail. Large-scale systems often depend on hundreds or even thousands of "backend" systems these days, usually Virtual Machines (VMs) or, more recently, "containers". The more backend systems are used (usually to improve response times), the more likely there will be failures to deal with.

The terminology that has grown around this includes:

SLI - Service Level Indicators -- things like availability, latency, errors.
SLO - Service Level Objectives -- objectives based on the indicators, used to gauge the reliability of a service.
SLA - Service Level Agreements -- sometimes objectives are communicated to customers in the form of agreements, often with penalties to the provider if the objectives are not met.

The reliability (e.g. availability, latency, errors) that users experience can deteriorate dramatically as the number of backend systems increases. This program is about exploring some of the variables involved: how the number and reliability of backend systems impact the user experience.

A more detailed description and background can be found in these two articles:

The Tail at Scale
The Tail at Scale Revisited

Operation:
This program enables you to play with the numbers a bit, possibly while developing SLOs (for the backend and/or users) and SLAs.
Code:
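Separately from the calculator program itself, the relationship described in the background can be sketched in a few lines of Python. This is only an illustration under stated assumptions: backends are treated as independent, and a front-end request is assumed to meet its objective only if every backend it touches does. The function names are hypothetical and are not taken from the calculator listing.

```python
def frontend_service_level(backend_level: float, n_backends: int) -> float:
    """Probability that all n independent backends meet their objective.

    Assumption: a request succeeds only if every backend it fans out to
    succeeds, so the individual probabilities multiply.
    """
    return backend_level ** n_backends


def failure_one_in(service_level: float) -> float:
    """Express a service level as a 'one in N' failure rate."""
    return 1.0 / (1.0 - service_level)


def service_level_from_one_in(one_in: float) -> float:
    """Inverse of failure_one_in: 'one in N' back to a service level."""
    return 1.0 - 1.0 / one_in


if __name__ == "__main__":
    # How a 99.9% backend level degrades as the fan-out grows.
    for n in (1, 10, 100, 1000):
        level = frontend_service_level(0.999, n)
        print(f"{n:5d} backends -> front-end level {level:.4f}")
```

The two conversion helpers reflect one reading of the "service level" versus "failure rate" pairing: 99.9% and "one in a thousand" are the same statement.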
The "A" and "B" keys (and corresponding registers) translate between "service level" (probability of meeting the objective) and "failure rate" (reciprocal). These are two ways of describing the same thing.

The "C", "D" and "E" keys (and registers) are used to look at the relationship between the front-end SLO or SLA and the backend SLO needed to support it. The "E" key specifies the number of backend services.

The "F" key translates the backend service level to the level achieved with two replicas. It also updates Register 04.

Pressing "R/S" after any calculation or store is finished will bring up the mnemonics again. Pressing "R/S" one more time will turn the calculator off in a way that displays the mnemonics (and reminds you which program you're in) when you turn it on again.

Using the user keys usually works fine. You can also RCL the register directly, or prefix with "XEQ" to force the calculation. (User keys will fail to detect "number entry" if, for example, you use an existing number in the X register. Just STO the number. Also, if you had entered a number that you hadn't intended to store, pressing a user key will store it. Pressing it again will do the calculation, or use "XEQ" directly.)

Example:
Some customers are complaining that our services are not meeting target objectives (or agreements). We find that the backend services are failing to meet their time budgets at a rate of about one in a thousand, which is two orders of magnitude better than the front end (99.9% vs. 90% in the front end). Most of our customers are small and their queries hit a few dozen backend systems, while the few larger customers who are complaining can sometimes hit 500+ systems. What service level objectives should we be aiming for in the backend to meet the objectives for all clients? How can we best do that?
Code:
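For readers without the calculator to hand, the example can be worked through in Python. This is a minimal sketch, not the calculator program: it assumes independent backends, assumes a request fails if any backend misses its budget, and assumes that with two replicas a backend call only misses if both replicas miss. The function names, and the choice of 24 for "a few dozen", are hypothetical.

```python
def frontend_level(backend_level: float, n_backends: int) -> float:
    """Front-end service level when all n independent backends must succeed."""
    return backend_level ** n_backends


def required_backend_level(frontend_target: float, n_backends: int) -> float:
    """Backend service level needed to reach a front-end target over n backends."""
    return frontend_target ** (1.0 / n_backends)


def with_two_replicas(backend_level: float) -> float:
    """Service level when either of two independent replicas may answer.

    Assumption: the call misses only if both replicas miss.
    """
    miss = 1.0 - backend_level
    return 1.0 - miss * miss


if __name__ == "__main__":
    backend = 0.999  # one miss in a thousand, from the example
    target = 0.90    # front-end objective from the example

    for n in (24, 500):  # 24 stands in for "a few dozen"
        print(f"n={n}: front end at {frontend_level(backend, n):.4f}")

    needed = required_backend_level(target, 500)
    print(f"needed per backend for 500 systems: {needed:.6f}")

    replicated = with_two_replicas(backend)
    print(f"two replicas: {replicated:.6f} per backend, "
          f"{frontend_level(replicated, 500):.4f} at the front end")
```

Under these assumptions, 99.9% backends give roughly 98% at 24 systems but only about 61% at 500. Holding 90% across 500 systems needs roughly 99.979% per backend (about one miss in 4,700), and two replicas of the existing 99.9% backends would clear that bar.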
The Code: Code:
07-19-2020, 01:29 AM
Post: #2
RE: Service Level Objectives
I've added another post with a "close enough" approximation.
The approximation is close enough in the range of customer happiness that matters, and so simple that a calculator isn't really required... lol. https://billduncan.org/the-tail-at-scale-approximation/
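The linked post is not reproduced here, so the following is not necessarily the approximation it describes. It is a sketch of one common first-order shortcut, assuming independent backends: when the backend failure probability is small, the front-end failure probability is roughly the number of backends times the backend failure probability. The function names are hypothetical.

```python
def exact_frontend_failure(backend_failure: float, n_backends: int) -> float:
    """Exact front-end failure probability for n independent backends."""
    return 1.0 - (1.0 - backend_failure) ** n_backends


def approx_frontend_failure(backend_failure: float, n_backends: int) -> float:
    """First-order estimate: n times the backend failure probability.

    Only reasonable while the product stays well below 1.
    """
    return n_backends * backend_failure


if __name__ == "__main__":
    for n in (10, 50, 100, 500):
        exact = exact_frontend_failure(0.001, n)
        approx = approx_frontend_failure(0.001, n)
        print(f"n={n:4d}  exact={exact:.4f}  approx={approx:.4f}")
```

The estimate always overstates the failure probability slightly, and the gap widens as the product grows, so it is most useful in the high-service-level range.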