Platform delivery - part II - low risk

There are many ways to fail when delivering shared IT platforms, and in part 1 I described a typical high-risk delivery strategy often chosen by default. In this post I propose a better, low-risk way.

TL;DR

work with a small number of app teams with clear requirements, to define a mission statement
work with technology providers to define service level / cost options
follow Lean Startup techniques for service level selection & PoC
validate mission is met & measure service levels
keep job
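
As a rough illustration of the "measure service levels" step, here is a minimal sketch that compares measured availability against an agreed target. The probe data, the 99.5% target and every name in it are assumptions for illustration; the post does not prescribe how service levels should be measured.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One synthetic health check against the platform (hypothetical shape)."""
    ok: bool


def availability(results: list[ProbeResult]) -> float:
    """Fraction of probes that succeeded, between 0.0 and 1.0."""
    if not results:
        raise ValueError("no probe results to measure")
    return sum(1 for r in results if r.ok) / len(results)


def meets_service_level(results: list[ProbeResult], target: float) -> bool:
    """True when measured availability reaches the agreed target."""
    return availability(results) >= target


# Made-up sample: 1000 probes, 3 of which failed.
probes = [ProbeResult(ok=True)] * 997 + [ProbeResult(ok=False)] * 3
measured = availability(probes)
met = meets_service_level(probes, target=0.995)  # assumed target
```

The same check could be run per service level option agreed with the technology provider, so that the PoC yields a measured number rather than an impression.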

Platform delivery - part I - high risk

IT departments often choose to deliver their own shared platform for app teams, rather than opting for cloud-based alternatives. This might be for shared logging, monitoring, platform-as-a-service, container-as-a-service, functions-as-a-service, etc. This post describes some common mistakes…

TL;DR

The biggest risk is in delivering the ‘wrong thing’. The most likely way to do this is:

make platform choice based on platform provider marketing or gut feel
fail to define a clear service level
define platform success based on app team take-up

PaaS Mission Control

When delivering a PaaS platform it is important not to lose focus on the business value.

TL;DR

Create and publish a mission statement for your PaaS initiative
Measure your success & iterate on it to improve
Make PaaS a quantifiable Value Stream

PaaS Continuous Delivery

TL;DR

Continuously deliver your PaaS using DevOps app delivery techniques
Build a cross-functional product team of engineers
Continuously improve engineering and operations using agile dev team practices
Measure agility using your cycle time on CVE patches
Use community-standard tools, languages and processes, for best effect
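
One way to make "cycle time on CVE patches" concrete is to track the days between a CVE being published and its fix going live on the platform. The sketch below assumes you record both dates yourself; the dates and names are invented for illustration.

```python
from datetime import date
from statistics import median


def patch_cycle_days(published: date, deployed: date) -> int:
    """Days from CVE publication to the fix being live on the platform."""
    if deployed < published:
        raise ValueError("deployed date precedes publication date")
    return (deployed - published).days


# Hypothetical patch history: (CVE published, fix deployed to production)
history = [
    (date(2018, 1, 3), date(2018, 1, 10)),
    (date(2018, 3, 1), date(2018, 3, 4)),
    (date(2018, 5, 20), date(2018, 6, 1)),
]

cycle_times = [patch_cycle_days(p, d) for p, d in history]
median_days = median(cycle_times)
```

Using the median rather than the mean is a design choice made here so that one unusually slow patch does not hide an otherwise improving trend.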

Is PaaS like standard infrastructure?

Most enterprises assign an operations team within their IT department for deploying & managing their PaaS infrastructure. This is not a mistake; your chosen PaaS will run on top of virtualized or cloud infrastructure, use Linux container technologies and networking, the skills for which are already held by folks working in IT.

This team will immediately recognise facets of previous infrastructure deployment technologies. Taking Pivotal’s Cloud Foundry (PCF) platform as an example, the signs are: an installer; a management interface that standard users cannot access; operating system access that standard users do not have; admin-level visibility across all PaaS apps; platform logging not available to app teams; platform monitoring endpoints not accessible by app teams.

In the face of all this familiar operator access, the temptation is to operate the platform manually as generally happened with, for example, virtual machine (VM) estates.

The Cloud Foundry Buildpack Cycle

TL;DR

Deploy / Remove buildpacks using CI and IaC. Aim for complete automation
Continually update the ‘standard’ buildpacks that come with your CF installation & roll the apps that run on them
For custom buildpacks: set up two automated cycles, one for Create and one for Warn/Remove
Define a buildpacks service statement that makes the buildpack cycle clear
Avoid app-specific buildpacks wherever possible
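
The Warn/Remove cycle for custom buildpacks could be driven by a simple age policy that a CI job evaluates on each run. This is only a sketch: the 60 and 90 day thresholds, the buildpack names and the function names are assumptions, and a real buildpacks service statement would define its own values. It deliberately stops at producing a plan and leaves the actual warning and removal steps to the pipeline.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    WARN = "warn"
    REMOVE = "remove"


# Assumed policy values, not taken from the post.
WARN_AFTER = timedelta(days=60)
REMOVE_AFTER = timedelta(days=90)


def next_action(created: date, today: date) -> Action:
    """Decide what the automated cycle should do with one custom buildpack."""
    age = today - created
    if age >= REMOVE_AFTER:
        return Action.REMOVE
    if age >= WARN_AFTER:
        return Action.WARN
    return Action.KEEP


# Hypothetical custom buildpacks and the dates they were created.
today = date(2018, 6, 1)
buildpacks = {
    "custom_java_v1": date(2018, 2, 1),
    "custom_java_v2": date(2018, 3, 25),
    "custom_java_v3": date(2018, 5, 15),
}
plan = {name: next_action(created, today) for name, created in buildpacks.items()}
```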

CF Buildpacks vs. traditional infrastructure

With traditional infrastructure, where server installations are centrally managed, IT Security folks have often put pressure on IT departments to roll out server fixes e.g. new WebSphere fixpacks. This is tricky because IT departments do not have the resources to re-test all the apps against the upgraded binaries.

Mr. Enterprise meet Mr. Value Stream

TL;DR

Enterprises tend to ‘optimise’ IT by vertical stack & incur long delays at points of organisational change: waiting days or weeks for ticket fulfilment
In contrast, small companies inherently recognise Value Streams
Enterprises that overlay Value Stream teams at points of organisational change would be much more competitive, including within software delivery

Example 1: New Starter Process

In the last 6 years I’ve had the pleasure of starting work at 2 large (1000+ employee) enterprises, and one startup. The new starter on-boarding process at both enterprises was slow: it took a few weeks after starting before I had access to all the systems I needed, and it involved opening lots of tickets. In both cases thousands of pounds were wasted on unused manpower whilst I sat waiting.

When I checked this with friends at other enterprises I found they’d had worse experiences in many cases: “You got a laptop in your first week?! Wow!”, “You were able to login on your third day? Incredible!”.

I contrast this with working at a small startup: I arrived, was hand-held through the process of accessing all the systems I needed & was adding value by the end of the first day.

Small companies natively understand that getting a new starter working quickly is valuable to the company. They inherently use a Value Stream, a concept from the Lean movement which came from the Toyota Production System (TPS), to allow cross-functional activities to happen in order to gain value (i.e. eliminate waste) for the company. The cross-functional activities in this case are: providing laptops, getting software licenses, installing software, creating accounts, getting access across systems, locating documentation, etc.

Don't get stuck on Agility

“Our customers are leaving for competitor X. We’re starting to think it’s because their apps are better” says the CEO to the CTO, “Why is it ours have such low usage figures?”

“We develop new features but it often turns out they were the wrong idea in the first place, or by the time they’re live our competitors already have better versions than ours” replies the CTO.

“Find out how to be more Agile” the order translates to as it hits Technology. A consultancy is hired to tell most of the tech folks at the Enterprise what they already know: the time it takes for an idea to make it into production (aka ‘cycle time’) is too long. Apparently quarterly concurrent releases across 30 apps, of which just a few are customer-facing, managed by 100 IT ops folks aren’t conducive to high velocity IT change. So how to change?

Well, it’s good news and bad.

The good news: despite what most software vendors would have you think, the majority of the change is organisational. Buying a PaaS isn’t a necessity, although it may help. This change doesn’t have to cost a lot in software license terms.

The bad news: organisational change is hard.

TL;DR

It turns out the major changes can be summed up as follows:

Switch to smaller teams & decouple them in every way. Deliverables are ‘microservices’
Switch from the project to product model
Treat each product like a Lean Startup
Decouple environment dependencies

The Antifragile PaaS

Much has been made of Nassim Nicholas Taleb’s book Antifragile.

At a basic level, if fragile describes a system that suffers when put under stress, and robust describes a system that is impervious to stress, an antifragile system is one that benefits from stress.

Many things in nature are antifragile – a good example is the human body: you place a small amount of ‘stress’ on muscle, bone or skin and it’ll come back stronger than before the stress. In the world of IT we’re used to building robust systems: running apps on ‘traditional IT’ environments sized for peak load.

I’ve heard antifragility referenced in relation to microservices, heard recommendations on building similar systems and allowing for Darwinian selection amongst similar systems, etc. The next generation of applications will most certainly be antifragile - able to strengthen themselves in response to stressors. Will yours be amongst them?

Well, apps have to run on infrastructure of some kind. Today I’m going to talk about making antifragile PaaS infrastructure & how that can help make apps antifragile.

TL;DR

Some PaaS offerings are getting close to making apps antifragile using auto-scaling; bridge the gap using team process
Make your PaaS infrastructure antifragile using team process
Fail everything all the time & re-route traffic to simulate failures and normal operations

Don't mix cloud with IT depts

When it comes to Cloud, change isn’t coming; it’s already here. Software is eating the world and the only thing standing between firms and dropping out of markets or folding completely is the speed at which they can change their customer-facing apps. Beating the competition is in many cases now a simple ‘foot race’ on who can deliver the best software experience to their customers. As a result, innovations in software delivery and IT infrastructure will increasingly determine the success or failure of industry players, rather than the quality of the product or service they provide.

Start-ups are taking ground from Enterprises in part due to their ability to better harness the seemingly endless resources of the cloud, and yet many enterprises just don’t seem to be able to make the leap to ‘full cloud’ adoption.

Using my experience working with UK Enterprises, I apply Disruptive Innovation (DI) theory to Enterprise IT departments & cloud adoption.

TL;DR

Attempts to move to a cloud-like infrastructure model, with infrastructure managed by the same IT department as has run their ‘traditional IT’, are unlikely to realise the benefits of cloud

IT departments will face stiff competition from cloud ‘integrators’

An entirely separate ‘cloud services’ team, targeted at resolving the requirements of the security and compliance teams sooner, should enable businesses to consolidate their cloud platforms

Recruit developers with experience of delivering & operating cloud-ready apps on public cloud & form them into teams separate from ‘traditional IT’ delivery teams

DHR @ cloud


