Cloud, DevOps, ITIL, Provisioning – A reflection on James Urquhart’s and Dan Woods’ articles

By Matthias, March 30, 2010 16:57

James Urquhart (@jamesurquhart) posted a series of articles on operational management in the cloud on his blog: The Wisdom of Clouds.

Following are my comments on his series and on the discussions that followed on Twitter.

But first, the links to James’ articles, entitled “Understanding cloud and ‘devops'”:

James’ series also refers to his articles on Payload description and Application packaging, as well as to Dan Woods’ article on Virtualization’s Limits.

Dan states:

“But automated provisioning and management of cloud computing resources is just one of three elements needed for intelligent workload management. The other two are the ability to set up and configure an application that is going to run on a cloud or virtualized server, and then making sure the data ends up where it is needed. Here’s where the hard problems arise.”

Dan is right, but also wrong in his arguments.

Let’s look back a bit in IT history: 5-10 years ago, the notion of “provisioning” tried to shape the way data centers should be managed. Terms like SODC (service-oriented datacenter) and OM (operational maturity) were hip. Still, they neglected a couple of seemingly trivial things: inconsistent upgrade paths of software stacks, and the inherent “need” of app users to tweak their apps according to their perceived needs.

Let’s look at the latter first: why did that culture of “tweaking” or “tuning” apps arise? Because in many cases the hardware was not fast enough to fulfill the needs of the end users. That’s why tuning was very popular and happened almost always. But there’s a side effect to that:

R. Needleman, Editor in Chief of Byte Magazine decades ago, once wrote on this topic in an editorial:

“And no matter what hardware you have, it’s really hard to learn to play piano.”

This might be proof of Dan’s statement, but it is also proof of a dilemma that many companies who build and sell hardware face today: software’s demand for CPU cycles didn’t keep up with Moore’s Law. That’s why we see more and more underutilized systems, and why we experience a shift towards appliances. It seems to be the only way for a hardware maker and vendor to survive: create the margin from something other than the hardware, and add pieces to the stack so that a competitive advantage emerges across the whole stack. That, for example, is why Oracle bought Sun.

From this follows a second thing: standardization. To be able to exchange the underlying hardware for cheaper and more powerful hardware, app deployers and users now tend not to tweak and tune as much as they did decades ago. Today we see far more “standardized” deployments of software stacks than we did back then. This is also driven by the broad acceptance of virtualization: V12N provides a standardized layer for the operating system, so no tweaking or tuning is needed there anymore. That in turn led to the idea of applying the same approach to the apps on top of the OS, and we now see so-called “images” becoming the unit of access in virtualized environments.

Back to Dan’s argument, and his problem statement:

I’ve been in provisioning for more than a decade now, and I’ve seen 100% automated setups: from Deutsche Bank’s working RCM (Reliable Configuration Management) and its next version, RCMNG (Next Generation), to the never-deployed APE (A Provisioning Environment) at Daimler Chrysler, to the systems in production at BG-Phoenics or Deutsche Bahn. These things do work, and yes, they perform a 100% automated bare-metal install, all the way up to app deployment and app configuration management, even up to content provisioning.

So, back to James’ points, which also address the former pain point mentioned above: inconsistent upgrade paths!

The main problem in all these environments is that the “meta data” James refers to needs to be adapted and kept up to date, over the lifetime of an environment, to match the ever-changing pieces the environment is built from. Never assume that the install for version X of app C can also be used for version Y of app C. A big maintenance effort is required here, and given the diversity of the apps themselves, even across versions, it is something that can’t be neglected. And in an environment where time-to-market and a fine-tuned setup are key, spending time on shaping that metadata handling simply didn’t happen, or wasn’t considered worthwhile.
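To make that concrete, here is a minimal sketch with names I made up purely for illustration (none of this comes from RCM, APE, or any other tool mentioned here): install metadata is kept per application version, so a new version forces an explicit new entry instead of silently reusing the old one.

```python
# Hypothetical illustration only: provisioning metadata has to be maintained per app version.
# The structure and names are invented; they merely show why the upkeep is non-trivial.

INSTALL_METADATA = {
    ("appC", "X"): {
        "packages": ["appC-core-X.pkg"],
        "config_template": "appC-X.conf.tmpl",
        "post_install": ["init_schema.sh"],
    },
    ("appC", "Y"): {
        # Version Y changed its packaging and dropped a post-install step,
        # so its metadata has to be modeled and tested separately.
        "packages": ["appC-server-Y.pkg", "appC-client-Y.pkg"],
        "config_template": "appC-Y.conf.tmpl",
        "post_install": [],
    },
}

def metadata_for(app, version):
    """Fail loudly instead of reusing version X's metadata for version Y."""
    try:
        return INSTALL_METADATA[(app, version)]
    except KeyError:
        raise KeyError(f"No maintained install metadata for {app} {version}; "
                       "someone has to model that version first.")
```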

So, with the advent of V12N and the term “Cloud Computing”, we are entering an era where, thanks to the more standardized deployment of OSes and apps, and because most of the apps’ “configuration” can now be done during installation, the amount of work needed to manage that “meta data” shrinks. That in turn allows us to think about provisioning on a broader scale again.

In his “Payload description” article and its predecessor, James describes exactly the factors that led companies like TerraSpring or CenterRun to create their provisioning tools. James calls the envelope a pCard. CenterRun, over a decade ago, called it a resource. In CenterRun, resources can inherit capabilities (parameters, deployment routines, and so on; it is a genuinely object-oriented approach!) from other resources, and they can also ask their installation targets for their specific capabilities, such as spare room for the payload, OS version, or CPU type. Those targets are called hosts; a host can be physical or virtual, and a virtual host can in turn be an “entity” like a web-server farm into which you deploy content or even “apps”.
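As a rough sketch of that object-oriented idea (my own simplified reconstruction, not CenterRun’s actual API; all class and attribute names are invented): a resource inherits parameters and requirements from a parent resource, and it checks a target host’s advertised capabilities before deployment.

```python
# Simplified, hypothetical reconstruction of the resource/host model described above.
# Names are invented for illustration; this is not CenterRun's API.

class Host:
    """A deployment target: physical, virtual, or a logical entity such as a web-server farm."""
    def __init__(self, name, **capabilities):
        self.name = name
        self.capabilities = capabilities   # e.g. os_version, cpu_type, free_disk_mb

class Resource:
    """A deployable unit that can inherit parameters and requirements from a parent resource."""
    def __init__(self, name, parent=None, requires=None, params=None):
        self.name = name
        self.requires = dict(parent.requires) if parent else {}
        self.requires.update(requires or {})              # capability checks against the host
        self.params = dict(parent.params) if parent else {}
        self.params.update(params or {})                  # child overrides/extends inherited params

    def can_deploy_to(self, host):
        # "Ask" the target for its specific capabilities, as described above.
        return all(check(host.capabilities.get(key)) for key, check in self.requires.items())

# Usage sketch:
base_app = Resource("generic-app", params={"install_root": "/opt"})
web_app = Resource("web-app-1.2", parent=base_app,
                   requires={"os_version": lambda v: v is not None and v >= "5.10",
                             "free_disk_mb": lambda d: d is not None and d >= 512},
                   params={"port": 8080})

farm = Host("web-farm-A", os_version="5.10", cpu_type="sparc", free_disk_mb=2048)
print(web_app.can_deploy_to(farm))   # True only if the farm satisfies the declared requirements
```

The same resource could be checked against a physical box, a virtual machine, or a whole farm; the point is that the requirements travel with the resource, not with the operator.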

So, what was needed to use tools like CenterRun successfully (and, yes, it was not the only tool of its time; there were many more!) was a model of the overall stack, broken down into resources and hosts that are generic, yet specific enough that deployments can be carried out over a longer period of time. The main pitfall was that thinking in terms of “hosts” led people to believe a host had to be a “physical machine”.

Now that we see that James’ ideas are nothing new, and were already proven to work close to a decade ago, why were they not a great success over time, and why are they only now seen by James as part of the solution to his problem statement? The same goes for Dan’s idea of the need for “Systems Management” at a higher level.

I see mainly two reasons for that, both already mentioned above:

  • It’s tedious to manage all of the needed meta-data for the whole stack.
  • The stack changed too often to make “provisioning” or “automation” of the stack worthwhile. I once stated: “If you want to automate chaos, you’ll get chaos automatically!”

So why do people like Dan or James believe, and why do I agree, that now, with the notion of “Cloud Computing”, it is time to think about “provisioning” again?

First, as mentioned above, the complexity of the stack is shrinking because V12N helps with standardization: fewer options, easier to manage!

Second, many configuration and tuning steps that used to happen after the fact are now options to the installer, or are simply never performed at all. There are a couple of reasons for that. Again, CPU cycles are now more easily available, so fine-grained tuning is no longer a necessity. Many config options are now install-time options, which also makes the handling easier because fewer steps are needed to reach a given goal. And many customers learned the hard way that tweaking a piece of software to its limits killed the upgrade path to newer versions, as some features or “tweaks” had simply disappeared in those versions. So customers now tend to stick to more off-the-shelf installs, hoping to upgrade to newer versions more quickly. This in turn also reduces the complexity of the pCard (in James’ terms) or of the meta-data modeling, making such tasks feasible in the first place.
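As a purely hypothetical illustration of that shift (the function and option names are mine; no specific installer is implied): what used to be a separate post-install tuning step becomes a parameter of the automated install itself, which means one step less and one less piece of metadata to keep current.

```python
# Hypothetical sketch only: contrasting post-install tweaking with install-time options.
# All function and option names are invented placeholders.

def run_installer(host, options=None):
    print(f"installing on {host} with options {options or {}}")

def edit_config(host, key, value):
    print(f"editing {key}={value} on {host} after the install")

def restart_service(host):
    print(f"restarting the service on {host}")

def install_then_tweak(host):
    # Old pattern: default install, then a separate tuning step that has to be
    # modeled, tracked, and re-validated for every new version.
    run_installer(host)
    edit_config(host, "max_connections", 800)
    restart_service(host)

def install_with_options(host):
    # Newer pattern: the same setting is an install-time option, so the whole
    # goal is reached in a single, easily repeatable step.
    run_installer(host, options={"max_connections": 800})

install_with_options("db01")
```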

Third, we see a “reduction” in the options available for a given task or problem. There is a consolidation going on in the IT industry, which some publications call the “industrialization of IT” or “commoditization”. With it comes a reduction in, for example, the number of software solutions for a given task, and a concentration in the hands of a few companies. That leads to more integrated software stacks, which in turn simplifies the meta-data and makes it feasible to start looking again at provisioning the whole stack. It is like the car industry: you no longer hunt for the individual parts to build a car from, you buy the car off the shelf; and on the manufacturing side, ever since Ford introduced the assembly line, you no longer look for people to build the car by hand, you look at automating the building of the car.

So, what is James actually saying in his (so far) two-part DevOps series?

He is going back to what I referred to above as “Operational Maturity” (ITIL speak): no longer managing individual pieces and being forced to react to changes in those resources, but “designing” things so that they can benefit from whatever underlying layers are available.

In my world, there are also constraints that need to be acknowledged. In order to design things, you need at least two ingredients: the freedom (and the capabilities!) to “implement” your dreams, and elements simple enough to build the implementation of those “dreams” from. If you were forced to create a one-off for the implementation of your dream (or design), then some basic requirements, like “elasticity” or “rapid deployment”, might be difficult to achieve.

So here, too, the basic rule of “managing constraints” still applies. Yes, James is right that the focus shifts from OSes and servers to applications. That is why the term “appliance” was coined a while ago, and why all vendors today are starting to shift their focus towards easily providing “services” in the form of an appliance. A current example from the company I work for is the Exadata 2 database machine: order it, receive it, and use it no later than two days after delivery. No more tweaking, configuring, and exception handling when the pieces don’t work as expected. You get what you want and what you need.

This appliance approach, when brought to the “Cloud”, needs rules so that these appliances can happily live together in the cloud. That is what James describes in the second article of his series.

Still, my mantra from years ago applies: “If you automate chaos, you’ll get chaos automatically!”

But today it is getting easier to manage that chaos, as there are fewer switches and glitches to handle, thanks to more standardized elements of the stack. That in turn also makes life easier for the “provisioning tool provider”, as the tools themselves no longer need to be over-sophisticated and can be stripped down to simpler approaches. That is why, for example, the provisioning part of Oracle Enterprise Manager Grid Control is becoming more important over time and will be an important part of the systems management portfolio. Without elasticity management capabilities and deployment capabilities, you can no longer manage, and therefore no longer sell, software.

But let’s not forget: here we are talking about the “back-end” side of things! The “front-end” side, the “desktop computing” part, I covered in my earlier post: VDI and its future.

Finally, I’ll leave you with Tim O’Reilly, who published his thoughts on the Internet Operating System, which Sam Johnston calls the cloud… ;-)

Enjoy!

Matthias
