Category Archives: Enterprise Architecture

Windows Azure – First Glimpse

Well, I finally got around to signing up for a three-month free trial of Microsoft’s Azure. In case you’ve been living under a rock these past few years, Azure is Microsoft’s cloud offering of virtual servers, virtual storage, web servers and associated services such as networking (between the virtual resources), Active Directory, SQL Server and something they call Mobile Services. This last one makes it easier to set up back-end services to support mobile apps for iOS devices and Windows Phones, although from reading the documentation I can’t see how this is any different from setting up a virtual server.

AWS has a bunch of pre-configured AMIs (Amazon Machine Images) that you can choose from so you can get up and running more quickly than starting with a bare server and downloading and installing web servers, databases, applications, etc. By contrast, I like that Azure distinguishes between Virtual Machines and Web Sites. Under Virtual Machines you can choose from a handful of Microsoft servers (SQL Server, BizTalk, Windows) or a very small collection of Linux servers (CentOS, Red Hat and Ubuntu 12). Web Sites lets you choose from a collection of pre-baked application environments including standards like Joomla, WordPress and Drupal, but also some others for photo galleries, blogging and commercial web sites. AWS has an awesome array of choices, but they don’t make it easy for you to shop them. Azure has a much more limited selection, but it’s easy to browse through the list and see what’s on offer, almost like an app store. The breadth and depth of what’s on offer can only get better over time.

My first impression is that the user interface is very slick, intuitive and functional. Slightly better than AWS and way better than HP Public Cloud, which is a bit old school and dated, even though it’s the newest kid on the block. I was able to set up an Ubuntu virtual machine quite easily, even though it took me three tries. The first two times I tried uploading a .pem file to use for authentication (the same file I use to authenticate on my AWS servers), but this was rejected as not being X.509 compliant. There is nothing readily visible on the site about how to generate a compliant key-pair, so I’m left with standard username and password authentication. No worries, I’m only using this for a bit of mucking about to see how it works. But they really shouldn’t let users skip this important security step without explaining the implications and offering an option to do it right.
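
For anyone hitting the same wall, here’s a minimal sketch of generating a private key and a self-signed X.509 certificate with Python’s cryptography library. The filenames and common name are placeholders, and I haven’t verified that this is exactly the format the Azure portal expects.

```python
# Sketch only: generate an RSA key and a self-signed X.509 certificate as PEM
# files. Filenames and the common name below are placeholders.
import datetime

from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"my-azure-vm")])
now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)  # self-signed, so subject and issuer are the same
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=365))
    .sign(key, hashes.SHA256())
)

with open("myvm.key", "wb") as f:  # keep this file private
    f.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    ))
with open("myvm.pem", "wb") as f:  # this is the certificate to upload
    f.write(cert.public_bytes(serialization.Encoding.PEM))
```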

It took a lot longer for my server to be set up than on HP Public Cloud or AWS, about 5 minutes vs 2. And then, frustratingly, once it was finally created, it was in STOPPED mode so I had to figure out how to start it. I really like, however, that Azure lets you choose something meaningful for the first part of your server’s URL, e.g. myserver.cloudapp.net. AWS dictates something unmemorable like ec2-46-xx-yyy-zzz.ap-northeast-1.compute.amazonaws.com. And with AWS you can get a public IP address to assign to the long URL, but you’ll pay an extra $0.01 per hour for the pleasure. HP Public Cloud gives you a public IP address but doesn’t charge you any extra. I think Azure gets it right here, although I’d also like to see the option of having a fixed public IP address.

When it comes to global regions, HP Public Cloud gives you three US-based options. AWS gives you the most, with three in the US plus Ireland, Brazil, Tokyo, Singapore and Sydney. Azure comes in the middle with US East, US West, Southeast Asia, East Asia, North Europe and West Europe, although they don’t provide much of a clue as to where these servers are physically located. This might matter to some companies for data protection or other regulatory reasons, though companies with those types of concerns probably shouldn’t be putting their applications and data on the cloud in the first place. The thing that slightly irked me, however, was that I set up a virtual Ubuntu machine in North Europe, but when I started up a remote desktop session and surfed to whatismyipaddress.com, it showed me as Microsoft Corporation in Wichita, Kansas! This matters to companies that want to situate their services as close as possible to their users for cost and performance reasons. Maybe there’s a way to specify connection points to the public Internet, but it wasn’t immediately obvious.

Lastly, and it’s just as much of a show stopper for me on Azure as it is on HP Public Cloud, there is no easy way to suspend a server and stop paying the hourly charge. Here is a direct quote from the Azure help pages: “You are billed for a virtual machine that exists in Windows Azure whether it is running or stopped. You must delete the virtual machine to stop being billed for it.” To my simple and cloud-indoctrinated mind this just doesn’t compute. Virtualization and cloud are about pay for use, like an electricity model. Pay for as much as you use. Each time you turn off all of your electrical devices and stop drawing electricity from the grid, you don’t have to tear out all the wiring. No, you just carry on, and when you switch the lights back on, you start paying again. On AWS they give me some choices: STOP, which equals suspend, means I don’t pay the server hourly charge, just some nominal, minuscule fee to store the virtual image, and also means I can later restart the server from exactly where I left off; or TERMINATE, which stops and completely deletes the virtual server. But it seems Microsoft wants to charge you for the virtual server even though you’re not using it. Which means either they haven’t figured out all the automated provisioning, scripting, clean-ups, billing, etc.; or they want to discourage this type of behaviour; or they’re hoping to gouge their users by charging them for services they may not be using.
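
For comparison, the stop/start behaviour I’m describing on AWS looks roughly like this with the boto3 SDK. This is only a sketch; the region and instance ID are made up.

```python
# Sketch: stop an EC2 instance (the hourly compute charge ends; you keep paying
# a small amount for the stored volume) and start it again later.
# Region and instance ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Equivalent of clicking STOP in the console.
ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])

# Later: pick up exactly where you left off.
ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])
```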

I’ll carry on playing with Azure until my three month free trial runs out, but I won’t become a paying customer until Azure offers the ability to suspend virtual images without paying the run-time charges.

App-ifying Business

Between the Apple, Google, BlackBerry and Microsoft stores there are over 1 million apps that you can download to a handheld device to do anything from playing games and watching movies to managing your finances and booking travel. Consumers can perform thousands of tasks using their smart devices, to the point that PC sales are declining relative to historical trends while sales of tablets and smartphones are going through the roof. Consumers are doing more on smart devices and less on traditional form factor PCs. But so far, with limited exceptions, business users continue to perform the majority of their day-to-day work-related tasks on desktop and laptop PCs. There are a number of reasons for this, including usability, security, suitability and other functional reasons, but there are also less tangible constraints such as cultural inertia and the inability of IT departments to react quickly enough in retrofitting new end-user technologies onto legacy business systems. Technically it can be done. But IT departments are notorious for getting stuck in their ways.

I have no doubt that in five years’ time a majority of work functions will be initiated / performed / managed on smart devices. These devices will be a mix of tablets, phablets, phones and a new breed of laptops and PCs. This new breed of PCs will be more like tablets than traditional PCs in the way you buy them, the way you put applications onto them, the security model and the way software is updated. The big difference will be in how applications are developed, distributed and accessed by business users.

Today, the part of a corporate system that users see is usually a laptop or desktop PC with a proprietary and standardized configuration, or build, of Windows with a collection of specific office and productivity tools, email client, browser, anti-virus software, third party and custom-built fat client applications, etc. These are usually doled out and supported by an in-house or outsourced IT department. Change is slow, and it seems that once you’ve upgraded from XP to Windows 7, it’s time to contemplate Windows 8, fearing that once that update is done, there will be a new version to roll out.

But what if the business user experience more closely matched the consumer experience? What if you could go to the Apple App Store or Google Play store and search for and download your company’s app? Once it’s loaded onto your device, you authenticate and voila – your corporate workplace is available and you can perform all of the tasks you are authorized to perform. On any compatible device, from phone to PC. All of the hassles associated with keeping everyone’s desktop up to date have just vanished. To a certain extent it’s already happening. Users – the bane of some IT departments’ existence – are out there buying all manner of the latest devices and figuring out how to access corporate email, collaboration and other services. So it’s users that will drive this.

How much can companies save with this approach? It’s hard to quantify because of the wide range of variables. Application development and support costs will temporarily go up but should stabilize and return to where they are once IT departments figure out how to do this. Infrastructure costs should go down. If the average annual cost per seat for a desktop or laptop PC, including hardware, software and support, is $1,000, then after the migration to the app model has happened there is no reason why this number shouldn’t be halved. If you have 20,000 seats then that’s a cool ten million. But cost savings won’t be the only driver. Most companies will do this because it’ll be easier, and they can get staff to buy their own equipment – BYOC! And staff can work from anywhere.
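
As a back-of-the-envelope check on that figure, here’s the arithmetic spelled out; the seat count and cost per seat are the same assumptions as above, not real data.

```python
# Back-of-the-envelope savings estimate using the assumptions above.
seats = 20_000
annual_cost_per_seat = 1_000   # hardware, software and support, in dollars
post_migration_factor = 0.5    # assume the per-seat cost is halved

annual_saving = seats * annual_cost_per_seat * (1 - post_migration_factor)
print(f"Annual saving: ${annual_saving:,.0f}")  # Annual saving: $10,000,000
```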

So what does this mean? It means that the next big thing in IT is going to be the ‘app-ification’ of business. Once it starts it will be bigger than Y2K, bigger than Cloud. Companies will scramble for expertise, resources, quick wins. Careers will be launched and made. New companies purporting to have the magic answer to app-ifying your business will come out of nowhere. The big IT companies will reassuringly tell you that they’ve been working on this for years. Who should you trust? Who should you go to? It’s too early to tell. Certainly there is a lot of expertise in India and China in building apps for smart devices. So there will be a lot of work done there. But the business apps will need to integrate with legacy systems, which implies that existing application support teams will need to be involved.

Considerations for Data Centre Migration

The vast majority of applications, once implemented into production, will never move. This is for two simple reasons: 1) It’s difficult and therefore costly and risky, and 2) it’s hard to build a business case to fix something that isn’t broken.

Migrating business systems from one or more locations is a complex undertaking that involves a number of issues including connectivity, application compatibility, shared data, inter-system communications, OS compatibility, hand-over of support mechanisms, and more. The level of complexity is usually aligned to the business criticality of the associated systems. If the business can be without the system for a few days, then by all means, unplug it, throw it in the back of a truck, drive it to its new home, plug it in and fiddle with the network connections until it’s back online. But for systems where the business cannot tolerate more than a few hours of downtime, or none at all, you’re looking at a whole new level of complexity, time, cost and risk.

At a high level, data centre migration is more often an exercise in replicating application code, data and connections than in migrating physical hardware. There is no one-size-fits-all approach; a number of migration strategies exist. In some cases where the business could survive without the system for two or three days, physical ‘lift and shift’ may be the recommended approach. However, for most systems the approach will be more complex, cost more and take longer, but will be less disruptive and less risky. These various approaches are outlined later in this article.

Any system migration does introduce risk. The objective of a chosen approach is to balance the cost, time and effort involved against the allowable risk to the business. It is therefore necessary to understand the potential impact to the business should the migrating system be unavailable for a period of time, and to involve the business stakeholders in deciding which approach is appropriate for each business system or grouping of business systems. This implies that a critical objective in the analysis and planning phase of a migration programme is to facilitate the decision-making process with clear and business-appropriate inputs. These might include:

  • Per system or system grouping, the assessed criticality and impact should the system be unavailable for longer than agreed during the migration, expressed in financial and reputational terms
  • Analysis of the migration approach options, with a recommendation for each system or system grouping, in terms of time, cost, dependencies, assumptions and risks
  • A high-level timeline for the migration, overlaid with the larger change programmes that may be in-flight or contemplated
  • A business case for the programme that sets out the quantitative and qualitative benefits
  • An overall programme risk plan covering commercial, financial, resourcing, third-party and other foreseeable dependencies

Some of the issues that need to be addressed when designing a data centre migration:

System groupings – There is usually a strong correlation between the time a business system has been in production and the number of connections it supports and is dependent upon. Many systems do not stand alone and therefore cannot be migrated individually. There may be inter-process and inter-system connections, real-time and batch, that work well on the data centre LAN, but which will run too slowly over a WAN connection during the migration programme. There may also be shared data or gateway dependencies between systems, meaning that both have to move at the same time. For example, a mainframe and a number of midrange systems may share the same channel-attached storage arrays or SAN. If you move the mainframe and SAN, you probably also have to move many of the midrange systems because they need local SAN speeds for data access. There will be many reasons why one system cannot migrate independently of others, and so whole groups of systems must be migrated on the same weekend. The analysis to uncover this complexity and create these logical system groupings is perhaps the most difficult piece of work in the data centre migration analysis and planning phase. The success of the overall programme depends upon getting this right: target platform configuration, migration phasing, resourcing, cost estimating, and more.
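
Once the interface inventory is in machine-readable form, the grouping itself is essentially a connected-components problem. A minimal sketch, with invented system names and dependencies:

```python
# Sketch: derive migration groupings from "must move together" dependencies by
# finding connected components. System names and pairs are invented.
from collections import defaultdict

# Pairs of systems that share storage, a gateway or a latency-sensitive link.
must_move_together = [
    ("mainframe", "san"),
    ("san", "billing-midrange"),
    ("crm", "crm-reporting"),
]

graph = defaultdict(set)
for a, b in must_move_together:
    graph[a].add(b)
    graph[b].add(a)

def groupings(graph):
    """Return the sets of systems that must migrate in the same wave."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        groups.append(component)
    return groups

for group in groupings(graph):
    print(sorted(group))
# ['billing-midrange', 'mainframe', 'san']
# ['crm', 'crm-reporting']
```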

IP addressing – Older systems, and yes, even more modern ones where there was a lack of architectural discipline, may have hard-coded IP addressing, socket connections, ODBC connections and other direct access methods embedded into code. Also, many data centre architectures use private address ranges, meaning that if more than one data centre is involved in the ‘as-is’ world, a new scheme is needed in the ‘to-be’. These issues just add to the overall complexity. A strategy is needed: clean things up before, during or after the migration?
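
A cheap way to size this problem early is to scan the code base for IPv4 literals. Here is a rough sketch; the root path is a placeholder and the regex will throw up false positives, version strings for example:

```python
# Sketch: flag candidate hard-coded IPv4 addresses in a source tree.
# The root path is a placeholder; expect false positives such as version numbers.
import re
from pathlib import Path

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_hardcoded_ips(root):
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in IPV4.findall(line):
                print(f"{path}:{lineno}: {match}")

find_hardcoded_ips("/path/to/application/source")
```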

OS compatibility – Many system environments contain a mix of various flavours and release versions of Unix, Linux, Microsoft Server, z/OS, iSeries, pSeries, Tandem, and so on. Since the chosen migration approach for many of these systems may be to replicate them in the target location, you may not want to source, install and support old operating systems, some of which may no longer be supported by the vendor. And if you do need to install older operating systems, these may not even run on the newer hardware you want in your shiny new data centre! Replicating onto up-to-date operating systems will almost certainly introduce application compatibility risks that could be avoided by not doing an OS upgrade in conjunction with the migration. In some cases it may be advantageous to perform an OS upgrade prior to migration, in other cases it may be better to move onto an exact replica of the older operating environment and upgrade at a later date. Or you might get lucky and be able to migrate directly onto an updated platform. Discussions with application providers, testing, and trade-off analysis between cost and risk are needed.

Application remediation – The chances are that applications and databases as they are currently implemented will not port straight away onto the target operating environment. This is for many reasons: for example, the application may not be compatible with the newer hardware and up-to-date operating system version in the target environment, or it runs on an old release of a database or 4GL framework. Or you may be migrating the application from a physical to a virtual server. You may even be planning to completely change the underlying architecture of the application to leverage cloud compute, storage, database and communications constructs. For example, today the application may attach directly to an Oracle database but you want to migrate onto a cloud platform and leverage something like Amazon’s RDS (Relational Database Service). Or you may want to consolidate several databases and database servers onto a database appliance. Whatever the reason, and even if you’re migrating to a like environment, you will have to involve in-house and third party applications providers and support teams, and conduct testing to determine how much, if any, remediation is necessary. Remediation can range from a simple re-compile all the way to a re-write or even replacement. Only thorough analysis and testing can help you decide which is required on an application-by-application basis. For high level planning purposes, a rule of thumb is that 10% of applications and databases will port with no remediation necessary, 30% will require minor remediation, 20% medium, 20% high levels of remediation, and between 20% and 50% of the applications portfolio will be put on the ‘too difficult’ pile and be migrated more or less as-is onto legacy platforms that you hoped not to be filling up your shiny new data centre with!
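
For rough sizing, that rule of thumb can be dropped into a simple model. The portfolio size and the person-day figures per remediation band below are pure assumptions for illustration, not benchmarks.

```python
# Sketch: rough remediation sizing from the rule-of-thumb percentages above.
# Portfolio size and person-day figures per band are invented assumptions.
portfolio_size = 200  # applications and databases in scope

bands = {
    "none":   {"share": 0.10, "days_each": 2},
    "minor":  {"share": 0.30, "days_each": 10},
    "medium": {"share": 0.20, "days_each": 30},
    "high":   {"share": 0.20, "days_each": 60},
    # the remaining share goes onto the 'too difficult' pile and moves as-is
}

total_days = 0
for band, b in bands.items():
    apps = round(portfolio_size * b["share"])
    days = apps * b["days_each"]
    total_days += days
    print(f"{band:>6}: {apps:3d} apps x {b['days_each']:2d} days = {days} person-days")
print(f"Rough remediation estimate: {total_days} person-days")
```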

Latency – Systems migration may introduce additional round-trip communications time, depending upon where the new data centre is, where the users are, and other factors such as network quality, link bandwidth, router quality, hops, firewalls, etc. Therefore, an important step in the detailed planning phase is testing! Perform proof-of-concept testing between users and the new site. Hire a network specialist to do an in-depth analysis of your existing network situation and the implications of migrating to the target site. In most cases this will be a non-issue, or certainly nothing that cannot be addressed through proper design or the use of WAN optimisation technology.
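
Even a crude round-trip measurement from a user site to a test host in the candidate data centre gives an early feel for the problem. A sketch; the host name and port are placeholders, and TCP connect time is only a rough proxy for application-level latency.

```python
# Sketch: crude TCP connect-time measurement from a user location to a test
# host in the candidate data centre. Host and port are placeholders.
import socket
import time

def tcp_rtt_ms(host, port=443, samples=5):
    times = []
    for _ in range(samples):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass  # connect, then close immediately
        times.append((time.monotonic() - start) * 1000)
    return min(times), sum(times) / len(times)

best, avg = tcp_rtt_ms("poc-host.new-datacentre.example.com")
print(f"best {best:.1f} ms, average {avg:.1f} ms over 5 connects")
```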

Time and resources – The limiting factor in data centre migration is not usually hardware or software, but the amount of change the business can sponsor and absorb at any given time. Migrating a system involves the input and participation of business staff, internal and external application support teams, network and security experts, hardware specialists, software vendors, network providers, project managers and so on. Much of the work can be done by dedicated project staff, but there will be dependencies upon people with day jobs who may not see data centre migration as strategically important to the business. The success of the migration will depend upon proper resource planning, dependency management and governance to handle issue resolution and prioritisation. And it will take longer than you think! Rule of thumb: 2 to 4 systems or system groupings per month, following 3 to 6 months of planning. So if you’re moving 20 systems, the whole project may take anywhere from 8 to 16 months.
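
Plugging that rule of thumb into a quick calculation shows where the 8 to 16 months comes from; the 20-grouping figure is the same example as above.

```python
# Sketch: rough programme duration from the rule of thumb above.
groupings = 20                   # system groupings to migrate
rate_low, rate_high = 2, 4       # groupings migrated per month
plan_low, plan_high = 3, 6       # months of up-front analysis and planning

fastest = plan_low + groupings / rate_high   # 3 + 5  = 8 months
slowest = plan_high + groupings / rate_low   # 6 + 10 = 16 months
print(f"Expect roughly {fastest:.0f} to {slowest:.0f} months end to end")
```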

Migration Strategies

When migrating systems from one data centre to another several migration strategies are available. Choosing the right strategy is influenced by an understanding of the following factors:

  • Criticality of the system or system grouping
  • Financial risk should the system be unavailable
  • Reputation risk should the system be unavailable
  • Number of systems involved in the grouping
  • Allowable downtime
  • OS version and compatibility of application code with newer version
  • Database version and compatibility issues with newer OS, newer hardware
  • Size of the data sets and databases
  • Number and complexity of inter-process and inter-system communications, real-time and batch
  • Nature of inter-system communications, i.e. remote procedure call, socket, asynchronous messaging, message queuing, broadcast, multicast, xDBC, the use of message brokers and enterprise service buses
  • Amount of remediation necessary to move the application or database onto the target platform
  • Security domain considerations, the use of firewalls, and the associated constraints
  • Availability of spare hardware
  • Existing backup and restore architecture and capability
  • Data replication architecture
  • Deployment topology: standalone vs clustering, single site vs multiple sites
  • Distance between old and new site
  • DR strategy and existing capabilities; are you trying to fix this during migration?
  • Rollback / back-out strategies available
  • Life-cycle for software stack components
  • Software licensing cost implications
  • Life-cycle of the existing hardware
  • Network bandwidth
  • Opportunity windows
  • Protocols being used between different systems
  • Service levels
  • Storage architectures: direct-attached vs. network-attached vs. SAN
  • User community locations and connection methods

As you can see, this is a fairly lengthy list of things to consider during the analysis and planning phase! Gathering as much of this data as possible and immersing yourself in it will certainly help in analysing the options and choosing the most appropriate approach for each system or system grouping; the approaches themselves are discussed in the sections that follow.
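
None of this reduces to a neat formula, but it can help to encode a first-cut shortlist of candidate strategies per system grouping, if only to structure the workshops. A much simplified sketch; the attributes, thresholds and example values are invented for illustration.

```python
# Sketch: first-cut shortlist of migration strategies per system grouping.
# The attributes and thresholds are invented for illustration only.
def shortlist_strategies(grouping):
    options = []
    if grouping["allowable_downtime_hours"] >= 48 and not grouping["hardware_end_of_life"]:
        options.append("lift-and-shift")
    if grouping["has_ha_cluster"]:
        options.append("half-cluster migration")
    if grouping["has_full_dr_environment"]:
        options.append("move DR first")
    if grouping["virtualised"]:
        options.append("virtualised image movement")
    # Re-hosting on new hardware is almost always available as a fallback.
    options.append("re-host on new hardware")
    return options

example = {
    "name": "billing cluster",
    "allowable_downtime_hours": 4,
    "hardware_end_of_life": True,
    "has_ha_cluster": True,
    "has_full_dr_environment": False,
    "virtualised": False,
}
print(example["name"], "->", shortlist_strategies(example))
# billing cluster -> ['half-cluster migration', 're-host on new hardware']
```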

Lift-and-shift: Physical migration is the simplest form of moving a system to a new environment. Switch it off, move it, plug it in and hope it works. Since the system has to power down for the move, no data synchronization issues will arise because no new updates could have been made while the system was unavailable. This strategy can only be used when there is sufficient time available for the whole process. Where high availability is required with no allowable downtime, then clearly this strategy will not work. It is also a highly risky approach. If for any reason the shipment of the system fails, the system cannot be restarted at the new site or there is a network connection issue, no rollback / back-out option is available other than to ship it back and hope it works at the old site!

Re-host on new hardware: Re-hosting on new hardware mitigates most of the risks associated with the previous ‘lift and shift’ strategy; however, the cost is much higher, since you’ll need to buy all new hardware. It may be possible to buy some new hardware and re-purpose the older hardware for the next wave of migration, but this will depend upon how old and fit-for-purpose the hardware is. Installing new hardware usually requires installation of the latest OS and other software, since often the old OS may not run on the newer hardware. This may lead to extra licensing requirements for upgrading other software components in the management stack. Porting to new hardware can have the advantage that the hardware is usually faster and can support more applications, potentially reducing the number of boxes needed and reducing the overall software portfolio required, thus reducing licensing costs. The risks involved in this strategy are lower than with the first; a rollback / back-out solution should be built into the design and tested thoroughly. Compatibility between the application and the new software stack on the new hardware can be fully tested before cut-over is done. Data synchronization can be an issue, since the data needs to be moved from the old to the new environment while the old system is still processing updates. There are various ways to solve this synchronization issue, such as asynchronous replication, log-shipping, or simply cutting off any further updates on the older system, performing the data migration and cut-over, and carrying on with the newer system.
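
Whichever synchronization mechanism you use, it pays to verify the copy before cut-over. Below is a trivial sketch of a row-count comparison using pyodbc; the DSNs and table names are placeholders, and in practice you would compare checksums or sample rows as well, not just counts.

```python
# Sketch: compare row counts between the old and new databases before cut-over.
# DSNs and table names are placeholders; real verification would also compare
# checksums or sample rows, not just counts.
import pyodbc  # any DB-API driver would do; pyodbc is just an example

TABLES = ["customers", "orders", "order_lines"]

old_db = pyodbc.connect("DSN=old_datacentre_db")
new_db = pyodbc.connect("DSN=new_datacentre_db")

def row_count(conn, table):
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

for table in TABLES:
    old_count = row_count(old_db, table)
    new_count = row_count(new_db, table)
    status = "OK" if old_count == new_count else "MISMATCH"
    print(f"{table}: old={old_count} new={new_count} {status}")
```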

Swing kit: Using temporary hardware is similar to the re-hosting strategy, except it involves double the effort, because once the application and data are moved onto the temporary hardware, the original hardware is moved and the swing kit is swapped out again. This strategy can use pre-production testing or development hardware, or hardware borrowed or leased from suppliers; it doesn’t matter, as long as it is suitable for the migrating system or system cluster. Wherever possible you should avoid having to migrate the applications and data twice. The associated time and cost, as well as the additional risk to the business, are likely to be higher than for purchasing new hardware in the first place. But there will be scenarios where borrowed kit is a viable option, usually where the box is quite large, like a mainframe or Superdome.

Move DR first: This strategy involves moving the disaster recovery hardware first and then migrating onto it. This strategy will work if there is a fully configured and available DR system, and if the business can tolerate the risk of downtime and lost data during the time the DR system is being moved and tested.

Half cluster migration: This strategy works where the migrating system is currently deployed in a high availability cluster such that it will continue to support 100% availability if one half has failed. Take the redundant half down, move it, bring it up in the new site and re-attach it to the old site, then take down and move the other half. There are a number of dependencies and potential issues associated with this strategy, mostly to do with the fact that many high availability architectures have a live-live configuration for application servers but the database is in live-backup mode, meaning the application servers at the new site would have to access the database at the old site. This may work but usually going from SAN-attached storage to WAN-attached storage is too much of a penalty to pay and application performance degrades unacceptably.

Many to one: This is a derivation of the re-hosting on new hardware strategy. Quite often the new hardware is bigger and faster, and so through hardware sharing or virtualization you may be able to re-host multiple applications that used to run on individual servers onto a shared physical environment, reducing cost and complexity.

Virtualised image movement: If you already have a number of virtualised server images, then you may be able to migrate these fairly easily. But before you set up the new virtual server environment and start moving images, you’re going to have to consider what the applications on the server are doing, who or what is accessing them, what other systems are called upon by the application, how the application accesses data and whether this data needs to migrate before, during or afterwards. If you move the virtual image but not the database, because other applications need to access the database, then the migrating application will need to access the database over a wide area connection. Will this work? Hmmm, not as easy as it seems then!

Cloud: there, I said it, the C word. The state of the art in private and public cloud offerings has advanced tremendously in the last few years, to the point where most organisations should seriously consider migration of suitable workloads to cloud computing. This will involve some work in the applications space as the applications will likely have to be rebuilt. But if your objective is to get out of an existing data centre, then moving to a new data centre may not always be the only option.

If you have a data centre migration project and would like some help in planning it or running it, please get in touch.

Thoughts on the new HP Public Cloud

As a user of Amazon Web Services (AWS) I thought I’d give the new HP Public Cloud a try. First of all, my use case is pretty simple: I’m a casual user who wants to be able to fire up a server now and then in various regions around the world to host my son’s Minecraft world, to run a remote desktop, or to run a remote VPN server. I want to be able to turn them on when I need them and turn them off when I don’t, without losing the server, its software, configuration or data.

AWS lets me fire up a server in five parts of the world: Brazil, the USA, Ireland, Singapore and Tokyo. I can choose a Linux or Windows operating system, with a vast array of pre-configured server and software images available. Once a server is up and running, I can suspend it and not incur the hourly charge. When I need the server again, I can start it up within seconds. It’s a great service, very reliable and very cheap. I also use Amazon Simple Storage Service (S3) and Simple Email Service (SES), which are also excellent and cheap.

So when HP announced their Public Cloud Beta, I was excited to dip my toe in the water. Here is what I found:

  1. It’s pretty simple to start up a server, although at this stage they only have a few regions in the USA.
  2. They give you a public IP address with your server, which is a good thing. Amazon gives you a long string they call ‘Public DNS’, which works fine except when connecting from the Windows Minecraft client; if you want a standard public IP address, they charge you 1 cent per hour. So far so good for HP.
  3. Pricing is very competitive with Amazon. You’d think they would try to undercut AWS pricing to gain some market share, especially when AWS has so many more features and global regions, but hey, it’s still pretty cheap.
  4. Here’s the deal-breaker, at least for me: HP Public Cloud offers no easy way to suspend a server when you don’t need it and then come back hours, days, weeks or even months later, restart the server and pick up where you left off, without having incurred any charges beyond a very minimal charge for storing the virtual image. On AWS I can do this with two mouse clicks. On HP it’s still possible, but you have to be a command line guru and run complicated scripts to take a snapshot of your server and store it for later use. Not easy and not user friendly. Maybe this won’t matter to users who have no need to suspend servers and later restart them. But for me it’s a big deal and means I won’t be using this service.