The vast majority of applications, once deployed into production, will never move. This is for two simple reasons: 1) it’s difficult, and therefore costly and risky, and 2) it’s hard to build a business case to fix something that isn’t broken.
Migrating business systems from one or more locations to a new one is a complex undertaking that involves a number of issues including connectivity, application compatibility, shared data, inter-system communications, OS compatibility, handover of support mechanisms, and more. The level of complexity is usually aligned to the business criticality of the associated systems. If the business can be without the system for a few days, then by all means, unplug it, throw it in the back of a truck, drive it to its new home, plug it in and fiddle with the network connections until it’s back online. But for systems where the business cannot tolerate more than a few hours of downtime, or none at all, you’re looking at a whole new level of complexity, time, cost and risk.
At a high level, data centre migration is more often an exercise in replicating application code, data and connections, rather than in migrating physical hardware. There is no one-size-fits-all approach; a number of migration strategies exist. In some cases where the business could survive without the system for two or three days, physical ‘lift and shift’ may be the recommended approach. However, for most systems the approach will be more complex, cost more and take longer, but be less disruptive and less risky. These various approaches are outlined later in this article.
Any system migration does introduce risk. The objective of a chosen approach is to balance the cost, time and effort involved against the allowable risk to the business. It is therefore necessary to understand the potential impact to the business should the migrating system be unavailable for a period of time, and to involve the business stakeholders in deciding which approach is appropriate for each business system or grouping of business systems. This implies that a critical objective in the analysis and planning phase of a migration programme is to facilitate the decision making process with clear and business-appropriate inputs. These might include:
- Per system or system grouping, the assessed criticality and impact should the system be unavailable for longer than agreed during the migration, expressed in financial and reputational terms
- Analysis of the migration approach options, with a recommendation for each system or system grouping, in terms of time, cost, dependencies, assumptions and risks
- A high-level timeline for the migration, overlaid with the larger ongoing change programmes that may be in-flight or contemplated
- A business case for the programme that demonstrates quantitative and qualitative benefits
- An overall programme risk plan covering commercial, financial, resourcing, third-party and other foreseeable dependencies
Some of the issues that need to be addressed when designing a data centre migration:
System groupings – There is usually a strong correlation between the time a business system has been in production and the number of connections it supports and is dependent upon. Many systems do not stand alone and therefore cannot be migrated individually. There may be inter-process and inter-system connections, real-time and batch, that work well on the data centre LAN, but which will run too slowly over a WAN connection during the migration programme. There may also be shared data or gateway dependencies between systems, meaning that both have to move at the same time. For example, a mainframe and a number of midrange systems may share the same channel-attached storage arrays or SAN. If you move the mainframe and SAN, you probably also have to move many of the midrange systems because they need local SAN speeds for data access. There will be many reasons why one system cannot migrate independently of others, and those systems must then all be migrated over the same weekend. The analysis to uncover this complexity and create these logical system groupings is perhaps the most difficult piece of work in the data centre migration analysis and planning phase. The success of the overall programme depends upon getting this right: target platform configuration, migration phasing, resourcing, cost estimating, and more.
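The grouping exercise is essentially a graph problem: whenever a dependency (shared SAN, real-time feed, batch transfer) joins two systems, they must move together, so the groupings fall out as the connected components of the dependency graph. A minimal sketch in Python, using hypothetical system names purely for illustration:

```python
from collections import defaultdict

def migration_groups(dependencies):
    """Group systems into migration waves: any two systems joined by a
    dependency must move together. Groupings are the connected components
    of an undirected dependency graph."""
    graph = defaultdict(set)
    for a, b in dependencies:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        # walk outwards to collect everything reachable from this system
        group, queue = set(), [node]
        while queue:
            current = queue.pop()
            if current in group:
                continue
            group.add(current)
            queue.extend(graph[current])
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical dependency pairs for illustration
deps = [("mainframe", "SAN"), ("midrange-1", "SAN"),
        ("midrange-2", "midrange-1"), ("web-portal", "crm")]
print(migration_groups(deps))
```

Here the mainframe, the SAN and both midrange systems land in one grouping and must move on the same weekend, while the portal and CRM form a second grouping that can move independently.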
IP addressing – Older systems, and yes, even more modern ones where there was a lack of architectural discipline, may have hard-coded IP addressing, socket connections, ODBC connections and other direct access methods embedded into code. Also, many data centre architectures use private address ranges, meaning that if more than one data centre is involved in the ‘as-is’ world, a new scheme is needed in the ‘to-be’. These issues just add to the overall complexity. A strategy is needed: clean things up before, during or after the migration?
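Part of sizing this problem is simply finding the hard-coded addresses. A rough scan of source and configuration files for IPv4 literals is a common first pass; every hit is only a candidate (examples, comments and test fixtures need weeding out by hand) before anything goes on the remediation backlog. A sketch:

```python
import re

# Matches dotted-quad IPv4 literals; deliberately loose, so expect
# false positives such as version strings that happen to look like IPs.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_hardcoded_ips(text):
    """Return (line number, address) pairs for every IPv4 literal found."""
    return [(n, m.group()) for n, line in enumerate(text.splitlines(), 1)
            for m in IPV4.finditer(line)]

# Hypothetical source snippet for illustration
sample = 'db = connect("10.20.30.40", port=1521)\nhost = lookup("orders-db")  # good: a name, not an address'
print(find_hardcoded_ips(sample))
```

Run over a whole codebase, a scan like this gives an early count of how much clean-up the ‘before, during or after’ decision is actually dealing with.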
OS compatibility – Many system environments contain a mix of various flavours and release versions of Unix, Linux, Microsoft Server, z/OS, iSeries, pSeries, Tandem, and so on. Since the chosen migration approach for many of these systems may be to replicate them in the target location, you may not want to source, install and support old operating systems, some of which may no longer be supported by the vendor. And if you do need to install older operating systems, these may not even run on the newer hardware you want in your shiny new data centre! Replicating onto up-to-date operating systems will almost certainly introduce application compatibility risks that can be avoided by not doing an OS upgrade in conjunction with the migration. In some cases it may be advantageous to perform an OS upgrade prior to migration, in other cases it may be better to move onto an exact replica of the older operating environment and upgrade at a later date. Or you might get lucky and be able to migrate directly onto an updated platform. Discussions with applications providers, testing and trade-off analysis between cost and risk are needed.
Application Remediation – The chances are that applications and databases as they are currently implemented will not port straight away onto the target operating environment. This is for many reasons, for example the application may not be compatible with the newer hardware and up-to-date operating system version in the target environment; or it runs on an old release of a database or 4GL framework. Or you may be migrating the application from a physical to a virtual server. You may even be planning to completely change the underlying architecture of the application to leverage cloud compute, storage, database and communications constructs. For example, today the application may attach directly to an Oracle database but you want to migrate onto a cloud platform and leverage something like Amazon’s RDS (Relational Database Service). Or you may want to consolidate several databases and database servers onto a database appliance. Whatever the reason, and even if you’re migrating to a like environment, you will have to involve in-house and third-party applications providers and support teams, and conduct testing to determine how much, if any, remediation is necessary. Remediation can range from a simple re-compile all the way to a re-write or even a full replacement. Only thorough analysis and testing can help you decide which is required on an application by application basis. For high level planning purposes, a rule of thumb is that 10% of applications and databases will port with no remediation necessary, 30% will require minor remediation, 20% medium, 20% high levels of remediation, and between 20% and 50% of the applications portfolio will be put on the ‘too difficult’ pile and be migrated more or less as-is onto the legacy platforms you had hoped would not be filling up your shiny new data centre!
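The rule-of-thumb split above can be turned into a first-cut sizing estimate. In the sketch below, the portfolio size and the effort-per-application figures are illustrative assumptions, not benchmarks; only the percentage split comes from the rule of thumb:

```python
# First-cut remediation sizing using the rule-of-thumb split:
# 10% port cleanly, 30% minor, 20% medium, 20% high, and the remainder
# stay as-is on legacy platforms.
PORTFOLIO = 120  # hypothetical number of applications and databases
SPLIT = {"none": 0.10, "minor": 0.30, "medium": 0.20, "high": 0.20, "as-is": 0.20}
EFFORT_DAYS = {"none": 2, "minor": 10, "medium": 30, "high": 90, "as-is": 5}  # assumed

total_days = sum(PORTFOLIO * share * EFFORT_DAYS[band]
                 for band, share in SPLIT.items())
for band, share in SPLIT.items():
    print(f"{band:>6}: {PORTFOLIO * share:5.0f} apps")
print(f"rough remediation effort: {total_days:,.0f} person-days")
```

Even with made-up effort figures, a sketch like this makes the point quickly: the ‘high remediation’ band dominates the effort, which is why the analysis phase focuses on finding out which applications fall into it.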
Latency – Systems migration may introduce additional round-trip communications time, depending upon where the new data centre is, where the users are, and other factors such as network quality, link bandwidth, router quality, hops, firewalls, etc. Therefore, an important step in the detailed planning phase is testing! Perform proof-of-concept testing between users and the new site. Hire a network specialist to do an in-depth analysis of your existing network situation and the implications for migration to the target site. In most cases this will be a non-issue, or certainly nothing that cannot be addressed through proper design or the use of WAN optimisation technology.
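A useful back-of-envelope check before any proof-of-concept testing: for a ‘chatty’ application, the added delay per transaction is simply the number of sequential round trips multiplied by the extra round-trip time. The figures below are illustrative assumptions:

```python
# Back-of-envelope latency check for a chatty application.
def added_delay_ms(round_trips_per_transaction, old_rtt_ms, new_rtt_ms):
    """Extra delay per transaction when each of N sequential round trips
    pays the difference between the old and new round-trip times."""
    return round_trips_per_transaction * (new_rtt_ms - old_rtt_ms)

# e.g. a screen that makes 40 sequential database calls;
# assumed LAN RTT 0.5 ms, assumed post-migration WAN RTT 12 ms
delay = added_delay_ms(40, 0.5, 12.0)
print(f"extra delay per transaction: {delay:.0f} ms")
```

Nearly half a second added to every screen refresh is the kind of number that turns up in proof-of-concept testing and justifies either redesign or WAN optimisation.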
Time and resources – The limiting factor in data centre migration is not usually hardware or software, but is the amount of change the business can sponsor and absorb at any given time. Migrating a system involves the input and participation of business staff, internal and external application support teams, network and security experts, hardware specialists, software vendors, network providers, project managers and so on. Much of the work can be done by dedicated project staff but there will be dependencies upon people with day jobs who may not see data centre migration as strategically important to the business. The success of the migration will depend upon proper resource planning, dependency management and governance to handle issue resolution and prioritisation. And it will take longer than you think! Rule of thumb: 2 to 4 systems or system groupings per month, following 3 to 6 months of planning. So if you’re moving 20 systems, the whole project may take anywhere from 8 to 16 months.
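The rule of thumb above translates directly into a planning range. A small sketch, using only the figures stated in the text (3 to 6 months of planning, then 2 to 4 system groupings migrated per month):

```python
# Timeline range from the rule of thumb: 3-6 months of planning,
# then 2-4 system groupings migrated per month.
def duration_months(groupings):
    fastest = 3 + groupings / 4   # short planning, four groupings a month
    slowest = 6 + groupings / 2   # long planning, two groupings a month
    return fastest, slowest

lo, hi = duration_months(20)
print(f"20 groupings: roughly {lo:.0f} to {hi:.0f} months")
```

For 20 groupings this gives the 8-to-16-month range quoted above; the width of the range is itself the message for stakeholders.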
When migrating systems from one data centre to another several migration strategies are available. Choosing the right strategy is influenced by an understanding of the following factors:
- Criticality of the system or system grouping
- Financial risk should the system be unavailable
- Reputation risk should the system be unavailable
- Number of systems involved in the grouping
- Allowable downtime
- OS version and compatibility of application code with newer version
- Database version and compatibility issues with newer OS, newer hardware
- Size of the data sets and databases
- Number and complexity of inter-process and inter-system communications, real-time and batch
- Nature of inter-system communications, i.e. remote procedure call, socket, asynchronous messaging, message queuing, broadcast, multi-cast, xDBC, the use of message brokers and enterprise service buses
- Amount of remediation necessary to move the application or database onto the target platform
- Security domain considerations, the use of firewalls, and the associated constraints
- Availability of spare hardware
- Existing backup and restore architecture and capability
- Data replication architecture
- Deployment topology: standalone vs clustering, single site vs multiple sites
- Distance between old and new site
- DR strategy and existing capabilities; are you trying to fix this during migration?
- Rollback / back-out strategies available
- Life-cycle for software stack components
- Software licensing cost implications
- Life-cycle of the existing hardware
- Network bandwidth
- Opportunity windows
- Protocols being used between different systems
- Service levels
- Storage architectures: direct-attached vs. network-attached vs. SAN
- User community locations and connection methods
As you can see, this is a fairly lengthy list of things to consider during the analysis and planning phase! Certainly gathering as much of this data as possible and immersing yourself in it will help in analysis and choosing the most appropriate approach for each system or system grouping. These approaches are discussed below:
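One way to make the per-grouping decision tractable is a simple weighted scoring of the candidate approaches against a handful of the factors above. In the sketch below, the criteria, weights and scores are all illustrative assumptions; in practice they come out of the analysis phase with the business stakeholders:

```python
# Weighted-scoring sketch for comparing migration approaches per grouping.
# Weights reflect an assumed risk-averse business; scores are 1 (poor)
# to 5 (good) against each criterion.
WEIGHTS = {"risk": 0.4, "cost": 0.2, "duration": 0.2, "disruption": 0.2}

SCORES = {
    "lift-and-shift":  {"risk": 1, "cost": 5, "duration": 5, "disruption": 2},
    "re-host on new":  {"risk": 4, "cost": 2, "duration": 3, "disruption": 4},
    "half-cluster":    {"risk": 4, "cost": 3, "duration": 2, "disruption": 5},
}

def weighted(scores):
    return sum(WEIGHTS[c] * s for c, s in scores.items())

for approach, scores in sorted(SCORES.items(), key=lambda kv: -weighted(kv[1])):
    print(f"{approach:15} {weighted(scores):.2f}")
```

The point is not the arithmetic but the conversation it forces: making the weights explicit is what gets the business to agree on how much risk it will actually tolerate.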
Lift-and-shift: Physical migration is the simplest form of moving a system to a new environment. Switch it off, move it, plug it in and hope it works. Since the system has to power down for the move, no data synchronization issues will arise because no new updates could have been made while the system was unavailable. This strategy can only be used when there is sufficient time available for the whole process. Where high availability is required with no allowable downtime, then clearly this strategy will not work. This is a highly risky approach. If for any reason the shipment of the system fails, or the system cannot be restarted at the new site, or there is a network connection issue, no rollback / back-out option is available other than to ship it back and hope it works at the old site!
Re-host on new hardware: Re-hosting on new hardware mitigates most of the risks associated with the previous strategy of ‘lift and shift’, however the cost is much higher, since you’ll need to buy all new hardware. It may be possible to buy some new hardware and re-purpose the older hardware for the next wave of migration, but this will depend upon how old and fit-for-purpose the hardware is. Installing new hardware usually requires installation of the latest OS and other software, since often the old OS may not run on the newer hardware. This may lead to extra licensing requirements for upgrading other software components in the management stack. Porting to new hardware can have the advantage that the hardware is usually faster and can support more applications, potentially reducing the number of boxes needed and reducing the overall software portfolio required, thus reducing licensing costs. The risks involved in this strategy are lower than with the first; a rollback / back-out solution should be built into the design and tested thoroughly. Compatibility between the application and the new software stack on the new hardware can be fully tested before cutover is done. Data synchronization can be an issue since the data needs to be moved from the old to the new environment while the old system is still processing updates. There are various ways to solve this synchronization issue, such as asynchronous replication, log-shipping, or just cutting off any further updates on the older system, performing the data migration and cutover, and continuing on the newer system.
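Which synchronization option is viable depends partly on how long the final data copy actually takes over the inter-site link, since that bounds the cutover window. A rough sizing sketch; the data volume, link speed and efficiency factor are illustrative assumptions:

```python
# Sizing the cutover window: how long does the final data copy take
# over the inter-site link?
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Hours to copy data_gb over a link_mbps link, assuming only
    `efficiency` of nominal bandwidth is usable in practice."""
    bits = data_gb * 8 * 1000 ** 3              # decimal GB to bits
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600

# e.g. 2 TB of database files over an assumed 1 Gbps inter-site link
print(f"{transfer_hours(2000, 1000):.1f} hours")
```

If the answer comfortably fits a weekend freeze, ‘stop updates, copy, cut over’ may be good enough; if not, asynchronous replication or log-shipping ahead of the cutover becomes the safer route.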
Swing kit: Using temporary hardware is similar to the re-hosting strategy except it involves double the effort. This is because, once the application and data are moved onto the temporary hardware, the original hardware is moved and the swing kit is swapped out. This strategy can use pre-production testing or development hardware, or hardware borrowed or leased from suppliers. It doesn’t matter as long as it is suitable for the migrating system or system cluster. Wherever possible you should avoid having to migrate the applications and data twice. The associated time and cost, as well as the additional risk to the business, are likely higher than purchasing new hardware in the first place. But there will be scenarios where borrowed kit is a viable option, usually where the box is quite large like a mainframe or Superdome.
Move DR first: This strategy involves moving the disaster recovery hardware first and then migrating onto it. This strategy will work if there is a fully configured and available DR system, and if the business can tolerate the risk of downtime and lost data during the time the DR system is being moved and tested.
Half cluster migration: This strategy works where the migrating system is currently deployed in a high availability cluster such that it will continue to support 100% availability if one half has failed. Take the redundant half down, move it, bring it up in the new site and re-attach it to the old site, then take down and move the other half. There are a number of dependencies and potential issues associated with this strategy, mostly to do with the fact that many high availability architectures have a live-live configuration for application servers but the database is in live-backup mode, meaning the application servers at the new site would have to access the database at the old site. This may work but usually going from SAN-attached storage to WAN-attached storage is too much of a penalty to pay and application performance degrades unacceptably.
Many to one: This is a derivation of the re-hosting on new hardware strategy. Quite often the new hardware is bigger and faster, and so through hardware sharing or virtualization you may be able to re-host multiple applications that used to run on individual servers onto a shared physical environment, reducing cost and complexity.
Virtualised Image Movement: If you already have a number of virtualised server images, then you may be able to migrate these fairly easily. Before you set up the new virtual server environment and start moving images, however, you’re going to have to consider what the applications on the server are doing, who or what is accessing them, what other systems are called upon by the application, how the application accesses data and whether this data needs to migrate before, during or afterwards. If you move the virtual image but not the database because other applications need to access the database, then the migrating application will need to access the database over a wide area connection. Will this work? Hmmm, not as easy as it seems then!
Cloud: there, I said it, the C word. The state of the art in private and public cloud offerings has advanced tremendously in the last few years, to the point where most organisations should seriously consider migration of suitable workloads to cloud computing. This will involve some work in the applications space as the applications will likely have to be rebuilt. But if your objective is to get out of an existing data centre, then moving to a new data centre may not always be the only option.
If you have a data centre migration project and would like some help in planning it or running it, please get in touch.