Nov 8, 2013

How we chose to implement the data center move (part 2)

Continued from Part 1

Dealing with our new hosts
From day 1, I thought this new high-tech relationship was not going to last too long. I was proven wrong on my initial reactions. The vendor provided their "A" team. They were very quick at turning over the servers in the PROD environment.
We ran into a few hiccups (a couple of defective parts, network configuration, and reordering problems). Other than that we had a relatively good start. I was impressed. Over time my level of confidence grew as we knocked down the issues that came up. One by one the servers of our data center environment were being turned over: VM hosts, Databases, Web Servers, App Servers, etc. We had to step up and install our software and unit test it.

Building the environment
What a ride! The first thing the team tried to figure out was how to organize the 1001 steps we needed to do and in what order. We decided to organize the "punch list" in the order of dependencies for our environment to work. For everything to work correctly at the new location, we needed (an oversimplified list):
  1. The network hardware to be operational and available (both internal and DMZ networks)
  2. The initial estimated SAN space to be ready to go
  3. The VM Host to be installed
  4. The Domain Controllers to be installed and configured
  5. Database servers operational
  6. Application Servers operational
  7. Web Servers operational
  8. Virtual client machines operational
Once all those components were in place we were ready to start our unit tests. One benefit of the frameworks we use in our environment is that they allowed us, over time, to create a battery of software unit tests (over 700 and counting). This unit test library allowed us to test, in a semi-automatic fashion, a decent chunk of the functionality that our QA/Testers were going to cover. Our unit test suite gave us better confidence in our overall progress.
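
For readers who like to see it in code, here is a minimal sketch of ordering a punch list by dependencies. It uses Python's standard `graphlib` module (Python 3.9+). The item names and the dependency edges are my own illustrative assumptions based on the simplified list above, not our actual project plan.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the items in its set. These edges are illustrative
# assumptions, not the real dependency map of the project.
PUNCH_LIST = {
    "network": set(),
    "san_space": {"network"},
    "vm_host": {"network", "san_space"},
    "domain_controllers": {"vm_host"},
    "database_servers": {"domain_controllers"},
    "application_servers": {"database_servers"},
    "web_servers": {"application_servers"},
    "virtual_clients": {"web_servers"},
}


def build_order(dependencies):
    """Return the items in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, item in enumerate(build_order(PUNCH_LIST), start=1):
        print(f"{step}. {item}")
```

A real punch list has far more than eight entries, and that is where a sorter like this earns its keep: if two items end up depending on each other, `TopologicalSorter` raises a `CycleError` instead of quietly producing a bad order.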

Scheduling Testing
In our case, we have an in-house development shop. At the time it was all on-shore development staff. Therefore, in addition to the obvious PROD environment, we also had to take into consideration our QA and DEV environments for this move.  

It was challenging to schedule testing around development timelines and scheduled releases of the core apps that we maintain. The time and schedules of Business users, Testers, Developers and QA process colleagues had to be orchestrated just right, almost like the orchestration of a big marching band. What I'm getting at, in so many words, is that you had better get hold of a very good (perhaps even great) Project Manager.

Make sure this person is very seasoned in that trade and, most of all, if by any chance there are any treacherous political waters in your organization, do everyone a favor and make sure your PM is aware of them. You most definitely don't want this person to jump into your data center migration project without a nav-chart!

Scheduling User Acceptance
Write this down on an 8.5 x 11 in. piece of paper, in a nice big bold font: Some things will go wrong. After you do that, put it away in a place where you can get to it easily. This will serve as a reminder of this fact.

Now the trick... no... better yet... the art of this matter is to communicate this inescapable condition to the business users with your utmost politeness. Many of them will understand the situation and maybe even draw up a page like that for themselves. However, there is a fraction, about 3.45% of your audience, who will not. Make sure you order extra patience as you deal with them. With those folks, you will need to have that discussion over, and over, and over... well, you get the picture.

The big milestone here is to get the group to agree that the mission is a "GO". Resist with all your might any pressure to commit to executing the cutover to the new Data Center if you have division and differences of opinion among the team members. For those who have any reservations about committing to a "go", ask them the following, as needed (some of the following is borrowed from the Core Protocols - check it out at the McCarthyShow.com):
  • "What is the technical reason why you oppose the 'go ahead'?" (or any other aspect of the project)
  • "On a scale of 1 to 10 (10 being 'I agree with everything'), how would you rate this issue?"
    • as they are answering, do not let them use comments like the following:
      • "I can't put my finger on it, but something is not right"
      • "I have a bad feeling about this"
      • "We've never done it this way before" 
      • Respectfully, point out that those are not "technical reasons"
    • a follow up question to that is:
      • "What would it take for you to give it a 10?"
  • To anyone offering negativity or whining with comments like "I knew this would happen / fail", immediately counter with
    • "If you knew this, why did you keep quiet?" 

Launch Day
Make sure you don't pick any of these points in time for your launch day (in no particular order):
  • Month-end Ledger Closing
  • Quarter-end Ledger Closing
  • Year-end Ledger Closing
  • If happily married, your anniversary date or your spouse's birthday
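
If you want to sanity-check a candidate date against the list above, here is a rough sketch. Everything in it is an assumption on my part: that the fiscal year matches the calendar year, that a closing keeps people busy for 3 days on either side of the period end, and the function and constant names themselves. Adjust it to your own finance calendar.

```python
import calendar
from datetime import date, timedelta

# Assumption: a ledger closing ties people up for a few days on either
# side of the period end. Tune this to your own finance calendar.
CLOSING_WINDOW_DAYS = 3


def period_end(day):
    """Last calendar day of the month that contains `day`."""
    last = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=last)


def launch_conflicts(candidate, personal_dates=()):
    """List the reasons `candidate` is a poor launch day (empty list = OK)."""
    reasons = []
    # Look at the month end just before and just after the candidate date.
    previous_end = candidate.replace(day=1) - timedelta(days=1)
    for end in (previous_end, period_end(candidate)):
        if abs((candidate - end).days) <= CLOSING_WINDOW_DAYS:
            # Assumes the fiscal year is the calendar year.
            if end.month == 12:
                reasons.append(f"year-end closing ({end})")
            elif end.month % 3 == 0:
                reasons.append(f"quarter-end closing ({end})")
            else:
                reasons.append(f"month-end closing ({end})")
    # Anniversaries and birthdays repeat yearly: compare month and day only.
    if any((d.month, d.day) == (candidate.month, candidate.day)
           for d in personal_dates):
        reasons.append("anniversary or birthday")
    return reasons
```

Calling `launch_conflicts(date(2013, 12, 30))` flags the year-end closing, while a mid-month date with no personal conflicts comes back with an empty list.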
As all your processes and services cut over to the new Data Center, you can breathe a sigh of relief (assuming no major show-stopper issues appear) around the following points in time:
  • after the first 24 hours
  • after the first week
  • after the first month-end closing
  • after the first quarter-end closing
  • after the first year-end closing 

Dealing with the fallout
As we mentioned earlier, some thing(s) will go wrong. 
Some pointers that can help:
  • Be prepared
  • As bugs and issues start coming in, use the same approach that you developed for the staged tests
    • record
    • triage 
    • assign severity
    • assign resource to oversee resolution
    • PM must periodically check for status changes
  • We used a SharePoint site for tracking / monitoring issues throughout the project. It saved our rear ends many a time
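
The record / triage / assign / check cycle above can be sketched in a few lines. This is a hypothetical in-memory stand-in, not what our SharePoint site looked like; the class, field, and status names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    SHOWSTOPPER = 4


@dataclass
class Issue:
    title: str
    recorded_at: datetime = field(default_factory=datetime.now)  # record
    severity: Optional[Severity] = None  # filled in during triage
    owner: Optional[str] = None          # resource overseeing resolution
    status: str = "recorded"


def triage(issue, severity, owner):
    """Assign a severity and an owner to a recorded issue."""
    issue.severity = severity
    issue.owner = owner
    issue.status = "assigned"
    return issue


def status_report(issues):
    """Open issues for the PM's periodic check.

    Untriaged issues come first so they are not forgotten, then the
    triaged ones from most to least severe, oldest first within a level.
    """
    open_issues = [i for i in issues if i.status != "resolved"]
    return sorted(
        open_issues,
        key=lambda i: (i.severity is not None, -(i.severity or 0), i.recorded_at),
    )
```

Whatever tool you use, the point is the same: nothing gets fixed until it is written down, sized up, and owned by somebody.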

Setting up the Disaster Recovery
This portion is slightly easier since you can decide the level of involvement:
  • no DR site
    • I would not recommend this
    • No Cost Benefit Analysis (CBA) needed
    • Make sure you also keep, at all times, a polished version of your resume, just as a precaution
  • DR site-lite
    • The minimalist approach to DR configuration
    • CBA would be great
  • SLA-based DR site configuration
    • Based on the SLAs that are important to business, prioritize the configuration of your DR site
    • Dude, you gotta do a CBA for this
  • Mirror the configuration of your Data Center at a separate building
    • This means you have A LOT of resources for this project... so go to town!
    • This also means that you will pay through the nose for a bunch of expensive resources that will be sitting idle for a good chunk of time
    • "CBA? we don't need no stinkin' CBA!"

Lining up the Decommissioning and Tear-down of your Legacy Data Center
If you have a great PM on the project, there will already be a crew scheduled to stop by your legacy Data Center room and tear it down the first week after the first month-end closing. Ours was torn down in about 4 days. It went very fast.

Closing the Project
Here are some suggestions:
  • Take some time off
  • Recognize team members on professional sites when appropriate
  • If there was ever a reason to go out with a big team to a nice place for a meal, please do! Preferably, after the first 7 days of the data center running smoothly.
Happy moving!
