Virtualisation is the foundation for all things cloud and the core of many medium to large enterprises’ infrastructure. There is no shortage of advice from vendors on how and why you should adopt virtualisation across your enterprise. Over the last decade we’ve seen numerous variants:
- Server virtualisation.
- Storage virtualisation.
- Desktop virtualisation.
- Application virtualisation.
- Network virtualisation.
Coupled with combinations and evolutions of these, technology vendors are working on software defined networks and virtualised data centres to further extol the benefits of a software defined world. What is forgotten in all this technical bliss is that at the end of the day, the buck stops somewhere (technically speaking) and someone has to know how it all works, so that when things go wrong a saviour can come and rescue the situation. As each layer of abstraction is introduced with virtualisation, understanding what it means to the previous layer is important for performance and supportability. Let’s briefly explore how virtualisation affects the relationship between technical layers to demonstrate why, more than ever, organisations need technical SMEs who can work across the different layers and understand the glue that holds everything together.
Now if you happen to be a generation older than me, you’ll know about IBM’s mainframe and its virtualisation capabilities, but for the bulk of the IT fraternity, it was VMware’s products that brought virtualisation to the masses. The interesting thing about server virtualisation is that fundamentally, processor, memory and I/O interfaces are not really ‘owned’ by the virtual machine’s operating system anymore. Where traditionally you would configure the operating system to manage these resources, you don’t really need to anymore. You have in effect moved the control and management of the hardware’s physical resources from the server (now a virtual machine) to the virtualising platform. Yet for all this technical bliss, I’m sure if you look hard enough at your operating system configuration, you’ll find all sorts of configuration parameters that are not set to their ‘maximum performance’ values. Most operating system vendors, such as Microsoft and Red Hat, publish performance tuning guides for their operating systems. Hardware vendors such as IBM and HP also provide guidelines on the performance characteristics of their kit.
In the good ol’ days, hard drives were installed and racked with their server. Then SANs became the de facto storage container. Yet problems remained with SANs. Inefficient slicing and dicing of the hundreds of hard drives proliferated. Server virtualisation vendors tried to assist by offering “virtualised” storage at the server layer and created features such as thin provisioning to improve utilisation efficiencies. But these can only go so far. Thus enter the big SAN vendors such as EMC, NetApp and IBM, who now offer virtualisation of their SANs and in the process espouse massive gains in enterprise wide storage efficiencies. Yet, in much the same way I alluded to previously with server virtualisation, there will be many cases where the organisation has not gone back and revisited their server (physical or virtualisation platform) to see what should be changed. For instance, you may have your VMware platform configured for thin provisioning of storage and your actual SAN configured with thin provisioning as well. Each layer is then promising capacity that the layer beneath it has itself only promised. Think about it.
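To put a number on the thin-on-thin scenario, here is a minimal sketch of how overcommitment compounds when both layers are thin provisioned. The `ThinLayer` class, the `effective_overcommit` function and all capacity figures are hypothetical, chosen only for illustration. The sketch also assumes the capacity backing the hypervisor layer is exactly what the SAN layer has promised to it.

```python
from dataclasses import dataclass


@dataclass
class ThinLayer:
    """One thin-provisioned layer: what it promises versus what backs it."""
    name: str
    provisioned_gb: float  # capacity promised to the layer above
    backing_gb: float      # capacity this layer receives from the layer below

    @property
    def overcommit(self) -> float:
        # Ratio of promised capacity to backing capacity for this layer alone.
        return self.provisioned_gb / self.backing_gb


def effective_overcommit(layers: list[ThinLayer]) -> float:
    """Multiply the per-layer ratios to get promised GB per physical GB."""
    ratio = 1.0
    for layer in layers:
        ratio *= layer.overcommit
    return ratio


# Hypothetical figures, for illustration only.
hypervisor = ThinLayer("datastore", provisioned_gb=20_000, backing_gb=10_000)
san = ThinLayer("SAN pool", provisioned_gb=10_000, backing_gb=5_000)

print(effective_overcommit([hypervisor, san]))  # 4.0
```

Each layer looks like a modest 2:1 on its own dashboard, yet virtual machines have been promised four times the physical disk that exists. Neither the server team nor the storage team sees that figure unless they compare notes.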
Is it the panacea for managing all those workstations and having a consistent desktop environment, or another headache that demands significant thought about performance? Desktop virtualisation is yet another wondrous technology, but managing it, understanding it and knowing how to tackle problems is another ball game. Desktop virtualisation is enabled through a combination of physical servers ‘being the desktop’, high IOPS on your SAN and a solid network backbone to carry all that data transfer.
So desktop virtualisation isn’t your thing, yet you may be considering application virtualisation to help with licensing, application management or even multi-version support for applications. Application virtualisation isn’t without its own hurdles, such as discovering that your application isn’t suitable for virtualisation, or discovering that the sheer effort required to package an application for virtualisation (sequencing, sorting out dependencies and writing any necessary deployment scripts) puts a dent in your ROI.
Traditionally, network virtualisation is all about the use of virtual LANs and virtual routing and forwarding. Interestingly though, throw in some virtual servers, virtual switches, VLAN trunks, maybe some virtual firewalls (possibly even a host based firewall) along with some edge firewalls and load balancers in the middle, and all of a sudden when something breaks and the server doesn’t seem to be responding to a client’s requests, or an upstream system can no longer see a downstream system on the network, it must be the network.
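Before declaring that it must be the network, it helps to know which layer of the path actually stops answering. The following is a rough sketch, not a diagnostic tool: it walks a list of named endpoints in order and reports the first one that refuses a TCP connection. The hop names, addresses (taken from the documentation-only range 192.0.2.0/24) and ports are all hypothetical, and `first_unreachable` is an illustrative helper, not part of any product.

```python
import socket
from typing import Callable, Optional

Probe = Callable[[str, int], bool]


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def first_unreachable(
    hops: list[tuple[str, str, int]], probe: Probe = tcp_probe
) -> Optional[str]:
    """Probe each hop in order; return the first failing hop's name, or None."""
    for name, host, port in hops:
        if not probe(host, port):
            return name
    return None


# Hypothetical path through the layers; replace with your own endpoints.
PATH = [
    ("load balancer VIP", "192.0.2.10", 443),
    ("web VM", "192.0.2.21", 8080),
    ("database listener", "192.0.2.31", 1521),
]
```

Calling `first_unreachable(PATH)` only proves reachability from wherever the script runs, so results will differ depending on which side of a firewall or VRF you are on. That is rather the point: run it from two or three vantage points and the conversation between teams starts with evidence instead of assumptions.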
Large organisations typically break up their support teams across different technology domains. Similarly, many support staff specialise in those technology domains (storage, servers, networks, application development, end user computing). Unfortunately, there are times when issues seen from an end-user perspective manifest themselves in many weird and wonderful ways. Take the hypothetical example of a web-based business application that suffers from performance issues. The trouble ticket is sent to the network team because the application seems to work most of the time, so it must be the network causing trouble. The network team investigate, and after scouring their reports and logs for the period involved they see no anomalous activity and bump it back to the applications team. The applications team check their release schedule, and although they released a newer version of the application several weeks ago, their testing passed and they haven’t had any issues until now. So now it is bumped to the server guys to check it out. The server guys check the application and discover it’s actually four virtual servers across their VMware cluster. Now these servers are memory hogs, but from a VMware perspective everything seems to be running OK, so they check the actual VMs’ operating system logs and again there is nothing particularly obvious. So they bump the ticket back to the application team. The application team then decide to investigate a bit further, and following several more weeks of monitoring and waiting for repeat incidents, they discover that the problem appears to occur when the user is accessing a new feature of the latest release. This feature is used by different clients at different times during their clients’ business processes. Further investigation shows that what this feature actually does is run a bunch of queries against some database tables. So they bump the ticket to the database team.
The database team run their reports and discover that some indexes weren’t quite right. So they fix the indexing, make sure the statistics are up to scratch and problem solved, right? Well, not quite. You see, some clients are still complaining. There appears to be no standard pattern, so around the trouble ticket merry-go-round we go. It turns out, after several more checks and balances, that multiple adjustments were needed:
- Different clients were coming through different network touch points operating on different VRFs. The routing for these users came through different load balancers, which happened to use different load balancing algorithms. The effect was that one group of users had ‘better access’ to the application: when the back-end servers were reaching certain resource peaks, the different groups of load balancers would direct transactions to different VMs.
- It so happened that the four virtual machines were the result of a P2V (physical to virtual) process a while ago, and all seemed good afterwards so no further changes were made. The VMs were actually hosting multiple web applications, each with a different JVM. Running multiple JVMs is not considered good practice in the VM world; it is usually considered better practice to increase the heap size. (It wasn’t an issue until new software versions were released.)
- Because the underlying physical infrastructure supporting the virtual machines is so much larger than the hardware the application servers were located on previously, the organisation adjusted the number of vCPUs available to the virtual machines to align with the NUMA topology, and the application guys changed the configuration of the JVMs to leverage additional threads.
- The organisation could also have decided to create more virtual machines, adding them to the load balancers and adjusting the VM server sizing (vCPU, memory and JVM heap sizes) to increase performance.
- It so happened that the database was a virtualised Oracle RAC instance and was resident on the same physical server cluster. None of the recommended guidelines were applied to Oracle.
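The vCPU and NUMA adjustment in the list above can be reasoned about with simple arithmetic. The sketch below is a sizing heuristic under stated assumptions, not a description of how any hypervisor actually places virtual machines: it assumes every NUMA node has the same number of cores and the same amount of local memory, and it ignores hypervisor overhead and hyperthreading. The `Host` class and all figures are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Host:
    """Simplified physical host: identical NUMA nodes (an assumption)."""
    numa_nodes: int
    cores_per_node: int
    memory_gb_per_node: int


def numa_nodes_spanned(host: Host, vcpus: int, memory_gb: int) -> int:
    """Minimum number of NUMA nodes a VM of this size would have to span."""
    by_cpu = ceil(vcpus / host.cores_per_node)
    by_memory = ceil(memory_gb / host.memory_gb_per_node)
    # Whichever resource runs out first on a node forces the VM wider.
    return max(by_cpu, by_memory)


# Hypothetical host: 2 NUMA nodes, 12 cores and 128 GB of memory per node.
host = Host(numa_nodes=2, cores_per_node=12, memory_gb_per_node=128)

print(numa_nodes_spanned(host, vcpus=8, memory_gb=64))    # 1
print(numa_nodes_spanned(host, vcpus=16, memory_gb=64))   # 2
print(numa_nodes_spanned(host, vcpus=8, memory_gb=192))   # 2
```

The third case is the one that tends to catch people out: a memory-hungry VM with a modest vCPU count can still spill across nodes. That is a conversation between the server team and the application team who set the JVM heap sizes, not a decision either can make well alone.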
Whilst this example does not cover all the virtualisation technologies I have outlined, I hope it serves to demonstrate that solving problems cannot always be achieved by a single team fixing a single problem. Understanding the environment in which each of the technology domains operates is just as important, and this can only be achieved by having the different support teams working together and at least being aware that how they configure their technical domain can (and does) have a cascading effect on the other technical domains. Specialists from all technical domains must collaborate if the organisation is to realise the best that virtualisation has to offer.