Internet Research Group
June 2005
IRG has focused on the branch office as one of the most interesting perspectives from which to view the evolution of modern IT. The Internet has made it possible to deliver critical business applications anywhere in the modern world, not just to headquarters users. Universal access to applications in turn has enabled important changes in business process efficiency and effectiveness (e.g. modern CRM or supply chain applications). All of a sudden the branch office is a critical part of the IT fabric because that is where much of the real business of the company occurs. At the same time there is intense pressure to keep IT spending in check, and to get more out of the money spent. Computer and communication technology continues to improve and seems to provide the raw horsepower to bring first class IT support to the branch office, but only if we can keep from being overcome by the complexity of the solutions and keep application availability high while minimizing operational costs. We can see all these important but mutually conflicting forces being sorted out in innovative branch office IT solutions.
It costs very little to richly provision a branch office with servers and storage. The issue is how to do this without diminishing application reliability (too many moving parts) or being overwhelmed by administration and service costs (having any dedicated expertise in each branch office more than negates any possible benefit from the local technology). It is tempting to think of solving these problems by moving all the system and storage resources back into a data center and providing only remote access in the branch office, but despite the continuing improvement in WAN bandwidth and price, a T-1 link back to headquarters is no substitute for local storage and processing.
How can we minimize the operational costs of having powerful system technology in branch offices so that the benefits of local technology can be cost-effectively achieved with high application availability? How can all that technology be engineered so that it requires almost no local administrative or service support? In other words, how can we use telecommunications cost-effectively and deploy enough storage and server capacity to a branch office to make applications sing without ever having to send people out to configure or service it? How can all the provisioning and management be done remotely? Clearly the solution will leverage the communications link to the remote site (the critical element of minimizing support costs), but doing so requires solving a number of problems:
- There has to be enough system redundancy to keep applications working in the light of hardware failure without requiring expensive emergency service calls.
- We need robust remote access to the system in order to be able to perform management functions remotely. Neither the primary communication link failing nor a server failing should require a service call.
- Being able to service the system remotely shouldn't depend on complex applications and the operating system working correctly. As long as there is some form of communications in place to the remote site and we have some hardware working we should be able to run enough software to diagnose the problem remotely and make whatever configuration or software component changes are required to restore normal operation quickly.
Anyone who uses a modern PC probably realizes that today's computer technology is very reliable. When a system crashes or hangs the probable cause is a software failure -- running into circumstances where a design or implementation logic error is encountered. When our laptop hangs we try to kill the offending program and if that doesn't work we reboot the system by powering it off and on. These software crashes are annoying to be sure, but in the case of remote office IT they are disastrous because you can't assume that anyone in the office knows enough to help, and sending a service expert to the site is prohibitively expensive.
Having enough redundant system capacity to survive infrequent hardware failures doesn't add much to the system cost. Computer, storage and communications technology is so inexpensive that it doesn't cost much to have a spare, and considering the intrinsic reliability of the hardware it doesn't take a lot of redundancy to provide a high probability of enough hardware working to avoid expensive emergency service calls. The art in system redundancy is assuring to the degree possible that a failure in one piece of hardware (a power supply module or disk drive for example) doesn't cause failures in other hardware or software.
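The claim that a little redundancy goes a long way can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that units fail independently (exactly the property the careful engineering described above is meant to preserve), and the function name and the 0.99 per-unit figure are hypothetical, not measurements.

```python
from math import comb


def prob_enough_working(n_units: int, k_needed: int, p_unit_up: float) -> float:
    """Probability that at least k_needed of n_units are working.

    Assumes each unit is up with probability p_unit_up, independently
    of the others (a simplifying assumption, not a measured fact).
    """
    return sum(
        comb(n_units, i) * p_unit_up**i * (1 - p_unit_up) ** (n_units - i)
        for i in range(k_needed, n_units + 1)
    )


# Hypothetical figure: each unit is working 99% of the time.
single = prob_enough_working(1, 1, 0.99)      # no spare
with_spare = prob_enough_working(2, 1, 0.99)  # one spare
print(f"no spare: {single:.4f}, one spare: {with_spare:.4f}")
```

Under these assumptions one spare cuts the chance of having no working unit from 1 in 100 to 1 in 10,000. If failures are correlated (a shared power supply, for example) the benefit shrinks accordingly.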
Providing robust communication to the remote site is another interesting engineering challenge. To have robust communications the remote site needs to be carefully and multiply connected so that a failure in the basic communications link (e.g. a failure in the T-1 line) doesn't make any solution impossible. A DSL link runs over a conventional phone line so it can be backed up with a dial-up modem connection on the underlying telephone line, or with an entirely separate connection (a second DSL line, cable modem or frame relay). With suitable link redundancy in place you still have to carefully engineer the solution so that when a link does fail (or more likely when it hangs) it doesn't lock up the whole system and render the redundant communication path useless. When the primary link fails, the communication subsystem must be designed such that both sides of the link fall back to the same default backup link and logically sync up again quickly, and the solution can't require that the application server is up and running.
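One simple way to get both ends to fall back to the same backup link is to give them an identical, ordered list of links and have each end pick the first one that still responds. The following is a minimal sketch of that idea, not a description of any particular product: the link names, the 15 second timeout and both function names are assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical priority order; it must be configured identically at both ends.
LINK_PRIORITY = ("t1", "dsl", "dialup")


def link_is_up(seconds_since_last_reply: float, timeout_s: float = 15.0) -> bool:
    """Treat a link that has been silent longer than timeout_s as failed.

    This catches a hung link as well as one that is physically down.
    """
    return seconds_since_last_reply <= timeout_s


def select_link(
    link_up: Mapping[str, bool], priority: Sequence[str] = LINK_PRIORITY
) -> Optional[str]:
    """Return the highest-priority link reported up, or None if all are down."""
    for name in priority:
        if link_up.get(name, False):
            return name
    return None


# Example: the T-1 has been silent for 40 s; DSL and dial-up still answer.
silence = {"t1": 40.0, "dsl": 2.0, "dialup": 5.0}
status = {name: link_is_up(age) for name, age in silence.items()}
print(select_link(status))
```

Because the choice depends only on the shared priority list and on which links respond, two ends that see the same failures reach the same decision without negotiating. None of this logic needs the application server, so it could run in a small communications or management processor.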
The bigger problem with remote diagnosis and service is to keep from being dependent on the correct operation of the tens of millions of lines of source code that collectively constitute the operating system and key applications. Fortunately these aren't brand new problems that have never been encountered before. Interesting examples of solutions are available in the mainframes of a decade or more ago and in large telephone switches as well. The essential element of these solutions is to have a very simple, reliable and robust maintenance system built into the system. In a mainframe it was a diagnostic scan system - a means by which the logic of the mainframe can be diagnosed and a hardware problem isolated even if the system is not healthy enough to run diagnostic software. In a modern PC server or network device or subsystem the same kind of arrangement can be easily imagined - an additional microprocessor running simple firmware, capable of viewing and controlling the various elements of the system independently of the health of the central processor and especially independently of the health of the resident operating system.
To provide an effective platform for branch office information systems (computers, storage and communications) the whole platform, not just the CPU, has to be designed around a management subsystem. The management subsystem must have reliable access to I/O buses and device control logic even if the core operating system is tied in knots.
The purpose of the management subsystem is three-fold:
- To provide a platform for robust remote operation, sending a regular "heartbeat" that indicates proper operation and providing a robust means of analyzing and diagnosing problems should they occur.
- To provide a remote means for restarting all elements of a remote system should they become hung for whatever reason.
- To serve as the underlying platform for providing software component updates as needed to keep the software up to date.
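The heartbeat in the first item can be pictured from the remote center's side as a table of when each branch office was last heard from. The sketch below is an assumption-laden illustration rather than a real management protocol: the class name, the 60 second timeout and the site names are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time reported by each remote site."""

    timeout_s: float = 60.0  # hypothetical limit on silence
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, site: str, now: float) -> None:
        """Note that a heartbeat from `site` arrived at time `now` (seconds)."""
        self.last_seen[site] = now

    def overdue(self, now: float) -> List[str]:
        """Return, in sorted order, the sites silent for more than timeout_s."""
        return sorted(
            site
            for site, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=60.0)
monitor.record("branch-a", now=0.0)
monitor.record("branch-b", now=50.0)
print(monitor.overdue(now=70.0))  # branch-a has been silent for 70 s
```

A site that shows up as overdue is a candidate for remote diagnosis and, if necessary, the remote restart described in the second item.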
The final purpose - managing software updates - is worth a little attention. In this context remember that all the elements of the IT fabric are software intensive. Software updating is as much a problem for routers and firewalls as it is for the operating system of a PC. Modern software evolves on an ongoing basis, driven by the need to fix software defects and to provide desired new functionality. With a potent remote management system and a suitably modular operating system we have the ability to do full or partial software updates (including the operating system and communications facility) remotely without suffering downtime.
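A partial update presupposes that the remote center can tell which components at a site differ from the desired release. As a rough sketch, and assuming each component simply reports a version string, that comparison might look like this; the function name, component names and version numbers are invented for illustration.

```python
from typing import Dict


def components_to_update(
    installed: Dict[str, str], target: Dict[str, str]
) -> Dict[str, str]:
    """Return the components whose installed version differs from the target.

    Components missing at the site are included; components already at the
    target version are left alone, which is what makes the update partial.
    """
    return {
        name: version
        for name, version in target.items()
        if installed.get(name) != version
    }


# Hypothetical inventory reported by one branch office.
installed = {"os-kernel": "2.1", "router-fw": "1.4", "firewall": "3.0"}
target = {"os-kernel": "2.1", "router-fw": "1.5", "firewall": "3.0", "backup-agent": "1.0"}
print(components_to_update(installed, target))
```

Only the router firmware and the missing backup agent would be sent over the link in this example, which matters when the link is a T-1 or slower.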
Summary
The ideal way to enable high performance, reliable information systems in branch offices is to find a way to leverage modern communications to put modern server and storage technology in the branch office. For this to be done cost-effectively, means must be found to operate and maintain those systems from remote centers without any requirement for expertise in the branch office. Such solutions are possible, although they require careful re-engineering of the applications, operating system and the underlying hardware and communications platform. Fortunately good models of effective solutions exist in mainframe and telephony systems. The economic reward of providing excellent branch office application support cost-effectively justifies the significant re-engineering effort required.
About The Internet Research Group
www.irg-intl.com
The Internet Research Group (IRG) provides market research and market strategy services to product and service vendors. IRG services combine the formidable and unique experience and perspective of the two principals, John Katsaros and Peter Christy, each an experienced industry veteran. The overarching mission of IRG is to help clients make faster and better decisions about product strategy, market entry, and market development. Katsaros and Christy recently published a book on high tech business strategy, Getting It Right the First Time (Praeger, 2005), www.gettingitrightthefirsttime.com.