Basis of the IT Operations & Services Methodology
Implementing Best Practices, Ensuring Customer Success
The IT space comprises four major activity phases:
- Planning for new functionality and applications
- Building the new features and/or applications
- Deployment of the new functionality
- Operations on the new systems
GrayCell's Application Development
and Support methodology covers phases 1
and 2. The IT Operations and Services methodology
covers activities in phases 3 and 4.
A structured process approach to the various activities in the Deployment and Operations phases
of IT systems lays down a framework for capturing metrics and bringing visibility
to these operations. This helps justify IT costs in terms of business value to the
organization, as shown here:
| IT Function | Activity Cost | Benefit |
| Help desk | Cost per incident per user | Ability to build help desk staff increases into project budgets (capital expenditures) based on estimates of new user/new incident volumes, thus preventing productivity losses when users suffer system- or service-related work stoppages and help desk is not adequately staffed to handle the request volumes |
| System administration | Cost per change type (major, standard, and so on) | Ability to provide operational cost estimates to keep applications/systems up-to-date once in production |
| Monitoring | Cost per minute/hour of downtime per application | Ability to demonstrate value to the bottom line provided by problem resolution effectiveness and by preventative measures |
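As a rough illustration of how the activity costs in the table might be computed, the sketch below derives an average cost per incident, a help desk budget increase for a project that adds users, and the cost of an outage. The function names and every figure are hypothetical, chosen only to show the arithmetic; they are not GrayCell data, and "incidents per user" is one possible reading of "cost per incident per user".

```python
def cost_per_incident(total_help_desk_cost: float, incidents: int) -> float:
    """Average help desk cost of handling one incident."""
    if incidents <= 0:
        raise ValueError("incidents must be positive")
    return total_help_desk_cost / incidents


def help_desk_budget_increase(new_users: int,
                              incidents_per_user: float,
                              unit_cost: float) -> float:
    """Extra help desk cost a project adds through its new users."""
    return new_users * incidents_per_user * unit_cost


def downtime_cost(cost_per_minute: float, minutes_down: float) -> float:
    """Cost of an outage for one application."""
    return cost_per_minute * minutes_down


# Hypothetical figures, for illustration only.
unit_cost = cost_per_incident(total_help_desk_cost=120_000.0, incidents=8_000)
extra_budget = help_desk_budget_increase(new_users=200,
                                         incidents_per_user=1.5,
                                         unit_cost=unit_cost)
outage_cost = downtime_cost(cost_per_minute=40.0, minutes_down=90)
print(unit_cost, extra_budget, outage_cost)
```

With numbers like these in hand, the extra help desk cost can be written into the project budget up front rather than absorbed later as lost productivity.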
Configuration and 'Release'
Controlling and managing changes in process and technology
in the provision of service solutions requires
identifying the items that need to be
configured and controlled. The Configuration
Items (CIs) include:
- Line-of-business (LOB) application systems
- Website content
- Hardware
- Operations processes/procedures
- Communication processes, team structures
- Infrastructure software (database servers, application servers, operating systems)
A release is any change, or group of changes, that must be incorporated into
a managed IT environment consisting of a set of Configuration Items (CIs). These
changes are not handled separately, but rather as a packaged release that can be
tracked, installed, tested, verified, and/or uninstalled as a single, logical unit.
Hence, a 'release' may include one or more of the following changes, resulting in a new
configuration:
- A new or updated line-of-business (LOB) system
- A new or updated website including content propagation
- New hardware (server, network, client, and so on)
- New or updated operations processes or procedures
- Changes in communication processes and/or team structures
- New infrastructure software
- Physical change in the building or environment
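The idea of a release as one packaged unit can be sketched in code. The example below is a minimal model, not GrayCell's tooling: the class names, the CI names, and the dictionary that stands in for the managed environment are all assumptions made for illustration. It shows a group of changes being installed together and then uninstalled together, restoring the earlier configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    ci_name: str       # Configuration Item affected by the change
    new_version: str   # version the CI should have after the release


@dataclass
class Release:
    name: str
    changes: List[Change]
    # Versions that were in place before install, kept for rollback.
    _previous: Dict[str, Optional[str]] = field(
        default_factory=dict, init=False, repr=False)

    def install(self, environment: Dict[str, str]) -> None:
        """Apply every change in the release as one unit."""
        for change in self.changes:
            # Record the original version only once per CI.
            self._previous.setdefault(change.ci_name,
                                      environment.get(change.ci_name))
            environment[change.ci_name] = change.new_version

    def uninstall(self, environment: Dict[str, str]) -> None:
        """Restore the configuration that existed before install."""
        for ci_name, old_version in self._previous.items():
            if old_version is None:
                environment.pop(ci_name, None)  # CI was new in this release
            else:
                environment[ci_name] = old_version
        self._previous.clear()


# Hypothetical environment: CI name -> current version.
environment = {"billing-app": "1.0", "db-server": "11g"}
release = Release("R2", [Change("billing-app", "1.1"),
                         Change("ops-runbook", "v3")])

release.install(environment)
after_install = dict(environment)
release.uninstall(environment)
after_uninstall = dict(environment)
```

Keeping the previous versions inside the release object is what allows the whole package to be backed out in one step instead of change by change.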
Incident, Problem and Change Management
| Item | Definition | Management |
| Incident | Any event that deviates from the expected operation of a system or service. | The process of managing and controlling faults and disruptions in the use or implementation of IT services, including applications, networking, hardware, and user-reported service requests. |
| Service Request | Service Desk support calls that have defined standard operating procedures (SOPs) | Managing staffing, scheduling, and task allocations; performance and management of defined SOPs |
| Problem | A condition identified from multiple incidents exhibiting common symptoms, or from a single significant incident, indicative of a single error, for which the cause is unknown. | Structuring the escalation process of investigation, diagnosis, resolution, and closure of problems to ensure |
| Known error | A condition identified by successful diagnosis of the root cause of a problem when it is confirmed which configuration item is at fault, and a temporary fix or workaround is in place. | Identifying and documenting errors from the IT infrastructure, and either creating a workaround or initiating a request for change (RFC) to resolve or eliminate the root cause (where supported by the business case for doing so). |
| Request for Change (RFC) | A packet of work identifying the known problem and the faulty configuration items at its root cause that need to change to permanently fix the problem. | Documenting, prioritizing, assigning resources, acquisition/modification, regression testing, and deployment of the RFCs. |
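The definition of a problem in the table (multiple incidents with common symptoms, or a single significant incident) lends itself to a small sketch. The code below is illustrative only: the `Incident` fields, the `identify_problems` helper, and the threshold of two incidents are assumptions, since the text does not say how many incidents count as "multiple".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Incident:
    incident_id: str
    symptom: str
    significant: bool = False  # a single significant incident can raise a problem


def identify_problems(incidents: List[Incident],
                      min_incidents: int = 2) -> Dict[str, List[str]]:
    """Group incidents by symptom and return the groups that qualify as problems."""
    by_symptom: Dict[str, List[Incident]] = defaultdict(list)
    for incident in incidents:
        by_symptom[incident.symptom].append(incident)

    problems: Dict[str, List[str]] = {}
    for symptom, group in by_symptom.items():
        recurring = len(group) >= min_incidents
        if recurring or any(i.significant for i in group):
            problems[symptom] = [i.incident_id for i in group]
    return problems


# Hypothetical incident log.
incidents = [
    Incident("I1", "login timeout"),
    Incident("I2", "login timeout"),
    Incident("I3", "printer offline"),
    Incident("I4", "database outage", significant=True),
]
problems = identify_problems(incidents)
```

Each symptom returned here would then go through root cause diagnosis; once the faulty configuration item is confirmed and a workaround exists, it becomes a known error and can lead to an RFC.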
The following diagram depicts the interrelationship of these items and players.
Operations and Service Process Model
Processes to perform and manage IT operations and services can
be categorized into four major quadrants
according to their missions. This gives management a natural
view of the operations.
| Quadrant | Mission | Management Reviews | Evaluation Criteria |
| Changing | Introduce new service solutions, technologies, systems, applications, hardware, and processes | Release Readiness; performed prior to a new release. | The release (the changes); the release package (all of the tools, processes, and documentation); the target (production) environment and infrastructure; rollout and rollback plans; the risk management plan; training plans; support plans; contingency plans |
| Operating | Execute day-to-day tasks effectively and efficiently. | Operations Review; performed periodically. | IT staff performance; operational efficiency; personnel skills and competencies; operations level agreements |
| Supporting | Resolve incidents, problems, and inquiries quickly. | Service Level Agreement; performed periodically. | SLA-defined targets and metrics; customer satisfaction; costs |
| Optimizing | Drive changes to optimize cost, performance, capacity, and availability in the delivery of IT services. | Change Initiation; performed at change identification. | Cost/benefit evaluation of proposed changes; impact to other systems and existing infrastructure |
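A management review such as Release Readiness can be thought of as a checklist over its evaluation criteria. The sketch below is one possible way to record such a review; the `release_readiness` function and the rule that every criterion must pass before a release goes ahead are assumptions, as the table lists the criteria but does not state a pass rule.

```python
from typing import Dict, List, Tuple

# Evaluation criteria for the Release Readiness review, taken from the table.
RELEASE_READINESS_CRITERIA: List[str] = [
    "release",
    "release package",
    "target environment and infrastructure",
    "rollout and rollback plans",
    "risk management plan",
    "training plans",
    "support plans",
    "contingency plans",
]


def release_readiness(review: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (ready, outstanding criteria) for a review.

    Assumed rule: a criterion missing from the review counts as not met,
    and the release is ready only when nothing is outstanding.
    """
    outstanding = [c for c in RELEASE_READINESS_CRITERIA
                   if not review.get(c, False)]
    return (not outstanding, outstanding)


# Hypothetical review in which training plans are not yet approved.
review = {criterion: True for criterion in RELEASE_READINESS_CRITERIA}
review["training plans"] = False
ready, outstanding = release_readiness(review)
```

The same shape would work for the other three reviews by swapping in their criteria lists.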
Operations & Services Functions
IT operations and services consist of several
activities and processes. The functions
are categorized into quadrants as per their
missions. The key activities of the operations
and services function are described here:
| Quadrant | Services and Functions | Key Activities |
| Changing | Change Management | Recording of Requests for Change (RFCs); RFC analysis and impact categorization; RFC prioritization and change authorization (approvals); resource allocation; carrying out analysis, changes, and regression testing; release management and review |
|  | Configuration Management | Identify Configuration Items (CIs) and establish configuration; define CI access policy and provide necessary access to CIs; CI change management; periodic review and retiring of CIs |
|  | Release Management | Release planning; user and operations acceptance testing; release preparation and training; release deployment |
| Operating | System Administration (Application, OS, App Server, DB) | Application administration & management (BASIS, Oracle Apps, PeopleSoft, Siebel); operating system administration (Windows, Unix, Linux); messaging administration (Exchange); database administration (Oracle, DB2, Sybase, Informix); web server administration (Apache, IIS); application server administration (WebLogic, WebSphere, Tomcat, MQ) |
|  | Security Administration | Defining security policy and plans; defining security controls based on the platform and technology being used; patch management administration; auditing and intrusion detection; security incident management |
|  | Service Monitoring and Control | Process heartbeat; job status; queue status; server resource loads; response times; transaction status and availability; escalation to appropriate administration groups |
| Supporting | Service Desk Management | Demand forecasting and staffing management on a day-to-day basis; roster/schedule creation and task allocation; receiving and recording customer communication using phone, email, and online systems; carrying out service requests using defined processes; escalating incidents to Level-2 support |
|  | Incident Management | Incident recording; incident alerting and communication; incident control; incident investigation, diagnosis & classification; incident recovery and resolution; incident closure; incident information management |
|  | Problem Management | Identifying common/recurring incident patterns; investigation and diagnosis of the root cause of the problem; temporary fix or workaround; raising an RFC for application development and support teams for a permanent fix |
| Known error | Service Level Management | Creating a service catalog; identifying and negotiating service level requirements for service level agreements; ensuring that service level requirements are met within financial budgets; setting accounting policies; monitoring and reviewing support services |
|  | Capacity Management and Infrastructure Engineering | Gathering, analyzing, and modeling future requirements; service monitoring; performance management; demand management; workload management; optimization; change initiation |
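The Service Monitoring and Control activities (resource loads, response times, queue status, and escalation to the appropriate administration group) can be sketched as a simple threshold check. Everything specific in the example is an assumption made for illustration: the `Check` structure, the threshold values, and the administration group names do not come from the methodology.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Check:
    name: str          # what is being monitored
    value: float       # latest observed measurement
    threshold: float   # hypothetical limit above which to escalate
    admin_group: str   # administration group that owns the item


def escalations(checks: List[Check]) -> List[Tuple[str, str]]:
    """Return (check name, admin group) for every check over its threshold."""
    return [(c.name, c.admin_group) for c in checks if c.value > c.threshold]


# Hypothetical readings from one monitoring cycle.
checks = [
    Check("server CPU load (%)", value=93, threshold=85, admin_group="OS admins"),
    Check("response time (ms)", value=180, threshold=500, admin_group="App admins"),
    Check("queue depth (messages)", value=1200, threshold=1000, admin_group="MQ admins"),
]
to_escalate = escalations(checks)
```

In practice each escalation would be recorded as an incident so that it feeds the incident and problem management activities listed under the Supporting quadrant.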