Emory

  • Network/Applications Monitoring Engineer

    Job Number: 29078
    Job Type: Regular Full-Time
    Division: LITS: Library and IT Services
    Department: LITS: Network/App Monitoring
    This position may involve the following Health and Safety issues: Not Applicable
    Job Category: Information Technology
  • Description

    Network/Applications Monitoring Engineer

    Operates and supports the monitoring and automation software responsible for event detection, notification and incident generation for all Emory University IT services. Responds proactively and reactively to issues with enterprise monitoring and automation software. Uses fault isolation and repair techniques to quickly identify the root cause. Performs regularly scheduled application and system maintenance, including software patching, failover testing and data purging/cleanup. Creates and generates reports on monitored services managed by Emory LITS teams and customers.

    Responds to investigation requests for missed events and works with customers to create new monitoring that closes monitoring gaps. Monitors assigned ticketing queues and monitoring applications such as SMARTS, AirWave and Zabbix. Develops software and tools for optimizing network and application monitoring and performance. Monitors network performance and prepares statistics to document incidents and provide history for future reference and research.

    Acts as a subject matter expert on monitoring services and supports other IT staff in troubleshooting monitoring issues and implementing new monitoring based on customer needs. Documents user guides and monitoring deployment processes, and trains front-line IT and service desk staff in the use of monitoring software and automation. Monitors, reviews, assigns and accepts trouble tickets in all assigned ticketing systems to ensure all tickets are resolved within Service Level Management guidelines. Responds to alert notifications received via supplied monitoring tools such as SMARTS to reduce the impact of potential issues. Performs troubleshooting to clear or identify issues.

    Interacts with IT departments to drive issue resolution. Interacts with customers to obtain additional information, provide status reports and evaluate short- and long-term solutions. Follows up with customers to ensure they test solutions and to confirm application functionality. Monitors, reviews, assigns and accepts work orders (WOs) in all assigned ticketing systems to fulfill customer requests. Reviews requests for accuracy to ensure all required information is gathered for completing work and resolving issues within established deadlines. Performs related responsibilities as required.

    JOB DESCRIPTION: **This is a central university office position.** Responsible for the development, implementation and configuration of current monitoring systems such as SMARTS. Provides 24x7 on-call support and ongoing daily maintenance of the network management systems and related infrastructure, including identification of gaps and system improvements. Handles all day-to-day and project-based activities related to event management and monitoring.

    Engages with other Technical Operations Center (TOC) team members, customers, vendors and internal departments to drive continuous improvement in benchmarking, monitoring, logging, event responsiveness and accuracy. Supports the TOC by providing technical expertise, leadership and guidance for incident management, work order (WO) fulfillment and other departmental activities. Provides backup for TOC operations and is expected to actively investigate incident reports, manage all assigned tickets, process WOs and field customer calls. Reviews the reports generated daily by the SMARTS servers and all other monitoring systems to ensure the software is functioning properly.

    Installs software patches, updates and revisions to keep software versions current and to provide bug fixes, new features and certifications. Configures and thoroughly tests escalation policies to allow automated ticket creation, paging, e-mail and text notifications for network and server events. Manages all service monitoring changes, including system additions, deletions, upgrades and threshold tuning, to provide efficient change management controls. Develops, reviews and manages all documentation related to the monitoring systems and processes to maintain an up-to-date knowledge base. Develops custom scripts to provide data about the network, to provide tools for TOC Operators and to document changes to the system. Creates scripts to perform custom monitoring of customer-specific services.

    Develops and tests adaptors that provide methods for getting alarms into and out of SMARTS. Designs custom consoles to present alarms in user-specific formats. Tests devices and submits requests to third-party vendors for certification templates to ensure interoperability with current monitoring systems.

    Mentors TOC Engineers and assists with their day-to-day technical development. Identifies departmental training opportunities to enhance the knowledge level of the TOC team. Strategizes with TOC management, internal departments and third-party vendors to evaluate, plan and implement new software, features and enhancements. Monitors, reviews, assigns and accepts WOs in all assigned ticketing systems to fulfill customer requests. Performs related responsibilities as required.

    MINIMUM QUALIFICATIONS: A bachelor's degree in IT or a related field and three years of relevant experience, OR an equivalent combination of experience, training and/or education. DATE REVIEWED/MODIFIED/CREATED: 11/05/10 JB

    Additional Details

    PREFERRED QUALIFICATIONS:

    Good scripting skills and experience with infrastructure device monitoring and management.