Emory

Senior Manager, High Performance Computing

Job Number
110173
Job Type
Regular Full-Time
Division
Office of Information Technology
Department
OIT: App Devlpm & Informatics
This position may involve the following Health and Safety issues:
Not Applicable
Job Category
Information Technology
Campus Location (For Posting) : City
Atlanta
Location : Name
Emory Campus-Clifton Corridor

Discover Your Career at Emory University

Emory University is a leading research university that fosters excellence and attracts world-class talent to innovate today and prepare leaders for the future. We welcome candidates who can contribute to the diversity and excellence of our academic community.

Description

Seeking a highly motivated, experienced technical manager to lead the operational support of Emory’s innovative hybrid high-performance computing cluster platform and service. This position manages a team of high-performance computing engineers and works with a variety of technical teams and with Emory faculty to engineer and manage the operations of well-designed high performance computing infrastructure that advances knowledge discovery, tackles challenging scientific problems, and provides the basis for education and training in cutting-edge technology. This new position aligns with Emory University’s strategic direction to provide hybrid cloud and on-premise IT infrastructure to meet the growing computing and analysis needs of its research and teaching community, particularly in the multi-disciplinary field of AI and its applications to many domains of scientific inquiry.

 

JOB DESCRIPTION:

  • Leads, manages and supervises Information Technology (IT) and operations in an integral IT area.
  • Has overall responsibility for a specific IT area including leading specific IT projects and implementation of new versions of software, management of systems and coordination with other IT projects across the division.
  • Is accountable for a specific product and technical environment.
  • Advises management on issues within the specific area that impact the division and enterprise-wide IT services.
  • Ensures compliance and uniform, transparent systems across the division and enterprise by working closely with other area managers and directors.
  • Implements uniform administrative procedures and systems throughout a division.
  • Hires, trains and supervises staff.
  • Performs related responsibilities as required.
  • Overseeing the design, implementation, installation and maintenance of Emory’s community high-performance computing cluster infrastructure on the cloud and on premise
  • Designing, implementing, and supporting novel hybrid cloud HPC mechanisms and associated robust and secure IT solutions within a fast-paced research environment
  • Actively seeking to understand the latest AI research computing requirements and planning infrastructure upgrades and feature releases to follow evolving trends and demands
  • Maintaining collaborative relationships with Emory faculty, seeking to understand and anticipate their needs, offer efficient solutions and recommend design or operational improvements, and help make strategic decisions around the deployment of and investments in HPC solutions for complex research problems
  • Defining and operationalizing support services for end-users, including training, assistance in scripting, software installation, technical troubleshooting and identifying cloud vs on-premise fit
  • Setting the strategy around monitoring and maintaining the health and integrity of the HPC clusters, including upgrading and patching, defining and tracking performance metrics as well as utilization to ensure efficient current and future use of IT resources
  • Partnering and proactively coordinating with a variety of technical teams, including distributed IT support units, Software Engineering, Data Solutions, IT Architecture, Middleware, Infrastructure, Networking, Information Security, Business Analysts, Product Management and Project Management units to enable and streamline the timely deployment of HPC services
  • Maintaining relationships with external technology vendors and acting as liaison to technology providers for researchers; maintaining relationships with internal purchasing, finance, shipping & receiving
  • Communicating progress and roadblocks, success metrics, roadmap priorities and timelines on a regular basis to a variety of IT and academic stakeholders; advising leadership and community stakeholders on issues and improvements that impact the HPC platform and service; helping inform the budget for the maintenance and operational costs and future HPC investments
  • Ensuring quality outcomes through best practices in security, compliance, infrastructure as code, streamlined release processes, thorough testing and validation, efficient ticket queue management, and striving towards taking those extra steps to ensure a quality product
  • Driving the systematic documentation of use cases, reusable patterns and technical guidelines and regularly reviewing processes and procedures for systems management that must meet regulated-data compliance standards
  • Hiring, training and supervising a highly functional team of system engineers involved in multiple projects and activities, promoting training in HPC engineering to keep team skills current and increase retention, and ensuring high levels of accountability, respect, equity, inclusion, and collegiality towards Emory’s diverse community members
  • Participating in cross-functional management teams and projects and contributing to implementing uniform administrative procedures and systems across the division.

MINIMUM QUALIFICATIONS:

  • A bachelor's degree and seven years of related IT experience including demonstrated technical expertise in specific IT areas, project management skills and lead or supervisory experience or an equivalent combination of education, training and experience.
  • Extensive experience in the implementation and production support of an enterprise system.

PREFERRED QUALIFICATIONS:  

  • Bachelor's degree in computer science, math, engineering, or a related field OR High School Diploma and equivalent combination of education, training and experience
  • 7+ years of related IT management experience, including project management skills and lead or supervisory experience with demonstrated positive outcomes, team performance skills, service mindset approach, and the ability to act as a trusted advisor
  • Specialized knowledge in HPC cluster technology, the ability to lead the development of the technology, and/or extensive experience in the implementation and production support of an enterprise system, coordinating with cross-functional teams and working directly with business stakeholders
  • Strong and precise communication skills and the ability to create and foster positive and creative relationships with faculty, staff, and students at various levels of the organization
  • Demonstrated problem-solving skills and accountability for high-stakes efforts
  • Technical experience with principles of scientific computing and with large-scale storage and network systems for research/scientific data
  • 3+ years of experience in Linux administration
  • 3+ years of professional HPC cluster user support, planning and management experience
  • 2+ years of experience with scripting in Python and Bash
  • 1+ year of experience in working with Cloud Infrastructure such as Amazon Web Services
  • Broad knowledge of the deployment and management of HPC systems (e.g. storage, cluster computing, network, database, containerization, virtualized systems)
  • Working knowledge of the AWS ParallelCluster product
  • Experience in Hybrid HPC platforms
  • Good understanding of the various AWS products and their applications
  • Strong knowledge of the Slurm cluster management software
  • Experience with specialized computing, like GPU, parallelization, and DevOps aspects such as containers and automation
  • Experience working within an academic, research, or scientific institution in a user-facing role
  • Experience working with standard practices in project management (PMI), service management (ITIL) and DevOps
  • Familiarity with research computing problem space, scientific data, bioinformatics packages, big data analysis methods or machine learning principles and algorithms

NOTE: This role will be granted the opportunity to work from home regularly but must be able to commute to the Emory University location as needed. Emory reserves the right to change this status with notice to the employee.

Emory Supports a Diverse and Inclusive Culture

Emory University is dedicated to providing equal opportunities and equal access to all individuals regardless of race, color, religion, ethnic or national origin, gender, genetic information, age, disability, sexual orientation, gender identity, gender expression, and veteran's status. Emory University does not discriminate in admissions, educational programs, or employment on the basis of any factor stated above or prohibited under applicable law. Students, faculty, and staff are assured of participation in University programs and in the use of facilities without such discrimination. Emory University complies with Executive Order 11246, as amended, Section 503 of the Rehabilitation Act of 1973, the Vietnam Era Veteran's Readjustment Assistance Act, and applicable executive orders, federal and state regulations regarding nondiscrimination, equal opportunity and affirmative action. Emory University is committed to achieving a diverse workforce through application of its affirmative action, equal opportunity and nondiscrimination policy in all aspects of employment including recruitment, hiring, promotions, transfers, discipline, terminations, wage and salary administration, benefits, and training. Inquiries regarding this policy should be directed to the Emory University Department of Equity and Inclusion, 201 Dowman Drive, Administration Building, Atlanta, GA 30322.


Emory University is committed to providing reasonable accommodations to qualified individuals with disabilities upon request. To request this document in an alternate format or to request a reasonable accommodation, please contact the Department of Accessibility Services at 404-727-9877 (V) | 404-712-2049 (TDD). Please note that one week advance notice is preferred.
