Emory’s Department of Biomedical Informatics (DBMI) is seeking a Senior Operating System Analyst/Administrator to assist with the expansion of its on-premises and cloud computational environment. The DBMI comprises about 100 researchers performing cutting-edge research on medical data using state-of-the-art machine learning and signal processing approaches on large data sets. The HIPAA-compliant storage and compute environment includes a complex distributed PHI-protected infrastructure consisting of an HPC cluster, a GPU cluster (NVIDIA DGX) and a ~200TB parallel file system. A detailed description of our computational facilities can be found here: https://med.emory.edu/departments/biomedical-informatics/research/facilities.html. Researchers at the DBMI also work on GCP, AWS and Azure. We have a particularly strong relationship with Google, with which we work closely to innovate cloud infrastructures.
From early on, you’ll help to empower cutting-edge research, be given challenging assignments, lead initiatives, and take ownership and hands-on responsibility for the design, development and deployment of research infrastructure. You’ll be given the opportunity for exceptional training with global leaders across industry and academia to help you become a leader in your field. Working with an additional dedicated sysadmin and multiple core research engineers, you’ll be supporting the infrastructure for world-class research applying AI to Health.
The ideal candidate for this role will have a strong systems engineering background and will be looking for the next step in their Linux server administration/systems engineering career path, with opportunities to work across the entire infrastructure stack (hardware/software/network/cloud) and further opportunities to develop deeper skills in specific areas.
JOB DESCRIPTION:
- Plans and implements one or more multi-platform operating systems, utilities, and related software to meet organizational needs.
- May be responsible for applications on dedicated servers. Ensures the availability, integrity and reliability of assigned systems.
- Maintain and upgrade the DBMI HIPAA-compliant computing environment. You will work in a systems team that already includes one System Administrator and may expand to include a third system administrator in the future.
- Provide documentation and technical specifications to the Chair and Vice Chair for planning and implementing new IT infrastructure or upgrades to existing infrastructure. Responsible for capacity, storage planning, and database performance.
- Responsible for IT security, particularly with health information
- Monitor datacenter health using preexisting management tools and respond to hardware issues as they arise; help build, test, and maintain new servers as needed
- Work with Emory University and Emory Healthcare IT teams on issues pertaining to data access, networking and compliance with Emory policies
- Adopt best practices for systems administration. This includes running diagnostics, documenting problems and resolutions, prioritizing problems, and assessing the impact of issues
- Perform or delegate regular backup operations and implement appropriate processes for data protection, disaster recovery, and failover procedures
- Perform routine/scheduled audits of the systems, including all backups
MINIMUM QUALIFICATIONS:
- Five years of operating systems analysis/administration experience OR a bachelor's degree and three years of operating systems analysis/administration experience.
- Red Hat or RHEL-equivalent (CentOS/Rocky Linux/Scientific Linux/etc.) System Administrator certification or equivalent experience.
- Experience working in and maintaining an HPC environment, including batch systems such as PBS/SLURM.
- Experience managing network, parallel, and cluster file systems such as NFS, Gluster, Lustre, etc.
- Experience in scripting and automation tools (Bash, Perl, Python, Ansible, etc.).
- Strong organizational and documentation skills.
- Strong people skills to provide a service-focused operation internally to the department, working across departments, and with external entities.
ADDITIONAL DESIRED SKILLS:
- Familiarity with identity management, security tools and best practices (e.g. LDAP, FreeIPA, SELinux, HIPAA).
- Familiarity with Debian or Ubuntu Server-based Linux distributions and their system management differences from RHEL-like distributions.
- Experience in deploying and maintaining high-speed interconnects, especially InfiniBand.
- Experience in managing virtual machine farms (oVirt, KVM).
- Familiarity with real-time data stream management systems (Kafka, Flink, etc.).
- Familiarity with server monitoring tools such as Nagios or Ganglia.
- Experience in using and maintaining the following cloud applications or similar is useful but not required: GSuite, Salesforce, GitHub, Asana, and Mailchimp.
- Strong preference will be given to candidates with familiarity with cloud-based systems, containerization, and orchestration systems such as Kubernetes. System administration and IT certifications in AWS and GCP, or in other network-related fields, are a plus.
NOTE: The person in this role will have the opportunity to work from home regularly but must be able to commute to Emory University on a flexible weekly schedule based upon business needs. The schedule is based on the agreed-upon guidelines of the department. Emory reserves the right to change remote work status with notice to the employee.