Red Hat Senior Service Reliability Engineer in Remote, United States
At Red Hat, we connect an innovative community of customers, partners, and contributors to deliver an open source stack of trusted, high-performing solutions. We offer cloud, Linux, middleware, storage, and virtualization technologies, together with award-winning global customer support, consulting, and implementation services. Red Hat is a rapidly growing company supporting more than 90% of Fortune 500 companies.
Keep a globally-distributed, cloud-based, containerized service (Enterprise Kubernetes) running for our customers. The Red Hat OpenShift Online Site Reliability Engineering (SRE) team is looking for a Senior Service Reliability Engineer to join us. In this role, you will work on OpenShift by Red Hat, which is a leading Enterprise Kubernetes platform, in the first team that hosts and manages its code in the public cloud. You’ll play a key role within the team, keeping the OpenShift by Red Hat environment available and secure while providing guidance and mentoring support to the larger team. You’ll interact with other SRE and product engineering associates around the world to deliver large, containerized cluster environments. You’ll be responsible for provisioning, upgrades, problem detection and automated recovery scenarios, incident management, and understanding complicated, interconnected data points to resolve faults when issues arise. As a Senior Service Reliability Engineer, you’ll need to be able to work in a complicated and fast-paced environment while quickly learning new skills and creating ways to consistently meet service level agreements (SLAs). Successful applicants must reside in a state where Red Hat is registered to do business.
Primary job responsibilities
Develop automation to autocorrect or completely prevent issues in our online solution
Identify single points of failure and other high-risk architecture issues; propose and implement more resilient resolutions
Participate in the release cycles of the offering, deploying code to integration, staging and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tooling, monitoring, and change management
Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
Interact with automated monitoring and healing infrastructure to ensure healthy environments
Resolve customer issues in cooperation with Red Hat's global customer support team
Create and maintain standard operating procedures for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
Participate in a regular shift and on-call rotation; this role includes a weekend working schedule
Required skills
10+ years of experience managing Linux servers running Red Hat Enterprise Linux (RHEL), CentOS, or Fedora hosted at a cloud provider like Amazon Web Services (AWS), Google Compute Engine (GCE), or Microsoft Azure
5+ years of experience with enterprise systems monitoring; knowledge of Zabbix or Nagios is a plus
5+ years of experience with enterprise configuration management tools like Ansible by Red Hat, Puppet, or Chef
5+ years of experience programming in at least one object-oriented language; experience with Golang, Java, or Python is a plus
3+ years of experience delivering hosted services
1+ year(s) of experience with Kubernetes
1+ year(s) of experience with Docker-based containers
Superior communication skills and experience working directly with and presenting to customers
Demonstrated ability to quickly and accurately troubleshoot system issues
Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, uniformed services, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.
Job ID 63255
Category Software Engineering