Overview:
We are seeking a Senior Linux Reliability Engineer with extensive knowledge of Linux OS architecture to join our IT Operations team. The ideal candidate will have strong experience with Puppet code, AWX, RHEL, SUSE, and Oracle Linux. As a key member of the RUN team, you will serve as an escalation point for major incidents, providing Level 3 operations support, including root cause analysis. This role also involves collaborating with application owners and technical teams, leading cross-functional initiatives, and integrating new technologies into our enterprise.
Responsibilities:
- Collaborating with cross-functional teams to understand business needs and identify gaps in requirements.
- Maintaining the health, reliability, and security of Linux operating systems.
- Analyzing system performance data and proactively identifying potential issues.
- Implementing solutions to ensure optimal performance.
- Defining monitoring metrics and optimizing operating system performance.
- Diagnosing and correcting monitoring policy failures.
- Identifying opportunities for process improvements and automation.
- Serving as an escalation point for major incidents and providing Level 3 operations support.
- Coordinating and leading cross-functional initiatives and projects.
- Integrating new technologies into the enterprise, writing documentation, and validating each integration.
- Utilizing scripting for system management and automation.
- Ensuring operating systems comply with the organization’s standards.
Requirements:
- Bachelor’s degree or technical institute degree/certificate in a relevant field, or equivalent work experience.
- 8+ years of relevant IT work experience.
- Relevant certification is required.
- Excellent problem-solving skills.
- Strong communication and presentation skills.
- Experience with ITSM processes (change management, incident and request creation/tracking).
- Experience with Agile delivery and modern cloud-based software tools (Git, Jira, Confluence).
- Proficiency with Puppet, AWX, RHEL, and SUSE.