Site Reliability Engineer (SRE)
Company: INSPYR Solutions
Location: Houston
Posted on: November 19, 2024
Job Description:
Title: Site Reliability Engineer (SRE)
Location: Houston, TX or Bartlesville, OK
Duration: 12+ month contract
Work Requirements: U.S. Citizen, Green Card holder, or authorized to
work in the U.S.
Our Client is seeking a Site Reliability Engineer (SRE) to become a
part of their growing Digital IT team focused on building an
OpenShift/Kubernetes capability. The SRE will support the
reliability of Digital IT/OT critical applications. This
transformative role involves automating IT infrastructure tasks and
driving SRE best practices, tools, and processes. The ideal
candidate should exhibit a growth mindset, monitor applications
proactively, and work with application developers to respond to
incidents and maintain an optimal user experience.
The candidate must have senior level experience deploying OpenShift
on premises and supporting applications in Kubernetes. The ideal
candidate will have experience in both on-prem OpenShift and Azure
Kubernetes container platforms.
The successful candidate will possess a strong infrastructure and
development background, as well as the interpersonal skills needed
to communicate design requirements and objectives while providing
thought leadership to peers and leadership.
Candidates should be self-motivated and collaborative IT
professionals with a strong background in software development,
systems administration and IT automation.
Responsibilities:
Maintain survivability and reliability of IT/OT critical
resources.
Write and build CI/CD pipelines and build/release processes for
IT/OT workflow applications.
Provide mentoring to the IT/OT DevOps team in the best practices
associated with CI/CD deployments using ADO and Git.
Perform periodic load and scalability testing to establish
baselines, detect drift, and inform capacity planning.
Conduct weekly operational state reviews covering performance
trends, anomalies, errors, and other availability events with SREs,
product owners, and development teams.
Participate in quarterly business and operational reviews aligning
on roadmaps, development velocity, efficiency, growth trends,
patching, etc.
Plan and execute periodic Disaster Recovery exercises including
both tabletop and simulated failures (fault injection).
Required Qualifications
Candidates must have a bachelor's degree and 7 years of IT
experience.
Senior level experience with OpenShift and Kubernetes.
Familiarity with continuous integration/deployment processes and
tools such as IDEs (Eclipse), source code management (Git/Stash),
ADO Pipelines, Maven, Nexus artifacts, etc.
Strong understanding of SRE practices: incident response,
change/release management, capacity planning, infrastructure
automation, elastic environments, chaos engineering and blameless
postmortems.
Expertise in application performance monitoring, observability, and
proactive alert correlation, including monitoring containers and
failure-based alerting.
Scripting experience in languages such as Python and Bash.
Experienced in deploying applications in OpenShift in both public
and private cloud.
Excellent written and oral communication skills.
Demonstrated ability to communicate to a nontechnical audience on
technical issues.
Demonstrated ability to communicate on a technical level to a
technical audience.
Strong interpersonal skills, adaptable and able to learn
quickly.
Requires limited supervision and has excellent time management
skills.
Self-motivated and self-starter.
Ability to work and interact with others in a structured/team
environment.
Technology Stack
Experience with at least one technology in each of the tech stack
categories below:
Monitoring and Logging Tool(s): AppDynamics, Splunk, ELK Stack,
DataDog, Prometheus, AWS CloudWatch/X-Ray, Grafana
Programming: C# .NET, PowerShell, Python, YAML
Containers: Docker, Helm charts
OS: Linux - RHEL, Ubuntu, CentOS
Code Repos: Azure Repos, GitHub, GitLab
Infrastructure as code: Terraform, Ansible
Automation Tools: Ansible, Jenkins, Chef, Puppet
Agile: JIRA, SAFe
Desired Qualifications
Experience in cloud/virtual technologies and management - OpenShift,
VMware, AWS, Azure, etc.
Familiarity with security best practices for containerized
applications.
Knowledge of DevOps practices and tools.
Knowledge, skills and abilities to automate the creation of
Platform as a Service (PaaS) infrastructure using industry
standard tools such as Ansible and Chef.
Familiarity with Industrial Control System (ICS) security
architecture - Purdue model
Work Location: On-site in Houston or Bartlesville
Our benefits package includes:
Comprehensive medical benefits
Competitive pay
401(k) retirement plan
and much more!
About INSPYR Solutions
Technology is our focus and quality is our commitment. As a
national expert in delivering flexible technology and talent
solutions, we strategically align industry and technical expertise
with our clients' business objectives and cultural needs. Our
solutions are tailored to each client and include a wide variety of
professional services, project, and talent solutions. By always
striving for excellence and focusing on the human aspect of our
business, we work seamlessly with our talent and clients to match
the right solutions to the right opportunities. Learn more about us
at inspyrsolutions.com.
INSPYR Solutions provides Equal Employment Opportunities (EEO) to
all employees and applicants for employment without regard to race,
color, religion, sex, national origin, age, disability, or
genetics. In addition to federal law requirements, INSPYR Solutions
complies with applicable state and local laws governing
nondiscrimination in employment in every location in which the
company has facilities.