SRE (Site Reliability Engineer)
This job is no longer active.
Employer: | K2View |
Job type: | full-time |
Job level: | 1 - 5 years of experience |
Updated at: | 21.08.2022 |
Remote work: | On-site |
K2view is hiring for the following SRE (Site Reliability Engineer) position:
K2view is looking for a Site Reliability Engineer (SRE) to join our SRE team and help maintain the superb performance of the K2view mPaaS and PaaS offerings, focusing on service availability, reliability, visibility, and performance. The K2view service runs on one of the major IaaS providers, specifically AWS, Azure, or GCP.
As a member of the SRE team, you’ll work closely with K2view DevOps, Tier 1, R&D, the COE (Center of Excellence), and K2view customers. You will be part of the group responsible for the availability, reliability, and performance of K2view mPaaS/PaaS, while designing and building cutting-edge monitoring and alerting services to achieve 99.9% availability.
Job Summary
At K2View, we’re passionate about big data systems and data management platforms. We count on our site reliability engineers (SREs) to give our customers the infrastructure and monitoring tools they need to maintain high availability, reliability, and stellar performance in pursuit of their objectives. As we expand our customer deployments, we are seeking an SRE to deliver insights from massive-scale data in real time. Our SREs are responsible for creating, configuring, and maintaining monitoring environments and tools. They are experts in analyzing production system metrics, identifying the root cause of system performance issues, and taking reactive and proactive action to keep the system in a healthy state.
Responsibilities
- Provide 24/7 monitoring of customers’ production systems
- Create monitoring dashboards and set thresholds for tracking overall system health
  - Provide SLA infrastructure and dashboards for service availability
  - Generate periodic system health reports
  - Identify system trends and prevent production failures
- For critical production issues, run initial triage and open an escalation bridge
- Investigate, record, and analyze production errors
- Run daily production processes
- Where applicable, restore the system to an operational state
- Support Tier 1 group investigations if required
- Support installation of the K2view mPaaS (managed PaaS) offering for new customers
- Manage deployment of ongoing change requests
- Run various production investigations, such as Cassandra, GoldenGate, and Kafka
Requirements
- Bachelor’s degree in computer science or another highly technical or scientific discipline
- Some experience with Linux and Windows operating systems
- Ability to program (structured and object-oriented) in one or more high-level languages (Java is an advantage)
- Ability to analyze/debug large and complicated systems
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Some experience with SQL/NoSQL databases such as PostgreSQL, SQLite, etc.
- Must: fluent spoken and written English
- Availability for at least 5 shifts a week (including night shifts, Friday and Saturday shifts)
- Huge advantage: previous experience as an SRE team member for a SaaS/PaaS offering running on one of the major clouds: AWS, Azure, or GCP