SRE
Employer: Euro-Testing Software Solutions
Domain:
Job type: full-time
Job level: 1 - 5 years of experience
Location:
Updated at: 15.12.2024
Remote work: Hybrid
Short company description
Euro-Testing Software Solutions is a privately owned software company specializing in Full-Service Software Testing, Penetration Testing, Vulnerability Identification & Management, Application and Data Security, Static & Dynamic Code Analysis, DevOps/DevSecOps, Robotic Process Automation, and Implementation and Customization for Atlassian and Micro Focus (HPE) products.
Requirements
• Experience in using: Linux, UNIX and Windows
• DB administration & maintenance: Oracle, Cassandra, PostgreSQL, AWS DB setups, Caching DB.
• Familiar with: GIT, Jira, Jenkins, Ansible
• Strong knowledge of DevOps and CI/CD pipelines (GitHub, Terraform)
• Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace
• Hands-on AWS implementation experience across a broad range of AWS services.
• Must have AWS development experience (Containerization - Docker, Amazon EKS, Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL)
• Experience with core AWS platform architecture, including areas such as: Organizations, Account Design, VPC, Subnet, segmentation strategies.
• Comfortable working with cloud-native infrastructure, such as AWS Lambda, Google App Engine, and Azure Cloud Services.
• Backup and Disaster Recovery approach and design
• Environment and application automation
• Proficiency in programming languages such as Python, Go, or Java
• Familiar with Encryption, Logging, and Privacy/Security Protocols (e.g., TLS 1.2, ELK stack)
• Good knowledge of REST/SOAP/JSON web service API implementation.
• Bachelor's degree in Computer Science, Information Technology, or a related field.
• Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
• Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
• Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
• Experience with IT service management tools such as ServiceNow or Remedy.
• Familiarity with banking customer acquisition applications is preferred.
Responsibilities
• Monitoring system performance, identifying bottlenecks, and executing pipeline optimization.
• Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency.
• Developing and maintaining CI/CD pipelines, enhancing the consistency and speed of software deployment.
• Automating routine tasks and creating tools to improve team efficiency and system robustness.
• Collaborating with development teams to integrate operational considerations into the software development life cycle.
• Managing incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
• Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
• Contributing to disaster recovery plans and ensuring robust backup systems are in place.
• Partnering with development teams to improve services through rigorous testing and release procedures.
• Participating in system design consulting, platform management, and capacity planning.
• Creating sustainable systems and services through automation and uplifts.
• Balancing feature development speed and reliability with well-defined service-level objectives.
• Working on-call shifts and helping to prevent incidents before they happen.
• Running our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.
You do some of this daily:
• Approach operations challenges with a software engineering perspective, leveraging coding, automation, and engineering principles.
• Monitor and appropriately address system issues.
• Create strategies to detect issues.
• Design systems to troubleshoot automatically.
• Write and review post-mortems.
• Collaborate with development teams and other stakeholders to identify potential risks.
• Once risks are identified, analyze and evaluate their potential impact and likelihood of occurrence.
• Based on the risk assessment, implement mitigation strategies to reduce operational risk.
• Continuously monitor and review the effectiveness of your risk mitigation strategies.
• Study historical performance trends using charts and graphs of key metrics.
• Trace the problems with system monitoring tools.
• Monitor the log files to manage infrastructures at scale.
• Minimize mean time to recovery (MTTR) to reduce downtime. As an SRE, you improve this metric by resolving incidents quickly.
• Maintain internal tooling.
Other info
Just about you
• Have an enthusiastic, go-for-it attitude.
• Focus on the quality of your work.
• Excellent communication skills and a team player.
• Open-minded and flexible.
• Hard-working and passionate.
• Demonstrated ability to adapt to new technologies and learn quickly.
• Works well under pressure and meets deadlines.
• Ability to solve problems in a fast-paced, high-stakes environment.
• Proven ability to collaborate with multi-disciplinary teams of business analysts, developers, data scientists, and subject matter experts.