Site Reliability Engineer Jobs in Inetum in Bucharest - Romania

Site Reliability Engineer

Inetum

Posted on: 16-01-2025

1 Vacancy

The job posting is outdated and position may be filled

Job Location

Bucharest - Romania

Monthly Salary

Not Disclosed

Job Description

Approach operations challenges from a software engineering perspective, leveraging coding, automation, and engineering principles.
Monitor and appropriately address system issues.
Create strategies to detect issues.
Design systems to troubleshoot automatically.
Write and review postmortems.
Collaborate with development teams and other stakeholders to identify potential risks.
Once risks are identified, you will analyze and evaluate their potential impact and likelihood of occurrence.
Based on the risk assessment, you will implement risk mitigation strategies to reduce operational risk.
Continuously monitor and review the effectiveness of these risk strategies.
Study historical performance trends using metrics, charts, and graphs.
Trace problems using system monitoring tools.
Monitor log files to manage infrastructure at scale.
Minimizing MTTR is necessary to reduce downtime and keep systems reliable.
As an SRE, you can improve this metric by resolving incidents quickly.
Maintain internal tooling.
Monitoring system performance, identifying bottlenecks, and executing pipeline optimization.
Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency.
Developing and maintaining CI/CD pipelines, enhancing the consistency and speed of software deployment.
Automating routine tasks and creating tools to improve team efficiency and system robustness.
Collaborating with development teams to integrate operational considerations into the software development life cycle.
Managing incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
Contributing to disaster recovery plans and ensuring robust backup systems are in place.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service-level objectives.
Working on-call shifts to prevent incidents from ever happening.
Running our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.

Qualifications:

Experience using Linux, UNIX, and Windows.
DB administration & maintenance: Oracle, Cassandra, PostgreSQL, AWS DB setups, caching DBs.
Familiar with: Git, Jira, Jenkins, Ansible.
Strong knowledge of DevOps and CI/CD pipelines (GitHub, Terraform).
Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace.
Hands-on AWS implementation experience across a broad range of AWS services.
Must have AWS development experience (containerization, Docker, Amazon EKS, Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL).
Experience with core AWS platform architecture, including areas such as Organizations, account design, VPC, and subnet segmentation strategies.
Comfortable working with cloud-native infrastructure such as AWS Lambda, Google App Engine, and Azure Cloud Services.
Backup and Disaster Recovery approach and design
Environment and application automation
Proficiency in programming languages such as Python, Go, or Java.
Familiar with encryption, logging, and privacy/security protocols (e.g. TLS 1.2, ELK stack).
Good knowledge of REST/SOAP/JSON web service API implementation.
Bachelor's degree in Computer Science, Information Technology, or a related field.
Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation.
Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
Experience with IT service management tools such as ServiceNow or Remedy.
Familiarity with banking customer acquisition applications is preferred.

Additional Information:

Benefits:

Full access to a foreign language learning platform
Personalized access to tech learning platforms
Tailored workshops and trainings to sustain your growth
Medical subscription
Meal tickets
Monthly budget to allocate on a flexible benefits platform
Access to 7 Card services
Wellbeing activities and gatherings

Hybrid: 1-2 days/week from office (Bucharest)

Employment Type

Full-time
