Company:
A four-year-old, fast-growing, Mumbai-based fintech company operating in the lending space.
Job Summary: We are looking for a Site Reliability Engineer to help ensure the reliability, scalability, and performance of our systems. You will focus on monitoring, incident management, and continuous improvement of our infrastructure.
Responsibilities:
Monitor system health and uptime using industry-standard tools.
Design and implement incident management processes.
Optimize system performance and ensure uptime.
Collaborate with developers to improve system design for reliability.
Automate repetitive tasks and processes for greater efficiency.
Skills & Requirements:
3 years of experience as an SRE or in a similar role.
Strong understanding of monitoring and logging tools (Prometheus, ELK, etc.).
Experience in incident management and root cause analysis.
Proficiency with scripting and automation (Python, Shell, etc.).
Good understanding of cloud platforms (GCP preferred).
Strong problem-solving skills and a passion for improving systems.
monitoring, automation, incident management, system design, prometheus, python, root cause analysis, problem-solving, google cloud platform, cloud platforms