Site Reliability Engineer
PRODUCT ENGINEERING – PLATFORM & RELIABILITY
My client is looking for a passionate, skilled and enthusiastic Site Reliability Engineer (SRE) to join their team in Singapore. As an SRE you will build, operate and improve a highly available, performant, scalable, cost-effective, reliable and secure Cloud Platform using the latest tools and technologies.
- Build and maintain the Cloud Platform by:
  - Ensuring that the platform is highly available and the infrastructure can scale without any downtime for the customer
  - Managing performance and analysing infrastructure problems
  - Utilising the latest tools and technologies to enable application containerisation and microservices architecture
  - Enabling self-service usage of the platform
  - Assessing third-party solutions and advising on build-vs-buy decisions
  - Ensuring that development teams can autonomously make use of the platform to deliver product features at a higher pace, with reduced coordination
- Develop and improve engineering tools and processes by:
  - Defining and improving continuous integration, delivery and deployment processes for the platform and applications
  - Ensuring accessibility, integration, performance and security for all tools used in the product life cycle
  - Favouring automation over manual processes
- Develop a strong SRE/DevOps mindset and culture by:
  - Sharing knowledge and best practices, such as Continuous Integration, Delivery and Deployment, with the team and the wider organisation to enable the SRE/DevOps approach
  - Helping engineering teams increase deployment frequency, reduce lead time for changes, and operate their services in production
- Increase observability and operability by:
  - Improving all aspects of monitoring of the Cloud Platform and all auxiliary services
  - Helping engineering teams get deep insights into their applications in production
  - Ensuring that dashboards provide the right level of information to the right people in the organisation
- Promote technical excellence by:
  - Nurturing and monitoring the product's technical excellence and quality by working as a team
  - Being open to experimenting with new and unconventional technical solutions
  - Organising the knowledge and information accumulated by the team, making it available and easy to retrieve whenever any team member needs it, with an increased focus on team processes
You need to have:
- 5-7 years of experience
- Solid knowledge of public cloud services (at least 3 years of experience)
- Understanding of cloud native applications and distributed systems
- Software development and testing skills (Java, Python, unit testing, integration testing, etc.; Go is a big plus)
- Good knowledge of networking and network security
- Experience with configuration management and infrastructure-as-code tools
- Experience with managing applications on Kubernetes (at least 2 years)
- Experience with application monitoring and alerting at scale
- Experience with incident management and on-call rotations
- Familiarity with continuous deployment strategies (blue/green, canary)
- Exposure to the Software Development Life Cycle, Continuous Integration and Deployment processes
- Good knowledge of Linux systems (including networking and security)
- Interest in Cloud Operations. This is not an ops position; the goal is to create engineering solutions for ops problems
- Basic knowledge of database administration
- Application security knowledge (secure software development practices) is a plus
- Strong communication, organisational and problem-solving skills.
Compensation and benefits:
- Competitive salary.
- Flexible working hours.
- Generous Health Insurance allowance.
- 20 days of holiday and a summer schedule.
- Coffee, tea, snacks and regular events in our modern and centrally located office.
- Professional career growth through access to training and conferences.
Licence No: 14S7347