Site Reliability Engineer

Job Type: Permanent
Salary: Attractive salary and bonus
Reference: 33911

My client is a consumer internet and technology business with an unrivalled sports media, gaming, social, and fintech platform. It serves millions of daily active users around the globe from technology and operations hubs in more than 10 countries on 3 continents.

The company has already built a team of 300+ high achievers from a diverse set of backgrounds and is looking for more talented individuals to drive further growth and to contribute their grit to the innovation, creativity and hard work that already serve the company's users.

The Stack

Backend Application Framework: Spring Boot (Java Config + Embedded Tomcat)
Frontend Application Framework: VueJS
Micro Service Framework: Spring Cloud Dalston (Netflix Eureka + Netflix Ribbon + Feign)
Database: AWS RDS, RDS Proxy, MongoDB
Public Cache: AWS ElastiCache + Redis
Message Queue: Apache RocketMQ, RabbitMQ
Distributed Scheduling: Dangdang Elastic Job
Data Index and Search: Elasticsearch
Log Real-time Visualization: Elasticsearch + Logstash + Kibana, Grafana Loki
Business Monitoring: Prometheus + Grafana
Reverse Proxy: Nginx
CDN: Cloudflare
Server Virtualization Container: AWS EKS + AWS EC2
Server Operating System: CentOS
Static File Storage: AWS S3
Internal DNS Resolution: AWS Route 53
Network Management: AWS VPC
Cluster Management and Scaling: AWS OpsWorks
Cluster Monitoring: Prometheus + AWS CloudWatch
HTTPS Certificate Management: AWS Certificate Manager
Malicious Attack Defense: AWS WAF & Shield
Cluster Alert: AWS SNS + Slack
Continuous Integration/Deployment: Jenkins, Rancher, ArgoCD
Configuration Tool: Ansible, Chef, Salt


Responsibilities

Work with a team of DevOps/SRE and DBA professionals
Improve existing infrastructure and processes in the 6 countries the company is currently deployed in, and streamline the processes for deploying to new countries in the future
Holistically improve all aspects of the company's current infrastructure, including reducing costs, streamlining environment provisioning, lowering response times, and incorporating the latest techniques and technologies
Monitor and maintain the existing cloud infrastructure via autoscaling, automated alerts, and OpsWorks and Grafana dashboards
Take ownership and responsibility for the team's cloud operation activities
Liaise with external security agencies for annual audits and perform the company's own internal security sweeps
Aid in reconfiguring existing architecture to allow for rapid deployments to new countries
Mentor less experienced team members


Requirements

3+ years of SRE experience
Experience independently leading the planning and deployment of a project
Experience with cloud platforms, especially AWS, including solid knowledge of how to use cloud resources to meet the demands of other teams and of production
A sound understanding of modern Micro Services and Service Mesh concepts
Experience managing Kubernetes, including CI/CD with Kubernetes
Solid networking knowledge, especially the TCP/IP stack and HTTP
A strong understanding of caching, including CDN, HTTP cache, and Redis/Memcached
Excellent troubleshooting skills, including Linux OS issue diagnosis and OS parameter optimization; JVM optimization experience would be highly advantageous
Experience with cloud-native monitoring solutions for large distributed systems, using an observability model


Benefits

Quarterly and flash bonuses
Flexible working hours
Top-of-the-line equipment
Education allowance
Referral bonuses
28 days paid annual leave
Annual global and team retreats: the company is going on a luxury 5-day retreat to Dubai in November this year!
Highly talented, dependable co-workers in a global, multicultural organisation
The company scores 100% on The Joel Test
The company's teams are small enough for you to be impactful
The company's business is globally established and successful, offering stability and security to the company's team members
