Dive deep into development lines to learn and understand how every application component works, and improve product scalability, stability and performance.
Set up, manage and maintain product, middleware and big-data applications and services.
Perform regular and ad-hoc server-side deployments, performance tuning and troubleshooting.
Design and develop workflow automation.
Manage capacity and resources.
Own full-chain stress testing to improve application performance and remove redundancy.
Prepare routine operations documentation.
Job Requirements
Bachelor's or higher degree in Computer Science, Engineering, Information Systems or related fields.
At least 2 years of experience in Site Reliability Engineering roles.
Extensive, hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.).
Knowledge of computer networking (TCP/IP, DNS, etc.) and operating systems.
Hands-on experience with at least one of the following languages: Bash, Python or Go.
Strong analytical and problem-solving skills, with the ability to thrive in difficult and stressful situations.
Passion for the work and a strong sense of responsibility.
Quick learner and good team player.
Detail-oriented, cautious and prudent.
Preferred Skills (Optional)
Experience with automation tools such as Ansible or Jenkins.
Experience with monitoring tools such as Prometheus, Zabbix or Grafana.
Experience with load balancing tools such as LVS, Nginx, OpenResty or HAProxy.
Experience with container technologies such as Docker and Kubernetes.