Director, Operations - GPU Cloud

SingTel

Exp: 10-12 Years

Full time

Singapore

Telecom/ISP

Job Description

Be a Part of Something BIG!

Make an Impact by

Leading and managing the GPU Infrastructure-as-a-Service (IaaS) platform. This role oversees the GPU infrastructure, storage infrastructure and associated services, ensuring seamless integration and operation.

Infrastructure and Resource Management:

Manage the maintenance and operations of the liquid-cooled data centre that hosts the GPU cloud.
Optimize GPU infrastructure and associated hardware.
Optimize resource allocation to meet the performance requirements of both data centre operations and cloud hardware operations, as well as cost-effectiveness goals.
Lead the operations team to ensure compliance with customer and product SLAs.
Enhance system scalability and reliability through automation and continuous improvement.
Enforce industry-standard operational processes, with reference to standards such as ISO 27001 or equivalent, across data centre and cloud operations.

Operational Excellence:

Handle general incidents, including operations management and escalation management across the AI cloud product.
Develop and implement operational strategies to ensure the reliability and efficiency of our GPU Cloud infrastructure.
Collaborate with other departments to streamline processes, enhance customer experience, and meet service level agreements.
Support services and improve the lifecycle of GPU cloud hardware and the data centre environment through monitoring, logging, and alerting, from deployment through operation and refinement.
Establish operations systems and processes (SOPs, EOPs, etc.) and manage daily operational issues.
Apply strong operational management skills to organise internal cross-functional teams and external vendors, ensuring an efficient and resilient operations setup.

Team Management:

Build and lead a high-performing operations team to foster a culture of innovation, collaboration, and continuous improvement.
Set clear goals and objectives, mentor team members, and drive professional development initiatives.
Oversee resource management and allocation to optimize team productivity and meet operational goals effectively.

Security and Compliance:

Lead security incident management processes, focusing on identification, containment, and resolution of threats in the data centre environment and GPU cloud hardware.
Enforce best practices for security and compliance.
Stay abreast of industry security trends and implement measures to safeguard customer data and platform integrity.

Skills for Success

Proven track record of managing and escalating complex cloud and data centre infrastructure issues and leading operations teams.
Experience in liquid cooling operations is an advantage.
Strong understanding of hardware infrastructure operation, security, management, and best practices.
Excellent leadership, communication, and interpersonal skills, with the ability to lead cross-functional teams.
Proficiency in managing customer interactions and improving service delivery to enhance customer experience.
Experience in Linux and hypervisor administration for GPU infrastructure and cloud.
Strong skills in complex technical problem-solving, with a proactive approach to system operation and optimization.
Knowledge of storage technologies and experience in capacity planning, troubleshooting, and data protection.
Experience in GPU and GPU infrastructure management, including configuration, monitoring, and performance.

Rewards that Go Beyond

Flexible work arrangements
Full suite of health and wellness benefits
Ongoing training and development programs
Internal mobility opportunities

Your Career Growth Starts Here. Apply Now!

More Info

Industry: Telecom/ISP

Function: Technology

Job Type: Permanent Job

Skills Required

Technical problem-solving

Hypervisor administration

Capacity Planning

Storage infrastructure

Date Posted: 16/11/2024

Job ID: 100504863

About Company

SingTel

Job Source: groupcareers.singtel.com

Singtel is Asia's leading communications technology group, providing a portfolio of services from next-generation communication, 5G and technology services to infotainment to both consumers and businesses. The Group has a presence in Asia, Australia and Africa and reaches over 740 million mobile customers in 21 countries. Its infrastructure and technology services for businesses span 21 countries, with more than 428 direct points of presence in 362 cities.