Product Reliability Engineer, Global Customer Service

Netflix

Los Gatos, CA

Industry: Media

Experience: Not specified

Posted 28 days ago

It's an amazing time to be joining the Netflix team as we continue to transform entertainment around the world.  We are enabling people to have more control over what they watch and when they watch it. Our efforts have helped to grow Netflix to over 117 million members globally, and we won’t stop there.

That's where you come in. 
As a Product Reliability Engineer within the Customer Service org, you will apply systems analysis techniques to internal services at Netflix with the end goal of resolving product complexity and friction for our global customer base. From a technical perspective, this will require designing and documenting procedures, analysis, and testing techniques to ensure services are working optimally for our customers. This includes error and fault tracking, alerting, release and canary testing, and other general reliability practices.

Separately, this role requires great communicators, storytellers, and advocates who can consult with users, technical support teams, product service teams, PMs, and external companies that share integrated services with the Netflix product. As with the analysis skill set, your end goal with these groups is to facilitate and spearhead initiatives that resolve product complexity through data-informed decision making, frequently using our robust Big Data engines or customer service sentiment to extract the necessary data.

Product Reliability Engineers within the CS org also play an essential role in product launches and sunsets, ensuring release canary success, reliability for internal CS services, and serving as a SME for technical integrations or technical service discussions.

Core Responsibilities:

  • Apply systems analysis techniques to internal services at Netflix, including writing SQL queries, establishing alerting around key performance indicators, and creating visualizations to establish clear baselines on product complexity and prioritization.
  • Consult with stakeholders, including technical support, product teams, PMs, and external companies that integrate with our service, to identify and remove friction and complexity within the product.
  • Design and document procedures, analysis, and testing techniques to ensure services are working optimally for our customers, including error and fault tracking, release and canary testing, and general reliability practices.
  • Manage Netflix’s reaction to emergency events as a first responder during a crisis.
  • Provide subject matter expertise to CS operations groups as a technical liaison with product teams. 
  • Advocate for consumers with product teams as a SME for CS, assisting on a variety of projects, from implementation and design to advising on new innovation efforts and testing.

Qualifications and Expectations:

  • Thrive in an innovative culture where autonomy is necessary, and communication is paramount. 
  • Intermediate to advanced knowledge of SQL and Hive, with skill in extracting data and navigating datasets.
  • Excellent systems analysis skills, including the ability to pull data directly, analyze datasets, and identify key insights to drive business decisions.
  • Proven record of managing and prioritizing complex technical projects, especially under tight deadlines or during emergency events.
  • BA/BS degree in Computer Science, Computer Engineering, or relevant and proven industry experience around case analysis and data science.
  • Excellent oral and written communication skills and the ability to effectively communicate complex subjects to both technical and non-technical audiences at all levels. 
  • Ability to debate effectively while using data as the basis of your argument.

Nice to haves:

  • Working knowledge of other business analytics tools (e.g., Tableau and Business Objects)
  • Familiarity with issue management systems such as JIRA and Zendesk
  • Familiarity with crisis management, incident lead, or SRE training
  • Familiarity with AWS or cloud-based solutions
  • Familiarity with Python, Java, and other coding languages