DevOps Team Leader

Israel

About The Position

Zebra has set out on a mission to help hundreds of millions of people gain access to fast, accurate medical diagnosis by teaching computers to read and diagnose medical imaging data.

Zebra Medical Vision is revolutionizing healthcare by creating an AI-based radiologist. Our product is a scalable analytic engine that uses deep learning algorithms to detect anomalies in medical images at global scale.

We are looking for a DevOps team leader to establish the DevOps and System IT domain in the company and bring harmony and productivity to our development, research, and production sites.

Technologies: CephFS, MaaS, Docker, Rancher, Kubernetes, GPU, GitLab, Prometheus, Node.js, MongoDB, PostgreSQL, Elasticsearch, Kibana, Redash, TensorFlow, Python, and more.

Responsibilities

  • Build a team of DevOps and system engineers
  • Manage the data production environment, where we store all our data and train our algorithms: a fully containerized environment with PB+ storage, a CephFS cluster, tens of GPUs, thousands of cores, many databases, and more.
  • Make our environment scalable, our apps and systems available, our provisioning self-service, and our data reads and writes blazing fast.
  • Shorten the time from development to production: turn every commit into a fully packaged app that can be deployed to production with one click.
  • Scale our machine learning research (“ResearchOps”): maximize our capacity to run experiments and train on huge amounts of data, while keeping all algorithm results traceable in dashboards.
  • Create robust production environments, including deployment, monitoring, alerting, logging, and other tooling, both in the cloud and on premises.

Requirements

  • 4+ years as a DevOps engineer in a modern software company
  • 2+ years as a team leader
  • Fluent in Linux
  • 1+ years with Docker in production
  • Experience building/supporting a large distributed production environment
  • Deep understanding of hardware, from network equipment to storage devices
  • Enjoys combining reactive work, such as supporting users, with taking initiative and proactively defining and executing solutions to bottlenecks and gaps

Skills

Anything from our technology list is a big plus:

CephFS, MaaS, Docker, Rancher, Kubernetes, GPU, GitLab, Prometheus, Node.js, MongoDB, PostgreSQL, Elasticsearch, Kibana, Redash, TensorFlow, Python, and more.
