MLOps Architect Job at Quantum World Technologies Inc., Dallas, TX

  • Quantum World Technologies Inc.
  • Dallas, TX

Job Description

We are seeking an experienced MLOps/LLMOps Architect with deep expertise in building scalable, production-grade machine learning and generative AI pipelines. The ideal candidate will have strong experience in observability and model performance monitoring, including tools such as Arize, to ensure the reliability and scalability of AI/ML models in production. This role requires a strategic mindset and hands-on technical skills to design and implement robust MLOps/LLMOps frameworks that support seamless model deployment, monitoring, and optimization.

Key Responsibilities:

  • Architect and Implement MLOps/LLMOps Frameworks:
      • Design and build scalable MLOps/LLMOps pipelines for model training, deployment, monitoring, and retraining.
      • Establish automated CI/CD pipelines to streamline model development and deployment.
  • Model Observability and Monitoring:
      • Develop and implement model observability strategies using tools like Arize to track model performance, drift, and bias.
      • Create real-time dashboards and alerts for proactive issue identification and resolution.
  • Performance and Scalability:
      • Ensure high availability, low latency, and scalability of deployed models.
      • Optimize model inference and serving using best practices in distributed computing and cloud infrastructure.
      • Manage and optimize compute costs for large-scale Gen AI models by implementing intelligent load balancing, autoscaling, and infrastructure tuning.
  • Model Governance and Compliance:
      • Establish frameworks for model versioning, auditing, and explainability to meet regulatory and business requirements.
      • Ensure alignment with Responsible AI and ethical AI guidelines.
  • Cross-Functional Collaboration:
      • Partner with data scientists, ML engineers, platform teams, and business stakeholders to align MLOps strategies with business objectives.
      • Provide technical leadership and mentorship to junior team members.

Required Skills and Qualifications:

  • Experience: 10 to 15 years in machine learning, MLOps, and AI model deployment in enterprise environments.
  • MLOps/LLMOps Expertise: Strong background in MLOps and LLMOps, including model lifecycle management, monitoring, and automation.
  • Observability Tools: Proficient in using observability platforms such as Arize, Weights & Biases, TensorBoard, MLflow, or similar tools.
  • Cloud Platforms: Experience with cloud-based ML solutions (e.g., AWS, Azure, GCP).
  • Programming: Strong programming skills in Python and experience with ML frameworks such as TensorFlow, PyTorch, and Hugging Face.
  • Containerization and Orchestration: Hands-on experience with Docker, Kubernetes, and distributed computing frameworks.
  • Model Monitoring: Experience in detecting and mitigating model drift, bias, and data quality issues.
  • Performance Tuning: Expertise in model optimization, inference acceleration, and efficient resource utilization.
