LLMOps: Model Quantization & Inference with the ONNX Generative Runtime  #datascience  #machinelearning

The Machine Learning Engineer

In this video I show how to install ONNX Runtime with GPU support and run inference with a generative model, using a Phi-3-mini-4k quantized to int4.
After that, we convert the original Phi-3-mini-128k into an int4 quantized version with the ONNX Runtime GenAI tooling; both steps are sketched below.
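A minimal sketch of the inference step, assuming the CUDA build of the onnxruntime-genai package and its Python API (method names have shifted slightly between releases; older versions use params.input_ids and generator.compute_logits() instead of append_tokens). The model folder path is a placeholder for the int4 Phi-3-mini-4k export:

# Install ONNX Runtime GenAI with GPU (CUDA) support:
#   pip install onnxruntime-genai-cuda
import onnxruntime_genai as og

# Load the int4-quantized Phi-3-mini-4k model folder (placeholder path)
model = og.Model("./phi3-mini-4k-instruct-int4")
tokenizer = og.Tokenizer(model)

# Phi-3 chat template around the user prompt
prompt = "<|user|>\nWhat is ONNX Runtime?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Token-by-token generation loop
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))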
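The conversion step uses the onnxruntime-genai model builder CLI; the output folder name below is my own placeholder:

# Download microsoft/Phi-3-mini-128k-instruct and export it
# as an int4 ONNX model targeting the CUDA execution provider
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-128k-instruct \
    -o ./phi3-mini-128k-instruct-int4 \
    -p int4 \
    -e cuda

The resulting folder can then be loaded with og.Model exactly as in the inference sketch above.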

Notebook:
https://github.com/olonok69/LLM_Notebooks/blob/main/onnx/Phi3__ONNX_gpu.ipynb

Tags:

#ONNX #Generative_ai #LLMs #GPU #Microsoft_Phi3_instruct #Machine_Learning #Deep_Learning #Python