
Ubuntu 22.04: Running llama-2-13b for inference in Windows 11 WSL2 resulted in a "Killed" error

I've posted my question on the facebookresearch/llama GitHub repository, but none of the suggestions from the community worked:

https://github.com/facebookresearch/llama/issues/936

I've also posted my question on the NVIDIA Developer Forums, but got no answers:

https://forums.developer.nvidia.com/t/running-llama-2-13b-for-inferencing-in-windows-11-wsl2-resulted-in-killed-gpu-barely-used/277074

I'm running Ubuntu 22.04 (Jammy) under WSL2 on Windows 11. This is my run.py code:

import torch
import transformers
import requests

print(torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and adapter weights from local directory
model = transformers.AutoModelForCausalLM.from_pretrained("/home/maxloo/src/pastoring/llama/llama-2-13b")
model.to(device)
adapter = transformers.AutoModelForCausalLM.from_pretrained("/home/maxloo/src/pastoring/adapter", config=transformers.configuration.AdapterConfig.from_json_file("adapter_config.json"))
model.load_state_dict(adapter.state_dict())
adapter.load_state_dict(model.state_dict())

# Define prompt
prompt = "Hello, I am a chatbot."

# Perform inference
response = model.generate(prompt, max_length=50)

# Print response
print(response)

When I run the script from Bash with python3 run.py, I expect a chat message to be displayed followed by a prompt for my input, but this is the actual output:

Killed
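
From what I understand, a bare "Killed" message usually means the Linux out-of-memory killer terminated the process: a 13B-parameter model in the default fp32 precision needs roughly 52 GB of RAM for the weights alone, and my script loads a second full copy for the adapter. Below is a minimal sketch of a lower-memory way I could try to load and run the base model instead; it assumes the checkpoint directory is in Hugging Face format, that the accelerate package is installed, and it leaves the adapter step out:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/home/maxloo/src/pastoring/llama/llama-2-13b"

# Load the tokenizer and the model in half precision, streaming the weights
# shard by shard instead of materialising a full fp32 copy in RAM first.
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,   # ~26 GB instead of ~52 GB for 13B weights
    low_cpu_mem_usage=True,      # avoid a temporary second copy in RAM
    device_map="auto",           # put what fits on the GPU, spill the rest to CPU
)

# generate() expects token ids, not a raw string
prompt = "Hello, I am a chatbot."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))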

Could someone please help with this error?

