TorchServe server predicting the next sequence of requests for EVMX
TorchServe requires a .mar file. Some versions of the models are already pre-packaged under model-store and are ready to be served (skip to Serving the model).
To run a new model, you'll first need to save it and package its artifacts.
Train your model using PyTorch, then save its state dict and vocabulary:
import json
import torch

model_name = 'my_model'
# Save the trained weights
torch.save(model.state_dict(), f'state_dict_{model_name}.pt')
# Save the vocabulary as an index -> token mapping
vocab_dict = {k: v for k, v in enumerate(vocab)}
filename = f'index_to_names/index_to_name-{model_name}.json'
with open(filename, 'w') as f:
    json.dump(vocab_dict, f, indent=2)
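After this step you should have two artifacts named after model_name (here 'my_model'); you can quickly check they exist:
ls state_dict_my_model.pt index_to_names/index_to_name-my_model.json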
Pull the image of the latest TorchServe release
docker pull pytorch/torchserve
See the TorchServe documentation to run with a GPU
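For example, assuming an NVIDIA GPU and the NVIDIA Container Toolkit are available, you can pull the GPU variant of the image and later add --gpus all to the docker run commands below:
docker pull pytorch/torchserve:latest-gpu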
Install PyTorch, TorchServe, and their dependencies
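If you prefer to build the archive outside of Docker, a typical local setup (a sketch, assuming pip and a Java 11+ runtime for TorchServe) is:
pip install torch torchserve torch-model-archiver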
- Create a model file including the model architecture (see model_large_bs8_sl32_emb128.py)
- Create a .mar file with the model
- Start the container, sharing the model-store, serve and index_to_names directories
docker run --rm -it \
-p 8080:8080 -p 8081:8081 \
--name mar \
-v $(pwd)/model-store:/home/model-server/model-store \
-v $(pwd)/serve:/home/model-server/serve \
-v $(pwd)/index_to_names:/home/model-server/index_to_names \
pytorch/torchserve:latest
- Attach to the running container and open a bash prompt
docker exec -it mar /bin/bash
You will land in /home/model-server/
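The three host directories are mounted here; you can confirm with:
ls model-store serve index_to_names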
- Copy the model's index_to_name file to the default name index_to_name.json
cp index_to_name_evmxo_large_bs8_sl32_emb128.json index_to_name.json
- Now execute the torch-model-archiver command; the MAR file will be created in model-store
torch-model-archiver --model-name lmmodel_large_bs8_sl32_emb128 \
--model-file serve/model_large_bs8_sl32_emb128.py \
--serialized-file model-store/state_dict_evmxo_large_bs8_sl32_emb128.pt \
--extra-files serve/evmxHandler.py,serve/base_model.py,index_to_names/index_to_name.json \
--handler serve/handler.py \
-v 1.0 --export-path model-store -f
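The archive should now sit next to the serialized weights. You can verify it, leave the shell, and stop the helper container (started with --rm, so it is removed; otherwise its ports 8080/8081 would conflict with the serving container below):
ls -lh model-store/*.mar
exit
docker stop mar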
Start a container to serve all models available in model-store
docker run --rm \
--shm-size=4g --ulimit memlock=-1 --ulimit stack=67108864 --cpus "2.0" \
-p 8080:8080 -p 8081:8081 -p 8082:8082 \
--name serve \
-v $(pwd)/model-store:/home/model-server/model-store \
-v $(pwd)/serve:/home/model-server/serve \
pytorch/torchserve:latest \
torchserve --start --ts-config /home/model-server/serve/config.properties
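The command above points TorchServe at serve/config.properties. For reference, a minimal configuration that serves every archive in the model store on the ports used here typically looks like this (a sketch, not necessarily the repo's actual file):
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=all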
Try it out!
curl localhost:8080/predictions/lmmodel_large_bs8_sl32_emb128 \
-H "Content-Type: application/json" \
-d '{"data": ["KP:0x2d843a734b19e3060ad27f48460f80b9d97e4ae112317b193a5682a57113c", "KS:444", "KS:189", "KS:18a", "KS:602", "KS:18d", "KS:5c3"]}'