
EVMX Optimizer

TorchServe server predicting the next sequence of requests for EVMX

Getting started

TorchServe requires a .mar file. Some versions of the models are already pre-packaged under model-store and are ready to be served (skip to Serving the model)

To run a new model, you'll first need to save it and package its artifacts.

Train your model using PyTorch and save its state dict and vocabulary:

import json
import torch

# Save the trained model's weights; the filename embeds the model name
model_name = 'my_model'
torch.save(model.state_dict(), f'state_dict_{model_name}.pt')

# Persist the vocabulary as an index -> token mapping for the handler
vocab_dict = {k: v for k, v in enumerate(vocab)}
filename = f'index_to_names/index_to_name-{model_name}.json'
with open(filename, 'w') as f:
    json.dump(vocab_dict, f, indent=2)

With Docker

Pull the image of the latest TorchServe release:

docker pull pytorch/torchserve

See the TorchServe documentation for running with GPU support.

Local development

Install PyTorch, TorchServe, and their dependencies.
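
For a local setup, a typical install looks like this (a minimal sketch; pin versions to match your environment, and note that TorchServe also needs a Java runtime, JDK 11 or newer):

pip install torch torchserve torch-model-archiver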

Packaging the model artifacts

  1. Start a container, mounting the model-store, serve and index_to_names directories
docker run --rm -it -p 8080:8080 -p 8081:8081 --name mar \
    -v $(pwd)/model-store:/home/model-server/model-store \
    -v $(pwd)/serve:/home/model-server/serve \
    -v $(pwd)/index_to_names:/home/model-server/index_to_names \
    pytorch/torchserve:latest
  2. Attach to the running container and get a bash prompt
docker exec -it mar /bin/bash

You will land in /home/model-server/

  3. Copy the model's index_to_name file to the generic index_to_name.json
cp index_to_name_evmxo_large_bs8_sl32_emb128.json index_to_name.json
  4. Run the torch-model-archiver command; the MAR file will be created in model-store
torch-model-archiver \
    --model-name lmmodel_large_bs8_sl32_emb128 \
    --model-file serve/model_large_bs8_sl32_emb128.py \
    --serialized-file model-store/state_dict_evmxo_large_bs8_sl32_emb128.pt \
    --extra-files serve/evmxHandler.py,serve/base_model.py,index_to_names/index_to_name.json \
    --handler serve/handler.py \
    -v 1.0 \
    --export-path model-store -f
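
torch-model-archiver names the archive after --model-name, so before leaving the container you can confirm it was written:

ls model-store/
# expect lmmodel_large_bs8_sl32_emb128.mar next to the state dict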

Serving the model

Start a container to serve all models available in model-store:

docker run --rm \
    --shm-size=4g --ulimit memlock=-1 --ulimit stack=67108864 --cpus "2.0" \
    -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    --name serve \
    -v $(pwd)/model-store:/home/model-server/model-store \
    -v $(pwd)/serve:/home/model-server/serve \
    pytorch/torchserve:latest \
    torchserve --start --ts-config /home/model-server/serve/config.properties
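
Once the server is up, the management API on port 8081 lists the registered models (assuming config.properties loads models from model-store, as above):

curl localhost:8081/models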

Try it out!

curl localhost:8080/predictions/lmmodel_large_bs8_sl32_emb128 \
    -H "Content-Type: application/json" \
    -d '{"data": ["KP:0x2d843a734b19e3060ad27f48460f80b9d97e4ae112317b193a5682a57113c", "KS:444", "KS:189", "KS:18a", "KS:602", "KS:18d", "KS:5c3"]}'