secml_malware.models package
Submodules
secml_malware.models.basee2e module
- class secml_malware.models.basee2e.End2EndModel(embedding_size: int, max_input_size: int, embedding_value: int, shift_values: bool)
Bases: torch.nn.modules.module.Module
- abstract classmethod bytes_to_numpy(bytez: bytes, max_length: int, padding_value: int, shift_values: bool) → numpy.ndarray
Create a numpy array from raw bytes, cropped or padded to exactly max_length elements.
- Parameters
bytez (bytes) – byte string containing the sample
max_length (int) – the max input size of the network
padding_value (int) – the value used as padding
shift_values (bool) – True if values are shifted by one
- Returns
the sample as numpy array, cropped or padded
- Return type
numpy array
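The crop-or-pad behaviour can be sketched with NumPy. This is a minimal sketch under two assumptions: shift_values adds 1 to every byte so that 0 stays free as a padding symbol, and padding is appended at the end of the sample. `bytes_to_array_sketch` is a hypothetical helper for illustration, not the library's implementation.

```python
import numpy as np


def bytes_to_array_sketch(bytez: bytes, max_length: int, padding_value: int,
                          shift_values: bool) -> np.ndarray:
    """Hypothetical re-implementation of the crop-or-pad step (not the library code)."""
    # Read the raw bytes as unsigned 8-bit integers, widened so a +1 shift cannot overflow.
    values = np.frombuffer(bytez, dtype=np.uint8).astype(np.int64)
    if shift_values:
        # Assumption: shifting maps bytes from [0, 255] to [1, 256], leaving 0 for padding.
        values = values + 1
    # Crop samples longer than the network input.
    values = values[:max_length]
    # Right-pad shorter samples with padding_value.
    out = np.full(max_length, padding_value, dtype=np.int64)
    out[:len(values)] = values
    return out


# "MZ\x00" is three bytes: shifted to 78, 91, 1, then padded with two zeros.
sample = bytes_to_array_sketch(b"MZ\x00", max_length=5, padding_value=0, shift_values=True)
```

The values are widened to int64 before shifting because 255 + 1 would wrap around in uint8.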
- compute_embedding_gradient(numpy_x: numpy.ndarray) → torch.Tensor
Compute the gradient with respect to the embedding layer.
- Parameters
numpy_x (numpy array) – the numpy array containing a sample
- Returns
the gradient w.r.t. the embedding layer
- Return type
torch.Tensor
- abstract embed(input_x, transpose=True)
Embed an input vector into the MalConv embedding representation.
- abstract embedd_and_forward(x: torch.Tensor) → torch.Tensor
Compute the embedding for sample x and return the prediction.
- Parameters
x (torch.Tensor) – the sample as torch tensor
- Returns
the result of the forward pass
- Return type
torch.Tensor
- forward(x: torch.Tensor) → torch.Tensor
Forward pass.
- Parameters
x (torch.Tensor) – the sample to test
- Returns
the result of the forward pass
- Return type
torch.Tensor
- abstract classmethod list_to_numpy(x, max_length, padding_value, shift_values)
Create a numpy array from the list x of bare bytes. The vector is max_length long.
- abstract classmethod load_sample_from_file(path, max_length, has_shape, padding_value, shift_values)
Create a numpy array containing the sample stored at path. The vector is max_length long. If has_shape is True, path is expected to be a (1, 1) matrix, so item() is called on it to extract the actual path.
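The has_shape case can be illustrated with NumPy. When the path arrives wrapped in a (1, 1) array, item() unwraps the single element back into a plain Python string. `unwrap_path_sketch` is a hypothetical helper for illustration only. It assumes the wrapper behaves like a NumPy array and does not claim to reproduce the library's code.

```python
import numpy as np


def unwrap_path_sketch(path, has_shape: bool) -> str:
    """Hypothetical illustration of the has_shape handling (not the library code)."""
    if has_shape:
        # A (1, 1) matrix holds exactly one element; item() returns it as a Python object.
        return str(np.asarray(path).item())
    return path


wrapped = np.array([["/tmp/sample.exe"]])
print(wrapped.shape)                      # (1, 1)
print(unwrap_path_sketch(wrapped, True))  # /tmp/sample.exe
```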
- load_simplified_model(path: str)
Load the model weights.
- Parameters
path (str) – the path to the model
- classmethod path_to_exe_vector(path: str, max_length: int, padding_value: int, shift_values: bool) → numpy.ndarray
Create a numpy array from the bytes contained in the file.
- Parameters
path (str) – the path of the file
max_length (int) – the max input size of the network
padding_value (int) – the value used as padding
shift_values (bool) – True if values are shifted by one
- Returns
the sample as numpy array, cropped or padded
- Return type
numpy array
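Loading from a path adds only a binary read in front of the crop-or-pad step. The following is a minimal sketch. `path_to_array_sketch` is hypothetical, and it assumes that shift_values adds 1 to each byte and that padding is appended at the end.

```python
import numpy as np


def path_to_array_sketch(path: str, max_length: int, padding_value: int,
                         shift_values: bool) -> np.ndarray:
    """Hypothetical sketch of path_to_exe_vector (not the library code)."""
    # Read the whole file as raw bytes.
    with open(path, "rb") as handle:
        bytez = handle.read()
    values = np.frombuffer(bytez, dtype=np.uint8).astype(np.int64)
    if shift_values:
        # Assumption: a +1 shift keeps 0 available as the padding symbol.
        values = values + 1
    # Crop to max_length, then right-pad with padding_value.
    values = values[:max_length]
    out = np.full(max_length, padding_value, dtype=np.int64)
    out[:len(values)] = values
    return out
```

Reading the entire file first is simple but loads large executables fully into memory. A real implementation might read only the first max_length bytes.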
- training: bool
secml_malware.models.c_classifier_ember module
- class secml_malware.models.c_classifier_ember.CClassifierEmber(tree_path: Optional[str] = None)
Bases: secml.ml.classifiers.c_classifier.CClassifier
Wrapper for the EMBER gradient-boosted decision tree (GBDT) model by Anderson et al. (https://arxiv.org/abs/1804.04637).
- extract_features(x: secml.array.c_array.CArray) → secml.array.c_array.CArray
Extract the EMBER features.
- Parameters
x (CArray) – program sample
- Returns
EMBER features
- Return type
CArray
secml_malware.models.c_classifier_end2end_malware module
- class secml_malware.models.c_classifier_end2end_malware.CClassifierEnd2EndMalware(model: secml_malware.models.basee2e.End2EndModel, epochs=100, batch_size=256, train_transform=None, preprocess=None, softmax_outputs=False, random_state=None, plus_version=False, input_shape=(1, 1048576), verbose=0)
Bases: secml.ml.classifiers.pytorch.c_classifier_pytorch.CClassifierPyTorch
- compute_embedding_gradient(x: secml.array.c_array.CArray, penalty_term: torch.Tensor)
Compute the gradient with respect to the embedding layer.
- Parameters
x (CArray) – point where gradient will be computed
penalty_term (torch.Tensor) – the penalty term
- Returns
the gradient w.r.t. the embedding
- Return type
CArray
- embed(x: secml.array.c_array.CArray, transpose: bool = True)
Embed the sample into the embedding space.
- Parameters
x (CArray) – the sample to embed
transpose (bool, optional, default True) – set True to return the transposed feature space vector
- Returns
the embedded vector
- Return type
torch.Tensor
- embedding_predict(x)
Embed the sample and produce a prediction.
- Parameters
x (CArray) – the input sample
- Returns
the malware score
- Return type
float
- get_embedding_size()
Get the embedding space dimensionality
- Returns
the dimensionality of the embedding space
- Return type
int
- get_embedding_value()
Get the value used as padding
- Returns
a value that is used for padding the sample
- Return type
int
- get_input_max_length()
Get the input window length
- Returns
the window input length
- Return type
int
- get_is_shifting_values()
Return whether the model shifts the values by one
- Returns
True if the values are shifted by one
- Return type
bool
- gradient(x, w=None)
Compute the gradient at x with a forward and a backward pass.
The gradient is pre-multiplied by w.
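"Pre-multiplied by w" can be read as a vector-Jacobian product, w^T J_f(x). A single backward pass produces this quantity when the output gradient is seeded with w. The NumPy sketch below illustrates that reading for a hypothetical linear model f(x) = A x, whose Jacobian is A. It checks the result against finite differences of the scalar w · f(x). A, x and w are illustrative values, not anything taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))  # hypothetical linear model: 2 output scores, 4 features
x = rng.normal(size=4)       # point where the gradient is evaluated
w = np.array([-1.0, 1.0])    # illustrative weights on the outputs


def f(v: np.ndarray) -> np.ndarray:
    return A @ v


# Backward pass seeded with w: for f(x) = A x the Jacobian is A, so w^T J = A^T w.
analytic = A.T @ w

# Central finite differences of the scalar w . f(x) give the same vector.
eps = 1e-6
numeric = np.array([
    (w @ f(x + eps * e) - w @ f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(analytic, numeric))  # True
```

Seeding with w avoids materialising the full Jacobian. This matters when f has many outputs or x is very long, as with raw executables.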
- gradient_f_x(x, **kwargs)
Return the gradient of the function at point x.
- Parameters
x (CArray) – the point where the gradient is computed
- Raises
NotImplementedError – if the model does not support gradients
- Returns
the gradient computed at x
- Return type
CArray
- load_pretrained_model(path: Optional[str] = None)
Load pretrained model weights
- Parameters
path (str, optional, default None) – the path of the model; if None, the internal default model is loaded
secml_malware.models.c_classifier_sorel_net module
- class secml_malware.models.c_classifier_sorel_net.CClassifierSorel(model_path, use_malware=True, use_counts=True, use_tags=True, n_tags=11, feature_dimension=2381, layer_sizes=None)
Bases: secml.ml.classifiers.c_classifier.CClassifier
- extract_features(x: secml.array.c_array.CArray)
- load_model(model_path)
- class secml_malware.models.c_classifier_sorel_net.SorelNet(use_malware=True, use_counts=True, use_tags=True, n_tags=None, feature_dimension=2381, layer_sizes=None)
Bases: torch.nn.modules.module.Module
This is a simple network loosely based on the one used in ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation (https://arxiv.org/abs/1903.05700). Note that it uses fewer (and smaller) layers, as well as a single layer for all tag predictions, so performance will suffer accordingly.
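The multi-head idea can be sketched in NumPy. A shared trunk feeds a malware head, an auxiliary count head and one tag layer that emits all n_tags probabilities at once. This is an illustrative sketch, not SorelNet's actual code. The hidden size, the single ReLU trunk layer, the random weights and the linear count output are all assumptions. Only feature_dimension=2381 and the tag count come from the signatures above.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dimension, hidden, n_tags = 2381, 64, 11  # hidden size is a hypothetical choice


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical randomly initialised weights; a trained model would load these instead.
W_base = rng.normal(scale=0.01, size=(feature_dimension, hidden))
W_malware = rng.normal(scale=0.01, size=(hidden, 1))
W_count = rng.normal(scale=0.01, size=(hidden, 1))
W_tags = rng.normal(scale=0.01, size=(hidden, n_tags))  # one layer for all tags


def forward_sketch(features: np.ndarray) -> dict:
    """Shared trunk followed by malware, count and tag heads (illustrative only)."""
    h = np.maximum(features @ W_base, 0.0)  # shared ReLU layer
    return {
        "malware": sigmoid(h @ W_malware),  # probability of being malicious
        "count": h @ W_count,               # auxiliary count output (linear activation assumed)
        "tags": sigmoid(h @ W_tags),        # all n_tags probabilities from a single layer
    }


out = forward_sketch(rng.normal(size=(3, feature_dimension)))
print({k: v.shape for k, v in out.items()})
# {'malware': (3, 1), 'count': (3, 1), 'tags': (3, 11)}
```

Using one shared tag layer is cheaper than a separate head per tag. However, every tag must then be predicted from the same hidden features, which matches the performance caveat stated above.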
- forward(data)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
secml_malware.models.malconv module
Malware Detection by Eating a Whole EXE, by Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, and Charles Nicholas (https://arxiv.org/abs/1710.09435).
- class secml_malware.models.malconv.MalConv(pretrained_path=None, embedding_size=8, max_input_size=1048576)
Bases: secml_malware.models.basee2e.End2EndModel
Implementation of the MalConv architecture.
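The MalConv data flow from the cited paper can be sketched in NumPy: byte embedding, gated convolution with stride equal to the window width, temporal max pooling, a fully connected layer and a sigmoid score. This is a sketch, not secml_malware's MalConv. It does not claim to match that class's exact layer wiring. Biases are omitted, the 257-entry vocabulary (256 byte values plus padding) and the ReLU on the fully connected layer are assumptions, and the demo uses tiny layer sizes so it runs quickly. Because stride equals width, each convolution reduces to a reshape followed by a matrix product.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng, vocab=257, emb=8, window=500, filters=128, hidden=128) -> dict:
    """Random weights; defaults follow the paper's sizes, vocab=257 is an assumption."""
    return {
        "E": rng.normal(scale=0.1, size=(vocab, emb)),  # byte embedding table
        "W_conv": rng.normal(scale=0.01, size=(window * emb, filters)),
        "W_gate": rng.normal(scale=0.01, size=(window * emb, filters)),
        "W_fc": rng.normal(scale=0.1, size=(filters, hidden)),
        "w_out": rng.normal(scale=0.1, size=hidden),
        "window": window,
    }


def malconv_forward_sketch(x: np.ndarray, p: dict) -> float:
    """Gated convolution + temporal max pooling over a vector of byte indices."""
    z = p["E"][x]                                 # embed: (length, emb)
    k = p["window"]
    n = len(x) // k                               # non-overlapping windows; a partial tail is dropped
    windows = z[: n * k].reshape(n, -1)           # (n, window * emb)
    gated = (windows @ p["W_conv"]) * sigmoid(windows @ p["W_gate"])  # (n, filters)
    pooled = gated.max(axis=0)                    # temporal max pooling: (filters,)
    hidden = np.maximum(pooled @ p["W_fc"], 0.0)  # fully connected layer + ReLU (assumed)
    return float(sigmoid(hidden @ p["w_out"]))    # malware score in (0, 1)


rng = np.random.default_rng(0)
params = init_params(rng, window=4, filters=6, hidden=5)  # tiny sizes for a quick run
sample = rng.integers(0, 257, size=20)                   # 20 byte indices -> 5 windows
print(malconv_forward_sketch(sample, params))
```

Max pooling over windows makes the score independent of window order, which is why the model can process very long inputs with a fixed-size head.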
- embed(input_x, transpose=True)
Embed an input vector into the MalConv embedding representation.
- embedd_and_forward(x)
Compute the embedding for sample x and return the prediction.
- Parameters
x (torch.Tensor) – the sample as torch tensor
- Returns
the result of the forward pass
- Return type
torch.Tensor
- training: bool