secml_malware.models package

Subpackages

secml_malware.models.tests package

Submodules

secml_malware.models.basee2e module

class secml_malware.models.basee2e.End2EndModel(embedding_size: int, max_input_size: int, embedding_value: int, shift_values: bool)

Bases: torch.nn.modules.module.Module

abstract classmethod bytes_to_numpy(bytez: bytes, max_length: int, padding_value: int, shift_values: bool) → numpy.ndarray

Creates a numpy array from raw bytes. The resulting vector is max_length long.

Parameters

bytez (bytes) – byte string containing the sample
max_length (int) – the max input size of the network
padding_value (int) – the value used as padding
shift_values (bool) – True if values are shifted by one

Returns

the sample as numpy array, cropped or padded

Return type

numpy.ndarray
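
The cropping and padding described above can be sketched in a few lines of NumPy. This is an illustration, not the library's implementation: bytes_to_padded_array is a hypothetical helper, and it assumes that "shifted by one" means adding 1 to every byte value, so that 0 stays free for padding.

```python
import numpy as np


def bytes_to_padded_array(bytez: bytes, max_length: int,
                          padding_value: int, shift_values: bool) -> np.ndarray:
    """Hypothetical helper: crop or pad raw bytes to a fixed length."""
    # Start from a vector filled entirely with the padding value.
    out = np.full(max_length, padding_value, dtype=np.uint16)
    # Crop to max_length, then widen the dtype so that 255 + 1 cannot overflow.
    values = np.frombuffer(bytez[:max_length], dtype=np.uint8).astype(np.uint16)
    if shift_values:
        values = values + 1  # assumption: "shifted by one" means byte + 1
    out[:len(values)] = values
    return out


# A short sample is padded up to max_length.
short = bytes_to_padded_array(b"MZ\x90", max_length=5,
                              padding_value=256, shift_values=False)
# A long sample is cropped down to max_length.
long = bytes_to_padded_array(b"MZ\x90\x00\x03\x00", max_length=4,
                             padding_value=0, shift_values=True)
print(short.tolist())  # [77, 90, 144, 256, 256]
print(long.tolist())   # [78, 91, 145, 1]
```

With shifting enabled, a genuine 0x00 byte becomes 1 and can no longer be confused with a padding value of 0.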

compute_embedding_gradient(numpy_x: numpy.ndarray) → torch.Tensor

Computes the gradient w.r.t. the embedding layer.

Parameters: numpy_x (numpy.ndarray) – the numpy array containing a sample
Returns: the gradient w.r.t. the embedding layer
Return type: torch.Tensor
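
Because bytes are discrete, the gradient is taken with respect to the embedded representation rather than the raw input. The NumPy sketch below only illustrates that idea on a toy model (mean pooling followed by a linear head and a sigmoid); it is not MalConv and not how the library computes the gradient, which relies on PyTorch. The table, weights, and function names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(257, 8))  # toy table: 256 byte values + 1 padding index
weights = rng.normal(size=8)                 # toy linear head (assumption, not MalConv)


def score_from_embedding(z: np.ndarray) -> float:
    """Toy score: sigmoid of the mean-pooled embedding projected on `weights`."""
    pooled = z.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-(pooled @ weights))))


def embedding_gradient(z: np.ndarray) -> np.ndarray:
    """Analytic gradient of the toy score w.r.t. every embedded position."""
    s = score_from_embedding(z)
    per_position = s * (1.0 - s) * weights / z.shape[0]
    return np.tile(per_position, (z.shape[0], 1))


x = np.array([77, 90, 144, 256, 256])  # byte indices, 256 used as padding
z = embedding_table[x]                 # embedded sample, shape (5, 8)
grad = embedding_gradient(z)           # same shape as z

# Sanity check against a central finite difference on one coordinate.
eps = 1e-5
z_plus, z_minus = z.copy(), z.copy()
z_plus[0, 0] += eps
z_minus[0, 0] -= eps
numeric = (score_from_embedding(z_plus) - score_from_embedding(z_minus)) / (2 * eps)
print(grad.shape, abs(numeric - grad[0, 0]) < 1e-6)
```

The gradient has one row per input position and one column per embedding dimension, which is the shape an attack needs in order to decide which byte positions to perturb.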

abstract embed(input_x, transpose=True)

Embeds an input vector into the MalConv embedded representation.

abstract embedd_and_forward(x: torch.Tensor) → torch.Tensor

Computes the embedding for sample x and returns the prediction.

Parameters: x (torch.Tensor) – the sample as torch tensor
Returns: the result of the forward pass
Return type: torch.Tensor

forward(x: torch.Tensor) → torch.Tensor

Forward pass.

Parameters: x (torch.Tensor) – the sample to test
Returns: the result of the forward pass
Return type: torch.Tensor

abstract classmethod list_to_numpy(x, max_length, padding_value, shift_values)

Creates a numpy array from the byte values in x. The vector is max_length long.

abstract classmethod load_sample_from_file(path, max_length, has_shape, padding_value, shift_values)

Creates a numpy array containing a sample. The vector is max_length long. If has_shape is true, path is expected to be a (1,1) matrix, so item() is called to extract its single element.

load_simplified_model(path: str)

Load the model weights.

Parameters: path (str) – the path to the model

classmethod path_to_exe_vector(path: str, max_length: int, padding_value: int, shift_values: bool) → numpy.ndarray

Creates a numpy array from the bytes contained in a file.

Parameters

path (str) – the path of the file
max_length (int) – the max input size of the network
padding_value (int) – the value used as padding
shift_values (bool) – True if values are shifted by one

Returns

the sample as numpy array, cropped or padded

Return type

numpy.ndarray
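
As a rough sketch of the same idea applied to a file on disk, the snippet below reads at most max_length bytes and pads the remainder. file_to_padded_array is a hypothetical stand-in, not the library's path_to_exe_vector, and the "add 1" interpretation of shift_values is an assumption.

```python
import os
import tempfile

import numpy as np


def file_to_padded_array(path: str, max_length: int,
                         padding_value: int, shift_values: bool) -> np.ndarray:
    """Hypothetical helper: load a file as a fixed-length vector of byte values."""
    with open(path, "rb") as handle:
        raw = handle.read(max_length)  # never read more than the network can use
    out = np.full(max_length, padding_value, dtype=np.uint16)
    values = np.frombuffer(raw, dtype=np.uint8).astype(np.uint16)
    if shift_values:
        values = values + 1  # assumption: "shifted by one" means byte + 1
    out[:len(values)] = values
    return out


# Write a tiny stand-in "executable" to a temporary file and load it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"MZ\x90\x00")
    tmp_path = tmp.name
try:
    vector = file_to_padded_array(tmp_path, max_length=6,
                                  padding_value=256, shift_values=False)
finally:
    os.remove(tmp_path)
print(vector.tolist())  # [77, 90, 144, 0, 256, 256]
```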

training: bool

secml_malware.models.c_classifier_ember module

class secml_malware.models.c_classifier_ember.CClassifierEmber(tree_path: Optional[str] = None)

Bases: secml.ml.classifiers.c_classifier.CClassifier

A wrapper for the EMBER GBDT classifier by Anderson et al. (https://arxiv.org/abs/1804.04637).

extract_features(x: secml.array.c_array.CArray) → secml.array.c_array.CArray

Extract EMBER features

Parameters: x (CArray) – program sample
Returns: EMBER features
Return type: CArray

secml_malware.models.c_classifier_end2end_malware module

class secml_malware.models.c_classifier_end2end_malware.CClassifierEnd2EndMalware(model: secml_malware.models.basee2e.End2EndModel, epochs=100, batch_size=256, train_transform=None, preprocess=None, softmax_outputs=False, random_state=None, plus_version=False, input_shape=(1, 1048576), verbose=0)

Bases: secml.ml.classifiers.pytorch.c_classifier_pytorch.CClassifierPyTorch

compute_embedding_gradient(x: secml.array.c_array.CArray, penalty_term: torch.Tensor)

Compute the gradient w.r.t. the embedding layer.

Parameters

x (CArray) – point where gradient will be computed
penalty_term (float) – the penalty term

Returns

the gradient w.r.t. the embedding

Return type

CArray

embed(x: secml.array.c_array.CArray, transpose: bool = True)

Embed the sample inside the embedding space

Parameters

x (CArray) – the sample to embed
transpose (bool, optional, default True) – set True to return the transposed feature space vector

Returns

the embedded vector

Return type

torch.Tensor
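
To make the effect of transpose concrete, here is a NumPy sketch of an embedding lookup followed by an axis swap. It assumes that transpose exchanges the length and embedding axes, which is the (batch, channels, length) layout that 1-D convolutions in PyTorch expect; the table and variable names are made up for the example.

```python
import numpy as np

embedding_size = 8
# Toy lookup table: one row per byte value plus one extra padding index (assumption).
table = np.random.default_rng(0).normal(size=(257, embedding_size))

batch = np.array([[77, 90, 144, 256, 256]])      # shape (batch, length) = (1, 5)
embedded = table[batch]                          # shape (1, 5, 8): one vector per byte
transposed = np.transpose(embedded, (0, 2, 1))   # shape (1, 8, 5): channels first

print(embedded.shape, transposed.shape)
```

After the swap, column i of transposed[0] is still the embedding of the i-th byte; only the memory layout seen by the convolution changes.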

embedding_predict(x)

Embed the sample and produce prediction.

Parameters: x (CArray) – the input sample
Returns: the malware score
Return type: float

get_embedding_size()

Get the embedding space dimensionality

Returns: the dimensionality of the embedding space
Return type: int

get_embedding_value()

Get the value used as padding

Returns: a value that is used for padding the sample
Return type: int

get_input_max_length()

Get the input window length

Returns: the window input length
Return type: int

get_is_shifting_values()

Check whether the model shifts the values by one

Returns: True if the values are shifted by one
Return type: bool

gradient(x, w=None)

Compute gradient at x by doing a forward and a backward pass.

The gradient is pre-multiplied by w.
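
One way to read "pre-multiplied by w" is as a vector-Jacobian product: w selects (or mixes) the outputs whose gradient is wanted. The sketch below assumes that interpretation and uses a hand-written Jacobian instead of a real backward pass.

```python
import numpy as np

# Toy Jacobian: 2 outputs (rows) by 3 input features (columns).
jacobian = np.array([[0.5, -1.0, 2.0],
                     [-0.5, 1.0, -2.0]])

# A one-hot w picks the gradient of a single output (here the second one).
w_select = np.array([0.0, 1.0])
grad_selected = w_select @ jacobian

# A general w combines outputs, e.g. the difference between the two scores.
w_diff = np.array([1.0, -1.0])
grad_diff = w_diff @ jacobian

print(grad_selected.tolist())  # [-0.5, 1.0, -2.0]
print(grad_diff.tolist())      # [1.0, -2.0, 4.0]
```

In a backward pass the full Jacobian is never materialized; passing w as the upstream gradient yields the same product directly.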

gradient_f_x(x, **kwargs)

Returns the gradient of the function at point x.

Parameters: x (CArray) – the point
Raises: NotImplementedError – if the model does not support gradients
Returns: CArray – the gradient computed on x

load_pretrained_model(path: Optional[str] = None)

Load pretrained model weights

Parameters: path (str, optional, default None) – the path of the model; if None, the internal default model is loaded

secml_malware.models.c_classifier_sorel_net module

class secml_malware.models.c_classifier_sorel_net.CClassifierSorel(model_path, use_malware=True, use_counts=True, use_tags=True, n_tags=11, feature_dimension=2381, layer_sizes=None)

Bases: secml.ml.classifiers.c_classifier.CClassifier

extract_features(x: secml.array.c_array.CArray)

load_model(model_path)

class secml_malware.models.c_classifier_sorel_net.SorelNet(use_malware=True, use_counts=True, use_tags=True, n_tags=None, feature_dimension=2381, layer_sizes=None)

Bases: torch.nn.modules.module.Module

This is a simple network loosely based on the one used in ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation (https://arxiv.org/abs/1903.05700). Note that it uses fewer (and smaller) layers, as well as a single layer for all tag predictions, so performance will suffer accordingly.

forward(data)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

secml_malware.models.malconv module

Malware Detection by Eating a Whole EXE. Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas. https://arxiv.org/abs/1710.09435

class secml_malware.models.malconv.MalConv(pretrained_path=None, embedding_size=8, max_input_size=1048576)

Bases: secml_malware.models.basee2e.End2EndModel

Architecture implementation.

embed(input_x, transpose=True)

Embeds an input vector into the MalConv embedded representation.

embedd_and_forward(x)

Computes the embedding for sample x and returns the prediction.

Parameters: x (torch.Tensor) – the sample as torch tensor
Returns: the result of the forward pass
Return type: torch.Tensor

training: bool
