secml_malware.models package

Subpackages

Submodules

secml_malware.models.basee2e module

class secml_malware.models.basee2e.End2EndModel(embedding_size: int, max_input_size: int, embedding_value: int, shift_values: bool)

Bases: torch.nn.modules.module.Module

abstract classmethod bytes_to_numpy(bytez: bytes, max_length: int, padding_value: int, shift_values: bool) numpy.ndarray

It creates a numpy array from raw bytes. The vector is max_length long: shorter samples are padded and longer ones are cropped.

Parameters
  • bytez (bytes) – byte string containing the sample

  • max_length (int) – the max input size of the network

  • padding_value (int) – the value used as padding

  • shift_values (bool) – True if values are shifted by one

Returns

the sample as numpy array, cropped or padded

Return type

numpy array
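The crop, shift, and pad behaviour described above can be illustrated with a minimal, dependency-free sketch. The function name bytes_to_vector is hypothetical, plain lists stand in for the numpy array that the real method returns, and the sketch assumes that shifting means adding one to every byte value. Concrete subclasses may implement this differently.

```python
def bytes_to_vector(bytez, max_length, padding_value, shift_values):
    """Hypothetical stand-in for bytes_to_numpy that returns a plain list."""
    # Crop to at most max_length bytes; iterating over bytes yields ints in 0..255.
    values = list(bytez[:max_length])
    # Assumption: shifting adds one to each value, so 0..255 becomes 1..256.
    if shift_values:
        values = [v + 1 for v in values]
    # Pad on the right until the vector is exactly max_length long.
    values.extend([padding_value] * (max_length - len(values)))
    return values


# A short sample is padded, a long one is cropped.
padded = bytes_to_vector(b"MZ\x90", max_length=6, padding_value=256, shift_values=False)
cropped = bytes_to_vector(b"MZ\x90\x00\x03", max_length=4, padding_value=0, shift_values=True)
```

With shifting enabled, the value 0 no longer occurs among the real bytes, which is one plausible reason to pair shift_values=True with padding_value=0.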

compute_embedding_gradient(numpy_x: numpy.ndarray) torch.Tensor

It computes the gradient w.r.t. the embedding layer.

Parameters

numpy_x (numpy array) – the numpy array containing a sample

Returns

the gradient w.r.t. the embedding layer

Return type

torch.Tensor

abstract embed(input_x, transpose=True)

It embeds an input vector into the MalConv embedded representation.

abstract embedd_and_forward(x: torch.Tensor) torch.Tensor

Computes the embedding for sample x and returns the prediction.

Parameters

x (torch.Tensor) – the sample as torch tensor

Returns

the result of the forward pass

Return type

torch.Tensor

forward(x: torch.Tensor) torch.Tensor

Forward pass.

Parameters

x (torch.Tensor) – the sample to test

Returns

the result of the forward pass

Return type

torch.Tensor

abstract classmethod list_to_numpy(x, max_length, padding_value, shift_values)

It creates a numpy array from a list of byte values. The vector is max_length long.

abstract classmethod load_sample_from_file(path, max_length, has_shape, padding_value, shift_values)

It creates a numpy array containing a sample. The vector is max_length long. If has_shape is True, the path is expected to be wrapped in a (1,1) matrix, so item() is called to extract it.

load_simplified_model(path: str)

Load the model weights.

Parameters

path (str) – the path to the model

classmethod path_to_exe_vector(path: str, max_length: int, padding_value: int, shift_values: bool) numpy.ndarray

Creates a numpy array from the bytes contained in a file.

Parameters
  • path (str) – the path of the file

  • max_length (int) – the max input size of the network

  • padding_value (int) – the value used as padding

  • shift_values (bool) – True if values are shifted by one

Returns

the sample as numpy array, cropped or padded

Return type

numpy array
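As a rough illustration of the file-to-vector step, the sketch below reads a file and produces a fixed-length vector. It is not the library's implementation: path_to_vector is a hypothetical name, a plain list stands in for the numpy array, and shifting is assumed to add one to each byte value.

```python
import os
import tempfile


def path_to_vector(path, max_length, padding_value, shift_values):
    """Hypothetical stand-in for path_to_exe_vector that returns a plain list."""
    with open(path, "rb") as handle:
        # Reading at most max_length bytes has the same effect as cropping.
        raw = handle.read(max_length)
    offset = 1 if shift_values else 0  # assumption: shifting adds one
    values = [b + offset for b in raw]
    # Pad on the right up to max_length.
    return values + [padding_value] * (max_length - len(values))


# Write a tiny fake sample to disk, convert it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"MZ\x00\xff")
    sample_path = tmp.name
try:
    vector = path_to_vector(sample_path, max_length=6, padding_value=0, shift_values=True)
finally:
    os.remove(sample_path)
```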

training: bool

secml_malware.models.c_classifier_ember module

class secml_malware.models.c_classifier_ember.CClassifierEmber(tree_path: Optional[str] = None)

Bases: secml.ml.classifiers.c_classifier.CClassifier

The wrapper for the EMBER GBDT, by Anderson et al. https://arxiv.org/abs/1804.04637

extract_features(x: secml.array.c_array.CArray) secml.array.c_array.CArray

Extract EMBER features

Parameters

x (CArray) – program sample

Returns

EMBER features

Return type

CArray

secml_malware.models.c_classifier_end2end_malware module

class secml_malware.models.c_classifier_end2end_malware.CClassifierEnd2EndMalware(model: secml_malware.models.basee2e.End2EndModel, epochs=100, batch_size=256, train_transform=None, preprocess=None, softmax_outputs=False, random_state=None, plus_version=False, input_shape=(1, 1048576), verbose=0)

Bases: secml.ml.classifiers.pytorch.c_classifier_pytorch.CClassifierPyTorch

compute_embedding_gradient(x: secml.array.c_array.CArray, penalty_term: torch.Tensor)

Compute the gradient w.r.t. embedding layer.

Parameters
  • x (CArray) – point where gradient will be computed

  • penalty_term (torch.Tensor) – the penalty term

Returns

the gradient w.r.t. the embedding

Return type

CArray

embed(x: secml.array.c_array.CArray, transpose: bool = True)

Embed the sample inside the embedding space

Parameters
  • x (CArray) – the sample to embed

  • transpose (bool, optional, default True) – set True to return the transposed feature space vector

Returns

the embedded vector

Return type

torch.Tensor

embedding_predict(x)

Embed the sample and produce prediction.

Parameters

x (CArray) – the input sample

Returns

the malware score

Return type

float

get_embedding_size()

Get the embedding space dimensionality

Returns

the dimensionality of the embedding space

Return type

int

get_embedding_value()

Get the value used as padding

Returns

a value that is used for padding the sample

Return type

int

get_input_max_length()

Get the input window length

Returns

the window input length

Return type

int

get_is_shifting_values()

Check whether the model shifts the values by one

Returns

True if the values are shifted by one

Return type

bool

gradient(x, w=None)

Compute gradient at x by doing a forward and a backward pass.

The gradient is pre-multiplied by w.
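One way to read "pre-multiplied by w" is as a vector-Jacobian product: each row of the Jacobian (one row per output class) is scaled by the matching entry of w and the rows are summed. The sketch below illustrates that reading with plain lists. The helper premultiply and the example numbers are hypothetical, and the interpretation itself is an assumption rather than a statement about the class internals.

```python
def premultiply(w, jacobian):
    """Return w^T J, where J has one row per output and one column per feature."""
    n_features = len(jacobian[0])
    return [
        sum(w[k] * jacobian[k][j] for k in range(len(w)))
        for j in range(n_features)
    ]


# Hypothetical Jacobian of a two-class score vector w.r.t. three input features.
jacobian = [
    [0.5, -1.0, 2.0],    # d(score of class 0) / dx
    [-0.5, 1.0, -2.0],   # d(score of class 1) / dx
]
# A one-hot w selects the gradient of a single class score.
w = [0.0, 1.0]
weighted = premultiply(w, jacobian)
```

Under this reading, passing a one-hot w returns the gradient of one class score, while other choices of w return a weighted combination of the per-class gradients.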

gradient_f_x(x, **kwargs)

Returns the gradient of the function on point x.

Parameters

x (CArray) – the point

Raises

NotImplementedError – the model does not support gradients

Returns

CArray – the gradient computed on x

load_pretrained_model(path: Optional[str] = None)

Load pretrained model weights

Parameters

path (str, optional, default None) – the path of the model; if None, the internal default model is loaded

secml_malware.models.c_classifier_sorel_net module

class secml_malware.models.c_classifier_sorel_net.CClassifierSorel(model_path, use_malware=True, use_counts=True, use_tags=True, n_tags=11, feature_dimension=2381, layer_sizes=None)

Bases: secml.ml.classifiers.c_classifier.CClassifier

extract_features(x: secml.array.c_array.CArray)

load_model(model_path)

class secml_malware.models.c_classifier_sorel_net.SorelNet(use_malware=True, use_counts=True, use_tags=True, n_tags=None, feature_dimension=2381, layer_sizes=None)

Bases: torch.nn.modules.module.Module

This is a simple network loosely based on the one used in ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation (https://arxiv.org/abs/1903.05700). Note that it uses fewer (and smaller) layers, as well as a single layer for all tag predictions; performance will suffer accordingly.

forward(data)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

secml_malware.models.malconv module

Malware Detection by Eating a Whole EXE. Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas. https://arxiv.org/abs/1710.09435

class secml_malware.models.malconv.MalConv(pretrained_path=None, embedding_size=8, max_input_size=1048576)

Bases: secml_malware.models.basee2e.End2EndModel

Architecture implementation.

embed(input_x, transpose=True)

It embeds an input vector into the MalConv embedded representation.

embedd_and_forward(x)

Computes the embedding for sample x and returns the prediction.

Parameters

x (torch.Tensor) – the sample as torch tensor

Returns

the result of the forward pass

Return type

torch.Tensor

training: bool

Module contents