**D4PG**, or **Distributed Distributional DDPG**, is a policy gradient algorithm that extends [DDPG](https://paperswithcode.com/method/ddpg). Its main improvements are distributional updates to the DDPG algorithm, combined with multiple distributed workers that all write into the same replay table. Among the other, simpler changes, the use of $N$-step returns gave the biggest performance gain. The authors found that [prioritized experience replay](https://paperswithcode.com/method/prioritized-experience-replay) was less crucial to the overall D4PG algorithm, especially on harder problems.
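
The $N$-step return mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and arguments are hypothetical, and the second helper assumes a categorical return distribution whose atoms are shifted by the $N$-step target before any projection back onto a fixed support.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * bootstrap_value.

    `rewards` holds the N rewards collected along the trajectory and
    `bootstrap_value` is the value estimate at the state reached after N steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def shift_atoms(atoms, rewards, gamma):
    """Apply the same N-step backup to every atom of a return distribution.

    Assumption: the distribution is categorical with support `atoms`. The
    projection back onto a fixed support is deliberately left out here.
    """
    return [n_step_return(rewards, z, gamma) for z in atoms]


if __name__ == "__main__":
    print(n_step_return([1.0, 1.0, 1.0], 10.0, 0.5))
    print(shift_atoms([0.0, 10.0], [1.0, 1.0, 1.0], 0.5))
```

With $N = 1$ this reduces to the usual one-step target $r + \gamma V$, which is why a longer $N$ can be treated as a drop-in change to the target computation.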

**MEUZZ** is a machine learning-based hybrid fuzzer which employs supervised machine learning for adaptive and generalizable seed scheduling -- a prominent factor in determining the yields of hybrid fuzzing. MEUZZ determines which new seeds are expected to produce better fuzzing yields based on the knowledge learned from past seed scheduling decisions made on the same or similar programs. MEUZZ's learning is based on a series of features extracted via code reachability and dynamic analysis, which incurs negligible runtime overhead (in microseconds). Moreover, MEUZZ automatically infers the data labels by evaluating the fuzzing performance of each selected seed.
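
To make the scheduling idea concrete, here is a small sketch that ranks seeds by a predicted yield and updates the predictor from an observed label. Everything in it is an assumption made for illustration: the linear model, the feature vectors, the learning rate, and the function names are hypothetical and are not taken from MEUZZ.

```python
def predict_utility(weights, features):
    """Score a seed as a weighted sum of its features (a hypothetical linear model)."""
    return sum(w * x for w, x in zip(weights, features))


def schedule(seeds, weights):
    """Order seed names from highest to lowest predicted utility.

    `seeds` maps a seed name to its feature vector.
    """
    return sorted(
        seeds,
        key=lambda name: predict_utility(weights, seeds[name]),
        reverse=True,
    )


def update(weights, features, label, learning_rate=0.1):
    """One gradient step on squared error, using the observed yield as the label."""
    error = label - predict_utility(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    seeds = {"a": [1.0, 0.0], "b": [0.0, 4.0], "c": [0.0, 0.0]}
    print(schedule(seeds, [1.0, 0.5]))
```

The `update` step stands in for the automatically inferred labels described above: after a seed has been fuzzed, its measured yield becomes the training target for the next scheduling round.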

MEUZZ

MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing

D4PG

Distributed Distributional Deterministic Policy Gradients

**Graph Self-Attention (GSA)** is a self-attention module used in the [BP-Transformer](https://paperswithcode.com/method/bp-transformer) architecture, and is based on the [graph attentional layer](https://paperswithcode.com/method/graph-attentional-layer).

For a given node $u$, we update its representation according to its neighbour nodes, formulated as $\mathbf{h}\_{u} \leftarrow \text{GSA}\left(\mathcal{G}, \mathbf{h}\_{u}\right)$.

Let $\mathcal{A}\left(u\right)$ denote the set of neighbour nodes of $u$ in $\mathcal{G}$. Then $\text{GSA}\left(\mathcal{G}, \mathbf{h}\_{u}\right)$ is computed as follows:

$$ \mathbf{A}^{u} = \text{concat}\left(\{\mathbf{h}\_{v} \mid v \in \mathcal{A}\left(u\right)\}\right) $$

$$ \mathbf{Q}^{u}\_{i} = \mathbf{h}\_{u}\mathbf{W}^{Q}\_{i}, \quad \mathbf{K}^{u}\_{i} = \mathbf{A}^{u}\mathbf{W}^{K}\_{i}, \quad \mathbf{V}^{u}\_{i} = \mathbf{A}^{u}\mathbf{W}^{V}\_{i} $$

$$ \text{head}^{u}\_{i} = \text{softmax}\left(\frac{\mathbf{Q}^{u}\_{i}\left(\mathbf{K}^{u}\_{i}\right)^{\top}}{\sqrt{d}}\right)\mathbf{V}^{u}\_{i} $$

$$ \text{GSA}\left(\mathcal{G}, \mathbf{h}\_{u}\right) = \left[\text{head}^{u}\_{1}, \dots, \text{head}^{u}\_{h}\right]\mathbf{W}^{O} $$

where $d$ is the dimension of $\mathbf{h}$, $h$ is the number of attention heads, $\mathbf{W}^{Q}\_{i}$, $\mathbf{W}^{K}\_{i}$ and $\mathbf{W}^{V}\_{i}$ are trainable parameters of the $i$-th attention head, and $\mathbf{W}^{O}$ is the output projection.
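
The computation above can be sketched in NumPy for a single node. This is an illustrative sketch under stated assumptions rather than the BP-Transformer implementation: the function names, the array shapes, and the choice to compute the query from node $u$'s own representation are assumptions, and the scaling uses $\sqrt{d}$ with $d$ the node dimension, as in the text.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def graph_self_attention(h_u, neighbours, W_q, W_k, W_v, W_o):
    """Update one node from its neighbours with multi-head attention.

    Assumed shapes (hypothetical, chosen for this sketch):
      h_u:            (d,)             representation of node u
      neighbours:     (n, d)           stacked representations of the nodes in A(u)
      W_q, W_k, W_v:  (heads, d, d_head)
      W_o:            (heads * d_head, d)
    """
    d = h_u.shape[-1]
    A = np.asarray(neighbours)
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        q = h_u @ Wq                       # (d_head,)   query from node u
        K = A @ Wk                         # (n, d_head) keys from neighbours
        V = A @ Wv                         # (n, d_head) values from neighbours
        weights = softmax(q @ K.T / np.sqrt(d))  # (n,) attention over neighbours
        heads.append(weights @ V)          # (d_head,)
    return np.concatenate(heads) @ W_o     # (d,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, n_heads, d_head = 4, 3, 2, 2
    out = graph_self_attention(
        rng.normal(size=d),
        rng.normal(size=(n, d)),
        rng.normal(size=(n_heads, d, d_head)),
        rng.normal(size=(n_heads, d, d_head)),
        rng.normal(size=(n_heads, d, d_head)),
        rng.normal(size=(n_heads * d_head, d)),
    )
    print(out.shape)
```

Because the softmax runs only over the rows of the neighbour matrix, the cost per node scales with the size of $\mathcal{A}(u)$ rather than with the full sequence length.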

Source	Distributed Distributional Deterministic Policy Gradients
Year	2018
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Distributed Distributional DDPG?
