Pervformer _top_ Access

    import torch
    import torch.nn as nn


    class PervasiveAttention(nn.Module):
        def __init__(self, dim, num_probes=64):
            super().__init__()
            self.num_probes = num_probes
            # Learnable latent probes (global memory), shape (1, num_probes, dim)
            self.probes = nn.Parameter(torch.randn(1, num_probes, dim))
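The constructor above defines the probes but not how they interact with the input tokens. One plausible reading, offered here only as a sketch and not as PervFormer's actual forward pass, is a two-step read/write pattern: the probes first attend over all video tokens, then the tokens attend back over the probe summaries. The class name `ProbeAttentionSketch`, the `num_heads` parameter, the `read`/`write` layers, and the residual plus `LayerNorm` are all assumptions introduced for illustration.

```python
import torch
import torch.nn as nn


class ProbeAttentionSketch(nn.Module):
    """Hypothetical read/write block around learnable probes (illustrative only)."""

    def __init__(self, dim, num_probes=64, num_heads=4):
        super().__init__()
        # Same parameter shape as in the original snippet: (1, num_probes, dim)
        self.probes = nn.Parameter(torch.randn(1, num_probes, dim))
        # Assumed design: probes read from tokens, then tokens read from probes
        self.read = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, tokens, dim), e.g. flattened patches from all frames
        probes = self.probes.expand(x.size(0), -1, -1)
        # Read: each probe attends over every token -> (batch, num_probes, dim)
        memory, _ = self.read(probes, x, x)
        # Write: each token attends over the probe summaries -> (batch, tokens, dim)
        update, _ = self.write(x, memory, memory)
        # Residual connection keeps the per-token features
        return self.norm(x + update)


block = ProbeAttentionSketch(dim=32, num_probes=8)
video_tokens = torch.randn(2, 100, 32)  # 2 clips, 100 tokens each
out = block(video_tokens)
print(out.shape)  # torch.Size([2, 100, 32])
```

In this sketch both attention-score matrices have shape tokens × probes rather than tokens × tokens, so their size grows linearly with the number of tokens for a fixed `num_probes`.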

| Model | Something-Something V2 (Accuracy) | Kinetics-700 (FLOPs) | GPU Memory (128 frames) |
| :--- | :--- | :--- | :--- |
| TimeSformer | 62.5% | 1.9k G | 42 GB |
| VideoMAE | 70.8% | 2.1k G | OOM (>80 GB) |
| PervFormer | 74.2% | 980 G | 23 GB |

For automatic rotoscoping (cutting a person out of a video), previous models flickered when the person overlapped a similarly colored background. PervFormer's pervasive attention keeps track of the person's identity across time, resulting in temporally stable masks.

How to Implement (PyTorch Pseudo-Code)

The core of PervFormer is surprisingly simple to integrate. Here is a minimal snippet showing the Pervasive Attention block:

import torch import torch