What is: Audiovisual SlowFast Network?
Source | Audiovisual SlowFast Networks for Video Recognition |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Audiovisual SlowFast Network, or AVSlowFast, is an architecture for integrated audiovisual perception. It integrates Slow and Fast visual pathways with a Faster Audio pathway to model vision and sound in a unified representation. Audio and visual features are fused at multiple layers, so audio can contribute to the formation of hierarchical audiovisual concepts. The audio and visual modalities have different learning dynamics, which makes joint training difficult. To overcome this, AVSlowFast uses DropPathway, a regularization technique that randomly drops the Audio pathway during training. Inspired by prior studies in neuroscience, the model also performs hierarchical audiovisual synchronization to learn joint audiovisual features.
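The multi-layer fusion and DropPathway ideas can be illustrated with a toy sketch in plain Python. Several details are assumptions for illustration, not necessarily what the AVSlowFast paper does. Each feature map is a list of time steps, each holding a list of channel values. The audio pathway runs at a finer temporal resolution, so its features are average-pooled down to the visual time axis and concatenated channel-wise. When the audio pathway is dropped, its channels are zero-filled. A single drop decision is shared by every fusion stage within one training step. The function names (`temporal_avg_pool`, `fuse_audio_into_visual`, `multi_stage_forward`) and the parameter `p_drop` are hypothetical.

```python
import random
from typing import List

# Toy stand-in for a feature map: [time][channel]
Features = List[List[float]]


def temporal_avg_pool(x: Features, factor: int) -> Features:
    """Average consecutive groups of `factor` time steps."""
    if factor <= 0 or len(x) % factor != 0:
        raise ValueError("time length must be a positive multiple of factor")
    pooled = []
    for start in range(0, len(x), factor):
        window = x[start:start + factor]
        # zip(*window) yields one tuple per channel across the window
        pooled.append([sum(col) / factor for col in zip(*window)])
    return pooled


def fuse_audio_into_visual(visual: Features, audio: Features) -> Features:
    """Hypothetical lateral fusion: align audio to the visual time axis, then concatenate channels."""
    if len(audio) < len(visual) or len(audio) % len(visual) != 0:
        raise ValueError("audio length must be a positive multiple of visual length")
    aligned = temporal_avg_pool(audio, len(audio) // len(visual))
    return [v + a for v, a in zip(visual, aligned)]


def multi_stage_forward(
    visual_stages: List[Features],
    audio_stages: List[Features],
    p_drop: float,
    training: bool,
    rng: random.Random,
) -> List[Features]:
    """Fuse audio into visual features at every stage, with DropPathway during training."""
    # One drop decision per training step, shared by all fusion stages (assumption)
    drop_audio = training and rng.random() < p_drop
    fused = []
    for visual, audio in zip(visual_stages, audio_stages):
        if drop_audio:
            # Audio pathway dropped: keep channel count fixed by zero-filling audio channels (assumption)
            fused.append([v + [0.0] * len(audio[0]) for v in visual])
        else:
            fused.append(fuse_audio_into_visual(visual, audio))
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    visual_stages = [[[1.0], [2.0]]]                # 2 visual time steps, 1 channel
    audio_stages = [[[0.0], [2.0], [4.0], [6.0]]]   # 4 audio time steps, 1 channel
    # At inference (training=False) the audio pathway is always used
    print(multi_stage_forward(visual_stages, audio_stages, p_drop=0.5, training=False, rng=rng))
    # [[[1.0, 1.0], [2.0, 5.0]]]
```

Sharing one drop decision across stages removes audio from the whole network for that step, which mirrors the description of dropping the Audio pathway rather than individual features. Unlike standard dropout, this sketch does not rescale the surviving features.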