Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications.

Senocak, Arda; Oh, Tae-Hyun; Kim, Junsik; Yang, Ming-Hsuan; Kweon, In So
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

Abstract

Visual events are usually accompanied by sounds in our daily lives. However, can machines learn to correlate the visual scene with sound, and localize the sound source, merely by observing them as humans do? To investigate its empirical learnability, in this work we first present a novel unsupervised algorithm for localizing sound sources in visual scenes. To this end, we develop a two-stream network, with one stream per modality and an attention mechanism, for sound source localization. The network naturally reveals the localized response in the scene without human annotation. In addition, a new sound source dataset is developed for performance evaluation. Nevertheless, our empirical evaluation shows that the unsupervised method draws false conclusions in some cases. We show that these false conclusions cannot be fixed without human prior knowledge, owing to the well-known mismatch between correlation and causality. We also show that they can be effectively corrected with even a small amount of supervision, i.e., in a semi-supervised setup. Finally, we demonstrate the versatility of the learned audio and visual embeddings on cross-modal content alignment, and we incorporate the proposed algorithm into sound-saliency-based automatic camera view panning in 360-degree videos.
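
The abstract does not spell out how the attention mechanism turns the two streams into a localized response. As a rough illustration only, the sketch below assumes one common formulation: cosine similarity between a single sound embedding and the visual feature vector at each spatial location, followed by a softmax over locations. The function names, the toy features, and the similarity and softmax choices are all assumptions made for this example, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def attention_map(visual_features, sound_embedding):
    """Return a softmax attention map over spatial locations.

    visual_features: grid (rows x cols) of feature vectors, one per location.
    sound_embedding: a single feature vector of the same dimension.
    Both inputs are hypothetical stand-ins for the outputs of the two streams.
    """
    sound = l2_normalize(sound_embedding)
    # Cosine similarity between the sound vector and each location's features.
    scores = [
        [sum(a * b for a, b in zip(l2_normalize(cell), sound)) for cell in row]
        for row in visual_features
    ]
    # Softmax over all locations so the weights sum to 1.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

# Toy 2x2 "image" with 2-D features; only the top-right cell aligns with the sound.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [-1.0, 0.0]]]
sound = [0.0, 2.0]
weights = attention_map(features, sound)
```

In this toy case the largest weight falls on the top-right location, the only cell whose feature vector points in the same direction as the sound embedding. In a trained network the features would come from learned visual and audio streams rather than hand-written vectors.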

Citation

ID: 66015
Ref Key: senocak2019learningieee

DOI: 10.1109/TPAMI.2019.2952095