A Study of Features and Deep Neural Network Architectures and Hyper-Parameters for Domestic Audio Classification

Abigail Copiaco; Christian Ritz; Nidhal Abdulaziz; Stefano Fasciani


Applied Sciences, 2021, Vol. 11, pp. 4880-


copiaco2021applieda

Abstract

Recent methodologies for audio classification frequently involve cepstral and spectral features, applied to single-channel recordings of acoustic scenes and events. Further, transfer learning has been widely used over the years and has proven to be an efficient alternative to training neural networks from scratch. The lower time and resource requirements of pre-trained models allow for more versatility in developing classification systems. However, information on classification performance when using different features for multi-channel recordings is often limited. Furthermore, pre-trained networks are initially trained on larger databases and are often unnecessarily large. This poses a challenge when developing systems for devices with limited computational resources, such as mobile or embedded devices. This paper presents a detailed study of the most prominent and widely used cepstral and spectral features for multi-channel audio applications. Accordingly, we propose the use of spectro-temporal features. Additionally, the paper details the development of a compact version of the AlexNet model for computationally limited platforms through studies of performance under various architectural and parameter modifications of the original network. The aim is to minimize the network size while maintaining the series network architecture and preserving the classification accuracy. Considering that other state-of-the-art compact networks present complex directed acyclic graphs, a series architecture offers an advantage in customizability. Experimentation was carried out in MATLAB, using a database that we generated for this task, which consists of four-channel synthetic recordings of both sound events and scenes. The top-performing methodology resulted in a weighted F1-score of 87.92% for scalogram features classified via the modified AlexNet-33 network, which has a size of 14.33 MB. The AlexNet network returned 86.24% at a size of 222.71 MB.
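To make the reported metric and size comparison concrete, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes the common support-weighted definition of the F1-score (per-class F1 averaged with weights proportional to each class's number of true samples); the abstract does not spell out the formula. The function name `weighted_f1` and the toy labels are hypothetical and chosen only for illustration. The two model sizes are the figures quoted in the abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels invented for illustration; they are not from the paper's database.
y_true = ["speech", "speech", "alarm", "alarm", "alarm", "silence"]
y_pred = ["speech", "alarm", "alarm", "alarm", "silence", "silence"]
toy_score = weighted_f1(y_true, y_pred)

# Model sizes in MB, as quoted in the abstract.
alexnet_mb = 222.71
compact_mb = 14.33
size_ratio = alexnet_mb / compact_mb  # how many times smaller the compact model is

print(f"toy weighted F1: {toy_score:.4f}")
print(f"size ratio: {size_ratio:.2f}x")
```

On the quoted sizes, the ratio works out to roughly 15.5, so the modified network is reported as about fifteen times smaller than AlexNet while scoring slightly higher on the weighted F1-score.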

Keywords

transfer learning; neural network; scalograms; MFCC; log-mel; pre-trained models

Access

DOI: 10.3390/app11114880

URL: https://www.mdpi.com/2076-3417/11/11/4880

Citation

ID: 269503

Ref Key: copiaco2021applieda


