
- IEEE Member: US $11.00
- Society Member: US $0.00
- IEEE Student Member: US $11.00
- Non-IEEE Member: US $15.00
An Improved Solution To The Frequency-Invariant Beamforming With Concentric Circular Microphone Arrays
Frequency-invariant beamforming with circular microphone arrays (CMAs) has drawn a significant amount of attention for its steering flexibility and high directivity. However, frequency-invariant beamforming with CMAs often suffers from the so-called null
Bridging Mixture Density Networks With Meta-Learning For Automatic Speaker Identification
Speaker identification answers the fundamental question "Who is speaking?" The identification technology enables downstream applications to provide a personalized experience. Both the prevalent i-vector based solutions and deep learning solutions usually
Age Of Information With Finite Horizon And Partial Updates
A resource-constrained system monitors a source of information by requesting a finite number of updates subject to random transmission delays. An a priori fixed update request policy is shown to minimize a polynomial penalty function of the age of informa
Complexity Reduction Methods For Index Modulation Based Dual-Function Radar Communication Systems
Dual-function radar communication (DFRC) systems implement both sensing and communication using the same hardware. An emerging DFRC strategy embeds transmission of digital messages into agility-based radar schemes in the form of index modulation (IM). Thi
Deep Monocular Video Depth Estimation Using Temporal Attention
Monocular video depth estimation (MVDE) plays a crucial role in 3D computer vision. In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. Our network starts with a motion compensation module where the
Fully Learnable Front-End For Multi-Channel Acoustic Modeling Using Semi-Supervised Learning
In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a sim
T-GSA: Transformer With Gaussian-Weighted Self-Attention For Speech Enhancement
Transformer neural networks (TNNs) demonstrated state-of-the-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs), such as LSTMs or GRUs. However, TNNs did not perform well in speech enhancement, whose con
Adaptation And Learning In Multi-Task Decision Systems
Adaptation and learning over multi-agent networks is a topic of great relevance with important implications. Elaborating on previous works on single-task networks engaged in decision problems, here we consider the multi-task version in the challenging sce
Colour Compression Of Plenoptic Point Clouds Using RAHT-KLT With Prior Colour Clustering And Specular/Diffuse Component Separation
The recently introduced plenoptic point cloud representation marries a 3D point cloud with a light field. Instead of each point being associated with a single colour value, there can be multiple values to represent the colour at that point as perceived fr
Self-Supervised Learning For ECG-Based Emotion Recognition
We present an electrocardiogram (ECG)-based emotion recognition system using self-supervised learning. Our proposed architecture consists of two main networks, a signal transformation recognition network and an emotion recognition network. First, unlabel
Deep Learning Abilities To Classify Intricate Variations In Temporal Dynamics Of Multivariate Time Series
The aim of this work is to investigate the ability of deep learning (DL) architectures to learn temporal dynamics in multivariate time series. The methodology consists in using well known synthetic stochastic processes for which changes in joint temporal
An Attention Enhanced Multi-Task Model For Objective Speech Assessment In Real-World Environments
Computational objective metrics that use reference signals have been shown to be effective forms of speech assessment in simulated environments, since they are correlated with subjective listening studies. Recent efforts have been dedicated towards effect
An Early Termination Scheme For Successive Cancellation List Decoding Of Polar Codes
In order to minimize the decoding period and the response time for Polar Codes, an early termination (ET) scheme based on additional check points (ACPs) is proposed in this work. For conventional ET schemes based on distributed parity-check (PC) bits, ET
CLCNet: Deep Learning-Based Noise Reduction For Hearing Aids Using Complex Linear Coding
Noise reduction is an important part of modern hearing aids and is included in most commercially available devices. Deep learning-based state-of-the-art algorithms, however, either do not consider real-time and frequency resolution constraints or result in
Translation Of A Higher Order Ambisonics Sound Scene Based On Parametric Decomposition
This paper presents a novel 3DoF+ system that allows users to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound capturing at a s
Leveraging GANs To Improve Continuous Path Keyboard Input Models
Continuous path keyboard input has higher inherent ambiguity than standard tapping, because the path trace may exhibit not only local overshoots/undershoots (as in tapping) but also, depending on the user, substantial mid-path excursions. Deploying a robu
Scalable Learning-Based Sampling Optimization For Compressive Dynamic MRI
Compressed sensing applied to magnetic resonance imaging (MRI) reduces the scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reco
Design Of A Convergence-Aware Based Expectation Propagation Algorithm For Uplink MIMO SCMA Systems
Sparse code multiple access (SCMA) uses multi-dimensional sparse codewords to transmit user data. The expectation propagation algorithm (EPA) exploiting the sparse property shows linear complexity growth and thus is preferred for multi-user detection. To
Enhancement Of Coded Speech Using A Mask-Based Post-Filter
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven postfilter relying on masking in the time-frequency doma
Diagonalizable Shift And Filters For Directed Graphs Based On The Jordan-Chevalley Decomposition
Graph signal processing on directed graphs poses theoretical challenges since an eigendecomposition of filters is in general not available. Instead, Fourier analysis requires a Jordan decomposition and the frequency response is given by the Jordan normal
A-CRNN: A Domain Adaptation Model For Sound Event Detection
This paper presents a domain adaptation model for sound event detection. A common challenge for sound event detection is how to deal with the mismatch among different datasets. Typically, the performance of a model will decrease if it is tested on a datas
Efficient Multichannel Nonlinear Acoustic Echo Cancellation Based On A Cooperative Strategy
While a common approach to address nonlinear distortions, emitted by multiple loudspeakers and observed by multiple microphones, is to use post-filtering techniques, this paper proposes a cooperative strategy to rather model and then cancel such distortio
Using Speech Synthesis To Train End-To-End Spoken Language Understanding Models
End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio, without employing the standard pipeline composed of a separately trained speech recognize
Efficient Deep Learning-Based Lossy Image Compression Via Asymmetric Autoencoder And Pruning
Recently, deep learning-based lossy image compression methods have been proposed. However, their efficiency in terms of storage and computational costs has not been addressed adequately. In this paper, we propose efficient lossy image compression methods
Time-Scale Synthesis For Locally Stationary Signals
We develop a time-scale synthesis-based probabilistic approach for the modeling of locally stationary signals. Inspired by our previous work, the model involves zero-mean, complex Gaussian wavelet coefficients, whose distribution varies as a function of t
Accurate Localization Of AUV In Motion By Explicit Solution Using Time Delays
Accurate localization of an autonomous underwater vehicle (AUV) is essential in many applications. The motion of an AUV during the measurement acquisition period can be significant and the localization performance can suffer considerably if it is neglecte
Algorithmic Exploration Of American English Dialects
In this paper, we use a novel algorithmic approach to explore dialectal variation in American English speech. Without the need for human annotations, we are able to use a corpus transcribed in text form only. Our results show that, in general, American En
Improving Deep Learning Classification Of JPEG2000 Images Over Bandlimited Networks
JPEG2000 (j2k) is a highly popular format for image and video compression. It plays a major role in the rapidly growing applications of cloud based image classification. Considering limited network bandwidth, we propose an end-to-end deep learning framewo
A Multi-View Approach For Mandarin Non-Native Mispronunciation Verification
Traditionally, the performance of non-native mispronunciation verification systems relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach is proposed to incorporate discriminative feature representations whic
K-Autoencoders Deep Clustering
In this study we propose a deep clustering algorithm that extends the k-means algorithm. Each cluster is represented by an autoencoder instead of a single centroid vector. Each data point is associated with the autoencoder which yields the minimal reconst
Adversarial Multi-Task Learning For Speaker Normalization In Replay Detection
Spoofing detection algorithms in voice biometrics are adversely affected by differences in the speech characteristics of the various target users. In this paper, we propose a novel speaker normalisation technique that employs adversarial multi-task learni
The Matched Reassigned Cross-Spectrogram For Phase Estimation
In this paper, the matched reassigned spectrogram is expanded into a novel matched phase reassignment (MPR) method based on the reassigned cross-spectrogram. It is shown that for two phase synchronized oscillating transient signals, the method gives perfe
Interpretable Machine Learning In Sustainable Edge Computing: A Case Study Of Short-Term Photovoltaic Power Output Prediction
With the Internet of Things continuously penetrating into all spheres of our daily life, the increasing use of smart devices enabled the emergence of the edge computing paradigm. To meet the needs of saving energy and reducing electricity bills for each h
Sparse Convolutional Beamforming For Wireless Ultrasound
Wireless ultrasound systems can make the imaging process much more efficient, affordable and accessible for users. The standard technique to create B-mode images is to rely on delay and sum (DAS) beamforming, in which the signals at each transducer elemen
Divergence-Based Adaptive Extreme Video Completion
Extreme image or video completion, where, for instance, we only retain 1% of pixels in random locations, allows for very cheap sampling in terms of the required pre-processing. The consequence is, however, a reconstruction that is challenging for humans a
Fast Acoustic Scattering Using Convolutional Neural Networks
Diffracted scattering and occlusion are important acoustic effects in interactive auralization and noise control applications, typically requiring expensive numerical simulation. We propose training a convolutional neural network to map from a convex scat
Adversarial Video Compression Guided By Soft Edge Detection
We propose a video compression framework using conditional Generative Adversarial Networks (GANs). We rely on two encoders: one that deploys a standard video codec and another one which generates low-level soft edge maps. For decoding, we use a standard v
Transformer VAE: A Hierarchical Model For Structure-Aware And Interpretable Music Representation Learning
Structure awareness and interpretability are two of the most desired properties of music generation algorithms. Structure-aware models generate more natural and coherent music with long-term dependencies, while interpretable models are more friendly for h
Geometrically Constrained Independent Vector Analysis For Directional Speech Enhancement
This paper addresses the multichannel directional speech enhancement problem with geometrically constrained independent vector analysis (GCIVA), where we aim to combine the high separation performance from blind source separation and the capability of dir
ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-To-End Text-To-Speech Toolkit
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit supports state-of-the-art E2E-TTS models, including Tacotron 2, Transformer TT
The Processing Of Mandarin Chinese Tonal Alternations In Contexts: An Eye-Tracking Study
This study investigated the perception of Mandarin tonal alternations in disyllabic words. In Mandarin, a low-dipping Tone3 is converted to a high-rising Tone2 when followed by another Tone3, known as third tone sandhi. Although previous studies showed st
Adversarial Attacks On GMM I-Vector Based Speaker Verification Systems
This work investigates the vulnerability of Gaussian Mixture Model (GMM) i-vector based speaker verification systems to adversarial attacks, and the transferability of adversarial samples crafted from GMM i-vector based systems to x-vector based systems.
Detecting Mismatch Between Text Script And Voice-Over Using Utterance Verification Based On Phoneme Recognition Ranking
The purpose of this study is to detect the mismatch between text script and voice-over. For this, we present a novel utterance verification (UV) method, which calculates the degree of correspondence between a voice-over and the phoneme sequence of a scrip

Regularized Beamformer For The Spherical Microphone Array To Cope With The White Noise Amplification
Spherical microphone arrays with compact aperture and maximum directivity factor have been a popular research field but are usually accompanied by the white noise amplification problem, which hinders their use in practical applications. This paper p
A Neural Network Based On First Principles
In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (Max-Ent) to find the posterior distribution o
AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement
The goal of audio-visual speech enhancement (AVSE) is to supplement audio-only information with visual information, such as the target speaker's lip movements, to improve the intelligibility and overall perceptual quality of noisy speech signals. We propose a
Unseen Face Presentation Attack Detection With Hypersphere Loss
Presentation attacks are one of the main threats to face verification systems and attract great attention from the research community. Recent methods achieve great success in intra-database tests. However, the problem is more complex in practical scenarios as the
L-Vector: Neural Label Embedding For Domain Adaptation
We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. With the NLE method, we distill the knowledge from a powerful source-doma
Attention-Guided Deraining Network Via Stage-Wise Learning
Due to diverse rain shapes, directions, densities as well as different distances to cameras, rain streaks in the air are interweaved and overlapped. However, most existing deraining methods are inherently oblivious to this phenomenon and tend to learn a sing