2024
-
December 2024
[Award] The paper entitled “Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings” received a Best Paper Honorable Mention at the IEEE Spoken Language Technology Workshop (SLT2024).
-
December 2024
The following 20 papers from the Media Information Laboratory have been accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2025):
https://group.ntt/en/topics/2025/03/31/icassp2025.html
・Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, "Core Aggregation via Quantized Distribution Fitting and Its Application to Predictor Learning"
・Binh Thien Nguyen, Daiki Takeuchi, Masahiro Yasuda, Daisuke Niizumi, Noboru Harada, "Negative and Balanced Sampling for Language-query Audio Source Separation"
・Stefan Bruhn, Tomas Toftgår, Stefan Döhla, Huan-yu Su, Lasse Laaksonen, Takehiro Moriya, Stéphane Ragot, Hiroyuki Ehara, Marek Szczerba, Imre Varga, Andrey Schevciw, Milan Jerinec, "3GPP IVAS Codec – Perspectives on Development, Testing and Standardization"
・Takehiro Moriya, Stephane Ragot, Arnaud Lefort, Alexandre Guerin, Noboru Harada, Ryosuke Sugiura, Yutaka Kamamoto, "EVS-Compatible Downmix in 3GPP IVAS"
・Masahiro Nakano, Hiroki Sakuma, Ryo Nishikimi, Kenji Komiya, Tomoharu Iwata, Kunio Kashino, "Hyperbolic PHATE: Visualizing Continuous Hierarchy of Latent Differentiation Structures"
・Nao Sato, Masahiro Yasuda, Shoichiro Saito, Noboru Harada, "Sound Source Distance Estimation Utilizing Physics-informed Prior for Sound Event Localization and Detection"
・Masahiro Yasuda, Shoichiro Saito, Nao Sato, Noboru Harada, "Spatial Annotation-free Training for Sound Event Localization and Detection"
・Junpei Honma, Akisato Kimura, Go Irie, "Multi-Task Learning for Ultrasonic Echo-based Depth Estimation with Audible Frequency Recovery"
・Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, "A Hybrid Probabilistic-Deterministic Model Recursively Enhancing Speech"
・Naohiro Tawara, Atsushi Ando, Shota Horiguchi, Marc Delcroix, "Multi-channel Speaker Counting for EEND-VC-based Speaker Diarization on Multi-domain Conversation"
・Takatomo Kano, Atsunori Ogawa, Marc Delcroix, William Chen, Ryo Fukuda, Kohei Matsuura, Takanori Ashihara, Shinji Watanabe, "Bridging Speech and Text Foundation Models with ReShape Attention"
・Ryo Fukuda, Takatomo Kano, Atsushi Ando, Atsunori Ogawa, "Whisper-ER: Speech Emotion Recognition Based on Large-Scale Automatic Speech Recognizer"
・Shoko Araki, Nobutaka Ito, Reinhold Haeb-Umbach, Gordon Wichern, Zhong-Qiu Wang, Yuki Mitsufuji, "30+ Years of Source Separation Research: Achievements and Future Challenges"
・Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Masato Mimura, "Alignment-Free Training for Transducer-based Multi-Talker ASR"
・Carlos Hernandez-Olivan, Marc Delcroix, Tsubasa Ochiai, Daisuke Niizumi, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki, "SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model"
・Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Atsushi Ando, Shota Horiguchi, Shoko Araki, "Mamba-based Segmentation Model for Speaker Diarization"
・Junyi Peng, Takanori Ashihara, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocky, "TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models"
・Shota Horiguchi, Takafumi Moriya, Atsushi Ando, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix, "Guided Speaker Embedding"
・Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, "Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance" (Journal Paper Presentation)
・Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino, "Masked Modeling Duo: Towards Universal Audio Pre-training Framework" (Journal Paper Presentation) -
November 2024
The following two papers have been accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2025):
・Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura, "Unsupervised Single-Image Intrinsic Image Decomposition With LiDAR Intensity Enhanced Training"
・Risako Tanigawa, Kenji Ishikawa, Noboru Harada, Yasuhiro Oikawa, "SoundSil-DS: Deep Denoising and Segmentation of Sound-Field Images with Silhouettes" -
November 2024
The paper entitled “Sound Field Reconstruction Using Optical Sound Measurement and Neural Fields” has been accepted to the IEEE International Workshop on Machine Learning for Signal Processing (MLSP2024).
-
November 2024
The paper entitled “Direct Moment Estimation of Intensity Distribution of Magnetic Fields with Quantum Sensing Network” has been accepted to New Journal of Physics.
https://iopscience.iop.org/article/10.1088/1367-2630/ad93f4 -
November 2024
The paper entitled “Wolstenholme Primes and Group Determinants of Cyclic Groups” has been accepted to Proceedings of the Japan Academy, Ser. A.
https://projecteuclid.org/journals/proceedings-of-the-japan-academy-series-a-mathematical-sciences/volume-100/issue-9/Wolstenholme-primes-and-group-determinants-of-cyclic-groups/10.3792/pjaa.100.011.full?tab=ArticleLink -
November 2024
[Award] The paper entitled “Cross-Action Cross-Subject Skeleton Action Recognition Via Simultaneous Action-Subject Learning with Two-Step Feature Removal” won the Best Paper Award 1st Runner-up at the IEEE International Conference on Image Processing (ICIP2024).
https://ieeexplore.ieee.org/document/10647253 -
October 2024
The following two papers have been accepted to Asia Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (APSIPA ASC2024):
・Chihiro Watanabe, Hirokazu Kameoka, "GE2E-AC: Generalized End-to-End Loss Training for Accent Classification"
・Xiao Zhang, Haoran Xing, Mingxue Song, Daiki Takeuchi, Noboru Harada, Shoji Makino, "Prediction-Error-Based Adaptive SpecAugment for Fine-Tuning the Masked Model on Audio Classification Tasks" -
October 2024
The paper entitled “Rewindable Quantum Computation and Its Equivalence to Cloning and Adaptive Postselection” has been accepted to Theory of Computing Systems.
https://link.springer.com/article/10.1007/s00224-024-10208-5 -
September 2024
The following paper has been accepted to IEEE Signal Processing Magazine:
Reinhold Haeb-Umbach, Tomohiro Nakatani, Marc Delcroix, Christoph Boeddeker, Tsubasa Ochiai, "Microphone Array Signal Processing and Deep Learning for Speech Enhancement"
https://ieeexplore.ieee.org/document/10819706 -
September 2024
The following paper has been accepted to EURASIP Journal on Audio, Speech, and Music Processing:
Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki & Shoji Makino, "DOA-Informed Switching Independent Vector Extraction and Beamforming for Speech Enhancement in Underdetermined Situations"
https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-024-00373-3 -
September 2024
The paper entitled “Masked Modeling Duo: Towards Universal Audio Pre-Training Framework” has been accepted to IEEE Transactions on Audio, Speech and Language Processing (TASLP).
https://ieeexplore.ieee.org/document/10502167 -
September 2024
The paper entitled “Exploring Pre-Trained General-Purpose Audio Representations for Heart Murmur Detection” has been accepted to the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC2024).
https://arxiv.org/pdf/2404.17107 -
September 2024
The following three papers have been accepted to the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2024):
・Daiki Takeuchi, Masahiro Yasuda, Daisuke Niizumi, Noboru Harada, "Towards Learning a Difference-Aware General-Purpose Audio Representation"
・Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi, "Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring"
・Daisuke Niizumi, Noboru Harada, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda, "ToyADMOS2#: Yet Another Data for the DCASE2024 Challenge Task 2 First-Shot Anomalous Sound Detection" -
September 2024
The paper entitled “Probabilistic Unitary and State Synthesis with Optimal Accuracy” has been accepted to the 6th International Workshop on Quantum Compilation (IWQC 2024).
https://dl.acm.org/doi/pdf/10.1145/3663576 -
September 2024
The paper entitled “Zeta Limits for The Spectrum of Quantum Rabi Models” has been accepted to Journal of Mathematical Physics.
https://arxiv.org/pdf/2304.08943 -
July 2024
The following paper has been accepted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing:
Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, "Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance"
https://ieeexplore.ieee.org/document/10606400 -
July 2024
The paper entitled “Acoustic-Based 3D Human Pose Estimation Robust to Human Position” has been accepted to the British Machine Vision Conference (BMVC2024).
https://bmva-archive.org.uk/bmvc/2024/papers/Paper_135/paper.pdf -
July 2024
The following paper has been accepted to IEEE Access:
Takanori Ashihara, Marc Delcroix, Yusuke Ijima, Makio Kashino, "Unveiling the Linguistic Capabilities of a Self-Supervised Speech Model Through Cross-Lingual Benchmark and Layer-Wise Similarity Analysis"
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10597571 -
July 2024
The following paper has been accepted to the EURASIP Journal on Audio, Speech, and Music Processing:
Daiki Mori, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, "Recognition of Target Domain Japanese Speech Using Language Model Replacement"
https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-024-00360-8 -
July 2024
The paper entitled “Northcott Numbers for Generalized Weighted Weil Heights” has been accepted to Acta Arithmetica.
https://arxiv.org/pdf/2308.03981 -
July 2024
The paper entitled “Finite-Key Security of Differential-Phase-Shift QKD” has been accepted to the Asian Quantum Information Science Conference (AQIS2024).
-
July 2024
The paper entitled “Spacing Distribution for Quantum Rabi Models” has been accepted to Journal of Physics A: Mathematical and Theoretical.
https://arxiv.org/pdf/2310.09811 -
July 2024
The paper entitled “Activity Measures of Dynamical Systems Over Non-Archimedean Fields” has been accepted to Discrete and Continuous Dynamical Systems.
https://arxiv.org/pdf/1901.01075 -
June 2024
The following 7 papers have been accepted to Interspeech2024.
・Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Masato Mimura, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Taichi Asami, "Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation"
・Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix, "SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling"
・Kenichi Fujita, Takanori Ashihara, Marc Delcroix, Yusuke Ijima, "Lightweight Zero-shot Text-to-Speech with Mixture of Adapters"
・Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo, "Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers"
・Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo, "FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation"
・Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Yuto Kondo, “PRVAE-VC2: Non-Parallel Voice Conversion by Distillation of Speech Representations”
・Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Masahiro Yasuda, Shunsuke Tsubaki, Keisuke Imoto, "M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation" -
June 2024
The following two papers have been accepted to the European Signal Processing Conference (EUSIPCO2024).
・Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, Noboru Harada, “Learning to Assess Subjective Impressions Conveyed Through Speech”
・Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto, “Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval” -
May 2024
The paper entitled “Detection of Acute Myeloid Leukemia without Labeling Individual Blood Cells” has been accepted to the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC2024).
-
May 2024
The paper entitled “Probabilistic Unitary Synthesis with Optimal Accuracy” has been accepted to ACM Transactions on Quantum Computing.
https://arxiv.org/html/2301.06307v2 -
May 2024
The paper entitled “Non-Locality of Conjugation Symmetry: Characterization and Examples in Quantum Network Sensing” has been accepted to New Journal of Physics.
https://arxiv.org/html/2309.12523v2 -
April 2024
The following two papers have been accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
・Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Shogo Seki, "VoiceGrad: Non-Parallel Any-to-Many Voice Conversion with Annealed Langevin Dynamics"
・Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino, "Masked Modeling Duo: Towards a Universal Audio Pre-training Framework" -
April 2024
The paper entitled “Partition Functions for Non-Commutative Harmonic Oscillators and Related Divergent Series” has been accepted to Indagationes Mathematicae.
https://www.sciencedirect.com/science/article/abs/pii/S0019357724000612?via%3Dihub -
April 2024
The following two papers have been accepted to Mathematical Foundations for Post-Quantum Cryptography.
・Ryosuke Nakahama, “Representation Theory of sl(2,R)=su(1,1) and a Generalization of Non-commutative Harmonic Oscillators”
・Cid Reyes-Bustos, “Towards Hash Functions Based on Group-subgroup Pair Graphs” -
March 2024
The following two papers have been accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2024).
・Yu Mitsuzumi, Akisato Kimura, Hisashi Kashima, "Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective"
・Takuhiro Kaneko, "Improving Physics Augmented Continuum Neural Radiance Fields-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization" -
March 2024
Our paper “Geometrically-regularized fast independent vector extraction by pure majorization-minimization” has been accepted to IEEE Transactions on Signal Processing.
https://ieeexplore.ieee.org/document/10466407 -
February 2024
The following 5 papers have been accepted to satellite workshops in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024).
・Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Jan Cernocky, "Probing Self-supervised Learning Models with Target Speech Extraction"
・Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach, "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization"
・Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, Shoji Makino, "Diffusion model-based MIMO speech denoising and dereverberation"
・Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, "Ensemble Inference for Diffusion Model-Based Speech Enhancement"
・Bo He, Shiqi Zhang, Xianrui Wang, Zheng Qiu, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada, Shoji Makino, “Light Gated Multi Mini-patch Extractor for Audio Classification”
Also, the following 2 papers have been accepted to Show and Tell Demos in ICASSP2024.
・Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino, “Target Speech Spotting and Extraction Based on ConceptBeam”
・Thilo von Neumann, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach, "MeetEval, Show Me the Errors! Interactive Visualization of Transcript Alignments for the Analysis of Conversational ASR" -
February 2024
Our paper “Warped diffusion for latent differentiation inference” has been accepted to the International Conference on Artificial Intelligence and Statistics (AISTATS2024).
https://proceedings.mlr.press/v238/nakano24a.html -
January 2024
Our paper “A motivic construction of the de Rham-Witt complex” has been accepted to Journal of Pure and Applied Algebra. This is joint work with the University of Tokyo.
https://www.sciencedirect.com/science/article/pii/S0022404923002840