Automatically curated collection of the latest research papers in Speech & Language Technology
📅 Updated on 2025.09.25
This repository provides a daily-updated collection of the latest research papers from arXiv in speech and language technology.
📖 Usage instructions: here 🌐 Web version: GitHub Pages
💡 This page is inspired by cv-arxiv-daily
📊 117 papers
📅 Publish Date | 📝 Title | 👥 Authors | 📄 arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-23 | WolBanking77: Wolof Banking Speech Intent Classification Dataset | Abdou Karim Kandji et.al. | 2509.19271 | null |
2025-09-23 | SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data | Erik Božík et.al. | 2509.19270 | null |
2025-09-23 | LOTUSDIS: A Thai far-field meeting corpus for robust conversational ASR | Pattara Tipaksorn et.al. | 2509.18722 | null |
2025-09-22 | Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents | Chutong Meng et.al. | 2509.18360 | link |
2025-09-20 | Conversational Orientation Reasoning: Egocentric-to-Allocentric Navigation with Multimodal Chain-of-Thought | Yu Ti Huang et.al. | 2509.18200 | null |
2025-09-19 | MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech | Jialong Mai et.al. | 2509.18196 | null |
2025-09-22 | Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation | Yiwen Guan et.al. | 2509.17930 | null |
2025-09-22 | Qwen3-Omni Technical Report | Jin Xu et.al. | 2509.17765 | null |
2025-09-22 | Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models | María Andrea Cruz Blandón et.al. | 2509.17523 | null |
2025-09-20 | Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies | Vishnu Raja et.al. | 2509.16718 | null |
2025-09-20 | Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing | Mengqi Wang et.al. | 2509.16622 | null |
2025-09-19 | Whisper-UT: A Unified Translation Framework for Speech and Text | Cihan Xiao et.al. | 2509.16375 | null |
2025-09-19 | GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition | Tianyue Wang et.al. | 2509.16031 | null |
2025-09-19 | Session-Level Spoken Language Assessment with a Multimodal Foundation Model via Multi-Target Learning | Hong-Yun Lin et.al. | 2509.16025 | null |
2025-09-19 | Interpreting the Role of Visemes in Audio-Visual Speech Recognition | Aristeidis Papadopoulos et.al. | 2509.16023 | null |
2025-09-19 | VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion | Dimitrios Damianos et.al. | 2509.15667 | null |
2025-09-19 | Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations | Linyang He et.al. | 2509.15655 | null |
2025-09-19 | Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition | Yiru Zhang et.al. | 2509.15612 | null |
2025-09-19 | Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization | Yun Tang et.al. | 2509.15579 | null |
2025-09-19 | State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization | Dhruuv Agarwal et.al. | 2509.15516 | null |
2025-09-18 | BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition | Liuyuan Jiang et.al. | 2509.15430 | null |
2025-09-18 | Speech Language Models for Under-Represented Languages: Insights from Wolof | Yaya Sy et.al. | 2509.15362 | null |
2025-09-20 | Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs | Yutong Liu et.al. | 2509.15095 | null |
2025-09-19 | From Hype to Insight: Rethinking Large Language Model Integration in Visual Speech Recognition | Rishabh Jain et.al. | 2509.14880 | null |
2025-09-18 | Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages | Mingchen Shao et.al. | 2509.14804 | null |
2025-09-18 | UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition | Ying Fang et.al. | 2509.14653 | null |
2025-09-17 | Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses | Yufeng Yang et.al. | 2509.14430 | null |
2025-09-13 | Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing | Luan Vejsiu et.al. | 2509.14263 | null |
2025-09-17 | Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST | Monica Sekoyan et.al. | 2509.14128 | null |
2025-09-17 | Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace | Sundhar Vinodh Sangeetha et.al. | 2509.14063 | null |
2025-09-17 | Conducting Mission-Critical Voice Experiments with Automated Speech Recognition and Crowdsourcing | Jan Janak et.al. | 2509.13724 | null |
2025-09-16 | Invisible Ears at Your Fingertips: Acoustic Eavesdropping via Mouse Sensors | Mohamad Fakih et.al. | 2509.13581 | null |
2025-09-16 | TICL: Text-Embedding KNN For Speech In-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models | Haolong Zheng et.al. | 2509.13395 | null |
2025-09-22 | GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR | Yujie Guo et.al. | 2509.13093 | null |
2025-09-16 | PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition | Li Fu et.al. | 2509.12647 | null |
2025-09-17 | FunAudio-ASR Technical Report | Keyu An et.al. | 2509.12508 | null |
2025-09-15 | In-domain SSL pre-training and streaming ASR | Jarod Duret et.al. | 2509.12101 | null |
2025-09-12 | Improving Audio Event Recognition with Consistency Regularization | Shanmuka Sadhu et.al. | 2509.10391 | null |
2025-09-12 | Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR | Can Cui et.al. | 2509.10234 | null |
2025-09-12 | Prominence-aware automatic speech recognition for conversational speech | Julian Linke et.al. | 2509.10116 | null |
2025-09-12 | Unified Learnable 2D Convolutional Feature Extraction for ASR | Peter Vieting et.al. | 2509.10031 | null |
2025-09-11 | Combining Textual and Spectral Features for Robust Classification of Pilot Communications | Abdullah All Tanvir et.al. | 2509.09752 | null |
2025-09-11 | Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function | Chin Yuen Kwok et.al. | 2509.09197 | null |
2025-09-11 | Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition | Chin Yuen Kwok et.al. | 2509.09196 | null |
2025-09-09 | A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR | Hao Yen et.al. | 2509.08173 | null |
2025-09-09 | EnvX: Agentize Everything with Agentic AI | Linyao Chen et.al. | 2509.08088 | null |
2025-09-08 | Identifying and Calibrating Overconfidence in Noisy Speech Recognition | Mingyue Huo et.al. | 2509.07195 | null |
2025-09-08 | The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties | William Chen et.al. | 2509.07139 | null |
2025-09-20 | TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition | Minh N. H. Nguyen et.al. | 2509.05983 | null |
2025-09-07 | Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling | Yue Gu et.al. | 2509.05908 | null |
2025-09-06 | New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR | Xugang Lu et.al. | 2509.05609 | null |
2025-09-05 | Graph Connectionist Temporal Classification for Phoneme Recognition | Henry Grafé et.al. | 2509.05399 | null |
2025-09-05 | Layer-wise Analysis for Quality of Multilingual Synthesized Speech | Erica Cooper et.al. | 2509.04830 | null |
2025-09-02 | From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach | Nithyashree Sivasubramaniam et.al. | 2509.04507 | null |
2025-09-01 | Refining Transcripts With TV Subtitles by Prompt-Based Weakly Supervised Training of ASR | Xinnian Zhao et.al. | 2509.04491 | null |
2025-09-01 | Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition | Hao Shi et.al. | 2509.04488 | null |
2025-08-29 | SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings | Jaekwon Yoo et.al. | 2509.04473 | null |
2025-09-04 | Contextualized Token Discrimination for Speech Search Query Correction | Junyu Lu et.al. | 2509.04393 | null |
2025-09-04 | Denoising GER: A Noise-Robust Generative Error Correction with LLM for Speech Recognition | Yanyan Liu et.al. | 2509.04392 | null |
2025-09-04 | PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation | Jiajun He et.al. | 2509.04357 | null |
2025-09-04 | Enhancing Self-Supervised Speaker Verification Using Similarity-Connected Graphs and GCN | Zhaorui Sun et.al. | 2509.04147 | null |
2025-08-27 | An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment | Tien-Hong Lo et.al. | 2509.03372 | null |
2025-09-05 | Exploring persuasive interactions with generative social robots: An experimental framework | Stephan Vonschallen et.al. | 2509.03231 | null |
2025-09-03 | Beyond Words: Interjection Classification for Improved Human-Computer Interaction | Yaniv Goren et.al. | 2509.03181 | null |
2025-09-03 | A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models | Ryandhimas E. Zezario et.al. | 2509.03021 | null |
2025-09-04 | Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM | Ryandhimas E. Zezario et.al. | 2509.03013 | null |
2025-09-02 | SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR | Pu Wang et.al. | 2509.02830 | null |
2025-09-02 | Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices | Evan King et.al. | 2509.02523 | null |
2025-09-04 | AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation | Lu Wang et.al. | 2509.02349 | null |
2025-09-03 | NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task | Bashar Talafha et.al. | 2509.02038 | null |
2025-09-02 | Group Relative Policy Optimization for Speech Recognition | Prashanth Gurunath Shivakumar et.al. | 2509.01939 | null |
2025-09-02 | Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy | Zehan Li et.al. | 2509.01900 | null |
2025-09-01 | Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts | Shreyas Tirumala et.al. | 2509.01814 | null |
2025-09-01 | Characterization of Speech Similarity Between Australian Aboriginal and High-Resource Languages: A Case Study on Dharawal | Ting Dang et.al. | 2509.01419 | null |
2025-09-01 | CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays | Runduo Han et.al. | 2509.01399 | null |
2025-09-01 | Analysing the Language of Neural Audio Codecs | Joonyong Park et.al. | 2509.01390 | null |
2025-09-01 | Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition | Shuangyuan Chen et.al. | 2509.01087 | null |
2025-08-31 | A Unified Denoising and Adaptation Framework for Self-Supervised Bengali Dialectal ASR | Swadhin Biswas et.al. | 2509.00988 | null |
2025-08-30 | Entropy-based Coarse and Compressed Semantic Speech Representation Learning | Jialong Zuo et.al. | 2509.00503 | null |
2025-08-27 | Automatic Pronunciation Error Detection and Correction of the Holy Quran’s Learners Using Deep Learning | Abdullah Abdelfattah et.al. | 2509.00094 | null |
2025-08-29 | NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration | Wuque Cai et.al. | 2508.21566 | null |
2025-09-02 | AHELM: A Holistic Evaluation of Audio-Language Models | Tony Lee et.al. | 2508.21376 | null |
2025-08-28 | Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children’s Speech? | Abhijit Sinha et.al. | 2508.21225 | null |
2025-08-28 | Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition | Coralie Serrand et.al. | 2508.21193 | null |
2025-08-28 | OLMoASR: Open Models and Data for Training Robust Speech Recognition Models | Huong Ngo et.al. | 2508.20869 | null |
2025-08-28 | Generative Annotation for ASR Named Entity Correction | Yuanchang Luo et.al. | 2508.20700 | null |
2025-08-28 | Towards Inclusive Communication: A Unified LLM-Based Framework for Sign Language, Lip Movements, and Audio Understanding | Jeong Hun Yeo et.al. | 2508.20476 | null |
2025-09-08 | Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints | Xiaodong Cui et.al. | 2508.19990 | null |
2025-08-27 | TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation | Shashi Kumar et.al. | 2508.19856 | null |
2025-08-27 | CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese | Carlos Carvalho et.al. | 2508.19721 | null |
2025-08-27 | Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models | Yunkyu Lim et.al. | 2508.19671 | null |
2025-08-27 | Towards stable AI systems for Evaluating Arabic Pronunciations | Hadi Zaatiti et.al. | 2508.19587 | null |
2025-08-22 | Whisper based Cross-Lingual Phoneme Recognition between Vietnamese and English | Nguyen Huu Nhat Minh et.al. | 2508.19270 | null |
2025-08-26 | MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR | Junjie Li et.al. | 2508.18998 | null |
2025-08-26 | TaiBai: A fully programmable brain-inspired processor with topology-aware efficiency | Qianpeng Li et.al. | 2508.18961 | null |
2025-08-26 | DESAMO: A Device for Elder-Friendly Smart Homes Powered by Embedded LLM with Audio Modality | Youngwon Choi et.al. | 2508.18918 | null |
2025-08-26 | Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion | DongHoon Lim et.al. | 2508.18734 | null |
2025-08-26 | Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database | Qing Xiao et.al. | 2508.18732 | null |
2025-08-26 | Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System | Yanfan Du et.al. | 2508.18701 | null |
2025-08-22 | H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems | Huangyu Dai et.al. | 2508.18295 | null |
2025-08-20 | Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology | Jay L. Cunningham et.al. | 2508.18288 | null |
2025-08-25 | Evaluating the Representation of Vowels in Wav2Vec Feature Extractor: A Layer-Wise Analysis Using MFCCs | Domenico De Cristofaro et.al. | 2508.17914 | null |
2025-08-25 | Designing Practical Models for Isolated Word Visual Speech Recognition | Iason Ioannis Panagos et.al. | 2508.17894 | null |
2025-08-25 | Talking to Robots: A Practical Examination of Speech Foundation Models for HRI Applications | Theresa Pekarek Rosin et.al. | 2508.17753 | null |
2025-08-24 | AI-Powered Legal Intelligence System Architecture: A Comprehensive Framework for Automated Legal Consultation and Analysis | Sean Kalaycioglu et.al. | 2508.17499 | null |
2025-08-22 | Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet | Anyu Ying et.al. | 2508.16576 | null |
2025-08-21 | Beyond Transcription: Mechanistic Interpretability in ASR | Neta Glazer et.al. | 2508.15882 | null |
2025-08-20 | MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR | Xuwen Yang et.al. | 2508.15853 | null |
2025-08-21 | UniCoM: A Universal Code-Switching Speech Generator | Sangmin Lee et.al. | 2508.15244 | null |
2025-08-20 | A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References | Simon Dahl Jepsen et.al. | 2508.14623 | null |
2025-08-18 | Whispering Context: Distilling Syntax and Semantics for Long Speech Transcripts | Duygu Altinok et.al. | 2508.13376 | null |
2025-08-18 | Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT | Zeeshan Ahmed et.al. | 2508.13358 | null |
2025-08-18 | Evaluating ASR robustness to spontaneous speech errors: A study of WhisperX using a Speech Error Database | John Alderete et.al. | 2508.13060 | null |
2025-08-18 | Arabic ASR on the SADA Large-Scale Arabic Speech Corpus with Transformer-Based Models | Branislav Gerazov et.al. | 2508.12968 | null |
2025-08-17 | CarelessWhisper: Turning Whisper into a Causal Streaming Model | Tomer Krichli et.al. | 2508.12301 | null |
2025-08-17 | HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization | Hyebin Ahn et.al. | 2508.12292 | null |
2025-08-17 | What do Speech Foundation Models Learn? Analysis and Applications | Ankita Pasad et.al. | 2508.12255 | null |
📊 97 papers
📅 Publish Date | 📝 Title | 👥 Authors | 📄 arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-23 | Finding My Voice: Generative Reconstruction of Disordered Speech for Automated Clinical Evaluation | Karen Rosero et.al. | 2509.19231 | null |
2025-09-23 | Investigating Test-Time Scaling with Reranking for Machine Translation | Shaomu Tan et.al. | 2509.19020 | null |
2025-09-23 | No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS | Seungyoun Shin et.al. | 2509.18531 | null |
2025-09-22 | Discrete-time diffusion-like models for speech synthesis | Xiaozhou Tan et.al. | 2509.18470 | null |
2025-09-22 | TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation | Yutong Liu et.al. | 2509.18060 | null |
2025-09-22 | Variation in Verification: Understanding Verification Dynamics in Large Language Models | Yefan Zhou et.al. | 2509.17995 | null |
2025-09-22 | Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech | Zirui Li et.al. | 2509.17988 | null |
2025-09-23 | Mitigating Strategy-Selection Bias in Reasoning for More Effective Test-Time Scaling | Zongqian Wu et.al. | 2509.17905 | null |
2025-09-22 | Audiobook-CC: Controllable Long-context Speech Generation for Multicast Audiobook | Min Liu et.al. | 2509.17516 | null |
2025-09-21 | Bridging the gap between training and inference in LM-based TTS models | Ruonan Zhang et.al. | 2509.17021 | null |
2025-09-21 | MBCodec: Thorough disentangle for high-fidelity audio compression | Ruonan Zhang et.al. | 2509.17006 | null |
2025-09-19 | Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation | Qi Wang et.al. | 2509.16010 | null |
2025-09-19 | VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency | Nikita Torgashov et.al. | 2509.15969 | null |
2025-09-19 | Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS | Ziqi Dai et.al. | 2509.15845 | null |
2025-09-19 | Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech | Xinlei Niu et.al. | 2509.15492 | null |
2025-09-18 | Real-Time Streaming Mel Vocoding with Generative Flow Matching | Simon Welker et.al. | 2509.15085 | null |
2025-09-18 | DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis | Ye-Xin Lu et.al. | 2509.14684 | null |
2025-09-20 | Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis | Qingyu Liu et.al. | 2509.14579 | null |
2025-09-15 | SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models | Karan Dua et.al. | 2509.14270 | null |
2025-09-17 | Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency | Colin Hong et.al. | 2509.13990 | null |
2025-09-22 | Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems | Yi-Cheng Lin et.al. | 2509.13989 | null |
2025-09-16 | MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement | Jingyu Li et.al. | 2509.13068 | null |
2025-09-21 | LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning | Jiaqi Wang et.al. | 2509.12875 | null |
2025-09-16 | Towards personalized, precise and survey-free environment recognition: AI-enhanced sensor fusion without pre-deployment | Ruichen Wang et.al. | 2509.12870 | null |
2025-09-16 | A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis | Javeria Amir et.al. | 2509.12831 | null |
2025-09-21 | Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization | Jiahao Yu et.al. | 2509.12434 | null |
2025-09-15 | Preservation of Language Understanding Capabilities in Speech-aware Large Language Models | Marek Kubis et.al. | 2509.12171 | null |
2025-09-14 | FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs | Md Mubtasim Ahasan et.al. | 2509.11425 | null |
2025-09-14 | Length-Aware Rotary Position Embedding for Text-Speech Alignment | Hyeongju Kim et.al. | 2509.11084 | null |
2025-09-12 | Towards Data Drift Monitoring for Speech Deepfake Detection in the context of MLOps | Xin Wang et.al. | 2509.10086 | null |
2025-09-11 | DiTReducio: A Training-Free Acceleration for DiT-Based TTS via Progressive Calibration | Yanru Huo et.al. | 2509.09748 | null |
2025-09-12 | DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech | Ngoc-Son Nguyen et.al. | 2509.09631 | link |
2025-09-11 | HISPASpoof: A New Dataset For Spanish Speech Forensics | Maria Risques et.al. | 2509.09155 | null |
2025-09-10 | Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching | Siratish Sakpiboonchit et.al. | 2509.08696 | null |
2025-09-14 | Progressive Facial Granularity Aggregation with Bilateral Attribute-based Enhancement for Face-to-Speech Synthesis | Yejin Jeon et.al. | 2509.07376 | null |
2025-09-09 | When Fine-Tuning is Not Enough: Lessons from HSAD on Hybrid and Adversarial Audio Spoof Detection | Bin Hu et.al. | 2509.07323 | null |
2025-09-08 | Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence | Yerin Ryu et.al. | 2509.07038 | null |
2025-09-07 | Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis | Zhenqi Jia et.al. | 2509.06074 | null |
2025-09-06 | LatinX: Aligning a Multilingual TTS Model with Direct Preference Optimization | Luis Felipe Chary et.al. | 2509.05863 | null |
2025-09-08 | Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework | Jie Chen et.al. | 2509.05007 | null |
2025-09-04 | Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding | Rui-Chen Zheng et.al. | 2509.04685 | null |
2025-09-04 | DarkStream: real-time speech anonymization with low latency | Waris Quamer et.al. | 2509.04667 | null |
2025-09-04 | AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds | Qizhou Wang et.al. | 2509.04345 | null |
2025-09-04 | Open-Source Full-Duplex Conversational Datasets for Natural and Interactive Speech Synthesis | Zhitong Zhou et.al. | 2509.04093 | null |
2025-09-04 | LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis | Gaspard Michel et.al. | 2509.04072 | null |
2025-09-16 | SwinSRGAN: Swin Transformer-based Generative Adversarial Network for High-Fidelity Speech Super-Resolution | Jiajun Yuan et.al. | 2509.03913 | null |
2025-09-03 | Multi-level SSL Feature Gating for Audio Deepfake Detection | Hoan My Tran et.al. | 2509.03409 | null |
2025-09-03 | Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings | Dyah A. M. G. Wisnu et.al. | 2509.03292 | null |
2025-09-03 | AIVA: An AI-based Virtual Companion for Emotion-aware Interaction | Chenxi Li et.al. | 2509.03212 | null |
2025-09-02 | Scale, Don’t Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time | Jintao Cheng et.al. | 2509.02129 | null |
2025-09-04 | FireRedTTS-2: Towards Long Conversational Speech Generation for Podcast and Chatbot | Kun Xie et.al. | 2509.02020 | null |
2025-09-01 | MixedG2P-T5: G2P-free Speech Synthesis for Mixed-script texts using Speech Self-Supervised Learning and Language Model | Joonyong Park et.al. | 2509.01391 | null |
2025-08-31 | MPO: Multidimensional Preference Optimization for Language Model-based Text-to-Speech | Kangxiang Xia et.al. | 2509.00685 | null |
2025-08-31 | Speaker-Conditioned Phrase Break Prediction for Text-to-Speech with Phoneme-Level Pre-trained Language Model | Dong Yang et.al. | 2509.00675 | null |
2025-08-29 | Democratizing Agentic AI with Fast Test-Time Scaling on the Edge | Hao Mark Chen et.al. | 2509.00195 | null |
2025-08-27 | Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs | Qibin Wang et.al. | 2509.00084 | null |
2025-08-28 | Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System | Hashim Ali et.al. | 2508.20983 | null |
2025-08-26 | Predicting the optimal noise strength for solving optimization problems with analog Ising machines | Leen Mys et.al. | 2508.19107 | null |
2025-08-26 | CLEAR: Continuous Latent Autoregressive Modeling for High-quality and Low-latency Speech Synthesis | Chun Yat Wu et.al. | 2508.19098 | null |
2025-08-25 | SwiftF0: Fast and Accurate Monophonic Pitch Detection | Lars Nieradzik et.al. | 2508.18440 | null |
2025-08-25 | Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters | Alessio Falai et.al. | 2508.18006 | null |
2025-08-27 | Vocoder-Projected Feature Discriminator | Takuhiro Kaneko et.al. | 2508.17874 | null |
2025-08-25 | ClearMask: Noise-Free and Naturalness-Preserving Protection Against Voice Deepfake Attacks | Yuanda Wang et.al. | 2508.17660 | null |
2025-08-24 | Improving French Synthetic Speech Quality via SSML Prosody Control | Nassima Ould Ouali et.al. | 2508.17494 | null |
2025-08-23 | WildSpoof Challenge Evaluation Plan | Yihan Wu et.al. | 2508.16858 | null |
2025-09-09 | Trust but Verify! A Survey on Verification Design for Test-time Scaling | V Venktesh et.al. | 2508.16665 | null |
2025-09-05 | Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets | Chenlin Liu et.al. | 2508.15442 | null |
2025-08-25 | Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization | Rui Wang et.al. | 2508.14947 | null |
2025-08-20 | Long-Context Speech Synthesis with Context-Aware Memory | Zhipeng Li et.al. | 2508.14713 | null |
2025-08-20 | Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement | Heitor R. Guimarães et.al. | 2508.14709 | null |
2025-08-22 | Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS | Can Jin et.al. | 2508.14313 | null |
2025-08-19 | Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM | Dariia Puhach et.al. | 2508.13603 | null |
2025-08-18 | Integrating Feedback Loss from Bi-modal Sarcasm Detector for Sarcastic Speech Synthesis | Zhu Li et.al. | 2508.13028 | null |
2025-08-18 | Cooperative Sensing-Assisted Predictive Beam Tracking for MIMO-OFDM Networked ISAC Systems | Xiaoyu Yang et.al. | 2508.12723 | null |
2025-08-18 | Real-Time Sign Language Gestures to Speech Transcription using Deep Learning | Brandone Fonya et.al. | 2508.12713 | null |
2025-08-19 | FNH-TTS: A Fast, Natural, and Human-Like Speech Synthesis System with advanced prosodic modeling based on Mixture of Experts | Qingliang Meng et.al. | 2508.12001 | null |
2025-08-15 | MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts | Heyang Xue et.al. | 2508.11326 | null |
2025-08-15 | EmoSSLSphere: Multilingual Emotional Speech Synthesis with Spherical Vectors and Discrete Speech Tokens | Joonyong Park et.al. | 2508.11273 | null |
2025-08-14 | Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning | Yejin Jeon et.al. | 2508.10412 | null |
2025-08-14 | Towards Frame-level Quality Predictions of Synthetic Speech | Michael Kuhlmann et.al. | 2508.10374 | null |
2025-08-15 | Training-Free Multimodal Large Language Model Orchestration | Tianyu Xie et.al. | 2508.10016 | null |
2025-09-16 | UtterTune: LoRA-Based Target-Language Pronunciation Edit and Control in Multilingual Text-to-Speech | Shuhei Kato et.al. | 2508.09767 | null |
2025-08-12 | ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs | Eray Eren et.al. | 2508.09389 | null |
2025-08-12 | Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention’s Alternative | Xi Xuan et.al. | 2508.09294 | null |
2025-08-12 | HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis | Timo Teufel et.al. | 2508.09137 | null |
2025-08-12 | QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems | Chien-Chun Wang et.al. | 2508.08957 | null |
2025-08-10 | Scalable Controllable Accented TTS | Henry Li Xinyuan et.al. | 2508.07426 | null |
2025-08-10 | KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features | Ivan Kukanov et.al. | 2508.07337 | null |
2025-08-12 | XEmoRAG: Cross-Lingual Emotion Transfer with Controllable Intensity Using Retrieval-Augmented Generation | Tianlun Zuo et.al. | 2508.07302 | null |
2025-08-09 | Maestro-EVC: Controllable Emotional Voice Conversion Guided by References and Explicit Prosody | Jinsung Yoon et.al. | 2508.06890 | null |
2025-08-09 | Text to Speech System for Meitei Mayek Script | Gangular Singh Irengbam et.al. | 2508.06870 | null |
2025-08-08 | Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis | Wenjie Tian et.al. | 2508.06262 | null |
2025-08-08 | NEP: Autoregressive Image Editing via Next Editing Token Prediction | Huimin Wu et.al. | 2508.06044 | null |
2025-08-07 | A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding | Runchuan Ye et.al. | 2508.05385 | null |
2025-08-15 | Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS | M Anuprabha et.al. | 2508.05102 | null |
2025-08-07 | UniTalker: Conversational Speech-Visual Synthesis | Yifan Hu et.al. | 2508.04585 | null |
2025-08-06 | The State Of TTS: A Case Study with Human Fooling Rates | Praveen Srinivasa Varadhan et.al. | 2508.04179 | null |
📊 117 papers
📅 Publish Date | 📝 Title | 👥 Authors | 📄 arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-22 | Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation | Yiwen Guan et.al. | 2509.17930 | null |
2025-09-22 | Specification-Aware Machine Translation and Evaluation for Purpose Alignment | Yoko Kayano et.al. | 2509.17559 | null |
2025-09-22 | Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages | Wenhao Zhuang et.al. | 2509.17493 | null |
2025-09-22 | Filling in the Clinical Gaps in Benchmark: Case for HealthBench for the Japanese medical system | Shohei Hisada et.al. | 2509.17444 | null |
2025-09-22 | Scaling, Simplification, and Adaptation: Lessons from Pretraining on Machine-Translated Text | Dan John Velasco et.al. | 2509.17317 | null |
2025-09-22 | JPResUnet: A Joint Probability Density Function Translation Model in Partially Premixed Flames | Hanying Yang et.al. | 2509.17297 | null |
2025-09-21 | Extending Automatic Machine Translation Evaluation to Book-Length Documents | Kuang-Da Wang et.al. | 2509.17249 | null |
2025-09-21 | CUTE: A Multilingual Dataset for Enhancing Cross-Lingual Knowledge Transfer in Low-Resource Languages | Wenhao Zhuang et.al. | 2509.16914 | null |
2025-09-20 | Angular Dispersion Accelerates $k$-Nearest Neighbors Machine Translation | Evgeniia Tokarchuk et.al. | 2509.16729 | null |
2025-09-19 | Whisper-UT: A Unified Translation Framework for Speech and Text | Cihan Xiao et.al. | 2509.16375 | null |
2025-09-19 | UPRPRC: Unified Pipeline for Reproducing Parallel Resources – Corpus from the United Nations | Qiuyang Lu et.al. | 2509.15789 | null |
2025-09-19 | Multilingual LLM Prompting Strategies for Medical English-Vietnamese Machine Translation | Nhu Vo et.al. | 2509.15640 | null |
2025-09-18 | RulER: Automated Rule-Based Semantic Error Localization and Repair for Code Translation | Shuo Jin et.al. | 2509.14829 | null |
2025-09-18 | Evaluating Large Language Models for Cross-Lingual Retrieval | Longfei Zuo et.al. | 2509.14749 | null |
2025-09-17 | Translate, then Detect: Leveraging Machine Translation for Cross-Lingual Toxicity Classification | Samuel J. Bell et.al. | 2509.14493 | null |
2025-09-17 | You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models | Paweł Mąka et.al. | 2509.14031 | null |
2025-09-17 | Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality | Sami Ul Haq et.al. | 2509.14023 | null |
2025-09-17 | Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale | Hasan Abed Al Kader Hammoud et.al. | 2509.14008 | null |
2025-09-17 | Long-context Reference-based MT Quality Estimation | Sami Ul Haq et.al. | 2509.13980 | null |
2025-09-20 | Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data | Kurt Micallef et.al. | 2509.12853 | null |
2025-09-17 | Human + AI for Accelerating Ad Localization Evaluation | Harshit Rajgarhia et.al. | 2509.12543 | null |
2025-09-15 | A comparison of pipelines for the translation of a low resource language based on transformers | Chiara Bonfanti et.al. | 2509.12514 | null |
2025-09-14 | PATIMT-Bench: A Multi-Scenario Benchmark for Position-Aware Text Image Machine Translation in Large Vision-Language Models | Wanru Zhuang et.al. | 2509.12278 | null |
2025-09-15 | XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models | Ariana Sahitaj et.al. | 2509.12130 | null |
2025-09-04 | Optimal Multi-Task Learning at Regularization Horizon for Speech Translation Task | JungHo Jung et.al. | 2509.09701 | null |
2025-09-11 | Mitigating Language Barriers in Education: Developing Multilingual Digital Learning Materials with Machine Translation | Lucie Poláková et.al. | 2509.09473 | null |
2025-09-09 | Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost | Mihai Nadas et.al. | 2509.07829 | null |
2025-09-09 | From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation | Mardiyyah Oduwole et.al. | 2509.07471 | null |
2025-09-09 | Hunyuan-MT Technical Report | Mao Zheng et.al. | 2509.05209 | null |
2025-09-05 | PRIM: Towards Practical In-Image Multilingual Machine Translation | Yanzhi Tian et.al. | 2509.05146 | null |
2025-09-03 | Artificially Fluent: Swahili AI Performance Benchmarks Between English-Trained and Natively-Trained Datasets | Sophie Jaffer et.al. | 2509.04516 | null |
2025-09-04 | Exploring NLP Benchmarks in an Extremely Low-Resource Setting | Ulin Nuha et.al. | 2509.03962 | null |
2025-09-04 | Align-then-Slide: A complete evaluation framework for Ultra-Long Document-Level Machine Translation | Jiaxin Guo et.al. | 2509.03809 | null |
2025-09-24 | Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader | Jannis Vamvas et.al. | 2509.03148 | null |
2025-09-02 | The Forgotten Code: Validating a Century-Old Translation System with AI | Jean-Marie Le Ray et.al. | 2509.02506 | null |
2025-09-18 | CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets | Yujing Wang et.al. | 2509.01566 | null |
2025-08-28 | The Uneven Impact of Post-Training Quantization in Machine Translation | Benjamin Marie et.al. | 2508.20893 | null |
2025-08-28 | Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark | Chihiro Taguchi et.al. | 2508.20511 | null |
2025-09-06 | FlowMalTrans: Unsupervised Binary Code Translation for Malware Detection Using Flow-Adapter Architecture | Minghao Hu et.al. | 2508.20212 | null |
2025-08-26 | Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study | Manuel Mosquera et.al. | 2508.19481 | null |
2025-09-03 | The Ramon Llull’s Thinking Machine for Automated Ideation | Xinran Zhao et.al. | 2508.19200 | null |
2025-08-26 | LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination | Ziming Zhu et.al. | 2508.18791 | null |
2025-08-26 | A New NMT Model for Translating Clinical Texts from English to Spanish | Rumeng Li et.al. | 2508.18607 | null |
2025-08-25 | COMET-poly: Machine Translation Metric Grounded in Other Candidates | Maike Züfle et.al. | 2508.18549 | null |
2025-08-24 | Evaluating the Impact of Verbal Multiword Expressions on Machine Translation | Linfeng Liu et.al. | 2508.17458 | null |
2025-08-22 | Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish | Yakup Abrek Er et.al. | 2508.16431 | null |
2025-08-22 | The Mediomatix Corpus: Parallel Data for Romansh Idioms via Comparable Schoolbooks | Zachary Hopton et.al. | 2508.16371 | null |
2025-09-23 | OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages | Raphaël Merx et.al. | 2508.16048 | null |
2025-08-21 | Confidence-Modulated Speculative Decoding for Large Language Models | Jaydip Sen et.al. | 2508.15371 | null |
2025-08-20 | Improving LLMs for Machine Translation Using Synthetic Preference Data | Dario Vajda et.al. | 2508.14951 | null |
2025-08-24 | Preliminary Ranking of WMT25 General Machine Translation Systems | Tom Kocmi et.al. | 2508.14909 | null |
2025-08-20 | Filling the Gap for Uzbek: Creating Translation Resources for Southern Uzbek | Mukhammadsaid Mamasaidov et.al. | 2508.14586 | null |
2025-08-20 | In2x at WMT25 Translation Task | Lei Pang et.al. | 2508.14472 | null |
2025-08-18 | Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT | Zeeshan Ahmed et.al. | 2508.13358 | null |
2025-08-18 | DocHPLT: A Massively Multilingual Document-Level Translation Dataset | Dayyán O’Brien et.al. | 2508.13079 | null |
2025-08-18 | From SALAMANDRA to SALAMANDRATA: BSC Submission for WMT25 General Machine Translation Shared Task | Javier Garcia Gilabert et.al. | 2508.12774 | null |
2025-08-25 | SEA-BED: Southeast Asia Embedding Benchmark | Wuttikorn Ponwitayarat et.al. | 2508.12243 | null |
2025-08-14 | Neural Machine Translation for Coptic-French: Strategies for Low-Resource Ancient Languages | Nasma Chaoui et.al. | 2508.10683 | null |
2025-08-14 | Evaluating LLMs on Chinese Idiom Translation | Cai Yang et.al. | 2508.10421 | null |
2025-08-28 | Estimating Machine Translation Difficulty | Lorenzo Proietti et.al. | 2508.10175 | null |
2025-08-12 | TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation | Armel Zebaze et.al. | 2508.08680 | null |
2025-08-12 | UWB at WASSA-2024 Shared Task 2: Cross-lingual Emotion Detection | Jakub Šmíd et.al. | 2508.08650 | null |
2025-08-11 | Toward Machine Interpreting: Lessons from Human Interpreting Studies | Matthias Sperber et.al. | 2508.07964 | null |
2025-08-10 | ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models | Archchana Sindhujan et.al. | 2508.07484 | null |
2025-08-08 | Testing the Limits of Machine Translation from One Book | Jonathan Shaw et.al. | 2508.06665 | null |
2025-08-08 | Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models | Tomohiro Sawada et.al. | 2508.06621 | null |
2025-08-07 | PEACH: A sentence-aligned Parallel English-Arabic Corpus for Healthcare | Rania Al-Sabbagh et.al. | 2508.05722 | null |
2025-08-07 | MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs | Yufei Gao et.al. | 2508.05502 | null |
2025-08-07 | Optimal Corpus Aware Training for Neural Machine Translation | Yi-Hsiu Liao et.al. | 2508.05364 | null |
2025-08-11 | REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation | Nameer Hirschkind et.al. | 2508.04946 | null |
2025-08-05 | Marito: Structuring and Building Open Multilingual Terminologies for South African NLP | Vukosi Marivate et.al. | 2508.03529 | null |
2025-08-05 | Investigation on deep learning-based galaxy image translation models | Hengxin Ruan et.al. | 2508.03291 | null |
2025-08-05 | Cross-lingual Opinions and Emotions Mining in Comparable Documents | Motaz Saad et.al. | 2508.03112 | null |
2025-08-04 | A Survey on Data Security in Large Language Models | Kang Chen et.al. | 2508.02312 | null |
2025-08-04 | A French Version of the OLDI Seed Corpus | Malik Marmonier et.al. | 2508.02290 | null |
2025-08-04 | SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System | Serry Sibaee et.al. | 2508.02268 | null |
2025-08-25 | CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications | Raviraj Joshi et.al. | 2508.01710 | null |
2025-08-02 | ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations | Rania Al-Sabbagh et.al. | 2508.01411 | null |
2025-09-16 | Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation | Irene Iele et.al. | 2508.00766 | null |
2025-07-31 | Arabic Hate Speech Identification and Masking in Social Media using Deep Learning Models and Pre-trained Models Fine-tuning | Salam Thabet Doghmash et.al. | 2507.23661 | null |
2025-07-31 | Beyond the Cloud: Assessing the Benefits and Drawbacks of Local LLM Deployment for Translators | Peter Sandrini et.al. | 2507.23399 | null |
2025-07-29 | RL from Teacher-Model Refinement: Gradual Imitation Learning for Machine Translation | Dongyub Jude Lee et.al. | 2507.22219 | null |
2025-07-31 | Multi-Hypothesis Distillation of Multilingual Neural Translation Models for Low-Resource Languages | Aarón Galiano-Jiménez et.al. | 2507.21568 | null |
2025-07-07 | iLSU-T: an Open Dataset for Uruguayan Sign Language Translation | Ariel E. Stassi et.al. | 2507.21104 | null |
2025-07-28 | Multilingual Self-Taught Faithfulness Evaluators | Carlo Alfano et.al. | 2507.20752 | null |
2025-09-02 | Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation | Abdullah Alabdullah et.al. | 2507.20301 | null |
2025-07-29 | Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri | Felix Kraus et.al. | 2507.19537 | null |
2025-07-25 | LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation | Jingxuan Wei et.al. | 2507.18940 | null |
2025-07-24 | GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation | Jiafeng Xiong et.al. | 2507.18562 | null |
2025-07-24 | Uncertainty Quantification for Evaluating Machine Translation Bias | Ieva Raminta Staliūnaitė et.al. | 2507.18338 | null |
2025-07-25 | Natural Language Processing for Tigrinya: Current State and Future Directions | Fitsum Gaim et.al. | 2507.17974 | null |
2025-07-23 | Dual-branch Prompting for Multimodal Machine Translation | Jie Wang et.al. | 2507.17588 | null |
2025-07-22 | Introducing Quality Estimation to Machine Translation Post-editing Workflow: An Empirical Study on Its Usefulness | Siqi Liu et.al. | 2507.16515 | null |
2025-07-22 | GG-BBQ: German Gender Bias Benchmark for Question Answering | Shalaka Satheesh et.al. | 2507.16410 | null |
2025-07-21 | Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification | Vitaly Protasov et.al. | 2507.15557 | null |
2025-07-20 | A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge’ez Script | Hellina Hailu Nigatu et.al. | 2507.15142 | null |
2025-08-21 | Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters | Shanbo Cheng et.al. | 2507.13618 | null |
2025-07-16 | Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only | Xuanqi Gao et.al. | 2507.13395 | null |
2025-07-16 | The first open machine translation system for the Chechen language | Abu-Viskhan A. Umishov et.al. | 2507.12672 | null |
2025-09-19 | Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese | Yikang Liu et.al. | 2507.12260 | null |
2025-07-16 | Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models | Bo Zeng et.al. | 2507.11882 | null |
2025-07-31 | ILID: Native Script Language Identification for Indian Languages | Yash Ingle et.al. | 2507.11832 | null |
2025-08-30 | How Important is ‘Perfect’ English for Machine Translation Prompts? | Patrícia Schmidtová et.al. | 2507.09509 | null |
2025-07-11 | Improving MLLM’s Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency | Yupu Liang et.al. | 2507.08309 | null |
2025-07-10 | Conditional Unigram Tokenization with Parallel Data | Gianluca Vico et.al. | 2507.07824 | null |
2025-07-10 | Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation | Yupu Liang et.al. | 2507.07572 | null |
2025-07-09 | Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation | Kazi Mahathir Rahman et.al. | 2507.06530 | null |
2025-07-09 | Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings | Russell Taylor et.al. | 2507.06506 | null |
2025-07-07 | A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic | Juan Moreno Gonzalez et.al. | 2507.04746 | null |
2025-07-09 | Losing our Tail – Again: On (Un)Natural Selection And Multilingual Large Language Models | Eva Vanmassenhove et.al. | 2507.03933 | null |
2025-07-17 | Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits | Nathaniel Berger et.al. | 2507.03580 | null |
2025-07-04 | GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation | Himanshu Dutta et.al. | 2507.03311 | null |
2025-07-01 | TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | Xi Xuan et.al. | 2507.00875 | null |
2025-07-01 | Neural translation for Stokes inversion and synthesis | A. Asensio Ramos et.al. | 2507.00594 | null |
2025-06-30 | Natural language processing for African languages | David Ifeoluwa Adelani et.al. | 2507.00297 | link |
2025-06-30 | Bridging the Gap with Retrieval-Augmented Generation: Making Prosthetic Device User Manuals Available in Marginalised Languages | Ikechukwu Ogbonna et.al. | 2506.23958 | null |
2025-07-07 | CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation | Yi Liu et.al. | 2506.23347 | null |
📊 180 papers
📅 Publish Date | 📝 Title | 👥 Authors | arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-23 | Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps | Gabriel Maldonado et.al. | 2509.19252 | null |
2025-09-23 | PPG-Distill: Efficient Photoplethysmography Signals Analysis via Foundation Model Distillation | Juntong Ni et.al. | 2509.19215 | null |
2025-09-23 | Exact WKB Formulation of Quantization and Particle Production in Time-Dependent Backgrounds | Ryo Namba et.al. | 2509.19194 | null |
2025-09-23 | Data-Free Knowledge Distillation for LiDAR-Aided Beam Tracking in MmWave Systems | Abolfazl Zakeri et.al. | 2509.19092 | null |
2025-09-23 | Enhancing Noise Robustness for Neural Speech Codecs through Resource-Efficient Progressive Quantization Perturbation Simulation | Rui-Chen Zheng et.al. | 2509.19025 | null |
2025-09-23 | Otters: An Energy-Efficient SpikingTransformer via Optical Time-to-First-Spike Encoding | Zhanglu Yan et.al. | 2509.18968 | null |
2025-09-23 | VGGT-DP: Generalizable Robot Control via Vision Foundation Models | Shijia Ge et.al. | 2509.18778 | null |
2025-09-23 | DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision | Azad Singh et.al. | 2509.18765 | null |
2025-09-23 | Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models | Xijun Wang et.al. | 2509.18763 | null |
2025-09-23 | Enhanced Survival Trees | Ruiwen Zhou et.al. | 2509.18494 | null |
2025-09-23 | Codebook-Based Adaptive Feature Compression With Semantic Enhancement for Edge-Cloud Systems | Xinyu Wang et.al. | 2509.18481 | null |
2025-09-22 | Individualized non-uniform quantization for vector search | Mariano Tepper et.al. | 2509.18471 | null |
2025-09-22 | TinyBEV: Cross Modal Knowledge Distillation for Efficient Multi Task Bird’s Eye View Perception and Planning | Reeshad Khan et.al. | 2509.18372 | null |
2025-09-21 | nDNA – the Semantic Helix of Artificial Cognition | Amitava Das et.al. | 2509.18216 | null |
2025-09-19 | MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation | Rui Liu et.al. | 2509.18198 | null |
2025-09-19 | TinyEcoWeedNet: Edge Efficient Real-Time Aerial Agricultural Weed Detection | Omar H. Khater et.al. | 2509.18193 | null |
2025-09-22 | Visual Detector Compression via Location-Aware Discriminant Analysis | Qizhen Lan et.al. | 2509.17968 | null |
2025-09-23 | Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark | Siu Hang Ho et.al. | 2509.17894 | null |
2025-09-23 | Breaking Token Into Concepts: Exploring Extreme Compression in Token Representation Via Compositional Shared Semantics | Kavin R V et.al. | 2509.17737 | null |
2025-09-22 | RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion | Geonho Bang et.al. | 2509.17712 | null |
2025-09-22 | Stratification of the half-density quantization of the Jeffrey-Weitsman-Witten invariants | Adrian Chitan et.al. | 2509.17656 | null |
2025-09-22 | Evaluating the Energy Efficiency of NPU-Accelerated Machine Learning Inference on Embedded Microcontrollers | Anastasios Fanariotis et.al. | 2509.17533 | null |
2025-09-22 | MapCoder-Lite: Squeezing Multi-Agent Coding into a Single Small LLM | Woongkyu Lee et.al. | 2509.17489 | null |
2025-09-22 | Learning Dexterous Manipulation with Quantized Hand State | Ying Feng et.al. | 2509.17450 | null |
2025-09-23 | QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models | Hyesung Jeon et.al. | 2509.17428 | null |
2025-09-22 | Physics-Informed Operator Learning for Hemodynamic Modeling | Ryan Chappell et.al. | 2509.17293 | null |
2025-09-21 | On the Quantization of the Electromagnetic Field with Magnetic Monopoles | Kanan Anwar et.al. | 2509.17284 | null |
2025-09-21 | PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models | He Xiao et.al. | 2509.16989 | null |
2025-09-21 | Equip Pre-ranking with Target Attention by Residual Quantization | Yutong Li et.al. | 2509.16931 | null |
2025-09-21 | PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion | Xuewan He et.al. | 2509.16897 | null |
2025-09-20 | Knowledge Distillation for Variational Quantum Convolutional Neural Networks on Heterogeneous Data | Kai Yu et.al. | 2509.16699 | null |
2025-09-20 | When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs | Abhirama Subramanyam Penamakuri et.al. | 2509.16633 | null |
2025-09-20 | The Role of Vocabularies in Learning Sparse Representations for Ranking | Hiun Kim et.al. | 2509.16621 | null |
2025-09-20 | Federated Learning with Ad-hoc Adapter Insertions: The Case of Soft-Embeddings for Training Classifier-as-Retriever | Marijan Fofonjka et.al. | 2509.16508 | null |
2025-09-20 | PrediPrune: Reducing Verification Overhead in Souper with Machine Learning Driven Pruning | Ange-Thierry Ishimwe et.al. | 2509.16497 | null |
2025-09-20 | Eye Gaze Tells You Where to Compute: Gaze-Driven Efficient VLMs | Qinyu Chen et.al. | 2509.16476 | null |
2025-09-19 | Locally Purified Maximally Mixed States At Scale: Entanglement Pruning and Symmetries | Amit Jamadagni et.al. | 2509.16439 | null |
2025-09-19 | Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research | Richard Diehl Martinez et.al. | 2509.16413 | null |
2025-09-19 | A Unified AI Approach for Continuous Monitoring of Human Health and Diseases from Intensive Care Unit to Home with Physiological Foundation Models (UNIPHY+) | Minxiao Wang et.al. | 2509.16348 | null |
2025-09-19 | The Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis | Jyun-Ping Kao et.al. | 2509.16328 | null |
2025-09-18 | Language Modeling with Learned Meta-Tokens | Alok N. Shah et.al. | 2509.16278 | null |
2025-09-19 | DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning | Sikai Bai et.al. | 2509.16105 | null |
2025-09-19 | DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching | Meng Yang et.al. | 2509.16017 | null |
2025-09-19 | DISPATCH: Distilling Selective Patches for Speech Enhancement | Dohwan Kim et.al. | 2509.15922 | null |
2025-09-19 | RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation | Davide Ettori et.al. | 2509.15724 | null |
2025-09-19 | Once Upon a Time: Interactive Learning for Storytelling with Small Language Models | Jonas Mayer Martins et.al. | 2509.15714 | null |
2025-09-19 | Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance | Yuxuan Liang et.al. | 2509.15704 | null |
2025-09-19 | pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation | Tong Wang et.al. | 2509.15638 | null |
2025-09-19 | MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training | Junbiao Pang et.al. | 2509.15514 | null |
2025-09-19 | Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers | Zahra Aref et.al. | 2509.15498 | null |
2025-09-19 | Backdoor Mitigation via Invertible Pruning Masks | Kealan Dunnett et.al. | 2509.15497 | null |
2025-09-18 | IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs | Junchen Zhao et.al. | 2509.15455 | null |
2025-09-18 | Fair-GPTQ: Bias-Aware Quantization for Large Language Models | Irina Proskurina et.al. | 2509.15206 | null |
2025-09-18 | MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration | Giorgos Armeniakos et.al. | 2509.15187 | null |
2025-09-18 | No Modality Left Behind: Adapting to Missing Modalities via Knowledge Distillation for Brain Tumor Segmentation | Shenghao Zhu et.al. | 2509.15017 | null |
2025-09-19 | MeanFlowSE: one-step generative speech enhancement via conditional mean flow | Duojia Li et.al. | 2509.14858 | null |
2025-09-18 | Delta Knowledge Distillation for Large Language Models | Yihan Cao et.al. | 2509.14526 | null |
2025-09-17 | NIRVANA: Structured pruning reimagined for large language models compression | Mengting Ai et.al. | 2509.14230 | null |
2025-09-17 | Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions | Michal Szczepanski et.al. | 2509.14165 | null |
2025-09-17 | SV-Mixer: Replacing the Transformer Encoder with Lightweight MLPs for Self-Supervised Model Compression in Speaker Verification | Jungwoo Heo et.al. | 2509.14136 | null |
2025-09-17 | MOCHA: Multi-modal Objects-aware Cross-arcHitecture Alignment | Elena Camuffo et.al. | 2509.14001 | null |
2025-09-17 | Asymptotic Analysis of Nonlinear One-Bit Precoding in Massive MIMO Systems via Approximate Message Passing | Zheyu Wu et.al. | 2509.13955 | null |
2025-09-19 | Efficient Quantization-Aware Neural Receivers: Beyond Post-Training Quantization | SaiKrishna Saketh Yellapragada et.al. | 2509.13786 | null |
2025-09-17 | TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge | Zhirui Huang et.al. | 2509.13765 | null |
2025-09-18 | DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning | Yaxin Gao et.al. | 2509.13723 | null |
2025-09-17 | InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management | Liangtao Lin et.al. | 2509.13704 | null |
2025-09-17 | A High-Quality and Low-Complexity Streamable Neural Speech Codec with Knowledge Distillation | En-Wei Zhang et.al. | 2509.13670 | null |
2025-09-16 | AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering | Onat Gungor et.al. | 2509.13514 | null |
2025-09-16 | Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization | Hao Xu et.al. | 2509.13482 | null |
2025-09-16 | LLMs for energy and macronutrients estimation using only text data from 24-hour dietary recalls: a parameter-efficient fine-tuning experiment using a 10-shot prompt | Rodrigo M Carrillo-Larco et.al. | 2509.13268 | null |
2025-09-18 | HAM: Hierarchical Adapter Merging for Scalable Continual Learning | Eric Nuertey Coleman et.al. | 2509.13211 | null |
2025-09-16 | Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance | Ligang Chang et.al. | 2509.13210 | null |
2025-09-16 | Multi-Model Synthetic Training for Mission-Critical Small Language Models | Nolan Platt et.al. | 2509.13047 | null |
2025-09-16 | Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models | Yuval Weiss et.al. | 2509.12960 | null |
2025-09-17 | A Novel Compression Framework for YOLOv8: Achieving Real-Time Aerial Object Detection on Edge Devices via Structured Pruning and Channel-Wise Distillation | Melika Sabaghian et.al. | 2509.12918 | null |
2025-09-16 | Energy-Efficient Quantized Federated Learning for Resource-constrained IoT devices | Wilfrid Sougrinoma Compaoré et.al. | 2509.12814 | null |
2025-09-16 | NEFT: A Unified Transformer Framework for Efficient Near-Field CSI Feedback in XL-MIMO Systems | Haiyang Li et.al. | 2509.12748 | null |
2025-09-16 | Effective Gaussian Management for High-fidelity Object Reconstruction | Jiateng Liu et.al. | 2509.12742 | null |
2025-09-16 | ZTree: A Subgroup Identification Based Decision Tree Learning Framework | Eric Cheng et.al. | 2509.12688 | null |
2025-09-16 | The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning | Titong Jiang et.al. | 2509.12594 | null |
2025-09-16 | iCD: A Implicit Clustering Distillation Mathod for Structural Information Mining | Xiang Xue et.al. | 2509.12553 | null |
2025-09-16 | LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations | Robin Vujanic et.al. | 2509.12539 | null |
2025-09-15 | Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction | Ryan Lucas et.al. | 2509.12464 | null |
2025-09-15 | GhostNetV3-Small: A Tailored Architecture and Comparative Study of Distillation Strategies for Tiny Images | Florian Zager et.al. | 2509.12380 | null |
2025-09-15 | Unsupervised Atomic Data Mining via Multi-Kernel Graph Autoencoders for Machine Learning Force Fields | Hong Sun et.al. | 2509.12358 | null |
2025-09-15 | SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation | Hui Li et.al. | 2509.12086 | null |
2025-09-15 | AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models | Sangjun Lee et.al. | 2509.12019 | null |
2025-09-15 | CLAIRE: A Dual Encoder Network with RIFT Loss and Phi-3 Small Language Model Based Interpretability for Cross-Modality Synthetic Aperture Radar and Optical Land Cover Segmentation | Debopom Sutradhar et.al. | 2509.11952 | null |
2025-09-16 | Enriched text-guided variational multimodal knowledge distillation network (VMD) for automated diagnosis of plaque vulnerability in 3D carotid artery MRI | Bo Cao et.al. | 2509.11924 | null |
2025-09-15 | SpecVLM: Fast Speculative Decoding in Vision-Language Models | Haiduo Huang et.al. | 2509.11815 | null |
2025-09-15 | Visualization and Analysis of the Loss Landscape in Graph Neural Networks | Samir Moustafa et.al. | 2509.11792 | null |
2025-09-15 | Quantization Errors, Human–AI Interaction, and Approximate Fixed Points in $L^1(\mu)$ | Faruk Alpay et.al. | 2509.11700 | null |
2025-09-15 | DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks | Jing Zou et.al. | 2509.11525 | null |
2025-09-14 | Knowledge Distillation for Sensing-Assisted Long-Term Beam Tracking in mmWave Communications | Mengyuan Ma et.al. | 2509.11419 | null |
2025-09-14 | Investigating the Lottery Ticket Hypothesis for Variational Quantum Circuits | Michael Kölle et.al. | 2509.11190 | null |
2025-09-16 | Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs | Hang Guo et.al. | 2509.11177 | null |
2025-09-14 | SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting | Ashkan Taghipour et.al. | 2509.11116 | null |
2025-09-13 | GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings | Yixuan Tang et.al. | 2509.10844 | null |
2025-09-12 | Automated MCQA Benchmarking at Scale: Evaluating Reasoning Traces as Retrieval Sources for Domain Adaptation of Small Language Models | Ozan Gokdemir et.al. | 2509.10744 | null |
2025-09-12 | Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs | Yixiao Zhou et.al. | 2509.10377 | null |
2025-09-12 | Efficient Learned Image Compression Through Knowledge Distillation | Fabien Allemand et.al. | 2509.10366 | null |
2025-09-12 | I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation | Jordan Sassoon et.al. | 2509.10334 | null |
2025-09-12 | Investigating Language Model Capabilities to Represent and Process Formal Knowledge: A Preliminary Study to Assist Ontology Engineering | Hanna Abi Akl et.al. | 2509.10249 | null |
2025-09-12 | FedBiF: Communication-Efficient Federated Learning via Bits Freezing | Shiwei Li et.al. | 2509.10161 | null |
2025-09-12 | Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization | Yifan Chang et.al. | 2509.10140 | null |
2025-09-12 | Efficient and Accurate Downfacing Visual Inertial Odometry | Jonas Kühne et.al. | 2509.10021 | null |
2025-09-12 | Toward Green Code: Prompting Small Language Models for Energy-Efficient Code Generation | Humza Ashraf et.al. | 2509.09947 | null |
2025-09-12 | Acoustic Scene Classification Using CNN-GRU Model Without Knowledge Distillation | Ee-Leng Tan et.al. | 2509.09931 | null |
2025-09-11 | ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms | Bingxin Xu et.al. | 2509.09679 | null |
2025-09-11 | ReBaNO: Reduced Basis Neural Operator Mitigating Generalization Gaps and Achieving Discretization Invariance | Haolan Zheng et.al. | 2509.09611 | null |
2025-09-11 | Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference | Haoran Wu et.al. | 2509.09505 | null |
2025-09-11 | Unified Start, Personalized End: Progressive Pruning for Efficient 3D Medical Image Segmentation | Linhao Li et.al. | 2509.09267 | link |
2025-09-11 | Adaptive Knowledge Distillation using a Device-Aware Teacher for Low-Complexity Acoustic Scene Classification | Seung Gyu Jeong et.al. | 2509.09262 | null |
2025-09-11 | SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models | Hengyu Fang et.al. | 2509.09090 | null |
2025-09-10 | CSI Compression Beyond Latents: End-to-End Hybrid Attention-CNN Networks with Entropy Regularization | Maryam Ansarifard et.al. | 2509.08776 | null |
2025-09-10 | Compressing CNN models for resource-constrained systems by channel and layer pruning | Ahmed Sadaqa et.al. | 2509.08714 | null |
2025-09-10 | BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference | Wenlun Zhang et.al. | 2509.08542 | null |
2025-09-12 | SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors | Ruoxuan Li et.al. | 2509.08395 | null |
2025-09-10 | Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning | Wei Huang et.al. | 2509.08255 | null |
2025-09-10 | Strategies for Improving Communication Efficiency in Distributed and Federated Learning: Compression, Local Training, and Personalization | Kai Yi et.al. | 2509.08233 | null |
2025-09-09 | Risk-Bounded Multi-Agent Visual Navigation via Dynamic Budget Allocation | Viraj Parimi et.al. | 2509.08157 | null |
2025-09-09 | Tensor-Train Operator Inference | Engin Danis et.al. | 2509.08071 | null |
2025-09-09 | SA-OOSC: A Multimodal LLM-Distilled Semantic Communication Framework for Enhanced Coding Efficiency with Scenario Understanding | Feifan Zhang et.al. | 2509.07436 | null |
2025-09-09 | The Role of Exploration Modules in Small Language Models for Knowledge Graph Question Answering | Yi-Jie Cheng et.al. | 2509.07399 | null |
2025-09-09 | Knowledge Distillation Driven Semantic NOMA for Image Transmission with Diffusion Model | Qifei Wang et.al. | 2509.07363 | null |
2025-09-09 | Word2Spike: Poisson Rate Coding for Associative Memories and Neuromorphic Algorithms | Archit Kalra et.al. | 2509.07361 | null |
2025-09-09 | Quantization of the electromagnetic fields from single atomic or molecular radiators | Valerica Raicu et.al. | 2509.07359 | null |
2025-09-08 | Recursive algorithm for constructing antisymmetric fermionic states in first quantization mapping | E. Rule et.al. | 2509.07279 | null |
2025-09-08 | HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring | Xin Wang et.al. | 2509.07260 | null |
2025-09-08 | Efficient Multi-Agent Coordination via Dynamic Joint-State Graph Construction | Yanlin Zhou et.al. | 2509.07234 | null |
2025-09-08 | Efficient Low-Memory Fast Stack Decoding with Variance Polarization for PAC Codes | Mohsen Moradi et.al. | 2509.07231 | null |
2025-09-08 | Explaining How Quantization Disparately Skews a Model | Abhimanyu Bellam et.al. | 2509.07222 | null |
2025-09-07 | MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning | Jiarui Chen et.al. | 2509.07021 | null |
2025-09-08 | H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers | Wenhao Li et.al. | 2509.06956 | null |
2025-09-08 | COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens | Eugene Kwek et.al. | 2509.06836 | null |
2025-09-08 | Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning | Song Yu et.al. | 2509.06436 | null |
2025-09-08 | Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models | Jaemin Son et.al. | 2509.06415 | null |
2025-09-08 | 3DOF+Quantization: 3DGS quantization for large scenes with limited Degrees of Freedom | Matthieu Gendrin et.al. | 2509.06400 | null |
2025-09-08 | Variational Garrote for Statistical Physics-based Sparse and Robust Variable Selection | Hyungjoon Soh et.al. | 2509.06383 | null |
2025-09-08 | Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks? | Junjie Mu et.al. | 2509.06350 | null |
2025-09-08 | LoaQ: Layer-wise Output Approximation Quantization | Li Lin et.al. | 2509.06297 | null |
2025-09-15 | FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving | Kyungmin Bin et.al. | 2509.06261 | null |
2025-09-10 | BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models | Yuming Li et.al. | 2509.06040 | null |
2025-09-07 | StripDet: Strip Attention-Based Lightweight 3D Object Detection from Point Cloud | Weichao Wang et.al. | 2509.05954 | null |
2025-09-07 | Quantization of bounded symplectic domains associated with compact Lie groups | Alexey A. Sharapov et.al. | 2509.05931 | null |
2025-09-06 | Batalin-Fradkin-Vilkovisky Quantization of FLPR model | Ansha S. Nair et.al. | 2509.05632 | null |
2025-09-06 | Quantization of spin circular photogalvanic effect in altermagnetic Weyl semimetals | Hiroki Yoshida et.al. | 2509.05620 | null |
2025-09-06 | SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning | Hanzhen Wang et.al. | 2509.05614 | null |
2025-09-09 | Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation | Hongyan Xie et.al. | 2509.05602 | null |
2025-09-06 | ProfilingAgent: Profiling-Guided Agentic Reasoning for Adaptive Model Optimization | Sadegh Jafari et.al. | 2509.05584 | null |
2025-09-06 | Sensitivity-Aware Post-Training Quantization for Deep Neural Networks | Zekang Zheng et.al. | 2509.05576 | null |
2025-09-05 | SuperSNN: A Hardware-Aware Framework for Physically Realizable, High-Performance Superconducting Spiking Neural Network Chips | Changxu Song et.al. | 2509.05532 | null |
2025-09-05 | Dynamic Sensitivity Filter Pruning using Multi-Agent Reinforcement Learning For DCNN’s | Iftekhar Haider Chowdhury et.al. | 2509.05446 | null |
2025-09-05 | Accuracy-Constrained CNN Pruning for Efficient and Reliable EEG-Based Seizure Detection | Mounvik K et.al. | 2509.05190 | null |
2025-09-05 | FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies | Moritz Reuss et.al. | 2509.04996 | null |
2025-09-05 | PLaMo 2 Technical Report | Preferred Networks et.al. | 2509.04897 | null |
2025-09-05 | AI-Driven Fronthaul Link Compression in Wireless Communication Systems: Review and Method Design | Keqin Zhang et.al. | 2509.04805 | null |
2025-09-05 | STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs | Han Liang et.al. | 2509.04719 | null |
2025-09-08 | Advancing SLM Tool-Use Capability using Reinforcement Learning | Dhruvi Paprunia et.al. | 2509.04518 | null |
2025-09-02 | ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models | Biddut Sarker Bijoy et.al. | 2509.04508 | null |
2025-09-04 | PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference | Krishna Teja Chitty-Venkata et.al. | 2509.04377 | null |
2025-09-04 | Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression | Sara Makenali et.al. | 2509.04244 | null |
2025-09-04 | Real Time FPGA Based Transformers & VLMs for Vision Tasks: SOTA Designs and Optimizations | Safa Mohammed Sali et.al. | 2509.04162 | null |
2025-09-04 | Real Time FPGA Based CNNs for Detection, Classification, and Tracking in Autonomous Systems: State of the Art Designs and Optimizations | Safa Mohammed Sali et.al. | 2509.04153 | null |
2025-09-04 | Duality between polyhedral approximation of value functions and optimal quantization of measures | Abdellah Bulaich Mehamdi et.al. | 2509.04101 | null |
2025-09-04 | Robust MIMO Semantic Communication with Imperfect CSI via Knowledge Distillation | Mingze Gong et.al. | 2509.04005 | null |
2025-09-04 | Data-Augmented Quantization-Aware Knowledge Distillation | Justin Kur et.al. | 2509.03850 | null |
2025-09-03 | QuantV2X: A Fully Quantized Multi-Agent System for Cooperative Perception | Seth Z. Zhao et.al. | 2509.03704 | null |
2025-09-03 | DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling | Yubo Gao et.al. | 2509.03472 | null |
2025-09-08 | Amplifying Effective CXL Memory Bandwidth for LLM Inference via Transparent Near-Data Processing | Rui Xie et.al. | 2509.03377 | null |
2025-09-03 | NeurStore: Efficient In-database Deep Learning Model Management System | Siqi Xiang et.al. | 2509.03228 | null |
2025-09-03 | BAMG: A Block-Aware Monotonic Graph Index for Disk-Based Approximate Nearest Neighbor Search | Huiling Li et.al. | 2509.03226 | null |
2025-09-03 | CapsBeam: Accelerating Capsule Network based Beamformer for Ultrasound Non-Steered Plane Wave Imaging on Field Programmable Gate Array | Abdul Rahoof et.al. | 2509.03201 | null |
2025-09-03 | Deep Self-knowledge Distillation: A hierarchical supervised learning for coronary artery segmentation | Mingfeng Lin et.al. | 2509.03173 | null |
2025-09-03 | FastCaps: A Design Methodology for Accelerating Capsule Network on Field Programmable Gate Arrays | Abdul Rahoof et.al. | 2509.03103 | null |
2025-09-03 | Binary Quantization For LLMs Through Dynamic Grouping | Xinzhe Zheng et.al. | 2509.03054 | null |
2025-09-02 | LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference | Krishna Teja Chitty-Venkata et.al. | 2509.02753 | null |
2025-09-02 | A quantization of the $\operatorname{SL}_2(\mathbb{C})$-Chern-Simons invariant of tangle exteriors | Calvin McPhail-Snyder et.al. | 2509.02365 | null |
2025-09-02 | All-optical band structure reconstruction and onset of Landau quantization of Dirac fermions | Josef Riepl et.al. | 2509.02362 | null |
2025-09-02 | Operator Algebras and Third Quantization | Yidong Chen et.al. | 2509.02293 | null |
📊 130 papers
📅 Publish Date | 📝 Title | 👥 Authors | 📄 arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-23 | Generative data augmentation for biliary tract detection on intraoperative images | Cristina Iacono et.al. | 2509.18958 | null |
2025-09-23 | PIE: Perception and Interaction Enhanced End-to-End Motion Planning for Autonomous Driving | Chengran Yuan et.al. | 2509.18609 | null |
2025-09-23 | SynSonic: Augmenting Sound Event Detection through Text-to-Audio Diffusion ControlNet and Effective Sample Filtering | Jiarui Hai et.al. | 2509.18603 | null |
2025-09-23 | Efficient Breast and Ovarian Cancer Classification via ViT-Based Preprocessing and Transfer Learning | Richa Rawat et.al. | 2509.18553 | null |
2025-09-23 | Reverse-Complement Consistency for DNA Language Models | Mingqian Ma et.al. | 2509.18529 | null |
2025-09-21 | Automatic Classification of Magnetic Chirality of Solar Filaments from H-Alpha Observations | Alexis Chalmers et.al. | 2509.18214 | null |
2025-09-22 | Intra-Cluster Mixup: An Effective Data Augmentation Technique for Complementary-Label Learning | Tan-Ha Mai et.al. | 2509.17971 | null |
2025-09-22 | SeqUDA-Rec: Sequential User Behavior Enhanced Recommendation via Global Unsupervised Data Augmentation for Personalized Content Marketing | Ruihan Luo et.al. | 2509.17361 | null |
2025-09-21 | Enhanced Detection of Tiny Objects in Aerial Images | Kihyun Kim et.al. | 2509.17078 | null |
2025-09-23 | Penalizing Boundary Activation for Object Completeness in Diffusion Models | Haoyang Xu et.al. | 2509.16968 | null |
2025-09-20 | IPF-RDA: An Information-Preserving Framework for Robust Data Augmentation | Suorong Yang et.al. | 2509.16678 | null |
2025-09-20 | MedCutMix: A Data-Centric Approach to Improve Radiology Vision-Language Pre-training with Disease Awareness | Sinuo Wang et.al. | 2509.16673 | null |
2025-09-20 | AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval | Hyun Jun Kim et.al. | 2509.16649 | null |
2025-09-19 | Intrinsic Meets Extrinsic Fairness: Assessing the Downstream Impact of Bias Mitigation in Large Language Models | Mina Arzaghi et.al. | 2509.16462 | null |
2025-09-19 | Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval | Pranjal A. Chitale et.al. | 2509.16442 | null |
2025-09-19 | DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching | Meng Yang et.al. | 2509.16017 | null |
2025-09-19 | Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization | Yun Tang et.al. | 2509.15579 | null |
2025-09-19 | Contrastive Learning with Spectrum Information Augmentation in Abnormal Sound Detection | Xinxin Meng et.al. | 2509.15570 | null |
2025-09-18 | Generative AI Meets Wireless Sensing: Towards Wireless Foundation Model | Zheng Yang et.al. | 2509.15258 | null |
2025-09-17 | GenCAD-3D: CAD Program Generation using Multimodal Latent Space Alignment and Synthetic Dataset Balancing | Nomi Yu et.al. | 2509.15246 | null |
2025-09-18 | Synthetic-to-Real Object Detection using YOLOv11 and Domain Randomization Strategies | Luisa Torquato Niño et.al. | 2509.15045 | null |
2025-09-18 | Data Augmentation via Latent Diffusion Models for Detecting Smell-Related Objects in Historical Artworks | Ahmed Sheta et.al. | 2509.14755 | null |
2025-09-18 | SpeechMLC: Speech Multi-label Classification | Miseul Kim et.al. | 2509.14677 | null |
2025-09-18 | How Does Instrumental Music Help SingFake Detection? | Xuanjun Chen et.al. | 2509.14675 | null |
2025-09-18 | SWE-QA: Can Language Models Answer Repository-level Code Questions? | Weihan Peng et.al. | 2509.14635 | null |
2025-09-18 | Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation | Miseul Kim et.al. | 2509.14632 | null |
2025-09-18 | LSTC-MDA: A Unified Framework for Long-Short Term Temporal Convolution and Mixed Data Augmentation in Skeleton-Based Action Recognition | Feng Ding et.al. | 2509.14619 | null |
2025-09-18 | Leveraging IndoBERT and DistilBERT for Indonesian Emotion Classification in E-Commerce Reviews | William Christian et.al. | 2509.14611 | null |
2025-09-18 | VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models | Huanchen Wang et.al. | 2509.14571 | null |
2025-09-18 | Learning to Retrieve for Environmental Knowledge Discovery: An Augmentation-Adaptive Self-Supervised Learning Framework | Shiyuan Luo et.al. | 2509.14563 | null |
2025-09-18 | Data coarse graining can improve model performance | Alex Nguyen et.al. | 2509.14498 | null |
2025-09-17 | Sequential Data Augmentation for Generative Recommendation | Geon Lee et.al. | 2509.13648 | null |
2025-09-17 | Multimodal signal fusion for stress detection using deep neural networks: a novel approach for converting 1D signals to unified 2D images | Yasin Hasanpoor et.al. | 2509.13636 | null |
2025-09-16 | Adversarial Appearance Learning in Augmented Cityscapes for Pedestrian Recognition in Autonomous Driving | Artem Savkin et.al. | 2509.13507 | null |
2025-09-16 | Contrastive timbre representations for musical instrument and synthesizer retrieval | Gwendal Le Vaillant et.al. | 2509.13285 | null |
2025-09-16 | Time-step Mixup for Efficient Spiking Knowledge Transfer from Appearance to Event Domain | Yuqi Xie et.al. | 2509.12959 | null |
2025-09-16 | Synthetic Protein-Ligand Complex Generation for Deep Molecular Docking | Sofiene Khiari et.al. | 2509.12915 | null |
2025-09-16 | Cumulative Consensus Score: Label-Free and Model-Agnostic Evaluation of Object Detectors in Deployment | Avinaash Manoharan et.al. | 2509.12871 | null |
2025-09-20 | Data Augmentation for Maltese NLP using Transliterated and Machine Translated Arabic Data | Kurt Micallef et.al. | 2509.12853 | null |
2025-09-16 | Double Helix Diffusion for Cross-Domain Anomaly Image Generation | Linchun Wu et.al. | 2509.12787 | null |
2025-09-15 | Robust Fetal Pose Estimation across Gestational Ages via Cross-Population Augmentation | Sebastian Diaz et.al. | 2509.12062 | null |
2025-09-15 | Learning to Generate 4D LiDAR Sequences | Ao Liang et.al. | 2509.11959 | null |
2025-09-15 | Automated training of neural-network interatomic potentials | Davide Bidoggia et.al. | 2509.11703 | null |
2025-09-15 | DTGen: Generative Diffusion-Based Few-Shot Data Augmentation for Fine-Grained Dirty Tableware Recognition | Lifei Hao et.al. | 2509.11661 | null |
2025-09-15 | Task Decoding based on Eye Movements using Synthetic Data Augmentation | Shanmuka Sadhu et.al. | 2509.11547 | null |
2025-09-14 | An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift | Peihong Zhang et.al. | 2509.11168 | null |
2025-09-14 | An Advanced Convolutional Neural Network for Bearing Fault Diagnosis under Limited Data | Shengke Sun et.al. | 2509.11053 | null |
2025-09-13 | Point-Plane Projections for Accurate LiDAR Semantic Segmentation in Small Data Scenarios | Simone Mosco et.al. | 2509.10841 | null |
2025-09-01 | MIDOG 2025 Track 2: A Deep Learning Model for Classification of Atypical and Normal Mitotic Figures under Class and Hardness Imbalances | Sujatha Kotte et.al. | 2509.10502 | null |
2025-09-12 | Improving Audio Event Recognition with Consistency Regularization | Shanmuka Sadhu et.al. | 2509.10391 | null |
2025-09-12 | Scaling Arabic Medical Chatbots Using Synthetic Data: Enhancing Generative AI with Synthetic Patient Records | Abdulrahman Allam et.al. | 2509.10108 | null |
2025-09-11 | Combining Textual and Spectral Features for Robust Classification of Pilot Communications | Abdullah All Tanvir et.al. | 2509.09752 | null |
2025-09-24 | Structure Matters: Brain Graph Augmentation via Learnable Edge Masking for Data-efficient Psychiatric Diagnosis | Mujie Liu et.al. | 2509.09744 | null |
2025-09-11 | Virtual staining for 3D X-ray histology of bone implants | Sarah C. Irvine et.al. | 2509.09235 | null |
2025-09-11 | Target-oriented Multimodal Sentiment Classification with Counterfactual-enhanced Debiasing | Zhiyue Liu et.al. | 2509.09160 | null |
2025-09-10 | Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge | Mohammad Saqib Hasan et.al. | 2509.08808 | null |
2025-09-10 | ADHDeepNet From Raw EEG to Diagnosis: Improving ADHD Diagnosis through Temporal-Spatial Processing, Adaptive Attention Mechanisms, and Explainability in Raw EEG Signals | Ali Amini et.al. | 2509.08779 | null |
2025-09-10 | Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition | Matthew Nolan et.al. | 2509.08225 | null |
2025-09-09 | Transformer-Based Approach to Optimal Sensor Placement for Structural Health Monitoring of Probe Cards | Mehdi Bejani et.al. | 2509.07603 | null |
2025-09-09 | From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation | Mardiyyah Oduwole et.al. | 2509.07471 | null |
2025-09-08 | Breast Cancer Detection in Thermographic Images via Diffusion-Based Augmentation and Nonlinear Feature Fusion | Sepehr Salem et.al. | 2509.07277 | null |
2025-09-08 | Pothole Detection and Recognition based on Transfer Learning | Mang Hu et.al. | 2509.06750 | null |
2025-09-08 | Contrastive Self-Supervised Network Intrusion Detection using Augmented Negative Pairs | Jack Wilkie et.al. | 2509.06550 | null |
2025-09-08 | IGAff: Benchmarking Adversarial Iterative and Genetic Affine Algorithms on Deep Neural Networks | Sebastian-Vasile Echim et.al. | 2509.06459 | null |
2025-09-08 | CAPMix: Robust Time Series Anomaly Detection Based on Abnormal Assumptions with Dual-Space Mixup | Xudong Mou et.al. | 2509.06419 | null |
2025-09-08 | PL-CA: A Parametric Legal Case Augmentation Framework | Ao Chang et.al. | 2509.06356 | null |
2025-09-07 | Exploring Light-Weight Object Recognition for Real-Time Document Detection | Lucas Wojcik et.al. | 2509.06246 | null |
2025-09-07 | Learning in ImaginationLand: Omnidirectional Policies through 3D Generative Models (OP-Gen) | Yifei Ren et.al. | 2509.06191 | null |
2025-09-06 | CardiacFlow: 3D+t Four-Chamber Cardiac Shape Completion and Generation via Flow Matching | Qiang Ma et.al. | 2509.05754 | null |
2025-09-05 | DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation | Haitao Tian et.al. | 2509.05543 | null |
2025-09-05 | Handling Data Gaps for the Next Generation of Gravitational-Wave Observatories | Noah Pearson et.al. | 2509.05479 | null |
2025-09-01 | Handling imbalance and few-sample size in ML based Onion disease classification | Abhijeet Manoj Pal et.al. | 2509.05341 | null |
2025-08-30 | A Dataset Generation Scheme Based on Video2EEG-SPGN-Diffusion for SEED-VD | Yunfei Guo et.al. | 2509.05321 | null |
2025-09-05 | Uncertain but Useful: Leveraging CNN Variability into Data Augmentation | Inés Gonzalez-Pepe et.al. | 2509.05238 | null |
2025-09-05 | SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition | Ariel Basso Madjoukeng et.al. | 2509.05188 | null |
2025-09-05 | Hybrid Matrix Factorization Based Graph Contrastive Learning for Recommendation System | Hao Chen et.al. | 2509.05115 | null |
2025-09-05 | Leveraging Transfer Learning and Mobile-enabled Convolutional Neural Networks for Improved Arabic Handwritten Character Recognition | Mohsine El Khayati et.al. | 2509.05019 | null |
2025-09-05 | Optimizing Small Transformer-Based Language Models for Multi-Label Sentiment Analysis in Short Texts | Julius Neumann et.al. | 2509.04982 | null |
2025-09-05 | DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation | Tien Pham et.al. | 2509.04970 | null |
2025-09-05 | A transformer-BiGRU-based framework with data augmentation and confident learning for network intrusion detection | Jiale Zhang et.al. | 2509.04925 | null |
2025-09-05 | Evaluating Multiple Instance Learning Strategies for Automated Sebocyte Droplet Counting | Maryam Adelipour et.al. | 2509.04895 | null |
2025-08-29 | MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification | Alice Schiavone et.al. | 2509.04471 | null |
2025-09-04 | TauGenNet: Plasma-Driven Tau PET Image Synthesis via Text-Guided 3D Diffusion Models | Yuxin Gong et.al. | 2509.04269 | null |
2025-09-04 | How many patients could we save with LLM priors? | Shota Arai et.al. | 2509.04250 | null |
2025-09-04 | Explicit and Implicit Data Augmentation for Social Event Detection | Congbo Ma et.al. | 2509.04202 | null |
2025-09-04 | Chest X-ray Pneumothorax Segmentation Using EfficientNet-B4 Transfer Learning in a U-Net Architecture | Alvaro Aranibar Roque et.al. | 2509.03950 | null |
2025-09-04 | A Generative Foundation Model for Chest Radiography | Yuanfeng Ji et.al. | 2509.03903 | null |
2025-09-04 | Data-Augmented Quantization-Aware Knowledge Distillation | Justin Kur et.al. | 2509.03850 | null |
2025-09-03 | Lightweight image segmentation for echocardiography | Anders Kjelsrud et.al. | 2509.03631 | null |
2025-09-04 | Invariant Features for Global Crop Type Classification | Xin-Yi Tong et.al. | 2509.03497 | null |
2025-09-03 | Joint Training of Image Generator and Detector for Road Defect Detection | Kuan-Chuan Peng et.al. | 2509.03465 | null |
2025-09-02 | Enhancing Machine Learning for Imbalanced Medical Data: A Quantum-Inspired Approach to Synthetic Oversampling (QI-SMOTE) | Vikas Kashtriya et.al. | 2509.02863 | null |
2025-08-29 | Foundation Model-Driven Classification of Atypical Mitotic Figures with Domain-Aware Training Strategies | Piotr Giedziun et.al. | 2509.02601 | null |
2025-09-02 | PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture | Fakhraddin Alwajih et.al. | 2509.02550 | null |
2025-09-02 | EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling | Lingzhi Shen et.al. | 2509.02450 | null |
2025-09-02 | Improving Electroencephalogram-Based Deception Detection in Concealed Information Test under Low Stimulus Heterogeneity | Suhye Kim et.al. | 2509.02234 | null |
2025-09-02 | Enhancing Zero-Shot Pedestrian Attribute Recognition with Synthetic Data Generation: A Comparative Study with Image-To-Image Diffusion Models | Pablo Ayuso-Albizu et.al. | 2509.02161 | null |
2025-09-02 | A Data-Centric Approach to Pedestrian Attribute Recognition: Synthetic Augmentation via Prompt-driven Diffusion Models | Alejandro Alonso et.al. | 2509.02099 | null |
2025-09-16 | Abex-rat: Synergizing Abstractive Augmentation and Adversarial Training for Classification of Occupational Accident Reports | Jian Chen et.al. | 2509.02072 | null |
2025-09-01 | CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays | Runduo Han et.al. | 2509.01399 | null |
2025-09-01 | MARS: Modality-Aligned Retrieval for Sequence Augmented CTR Prediction | Yutian Xiao et.al. | 2509.01184 | null |
2025-08-31 | A Unified Denoising and Adaptation Framework for Self-Supervised Bengali Dialectal ASR | Swadhin Biswas et.al. | 2509.00988 | null |
2025-09-05 | Semi-Supervised Bayesian GANs with Log-Signatures for Uncertainty-Aware Credit Card Fraud Detection | David Hirnschall et.al. | 2509.00931 | null |
2025-08-30 | NoiseCutMix: A Novel Data Augmentation Approach by Mixing Estimated Noise in Diffusion Models | Shumpei Takezaki et.al. | 2509.00378 | null |
2025-08-26 | Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition | Tai Vu et.al. | 2509.00077 | null |
2025-08-29 | A Multi-Stage Fine-Tuning and Ensembling Strategy for Pancreatic Tumor Segmentation in Diagnostic and Therapeutic MRI | Omer Faruk Durugol et.al. | 2508.21775 | null |
2025-08-29 | QZhou-Embedding Technical Report | Peng Yu et.al. | 2508.21632 | null |
2025-08-29 | Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model | Zhaofeng Zhong et.al. | 2508.21313 | null |
2025-08-28 | Reverse Imaging for Wide-spectrum Generalization of Cardiac MRI Segmentation | Yidong Zhao et.al. | 2508.21254 | null |
2025-08-26 | CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples | Kyohoon Jin et.al. | 2508.21083 | null |
2025-08-28 | Improved photometric redshift estimations through self-organising map-based data augmentation | Yun-Hao Zhang et.al. | 2508.20903 | null |
2025-08-28 | Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision | Ao Cheng et.al. | 2508.20729 | null |
2025-08-28 | Compositionality in Time Series: A Proof of Concept using Symbolic Dynamics and Compositional Data Augmentation | Michael Hagmann et.al. | 2508.20656 | null |
2025-08-28 | Mask-Guided Multi-Channel SwinUNETR Framework for Robust MRI Classification | Smriti Joshi et.al. | 2508.20621 | null |
2025-08-28 | KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling | Yangfan Wang et.al. | 2508.20567 | null |
2025-08-28 | Enhancing Health Fact-Checking with LLM-Generated Synthetic Data | Jingze Zhang et.al. | 2508.20525 | null |
2025-08-27 | IELDG: Suppressing Domain-Specific Noise with Inverse Evolution Layers for Domain Generalized Semantic Segmentation | Qizhe Fan et.al. | 2508.19604 | null |
2025-08-27 | Improving Recommendation Fairness via Graph Structure and Representation Augmentation | Tongxin Xu et.al. | 2508.19547 | null |
2025-08-26 | Database Entity Recognition with Data Augmentation and Deep Learning | Zikun Fu et.al. | 2508.19372 | null |
2025-08-26 | HuBE: Cross-Embodiment Human-like Behavior Execution for Humanoid Robots | Shipeng Lyu et.al. | 2508.19002 | null |
2025-08-26 | Enhancing compact convolutional transformers with super attention | Simpenzwe Honore Leandre et.al. | 2508.18960 | null |
2025-08-26 | SegReConcat: A Data Augmentation Method for Voice Anonymization Attack | Ridwan Arefeen et.al. | 2508.18907 | null |
2025-08-26 | Enhancing Video-Based Robot Failure Detection Using Task Knowledge | Santosh Thoduka et.al. | 2508.18705 | null |
2025-08-26 | Auditing Approximate Machine Unlearning for Differentially Private Models | Yuechun Gu et.al. | 2508.18671 | null |
2025-08-25 | Analise de Desaprendizado de Maquina em Modelos de Classificacao de Imagens Medicas | Andreza M. C. Falcao et.al. | 2508.18509 | null |
2025-08-25 | Data Augmentation Improves Machine Unlearning | Andreza M. C. Falcao et.al. | 2508.18502 | null |
2025-08-29 | German4All – A Dataset and Model for Readability-Controlled Paraphrasing in German | Miriam Anschütz et.al. | 2508.17973 | null |
2025-08-25 | Diffusion-Based Data Augmentation for Medical Image Segmentation | Maham Nazir et.al. | 2508.17844 | null |
2025-08-25 | LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow | Kaiyan Chang et.al. | 2508.17826 | null |
2025-08-24 | LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations | Weikang Wan et.al. | 2508.17547 | null |
📊 161 papers
📅 Publish Date | 📝 Title | 👥 Authors | 📄 arXiv ID | 💻 Code |
---|---|---|---|---|
2025-09-23 | CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching | Chen Chen et.al. | 2509.19300 | null |
2025-09-23 | Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation | Sherwin Bahmani et.al. | 2509.19296 | null |
2025-09-23 | Enabling Plant Phenotyping in Weedy Environments using Multi-Modal Imagery via Synthetic and Generated Training Data | Earl Ranario et.al. | 2509.19208 | null |
2025-09-23 | GSTM-HMU: Generative Spatio-Temporal Modeling for Human Mobility Understanding | Wenying Luo et.al. | 2509.19135 | null |
2025-09-23 | Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass | Nicholas Popovič et.al. | 2509.18901 | null |
2025-09-22 | Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences | Rohit Dube et.al. | 2509.18414 | null |
2025-09-22 | Evaluating the Creativity of LLMs in Persian Literary Text Generation | Armin Tourajmehr et.al. | 2509.18401 | null |
2025-09-22 | StereoFoley: Object-Aware Stereo Audio Generation from Video | Tornike Karchkhadze et.al. | 2509.18272 | null |
2025-09-22 | Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis | Joshua Ward et.al. | 2509.18014 | null |
2025-09-22 | Autoregressive-Gaussian Mixture Models: Efficient Generative Modeling of WSS Signals | Kathrin Klein et.al. | 2509.17953 | null |
2025-09-22 | Unsupervised Learning and Representation of Mandarin Tonal Categories by a Generative CNN | Kai Schenck et.al. | 2509.17859 | null |
2025-09-22 | Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology | Saghir Alfasly et.al. | 2509.17847 | null |
2025-09-22 | GEM-T: Generative Tabular Data via Fitting Moments | Miao Li et.al. | 2509.17752 | null |
2025-09-23 | A Generative Framework for Personalized Sticker Retrieval | Changjiang Zhou et.al. | 2509.17749 | null |
2025-09-22 | PG-CE: A Progressive Generation Dataset with Constraint Enhancement for Controllable Text Generation | Yan Zhuang et.al. | 2509.17669 | null |
2025-09-22 | Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem | Neslihan Kose et.al. | 2509.17550 | null |
2025-09-22 | Audiobook-CC: Controllable Long-context Speech Generation for Multicast Audiobook | Min Liu et.al. | 2509.17516 | null |
2025-09-21 | Echo-Path: Pathology-Conditioned Echo Video Generation | Kabir Hamzah Muhammad et.al. | 2509.17190 | null |
2025-09-21 | STAR: Speech-to-Audio Generation via Representation Learning | Zeyu Xie et.al. | 2509.17164 | null |
2025-09-21 | ScenGAN: Attention-Intensive Generative Model for Uncertainty-Aware Renewable Scenario Forecasting | Yifei Wu et.al. | 2509.17119 | null |
2025-09-21 | Deep Synthetic Cross-Project Approaches for Software Reliability Growth Modeling | Taehyoun Kim et.al. | 2509.16939 | null |
2025-09-21 | PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion | Xuewan He et.al. | 2509.16897 | null |
2025-09-20 | DoubleGen: Debiased Generative Modeling of Counterfactuals | Alex Luedtke et.al. | 2509.16842 | null |
2025-09-23 | Pain in 3D: Generating Controllable Synthetic Faces for Automated Pain Assessment | Xin Lei Lin et.al. | 2509.16727 | null |
2025-09-20 | Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling | Haoran Li et.al. | 2509.16717 | null |
2025-09-20 | An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation | Maurício do V. M. da Costa et.al. | 2509.16603 | null |
2025-09-20 | A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis | Antonio Scardace et.al. | 2509.16582 | link |
2025-09-20 | SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning | Yuyang Ding et.al. | 2509.16548 | link |
2025-09-20 | ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions | Yue Huang et.al. | 2509.16543 | link |
2025-09-20 | mmExpert: Integrating Large Language Models for Comprehensive mmWave Data Synthesis and Understanding | Yifan Yan et.al. | 2509.16521 | null |
2025-09-20 | RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation | Tianyi Yan et.al. | 2509.16500 | null |
2025-09-19 | SynthIPD: assumption-lean synthetic individual patient data generation | Zixuan Zhao et.al. | 2509.16466 | null |
2025-09-19 | Entropic Causal Inference: Graph Identifiability | Spencer Compton et.al. | 2509.16463 | null |
2025-09-19 | Introducing Resizable Region Packing Problem in Image Generation, with a Heuristic Solution | Hrishikesh Sharma et.al. | 2509.16363 | null |
2025-09-19 | Guided Sequence-Structure Generative Modeling for Iterative Antibody Optimization | Aniruddh Raghu et.al. | 2509.16357 | null |
2025-09-19 | Rethinking Molecule Synthesizability with Chain-of-Reaction | Seul Lee et.al. | 2509.16084 | null |
2025-09-19 | Sampling String Vacua Using Generative Models | Moritz Walden et.al. | 2509.16029 | null |
2025-09-19 | Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation | Qi Wang et.al. | 2509.16010 | null |
2025-09-19 | On Optimal Steering to Achieve Exact Fairness | Mohit Sharma et.al. | 2509.15759 | null |
2025-09-19 | TrueMoE: Dual-Routing Mixture of Discriminative Experts for Synthetic Image Detection | Laixin Zhang et.al. | 2509.15741 | null |
2025-09-19 | Toward Medical Deepfake Detection: A Comprehensive Dataset and Novel Method | Shuaibo Li et.al. | 2509.15711 | null |
2025-09-19 | Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification | Zinan Lin et.al. | 2509.15591 | null |
2025-09-19 | LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs | Junlong Jia et.al. | 2509.15568 | null |
2025-09-19 | Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech | Xinlei Niu et.al. | 2509.15492 | null |
2025-09-18 | Discrete Flow-Based Generative Models for Measurement Optimization in Quantum Computing | Isaac L. Huidobro-Meezs et.al. | 2509.15486 | null |
2025-09-18 | Efficient Multimodal Dataset Distillation via Generative Models | Zhenghao Zhao et.al. | 2509.15472 | null |
2025-09-18 | PILOT: Steering Synthetic Data Generation with Psychological & Linguistic Output Targeting | Caitlin Cisar et.al. | 2509.15447 | null |
2025-09-18 | Causal Fingerprints of AI Generative Models | Hui Xu et.al. | 2509.15406 | null |
2025-09-18 | Autoguided Online Data Curation for Diffusion Model Training | Valeria Pais et.al. | 2509.15267 | null |
2025-09-18 | Emotion-Aware Speech Generation with Character-Specific Voices for Comics | Zhiwen Qian et.al. | 2509.15253 | null |
2025-09-18 | Fair-GPTQ: Bias-Aware Quantization for Large Language Models | Irina Proskurina et.al. | 2509.15206 | null |
2025-09-18 | Learning Mechanistic Subtypes of Neurodegeneration with a Physics-Informed Variational Autoencoder Mixture Model | Sanduni Pinnawala et.al. | 2509.15124 | null |
2025-09-19 | Sea-ing Through Scattered Rays: Revisiting the Image Formation Model for Realistic Underwater Image Generation | Vasiliki Ismiroglou et.al. | 2509.15011 | null |
2025-09-20 | SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding | Bingsong Bai et.al. | 2509.14946 | null |
2025-09-18 | Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance | Francisco Messina et.al. | 2509.14934 | null |
2025-09-19 | MeanFlowSE: one-step generative speech enhancement via conditional mean flow | Duojia Li et.al. | 2509.14858 | null |
2025-09-18 | SynBench: A Benchmark for Differentially Private Text Generation | Yidan Sun et.al. | 2509.14594 | null |
2025-09-18 | Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis | Qingyu Liu et.al. | 2509.14579 | null |
2025-09-17 | A generative model of function growth explains hidden self-similarities across biological and social systems | James Holehouse et.al. | 2509.14468 | null |
2025-09-15 | SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models | Karan Dua et.al. | 2509.14270 | null |
2025-09-17 | Quantum Reinforcement Learning-Guided Diffusion Model for Image Synthesis via Hybrid Quantum-Classical Generative Model Architectures | Chi-Sheng Chen et.al. | 2509.14163 | null |
2025-09-19 | FlightDiffusion: Revolutionising Autonomous Drone Training with Diffusion Models Generating FPV Video | Valerii Serpiva et.al. | 2509.14082 | null |
2025-09-17 | Lightweight Implicit Neural Network for Binaural Audio Synthesis | Xikun Lu et.al. | 2509.14069 | null |
2025-09-17 | Enhancing Time Awareness in Generative Recommendation | Sunkyung Lee et.al. | 2509.13957 | null |
2025-09-17 | Synthetic Data Generation for Screen Time and App Usage | Gustavo Kruger et.al. | 2509.13892 | null |
2025-09-17 | EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics | Qianxin Xia et.al. | 2509.13858 | null |
2025-09-17 | CraftMesh: High-Fidelity Generative Mesh Manipulation via Poisson Seamless Fusion | James Jincheng et.al. | 2509.13688 | null |
2025-09-17 | AgentCTG: Harnessing Multi-Agent Collaboration for Fine-Grained Precise Control in Text Generation | Xinxu Zhou et.al. | 2509.13677 | null |
2025-09-17 | LLM-I: LLMs are Naturally Interleaved Multimodal Creators | Zirun Guo et.al. | 2509.13642 | null |
2025-09-17 | Privacy-Aware In-Context Learning for Large Language Models | Bishnu Bhusal et.al. | 2509.13625 | null |
2025-09-14 | Synthetic Data and the Shifting Ground of Truth | Dietmar Offenhuber et.al. | 2509.13355 | null |
2025-09-16 | SURGIN: SURrogate-guided Generative INversion for subsurface multiphase flow with quantified uncertainty | Zhao Feng et.al. | 2509.13189 | null |
2025-09-17 | TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving | Jiawei Wang et.al. | 2509.13164 | null |
2025-09-16 | A Synthetic Data Pipeline for Supporting Manufacturing SMEs in Visual Assembly Control | Jonas Werheid et.al. | 2509.13089 | null |
2025-09-16 | MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement | Jingyu Li et.al. | 2509.13068 | null |
2025-09-16 | MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data | Eyal German et.al. | 2509.13046 | null |
2025-09-16 | A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis | Javeria Amir et.al. | 2509.12831 | null |
2025-09-16 | ConvergeWriter: Data-Driven Bottom-Up Article Construction | Binquan Ji et.al. | 2509.12811 | null |
2025-09-16 | Toward Ownership Understanding of Objects: Active Question Generation with Large Language Model and Probabilistic Generative Model | Saki Hashimoto et.al. | 2509.12754 | null |
2025-09-16 | Chat-Driven Text Generation and Interaction for Person Retrieval | Zequn Xie et.al. | 2509.12662 | null |
2025-09-15 | MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch | Nikolay Banar et.al. | 2509.12340 | null |
2025-09-15 | VADER: A Variational Autoencoder to Infer Planetary Masses and Gas-Dust Disk Properties Around Young Stars | Sayed Shafaat Mahmud et.al. | 2509.12324 | null |
2025-09-14 | Prediction of Stocks Index Price using Quantum GANs | Sangram Deshpande et.al. | 2509.12286 | null |
2025-09-15 | OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling | Yang Zhou et.al. | 2509.12201 | null |
2025-09-15 | Learning Majority-to-Minority Transformations with MMD and Triplet Loss for Imbalanced Classification | Suman Cha et.al. | 2509.11511 | null |
2025-09-14 | Scaling Up Forest Vision with Synthetic Data | Yihang She et.al. | 2509.11201 | null |
2025-09-14 | Differentially-private text generation degrades output language quality | Erion Çano et.al. | 2509.11176 | null |
2025-09-14 | STASE: A spatialized text-to-audio synthesis engine for music generation | Tutti Chi et.al. | 2509.11124 | null |
2025-09-14 | Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation | Nhi Kieu et.al. | 2509.11102 | null |
2025-09-14 | Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation | Yunghwei Lai et.al. | 2509.11078 | null |
2025-09-13 | Term2Note: Synthesising Differentially Private Clinical Notes from Medical Terms | Yuping Wu et.al. | 2509.10882 | null |
2025-09-13 | CogGNN: Cognitive Graph Neural Networks in Generative Connectomics | Mayssa Soussia et.al. | 2509.10864 | null |
2025-09-12 | Struct-Bench: A Benchmark for Differentially Private Structured Text Generation | Shuaiqi Wang et.al. | 2509.10696 | null |
2025-09-12 | Humanizing Automated Programming Feedback: Fine-Tuning Generative Models with Student-Written Feedback | Victor-Alexandru Pădurean et.al. | 2509.10647 | null |
2025-09-11 | The Coding Limits of Robust Watermarking for Generative Models | Danilo Francati et.al. | 2509.10577 | null |
2025-09-12 | Differentially Private Decentralized Dataset Synthesis Through Randomized Mixing with Correlated Noise | Utsab Saha et.al. | 2509.10385 | null |
2025-09-12 | Merging Physics-Based Synthetic Data and Machine Learning for Thermal Monitoring of Lithium-ion Batteries: The Role of Data Fidelity | Yusheng Zheng et.al. | 2509.10380 | null |
2025-09-12 | Arabic Large Language Models for Medical Text Generation | Abdulrahman Allam et.al. | 2509.10095 | null |
2025-09-11 | A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes | Jackson Eshbaugh et.al. | 2509.09794 | null |
2025-09-11 | OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection | Victor Livernoche et.al. | 2509.09495 | null |
2025-09-11 | Diabatic quantum annealing for training energy-based generative models | Gilhan Kim et.al. | 2509.09374 | null |
2025-09-11 | HISPASpoof: A New Dataset For Spanish Speech Forensics | Maria Risques et.al. | 2509.09155 | null |
2025-09-10 | Generative quantum advantage for classical and quantum problems | Hsin-Yuan Huang et.al. | 2509.09033 | null |
2025-09-12 | ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models | Soheil Zibakhsh Shabgahi et.al. | 2509.08972 | null |
2025-09-10 | PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability | Tung Vu et.al. | 2509.08910 | null |
2025-09-10 | GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts | Jenna Kang et.al. | 2509.08818 | null |
2025-09-10 | Learning Turbulent Flows with Generative Models: Super-resolution, Forecasting, and Sparse Flow Reconstruction | Vivek Oommen et.al. | 2509.08752 | null |
2025-09-10 | Design-GenNO: A Physics-Informed Generative Model with Neural Operators for Inverse Microstructure Design | Yaohua Zang et.al. | 2509.08749 | null |
2025-09-11 | Generative Data Refinement: Just Ask for Better Data | Minqi Jiang et.al. | 2509.08653 | null |
2025-09-10 | Variational Rank Reduction Autoencoders for Generative Thermal Design | Alicia Tierz et.al. | 2509.08515 | null |
2025-09-10 | A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models | Edwine Nabahirwa et.al. | 2509.08490 | null |
2025-09-10 | Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition | Jing-Tong Tzeng et.al. | 2509.08470 | null |
2025-09-10 | LLM-Guided Ansätze Design for Quantum Circuit Born Machines in Financial Generative Modeling | Yaswitha Gujju et.al. | 2509.08385 | null |
2025-09-10 | Persistent-DPO: A novel loss function and hybrid learning for generative quantum eigensolver | Junya Nakamura et.al. | 2509.08351 | null |
2025-09-09 | Performance Assessment Strategies for Generative AI Applications in Healthcare | Victor Garcia et.al. | 2509.08087 | null |
2025-09-09 | One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation | Zheng Geng et.al. | 2509.07978 | null |
2025-09-09 | Enhancements in Score-based Channel Estimation for Real-Time Wireless Systems | Florian Strasser et.al. | 2509.07839 | null |
2025-09-09 | A Generalisable Generative Model for Multi-Detector Calorimeter Simulation | Piyush Raikwar et.al. | 2509.07700 | null |
2025-09-09 | Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems | Kamel Kamel et.al. | 2509.07677 | null |
2025-09-09 | Target matching based generative model for speech enhancement | Taihui Wang et.al. | 2509.07521 | null |
2025-09-09 | Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters | Laura Boggia et.al. | 2509.07451 | null |
2025-09-09 | When Fine-Tuning is Not Enough: Lessons from HSAD on Hybrid and Adversarial Audio Spoof Detection | Bin Hu et.al. | 2509.07323 | null |
2025-09-08 | A transformer-based generative model for planetary systems | Yann Alibert et.al. | 2509.07226 | null |
2025-09-08 | Neurocognitive Modeling for Text Generation: Deep Learning Architecture for EEG Data | Khushiyant et.al. | 2509.07202 | null |
2025-09-04 | K-Syn: K-space Data Synthesis in Ultra Low-data Regimes | Guan Yu et.al. | 2509.06997 | null |
2025-09-08 | SynthDrive: Scalable Real2Sim2Real Sensor Simulation Pipeline for High-Fidelity Asset Generation and Driving Data Synthesis | Zhengqing Chen et.al. | 2509.06798 | null |
2025-09-15 | A Statistical 3D Stomach Shape Model for Anatomical Analysis | Erez Posner et.al. | 2509.06464 | null |
2025-09-08 | MeanFlow-Accelerated Multimodal Video-to-Audio Synthesis via One-Step Generation | Xiaoran Yang et.al. | 2509.06389 | null |
2025-09-08 | Text4Seg++: Advancing Image Segmentation via Generative Language Modeling | Mengcheng Lan et.al. | 2509.06321 | null |
2025-09-07 | If generative AI is the answer, what is the question? | Ambuj Tewari et.al. | 2509.06120 | null |
2025-09-07 | DreamAudio: Customized Text-to-Audio Generation with Diffusion Models | Yi Yuan et.al. | 2509.06027 | null |
2025-09-06 | GUIDe: Generative and Uncertainty-Informed Inverse Design for On-Demand Nonlinear Functional Responses | Haoxuan Dylan Mu et.al. | 2509.05641 | null |
2025-09-04 | SasAgent: Multi-Agent AI System for Small-Angle Scattering Data Analysis | Lijie Ding et.al. | 2509.05363 | null |
2025-09-02 | Ensembling Membership Inference Attacks Against Tabular Generative Models | Joshua Ward et.al. | 2509.05350 | null |
2025-09-04 | Improved 3D Scene Stylization via Text-Guided Generative Image Editing with Region-Based Control | Haruo Fujiwara et.al. | 2509.05285 | null |
2025-09-05 | Recomposer: Event-roll-guided generative audio editing | Daniel P. W. Ellis et.al. | 2509.05256 | null |
2025-09-08 | Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations | Benjamin J. Zhang et.al. | 2509.05186 | null |
2025-09-05 | Painting the market: generative diffusion models for financial limit order book simulation and forecasting | Alfred Backhouse et.al. | 2509.05107 | null |
2025-09-05 | QCA-MolGAN: Quantum Circuit Associative Molecular GAN with Multi-Agent Reinforcement Learning | Aaron Mark Thomas et.al. | 2509.05051 | null |
2025-09-05 | Efficient Video-to-Audio Generation via Multiple Foundation Models Mapper | Gehui Chen et.al. | 2509.04957 | null |
2025-09-05 | SynGen-Vision: Synthetic Data Generation for training industrial vision models | Alpana Dubey et.al. | 2509.04894 | null |
2025-09-04 | Transition Models: Rethinking the Generative Learning Objective | Zidong Wang et.al. | 2509.04394 | null |
2025-09-04 | AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds | Qizhou Wang et.al. | 2509.04345 | null |
2025-09-04 | Synthetic Survival Data Generation for Heart Failure Prognosis Using Deep Generative Models | Chanon Puttanawarut et.al. | 2509.04245 | null |
2025-09-04 | Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning | Zhilin Wang et.al. | 2509.04059 | null |
2025-09-04 | An invertible generative model for forward and inverse problems | Tristan van Leeuwen et.al. | 2509.03910 | null |
2025-09-04 | Diffusion Generative Models Meet Compressed Sensing, with Applications to Image Data and Financial Time Series | Zhengyi Guo et.al. | 2509.03898 | null |
2025-09-03 | LuxDiT: Lighting Estimation with Video Diffusion Transformer | Ruofan Liang et.al. | 2509.03680 | null |
2025-09-05 | CEHR-XGPT: A Scalable Multi-Task Foundation Model for Electronic Health Records | Chao Pang et.al. | 2509.03643 | null |
2025-09-03 | Multi-level SSL Feature Gating for Audio Deepfake Detection | Hoan My Tran et.al. | 2509.03409 | null |
2025-09-03 | Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner | Yewen Li et.al. | 2509.03348 | null |
2025-09-03 | A Comprehensive Guide to Differential Privacy: From Theory to User Expectations | Napsu Karmitsa et.al. | 2509.03294 | null |
2025-09-03 | Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings | Dyah A. M. G. Wisnu et.al. | 2509.03292 | null |
2025-09-03 | RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion | Junhao Jia et.al. | 2509.03214 | null |
2025-09-03 | Eigendecompositions of temporal networks | Lucas Lacasa et.al. | 2509.03135 | null |
2025-09-03 | Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers | Xingyue Huang et.al. | 2509.03059 | null |
2025-09-03 | Scale-Adaptive Generative Flows for Multiscale Scientific Data | Yifan Chen et.al. | 2509.02971 | null |
2025-09-02 | Generative AI for Crystal Structures: A Review | Pierre-Paul De Breuck et.al. | 2509.02723 | null |
2025-09-02 | Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation | Erfan Baghaei Potraghloo et.al. | 2509.02510 | null |
2025-09-02 | Exploring Variational Graph Autoencoders for Distribution Grid Data Generation | Syed Zain Abbas et.al. | 2509.02469 | null |
2025-09-02 | Exploring Diffusion Models for Generative Forecasting of Financial Charts | Taegyeong Lee et.al. | 2509.02308 | null |
Contributions are welcome! Please feel free to submit issues or pull requests.
If you find this repository useful, please consider giving it a star!