arXiv cs.AI 2025-10-05 Paper Analysis Report

📊 Statistics Overview

📈 Basic Statistics

  • Total papers: 121
  • Category analyzed: cs.AI
  • Date range: 2025-10-05
  • Unique authors: 653

👥 Most Prolific Authors (Top 10)

  1. Quanming Yao (3 papers)
  2. Ziying Zhang (2 papers)
  3. Yaqing Wang (2 papers)
  4. Fan Zhang (2 papers)
  5. Jiajun Wu (2 papers)
  6. Braden Teitge (2 papers)
  7. Jessalyn Holodinsky (2 papers)
  8. Steve Drew (2 papers)
  9. Joseph Ramsey (2 papers)
  10. Bryan Andrews (2 papers)

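🔍 Top 10 Keywords

  1. language (70 occurrences)
  2. learning (60 occurrences)
  3. llms (58 occurrences)
  4. reasoning (55 occurrences)
  5. data (40 occurrences)
  6. agents (28 occurrences)
  7. llm (25 occurrences)
  8. where (25 occurrences)
  9. optimization (19 occurrences)
  10. critical (18 occurrences)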
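
The report does not say how these keyword counts were produced. As a rough illustration only, the sketch below shows one way such a tally could be computed from paper titles or abstracts. The function name `top_keywords`, the tiny `STOPWORDS` set, and the two sample titles (taken from the paper list) are assumptions for this example; with a stopword list like this one, a function word such as "where" would be filtered out.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "where"}


def top_keywords(texts, k=10, stopwords=STOPWORDS):
    """Count lowercase alphabetic tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts.most_common(k)


titles = [
    "AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework",
    "Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees",
]
top3 = top_keywords(titles, k=3)
print(top3)
```

Merging singular and plural forms (for example "llm" and "llms") would need an extra normalization step that this sketch leaves out.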

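🤖 AI In-Depth Analysis

arXiv cs.AI Research Analysis Report
In-depth insights into the 121 papers published on October 5, 2025
Report generated: 2025-10-28

1. Research Direction Trends

Among the 121 papers analyzed, large language models (LLMs) remain the dominant research focus, with activity concentrated in LLM agents, reinforcement learning, and model evaluation and safety. Researchers are not only improving model performance but also beginning to study systematically how autonomous, reliable, and efficient these models are on complex tasks.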

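1.1 LLM Agents and Multi-Agent Systems (MAS)

Related papers: approx. 25-30

LLM agents are the most active direction in this batch. The focus is shifting from demonstrating what a single agent can do to more complex, system-level questions.

  • Core techniques: framework design (AgentRL, Zephyrus, FairAgent), multi-agent cooperation and negotiation (NegotiationGym, Emergent Coordination), tool use (AlphaApollo), and agent self-evolution and adaptation (Just-in-time Episodic Feedback Hinter).
  • Innovations: Several frameworks extend what agents can do, such as Zephyrus for weather science and AlphaApollo for broader scientific computing and reasoning. Other work explores agents in social simulation, economic activity, and software development. Studies such as Agentic Misalignment begin to examine the risks agents may pose, for example acting as insider threats.
  • Future trends: Agent autonomy, robustness, and safety will be the focus of future research. Designing agent systems that can plan over long horizons, collaborate efficiently, and stay aligned with human values will be a key challenge. The social dynamics and economic models of multi-agent systems are also an important direction to explore.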

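1.2 LLM Reasoning and Cognitive Mechanisms

Related papers: approx. 15-20

Understanding and strengthening the reasoning abilities of LLMs is another core theme. Researchers are moving from "making models reason" to "understanding how models reason".

  • Core techniques: combining reinforcement learning with reasoning (RLVR, SFPO), faithfulness analysis of chain-of-thought (CoT) reasoning (FaithCoT-Bench), and latent reasoning (Latent Thought Policy Optimization).
  • Innovations: Work such as FaithCoT-Bench questions and measures whether a CoT trace faithfully reflects the model's actual reasoning. Studies such as Internal states before wait analyze the model's internal states just before it emits a specific token (such as "wait") to probe its self-correction mechanism. Other work explains the Transformer architecture from a mathematical perspective (A Mathematical Explanation of Transformers).
  • Future trends: Future research will lean more toward "white-box" analysis, that is, uncovering the cognitive processes inside LLMs. Making reasoning more reliable, interpretable, and controllable, and combining latent reasoning with explicit reasoning (such as CoT), will be important directions.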

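1.3 Model Evaluation, Benchmarks, and Safety

Related papers: approx. 20

As models become more capable, evaluating them comprehensively and fairly, and ensuring they are safe, becomes critical.

  • Core techniques: new benchmark designs (GDPval, MacroBench, WebRenderBench), adversarial attacks and defenses (AgentTypo, SECA, SafeGuider), and work on model backdoors and unlearning (MLLMEraser, QuRA).
  • Innovations: GDPval evaluates AI models on real-world tasks from the angle of economic value, offering a new way to measure practical usefulness. MacroBench and WebRenderBench focus on LLM capabilities in web automation and UI generation. On the safety side, researchers propose new attacks (such as AgentTypo, which exploits typographic vulnerabilities) and also develop more robust defenses and unlearning techniques (MLLMEraser) that remove harmful knowledge without retraining.
  • Future trends: Evaluation will pay more attention to how models behave in real, dynamic, and high-risk environments. Multimodal safety, agent safety, and security audits of the model supply chain (such as the quantization process) will become new research hotspots.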
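
The MLLMEraser paper title names activation steering as its mechanism for test-time unlearning. The report gives no further detail, so the following is only a generic sketch of the idea, not the MLLMEraser method: at inference time, the component of a hidden activation along a direction associated with the unwanted concept is subtracted, and the model weights are left untouched. The function `steer`, the `strength` parameter, and the toy vectors are hypothetical.

```python
def steer(hidden, direction, strength=1.0):
    """Subtract `strength` times the component of `hidden` along `direction`.

    Assumes `direction` is non-zero. With strength=1.0 the steered vector is
    orthogonal to `direction`; with strength=0.0 `hidden` is returned unchanged.
    """
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - strength * projection * u for h, u in zip(hidden, unit)]


hidden = [2.0, 1.0, 0.0]      # toy hidden activation
direction = [1.0, 0.0, 0.0]   # hypothetical "concept to erase" direction
steered = steer(hidden, direction)
print(steered)
```

Because the change is applied to activations rather than weights, dropping the steering step restores the original behavior, which is the sense in which such an edit is reversible.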

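1.4 Combining Reinforcement Learning (RL) with LLMs

Related papers: approx. 10-15

Reinforcement learning (RL) has become the mainstream paradigm for optimizing LLM behavior and reasoning, especially for agents.

  • Core techniques: policy optimization algorithms (SFPO), model-based RL (Spatiotemporal Forecasting as Planning), and multi-objective RL frameworks (COSMO-RL).
  • Innovations: AgentRL proposes a scalable multi-task, multi-turn agentic RL training framework. COSMO-RL jointly optimizes safety and stability to explore how to build more trustworthy multimodal reasoning models. Other work examines theoretical and algorithmic guarantees for offline RL.
  • Future trends: Improving the sample efficiency and stability of RL training remains a core challenge. Combining RL with world models and causal reasoning, and developing RL algorithms that handle multimodal inputs and multi-objective optimization, will be important future directions.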

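1.5 Efficient Models and Transformer Architecture Innovations

Related papers: approx. 8-10

Lowering the training and inference cost of LLMs is key to their wider adoption.

  • Core techniques: model quantization (PatternKV, QuRA), alternatives to the attention mechanism (RACE Attention), and new network structures (SliceMoE, PolyKAN).
  • Innovations: RACE Attention proposes an attention mechanism with linear complexity that could scale context lengths to the order of a billion. SliceMoE routes slices of the embedding vector, giving finer-grained experts than conventional MoE. PatternKV flattens the KV representation to improve quantization of the KV cache.
  • Future trends: Interest in architectures beyond the Transformer (such as KAN and SNN) will keep growing. Compressing model size and cutting memory use and compute cost while preserving performance is a goal shared by industry and academia.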
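
To make "linear complexity" concrete, the sketch below shows a generic kernel-feature-map form of linear attention. It is not RACE Attention, whose title describes a sharpened angular similarity. The feature map `elu(x) + 1` and all function names are assumptions chosen for illustration. The point is the reordering: computing `phi(K)^T V` first avoids ever forming the n x n similarity matrix.

```python
import numpy as np


def feature_map(x):
    """Positive feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Non-causal linear attention; cost grows linearly with sequence length n."""
    phi_q, phi_k = feature_map(q), feature_map(k)   # (n, d) each
    kv = phi_k.T @ v                                # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)                           # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]      # (n, d_v)


def quadratic_reference(q, k, v):
    """Same result computed through the explicit n x n similarity matrix."""
    sim = feature_map(q) @ feature_map(k).T         # (n, n)
    return (sim / sim.sum(axis=1, keepdims=True)) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
out = linear_attention(q, k, v)
reference = quadratic_reference(q, k, v)
```

Both functions return the same values; only the order of the matrix products differs, which is what removes the quadratic dependence on sequence length.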

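2. Author Collaboration Graph

Analyzing co-authorship lets us identify prolific researchers and closely collaborating teams. In this batch, some authors appear on multiple papers and form research clusters.

Prolific authors: Joseph Ramsey, Bryan Andrews, and Quanming Yao, among others, have multiple papers in this batch, mainly in causal discovery and LLM reasoning.
Influential authors/teams: Some papers have very large author teams, such as "A global log for medical AI" and "Open Agent Specification". This reflects how large-scale, cross-institutional collaboration has become the norm in AI infrastructure and high-stakes applications.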

graph TD;
  subgraph "Causal discovery"
    JR["Joseph Ramsey"] -- "collaborates" --> BA["Bryan Andrews"];
    JR -- "authored" --> p1["Efficient Latent Variable Causal Discovery"];
    JR -- "authored" --> p2["Scalable Causal Discovery"];
    BA -- "authored" --> p1;
    BA -- "authored" --> p2;
  end
  subgraph "LLM reasoning and optimization"
    QY["Quanming Yao"] -- "collaborates" --> ZY_Zhang["Ziying Zhang"];
    QY -- "collaborates" --> HQ_Qiu["Haiquan Qiu"];
    QY -- "authored" --> p3["Searching Meta Reasoning Skeleton"];
    QY -- "authored" --> p4["Attending on Multilevel Structure of Proteins"];
    QY -- "authored" --> p5["Why Low-Precision Transformer Training Fails"];
    ZY_Zhang -- "authored" --> p3;
    ZY_Zhang -- "authored" --> p4;
    HQ_Qiu -- "authored" --> p5;
  end
  subgraph "Large cross-institutional collaborations"
    MedAI["A global log for medical AI (28 authors)"]
    AgentSpec["Open Agent Specification (19 authors)"]
    GDPval["GDPval (19 authors)"]
  end
  subgraph "Agents and RL"
    AgentRL["AgentRL (14 authors)"]
    AlphaApollo["AlphaApollo (17 authors)"]
  end
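
As an illustration of how the collaboration edges in such a graph can be derived, the sketch below counts shared papers for every author pair. The report does not describe its own pipeline, so the function name `coauthor_edges` and the data layout are assumptions. The four sample entries use author lists from the paper list in section 4, with titles shortened.

```python
from collections import Counter
from itertools import combinations

# Author lists copied from the paper list; titles shortened for readability.
papers = {
    "Efficient Latent Variable Causal Discovery": ["Joseph Ramsey", "Bryan Andrews"],
    "Scalable Causal Discovery": ["Joseph Ramsey", "Bryan Andrews"],
    "Searching Meta Reasoning Skeleton": ["Ziying Zhang", "Yaqing Wang", "Quanming Yao"],
    "Why Low-Precision Transformer Training Fails": ["Haiquan Qiu", "Quanming Yao"],
}


def coauthor_edges(papers):
    """Count, for every unordered author pair, how many papers they share."""
    edges = Counter()
    for authors in papers.values():
        # Sorting makes (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


edges = coauthor_edges(papers)
papers_per_author = Counter(a for authors in papers.values() for a in authors)
print(edges.most_common(1))
```

Edge weights give the strength of a collaboration, and `papers_per_author` gives the per-author counts used in a "most prolific authors" ranking.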

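3. Summary of Technical Innovations

This batch of papers presents a range of technical advances across methodology, applications, and foundational theory.

3.1 Key Technical Breakthroughs

  • Linear attention mechanism (RACE Attention): proposes a linear-complexity alternative to softmax attention that, in theory, scales Transformer context length to the order of a billion, which matters for very long sequences.
  • Fine-grained mixture of experts (SliceMoE): conventional MoE routes at the token level, whereas SliceMoE routes at the level of "slices" of the embedding vector, giving finer-grained expert specialization and better load balancing and offering a new approach to model scaling.
  • Self-evolving agent system (AlphaApollo): presents a self-evolving agentic reasoning system that orchestrates multiple foundation models and professional tools (such as Python libraries) to solve complex problems and improves over time through self-reflection and knowledge updates.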
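
The difference between token-level and slice-level routing can be sketched as follows. This is a simplified illustration under stated assumptions, not the SliceMoE implementation: top-1 routing, a single router shared across slices, one linear map per expert, and no load-balancing term. The function `slice_moe` and all shapes are hypothetical.

```python
import numpy as np


def slice_moe(x, router_w, expert_w, num_slices):
    """Route each contiguous slice of every token embedding to its top-1 expert.

    x:        (tokens, d_model) token embeddings
    router_w: (d_slice, num_experts) router shared by all slices
    expert_w: (num_experts, d_slice, d_slice) one linear map per expert
    """
    tokens, d_model = x.shape
    d_slice = d_model // num_slices
    slices = x.reshape(tokens, num_slices, d_slice)
    logits = slices @ router_w                   # (tokens, num_slices, num_experts)
    choice = logits.argmax(axis=-1)              # expert index per (token, slice)
    chosen_w = expert_w[choice]                  # (tokens, num_slices, d_slice, d_slice)
    out = np.einsum("tsd,tsde->tse", slices, chosen_w)
    return out.reshape(tokens, d_model), choice


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
router_w = rng.normal(size=(4, 3))
expert_w = rng.normal(size=(3, 4, 4))
y, choice = slice_moe(x, router_w, expert_w, num_slices=2)
```

A token-level router would make one expert choice per row of `x`; here each token makes `num_slices` independent choices, so different parts of the same embedding can be handled by different experts.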

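3.2 Methodological Innovations

  • RL-based reasoning optimization (SFPO, RLVR): proposes several RL optimization frameworks (such as Slow-Fast Policy Optimization) to stabilize and speed up training of LLM reasoning, targeting the noisy gradients and inefficient exploration seen early in training.
  • Unlearning-style model editing (MLLMEraser): proposes steering activation vectors at test time to "erase" specific knowledge from a model, enabling fast, reversible changes to model content without retraining and providing a new tool for model safety and privacy protection.
  • New paradigm for causal discovery (BOSS, BF-BIC): proposes a hybrid causal search algorithm that combines score-based search with targeted testing, together with a BIC score based on basis function expansion, aiming to learn causal structure from nonlinear data efficiently and scalably.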
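
To show why a basis-function expansion helps a BIC score capture nonlinear dependence, the sketch below compares a linear and a quadratic fit on synthetic data. It is not the BF-BIC score itself, whose exact form the report does not give: the polynomial basis, the synthetic data, and the helper names are assumptions. The score used is the standard Gaussian BIC up to an additive constant, n ln(RSS/n) + k ln(n), where lower is better.

```python
import numpy as np


def polynomial_basis(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def bic(y, design):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = x ** 2 + 0.1 * rng.normal(size=200)   # y depends on x nonlinearly

bic_linear = bic(y, polynomial_basis(x, 1))
bic_quadratic = bic(y, polynomial_basis(x, 2))
print(bic_linear, bic_quadratic)
```

A purely linear score sees almost no relationship between `x` and `y` here, while the expanded basis yields a much lower BIC, so a score-based search would favor an edge between the two variables.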

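3.3 Expanding Application Domains

  • AI for Science: The Zephyrus framework applies LLM agents to weather science, while AlphaApollo targets broader scientific computing and reasoning, showing the potential of AI as an assistant to scientists.
  • Healthcare: From the standardized logging system proposed in A global log for medical AI to the use of Doctor-R1 and GROK in clinical inquiry and multimodal diagnosis, AI is becoming deeply embedded in medical workflows.
  • Software engineering and web automation: Benchmarks such as MacroBench and WebRenderBench, along with work on requirements engineering and code completion, suggest that LLMs are becoming a core driver of next-generation software development.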

4. Full Paper List (121 papers)

Title | Authors | arXiv ID
A global log for medical AI | Ayush Noori, Adam Rodman, Alan Karthikesalingam, Bilal A. Mateen, Christopher A. Longhurst, Daniel Yang, Dave deBronkart, Gauden Galea, Harold F. Wolf III, Jacob Waxman, Joshua C. Mandel, Juliana Rotich, Kenneth D. Mandl, Maryam Mustafa, Melissa Miles, Nigam H. Shah, Peter Lee, Robert Korom, Scott Mahoney, Seth Hain, Tien Yin Wong, Trevor Mundel, Vivek Natarajan, Noa Dagan, David A. Clifton, Ran D. Balicer, Isaac S. Kohane, Marinka Zitnik | 2510.04033v1
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning | Xu Shen, Song Wang, Zhen Tan, Laura Yao, Xinyu Zhao, Kaidi Xu, Xin Wang, Tianlong Chen | 2510.04040v1
Increasing LLM response trustworthiness using voting ensembles | Aparna Nair-Kanneganti, Trevor J. Chan, Shir Goldfinger, Emily Mackay, Brian Anthony, Alison Pouch | 2510.04048v1
Toward a unified framework for data-efficient evaluation of large language models | Lele Liao, Qile Zhang, Ruofan Wu, Guanhua Fang | 2510.04051v1
Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion | Jingxiang Zhang, Lujia Zhong | 2510.04064v2
Moral Anchor System: A Predictive Framework for AI Value Alignment and Drift Prevention | Santhosh Kumar Ravindran | 2510.04073v1
SPOGW: a Score-based Preference Optimization method via Group-Wise comparison for workflows | Yitong Cui, Liu Liu, Baosheng Yu, Jiayan Qiu, Xikai Zhang, Likang Xiao, Yixing Liu, Quan Chen | 2510.04089v1
Harnessing LLM for Noise-Robust Cognitive Diagnosis in Web-Based Intelligent Education Systems | Guixian Zhang, Guan Yuan, Ziqi Xu, Yanmei Zhang, Jing Ren, Zhenyun Deng, Debo Cheng | 2510.04093v2
WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning | Peichao Lai, Jinhui Zhuang, Kexuan Zhang, Ningchang Xiong, Shengjie Wang, Yanwei Xu, Chong Chen, Yilei Wang, Bin Cui | 2510.04097v2
Searching Meta Reasoning Skeleton to Guide LLM Reasoning | Ziying Zhang, Yaqing Wang, Quanming Yao | 2510.04116v1
The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning | Mayank Ravishankara, Varindra V. Persad Maharaj | 2510.04141v1
Open Agent Specification (Agent Spec) Technical Report | Yassine Benajiba, Cesare Bernardis, Vladislav Blinov, Paul Cayet, Hassan Chafi, Abderrahim Fathan, Louis Faucon, Damien Hilloulin, Sungpack Hong, Ingo Kossyk, Rhicheek Patra, Sujith Ravi, Jonas Schweizer, Jyotika Singh, Shailender Singh, Xuelin Situ, Weiyi Sun, Jerry Xu, Ying Xu | 2510.04173v2
Constructing coherent spatial memory in LLM agents through graph rectification | Puzhen Zhang, Xuyang Chen, Yu Feng, Yuhan Jiang, Liqiu Meng | 2510.04195v1
AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework | Hanchen Zhang, Xiao Liu, Bowen Lv, Xueqiao Sun, Bohao Jing, Iat Long Iong, Zhenyu Hou, Zehan Qi, Hanyu Lai, Yifan Xu, Rui Lu, Hongning Wang, Jie Tang, Yuxiao Dong | 2510.04206v1
GROK: From Quantitative Biomarkers to Qualitative Diagnosis via a Grounded MLLM with Knowledge-Guided Instruction | Zhuangzhi Gao, Hongyi Qin, He Zhao, Qinkai Yu, Feixiang Zhou, Eduard Shantsila, Uazman Alam, Alena Shantsila, Wahbi El-Bouri, Gregory Y. H. Lip, Yalin Zheng | 2510.04281v1
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning | Yunghwei Lai, Kaiming Liu, Ziyue Wang, Weizhi Ma, Yang Liu | 2510.04284v1
Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation | Hadi Nekoei, Aman Jaiswal, Patrice Bechard, Oleh Shliazhko, Orlando Marquez Ayala, Mathieu Reymond, Massimo Caccia, Alexandre Drouin, Sarath Chandar, Alexandre Lacoste | 2510.04373v1
LLM Based Bayesian Optimization for Prompt Search | Adam Ballew, Jingbo Wang, Shaogang Ren | 2510.04384v2
Representation Potentials of Foundation Models for Multimodal Alignment: A Survey | Jianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu | 2510.05184v1
Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data | Hoang Anh Just, Myeongseob Ko, Ruoxi Jia | 2510.03988v1
Quantifying Distributional Robustness of Agentic Tool-Selection | Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh | 2510.03992v1
PrivSpike: Employing Homomorphic Encryption for Private Inference of Deep Spiking Neural Networks | Nges Brian Njungle, Eric Jahns, Milan Stojkov, Michel A. Kinsy | 2510.03995v1
Named Entity Recognition in COVID-19 tweets with Entity Knowledge Augmentation | Xuankang Zhang, Jiangming Liu | 2510.04001v1
Replacing Softmax Similarity with a Sharpened Angular Similarity: Theory and Practice of Scaling To Billion-Context Attention | Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava | 2510.04008v2
What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models | Zicong He, Boxuan Zhang, Weihao Liu, Ruixiang Tang, Lu Cheng | 2510.04009v1
Thai Semantic End-of-Turn Detection for Real-Time Voice Agents | Thanapol Popit, Natthapath Rungseesiripak, Monthol Charattrakool, Saksorn Ruangtanusak | 2510.04016v1
Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models | Hao Wu, Yuan Gao, Xingjian Shi, Shuaipeng Li, Fan Xu, Fan Zhang, Zhihong Zhu, Weiyan Wang, Xiao Luo, Kun Wang, Xian Wu, Xiaomeng Huang | 2510.04020v3
LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions | Mizanur Rahman, Amran Bhuiyan, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Ridwan Mahbub, Ahmed Masry, Shafiq Joty, Enamul Hoque | 2510.04023v1
The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View | Xinhao Yao, Lu Yu, Xiaolin Hu, Fengwei Teng, Qing Cui, Jun Zhou, Yong Liu | 2510.04028v1
Does Using Counterfactual Help LLMs Explain Textual Importance in Classification? | Nelvin Tan, James Asikin Cheung, Yu-Ching Shih, Dong Yang, Amol Salunkhe | 2510.04031v1
Small Language Models for Emergency Departments Decision Support: A Benchmark Study | Zirui Wang, Jiajun Wu, Braden Teitge, Jessalyn Holodinsky, Steve Drew | 2510.04032v1
Prompt-to-Prompt: Text-Based Image Editing Via Cross-Attention Mechanisms -- The Research of Hyperparameters and Novel Mechanisms to Enhance Existing Frameworks | Linn Bieske, Carla Lorente | 2510.04034v1
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding | Bin Lei, Nuo Xu, Ali Payani, Mingyi Hong, Chunhua Liao, Yu Cao, Caiwen Ding | 2510.04039v1
Quantization Range Estimation for Convolutional Neural Networks | Bingtao Yang, Yujia Wang, Mengzhi Jiao, Hongwei Huo | 2510.04044v1
MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation | Zhenyu Pan, Yucheng Lu, Han Liu | 2510.04057v1
Efficient Training of Spiking Neural Networks by Spike-aware Data Pruning | Chenxiang Ma, Xinyi Chen, Yujie Wu, Kay Chen Tan, Jibin Wu | 2510.04098v1
TOPO-Bench: An Open-Source Topological Mapping Evaluation Framework with Quantifiable Perceptual Aliasing | Jiaming Wang, Diwen Liu, Jizhuo Chen, Harold Soh | 2510.04100v1
Unveiling LLMs' Metaphorical Understanding: Exploring Conceptual Irrelevance, Context Leveraging and Syntactic Influence | Fengying Ye, Shanshan Wang, Lidia S. Chao, Derek F. Wong | 2510.04120v1
Attending on Multilevel Structure of Proteins enables Accurate Prediction of Cold-Start Drug-Target Interactions | Ziying Zhang, Yaqing Wang, Yuxuan Sun, Min Ye, Quanming Yao | 2510.04126v1
Internal states before wait modulate reasoning patterns | Dmitrii Troitskii, Koyena Pal, Chris Wendler, Callum Stuart McDougall, Neel Nanda | 2510.04128v1
On the Limitations and Capabilities of Position Embeddings for Length Generalization | Yang Chen, Yitao Liang, Zhouchen Lin | 2510.04130v1
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting | Yiming Niu, Jinliang Deng, Yongxin Tong | 2510.04134v1
GA4GC: Greener Agent for Greener Code via Multi-Objective Configuration Optimization | Jingzhi Gong, Yixin Bian, Luis de la Cal, Giovanni Pinna, Anisha Uteem, David Williams, Mar Zamorano, Karine Even-Mendoza, W. B. Langdon, Hector Menendez, Federica Sarro | 2510.04135v1
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs | Zishang Jiang, Jinyi Han, Tingyun Li, Xinyi Wang, Sihang Jiang, Jiaqing Liang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao | 2510.04140v1
Multi Language Models for On-the-Fly Syntax Highlighting | Marco Edoardo Palma, Pooja Rani, Harald C. Gall | 2510.04166v1
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization | Wengao Ye, Yan Liang, Lianlei Shan | 2510.04182v1
A Complement to Neural Networks for Anisotropic Inelasticity at Finite Strains | Hagen Holthusen, Ellen Kuhl | 2510.04187v1
Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity | Prashansa Panda, Shalabh Bhatnagar | 2510.04189v1
Cooperative Flexibility Exchange: Fair and Comfort-Aware Decentralized Resource Allocation | Rabiya Khalid, Evangelos Pournaras | 2510.04192v1
COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability | Yizhuo Ding, Mingkang Chen, Qiuhua Liu, Fenghua Weng, Wanying Qu, Yue Yang, Yugang Jiang, Zuxuan Wu, Yanwei Fu, Wenqi Shao | 2510.04196v1
World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge | Moo Hyun Son, Jintaek Oh, Sun Bin Mun, Jaechul Roh, Sehyun Choi | 2510.04201v1
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention | Haiquan Qiu, Quanming Yao | 2510.04212v2
MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering | Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He | 2510.04217v2
When AI Gets Persuaded, Humans Follow: Inducing the Conformity Effect in Persuasive Dialogue | Rikuo Sasaki, Michimasa Inaba | 2510.04229v2
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling | Kai Yang, Yuqi Huang, Junheng Tao, Wanyu Wang, Qitian Wu | 2510.04233v1
Flexible Locomotion Learning with Diffusion Model Predictive Control | Runhan Huang, Haldun Balim, Heng Yang, Yilun Du | 2510.04234v1
Empowering Denoising Sequential Recommendation with Large Language Model Embeddings | Tongzhou Wu, Yuhao Wang, Maolin Wang, Chi Zhang, Xiangyu Zhao | 2510.04239v1
Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs | Seong Jin Ahn, Myoung-Ho Kim | 2510.04241v1
Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks | Ayushi Mehrotra, Derek Peng, Dipkamal Bhusal, Nidhi Rastogi | 2510.04245v1
ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context | Huiwon Jang, Sihyun Yu, Heeseung Kwon, Hojin Jeon, Younggyo Seo, Jinwoo Shin | 2510.04246v1
AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents | Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao | 2510.04257v1
Efficient Latent Variable Causal Discovery: Combining Score Search and Targeted Testing | Joseph Ramsey, Bryan Andrews | 2510.04263v1
LongTail-Swap: benchmarking language models' abilities on rare words | Robin Algayres, Charles-Éric Saint-James, Mahi Luthra, Jiayi Shen, Dongyan Lin, Youssef Benchekroun, Rashel Moritz, Juan Pino, Emmanuel Dupoux | 2510.04268v1
Scalable Causal Discovery from Recursive Nonlinear Data via Truncated Basis Function Scores and Tests | Joseph Ramsey, Bryan Andrews | 2510.04276v1
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs | Om Tailor | 2510.04303v2
On the Importance of Task Complexity in Evaluating LLM-Based Multi-Agent Systems | Bohan Tang, Huidong Liang, Keyue Jiang, Xiaowen Dong | 2510.04311v1
FairAgent: Democratizing Fairness-Aware Machine Learning with LLM-Powered Agents | Yucong Dai, Lu Zhang, Feng Luo, Mashrur Chowdhury, Yongkai Wu | 2510.04317v1
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time | Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor | 2510.04340v3
Critical appraisal of artificial intelligence for rare-event recognition: principles and pharmacovigilance case studies | G. Niklas Noren, Eva-Lisa Meldau, Johan Ellenius | 2510.04341v1
NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment | Shashank Mangla, Chris Hokamp, Jack Boylan, Demian Gholipour Ghalandari, Yuuv Jauhari, Lauren Cassidy, Oisin Duffy | 2510.04368v1
Adaptive Weighted Loss for Sequential Recommendations on Sparse Domains | Akshay Mittal, Vinay Venkatesh, Krishna Kandi, Shalini Sudarshan | 2510.04375v1
Utility-Learning Tension in Self-Modifying Agents | Charles L. Wang, Keir Dorchen, Peter Jin | 2510.04399v1
From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs | Guangyu Shen, Siyuan Cheng, Xiangzhe Xu, Yuan Zhou, Hanxi Guo, Zhuo Zhang, Xiangyu Zhang | 2510.05169v1
Emergent Coordination in Multi-Agent Language Models | Christoph Riedl | 2510.05174v1
PatternKV: Flattening KV Representation Expands Quantization Headroom | Ji Zhang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Jiayi Shi, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li | 2510.05176v1
Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians | Jiajun Wu, Swaleh Zaidi, Braden Teitge, Henry Leung, Jiayu Zhou, Jessalyn Holodinsky, Steve Drew | 2510.06263v1
Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study | Ritesh Chandra, Sonali Agarwal, Navjot Singh | 2510.09646v1
Benchmarking Open-Source Large Language Models for Persian in Zero-Shot and Few-Shot Learning | Mahdi Cherakhloo, Arash Abbasi, Mohammad Saeid Sarafraz, Bijan Vosoughi Vahdat | 2510.12807v1
A Mathematical Explanation of Transformers for Large Language Models and GPTs | Xue-Cheng Tai, Hao Liu, Lingfeng Li, Raymond H. Chan | 2510.03989v1
AI-Driven Grading and Moderation for Collaborative Projects in Computer Science Education | Songmei Yu, Andrew Zagula | 2510.03998v1
Zephyrus: An Agentic Framework for Weather Science | Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yi-An Ma, Rose Yu | 2510.04017v1
Principled and Tractable RL for Reasoning with Diffusion Language Models | Anthony Zhan | 2510.04019v1
What Scales in Cross-Entropy Scaling Law? | Junxi Yan, Zixi Wei, Jingtao Zhan, Qingyao Ai, Yiqun Liu | 2510.04067v1
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning | Ziyan Wang, Zheng Wang, Jie Fu, Xingwei Qu, Qi Cheng, Shengpu Tang, Minjia Zhang, Xiaoming Huo | 2510.04072v2
Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling | Hyung Gyu Rho, Sian Lee | 2510.04087v2
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees | Nan Jiang, Tengyang Xie | 2510.04088v1
Using predefined vector systems as latent space configuration for neural network supervised training on data with arbitrarily large number of classes | Nikita Gabdullin | 2510.04090v1
Learning-Based Hashing for ANN Search: Foundations and Early Advances | Sean Moran | 2510.04127v1
Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs | Xiaoyu Yang, Jie Lu, En Yu | 2510.04142v1
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models | Minseo Kim, Coleman Hooper, Aditya Tomar, Chenfeng Xu, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami | 2510.04146v1
CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling | Zhengyang Tang, Zihan Ye, Chenyu Huang, Xuhan Huang, Chengpeng Li, Sihang Li, Guanua Chen, Ming Yan, Zizhuo Wang, Hongyuan Zha, Dayiheng Liu, Benyou Wang | 2510.04204v1
MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering | Lixuan He, Shikang Zheng, Linfeng Zhang | 2510.04220v1
Zoom-In to Sort AI-Generated Images Out | Yikun Ji, Yan Hong, Bowen Deng, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang | 2510.04225v1
Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales | Jinyang Jiang, Jinhui Han, Yijie Peng, Ying Zhang | 2510.04272v1
A KL-regularization framework for learning to plan with adaptive priors | Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, Thomas Moerland | 2510.04280v1
SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling | Harshil Vejendla | 2510.04286v1
Challenge on Optimization of Context Collection for Code Completion | Dmitry Ustalov, Egor Bogomolov, Alexander Bezzubov, Yaroslav Golubev, Evgeniy Glukhov, Georgii Levtsov, Vladimir Kovalenko | 2510.04349v1
Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators | Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar | 2510.04354v1
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models | Hyunjun Kim, Sejong Kim | 2510.04363v2
Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng | 2510.04371v1
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks | Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek | 2510.04374v1
Reconsidering Requirements Engineering: Human-AI Collaboration in AI-Native Software Development | Mateen Ahmed Abbasi, Petri Ihantola, Tommi Mikkonen, Niko Mäkitalo | 2510.04380v1
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator | Xuehai He, Shijie Zhou, Thivyanth Venkateswaran, Kaizhi Zheng, Ziyu Wan, Achuta Kadambi, Xin Eric Wang | 2510.04390v1
Internal World Models as Imagination Networks in Cognitive Agents | Saurabh Ranjan, Brian Odegaard | 2510.04391v1
Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards | Faisal Hamman, Chenyang Zhu, Anoop Kumar, Xujun Peng, Sanghamitra Dutta, Daben Liu, Alfy Samuel | 2510.04392v1
MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection | Van Nguyen, Surya Nepal, Xingliang Yuan, Tingmin Wu, Fengchao Chen, Carsten Rudolph | 2510.04397v1
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations | Buyun Liang, Liangzu Peng, Jinqi Luo, Darshan Thaker, Kwan Ho Ryan Chan, René Vidal | 2510.04398v1
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models | Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang | 2510.05173v3
Logistic-Gated Operators Enable Auditable Unit-Aware Thresholds in Symbolic Regression | Ou Deng, Ruichen Cong, Jianting Xu, Shoji Nishimura, Atsushi Ogihara, Qun Jin | 2510.05178v2
Agentic Misalignment: How LLMs Could Be Insider Threats | Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Soren Mindermann, Evan Hubinger, Ethan Perez, Kevin Troy | 2510.05179v2
OptiFLIDS: Optimized Federated Learning for Energy-Efficient Intrusion Detection in IoT | Saida Elouardi, Mohammed Jouhari, Anas Motii | 2510.05180v2
Auditing Pay-Per-Token in Large Language Models | Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez | 2510.05181v1
Ensemble Deep Learning and LLM-Assisted Reporting for Automated Skin Lesion Diagnosis | Sher Khan, Raz Muhammad, Adil Hussain, Muhammad Sajjad, Muhammad Rashid | 2510.06260v1
AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning | Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han | 2510.06261v1
Prakriti200: A Questionnaire-Based Dataset of 200 Ayurvedic Prakriti Assessments | Aryan Kumar Singh, Janvi Singh | 2510.06262v1
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech | Yuxin Li, Eng Siong Chng, Cuntai Guan | 2510.08593v1
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization | Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang | 2510.09647v1
PolyKAN: A Polyhedral Analysis Framework for Provable and Approximately Optimal KAN Compression | Di Zhang | 2510.04205v2
Epistemic Diversity and Knowledge Collapse in Large Language Models | Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Chan Young Park, Isabelle Augenstein | 2510.04226v3
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation | Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary | 2510.04265v1
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space | Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl | 2510.04339v1
