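arXiv cs.AI 2025-10-05 Paper Analysis Report
📊 Statistics Overview
📈 Basic Statistics
- Total papers: 121
- Category analyzed: cs.AI
- Date range: 2025-10-05
- Unique authors: 653
👥 Most Prolific Authors (Top 10)
- Quanming Yao (3 papers)
- Ziying Zhang (2 papers)
- Yaqing Wang (2 papers)
- Fan Zhang (2 papers)
- Jiajun Wu (2 papers)
- Braden Teitge (2 papers)
- Jessalyn Holodinsky (2 papers)
- Steve Drew (2 papers)
- Joseph Ramsey (2 papers)
- Bryan Andrews (2 papers)
🔍 Top Keywords (Top 10)
- language (70 occurrences)
- learning (60 occurrences)
- llms (58 occurrences)
- reasoning (55 occurrences)
- data (40 occurrences)
- agents (28 occurrences)
- llm (25 occurrences)
- where (25 occurrences)
- optimization (19 occurrences)
- critical (18 occurrences)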
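The report does not say how these keyword counts were produced. As a rough illustration only, the sketch below counts lowercased word tokens across a list of texts and drops a small stopword list. The function name `top_keywords`, the `STOPWORDS` set, and the tokenization rule are all assumptions made for this example, not the pipeline behind the numbers above.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "with", "where", "in", "on", "to"}

def top_keywords(texts, k=10, stopwords=STOPWORDS):
    """Return the k most frequent lowercased word tokens, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in stopwords and len(token) > 2:
                counts[token] += 1
    return counts.most_common(k)

sample = ["LLM agents where LLM reasoning", "reasoning with LLM"]
print(top_keywords(sample, k=2))
```

With a stopword filter of this kind, function words such as `where` would not appear in the ranking.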
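🤖 AI In-Depth Analysis
arXiv cs.AI Research Analysis Report
In-depth insights into the 121 papers published on October 5, 2025
Report generated: 2025-10-28
1. Research Direction Trends
Of the 121 papers analyzed, large language models (LLMs) remain the dominant research focus, with particularly heavy activity in agents, reinforcement learning, and model evaluation and safety. Researchers are not only working to improve model performance but are also beginning to study systematically how autonomous, reliable, and efficient these models are on complex tasks.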
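1.1 LLM Agents and Multi-Agent Systems (MAS)
Related papers: about 25-30
LLM agents are the most active direction in this batch. The focus is shifting from demonstrating what a single agent can do to more complex system-level questions.
- Core techniques: framework design (AgentRL, Zephyrus, FairAgent), multi-agent collaboration and negotiation (NegotiationGym, Emergent Coordination), tool use (AlphaApollo), and agent self-evolution and adaptation (Just-in-time Episodic Feedback).
- Innovations: Several frameworks extend what agents can do, such as Zephyrus for weather science and AlphaApollo for deep agentic reasoning that orchestrates foundation models and professional tools. Other work explores agents in social simulation, economic activity, and software development. Studies such as Agentic Misalignment examine the risks agents may pose, for example acting as insider threats.
- Future trends: Agent autonomy, robustness, and safety will be the focus of future research. Designing agent systems that can plan over long horizons, collaborate efficiently, and stay aligned with human values will be a key challenge. The social dynamics and economic models of multi-agent systems are also important directions to explore.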
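1.2 LLM Reasoning and Cognitive Mechanisms
Related papers: about 15-20
Understanding and strengthening LLM reasoning is another core theme. Researchers are moving from "getting models to reason" toward "understanding how models reason."
- Core techniques: combining reinforcement learning with reasoning (RLVR, SFPO), faithfulness analysis of chain-of-thought (CoT) reasoning (FaithCoT-Bench), and latent reasoning (Latent Thought Policy Optimization).
- Innovations: Work such as FaithCoT-Bench questions and measures whether CoT traces are faithful to the model's actual reasoning. Studies such as Internal states before wait analyze the model's internal states just before it emits specific tokens (such as "wait") to probe its self-correction mechanisms. Other work explains the Transformer architecture from a mathematical perspective (A Mathematical Explanation of Transformers).
- Future trends: Future research will lean more toward "white-box" analysis, that is, uncovering the cognitive processes inside LLMs. Achieving more reliable, interpretable, and controllable reasoning, and combining latent reasoning with explicit reasoning (such as CoT), will be important directions.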
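1.3 Model Evaluation, Benchmarks, and Safety
Related papers: about 20
As model capabilities grow, evaluating models comprehensively and fairly, and ensuring their safety, becomes critical.
- Core techniques: new benchmark designs (GDPval, MacroBench, WebRenderBench), adversarial attacks and defenses (AgentTypo, SECA, SafeGuider), and work on model backdoors and unlearning (MLLMEraser, QuRA).
- Innovations: GDPval evaluates AI models on real-world tasks from the standpoint of economic value, offering a new way to measure practical usefulness. MacroBench and WebRenderBench focus on LLM capabilities in web automation and UI generation. On the safety side, researchers propose new attacks (such as AgentTypo, which exploits typographic vulnerabilities) and develop more robust defenses and unlearning techniques (MLLMEraser) that remove harmful knowledge without retraining.
- Future trends: Evaluation will place more weight on model behavior in realistic, dynamic, and high-risk settings. Multimodal safety, agent safety, and security auditing of the model supply chain (for example, the quantization process) are likely to become new research hotspots.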
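1.4 Integrating Reinforcement Learning (RL) with LLMs
Related papers: about 10-15
Reinforcement learning (RL) has become a mainstream paradigm for optimizing the behavior and reasoning of LLMs, especially agents.
- Core techniques: policy optimization algorithms (SFPO), model-based RL (Spatiotemporal Forecasting as Planning), and multi-objective RL frameworks (COSMO-RL).
- Innovations: AgentRL proposes a scalable multi-task, multi-turn agentic RL training framework. COSMO-RL jointly optimizes safety and stability to explore how to build more trustworthy multimodal reasoning models. Other work addresses theoretical and algorithmic guarantees for offline RL.
- Future trends: Improving the sample efficiency and stability of RL training remains the core challenge. Combining RL with world models and causal reasoning, and developing RL algorithms that handle multimodal inputs and multi-objective optimization, will be important future directions.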
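1.5 Efficient Models and Transformer Architecture Innovations
Related papers: about 8-10
Reducing the training and inference cost of LLMs is key to their wider adoption.
- Core techniques: model quantization (PatternKV, QuRA), alternatives to standard attention (RACE Attention), and new network structures (SliceMoE, PolyKAN).
- Innovations: RACE Attention proposes a linear-complexity attention mechanism that could extend context length to the billion-token scale. SliceMoE routes slices of the embedding vector, giving finer-grained expert networks than conventional MoE. PatternKV flattens KV representations to improve KV-cache quantization.
- Future trends: Interest in architectures beyond the Transformer (such as KAN and SNN) will keep growing. Compressing model size and cutting memory and compute cost as far as possible while preserving performance is a goal shared by industry and academia.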
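To make the "linear complexity" claim concrete, here is a minimal sketch of generic kernelized linear attention in plain Python. It is not RACE Attention itself: the feature map `elu(x) + 1` and all function names are assumptions chosen for illustration. The point is only that once the similarity is written as a dot product of feature maps, the sums over keys can be computed once and reused for every query, so cost grows linearly with sequence length instead of quadratically.

```python
import math

def feature_map(x):
    """Positive feature map elu(x) + 1 (an illustrative choice, not the paper's)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(Q, K, V):
    """Non-causal linear attention: O(n * d_k * d_v) instead of O(n^2)."""
    d_k, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]   # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_k                      # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d_k):
            k_sum[i] += fk[i]
            for j in range(d_v):
                kv[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = feature_map(q)
        norm = sum(fq[i] * k_sum[i] for i in range(d_k))
        out.append([sum(fq[i] * kv[i][j] for i in range(d_k)) / norm
                    for j in range(d_v)])
    return out

def quadratic_reference(Q, K, V):
    """Same similarity computed pairwise, for checking the linear version."""
    out = []
    for q in Q:
        fq = feature_map(q)
        w = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        total = sum(w)
        out.append([sum(w[n] * V[n][j] for n in range(len(V))) / total
                    for j in range(len(V[0]))])
    return out
```

Both functions return the same values up to floating-point error; only the order of summation differs.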
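2. Author Relationship Graph
Analyzing co-authorship across the papers lets us identify prolific researchers and tightly knit teams. In this batch, several authors appear on multiple papers and form small research clusters.
Prolific authors: Joseph Ramsey, Bryan Andrews, Quanming Yao, and others have multiple papers in this collection, concentrated mainly in causal discovery and LLM reasoning.
Influential authors and teams: Some papers have very large author teams, such as "A global log for medical AI" and "Open Agent Specification", which suggests that large cross-institution collaboration is common in AI infrastructure and high-stakes application areas.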
graph TD;
subgraph "Causal Discovery"
JR["Joseph Ramsey"] -- "collaborates" --> BA["Bryan Andrews"];
JR -- "authored" --> p1["Efficient Latent Variable Causal Discovery"];
JR -- "authored" --> p2["Scalable Causal Discovery"];
BA -- "authored" --> p1;
BA -- "authored" --> p2;
end
subgraph "LLM Reasoning and Optimization"
QY["Quanming Yao"] -- "collaborates" --> ZY_Zhang["Ziying Zhang"];
QY -- "collaborates" --> HQ_Qiu["Haiquan Qiu"];
QY -- "authored" --> p3["Searching Meta Reasoning Skeleton"];
QY -- "authored" --> p4["Attending on Multilevel Structure of Proteins"];
QY -- "authored" --> p5["Why Low-Precision Transformer Training Fails"];
ZY_Zhang -- "authored" --> p3;
ZY_Zhang -- "authored" --> p4;
HQ_Qiu -- "authored" --> p5;
end
subgraph "Large Cross-Institution Collaborations"
MedAI["A global log for medical AI (28 authors)"]
AgentSpec["Open Agent Specification (19 authors)"]
GDPval["GDPval (19 authors)"]
end
subgraph "Agents and RL"
AgentRL["AgentRL (14 authors)"]
AlphaApollo["AlphaApollo (17 authors)"]
end
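As a rough illustration of how collaboration edges like those in the graph can be derived from author lists, the sketch below counts how often each pair of authors shares a paper. The `papers` dictionary holds a few rows copied from the paper list in this report, with shortened titles. The helper name `coauthor_edges` is hypothetical, and this is not necessarily how the report's graph was generated.

```python
from collections import Counter
from itertools import combinations

# A few rows from the paper list (titles shortened), used as sample input.
papers = {
    "Efficient Latent Variable Causal Discovery": ["Joseph Ramsey", "Bryan Andrews"],
    "Scalable Causal Discovery": ["Joseph Ramsey", "Bryan Andrews"],
    "Searching Meta Reasoning Skeleton": ["Ziying Zhang", "Yaqing Wang", "Quanming Yao"],
    "Why Low-Precision Transformer Training Fails": ["Haiquan Qiu", "Quanming Yao"],
}

def coauthor_edges(papers):
    """Count shared papers for every unordered author pair."""
    edges = Counter()
    for authors in papers.values():
        # Sort so each pair has one canonical (alphabetical) key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges

for (a, b), n in coauthor_edges(papers).most_common():
    print(f"{a} -- {b}: {n} shared paper(s)")
```

Counting pairs this way also makes it easy to check a proposed edge against the author lists before drawing it.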
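3. Summary of Technical Innovations
This collection reports a range of technical advances in AI, spanning methodology, applications, and foundational theory.
3.1 Key Technical Breakthroughs
- Linear attention (RACE Attention): proposes a linear-complexity alternative to softmax attention that can, in theory, extend Transformer context length to the billion-token scale, which matters for very long sequences.
- Fine-grained mixture of experts (SliceMoE): conventional MoE routes at the token level, whereas SliceMoE routes at the level of "slices" of the embedding vector, giving finer-grained expert specialization and better load balancing and offering a new approach to model scaling.
- Self-evolving agent system (AlphaApollo): presents a self-evolving agentic reasoning system that orchestrates multiple foundation models and professional tools (such as Python libraries) to solve complex problems, and improves over time through self-reflection and knowledge updates.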
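The slice-level routing idea described for SliceMoE can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the embedding is split into equal contiguous slices, a linear router scores each slice, and the top-1 expert processes it. The names `route_slices`, `router`, and `experts` are hypothetical.

```python
def route_slices(embedding, num_slices, router, experts):
    """Route each contiguous slice of one embedding to its top-1 expert.

    router: one weight row per expert, each as long as a slice (assumed linear router).
    experts: list of callables mapping a slice (list of floats) to a same-length list.
    Returns the concatenated expert outputs and the chosen expert index per slice.
    """
    if len(embedding) % num_slices != 0:
        raise ValueError("embedding length must be divisible by num_slices")
    size = len(embedding) // num_slices
    output, choices = [], []
    for s in range(num_slices):
        piece = embedding[s * size:(s + 1) * size]
        scores = [sum(w * x for w, x in zip(row, piece)) for row in router]
        best = max(range(len(experts)), key=lambda i: scores[i])
        choices.append(best)
        output.extend(experts[best](piece))
    return output, choices

# Toy experts: one doubles its input, the other negates it.
experts = [lambda p: [2 * x for x in p], lambda p: [-x for x in p]]
router = [[1.0, 0.0], [0.0, 1.0]]
print(route_slices([1.0, 0.0, 0.0, 1.0], num_slices=2, router=router, experts=experts))
```

In token-level MoE the whole embedding would go to one expert; here different slices of the same token can land on different experts.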
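3.2 Methodological Innovations
- RL-based reasoning optimization (SFPO, RLVR): proposes RL optimization frameworks (such as Slow-Fast Policy Optimization) to stabilize and speed up training of LLM reasoning, targeting noisy gradients and inefficient exploration in early training.
- Unlearning-style model editing (MLLMEraser): proposes "erasing" specific knowledge from a model at test time by steering activation vectors, allowing fast, reversible changes to model behavior without retraining and providing a new tool for model safety and privacy protection.
- New approaches to causal discovery (BOSS, BF-BIC): proposes a hybrid causal search algorithm that combines score-based search with targeted testing, together with a BIC score based on basis-function expansion, aimed at learning causal structure from nonlinear data efficiently and at scale.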
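The activation-steering idea mentioned above for MLLMEraser can be illustrated with a toy example. The sketch assumes the simplest possible form, adding a scaled direction vector to a hidden state, and is not the paper's method; `steer`, `direction`, and `alpha` are hypothetical names. It shows why such an edit is reversible: applying the opposite scale restores the original activation.

```python
def steer(hidden, direction, alpha):
    """Shift a hidden-state vector along a steering direction by factor alpha."""
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    return [h + alpha * d for h, d in zip(hidden, direction)]

hidden = [1.0, 2.0]
direction = [0.5, -0.5]          # assumed "erase" direction, for illustration only

steered = steer(hidden, direction, alpha=2.0)
restored = steer(steered, direction, alpha=-2.0)
print(steered, restored)
```

Because the model weights are never changed, removing the steering vector at inference time returns the model to its original behavior.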
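3.3 Expanding Application Areas
- AI for Science: The Zephyrus framework applies LLM agents to weather science, while AlphaApollo targets broader scientific computing and reasoning, showing the potential of AI as a scientist's assistant.
- Healthcare: From the standardized logging system proposed in A global log for medical AI to the use of Doctor-R1 and GROK in clinical inquiry and multimodal diagnosis, AI is becoming deeply integrated into medical workflows.
- Software engineering and web automation: Benchmarks such as MacroBench and WebRenderBench, along with work on requirements engineering and code completion, indicate that LLMs are becoming a core driver of next-generation software development.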
4. Full Paper List (121 papers)
| Title | Authors | Link |
|---|---|---|
| A global log for medical AI | Ayush Noori, Adam Rodman, Alan Karthikesalingam, Bilal A. Mateen, Christopher A. Longhurst, Daniel Yang, Dave deBronkart, Gauden Galea, Harold F. Wolf III, Jacob Waxman, Joshua C. Mandel, Juliana Rotich, Kenneth D. Mandl, Maryam Mustafa, Melissa Miles, Nigam H. Shah, Peter Lee, Robert Korom, Scott Mahoney, Seth Hain, Tien Yin Wong, Trevor Mundel, Vivek Natarajan, Noa Dagan, David A. Clifton, Ran D. Balicer, Isaac S. Kohane, Marinka Zitnik | 2510.04033v1 |
| FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning | Xu Shen, Song Wang, Zhen Tan, Laura Yao, Xinyu Zhao, Kaidi Xu, Xin Wang, Tianlong Chen | 2510.04040v1 |
| Increasing LLM response trustworthiness using voting ensembles | Aparna Nair-Kanneganti, Trevor J. Chan, Shir Goldfinger, Emily Mackay, Brian Anthony, Alison Pouch | 2510.04048v1 |
| Toward a unified framework for data-efficient evaluation of large language models | Lele Liao, Qile Zhang, Ruofan Wu, Guanhua Fang | 2510.04051v1 |
| Decoding Emotion in the Deep: A Systematic Study of How LLMs Represent, Retain, and Express Emotion | Jingxiang Zhang, Lujia Zhong | 2510.04064v2 |
| Moral Anchor System: A Predictive Framework for AI Value Alignment and Drift Prevention | Santhosh Kumar Ravindran | 2510.04073v1 |
| SPOGW: a Score-based Preference Optimization method via Group-Wise comparison for workflows | Yitong Cui, Liu Liu, Baosheng Yu, Jiayan Qiu, Xikai Zhang, Likang Xiao, Yixing Liu, Quan Chen | 2510.04089v1 |
| Harnessing LLM for Noise-Robust Cognitive Diagnosis in Web-Based Intelligent Education Systems | Guixian Zhang, Guan Yuan, Ziqi Xu, Yanmei Zhang, Jing Ren, Zhenyun Deng, Debo Cheng | 2510.04093v2 |
| WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning | Peichao Lai, Jinhui Zhuang, Kexuan Zhang, Ningchang Xiong, Shengjie Wang, Yanwei Xu, Chong Chen, Yilei Wang, Bin Cui | 2510.04097v2 |
| Searching Meta Reasoning Skeleton to Guide LLM Reasoning | Ziying Zhang, Yaqing Wang, Quanming Yao | 2510.04116v1 |
| The Artificial Intelligence Cognitive Examination: A Survey on the Evolution of Multimodal Evaluation from Recognition to Reasoning | Mayank Ravishankara, Varindra V. Persad Maharaj | 2510.04141v1 |
| Open Agent Specification (Agent Spec) Technical Report | Yassine Benajiba, Cesare Bernardis, Vladislav Blinov, Paul Cayet, Hassan Chafi, Abderrahim Fathan, Louis Faucon, Damien Hilloulin, Sungpack Hong, Ingo Kossyk, Rhicheek Patra, Sujith Ravi, Jonas Schweizer, Jyotika Singh, Shailender Singh, Xuelin Situ, Weiyi Sun, Jerry Xu, Ying Xu | 2510.04173v2 |
| Constructing coherent spatial memory in LLM agents through graph rectification | Puzhen Zhang, Xuyang Chen, Yu Feng, Yuhan Jiang, Liqiu Meng | 2510.04195v1 |
| AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework | Hanchen Zhang, Xiao Liu, Bowen Lv, Xueqiao Sun, Bohao Jing, Iat Long Iong, Zhenyu Hou, Zehan Qi, Hanyu Lai, Yifan Xu, Rui Lu, Hongning Wang, Jie Tang, Yuxiao Dong | 2510.04206v1 |
| GROK: From Quantitative Biomarkers to Qualitative Diagnosis via a Grounded MLLM with Knowledge-Guided Instruction | Zhuangzhi Gao, Hongyi Qin, He Zhao, Qinkai Yu, Feixiang Zhou, Eduard Shantsila, Uazman Alam, Alena Shantsila, Wahbi El-Bouri, Gregory Y. H. Lip, Yalin Zheng | 2510.04281v1 |
| Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning | Yunghwei Lai, Kaiming Liu, Ziyue Wang, Weizhi Ma, Yang Liu | 2510.04284v1 |
| Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation | Hadi Nekoei, Aman Jaiswal, Patrice Bechard, Oleh Shliazhko, Orlando Marquez Ayala, Mathieu Reymond, Massimo Caccia, Alexandre Drouin, Sarath Chandar, Alexandre Lacoste | 2510.04373v1 |
| LLM Based Bayesian Optimization for Prompt Search | Adam Ballew, Jingbo Wang, Shaogang Ren | 2510.04384v2 |
| Representation Potentials of Foundation Models for Multimodal Alignment: A Survey | Jianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu | 2510.05184v1 |
| Distilling Reasoning into Student LLMs: Local Naturalness for Selecting Teacher Data | Hoang Anh Just, Myeongseob Ko, Ruoxi Jia | 2510.03988v1 |
| Quantifying Distributional Robustness of Agentic Tool-Selection | Jehyeok Yeon, Isha Chaudhary, Gagandeep Singh | 2510.03992v1 |
| PrivSpike: Employing Homomorphic Encryption for Private Inference of Deep Spiking Neural Networks | Nges Brian Njungle, Eric Jahns, Milan Stojkov, Michel A. Kinsy | 2510.03995v1 |
| Named Entity Recognition in COVID-19 tweets with Entity Knowledge Augmentation | Xuankang Zhang, Jiangming Liu | 2510.04001v1 |
| Replacing Softmax Similarity with a Sharpened Angular Similarity: Theory and Practice of Scaling To Billion-Context Attention | Sahil Joshi, Agniva Chowdhury, Amar Kanakamedala, Ekam Singh, Evan Tu, Anshumali Shrivastava | 2510.04008v2 |
| What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models | Zicong He, Boxuan Zhang, Weihao Liu, Ruixiang Tang, Lu Cheng | 2510.04009v1 |
| Thai Semantic End-of-Turn Detection for Real-Time Voice Agents | Thanapol Popit, Natthapath Rungseesiripak, Monthol Charattrakool, Saksorn Ruangtanusak | 2510.04016v1 |
| Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models | Hao Wu, Yuan Gao, Xingjian Shi, Shuaipeng Li, Fan Xu, Fan Zhang, Zhihong Zhu, Weiyan Wang, Xiao Luo, Kun Wang, Xian Wu, Xiaomeng Huang | 2510.04020v3 |
| LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions | Mizanur Rahman, Amran Bhuiyan, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Ridwan Mahbub, Ahmed Masry, Shafiq Joty, Enamul Hoque | 2510.04023v1 |
| The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View | Xinhao Yao, Lu Yu, Xiaolin Hu, Fengwei Teng, Qing Cui, Jun Zhou, Yong Liu | 2510.04028v1 |
| Does Using Counterfactual Help LLMs Explain Textual Importance in Classification? | Nelvin Tan, James Asikin Cheung, Yu-Ching Shih, Dong Yang, Amol Salunkhe | 2510.04031v1 |
| Small Language Models for Emergency Departments Decision Support: A Benchmark Study | Zirui Wang, Jiajun Wu, Braden Teitge, Jessalyn Holodinsky, Steve Drew | 2510.04032v1 |
| Prompt-to-Prompt: Text-Based Image Editing Via Cross-Attention Mechanisms -- The Research of Hyperparameters and Novel Mechanisms to Enhance Existing Frameworks | Linn Bieske, Carla Lorente | 2510.04034v1 |
| GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding | Bin Lei, Nuo Xu, Ali Payani, Mingyi Hong, Chunhua Liao, Yu Cao, Caiwen Ding | 2510.04039v1 |
| Quantization Range Estimation for Convolutional Neural Networks | Bingtao Yang, Yujia Wang, Mengzhi Jiao, Hongwei Huo | 2510.04044v1 |
| MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation | Zhenyu Pan, Yucheng Lu, Han Liu | 2510.04057v1 |
| Efficient Training of Spiking Neural Networks by Spike-aware Data Pruning | Chenxiang Ma, Xinyi Chen, Yujie Wu, Kay Chen Tan, Jibin Wu | 2510.04098v1 |
| TOPO-Bench: An Open-Source Topological Mapping Evaluation Framework with Quantifiable Perceptual Aliasing | Jiaming Wang, Diwen Liu, Jizhuo Chen, Harold Soh | 2510.04100v1 |
| Unveiling LLMs' Metaphorical Understanding: Exploring Conceptual Irrelevance, Context Leveraging and Syntactic Influence | Fengying Ye, Shanshan Wang, Lidia S. Chao, Derek F. Wong | 2510.04120v1 |
| Attending on Multilevel Structure of Proteins enables Accurate Prediction of Cold-Start Drug-Target Interactions | Ziying Zhang, Yaqing Wang, Yuxuan Sun, Min Ye, Quanming Yao | 2510.04126v1 |
| Internal states before wait modulate reasoning patterns | Dmitrii Troitskii, Koyena Pal, Chris Wendler, Callum Stuart McDougall, Neel Nanda | 2510.04128v1 |
| On the Limitations and Capabilities of Position Embeddings for Length Generalization | Yang Chen, Yitao Liang, Zhouchen Lin | 2510.04130v1 |
| PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting | Yiming Niu, Jinliang Deng, Yongxin Tong | 2510.04134v1 |
| GA4GC: Greener Agent for Greener Code via Multi-Objective Configuration Optimization | Jingzhi Gong, Yixin Bian, Luis de la Cal, Giovanni Pinna, Anisha Uteem, David Williams, Mar Zamorano, Karine Even-Mendoza, W. B. Langdon, Hector Menendez, Federica Sarro | 2510.04135v1 |
| Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs | Zishang Jiang, Jinyi Han, Tingyun Li, Xinyi Wang, Sihang Jiang, Jiaqing Liang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao | 2510.04140v1 |
| Multi Language Models for On-the-Fly Syntax Highlighting | Marco Edoardo Palma, Pooja Rani, Harald C. Gall | 2510.04166v1 |
| Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization | Wengao Ye, Yan Liang, Lianlei Shan | 2510.04182v1 |
| A Complement to Neural Networks for Anisotropic Inelasticity at Finite Strains | Hagen Holthusen, Ellen Kuhl | 2510.04187v1 |
| Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity | Prashansa Panda, Shalabh Bhatnagar | 2510.04189v1 |
| Cooperative Flexibility Exchange: Fair and Comfort-Aware Decentralized Resource Allocation | Rabiya Khalid, Evangelos Pournaras | 2510.04192v1 |
| COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability | Yizhuo Ding, Mingkang Chen, Qiuhua Liu, Fenghua Weng, Wanying Qu, Yue Yang, Yugang Jiang, Zuxuan Wu, Yanwei Fu, Wenqi Shao | 2510.04196v1 |
| World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge | Moo Hyun Son, Jintaek Oh, Sun Bin Mun, Jaechul Roh, Sehyun Choi | 2510.04201v1 |
| Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention | Haiquan Qiu, Quanming Yao | 2510.04212v2 |
| MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering | Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He | 2510.04217v2 |
| When AI Gets Persuaded, Humans Follow: Inducing the Conformity Effect in Persuasive Dialogue | Rikuo Sasaki, Michimasa Inaba | 2510.04229v2 |
| Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling | Kai Yang, Yuqi Huang, Junheng Tao, Wanyu Wang, Qitian Wu | 2510.04233v1 |
| Flexible Locomotion Learning with Diffusion Model Predictive Control | Runhan Huang, Haldun Balim, Heng Yang, Yilun Du | 2510.04234v1 |
| Empowering Denoising Sequential Recommendation with Large Language Model Embeddings | Tongzhou Wu, Yuhao Wang, Maolin Wang, Chi Zhang, Xiangyu Zhao | 2510.04239v1 |
| Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs | Seong Jin Ahn, Myoung-Ho Kim | 2510.04241v1 |
| Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks | Ayushi Mehrotra, Derek Peng, Dipkamal Bhusal, Nidhi Rastogi | 2510.04245v1 |
| ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context | Huiwon Jang, Sihyun Yu, Heeseung Kwon, Hojin Jeon, Younggyo Seo, Jinwoo Shin | 2510.04246v1 |
| AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents | Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao | 2510.04257v1 |
| Efficient Latent Variable Causal Discovery: Combining Score Search and Targeted Testing | Joseph Ramsey, Bryan Andrews | 2510.04263v1 |
| LongTail-Swap: benchmarking language models' abilities on rare words | Robin Algayres, Charles-Éric Saint-James, Mahi Luthra, Jiayi Shen, Dongyan Lin, Youssef Benchekroun, Rashel Moritz, Juan Pino, Emmanuel Dupoux | 2510.04268v1 |
| Scalable Causal Discovery from Recursive Nonlinear Data via Truncated Basis Function Scores and Tests | Joseph Ramsey, Bryan Andrews | 2510.04276v1 |
| Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs | Om Tailor | 2510.04303v2 |
| On the Importance of Task Complexity in Evaluating LLM-Based Multi-Agent Systems | Bohan Tang, Huidong Liang, Keyue Jiang, Xiaowen Dong | 2510.04311v1 |
| FairAgent: Democratizing Fairness-Aware Machine Learning with LLM-Powered Agents | Yucong Dai, Lu Zhang, Feng Luo, Mashrur Chowdhury, Yongkai Wu | 2510.04317v1 |
| Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time | Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor | 2510.04340v3 |
| Critical appraisal of artificial intelligence for rare-event recognition: principles and pharmacovigilance case studies | G. Niklas Noren, Eva-Lisa Meldau, Johan Ellenius | 2510.04341v1 |
| NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment | Shashank Mangla, Chris Hokamp, Jack Boylan, Demian Gholipour Ghalandari, Yuuv Jauhari, Lauren Cassidy, Oisin Duffy | 2510.04368v1 |
| Adaptive Weighted Loss for Sequential Recommendations on Sparse Domains | Akshay Mittal, Vinay Venkatesh, Krishna Kandi, Shalini Sudarshan | 2510.04375v1 |
| Utility-Learning Tension in Self-Modifying Agents | Charles L. Wang, Keir Dorchen, Peter Jin | 2510.04399v1 |
| From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs | Guangyu Shen, Siyuan Cheng, Xiangzhe Xu, Yuan Zhou, Hanxi Guo, Zhuo Zhang, Xiangyu Zhang | 2510.05169v1 |
| Emergent Coordination in Multi-Agent Language Models | Christoph Riedl | 2510.05174v1 |
| PatternKV: Flattening KV Representation Expands Quantization Headroom | Ji Zhang, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Jiayi Shi, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li | 2510.05176v1 |
| Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians | Jiajun Wu, Swaleh Zaidi, Braden Teitge, Henry Leung, Jiayu Zhou, Jessalyn Holodinsky, Steve Drew | 2510.06263v1 |
| Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study | Ritesh Chandra, Sonali Agarwal, Navjot Singh | 2510.09646v1 |
| Benchmarking Open-Source Large Language Models for Persian in Zero-Shot and Few-Shot Learning | Mahdi Cherakhloo, Arash Abbasi, Mohammad Saeid Sarafraz, Bijan Vosoughi Vahdat | 2510.12807v1 |
| A Mathematical Explanation of Transformers for Large Language Models and GPTs | Xue-Cheng Tai, Hao Liu, Lingfeng Li, Raymond H. Chan | 2510.03989v1 |
| AI-Driven Grading and Moderation for Collaborative Projects in Computer Science Education | Songmei Yu, Andrew Zagula | 2510.03998v1 |
| Zephyrus: An Agentic Framework for Weather Science | Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yi-An Ma, Rose Yu | 2510.04017v1 |
| Principled and Tractable RL for Reasoning with Diffusion Language Models | Anthony Zhan | 2510.04019v1 |
| What Scales in Cross-Entropy Scaling Law? | Junxi Yan, Zixi Wei, Jingtao Zhan, Qingyao Ai, Yiqun Liu | 2510.04067v1 |
| Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning | Ziyan Wang, Zheng Wang, Jie Fu, Xingwei Qu, Qi Cheng, Shengpu Tang, Minjia Zhang, Xiaoming Huo | 2510.04072v2 |
| Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling | Hyung Gyu Rho, Sian Lee | 2510.04087v2 |
| Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees | Nan Jiang, Tengyang Xie | 2510.04088v1 |
| Using predefined vector systems as latent space configuration for neural network supervised training on data with arbitrarily large number of classes | Nikita Gabdullin | 2510.04090v1 |
| Learning-Based Hashing for ANN Search: Foundations and Early Advances | Sean Moran | 2510.04127v1 |
| Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs | Xiaoyu Yang, Jie Lu, En Yu | 2510.04142v1 |
| Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models | Minseo Kim, Coleman Hooper, Aditya Tomar, Chenfeng Xu, Mehrdad Farajtabar, Michael W. Mahoney, Kurt Keutzer, Amir Gholami | 2510.04146v1 |
| CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling | Zhengyang Tang, Zihan Ye, Chenyu Huang, Xuhan Huang, Chengpeng Li, Sihang Li, Guanua Chen, Ming Yan, Zizhuo Wang, Hongyuan Zha, Dayiheng Liu, Benyou Wang | 2510.04204v1 |
| MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering | Lixuan He, Shikang Zheng, Linfeng Zhang | 2510.04220v1 |
| Zoom-In to Sort AI-Generated Images Out | Yikun Ji, Yan Hong, Bowen Deng, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang | 2510.04225v1 |
| Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales | Jinyang Jiang, Jinhui Han, Yijie Peng, Ying Zhang | 2510.04272v1 |
| A KL-regularization framework for learning to plan with adaptive priors | Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, Thomas Moerland | 2510.04280v1 |
| SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling | Harshil Vejendla | 2510.04286v1 |
| Challenge on Optimization of Context Collection for Code Completion | Dmitry Ustalov, Egor Bogomolov, Alexander Bezzubov, Yaroslav Golubev, Evgeniy Glukhov, Georgii Levtsov, Vladimir Kovalenko | 2510.04349v1 |
| Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators | Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar | 2510.04354v1 |
| MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models | Hyunjun Kim, Sejong Kim | 2510.04363v2 |
| Speculative Actions: A Lossless Framework for Faster Agentic Systems | Naimeng Ye, Arnav Ahuja, Georgios Liargkovas, Yunan Lu, Kostis Kaffes, Tianyi Peng | 2510.04371v1 |
| GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks | Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, Jerry Tworek | 2510.04374v1 |
| Reconsidering Requirements Engineering: Human-AI Collaboration in AI-Native Software Development | Mateen Ahmed Abbasi, Petri Ihantola, Tommi Mikkonen, Niko Mäkitalo | 2510.04380v1 |
| MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator | Xuehai He, Shijie Zhou, Thivyanth Venkateswaran, Kaizhi Zheng, Ziyu Wan, Achuta Kadambi, Xin Eric Wang | 2510.04390v1 |
| Internal World Models as Imagination Networks in Cognitive Agents | Saurabh Ranjan, Brian Odegaard | 2510.04391v1 |
| Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards | Faisal Hamman, Chenyang Zhu, Anoop Kumar, Xujun Peng, Sanghamitra Dutta, Daben Liu, Alfy Samuel | 2510.04392v1 |
| MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection | Van Nguyen, Surya Nepal, Xingliang Yuan, Tingmin Wu, Fengchao Chen, Carsten Rudolph | 2510.04397v1 |
| SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations | Buyun Liang, Liangzu Peng, Jinqi Luo, Darshan Thaker, Kwan Ho Ryan Chan, René Vidal | 2510.04398v1 |
| SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models | Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang | 2510.05173v3 |
| Logistic-Gated Operators Enable Auditable Unit-Aware Thresholds in Symbolic Regression | Ou Deng, Ruichen Cong, Jianting Xu, Shoji Nishimura, Atsushi Ogihara, Qun Jin | 2510.05178v2 |
| Agentic Misalignment: How LLMs Could Be Insider Threats | Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Soren Mindermann, Evan Hubinger, Ethan Perez, Kevin Troy | 2510.05179v2 |
| OptiFLIDS: Optimized Federated Learning for Energy-Efficient Intrusion Detection in IoT | Saida Elouardi, Mohammed Jouhari, Anas Motii | 2510.05180v2 |
| Auditing Pay-Per-Token in Large Language Models | Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez | 2510.05181v1 |
| Ensemble Deep Learning and LLM-Assisted Reporting for Automated Skin Lesion Diagnosis | Sher Khan, Raz Muhammad, Adil Hussain, Muhammad Sajjad, Muhammad Rashid | 2510.06260v1 |
| AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning | Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han | 2510.06261v1 |
| Prakriti200: A Questionnaire-Based Dataset of 200 Ayurvedic Prakriti Assessments | Aryan Kumar Singh, Janvi Singh | 2510.06262v1 |
| Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech | Yuxin Li, Eng Siong Chng, Cuntai Guan | 2510.08593v1 |
| Rounding-Guided Backdoor Injection in Deep Learning Model Quantization | Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang | 2510.09647v1 |
| PolyKAN: A Polyhedral Analysis Framework for Provable and Approximately Optimal KAN Compression | Di Zhang | 2510.04205v2 |
| Epistemic Diversity and Knowledge Collapse in Large Language Models | Dustin Wright, Sarah Masud, Jared Moore, Srishti Yadav, Maria Antoniak, Chan Young Park, Isabelle Augenstein | 2510.04226v3 |
| Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation | Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary | 2510.04265v1 |
| Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space | Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl | 2510.04339v1 |