#atp


TheoremExplainAgent: Towards multimodal explanations for LLM theorem understanding. ~ Max Ku et al. arxiv.org/abs/2502.19400 #AI #LLMs #ATP #Logic #Math

arXiv.org — TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstrate strong performance in text-based theorem reasoning, their ability to generate coherent and pedagogically meaningful visual explanations remains an open challenge. In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. To systematically evaluate multimodal theorem explanations, we propose TheoremExplainBench, a benchmark covering 240 theorems across multiple STEM disciplines, along with 5 automated evaluation metrics. Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a success rate of 93.8% and an overall score of 0.77. However, our quantitative and qualitative studies show that most of the videos produced exhibit minor issues with visual element layout. Furthermore, multimodal explanations expose deeper reasoning flaws that text-based explanations fail to reveal, highlighting the importance of multimodal explanations.
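For readers unfamiliar with Manim: the pipeline's agents emit scene code like the following. This is a minimal, self-contained Manim Community sketch of one explanation step; the theorem choice and layout are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the kind of Manim scene an agentic pipeline
# like TheoremExplainAgent might generate for one step of a theorem
# explanation video. (Illustrative example, not the paper's output.)
from manim import Scene, Text, MathTex, Write, FadeIn, DOWN

class PythagoreanStep(Scene):
    def construct(self):
        title = Text("Pythagorean Theorem")
        statement = MathTex(r"a^2 + b^2 = c^2")
        statement.next_to(title, DOWN)   # place formula under the title
        self.play(Write(title))          # animate the title stroke by stroke
        self.play(FadeIn(statement))     # then reveal the formal statement
        self.wait(2)                     # hold the frame for narration
```

Rendered with, e.g., `manim -pql scene.py PythagoreanStep`; a long-form video chains many such generated scenes together.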

CuDIP: Enhancing theorem proving in LLMs via curriculum learning-based direct preference optimization. ~ Shuming Shi et al. arxiv.org/abs/2502.18532 #AI #LLMs #ATP #Logic #Math

arXiv.org — CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization

Automated theorem proving (ATP) is one of the most challenging mathematical reasoning tasks for Large Language Models (LLMs). Most existing LLM-based ATP methods rely on supervised fine-tuning, which results in a limited alignment between the theorem proving process and human preferences. Direct Preference Optimization (DPO), which aligns LLMs with human preferences, has shown positive effects for certain tasks. However, the lack of high-quality preference data for theorem proving presents a significant challenge. In this paper, we innovatively apply DPO to formal automated theorem proving and introduce a Curriculum Learning-based DPO Iterative Theorem Proving (CuDIP) method. Specifically, we propose a method for constructing preference data which utilizes LLMs and existing theorem proving data to enhance the diversity of the preference data while reducing the reliance on human preference annotations. We then integrate this preference data construction method with curriculum learning to iteratively fine-tune the theorem proving model through DPO. Experimental results on the MiniF2F and ProofNet datasets demonstrate the effectiveness of the proposed method.
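The core data step DPO needs is pairing a verified proof (chosen) with a failed attempt (rejected) for the same theorem, then ordering pairs from easy to hard for the curriculum. Here is a rough Python sketch of that idea; the field names and the difficulty heuristic (shorter accepted proof = easier theorem) are assumptions for illustration, not the paper's actual construction.

```python
# Sketch of curriculum-ordered preference-pair construction in the
# spirit of CuDIP. Field names and the difficulty heuristic are
# illustrative assumptions, not the paper's scheme.
from dataclasses import dataclass

@dataclass
class Attempt:
    theorem: str      # formal goal statement (e.g., a Lean goal)
    proof: str        # candidate proof script sampled from the LLM
    verified: bool    # did the proof checker accept it?

def build_preference_pairs(attempts: list[Attempt]) -> list[dict]:
    """Group attempts by theorem and emit DPO-style triples."""
    by_theorem: dict[str, list[Attempt]] = {}
    for a in attempts:
        by_theorem.setdefault(a.theorem, []).append(a)

    pairs = []
    for theorem, group in by_theorem.items():
        good = [a for a in group if a.verified]
        bad = [a for a in group if not a.verified]
        for g in good:
            for b in bad:
                pairs.append({
                    "prompt": theorem,    # the goal to prove
                    "chosen": g.proof,    # checker-accepted proof
                    "rejected": b.proof,  # failed attempt
                })

    # Curriculum: assume shorter accepted proofs mark easier theorems
    # and schedule those pairs first in the iterative fine-tuning.
    pairs.sort(key=lambda p: len(p["chosen"]))
    return pairs
```

The resulting {prompt, chosen, rejected} triples are the standard input format preference trainers expect (e.g., TRL's DPOTrainer); the sort at the end is a stand-in for the paper's curriculum scheduling.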