Accurate assessment of fish feeding behavior is essential for efficient feed management and stable operation of aquaculture systems. Feeding in aquaculture currently relies on manual or machine-based approaches. Manual feeding depends on operator experience and often results in uneven feed distribution and increased water pollution. Machine-based feeding typically dispenses preset quantities determined by stocking density, but this fixed strategy cannot adjust to the environmental fluctuations that influence fish appetite, which leads to substantial feed waste. To address these limitations, we propose a Lightweight Dynamic Cross-Modal Fusion Network (LDMF). The network synchronously acquires visual and acoustic information on fish feeding behavior and assesses the feeding state with high precision. These reliable real-time assessments enable accurate feed delivery and support intelligent feeding management. First, the network extracts feeding-related features through dual audio–video branches, so that the two modalities complement each other. Second, the proposed dynamic interaction fusion module adjusts the fusion weights according to the characteristics of the input features. This mechanism allows the model to adapt to fluctuations in real feeding activity and supports timely adjustment of the feeding strategy, thereby reducing the risk of feed waste and environmental pollution. Experimental results show that LDMF achieves an accuracy of 98.2% on the AV-FFIA dataset, outperforming existing unimodal models and other mainstream multimodal approaches. The network remains robust under highly dynamic recirculating aquaculture conditions and offers a practical solution for intelligent feeding applications.
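
As a rough illustration of what input-dependent fusion weights can look like, the following minimal sketch fuses one audio feature vector and one video feature vector using weights computed from the features themselves. It is not the LDMF dynamic interaction fusion module, whose internal design is not specified here. The function name `dynamic_fusion`, the gate parameters `gate_weights` and `gate_bias`, and the choice of a softmax gate over a linear score are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(audio_feat, video_feat, gate_weights, gate_bias):
    """Fuse two equal-length feature vectors with input-dependent weights.

    Hypothetical parameters (assumed, not taken from the paper):
      gate_weights: two rows, one per modality, each of length 2 * dim
      gate_bias:    two floats, one per modality
    Returns the fused vector and the (audio, video) weight pair.
    """
    if len(audio_feat) != len(video_feat):
        raise ValueError("audio and video features must have the same length")

    # The gate sees both modalities, so the weights depend on the input.
    joint = list(audio_feat) + list(video_feat)
    scores = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]

    # Softmax turns the two scores into weights that sum to 1.
    w_audio, w_video = softmax(scores)

    # Weighted sum of the two modality features.
    fused = [w_audio * a + w_video * v for a, v in zip(audio_feat, video_feat)]
    return fused, (w_audio, w_video)


if __name__ == "__main__":
    audio = [1.0, 3.0]
    video = [3.0, 1.0]
    # Assumed gate that scores the audio branch by the sum of its features.
    gate_w = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
    gate_b = [0.0, 0.0]
    fused, weights = dynamic_fusion(audio, video, gate_w, gate_b)
    print(fused, weights)
```

In a trained network the gate parameters would be learned together with the two feature branches. With an all-zero gate the sketch reduces to plain averaging of the two modalities, which is the fixed-weight baseline that input-dependent weighting is meant to improve on.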