Abstract: Multimodal large language models (MLLMs) possess the ability to comprehend visual images or videos, and show impressive reasoning ability thanks to the vast amounts of pretrained knowledge, ...
To complete the above system, the author’s main research work includes: 1) Office document automation based on python-docx. 2) Use the Django framework to develop the website.