Computer Integrated Manufacturing System, 2024, Vol. 30, Issue (5): 1587-1594. DOI: 10.13196/j.cims.2024.0139


Large language model-based approach for human-mobile inspection robot interactive navigation

WANG Tian, FAN Junming, ZHENG Pai+

  1. Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University
  • Online: 2024-05-31 Published: 2024-06-12
  • Supported by:
    Project supported by the General Research Fund (GRF) from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. 15210222, 15206723).

基于大语言模型的人机交互移动检测机器人导航方法

王湉,范峻铭,郑湃+   

  1. 香港理工大学工业及系统工程学系
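  • About the authors: WANG Tian (王湉, born 2000), female, from Chengdu, Sichuan, PhD candidate; research interests: vision-language human-robot interaction and human-robot collaboration; E-mail: tianna.wang@connect.polyu.hk. FAN Junming (范峻铭, born 1992), male, from Chongqing, postdoctoral fellow, PhD; research interests: computer vision, 6D pose estimation, and human-robot collaboration. +ZHENG Pai (郑湃, born 1988), male, from Yangzhou, Jiangsu, associate professor, PhD, doctoral supervisor; research interests: human-robot collaborative manufacturing systems, smart product-service systems, and industrial artificial intelligence; corresponding author; E-mail: pai.zheng@polyu.edu.hk.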
  • 基金资助:
    香港研究资助局资助项目(15210222,15206723)。

Abstract: In manufacturing, the wide application of mobile robots has become key to improving operational safety and efficiency. However, most existing robotic systems can only complete predefined navigation tasks and cannot adapt to unstructured environments. To overcome this bottleneck, an interactive navigation method for mobile inspection robots based on large language models is introduced, which allows a robot to replace operators in inspecting hazardous industrial areas and to execute complex navigation tasks from verbal instructions. The High-Resolution Net (HRNet) model is used for semantic scene segmentation, and the segmentation results are integrated into the reconstructed 3D scene mesh during the point cloud fusion phase to create a comprehensive 3D semantic map. A large language model then interprets human natural language instructions and generates Python code based on the 3D semantic map to complete navigation tasks. A series of experiments was conducted to validate the effectiveness of the proposed system.
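
As a rough illustration of the last step described in the abstract, the sketch below shows how generated Python code might be run against a semantic map. It is not the authors' implementation: the map layout (`SEMANTIC_MAP`), the helpers `nearest_instance`, `build_prompt`, and `go_to`, and the hard-coded `generated_code` string standing in for model output are all hypothetical, and no real LLM or robot API is called.

```python
import math

# Hypothetical 3D semantic map: label -> list of (x, y, z) instance centroids in metres.
# The paper's actual map is a labelled 3D mesh; this dict is only a stand-in.
SEMANTIC_MAP = {
    "valve": [(4.0, 1.0, 0.8), (9.5, 3.0, 0.8)],
    "pipe": [(2.0, 5.0, 1.5)],
    "door": [(0.0, 8.0, 1.0)],
}


def nearest_instance(semantic_map, label, robot_xy):
    """Return the centroid of the `label` instance closest to the robot in the ground plane."""
    candidates = semantic_map.get(label, [])
    if not candidates:
        raise KeyError(f"no '{label}' in semantic map")
    return min(candidates, key=lambda c: math.dist(robot_xy, c[:2]))


def build_prompt(instruction, semantic_map):
    """Assemble a text prompt listing the labels the generated code is allowed to reference."""
    labels = ", ".join(sorted(semantic_map))
    return (
        "You control a mobile inspection robot.\n"
        f"Known object labels: {labels}\n"
        "Available function: go_to(label, robot_xy) -> (x, y)\n"
        f"Instruction: {instruction}\n"
        "Reply with Python code only."
    )


def go_to(label, robot_xy, semantic_map=SEMANTIC_MAP):
    """Stand-in navigation primitive: returns the 2D goal instead of driving a robot."""
    x, y, _z = nearest_instance(semantic_map, label, robot_xy)
    return (x, y)


prompt = build_prompt("Inspect the closest valve.", SEMANTIC_MAP)

# Hard-coded placeholder for what a language model might return for the prompt above.
generated_code = 'goal = go_to("valve", robot_xy)'

# Run the generated snippet in a namespace that exposes only the allowed primitive.
namespace = {"go_to": go_to, "robot_xy": (8.0, 2.0)}
exec(generated_code, namespace)
goal = namespace["goal"]
```

Exposing only a small set of named primitives to the generated code is one way to keep the executed snippet tied to the semantic map; a real system would also need validation and sandboxing before running model output.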

Key words: human-robot interaction, large language model, vision and language navigation, smart manufacturing, Industry 5.0

摘要: 在工业制造领域,移动机器人的广泛应用已成为提高作业安全和效率的关键。然而,现有的机器人系统只能完成预定义的导航任务,无法适应非结构化场景。为了突破这一瓶颈,提出一种基于大语言模型(LLM)的人机交互移动检测机器人导航方法,可代替操作人员进入工业环境中的危险区域进行检测,并且可以根据人类自然语言指令完成复杂的导航任务。首先,通过高分辨率网络(HRNet)模型进行场景语义分割,并在点云融合阶段将语义分割结果渲染到重建的三维场景网格模型中,得到三维语义地图;然后利用大语言模型让机器人可以理解人类的自然语言指令,并根据创建的三维语义地图生成Python代码控制机器人完成导航任务。最后,通过一系列非结构化场景下的实验验证了该系统的有效性。

关键词: 人机交互, 大语言模型, 视觉语言导航, 智能制造, 工业5.0
