Computer Integrated Manufacturing System, 2022, Vol. 28, Issue 7: 2179-2187. DOI: 10.13196/j.cims.2022.07.023


Recognition algorithm of irregular table in mechanical processing card

LYU Zhigang1,2, WANG Hongxi1+, LI Liangliang1, WANG Peng2, LI Xiaoyan2, DI Ruohai2

  1. School of Mechatronic Engineering, Xi'an Technological University
  2. School of Electronics and Information Engineering, Xi'an Technological University
  • Online: 2022-07-31 Published: 2022-08-09
  • Supported by:
    Project supported by the National Key Laboratory of Electronic Information System Complex Electromagnetic Environmental Effects Foundation, China (No. CEMEE2020Z0202B), the Shaanxi Provincial Natural Science Basic Research Program, China (No. 2020JQ-816), the Xi'an Science and Technology Program, China (No. 2020KJRC0033), and the Shaanxi Provincial Department of Education Special Scientific Research Program, China (No. 20JK0680).

机械工艺卡非规则表格元素识别算法

吕志刚1,2,王洪喜1+,李亮亮1,王鹏2,李晓艳2,邸若海2   

  1. 西安工业大学机电工程学院
  2. 西安工业大学电子信息工程学院
  • 基金资助:
    电子信息系统复杂电磁环境效应国家重点实验室基金资助项目(CEMEE2020Z0202B);陕西省自然科学基础研究计划资助项目(2020JQ-816);西安市科技计划资助项目(2020KJRC0033);陕西省教育厅专项科研计划资助项目(20JK0680)。

Abstract: Tables in existing paper mechanical process cards often show irregularities such as discontinuous vertical line segments, misaligned frame lines, and tables that continue across adjacent pages, so conventional Optical Character Recognition (OCR) software cannot locate and recognize the table elements accurately. To address this, an irregular table recognition algorithm that fuses local features is proposed. First, a block-wise threshold is computed: blocks are extracted, horizontal line segments are detected locally, the row spacing is estimated by mean clustering, and vertical line segments within each block are detected using the row-spacing threshold. Second, the table area is pre-located: the source image is eroded, the grayscale image is binarized, horizontal line segments are extracted with an adaptive threshold, vertical line segments are pre-extracted using the row-spacing threshold, the features of the vertical block images are fused, a custom mask is applied, and contours are pre-extracted. Finally, area re-detection is used to verify the pre-extracted table areas accurately. Experiments show that the proposed algorithm can locate and extract such irregular tables, including tables in uncorrected images, tables with discontinuous vertical line segments, and tables that span pages. On a test set of 12,840 table images, the average recognition accuracy exceeded 98.03%. The algorithm is simple and effective and has been implemented on the Qt platform; the resulting OCR software has been applied successfully in the information center of a research institute.
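
The abstract does not give the details of the clustering step or of how the row-spacing threshold is used on vertical lines, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the y-coordinates of the detected horizontal lines are already available, groups the gaps between them with a simple greedy mean clustering, and then uses the resulting spacing to bridge small breaks in a vertical line. The function names and the parameters `tol` and `ratio` are hypothetical and chosen for illustration.

```python
from statistics import mean


def estimate_row_spacing(line_ys, tol=0.25):
    """Estimate the dominant row spacing from horizontal-line y-coordinates.

    Gaps between consecutive lines are sorted and grouped greedily: a gap
    joins the current cluster if it exceeds the cluster mean by at most
    `tol` (relative). The mean of the largest cluster is returned.
    `tol` is an illustrative assumption, not a value from the paper.
    """
    ys = sorted(line_ys)
    gaps = sorted(b - a for a, b in zip(ys, ys[1:]) if b > a)
    if not gaps:
        raise ValueError("need at least two distinct horizontal lines")
    clusters = [[gaps[0]]]
    for gap in gaps[1:]:
        centre = mean(clusters[-1])
        if gap - centre <= tol * centre:
            clusters[-1].append(gap)
        else:
            clusters.append([gap])
    return mean(max(clusters, key=len))


def merge_vertical_segments(segments, row_spacing, ratio=0.5):
    """Bridge breaks in one vertical line using the row-spacing threshold.

    `segments` holds (y_start, y_end) pairs that share the same x position.
    Two segments are joined when the gap between them is at most
    ratio * row_spacing. `ratio` is an illustrative assumption.
    """
    max_gap = ratio * row_spacing
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


# Horizontal lines roughly 30 px apart, plus one distant line from another block.
spacing = estimate_row_spacing([10, 40, 71, 100, 131, 250])
# A vertical frame line broken by a 5 px gap, and a separate segment far below.
joined = merge_vertical_segments([(0, 28), (33, 60), (120, 150)], spacing)
```

Tying the merge distance to the estimated row spacing, rather than to a fixed pixel count, lets the same code adapt to cards scanned at different resolutions, which is one plausible reason for a block-wise threshold of this kind.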

Key words: irregular table recognition, feature fusion, statistical clustering, image processing

摘要: 在现有纸质机械工艺卡中,表格元素存在纵向线段不连续、框线错位、跨页等不规则现象,导致传统的光学字符识别(OCR)算法无法准确定位识别表格元素,由此提出一种融合局部特征的非规则表格识别算法。首先,进行区域分块阈值求解,包括分块提取、局部横向线段检测、行距均值聚类求解,以及基于行距阈值的分块区域纵向线段检测;其次,进行表格区域预定位识别,包括源文件腐蚀、灰度二值化、自适应基础阈值的横向线段提取、基于行距阈值的纵向线段预提取、纵向分块图像特征融合、自定义掩膜处理,以及轮廓预提取;最后,使用区域重检测的方法,对预提取表格区域进行精准判别。经实验验证,该方法可以有效地解决未校正、纵向线段不连续、表格跨页等复杂表格难以准确定位提取的问题。在12840张表格图像构成的测试集样本中进行了测试,平均识别准确率可达98.03%以上。该算法简洁有效,并在QT集成开发环境上得到了实现,该OCR软件已在某研究所信息化中心得到了成功应用。

关键词: 非规则表格识别, 特征融合, 统计聚类, 图像处理
