ISSN: XXXX-XXXX

Improving Table Extraction Accuracy and Automation for PDF-based Journal Articles

Abstract

This paper examines the obstacles to automatic table extraction from PDF-based journal articles, with a focus on optimizing detection accuracy and minimizing information loss. It studies how text size, border length, absolute location, and hierarchical clustering affect performance relative to previously developed solutions. A quantitative research approach is used to explore how changes in these independent variables influence detection accuracy and extraction efficiency. The results show that optimized text size and flexible border length substantially improve the detection and restoration of table structures, while hierarchical clustering improves the accuracy of the recovered table structures. The proposed method outperforms previous techniques in reducing information loss and improving efficiency, and it shows promise for automated data extraction from academic documents.
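
As a rough illustration of how absolute location and hierarchical clustering can work together, the sketch below groups words into table rows by clustering their vertical positions. It is not the paper's implementation. The `(text, x, y)` word format, the function names `cluster_1d` and `words_to_rows`, the assumption that `y` is measured from the top of the page, and the idea of tying `row_gap` to the text size are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical word record: (text, x, y) in absolute page coordinates,
# with y assumed to increase from the top of the page downward.
Word = Tuple[str, float, float]


def cluster_1d(values: Sequence[float], max_gap: float) -> List[List[int]]:
    """Single-linkage agglomerative clustering of 1D values.

    In one dimension, cutting a single-linkage hierarchy at distance
    max_gap is equivalent to sorting the values and starting a new
    cluster wherever two neighbours are more than max_gap apart.
    Returns clusters as lists of indices into `values`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters: List[List[int]] = []
    for i in order:
        if clusters and values[i] - values[clusters[-1][-1]] <= max_gap:
            clusters[-1].append(i)   # close enough: same cluster
        else:
            clusters.append([i])     # gap too large: new cluster
    return clusters


def words_to_rows(words: Sequence[Word], row_gap: float) -> List[List[str]]:
    """Group words into rows by y position, then order each row by x."""
    ys = [w[2] for w in words]
    rows: List[List[str]] = []
    for indices in cluster_1d(ys, row_gap):
        row = sorted((words[i] for i in indices), key=lambda w: w[1])
        rows.append([w[0] for w in row])
    return rows
```

One plausible way to pick `row_gap` is as a fraction of the text size (for example, half the font height), so that words on the same baseline merge while adjacent rows stay separate. That choice is an assumption of this sketch and would need tuning on real documents.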


How to Cite

Pankaj Pachauri (2025-02-21). Improving Table Extraction Accuracy and Automation for PDF-based Journal Articles. Abhi International Journal of Information Processing Management, Issue 1.