Complex Table Extraction from OECD Czech PISA Assessment
This PDF is an OECD document about the PISA assessment, written in Czech. The goal is to extract the survey question table on page 9. The table's irregular layout makes automatic extraction difficult.
pdf.pages[7].inspect()
sections[0].find_all('text').inspect()
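Once the page has been inspected, the table still needs to be pulled out and tidied. The following is a minimal sketch rather than the method used in the full example: `clean_rows` and `extract_survey_table` are hypothetical helper names, the file path is supplied by the caller, and it assumes `page.extract_table()` returns an iterable of rows whose cells may be `None` or contain stray line breaks.

```python
from typing import Iterable, List, Optional, Sequence


def clean_rows(rows: Iterable[Sequence[Optional[str]]]) -> List[List[str]]:
    """Collapse whitespace in each cell, map None to "", drop fully empty rows."""
    cleaned = []
    for row in rows:
        cells = [" ".join((cell or "").split()) for cell in row]
        if any(cells):  # keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned


def extract_survey_table(pdf_path: str, page_index: int = 7) -> List[List[str]]:
    """Hypothetical helper: extract and tidy the table on one page."""
    # Imported lazily so clean_rows can be used without natural_pdf installed.
    from natural_pdf import PDF

    pdf = PDF(pdf_path)
    page = pdf.pages[page_index]  # zero-based index, as in the snippet above
    raw = page.extract_table()    # assumed to yield rows of cell strings
    return clean_rows(raw or [])
```

Keeping the cleanup in a separate pure function means it can be checked on hand-written rows before being run against the real document.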
OCR and AI magic
Master OCR techniques with Natural PDF, from basic text recognition to advanced LLM-powered corrections. Learn to extract text from image-based PDFs, handle tables that lack clear boundaries, and use AI to improve accuracy.
page.apply_ocr(resolution=50)
page.find_all('text').inspect()
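For tables without ruled boundaries, one common approach is to rebuild rows from the positions of the recognized words. The sketch below illustrates that idea in plain Python; it is not Natural PDF's internal algorithm. The `Word` class, the `group_into_rows` function, and the 3-point `tolerance` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x0: float   # left edge of the word box
    top: float  # distance of the box from the top of the page


def group_into_rows(words: List[Word], tolerance: float = 3.0) -> List[List[str]]:
    """Group words whose `top` values lie within `tolerance` into one row."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.top):
        # Compare against the first word of the current row to avoid drift.
        if rows and abs(word.top - rows[-1][0].top) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order the words left to right.
    return [[w.text for w in sorted(row, key=lambda w: w.x0)] for row in rows]


words = [
    Word("Name", 10, 100.0),
    Word("Score", 120, 101.5),
    Word("Anna", 10, 120.2),
    Word("512", 120, 119.0),
]
table = group_into_rows(words)
print(table)
```

OCR output is rarely perfectly aligned, so the tolerance usually needs tuning to the font size and scan resolution of the page.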