snap_to_whitespace

2 usages across 2 PDFs

Extracting State Agency Call Center Wait Times from FOIA PDF

This PDF contains wait-time data for a state agency call center. The main focus is the data on the first two pages, which matches the submission format used by other states. The later pages provide granular breakdowns covering several years. The main challenges are that the scan is heavily pixelated, which makes numbers and text hard to read, and that the charts are inconsistent and often unreadable.


guide = Guides(table_area)
guide.vertical.divide(3)
guide.vertical.snap_to_whitespace(detection_method='text')
guide.horizontal.from_lines()
guide.show()
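To make the `divide(3)` and `snap_to_whitespace(detection_method='text')` calls above more concrete, here is a minimal, library-independent sketch of the idea. It assumes each word is reduced to a `(left, right)` x-extent and that "snapping" means moving any guide that overlaps text to the center of the nearest text-free gap. The helpers `divide`, `whitespace_gaps`, and `snap_to_whitespace` below are hypothetical stand-ins written for illustration, not Natural PDF's actual implementation.

```python
from typing import List, Tuple


def divide(x0: float, x1: float, n: int) -> List[float]:
    """Return n + 1 evenly spaced guide positions spanning [x0, x1] (n columns)."""
    step = (x1 - x0) / n
    return [x0 + i * step for i in range(n + 1)]


def whitespace_gaps(words: List[Tuple[float, float]], x0: float, x1: float) -> List[Tuple[float, float]]:
    """Find x-intervals inside [x0, x1] that no word's (left, right) extent covers."""
    gaps = []
    cursor = x0
    for left, right in sorted(words):
        if left > cursor:
            gaps.append((cursor, left))  # empty space before this word
        cursor = max(cursor, right)
    if cursor < x1:
        gaps.append((cursor, x1))  # empty space after the last word
    return gaps


def snap_to_whitespace(guides: List[float], words: List[Tuple[float, float]],
                       x0: float, x1: float) -> List[float]:
    """Hypothetical snapping rule: guides already in whitespace stay put;
    guides that cut through text move to the center of the nearest gap."""
    gaps = whitespace_gaps(words, x0, x1)
    if not gaps:
        return list(guides)  # nowhere to move to, so leave guides unchanged
    snapped = []
    for g in guides:
        if any(a <= g <= b for a, b in gaps):
            snapped.append(g)
        else:
            a, b = min(gaps, key=lambda gap: abs((gap[0] + gap[1]) / 2 - g))
            snapped.append((a + b) / 2)
    return snapped


# Made-up example: a 300-pt-wide table area with three columns of text.
words = [(5, 80), (90, 185), (195, 290)]
guides = divide(0, 300, 3)
print(guides)                                     # [0.0, 100.0, 200.0, 300.0]
print(snap_to_whitespace(guides, words, 0, 300))  # [0.0, 85.0, 190.0, 300.0]
```

The evenly divided guides at 100 and 200 both cut through words, so they shift left into the gaps at 80-90 and 185-195, while the outer edges already sit in whitespace and stay where they are.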

OCR and AI magic

Master OCR techniques with Natural PDF, from basic text recognition to advanced LLM-powered corrections. Learn to extract text from image-based PDFs, handle tables without proper boundaries, and use AI to improve accuracy.


# Shift them around so they don't overlap the text
guides.vertical.snap_to_whitespace(detection_method='text')

# Add horizontal lines wherever at least 80% of the pixels are 'used' (threshold=0.8)
guides.horizontal.from_lines(threshold=0.8)
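The comment on `from_lines(threshold=0.8)` describes finding rows where most pixels are "used". The sketch below illustrates that idea on a tiny binarized image, assuming a dark pixel is stored as `1` and a light one as `0`. The helpers `horizontal_lines_from_pixels` and `merge_runs` are hypothetical and only approximate the concept; Natural PDF's own line detection may work differently.

```python
from typing import List


def horizontal_lines_from_pixels(image: List[List[int]], threshold: float = 0.8) -> List[int]:
    """Return indices of rows whose fraction of dark pixels (value 1) is at least `threshold`."""
    rows = []
    for y, row in enumerate(image):
        if not row:
            continue  # skip empty rows to avoid dividing by zero
        dark_fraction = sum(1 for px in row if px) / len(row)
        if dark_fraction >= threshold:
            rows.append(y)
    return rows


def merge_runs(rows: List[int]) -> List[float]:
    """Collapse runs of consecutive row indices (a rule several pixels thick) into the run's center."""
    lines = []
    start = prev = None
    for y in rows:
        if start is None:
            start = prev = y
        elif y == prev + 1:
            prev = y  # still inside the same thick rule
        else:
            lines.append((start + prev) / 2)
            start = prev = y
    if start is not None:
        lines.append((start + prev) / 2)
    return lines


# Made-up 6 x 10 binarized image: a 2-pixel rule at rows 1-2, some text at row 3, a rule at row 5.
image = [
    [0] * 10,
    [1] * 10,
    [1] * 9 + [0],
    [1] * 3 + [0] * 7,
    [0] * 10,
    [1] * 8 + [0] * 2,
]
rows = horizontal_lines_from_pixels(image, threshold=0.8)
print(rows)              # [1, 2, 5]
print(merge_runs(rows))  # [1.5, 5.0]
```

Row 3 is only 30% dark, so it is treated as text rather than a ruling line. Lowering the threshold would pick up fainter or broken lines on a pixelated scan, at the cost of occasionally mistaking dense text for a rule.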