Arabic Election Results Table Extraction from Mednine PDF
This PDF contains a data table of election results from the Tunisian region of Mednine. Challenges include spanning header cells, rotated headers, and Arabic script.
import pandas as pd
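# Extract the table on each page as a DataFrame
# (`pdf` is the document object opened earlier in the full example)
dataframes = pdf.pages.apply(
    lambda page: page.extract_table().to_df(header=None)
)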
print("Found", len(dataframes), "tables")
View full example →
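Spanning header cells usually come out of extraction as one label followed by empty cells, often spread over two header rows. The sketch below shows one way to flatten such rows into a single label per column. It is a minimal illustration, not the method used in the full example: `flatten_headers`, the sample rows, and the left-to-right fill direction are all assumptions, and it presumes the header rows arrive as ordinary data rows (as `header=None` in the snippet above suggests). With a pandas DataFrame, a list of rows like this could be obtained with `df.values.tolist()`.

```python
def flatten_headers(rows, header_rows=2):
    """Merge multi-row headers with spanning cells into one label per column.

    Hypothetical helper for illustration; returns (labels, body_rows).
    """
    headers = []
    for row in rows[:header_rows]:
        filled, last = [], ""
        for cell in row:
            # A spanning cell appears once, followed by empty cells:
            # carry the last seen label to the right (assumed direction).
            if cell not in (None, ""):
                last = cell
            filled.append(last)
        headers.append(filled)

    labels = []
    for parts in zip(*headers):
        # Join the header levels for this column, skipping blanks and repeats.
        unique = []
        for part in parts:
            if part and part not in unique:
                unique.append(part)
        labels.append(" / ".join(unique))
    return labels, rows[header_rows:]


# Made-up sample rows standing in for an extracted table.
raw_rows = [
    ["Polling station", "List A", None, "List B", None],
    [None, "Votes", "Percent", "Votes", "Percent"],
    ["Station 1", "10", "40", "15", "60"],
]

labels, body = flatten_headers(raw_rows)
print(labels)
print(body)
```

For a right-to-left table the spanning label may sit at the opposite end of its span, in which case the fill would need to run in the other direction.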
Extracting Text from Georgia Legislative Bills
This PDF contains bills from the Georgia legislature, published yearly. Challenges include extracting marked-up text such as underlines and strikethroughs, and handling the line numbers that complicate text extraction.
sections = pdf.pages.apply(lambda page: (
    page
    .region(right=70)
    .find_all('text')
View full example →
sections = pdf.pages.apply(lambda page: page.region(
    left=70,
    top=50,
    bottom=page.height - 100
View full example →
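The cropping idea behind the region bounds above (`left=70`, `top=50`, `bottom=page.height - 100`) can be sketched without any PDF library: keep only the words whose boxes fall inside the body area, so the line-number gutter and the page header and footer drop out. Everything below is a hypothetical illustration. The `Word` tuple, the `crop_words` helper, the sample words, and the page height of 792 points are assumptions, and coordinates are measured from the top-left corner of the page.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x0: float      # left edge, in points from the left of the page
    top: float     # top edge, in points from the top of the page
    bottom: float  # bottom edge, in points from the top of the page


def crop_words(words, page_height, left=70, top=50, bottom_margin=100):
    """Keep words that lie inside the body region of the page."""
    bottom = page_height - bottom_margin
    return [
        w for w in words
        if w.x0 >= left and w.top >= top and w.bottom <= bottom
    ]


# Made-up words: a page header, a gutter line number, body text, a footer.
words = [
    Word("Header", 250, 20, 32),
    Word("1", 40, 100, 112),
    Word("SECTION", 90, 100, 112),
    Word("1.", 150, 100, 112),
    Word("Footer", 290, 740, 752),
]

body_words = crop_words(words, page_height=792)
body_text = " ".join(w.text for w in body_words)
print(body_text)
```

Filtering on the left edge alone is the simplest choice; a word that starts in the gutter but extends into the body would be dropped whole, which is usually what is wanted for line numbers.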