apply

5 usages across 2 PDFs

Arabic Election Results Table Extraction from Mednine PDF

This PDF contains a table of election results from the Tunisian region of Mednine. Challenges include spanning header cells and rotated headers, and the text is in Arabic script.

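import pandas as pd  # to_df() produces pandas DataFrames

# Assumes `pdf` is an already-opened PDF object.
# Build one DataFrame per page; header=None treats every row as data.
dataframes = pdf.pages.apply(
    lambda page: page.extract_table().to_df(header=None)
)
print("Found", len(dataframes), "tables")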
View full example →
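
To make the pattern concrete without needing the library or a PDF on hand, here is a minimal stand-in sketch of what an `apply`-style call does: it runs a function once per page and collects the results in page order. `FakePage`, `FakePages`, and their fields are hypothetical names used only for illustration; they are not the library's classes, and the real return type may be a richer collection than a plain list.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class FakePage:
    """Hypothetical stand-in for a PDF page (not the library's class)."""
    number: int
    rows: List[List[str]]

    def extract_table(self) -> List[List[str]]:
        # Stand-in only: a real page would detect the table from its layout.
        return self.rows


class FakePages:
    """Hypothetical stand-in for a page collection such as pdf.pages."""

    def __init__(self, pages: List[FakePage]) -> None:
        self._pages = pages

    def apply(self, func: Callable[[FakePage], T]) -> List[T]:
        # Call func once per page and keep the results in page order.
        return [func(page) for page in self._pages]


pages = FakePages([
    FakePage(1, [["a", "1"], ["b", "2"]]),
    FakePage(2, [["c", "3"]]),
])

tables = pages.apply(lambda page: page.extract_table())
row_counts = pages.apply(lambda page: len(page.extract_table()))
print("Found", len(tables), "tables")
```

Because the function passed to `apply` receives a single page, any per-page chain (table extraction, region selection, text search) can go inside the lambda.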

Extracting Text from Georgia Legislative Bills

This PDF contains legal bills from the Georgia legislature, published yearly. Challenges include extracting marked-up text such as underlines and strikethroughs, and the line numbers on each page complicate text extraction.

sections = pdf.pages.apply(lambda page: (
    page
        .region(right=70)
        .find_all('text')
View full example →
sections = pdf.pages.apply(lambda page: page.region(
    left=70,
    top=50,
    bottom=page.height - 100
View full example →
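
Both excerpts above are built around an x-offset of 70, which presumably marks the boundary between the margin line numbers and the bill text. The sketch below shows that idea on plain tuples: words whose right edge falls at or left of the boundary are treated as margin content, and the rest as body text. The `Word` type, the sample coordinates, and the `split_at` helper are all hypothetical; none of them come from the library or from the PDF itself.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """Hypothetical word box: text plus left (x0) and right (x1) edges."""
    text: str
    x0: float
    x1: float


def split_at(words: List[Word], boundary: float) -> Tuple[List[Word], List[Word]]:
    # Words ending at or before the boundary belong to the margin strip.
    margin = [w for w in words if w.x1 <= boundary]
    # Everything else is treated as body text.
    body = [w for w in words if w.x1 > boundary]
    return margin, body


# Made-up coordinates for one line of a bill with a margin line number.
line = [
    Word("12", 40.0, 52.0),
    Word("SECTION", 90.0, 140.0),
    Word("1.", 145.0, 155.0),
]

margin, body = split_at(line, boundary=70)
print(" ".join(w.text for w in body))
```

Separating the two groups this way keeps the line numbers available for reference while leaving them out of the extracted bill text.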