right

383 usages across 6 PDFs

Complex Extraction of Law Enforcement Complaints

This PDF contains a set of complaint records from a local law enforcement agency. Challenges include its relational data structure, unusual formatting common in the region, and redactions that disrupt automatic parsing.

complainant = (
  section
  .find("text:contains(Complainant)")
  .right(until='text')
)
print("Complainant is", complainant.extract_text())
complainant.show(crop=100)
dob = (
  section
  .find("text:contains(DOB)")
  .right(until='text')
)
print("DOB is", dob.extract_text())
dob.show(crop=100)

Extracting Business Insurance Details from BOP PDF

This PDF is a complex insurance policy document generated for small businesses requiring BOP coverage. It packs a large amount of information into 111 pages. Forms may differ slightly between carriers, which makes extraction inconsistent, and templated layouts vary, so even standard sections can shift when generated by different software.

(
    page
    .find(text="POLICY NUMBER")
    .right(until='text')
    .show()
)
(
    page
    .find(text="POLICY NUMBER")
    .right(until='text')
    .extract_text()
)
(
    page
    .find(text="Mailing Address")
    .expand(bottom='text')
    .right()
    .extract_text()
)

Extracting Data Tables from Oklahoma Booze Licensees PDF

This PDF contains detailed tables listing alcohol licensees in Oklahoma. Multi-line cells make it hard to extract data accurately, and rows are separated by alternating row colors ("zebra stripes") instead of lines, which complicates row differentiation and extraction.
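One way to deal with multi-line cells after extraction is to fold continuation rows back into the row they belong to. The sketch below is plain Python, not part of Natural PDF. It assumes the extracted table arrives as a list of equal-width rows and that a wrapped cell shows up as an extra row with an empty first column. The function name `merge_continuation_rows` and the sample rows are hypothetical.

```python
def merge_continuation_rows(rows):
    """Fold rows whose first cell is empty into the previous row."""
    merged = []
    for row in rows:
        # Assumption: a blank first cell marks the overflow of a multi-line cell.
        is_continuation = merged and not (row[0] or "").strip()
        if is_continuation:
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell and cell.strip():
                    previous[i] = f"{previous[i]} {cell.strip()}".strip()
        else:
            merged.append([(cell or "").strip() for cell in row])
    return merged


# Hypothetical rows; real licensee data will look different.
rows = [
    ["1001", "EXAMPLE TAVERN", "123 MAIN ST"],
    ["", "AND GRILL", ""],
    ["1002", "SAMPLE LIQUORS", "45 OAK AVE"],
]
result = merge_continuation_rows(rows)
```

This heuristic only works when the first column is always filled on the first line of a record. If that does not hold for a given table, a different marker column would be needed.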

region = (
    page
    .find(text="NUMBER")
    .right(include_source=True)
)
region.show(crop=100)
headers = (
    page
    .find(text="NUMBER")
    .right(include_source=True)
    .expand(top=3, bottom=3)
    .find_all('text')
)

Extracting Economic Data from Brazil's Central Bank PDF

This PDF is the weekly “Focus” report from Brazil’s central bank with economic projections and statistics. Challenges include commas instead of decimal points, images showing projection changes, and tables without border lines that merge during extraction.
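Because the report uses commas as decimal separators, extracted cell text usually needs to be normalized before it can be treated as a number. The following is a minimal sketch in plain Python, independent of Natural PDF. It assumes values follow the Brazilian convention of `.` for thousands and `,` for decimals. The helper name `parse_br_number` and the sample strings are hypothetical.

```python
from decimal import Decimal


def parse_br_number(text: str) -> Decimal:
    """Convert a Brazilian-formatted number such as '1.234,56' to a Decimal."""
    # Drop thousands separators first, then turn the decimal comma into a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


# Hypothetical cell values, not taken from the report.
values = [parse_br_number(cell) for cell in ["4,50", "1.234,56", "-0,25"]]
```

`Decimal` is used instead of `float` so the printed precision of each projection is preserved. Note that a value already written with a decimal point (for example `1.5`) would be misread under this assumption.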

(
    data
    .find('text:contains(2025)')
    .right(
        until='text:contains(2026)',
        include_source=True,
        include_endpoint=False
    )
)
table = (
    data
    .find('text:contains(2025)')
    .right(
        until='text:contains(2026)',
        include_source=True,
        include_endpoint=False
    )
)
table = (
    data
    .find('text:contains(2026)')
    .right(
        until='text:contains(2027)',
        include_source=True,
        include_endpoint=False
    )
)

Extracting Text from Georgia Legislative Bills

This PDF contains legal bills from the Georgia legislature, published yearly. Challenges include extracting marked-up text such as underlines and strikethroughs, and line numbers in the margin that complicate text extraction.
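When line numbers have already leaked into extracted text, a regular expression can serve as a fallback to the spatial approach. This is a swapped-in text-level technique, not something Natural PDF provides. It assumes each line begins with a line number of one to three digits followed by whitespace. The sample text and the name `strip_line_numbers` are hypothetical.

```python
import re

# Hypothetical sample of extracted text; real bill text will differ.
raw_text = """1 A BILL TO BE ENTITLED
2 AN ACT
3 To amend Title 20, relating to education;
12 and for other purposes."""

# A leading run of 1-3 digits followed by whitespace is treated as a line number.
LINE_NUMBER = re.compile(r"^\s*\d{1,3}\s+")


def strip_line_numbers(text: str) -> str:
    """Remove one leading line number from every line of `text`."""
    return "\n".join(
        LINE_NUMBER.sub("", line, count=1) for line in text.splitlines()
    )


cleaned = strip_line_numbers(raw_text)
```

The pattern cannot tell a line number from a sentence that genuinely starts with a number, so excluding the margin spatially is the safer option when the page geometry is available.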

(
  page
  .region(right=70)
  .find_all('text')
  .right()
  .show(crop='wide')
)
(
  page
  .region(right=70)
  .find_all('text')
  .right()
  .merge()
  .show(crop='wide')
)
    page
        .region(right=70)
        .find_all('text')
        .right()
        .merge()
    )
)

Natural PDF basics with text and tables

Learn the fundamentals of Natural PDF: opening PDFs, extracting text with layout preservation, selecting elements by criteria, spatial navigation, and managing exclusion zones. A good starting point for PDF data extraction.

# Extract text to the right of "Date:"
date = page.find(text="Date:").right(height='element')
date.show()
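To make the idea of navigating "to the right" concrete, here is a conceptual sketch in plain Python. It is not Natural PDF's implementation and uses none of its API. It only models the notion behind `right(height='element')`: keep the text boxes that start to the right of a label and overlap the label's own vertical band. The `Box` class, the `text_to_the_right` function, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x0: float      # left edge
    top: float     # top edge (y grows downward)
    x1: float      # right edge
    bottom: float  # bottom edge


def text_to_the_right(label: Box, boxes: list[Box]) -> list[Box]:
    """Return boxes right of `label` that overlap it vertically, left to right."""
    hits = [
        b for b in boxes
        if b is not label
        and b.x0 >= label.x1        # starts after the label ends
        and b.top < label.bottom    # overlaps the label's vertical band
        and b.bottom > label.top
    ]
    return sorted(hits, key=lambda b: b.x0)


# Hypothetical page content with made-up coordinates.
boxes = [
    Box("Date:", 50, 100, 80, 112),
    Box("2024-01-15", 90, 100, 150, 112),
    Box("Total:", 50, 130, 85, 142),
    Box("$12.00", 90, 130, 130, 142),
]
label = boxes[0]
value = " ".join(b.text for b in text_to_the_right(label, boxes))
```

Restricting the search to the label's own height is what keeps the "Total:" row out of the result. A taller band would pick up text from neighboring lines.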