extract_table

64 usages across 7 PDFs

Animal 911 Calls Extraction from Rainforest Cafe Report

This PDF is a service call report covering 911 incidents at the Rainforest Cafe in Niagara Falls, NY. We're hunting for animals! The data is formatted as a spreadsheet within the PDF, and challenges include varied column widths, borderless tables, and large swaths of missing data.
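
Large stretches of missing data often show up as rows in which every cell is blank. A minimal sketch of filtering those out, assuming the extracted rows are available as plain lists of strings or `None`; the `drop_empty_rows` helper and the sample rows are hypothetical, not taken from the report:

```python
def drop_empty_rows(rows):
    """Keep only rows that have at least one non-blank cell."""
    def is_blank(cell):
        # Treat None and whitespace-only strings as missing.
        return cell is None or str(cell).strip() == ""

    return [row for row in rows if not all(is_blank(cell) for cell in row)]


# Made-up sample rows, for illustration only.
rows = [
    ["NF-0001", "ANIMAL", None],
    [None, "", "  "],
    ["NF-0002", None, None],
]
kept = drop_empty_rows(rows)
print(len(kept), "of", len(rows), "rows kept")
```

Partially filled rows are kept on purpose, since a single populated cell may still identify a call. On a DataFrame, `dropna(axis=0, how='all')` plays the same role for cells that are already null.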

table_result = guide.extract_table(pages, header="first")
df = table_result.to_df()
df.head()

View full example →

first_table = base.extract_table().to_df()
first_table.head()

View full example →

    guides = Guides(page)
    guides.vertical = base.vertical
    guides.horizontal.from_content(page.find_all('text:starts-with(NF-)'))
    single_df = guides.extract_table().to_df(header=columns)
    dataframes.append(single_df)
print("We made", len(dataframes), "dataframes")

View full example →

Arabic Election Results Table Extraction from Mednine PDF

This PDF contains a table of election results from the Tunisian region of Mednine. Challenges include spanning header cells, rotated headers, and Arabic script.

df = flow.extract_table().to_df(header=None)
df

View full example →

import pandas as pd
dataframes = pdf.pages.apply(
    lambda page: page.extract_table().to_df(header=None)
)
print("Found", len(dataframes), "tables")

View full example →

Complex Extraction of Law Enforcement Complaints

This PDF contains a set of complaint records from a local law enforcement agency. Challenges include its relational data structure, unusual formatting common in the region, and redactions that disrupt automatic parsing.
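
One way to keep the relational structure intact is to give every complaint an ID and stamp it on each officer row extracted for that complaint. A minimal sketch in plain Python, assuming each complaint has already been reduced to one complaint row plus a list of officer rows; the `split_relational` helper, the `complaint_id` key, and the sample records are hypothetical:

```python
def split_relational(records):
    """Split nested complaint records into two flat tables linked by an ID."""
    complaints = []
    officers = []
    for complaint_id, record in enumerate(records, start=1):
        complaints.append({"complaint_id": complaint_id, **record["complaint"]})
        for officer in record["officers"]:
            # Each officer row carries the ID of the complaint it belongs to.
            officers.append({"complaint_id": complaint_id, **officer})
    return complaints, officers


# Made-up records, for illustration only.
records = [
    {
        "complaint": {"Type of Complaint": "Example A", "Complaint Disposition": "Example"},
        "officers": [{"Name": "Officer 1"}, {"Name": "Officer 2"}],
    },
    {
        "complaint": {"Type of Complaint": "Example B", "Complaint Disposition": "Example"},
        "officers": [{"Name": "Officer 3"}],
    },
]

complaints, officers = split_relational(records)
print(len(complaints), "complaints,", len(officers), "officer rows")
```

The two resulting lists can be turned into separate DataFrames and joined on `complaint_id` whenever a combined view is needed.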

    .right(include_source=True)
    .merge()
    .expand(top=5, bottom=7)
    .extract_table()
    .to_df(header=['Type of Complaint', 'Description', 'Complaint Disposition'])
)

View full example →

# Use the guides
(
  table
  .extract_table(verticals=guides.vertical)
  .to_df(header=['Type of Complaint', 'Description', 'Complaint Disposition'])
)

View full example →


(
  table
  .extract_table(verticals=guides.vertical)
  .to_df(header=['Name', 'ID No.', 'Rank', 'Division', 'Officer Disposition', 'Action Taken', 'Body Cam'])
)

View full example →

    columns = ['Name', 'ID No.', 'Rank', 'Division', 'Officer Disposition', 'Action Taken', 'Body Cam']
    officer_df = (
      table
      .extract_table(verticals=guides.vertical)
      .to_df(header=columns)
    )

View full example →

Extracting Economic Data from Brazil's Central Bank PDF

This PDF is the weekly “Focus” report from Brazil’s central bank with economic projections and statistics. Challenges include commas instead of decimal points, images showing projection changes, and tables without border lines that merge during extraction.
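
The decimal-comma issue can be handled after extraction. A minimal sketch, assuming the extracted cells arrive as strings such as `4,50` or `1.234,56`; the `parse_decimal_comma` helper and the sample values are hypothetical, not taken from the report:

```python
def parse_decimal_comma(text):
    """Convert a number written with a decimal comma (e.g. '1.234,56') to float.

    Returns None for empty or non-numeric cells so missing data stays missing.
    """
    if text is None:
        return None
    cleaned = text.strip()
    if not cleaned:
        return None
    # Drop thousands separators first, then turn the decimal comma into a point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


# Made-up sample cells, for illustration only.
cells = ["4,50", "1.234,56", "-0,25", "", "n/a"]
values = [parse_decimal_comma(cell) for cell in cells]
print(values)
```

This treats every point as a thousands separator, so it should only be applied to columns known to use the decimal-comma convention. On a DataFrame, the same function could be applied column by column with `Series.map`.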

(
    sections[0]
    .expand(top=-50, right=0)
    .extract_table('stream')
    .to_df(header=False)
    .dropna(axis=0, how='all')
)

View full example →

dataframes = sections.apply(lambda section: (
    section
        .expand(top=-50, right=0)
        .extract_table('stream')
        .to_df(header=False)
        .dropna(axis=0, how='all')
        .assign(

View full example →

df_2025 = table.expand(top=-5).extract_table('stream').to_df(header=False)
df_2025

View full example →

df_2026 = table.expand(top=-5).extract_table('stream').to_df(header=False).dropna(axis=0, how='all')
df_2026.insert(0, 'year', 2026)
df_2026.insert(0, 'value', headers)
df_2026

View full example →

    .expand(top=-20)
    .clip(data)
)
df_2027 = table.expand(top=-5).extract_table('stream').to_df(header=False).dropna(axis=0, how='all')
df_2027.insert(0, 'year', 2027)
df_2027.insert(0, 'value', headers)
df_2027

View full example →

    .expand(top=-20)
    .clip(data)
)
df_2028 = table.expand(top=-5).extract_table('stream').to_df(header=False).dropna(axis=0, how='all')
df_2028.insert(0, 'year', 2028)
df_2028.insert(0, 'value', headers)
df_2028

View full example →

Natural PDF basics with text and tables

Learn the fundamentals of Natural PDF: opening PDFs, extracting text with layout preservation, selecting elements by criteria, spatial navigation, and managing exclusion zones. This is a good starting point for PDF data extraction.

table = page.extract_table()
if table:
    df = table.to_df()
    print(df.head())

View full example →

OCR and AI magic

Master OCR techniques with Natural PDF, from basic text recognition to advanced LLM-powered corrections. Learn to extract text from image-based PDFs, handle tables without proper boundaries, and use AI to improve accuracy.

page.extract_table()

View full example →

df = guides.extract_table().to_df()
df

View full example →

Working with page structure

Extract text from complex multi-column layouts while maintaining proper reading order. Learn techniques for handling academic papers, newsletters, and documents with intricate column structures using Natural PDF's layout detection features.

regions[0].extract_table().to_df()

View full example →

# Combine them if we want
import pandas as pd

dfs = regions.apply(lambda region: region.extract_table().to_df())
merged = pd.concat(dfs, ignore_index=True)
merged

View full example →

data = page.find('table').extract_table()
data

View full example →

guides.extract_table().to_df()

View full example →