Complex Extraction of Law Enforcement Complaints
This PDF contains a set of complaint records from a local law enforcement agency. Challenges include its relational data structure, unusual formatting common in the region, and redactions that disrupt automatic parsing.
(
    section
    .find_all('text:contains(Complaint #)')
    .right(include_source=True)
    .merge()
    .expand(top=5, bottom=7)
    .show(crop=section)
)
(
    section
    .find_all('text:contains(Complaint #)')
    .right(include_source=True)
    .merge()
    .expand(top=5, bottom=7)
    .extract_table()
    .to_df(header=['Type of Complaint', 'Description', 'Complaint Disposition'])
)
(
    section
    .find_all('text:contains(Complaint #)')
    .right(include_source=True)
    .merge()
    .expand(top=5, bottom=7)
)
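The relational structure mentioned above (one "Complaint #" label followed by its own rows) can be illustrated without any PDF library. The following is a minimal sketch in plain Python, assuming the page text has already been extracted into a list of lines; the `group_by_complaint` helper, the anchor pattern, and the sample rows are all hypothetical and are not taken from the actual document.

```python
import re
from typing import Dict, List, Optional

# Assumed anchor pattern: the label "Complaint #" followed by an identifier.
# The real label text and identifier format may differ.
ANCHOR = re.compile(r"Complaint #\s*(\S+)")


def group_by_complaint(lines: List[str]) -> Dict[str, List[str]]:
    """Group text lines under the most recent 'Complaint #' anchor."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        match = ANCHOR.search(line)
        if match:
            # A new anchor starts a new record.
            current = match.group(1)
            records[current] = []
        elif current is not None and line.strip():
            # Non-empty lines belong to the record that is currently open.
            records[current].append(line.strip())
    return records


# Invented sample lines, for illustration only.
sample_lines = [
    "Complaint # 001",
    "Type A   Description one   Sustained",
    "",
    "Complaint # 002",
    "Type B   Description two   Unfounded",
    "Type C   Description three   Exonerated",
]

records = group_by_complaint(sample_lines)
for complaint_id, rows in records.items():
    print(complaint_id, len(rows))
```

Each complaint identifier ends up as a key with its rows as the value, which is the same parent/child shape the snippets above recover by anchoring on each "Complaint #" label and extracting the table next to it.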
Extracting Text from Georgia Legislative Bills
This PDF contains legal bills from the Georgia legislature, published yearly. Challenges include extracting marked-up text such as underlines and strikethroughs, and line numbers that complicate text extraction.
.region(right=70)
.find_all('text')
.right()
.merge()
.show(crop='wide')
)
.region(right=70)
.find_all('text')
.right()
.merge()
)
)
sections.show()
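The idea behind restricting the search to a narrow strip of the page can be sketched in plain Python. This is an illustrative sketch only, assuming each word comes with a left-edge coordinate and a vertical position; the `Word` type, the `split_margin` and `body_lines` helpers, the threshold of 70 (chosen to echo `region(right=70)`), and the sample words are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    x0: float   # left edge of the word
    top: float  # vertical position of the word's line


# Assumed width of the strip that holds the line numbers.
MARGIN_RIGHT = 70.0


def split_margin(
    words: List[Word], margin_right: float = MARGIN_RIGHT
) -> Tuple[List[Word], List[Word]]:
    """Separate numeric words inside the strip from the rest of the page."""
    margin: List[Word] = []
    body: List[Word] = []
    for word in words:
        if word.x0 < margin_right and word.text.isdigit():
            margin.append(word)
        else:
            body.append(word)
    return margin, body


def body_lines(words: List[Word]) -> List[str]:
    """Rebuild text lines from the words that are not line numbers."""
    _, body = split_margin(words)
    lines: Dict[float, List[str]] = {}
    for word in sorted(body, key=lambda w: (w.top, w.x0)):
        lines.setdefault(word.top, []).append(word.text)
    return [" ".join(texts) for _, texts in sorted(lines.items())]


# Invented sample words, for illustration only.
sample_words = [
    Word("1", 40.0, 100.0),
    Word("A", 90.0, 100.0),
    Word("BILL", 110.0, 100.0),
    Word("2", 40.0, 120.0),
    Word("Section", 90.0, 120.0),
    Word("1", 150.0, 120.0),
]

for line in body_lines(sample_words):
    print(line)
```

Filtering by position rather than by a text pattern keeps numbers that belong to the bill itself, such as the "1" in "Section 1", while dropping the numbering in the strip.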