Week notes — 7th May 2024
I’ve been experimenting with how LLMs could be applied to making Local Plans more understandable, perhaps for a non-expert audience, or perhaps for an expert audience to quickly get a systemic view of the plans.
My goal is to learn more about applying LLMs, but also to quickly get to demoable iterations. Secondarily, I want the demos to run on Heroku, a prototyping platform which seems widespread in central government.
I’ve previously demoed a very rough version of this, demonstrating some very simple topic modelling and allowing users to do comparisons between Local Plans.
Over the last week I investigated the following:
- My previous prototype extracted text from PDFs using PyPDF2, which led to variable results and had no understanding of the PDF structure. Local Plans are highly structured documents, and individual paragraphs often mean nothing without their titles. I tried the Unstructured library’s PDF parser, but it’s not much better. In particular, text seems to end up out of sequence, especially text from tables. I’m interested in trying other approaches.
- I’ve decided I want to use Postgres as a vector store so it can run on Heroku. Having experimented with a manual approach, I’m going to try LangChain’s support for Postgres as a vector store.
- While previously I’ve been working with ~140 Local Plans, I’ve decided to narrow my focus to just Newham’s, which is where I live. While I’m still interested in the inter-relation between documents, I’m thinking I’ll look for connections between Newham’s Local Plan itself, the National Planning Policy Framework, and possibly other related documents.
- I’ve somewhat settled on a structure for the project: one Heroku app running a Python server for the API, with the Postgres instance attached, and another running Node for the Prototype Kit front end.
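
To make the point about paragraphs needing their titles concrete, here is a minimal sketch of one possible approach: attach the nearest preceding heading to each paragraph before it gets embedded or stored. This is not what the prototype currently does. The `Chunk` class, the function names and the heading heuristic (an all-caps line, or a line starting with a section number like "3.1") are all assumptions made up for illustration, and real Local Plan PDFs would almost certainly need something less naive.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the paragraph ("" if none seen yet)
    text: str     # the paragraph, with its lines joined by spaces


def looks_like_heading(line: str) -> bool:
    """Hypothetical heuristic, not tuned against real Local Plans."""
    stripped = line.strip()
    if not stripped or len(stripped) > 80:
        return False
    # Lines ending in sentence punctuation are treated as body text.
    if stripped.endswith((".", ",", ";", ":")):
        return False
    first_word = stripped.split()[0]
    # Either ALL CAPS, or starting with a section number such as "3" or "3.1".
    return stripped.isupper() or first_word.replace(".", "").isdigit()


def chunk_with_headings(raw_text: str) -> list[Chunk]:
    """Split extracted text into paragraphs, each tagged with its heading."""
    chunks: list[Chunk] = []
    heading = ""
    paragraph: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far under the current heading.
        if paragraph:
            chunks.append(Chunk(heading, " ".join(paragraph)))
            paragraph.clear()

    for line in raw_text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current paragraph
        elif looks_like_heading(stripped):
            flush()  # a heading also ends the current paragraph
            heading = stripped
        else:
            paragraph.append(stripped)
    flush()
    return chunks
```

Each `Chunk` could then be embedded as the heading plus the text, so a paragraph retrieved from the vector store still carries the context of the section it came from. Note the heuristic will misfire on body lines that happen to start with a number and lack a full stop, which is one reason to treat this as a starting point only.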