Week notes — 14th May 2024

Jimmy Tidey
May 14, 2024

Continuing to explore how LLMs can be used to understand planning documents, centred on Local Plans: the documents local councils are obliged to publish.

Hypothesis: I want to look beyond the chatbot application of LLMs to consider how they might be used to restructure complex documents.

Local Plans respond to a framework set out in the National Planning Policy Framework (NPPF) document; however, the connection is loose. Can I decompose individual Local Plans into the topics in the NPPF? One benefit of restructuring Local Plans around the NPPF topics would be to make two different Local Plans comparable.

Evaluation method: Right now, evaluation is by eyeballing. Proper evaluation could involve asking planning professionals to provide structured feedback, or potentially finding ways to get an LLM to assess the quality of the outcome.

Setup: I’m now using PDFSherpa to chunk Camden’s Local Plan (there’s no particular reason for using Camden as an example, except that I know they have really good planning data that I may want to link to). PDFSherpa is moderately good at extracting parts of PDFs by section: it returns each paragraph from the PDF tagged with the hierarchy of ‘headings’ the paragraph comes under (e.g. ‘Camden Local Plan >> Sustainability >> Recycling’). It does sometimes make bizarre choices about what constitutes a heading, in one case turning a tiny footnoted photography credit into a major section heading. The paragraphs it splits documents into are also very small, and probably need recombining into larger pieces. Overall, it performs better than the MuPDF and Unstructured libraries.
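
Because the extracted paragraphs are so small, one way to recombine them is to merge consecutive paragraphs that share exactly the same heading path, up to a size limit. This is a minimal sketch, assuming the parser output has already been converted into a list of hypothetical `Paragraph` records (a heading tuple plus text); it does not use PDFSherpa’s own API, and the `max_chars` default of 1000 is an arbitrary placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Paragraph:
    # Hypothetical record for one parsed paragraph: the heading hierarchy it
    # sits under (outermost first) and its text.
    headings: tuple[str, ...]
    text: str


def recombine(paragraphs: list[Paragraph], max_chars: int = 1000) -> list[Paragraph]:
    """Merge consecutive paragraphs with the same heading path.

    A new chunk starts when the heading path changes or when appending the
    next paragraph would push the chunk past max_chars.
    """
    chunks: list[Paragraph] = []
    for para in paragraphs:
        last = chunks[-1] if chunks else None
        if (
            last is not None
            and last.headings == para.headings
            and len(last.text) + 1 + len(para.text) <= max_chars
        ):
            last.text = last.text + " " + para.text
        else:
            # Copy the paragraph so later merges never mutate the input list.
            chunks.append(Paragraph(para.headings, para.text))
    return chunks
```

Merging only within an identical heading path keeps each chunk’s heading context valid, at the cost of leaving very short sections as small chunks; a looser rule could also fold a short section into its parent.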

I manually extracted headings from the National Planning Policy Framework. I then used vector search to retrieve the paragraphs from the Camden Local Plan that relate to each heading. Each paragraph from Camden’s Local Plan is vector encoded together with the heading it came under, because paragraphs without heading context are often close to meaningless.
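
A minimal sketch of encoding each paragraph with its heading path and ranking paragraphs against an NPPF heading by cosine similarity. To keep it self-contained, `embed` is a toy bag-of-words stand-in for a real embedding model, and `contextualise` and `retrieve` are hypothetical helper names rather than the exact setup used here.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model (assumption): word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def contextualise(headings: tuple[str, ...], text: str) -> str:
    """Prefix a paragraph with its heading path so it is encoded in context."""
    return " >> ".join(headings) + "\n" + text


def retrieve(
    query: str, paragraphs: list[tuple[tuple[str, ...], str]], k: int = 3
) -> list[tuple[float, tuple[str, ...], str]]:
    """Return the k paragraphs most similar to the query heading, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(contextualise(h, t))), h, t) for h, t in paragraphs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

With the heading prefix, a query of "Housing" matches a paragraph such as "We will seek a mix of dwelling sizes." under a Housing heading, even though the word never appears in the paragraph itself; scored bare, that paragraph would get zero similarity. A real embedding model handles synonyms far better than this toy, but the motivation for adding heading context is the same.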

Results: It’s clear that this approach can extract relevant sections from the Camden Local Plan and associate them with relevant headings from the NPPF. It’s not clear whether it finds all the relevant paragraphs. The term ‘sustainable’ seems to cause particular problems, because it’s used in so many ways.
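
Whether every relevant paragraph is found is a recall question. If planning professionals labelled which paragraphs belong under each NPPF heading, precision and recall could be computed per heading, as in this sketch; the paragraph IDs and the label format are hypothetical.

```python
from __future__ import annotations


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of retrieved paragraph IDs against a labelled set."""
    found = set(retrieved)  # ignore duplicate retrievals
    hits = len(found & relevant)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(
    retrieved: dict[str, list[str]], labels: dict[str, set[str]]
) -> dict[str, tuple[float, float]]:
    """Score every labelled heading; a heading with no retrievals scores zero."""
    return {
        heading: precision_recall(retrieved.get(heading, []), relevant)
        for heading, relevant in labels.items()
    }
```

Low recall on the headings where ‘sustainable’ appears would be one way to put a number on the problem described in the results.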

Next steps: I’d like to start getting a feel for how easy it is to compare restructured Local Plans from two Local Authorities.
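
One possible starting point for that comparison: line both restructured plans up under the same ordered list of NPPF headings and flag headings where only one plan has matched paragraphs. The plan dictionaries and heading names are hypothetical inputs, assumed to come from a retrieval step like the one described above.

```python
from __future__ import annotations


def align_plans(
    nppf_headings: list[str],
    plan_a: dict[str, list[str]],
    plan_b: dict[str, list[str]],
) -> list[tuple[str, list[str], list[str]]]:
    """Pair each NPPF heading (in NPPF order) with both plans' paragraphs.

    A heading missing from a plan gets an empty list.
    """
    return [(h, plan_a.get(h, []), plan_b.get(h, [])) for h in nppf_headings]


def one_sided(aligned: list[tuple[str, list[str], list[str]]]) -> list[str]:
    """Headings where exactly one of the two plans has matched paragraphs."""
    return [h for h, a, b in aligned if bool(a) != bool(b)]
```

Given the recall concerns above, a one-sided heading may reflect a retrieval miss rather than a genuine difference between the two plans, so these flags are prompts for a closer look rather than findings.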

Jimmy Tidey

PhD on digital systems for collective action and social network analysis. jimmytidey.co.uk/blog