Week notes — 5th June 2024

Jun 4, 2024

This week I looked at topic modelling for neighbourhood plans, following on from my previous work building an interface to query them.

Topic modelling, which algorithmically detects the topics discussed in each neighbourhood plan, could be useful for thinking about neighbourhood plans at a more systemic level. Do certain traffic policies match certain types of road? If you wanted to create a cycle route, could you use information about which neighbourhoods have sustainable transport policies to help plan it?

I already have the plans chunked into policies, so as a first pass I checked whether the policies would form meaningful clusters. I tried PCA, t-SNE, UMAP and no dimensionality reduction at all, paired with K-means and DBSCAN for clustering. None of the combinations produced useful results. Ploughing through algorithms probably isn’t the most nuanced approach, but it’s so quick to do.
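
A sweep like this really is quick to script. The sketch below is a minimal version, not the code I actually ran: it assumes the policies are already embedded as a NumPy array, uses scikit-learn’s PCA, t-SNE, K-means and DBSCAN, and leaves UMAP out because it lives in the separate umap-learn package. The `sweep` helper and all parameter values are hypothetical. Silhouette score is only a rough signal of cluster separation, not of whether the clusters mean anything.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def reducers(n_components=2, seed=0):
    """Candidate reductions; None means cluster the raw embeddings directly."""
    return {
        "none": None,
        "pca": PCA(n_components=n_components, random_state=seed),
        "tsne": TSNE(n_components=n_components, perplexity=5, random_state=seed),
        # UMAP would slot in here via the separate umap-learn package.
    }


def clusterers(n_clusters=5, seed=0):
    """Fresh clusterer instances; parameter values are illustrative guesses."""
    return {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        # eps is scale-dependent, so it would need tuning for each space.
        "dbscan": DBSCAN(eps=0.5, min_samples=3),
    }


def sweep(embeddings):
    """Hypothetical helper: fit every reducer x clusterer pair on policy embeddings.

    Returns {(reducer, clusterer): (clusters_found, silhouette)}.
    """
    results = {}
    for r_name, reducer in reducers().items():
        X = embeddings if reducer is None else reducer.fit_transform(embeddings)
        for c_name, clusterer in clusterers().items():
            labels = clusterer.fit_predict(X)
            n_found = len(set(labels) - {-1})  # DBSCAN labels noise points as -1
            score = silhouette_score(X, labels) if n_found > 1 else float("nan")
            results[(r_name, c_name)] = (n_found, score)
    return results
```

Calling `sweep(embeddings)` on an `(n_policies, dim)` array gives a small grid to eyeball; on real data, reading a sample of policies from each cluster is still the only real test of whether a cluster is meaningful.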

The most semantically meaningful clusters formed around place names: one cluster gathered policies mentioning Abbots Langley, another those mentioning Totnes, and so on. That is information I already have in the document metadata.
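
One way to confirm that a clustering is mostly rediscovering place names is to compare it against that metadata. This is a minimal stdlib sketch, assuming each policy has a cluster label and a place name taken from its plan’s metadata; `place_purity` is a hypothetical helper. For a single chance-corrected number, scikit-learn’s `adjusted_rand_score` does a similar comparison.

```python
from collections import Counter, defaultdict


def place_purity(cluster_labels, place_names):
    """Hypothetical helper: for each cluster, its most common place and that place's share.

    Purity near 1.0 across most clusters suggests the clustering is largely
    recovering which plan each policy came from, i.e. the existing metadata.
    """
    by_cluster = defaultdict(list)
    for label, place in zip(cluster_labels, place_names):
        by_cluster[label].append(place)
    purity = {}
    for label, places in by_cluster.items():
        top_place, top_count = Counter(places).most_common(1)[0]
        purity[label] = (top_place, top_count / len(places))
    return purity
```

For example, labels `[0, 0, 0, 1, 1, 1]` with places `["Abbots Langley", "Abbots Langley", "Totnes", "Totnes", "Totnes", "Totnes"]` give cluster 0 as Abbots Langley at two thirds and cluster 1 as Totnes at 1.0.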

I also tried removing all the place names to see whether different clusters would emerge. No dice: this meant re-vectorising the text using BERTopic, but it didn’t help.
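
The masking step itself can be a single regular expression. A minimal sketch, assuming a list of place names is available (for example from the plan metadata); `mask_place_names` is a hypothetical helper, and swapping in a neutral token such as `PLACE` instead of deleting is an optional variation that keeps sentences grammatical for the embedding model.

```python
import re


def mask_place_names(text, place_names, token=""):
    """Hypothetical helper: remove (or replace with `token`) each place name, case-insensitively."""
    if not place_names:
        return text
    # Longest names first, so a longer name is matched before any shorter name it contains.
    names = sorted(place_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    masked = pattern.sub(token, text)
    # Collapse the double spaces left behind where a name was deleted.
    return re.sub(r"\s+", " ", masked).strip()
```

The masked texts could then be passed to BERTopic’s `fit_transform` (or any other embedding step) in place of the originals.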

My next idea is to give up on vectorised approaches and start looking at word occurrence instead. Many policies are distinguished by semantically similar wording with small but critical differences: ‘green belt’ and ‘green amenity space’, for example, may well sit very close together in vector space but describe importantly distinct policies.
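
Exact phrase counts keep that distinction where embeddings blur it, because ‘green belt’ and ‘green amenity space’ become separate features. A minimal stdlib sketch of word n-gram counting (the helper names are hypothetical; scikit-learn’s `CountVectorizer(ngram_range=(1, 3))` is an off-the-shelf equivalent):

```python
import re
from collections import Counter


def ngram_counts(text, n_max=3):
    """Hypothetical helper: count word n-grams of length 1..n_max in lower-cased text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def mentions(policies, phrase):
    """Hypothetical helper: indices of policies containing `phrase` as an exact n-gram."""
    phrase = " ".join(re.findall(r"[a-z]+", phrase.lower()))
    n = len(phrase.split())
    return [i for i, p in enumerate(policies) if ngram_counts(p, n_max=n)[phrase] > 0]
```

With a policy about resisting development in the green belt and another requiring green amenity space, `mentions` separates them cleanly even though the two phrases share the word ‘green’.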

Another strategy might be to analyse at the document level rather than the policy level.
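
At document level, the question from the start (which neighbourhoods have sustainable transport policies?) becomes a lookup from phrase to plans. A minimal stdlib sketch, assuming policies are available as `(plan_name, policy_text)` pairs; `plans_mentioning` and the example phrases are hypothetical, and plain-word phrases are assumed since punctuation is stripped.

```python
import re


def _normalise(text):
    """Lower-case and keep only runs of letters, joined by single spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def plans_mentioning(policies, phrases):
    """Hypothetical helper: map each phrase to the set of plans with a policy containing it."""
    targets = {phrase: _normalise(phrase) for phrase in phrases}
    found = {phrase: set() for phrase in phrases}
    for plan, text in policies:
        padded = f" {_normalise(text)} "  # padding makes the match whole-word only
        for phrase, target in targets.items():
            if f" {target} " in padded:
                found[phrase].add(plan)
    return found
```

The resulting sets could then be joined to plan boundaries to see, say, whether sustainable transport policies line up along a possible cycle route.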

Written by Jimmy Tidey