
I've just begun to dabble with embeddings and LLMs, but recently I've been thinking about trying to use principal component analysis[1] to either project onto desirable subspaces, or project out undesirable ones.

In your case it would be: take a bunch of texts that mean roughly the same thing but vary in tone, compute the PCA of the normalized embeddings, take the top axis (or top few), and project it out (i.e. subtract the projection) of the embeddings for the documents you care about before doing the cosine similarity.

Something along those lines.
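To make that concrete, here's a minimal numpy sketch of the idea. It's only an illustration, not a tested recipe: the variant texts, the "tone" axis, and the document embeddings are all hypothetical placeholders for real embedding-model output.

```python
import numpy as np


def _normalize(x):
    """Scale each row to unit length (for cosine similarity)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_components(variants, k=1):
    """Top-k principal axes of the normalized variant embeddings.

    `variants` holds embeddings of texts that mean roughly the same
    thing but differ in tone; the dominant PCA axes then approximate
    the tone direction(s).
    """
    v = _normalize(np.asarray(variants, dtype=float))
    v = v - v.mean(axis=0)  # center before PCA
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(v, full_matrices=False)
    return vt[:k]


def project_out(embeddings, axes):
    """Subtract the projection onto `axes`, then re-normalize."""
    e = _normalize(np.asarray(embeddings, dtype=float))
    e = e - (e @ axes.T) @ axes
    return _normalize(e)
```

Usage would be something like `cleaned = project_out(doc_embeddings, top_components(variant_embeddings, k=1))`, after which cosine similarity on `cleaned` ignores whatever the top axis captured.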

Could be it's a terrible idea; I haven't had time to do much with it yet due to work.

[1]: https://en.wikipedia.org/wiki/Principal_component_analysis


