I don't think there's a way to reliably predict or analyse post-conversion string size that's cheaper than doing the actual conversion. Each character is known to take at most 2 bytes, but you won't know how many codepage switches the string will necessitate without iterating over the Unicode string and determining the language/codepage of each character.
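To illustrate the point, here's a minimal sketch; the CODEPAGES table and the codepage_of() helper are hypothetical stand-ins for the per-character detection, and I'm assuming strings start out in the default Latin codepage. By the time you've computed the size, you've done essentially all the work of the conversion itself:

```python
# Hypothetical tag-to-codec table; only a few entries shown.
CODEPAGES = {'L': 'cp1252', 'C': 'cp1251', 'S': 'cp936'}

def converted_size(text, codepage_of):
    """Estimate the byte size of a codepage-switched string.

    codepage_of(ch) is assumed to return the tag ('L', 'C', 'S', ...)
    that the character ch needs - which is exactly the expensive part.
    """
    size = 0
    current = 'L'  # assumption: strings start in the default Latin codepage
    for ch in text:
        tag = codepage_of(ch)
        if tag != current:
            size += 2  # a '^X' codepage switch costs 2 bytes
            current = tag
        size += len(ch.encode(CODEPAGES[tag]))  # 1 or 2 bytes per character
    return size
```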
Anyway, I've now started working on strmanip and pretty much achieved what I set out to do, or so I thought. In theory everything was 100% accurate, but when I tested it, LFS screwed me good.
The problem is that certain characters that are present in multiple codepages are rendered differently between them. For example, Simplified Chinese (S, CP936) contains 66 characters from the Cyrillic set, but LFS does not render them identically to the actual Cyrillic (C, CP1251) ones.
Here's a comparison screeny for the string 'ЁёАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя' (all 66 characters present in both C and S):
The first two lines are the S version (134 bytes: 66 double-byte characters plus the 2-byte '^S' switch, hence the wrap onto two lines), the last line is the C version (68 bytes: 66 single-byte characters plus '^C').
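The byte counts are easy to reproduce outside LFS with Python's own codecs, since GBK stores the shared Cyrillic letters as double-byte characters while CP1251 uses single bytes:

```python
# -*- coding: utf-8 -*-
cyr = u'ЁёАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя'

print(len(cyr.encode('cp936')))   # 132 bytes, plus 2 for '^S' -> 134
print(len(cyr.encode('cp1251')))  # 66 bytes, plus 2 for '^C' -> 68
```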
Obviously that renders strmanip's sort of "lazy" conversion useless, because it produces results that are not only visually incorrect but also significantly larger than they need to be. The problem is, while detecting this is relatively easy for L, E, T, B, C and G, it's not at all easy for J, H, S and K, because the Han characters those codepages share are merged into a single block in the Unicode standard, the CJK Unified Ideographs.
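A quick demonstration of why that's a dead end (the character is just an example I picked): a single unified codepoint encodes in the Japanese, Simplified Chinese, Korean and Traditional Chinese codepages alike, so codepage membership alone tells you nothing about the language:

```python
# -*- coding: utf-8 -*-
han = u'\u6f22'  # the character 'Han' itself, valid in all four codepages
for codec in ('cp932', 'cp936', 'cp949', 'cp950'):
    print('%s: %r' % (codec, han.encode(codec)))
```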
I'm now in the process of rewriting strmanip entirely, not only because of the issues above but also because there are some reliability issues with Python 2.7's native conversion mappings (I haven't checked 3/3.2).
Among other things, fromUnicode() will determine the Unicode block of a character and select the appropriate codepage based on that. It'll also attempt to guess which codepage to use for CJK input based on whether the majority of the string is kana (making it most likely Japanese) or hangul (Korean); otherwise it'll default to Chinese.
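That guess could look something like this minimal sketch; the function name, codec choices and thresholds are my own illustration, not strmanip's actual code:

```python
def guess_cjk_codepage(text):
    """Guess a CJK codepage from the script make-up of a Unicode string."""
    kana = sum(1 for ch in text if u'\u3040' <= ch <= u'\u30ff')    # hiragana/katakana
    hangul = sum(1 for ch in text if u'\uac00' <= ch <= u'\ud7a3')  # hangul syllables
    if kana * 2 > len(text):
        return 'cp932'   # mostly kana -> Japanese
    if hangul * 2 > len(text):
        return 'cp949'   # mostly hangul -> Korean
    return 'cp936'       # otherwise default to Chinese
```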
This is some complex stuff, and I realise I've taken this a bit far off-topic; if it bothers anyone, just split the topic.
If all goes according to plan, I'll also be able to provide conversion mappings for PRISM, though obviously not to and from Unicode but to and from UTF-8 directly, since PHP doesn't have native Unicode support (yet).