Also note that bash accepts line breaks in quoted strings and the base64 utility has an "ignore garbage" option that lets it skip over e.g. whitespace in its input. You can use those to break up the base64 over multiple lines:
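A minimal sketch of that idea, assuming GNU coreutils base64, where the option is spelled -i (--ignore-garbage); other implementations may name it differently or lack it. The payload is a made-up stand-in (the text "hello world"), not a real binary:

```shell
# Hypothetical payload: base64 of "hello world\n", split across two lines.
# The second line is indented on purpose, so the string contains spaces.
payload='aGVsbG8g
    d29ybGQK'

# -d decodes; -i (GNU coreutils) skips bytes outside the base64 alphabet,
# which here means the indentation spaces.
decoded=$(printf '%s' "$payload" | base64 -d -i)
echo "$decoded"
```

The quoted string can be wrapped and indented however suits the script, since the decoder discards whatever is not base64.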
An even simpler way would be to include a marker that denotes the end of the shell script and the start of the data. For example, put this in extract.sh:
#!/bin/sh
sed -E '1,/^START-OF-TAR-DATA$/d' "$0" | tar xvzf -
exit
START-OF-TAR-DATA
Append your archive to it (for example, cat extract.sh foobar.tar.gz > foobar.tar.gz.sh) and you can then run foobar.tar.gz.sh to self-extract. You still get the benefit of being able to modify the shell script without needing to count lines or characters, and without sacrificing any compression.
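The whole round trip can be sketched as follows. The directory name payload, the file hello.txt, and the temporary working directory are assumptions made up for the demonstration; the extract.sh body is the one quoted above. It assumes a sed that passes binary data through unchanged, which GNU sed does:

```shell
# Work in a throwaway directory (hypothetical layout, for illustration only).
work=$(mktemp -d)
cd "$work"

# Make a small archive to embed.
mkdir payload
echo 'hello from the archive' > payload/hello.txt
tar czf foobar.tar.gz payload
rm -r payload

# The stub: everything after the marker line is treated as tar data.
cat > extract.sh <<'EOF'
#!/bin/sh
sed -E '1,/^START-OF-TAR-DATA$/d' "$0" | tar xvzf -
exit
START-OF-TAR-DATA
EOF

# Glue stub and archive together, then run the result.
cat extract.sh foobar.tar.gz > foobar.tar.gz.sh
sh ./foobar.tar.gz.sh

extracted=$(cat payload/hello.txt)
echo "$extracted"

# Clean up the throwaway directory.
cd /
rm -rf "$work"
```

The stub is invoked through sh here only so the sketch also works where the temporary directory is mounted noexec; after chmod +x it can be run directly.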
There will be only one that is neither preceded by indentation nor followed by an exit code, so that it could match ^exit$, unless you contrive some hypothetical nonsense purely for the sake of contrarianism.
Any reasonable person will indent a conditional exit within the block testing its condition, and more than one unconditional exit doesn't make sense.
Is there an encoding that is less wasteful than base64 but not vulnerable to text editor corruption issues? I think avoiding 0x0 to 0x20 should be enough to not get corrupted by text editors, though base64 avoids a lot more than that.
If you can count on every printable ascii character being not-mangled, you can use ascii85/base85/Z85 (5 "ascii characters" to 4 bytes) instead of base64.
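To put rough numbers on the difference: base64 spends 4 characters per 3 bytes (about 33% overhead), while a base85 variant spends 5 per 4 (25%). A back-of-the-envelope sketch, with a hypothetical payload size and ignoring line breaks, framing, and the shortcuts some Ascii85 variants have for short final groups or runs of zeros:

```shell
# Hypothetical payload size in bytes (divisible by 3 and 4 to keep the numbers round).
n=1200000

# base64: every 3 input bytes become 4 characters, rounded up to a full group.
base64_len=$(( (n + 2) / 3 * 4 ))

# base85 family: every 4 input bytes become 5 characters, rounded up to a full group.
base85_len=$(( (n + 3) / 4 * 5 ))

echo "base64: $base64_len chars, base85: $base85_len chars"
```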
While a couple of people suggested Base65536, that encoding isn't particularly compact, and it can't be as elegant as 65536 would suggest because it has to dodge special cases in unicode.
It's almost always the case that either Base32768 is denser, or encodings with 2^17 or 2^20 characters are denser.
If you mean the thing you want to encode is mostly ASCII, then there's https://en.wikipedia.org/wiki/Quoted-printable ... it's a real throwback, I've not seen it in the wild since the 90s, but it's there in the Python standard library (quopri), Perl (MIME::QuotedPrint), etc.
As I understand it, you base64 the zipped data on input and do the reverse on output.
The reasoning is that the base64'd binary data is safe from being corrupted when the file is edited in a text editor, in response to the warning in the last paragraph of the original post.
The idea is to first zip the binary, then base64 the zipped data. Conversely, the script first decodes the base64 to a zipped binary, then unzips the binary.
It's just to mitigate the wastefulness of base64 encoding. You end up with a file that is text editor friendly and not quite as bloated as directly encoding the binary would be - but of course the file is still larger than simply appending the binary directly like in the OP.
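A quick way to see the effect is to compare sizes. This is only a sketch: the input is a made-up, highly repetitive text file standing in for "the binary", so it compresses far better than a typical executable would, and already-compressed data would gain nothing:

```shell
work=$(mktemp -d)

# Hypothetical stand-in for the binary: 10000 lines of repetitive text.
yes 'some fairly repetitive data' | head -n 10000 > "$work/binary"

raw_size=$(wc -c < "$work/binary")                      # the file as-is
b64_size=$(base64 < "$work/binary" | wc -c)             # base64 only
gz_b64_size=$(gzip -c "$work/binary" | base64 | wc -c)  # gzip, then base64

echo "raw: $raw_size  base64: $b64_size  gzip+base64: $gz_b64_size"
rm -r "$work"
```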
Also, if you don't care about text editor friendliness, you could indeed just zip the binary and then append it to the script for an even smaller file.
Or add compression to reclaim at least some of the wasted space:
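One way this could look, as a sketch rather than the exact command the comment had in mind: generate a script that carries its payload gzip-compressed and base64-encoded in a here-document. The file names, the here-document, and the choice of gzip are assumptions, and base64 -d is the GNU coreutils spelling of the decode flag:

```shell
work=$(mktemp -d)

# Hypothetical stand-in for the binary to embed.
printf 'pretend this is a binary\n' > "$work/binary"

# Write a script whose body is the compressed, base64-encoded payload.
{
  printf '%s\n' '#!/bin/sh'
  printf '%s\n' "base64 -d <<'EOF' | gunzip -c"
  gzip -c "$work/binary" | base64
  printf '%s\n' 'EOF'
} > "$work/unpack.sh"

# Running the generated script reproduces the original bytes on stdout.
restored=$(sh "$work/unpack.sh")
echo "$restored"
rm -r "$work"
```

The generated unpack.sh contains only printable text, so it survives being opened and saved in an editor.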