Why Files Become Bigger in Emails - Computerphile

Genre: Education

Family friendly? Yes

Wilson score: 0.9834

Rating: 4.9539 / 5

Engagement: 4.7973%

Computerphile

Shared April 30, 2021

To send binary files via a text based system, they'll need encoding. Dr Steve Bagley takes us through the attachment system used in email.

name change, formerly "Why Attachments are Larger in Emails"

https://www.facebook.com/computerphile
https://twitter.com/computer_phile

This video was filmed and edited by Sean Riley.

Computer Science at the University of Nottingham: https://bit.ly/nottscomputer

Computerphile is a sister project to Brady Haran's Numberphile. More at http://www.bradyharan.com

jeromesnail

I've always wondered what were those equal signs at the end of a base64 encoded string :)

1 month ago (edited) | [YT] | 87
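
The equals signs mentioned above are padding. Base64 works on groups of three input bytes, so when the input length is not a multiple of three, the last group is padded out with `=`. A minimal sketch using Python's standard `base64` module (the sample strings are chosen only for illustration):

```python
import base64

# Inputs of 1, 2 and 3 bytes: only a full 3-byte group needs no padding.
samples = [b"M", b"Ma", b"Man"]
encoded = [base64.b64encode(s).decode("ascii") for s in samples]

for raw, text in zip(samples, encoded):
    padding = text.count("=")
    print(f"{len(raw)} byte(s) -> {text!r} ({padding} padding character(s))")
```

One leftover byte gives two `=` characters, two leftover bytes give one, and a complete group gives none.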

DeeEfThirteen

Haha, I experienced this exact problem many years ago when attaching a 20 MB file, clearly within Gmail's 25 MB limit, and it kept getting rejected. Fast-forward through hours of debugging, sending myself test e-mails, and analyzing headers... I discovered how grossly antiquated e-mail systems are, even to this day.

1 month ago | [YT] | 103
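
The arithmetic behind a rejection like the one described above can be sketched as follows. This is an estimate under stated assumptions, not a description of how Gmail actually measures its limit: it assumes the limit applies to the encoded message, that "MB" means MiB, that base64 lines are wrapped at 76 characters with a CRLF after each, and it ignores headers and the message body. The function name `mime_base64_size` is hypothetical.

```python
def mime_base64_size(raw_bytes: int, line_length: int = 76) -> int:
    """Estimate the on-the-wire size of raw_bytes after MIME-style base64."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    encoded_chars = 4 * ((raw_bytes + 2) // 3)
    # Output is wrapped into lines, each terminated by CRLF (2 bytes).
    lines = (encoded_chars + line_length - 1) // line_length
    return encoded_chars + 2 * lines

MIB = 1024 * 1024
attachment = 20 * MIB   # assumed size of the file being attached
limit = 25 * MIB        # assumed limit on the encoded message

estimate = mime_base64_size(attachment)
print(f"{estimate / MIB:.1f} MiB after encoding; over limit: {estimate > limit}")
```

Under these assumptions a 20 MiB file grows by a little more than a third, which is enough to push it past a 25 MiB ceiling.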

iammaxhailme

I wish emails would go back to being 78 characters. I'd get a lot more work done if I didn't have to read so much junk.

1 month ago | [YT] | 15

Alan Canon

This is a real conundrum. I've generated email attachments programmatically recently enough that I'm not sure I need this trip down memory lane... oh, who am I kidding? [presses play].

1 month ago | [YT] | 16

pev

Yeah, I remember using uuencode/decode for some things in emails and newsgroups in my university years in the 90's. The Unix email programs did not separate between main text and attachments, so the uuencoded part usually was just seen after the end of the natural language part as a large block of random-looking characters of equal line length :)

1 month ago | [YT] | 29
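
For readers who have not seen uuencoding, one line of it can be produced with Python's standard `binascii` module. This is only a sketch of the per-line encoding: a real uuencoded file also has `begin` and `end` lines around the data, and `b2a_uu` accepts at most 45 bytes per call. The sample input is arbitrary.

```python
import binascii

data = b"Cat"

# One uuencoded line: a length character, the encoded data, then a newline.
line = binascii.b2a_uu(data)
restored = binascii.a2b_uu(line)

print(line)
print(restored)
```

The first character of the line encodes how many bytes it carries, which is why full-length lines in old newsgroup posts all started with the same character.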

Zeda Thomas

I learned how to encode emails last week so that I could write a program to make it easier to attach files when sending emails from our server :D It was so satisfying to send a pdf and have it come out right.

1 month ago | [YT] | 6

EddyGurge

I could really have used this video some 30 or so years ago! I still really enjoyed it :)

1 month ago | [YT] | 5

Pasan K

Engineer - use email for text, ftp for files. User - I will put my cat videos in email thank you!

1 month ago | [YT] | 40

Christian Rickert

"don't make eye contact, don't make eye contact ..." 😆

1 month ago | [YT] | 5

BasedPeter

Phillip Seymour Hoffman still alive and even younger i see!

1 month ago | [YT] | 4

Idjles Erle

I used base64 back in 1989, and I still use it every day in my job today to push binary via JSON or XML.

1 month ago | [YT] | 9
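
Pushing binary through JSON, as described above, might look roughly like this. The field names `filename` and `data` and the file name `blob.bin` are made up for the example; JSON strings cannot hold arbitrary bytes, so the payload travels as base64 text.

```python
import base64
import json

# Arbitrary binary payload: every possible byte value once.
payload = bytes(range(256))

# Sender side: bytes -> base64 text -> JSON document.
document = json.dumps({
    "filename": "blob.bin",  # hypothetical field and value
    "data": base64.b64encode(payload).decode("ascii"),
})

# Receiver side: JSON document -> base64 text -> bytes.
decoded = base64.b64decode(json.loads(document)["data"])
print(decoded == payload)
```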

Tony Bright

E-mail systems sent 7-bit text. So binaries are (or were, I've not kept up with this) converted to 7-bit text characters. Three 8-bit bytes become four 7-bit text characters. Therefore, attachments expand when sent via e-mail.

But that might be some "okay Boomer" as this is how it was back in the 1990s when I administered Unix based sendmail SMTP gateways.

1 month ago | [YT] | 11
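
The three-bytes-to-four-characters step above can be written out by hand. In base64, the 24 bits of three bytes are split into four 6-bit values, and each value selects one character from a 64-character alphabet; those characters are then sent as ordinary 7-bit ASCII. A minimal sketch (the helper name `encode_triplet` is hypothetical, and it handles only a complete 3-byte group, with no padding):

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_triplet(b0: int, b1: int, b2: int) -> str:
    """Encode exactly three bytes as four base64 characters."""
    # Pack the three bytes into one 24-bit number.
    n = (b0 << 16) | (b1 << 8) | b2
    # Read it back out six bits at a time, most significant group first.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

result = encode_triplet(*b"Man")
print(result)

# Cross-check against the standard library.
assert result == base64.b64encode(b"Man").decode("ascii")
```

Four output characters for every three input bytes is where the roughly one-third size increase comes from.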

Martin Bean

I love these videos. Watch for one thing and accidentally learn how the Base 64 algorithm works. Used it for nearly 15 years, but never bothered to look into the algorithm under the hood. Summed up in an easy to understand explanation in seconds in this video!

1 month ago | [YT] | 0

wheat_blazer

So, to sum up an 18 min video in 2 words: it's base64

1 month ago | [YT] | 212

Ethan C

If we are blasted back into the Stone Age from ww3 I really would hope somebody at intel or tsmc has a physical instruction set in how to build pc parts from scratch. Because there is no chance that the average undergrad student at uni could ever come close to being as efficient and effective as we are currently…

1 month ago | [YT] | 0

Bar Rotem

I'm so glad I'm subscribed.

1 month ago | [YT] | 2

Olivier Nicole

Cool, I did not know about the = at the end, but I never cared to look either:) Thanks

1 month ago | [YT] | 5

Omnifarious

You need to talk about the matter transport mime type. Also, you have to worry about the email going through some really bizarre re-encodings. Sometimes, for example, email would go through a system that used one of the variants of EBCDIC. And you went on to mention that. 🙂

1 month ago (edited) | [YT] | 1

PrivateSi

What happens when you build standards up from legacy standards instead of starting from scratch.. Having said that, my 'Uni-Text' protocol wish is a compressed, tokenised format that extends Unicode by sacrificing control characters. I know Unicode tried to reuse them for glyphs and rightfully failed, but I reuse them to indicate the next few bytes until a byte >127 is found form a standard, global dictionary index.
--
The ironic thing is, all internet end user devices have built in dictionaries that take up less space than a 'font family' that includes all unicode glyphs. Unicode hugely bloated text transmissions, UNITEXT solves this, while enabling far faster word matching, spell checking and thesaurus in a GLOBAL TEXT STANDARD.. It's easier to encrypt too, and can still benefit from the predictive compression built into some network subsystems.. A local dictionary extension is also possible for more compression, by using extra control codes (0..31).
--
8 local, 8 global control characters provide the first 3 bits of the index
5 extra used to indicate:
'No Space' (as default is to add a space)... means 'overlay next glyph over last' if preceding a glyph code, not a word index
Caps First, ALL CAPS, and No Space+Caps First, No Space + ALL CAPS..
This is appended to the start of the word and stored in the local dictionary as a word index + char code.
Words are stored backwards in a fast dictionary lookup tree which keeps the size of the tree down.

1 week ago | [YT] | 0

Barry Brevik

I feel that this video is perhaps useful, because base64 with MIME encoding is *still*, in 2021, the way that email attachments are sent.

1 month ago | [YT] | 4
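
To see this in practice, Python's standard `email` package can build a message with a binary attachment and show the transfer encoding it picks. This is a sketch: the addresses, subject, and file name are placeholders, and the attachment is dummy data.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "recipient@example.com"     # placeholder address
msg["Subject"] = "Report attached"
msg.set_content("See the attached file.")

# Dummy binary data standing in for a real file.
data = bytes(range(256)) * 4
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="report.bin",              # placeholder file name
)

part = next(msg.iter_attachments())
print(msg.get_content_type())
print(part["Content-Transfer-Encoding"])
print(len(data), "raw bytes;", len(msg.as_bytes()), "bytes in the full message")
```

Printing `msg.as_string()` shows the attachment as a block of base64 text inside a MIME part, much like the blocks of characters described elsewhere in this thread.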