1. MIME: Multipurpose Internet Mail Extensions

FOLDOC, the Free On-line Dictionary of Computing hosted at Imperial College, explains MIME as: "An encoding standard for multi-part, multimedia email and WWW hypertext, used to transmit non-text data such as graphics, sound and fax. MIME is defined in RFC1341 and uses the MIMENCODE method to convert binary data into a combination of characters of an ASCII subset called BASE64." (RFC 1341 has since been superseded; the current definition is in RFC 2045 through RFC 2049.)

There is a newsgroup that specifically discusses MIME: comp.mail.mime. Its FAQ is available at the following URL: http://www.cis.ohio-state.edu/hypertext/faq/usenet/mail/mime-faq/mime0/faq.html

MIMENCODE was first called MMENCODE. It was proposed as a replacement for UUENCODE because UUENCODE uses some characters that do not survive certain mail gateways, especially those that convert between ASCII and EBCDIC. (Some software also cannot correctly decode every UUENCODE variant, which makes mail hard to read.) MIME was therefore designed to replace UUENCODE, but in practice the two coexist.

Before MIME was introduced, RFC 822 mail could carry only basic ASCII text. It was very difficult to include binary files, sounds, animations and so on in the email content. MIME provides a way to attach multiple, differently encoded files to an email, making up for the shortcomings of the original message format. In fact, MIME is no longer just an email encoding; it has also become part of the HTTP protocol standard.

2. Introduction to MIME encoding

The original reason for encoding emails was that many gateways on the Internet could not correctly transmit 8-bit characters, such as Chinese characters. The principle of encoding is to convert 8-bit content into 7-bit form so that it can be transmitted correctly, and then to restore it to 8-bit content on the receiving side. Before the MIME protocol, email had been encoded with UUENCODE and other methods.
However, because the MIME algorithms are simple and easy to extend, MIME has become the mainstream email encoding method. It is used not only to transmit 8-bit characters but also to transmit binary files, such as images and audio in email attachments, and many MIME-based applications have grown out of it. In terms of encoding, MIME defines two methods: Base64 and QP (Quoted-Printable).

Base64 encoding

Base64 is a general-purpose method, and its principle is very simple: three bytes of data are expressed with four bytes. Each of these four bytes carries only 6 bits of data, so the limit of transmitting only 7-bit characters is not a problem. The abbreviation for Base64 is generally "B".

Base64 encodes the input string or block of data into a string containing only the 64 characters {'A'-'Z', 'a'-'z', '0'-'9', '+', '/'}, with '=' used for padding. The encoder takes 6 bits of the input stream at a time, uses their value (0-63) as an index into the table, and outputs the corresponding character. In this way every 3 bytes are encoded as 4 characters (3×8 → 4×6); a final group that yields fewer than 4 characters is padded with '='.

In some cases, "=?charset?B?xxxxxxxx?=" is used to indicate that xxxxxxxx is Base64 encoded and that the character set of the original text is charset. In the body, the data is encoded directly and the line is wrapped at appropriate points. MIME recommends a maximum of 76 characters per line.

The Base64 algorithm is very simple. It puts the byte stream sequentially into a 24-bit buffer and fills any missing bytes with zeros. The buffer is then cut into 4 parts of 6 bits each, high bits first, and each part is represented by one of the 64 characters. If the last group of input holds only one or two bytes, the output is padded with two or one "=" respectively. The padding tells the decoder how many bytes are real, so the zero fill is not decoded as extra data.
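The 24-bit buffer procedure described above can be sketched by hand in Python. This is a minimal illustration, not a replacement for the standard library: the names `ALPHABET`, `b64_encode` and `wrap76` are made up for this example, and the result is only meant to agree with Python's own `base64.b64encode`.

```python
import base64

# The 64-character table from the text; index 0-63 selects one character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode bytes as Base64: every 3 input bytes become 4 characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Load up to 3 bytes into a 24-bit buffer, zero-filling missing bytes.
        buf = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Cut the buffer into four 6-bit indexes, high bits first.
        chars = [ALPHABET[(buf >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A 1-byte group carries 2 data characters, a 2-byte group carries 3;
        # the remainder of the 4-character group is '=' padding.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


def wrap76(encoded: str) -> str:
    """Break the encoded text into lines of at most 76 characters."""
    return "\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))


if __name__ == "__main__":
    sample = b"Ma"
    print(b64_encode(sample))
    # Cross-check against the standard library implementation.
    print(b64_encode(sample) == base64.b64encode(sample).decode("ascii"))
```

In real code the standard `base64` module is the better choice; the hand-written version only makes the 3-byte to 4-character grouping and the padding rule visible.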
QP encoding

Another method is QP (Quoted-Printable), usually abbreviated as the "Q" method. Its principle is to represent an 8-bit character as "=" followed by two hexadecimal digits. A file after QP encoding therefore usually looks like this: =B3=C2=BF=A1=C7=E5=A3=AC=C4=FA=BA=C3=A3=A1.

Quoted-Printable encodes the input string or byte range. Characters that do not need to be encoded are output directly. If a byte must be encoded, '=' is output first, followed by the byte value as 2 hexadecimal characters.

In some cases, "=?charset?Q?xxxxxxxx?=" is used to indicate that xxxxxxxx is Quoted-Printable encoded and that the character set of the original text is charset. In the body, the data is encoded directly, the line is wrapped at appropriate points, and an additional '=' is output before each such line break.

3. MIME header information

Mail Header

The email header contains many fields inherited from RFC 822, and MIME adds some more. Common standard fields and their meanings are as follows:

Field name                 Meaning                           Added by
Received                   Transmission path                 Mail servers at each hop
Return-Path                Return address                    Target mail server
Delivered-To               Delivery address                  Target mail server
Reply-To                   Reply address                     Creator of the email
From                       Sender address                    Creator of the email
To                         Recipient address                 Creator of the email
Cc                         Carbon-copy address               Creator of the email
Bcc                        Blind-copy address                Creator of the email
Date                       Date and time                     Creator of the email
Subject                    Subject                           Creator of the email
Message-ID                 Message ID                        Creator of the email
MIME-Version               MIME version                      Creator of the email
Content-Type               Content type                      Creator of the email
Content-Transfer-Encoding  Content transfer encoding method  Creator of the email

Non-standard, custom fields all start with X-, such as X-Mailer and X-MSMail-Priority. Their meaning is usually understood only when the sending and receiving programs are the same.
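The Quoted-Printable rule, the "=?charset?B?...?=" and "=?charset?Q?...?=" forms, and the header fields listed above can be explored with Python's standard library. The following is a minimal sketch under stated assumptions: the sample message in `RAW`, its addresses and its X-Mailer value are invented for illustration, and `decode_encoded_words` and `summarize` are hypothetical helper names.

```python
import quopri
from email import message_from_string
from email.header import decode_header


def decode_encoded_words(value: str) -> str:
    """Decode '=?charset?B?...?=' and '=?charset?Q?...?=' header text."""
    pieces = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus the declared charset.
            pieces.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # Plain, unencoded header text comes back as str.
            pieces.append(data)
    return "".join(pieces)


# Hypothetical sample message: every address and value here is made up.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: =?gb2312?B?xOO6ww==?=\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="gb2312"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "\n"
    "=C4=E3=BA=C3\n"
)


def summarize(raw: str) -> dict:
    """Parse a message and collect a few of the fields from the table."""
    msg = message_from_string(raw)
    return {
        "subject": decode_encoded_words(msg["Subject"]),
        "type": msg.get_content_type(),        # "main type/subtype"
        "charset": msg.get_content_charset(),  # the charset parameter
        "encoding": msg["Content-Transfer-Encoding"],
        "body": msg.get_payload(decode=True),  # transfer encoding undone
    }


if __name__ == "__main__":
    raw_bytes = bytes.fromhex("c4e3bac3")
    # Bytes above 0x7F are written as '=' plus two hexadecimal digits.
    print(quopri.encodestring(raw_bytes))
    print(summarize(RAW))
```

The "B" and "Q" forms of the same subject decode to the same text, which is why a client only needs the charset and the one-letter encoding marker to recover a header.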
Section Header

In the section header, there are roughly the following fields:

Field name                 Meaning
Content-Type               The type of the section body
Content-Transfer-Encoding  The transfer encoding method of the section body
Content-Disposition        How the section body is presented (for example inline or as an attachment)
Content-ID                 The ID of the section
Content-Location           The location (path) of the section body
Content-Base               The base location of the section body

Some fields have parameters in addition to a value. The value and its parameters, and the parameters themselves, are separated by ";". A parameter name and its value are separated by "=".

1. MIME-Version

Indicates the version number of MIME in use, usually 1.0, for example: MIME-Version: 1.0

2. Content-Type

Content-Type defines the type of the body. This identifier is how we know what kind of content the body holds. For example, text/plain means unformatted text, text/html means an HTML document, image/gif means a GIF image, and so on.

Content-Type takes the form "main type/subtype". The main types are text, image, audio, video, application, multipart, message and so on, which stand for text, image, audio, video, application data, multi-section content and message respectively. Each main type may have several subtypes; the text type, for example, includes the plain, html, xml and css subtypes. Main types and subtypes beginning with x- are custom types that are not officially registered with IANA, although many are in common use. For example, application/x-zip-compressed is the ZIP file type. On Windows, most known Content-Types other than multipart are listed in the registry under "HKEY_CLASSES_ROOT\MIME\Database\Content Type".

The RFCs contain many additional rules about the form of parameters, and some types allow several parameters. The more common ones are:

Main type    Parameter name  Meaning
text         charset         Character set
image        name            Name
application  name            Name
multipart    boundary        Boundary

The multipart type

The composite type commonly used in emails is multipart.
The multipart type indicates that the body is composed of multiple parts, and its subtypes describe the relationship between those parts. The three subtypes used in emails are:

(1) multipart/alternative: Indicates that the body consists of two parts, of which either one may be chosen. Its main use is that when the body exists in both text format and HTML format, one of the two can be selected for display. Email clients that support HTML will generally display the HTML body, while those that do not will display the plain text body.

(2) multipart/mixed: Indicates that the parts of the message are mixed, which refers to the relationship between the body and the attachments. If the MIME type of the email is multipart/mixed, the email has attachments.

(3) multipart/related: Indicates that the parts of the message are related. It is generally used to describe an HTML body together with its related images.

The multipart type is the essence of MIME email. The email body is divided into multiple sections; each section consists of a section header and a section body, and these two are separated by a blank line.
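The choice described for multipart/alternative can be sketched with Python's standard `email` package. This is only an illustration under assumptions: `build_alternative` and `pick_body` are hypothetical helper names, the subject and body texts are made up, and a real client applies more elaborate rules when picking a part.

```python
from email.message import EmailMessage


def build_alternative(text: str, html: str) -> EmailMessage:
    """Build a message whose body exists in both plain text and HTML."""
    msg = EmailMessage()
    msg["Subject"] = "Alternative demo"
    msg.set_content(text)                      # the text/plain section
    msg.add_alternative(html, subtype="html")  # becomes multipart/alternative
    return msg


def pick_body(msg: EmailMessage, supports_html: bool) -> str:
    """Return the HTML body for HTML-capable clients, else the plain text."""
    order = ("html", "plain") if supports_html else ("plain",)
    part = msg.get_body(preferencelist=order)
    return part.get_content()


if __name__ == "__main__":
    demo = build_alternative("Hello", "<b>Hello</b>")
    print(demo.get_content_type())
    print(pick_body(demo, supports_html=True))
    print(pick_body(demo, supports_html=False))
```

Both sections carry the same message; the preference list simply encodes which representation a given client is able to show.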
The hierarchical relationship between them can be summarized as shown in the following figure:

+---------------------------------- multipart/mixed ----------------------------------+
|                                                                                     |
| +----------------------- multipart/related ------------------------+                |
| |                                                                  |                |
| | +-------- multipart/alternative ---------+ +-------------------+ | +------------+ |
| | |                                        | | Embedded resource | | | Attachment | |
| | | +-----------------+ +----------------+ | +-------------------+ | +------------+ |
| | | | Plain text body | | Hypertext body | |                       |                |
| | | +-----------------+ +----------------+ | +-------------------+ | +------------+ |
| | |                                        | | Embedded resource | | | Attachment | |
| | +----------------------------------------+ +-------------------+ | +------------+ |
| |                                                                  |                |
| +------------------------------------------------------------------+                |
|                                                                                     |
+-------------------------------------------------------------------------------------+

It can be seen that to add attachments to an email you must define a multipart/mixed section; if there are embedded resources, at least a multipart/related section must be defined; and if plain text and hypertext coexist, at least a multipart/alternative section must be defined. What does "at least" mean? For example, if there is only a plain text body and a hypertext body, it is still allowed to widen the type in the email header and declare it as multipart/related or even multipart/mixed.

The common feature of the multipart types is that a "boundary" parameter string is specified in the section header, and each subsection in the section body is delimited by this string. Every subsection starts with a "--" + boundary line, and the parent section ends with a "--" + boundary + "--" line. Sections are also separated by blank lines. In a multipart message body there may be some additional lines of text at the beginning (before the first "--" + boundary line); these are equivalent to comments and should be ignored during decoding. There can also be some additional lines of text between sections, which are not displayed. These composite types can be nested.
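The delimiter rules above can be sketched as a small hand-written splitter. This is a simplified illustration, not a full MIME parser: `split_multipart`, `split_section` and the boundary string "frontier" are invented for the example, and it assumes that delimiter lines carry no trailing whitespace and that the body is not nested.

```python
from typing import List, Optional, Tuple


def split_multipart(body: str, boundary: str) -> List[str]:
    """Split a multipart body into subsections using its boundary string."""
    open_line = "--" + boundary          # starts each subsection
    close_line = "--" + boundary + "--"  # ends the parent section
    sections: List[str] = []
    current: Optional[List[str]] = None  # None while still in the preamble
    for line in body.splitlines():
        if line == close_line:
            if current is not None:
                sections.append("\n".join(current))
            return sections              # text after the end line is ignored
        if line == open_line:
            if current is not None:
                sections.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
        # Lines before the first delimiter act as comments and are skipped.
    return sections


def split_section(section: str) -> Tuple[str, str]:
    """Separate a section header from its body at the first blank line."""
    header, _, content = section.partition("\n\n")
    return header, content


# Hypothetical body; "frontier" is a made-up boundary string.
BODY = "\n".join([
    "This is a multi-part message in MIME format.",
    "--frontier",
    "Content-Type: text/plain",
    "",
    "Hello",
    "--frontier",
    "Content-Type: text/html",
    "",
    "<b>Hello</b>",
    "--frontier--",
])

if __name__ == "__main__":
    for section in split_multipart(BODY, "frontier"):
        print(split_section(section))
```

For nested composite types the same procedure would be applied again to a subsection's body, using the boundary declared in that subsection's own header.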
For example, if an email has an attachment and a body in both HTML and text formats, the structure of the email is:

Content-Type: multipart/mixed
    Part 1: Content-Type: multipart/alternative
        Text: plain text body
        HTML: HTML format body
    Part 2: attachment
    Mail terminator

Since a composite type consists of multiple parts, a delimiter is needed to separate them. This is what the boundary in the email source describes. Every Content-Type: multipart/* entry carries such a description to indicate the separation between its parts.

When you view the source of a MIME/BASE64-encoded email, it will generally contain a sentence like "This is a multi-part message in MIME format." Such mail can be decoded by most email programs, including Netscape, MS Mail and Eudora. These programs correctly identify the body of the email and restore the MIME/BASE64-encoded parts to the correct text or attached binary files.

3. Content-Transfer-Encoding

This field indicates how the section is encoded. Only by recognizing it can the section be decoded with the correct method.

There are several values of Content-Transfer-Encoding, including Base64, Quoted-Printable, 7bit, 8bit and Binary. Among them, 7bit is the default. Email source was originally designed to consist entirely of printable ASCII, so non-ASCII text or data must be encoded into the required form. Base64 and Quoted-Printable are the most widely used encodings in non-English-speaking countries. The binary value is only nominal and has no practical use.

4. boundary

This delimiter is an unusual combination of characters that cannot appear in the body. Within the message, "--" plus the boundary marks the beginning of a section. At the end of the multipart body, "--" plus the boundary plus a trailing "--" marks the end. Since composite types can be nested, there may be several boundaries in one email.
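The nested structure and the role of Content-Transfer-Encoding can be checked with Python's standard `email` package. This is a minimal sketch under stated assumptions: the addresses, subject, file name and byte content are invented, `build_message` and `outline` are hypothetical helper names, and the boundary strings are whatever the library generates.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message() -> bytes:
    """Build a mail with a text body, an HTML body and one attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"   # made-up addresses
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Nested demo"
    msg.set_content("Plain body")
    msg.add_alternative("<p>HTML body</p>", subtype="html")
    # Adding an attachment wraps the existing body in multipart/mixed.
    msg.add_attachment(b"\x00\x01\x02\xff", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg.as_bytes()


def outline(raw: bytes) -> list:
    """List (content type, transfer encoding, decoded body) per section."""
    msg = message_from_bytes(raw, policy=policy.default)
    rows = []
    for part in msg.walk():
        cte = part.get("Content-Transfer-Encoding")
        # decode=True undoes Base64 or Quoted-Printable; for a multipart
        # container there is no single body, so the result is None.
        rows.append((part.get_content_type(),
                     None if cte is None else str(cte),
                     part.get_payload(decode=True)))
    return rows


if __name__ == "__main__":
    raw = build_message()
    for content_type, cte, payload in outline(raw):
        print(content_type, cte, payload)
    # Each multipart container declares its own boundary parameter.
    parsed = message_from_bytes(raw, policy=policy.default)
    print([p.get_boundary() for p in parsed.walk() if p.is_multipart()])
```

The binary attachment is the section that needs Base64, while the two text bodies here are plain ASCII and can travel without conversion.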