Lab 3 Related Question -- Mail Content Encoding

Author: Psycap (siraPcitsatnaF)

Board: b94902HW

Title: Lab 3 Related Question -- Mail Content Encoding

Time: Mon Apr 28 10:48:46 2008

※ [Reposted from board CSIE_R221]
Author: Psycap (siraPcitsatnaF)  Board: CSIE_R221
Title: Lab 3 Related Question -- Mail Content Encoding
Time: Mon Apr 28 10:42:32 2008

See the following URL:
http://www.free-newslettertemplates.com/smtp.html

A character set is simply a mapping of byte values to characters. The most common character set is US-ASCII, which has 32 (non-printable) control characters and 96 (mostly printable) other characters, for a total of 128. These 128 characters can be encoded in 7 bits of data, so each 8-bit byte representing one of these characters has the lower 7 bits set to the appropriate value for the given character and the 8th (high) bit set to zero. US-ASCII is therefore considered a single-byte 7-bit character set.

Many European languages have accented characters (like the German ü, the French ç and é, the Danish ø and the Spanish ñ). Such languages are commonly represented by character sets whose lower half (i.e., values 0 - 127) is identical to US-ASCII, and whose upper half (i.e., values 128 - 255) represents these accented characters. These are therefore considered single-byte 8-bit character sets; an example is ISO-8859-1.

Many Asian languages have so many characters that they need multiple bytes to represent them all. Their character sets are therefore considered multiple-byte character sets.

Note that it is not enough for the sending mail program to be configured with the correct character set. The receiving program or product must also support the character set used. Otherwise an error will occur, and the most common symptom is badly formatted (garbled) email content.

Headers & Bodies

Each message consists of two parts. The headers contain information about who authored the message, the intended recipients, the time of creation, the subject of the message, delivery stamps, ... Each header is of the form "keyword: value", where keyword is a special word (like From or Date) identifying the type of information contained in that header, and value is the information itself. A blank line always separates the headers from the body.
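The "keyword: value" form and the blank-line separator can be seen in a minimal sketch using Python's standard `email` package. The addresses, subject, and body text are made up for illustration; they do not come from the lab or the article.

```python
from email import message_from_string

# A made-up message: three headers, one blank line, then the body.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lab 3 question\n"
    "\n"
    "Hello Bob,\n"
    "this is the body.\n"
)

msg = message_from_string(raw)

# Every header is a (keyword, value) pair.
headers = dict(msg.items())
for keyword, value in headers.items():
    print(f"{keyword!r} -> {value!r}")

# Everything after the first blank line is the body.
body = msg.get_payload()
print(body)
```

The parser does the split for us; doing it by hand would amount to cutting the text at the first blank line and then cutting each header line at the first colon.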
The body contains the information the sender is trying to communicate. The "message" as most people think of it is really the body of the message.

MIME

For many years, most messages were plain text in the US-ASCII character set, so no structure was needed for message bodies. The explosion of messaging in Europe and Asia in the mid 1990s, and of multi-media message transmission in the late 1990s, brought about such a need. MIME meets it by describing the body in headers. For example, the header

Content-Type: text/plain; charset=us-ascii

indicates that the message consists of plain text in the US-ASCII character set. MIME also specifies how to encode data when necessary (more on this below). It is the responsibility of the receiving user agent to use this information to display the message in a form that the user will understand.

Transfer Protocols

The language spoken between transfer agents is known as a transfer protocol. There are many in existence; the most common is the Simple Mail Transfer Protocol (SMTP).

Envelopes and Bodies

SMTP uses the concept of an envelope to transfer messages; the envelope merely contains information about from whom the message originated and to whom it is destined. The originator address is important: if there is a problem transferring or delivering the message, the originator can be notified.

The SMTP body is the entire message as defined above in Headers & Bodies. So the message headers plus the message body equal the SMTP body. The term SMTP body is not used that commonly, but it is important to distinguish it from the message body.

7-bit data vs. 8-bit data

For historical reasons relating to the US-ASCII character set, SMTP is a 7-bit protocol, which means it limits the bytes of data sent to the low-order 7 bits. If the 8th (high) bit of a byte is set, SMTP dictates that the bit must be zeroed out. For a message containing 8-bit data to be transferred without data loss, the message must first be encoded into 7-bit data.
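To make "encoded into 7-bit data" concrete, here is a small sketch using Base64 and quoted-printable, the two content transfer encodings MIME defines for this purpose. The sample text and the helper `is_7bit` are assumptions made up for this illustration.

```python
import base64
import quopri

# Sample text with accented characters, stored in the single-byte
# 8-bit character set ISO-8859-1 (sample text is made up).
text = "Grüße aus Zürich"
raw = text.encode("iso-8859-1")


def is_7bit(data: bytes) -> bool:
    """Hypothetical helper: True if no byte has its high (8th) bit set."""
    return all(b < 128 for b in data)


# The raw bytes contain values above 127, so a strict 7-bit
# transport could damage them.
has_8bit = not is_7bit(raw)

# Two ways to re-express the same bytes using only 7-bit values.
b64 = base64.b64encode(raw)      # compact, but not human-readable
qp = quopri.encodestring(raw)    # mostly readable, e.g. "=FC" for ü

print(has_8bit, is_7bit(b64), is_7bit(qp))
print(len(raw), len(b64), len(qp))

# Decoding on the receiving side restores the original bytes exactly.
restored = base64.b64decode(b64).decode("iso-8859-1")
```

A sender would also label the result with the matching `Content-Transfer-Encoding` header (`base64` or `quoted-printable`) and the `charset` parameter, so the receiving user agent knows how to reverse both steps.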
Because most early e-mail users spoke English and most computers used the 7-bit US-ASCII character set, this was not a problem. By the 1990s, however, several factors had increased the need for 8-bit message transfer. As mentioned above, European languages often use 8-bit character sets, and Asian language character sets often require multiple bytes; their transmission is greatly simplified if all 8 bits can be transferred unaltered. In addition, the explosion of multi-media messages like audio and video clips has brought about a two-fold need for 8-bit message transfer: encoding messages into 7-bit data is not only cumbersome, but the resulting encoded message is significantly (typically 33%) larger than the original message.

To meet this need, SMTP has been extended to allow 8-bit data to be properly transferred between consenting transfer agents. The negotiating process used to verify consent is specified in RFC 1869, which describes the general extension mechanism to SMTP (called ESMTP), and RFC 1652, which describes the specific extension to allow 8-bit data transfer, called 8BITMIME.

If a transfer agent has a message containing 8-bit data and it cannot negotiate the proper transfer of that data, it must either encode the message into 7-bit data using MIME, or return the message to the sender indicating the reason for the return.

It is no coincidence that MIME and ESMTP have common rationales and goals; they were developed in conjunction with each other towards the same end.

--
※ Origin: 批踢踢實業坊(ptt.cc)
◆ From: 140.112.90.247