The particular set of 64 characters chosen to represent the 64 digit values of the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean. For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.

The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS, for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh), and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.

The Base64 alphabet defined in RFC 4648 §4 assigns A–Z to the values 0–25, a–z to 26–51, 0–9 to 52–61, + to 62, and / to 63, with = reserved for padding.

The example here uses ASCII text for simplicity, but this is not a typical use case, as such text can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.

Take the three-character text Man, whose encoded value is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string: 010011, 010110, 000101, and 101110, or 19, 22, 5, and 46), which are then converted into their corresponding Base64 character values: T, W, F, and u.

As this example illustrates, Base64 encoding converts three octets into four encoded characters. = padding characters might be added to make the last encoded block contain four Base64 characters.

Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E, which is 23260556 in octal. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair is converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are TWFu.

If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and are discarded on decoding (along with the succeeding = padding character).

If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and are discarded on decoding (along with the succeeding two = padding characters).

Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represent three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. (This is different from A, which means that the remaining bits are all zeros.)

Truncating the input Man one octet at a time shows how the output padding changes: Man encodes to TWFu, Ma to TWE=, and M to TQ==.

The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text.
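The 6-bit grouping, the padding rule, and the octal shortcut described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ALPHABET, encode_base64, and sextets_via_octal are made up for this example, the alphabet is the RFC 4648 §4 one, and the standard library's base64.b64encode is used purely as a cross-check. Real code should call the standard library directly.

```python
import base64

# RFC 4648 section 4 alphabet: list index (0-63) -> printable ASCII character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Illustrative encoder (hypothetical helper, not a library API)."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)  # 0, 1, or 2 octets short in the last group
        # Join the octets into one 24-bit number, zero-filling missing octets.
        bits = int.from_bytes(group + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit values, most significant first.
        sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[s] for s in sextets]
        # Sextets that carry no input bits are replaced by '=' padding.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


def sextets_via_octal(three_octets: bytes) -> list:
    """Octal shortcut: 24 bits -> 8 octal digits -> 4 pairs -> 4 indices."""
    octal = format(int.from_bytes(three_octets, "big"), "08o")
    return [int(octal[i:i + 2], 8) for i in range(0, 8, 2)]


if __name__ == "__main__":
    for text in (b"Man", b"Ma", b"M"):
        # Cross-check the sketch against the standard library.
        assert encode_base64(text) == base64.b64encode(text).decode("ascii")
        print(text, "->", encode_base64(text))
    print(sextets_via_octal(b"Man"))
```

Run as a script, this prints TWFu, TWE=, and TQ== for the three inputs, followed by [19, 22, 5, 46] for the octal shortcut. Zero-filling the last group before slicing is one simple way to get the discarded low-order bits to be zero; other implementations may handle the tail differently.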