
Japanese Encodings

There are several Japanese encodings, but the great thing about them is that you can convert directly from one to another. The reason for this is that the layouts of their encoding tables follow almost exactly the same order.

We consider the Japanese double byte encoded character [DBEC] (XY) in two portions, the front byte X and the end byte Y.

But these X and Y portions of the DBEC are in raw (ASCII) form. We need them in something easier to understand: hexadecimal (base 16). As the name implies, hexadecimal uses 16 digits: 0123456789abcdef. Joining two of these digits together gives 16 x 16 possible combinations, that is 256 individual numbers, from 00 (decimal 0) to ff (decimal 255).
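To make the front byte X and end byte Y concrete, here is a minimal Python sketch that splits a 16-bit double byte code into its two bytes and prints them as two hex digits each. The helper name `split_dbec` is hypothetical; the code 8d91 is the S-JIS code of example X further down.

```python
def split_dbec(code: int) -> tuple[int, int]:
    """Split a 16-bit double byte code into its front byte X and end byte Y."""
    return code >> 8, code & 0xFF

front, end = split_dbec(0x8D91)     # S-JIS code of example X
print(f"{front:02x} {end:02x}")     # prints "8d 91"
print(int("ff", 16))                # prints 255, the largest one-byte value
```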


 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f

 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

 20 21 .. .. .. .. .. .. .. .. .. .. .. .. .. 2f

 30 31 .. .. .. .. .. .. .. .. .. .. .. .. .. 3f

 40 41 .. .. .. .. .. .. .. .. .. .. .. .. .. 4f

 50 51 .. .. .. .. .. .. .. .. .. .. .. .. .. 5f

 60 61 .. .. .. .. .. .. .. .. .. .. .. .. .. 6f

 70 71 .. .. .. .. .. .. .. .. .. .. .. .. .. 7f

 80 81 .. .. .. .. .. .. .. .. .. .. .. .. .. 8f

 90 91 .. .. .. .. .. .. .. .. .. .. .. .. .. 9f

 a0 a1 .. .. .. .. .. .. .. .. .. .. .. .. .. af

 b0 b1 .. .. .. .. .. .. .. .. .. .. .. .. .. bf

 c0 c1 .. .. .. .. .. .. .. .. .. .. .. .. .. cf

 d0 d1 .. .. .. .. .. .. .. .. .. .. .. .. .. df

 e0 e1 .. .. .. .. .. .. .. .. .. .. .. .. .. ef

 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff

What distinguishes the various encodings is where in the 256 by 256 square matrix the characters appear. Some areas of the square are unsuitable, because there the first byte of the character would be an ordinary character that you and I can just type in. It's the use of double byte combinations that cannot easily be typed on the keyboard that is the secret of all the encodings currently available. So imagine: 256 x 256 is 65536, yet only a small portion of this array can be used by any one encoding.
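To put a number on "a small portion", the sketch below counts how many of the 65536 cells are usable if, as an assumed example, EUC restricts both bytes to a1-fe (94 values each), staying clear of 00-7f where the ordinary typed characters live. The helper name `in_euc_area` is hypothetical.

```python
def in_euc_area(x: int, y: int) -> bool:
    """True if both bytes lie in a1-fe, the double byte area assumed here for EUC."""
    return 0xA1 <= x <= 0xFE and 0xA1 <= y <= 0xFE

usable = sum(in_euc_area(x, y) for x in range(256) for y in range(256))
print(usable, 256 * 256, f"{usable / (256 * 256):.1%}")   # prints "8836 65536 13.5%"
```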


JIS (DEC & HEX), EUC and SHIFT JIS.



===========Converting the Front Part: JIS <=> EUC and JIS <=> S-JIS

                             JIS DEC 01** - 83**

                             JIS HEX 21** - 73**

                                 EUC a1** - f3**

                               S-JIS 81** - ea**
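The JIS and EUC front-part ranges above follow simple arithmetic: JIS HEX is JIS DEC plus 20 hex (32 decimal), and EUC is JIS HEX with 80 added. A minimal sketch, assuming the row is a valid JIS decimal front part; `jis_front_hex` and `euc_front` are hypothetical helper names.

```python
def jis_front_hex(row: int) -> int:
    """JIS decimal front part (row 01-94) -> JIS hex front byte: add 0x20."""
    return row + 0x20

def euc_front(row: int) -> int:
    """EUC front byte: the JIS hex front byte with its high bit (0x80) set."""
    return jis_front_hex(row) | 0x80

for row in (1, 16, 25, 83):
    print(f"{row:02d} -> JIS HEX {jis_front_hex(row):02x}, EUC {euc_front(row):02x}")
```

Row 83, for example, gives JIS HEX 73 and EUC f3, the upper ends of the ranges listed above.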



JIS DEC   01 02 03 04 05 06 07

JIS HEX   21 22 23 24 25 26 27

    EUC   a1 a2 a3 a4 a5 a6 a7 

  S-JIS   81 81 82 82 83 83 84



JIS DEC   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

JIS HEX   30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f

    EUC   b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf 

  S-JIS   88 89 89 8a 8a 8b 8b 8c 8c 8d 8d 8e 8e 8f 8f 90



JIS DEC   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

JIS HEX   40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f

    EUC   c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf 

  S-JIS   90 91 91 92 92 93 93 94 94 95 95 96 96 97 97 98 



JIS DEC   48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

JIS HEX   50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f

    EUC   d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df 

  S-JIS   98 99 99 9a 9a 9b 9b 9c 9c 9d 9d 9e 9e 9f 9f e0





JIS DEC   64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

JIS HEX   60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f

    EUC   e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef 

  S-JIS   e0 e1 e1 e2 e2 e3 e3 e4 e4 e5 e5 e6 e6 e7 e7 e8



JIS DEC   80 81 82 83

JIS HEX   70 71 72 73

    EUC   f0 f1 f2 f3 

  S-JIS   e8 e9 e9 ea
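The S-JIS rows of these tables can be generated instead of looked up. In this minimal sketch (the helper name `sjis_front` is hypothetical), two consecutive JIS rows share one front byte, and after row 62 (front byte 9f) the count resumes at e0, matching the tables above.

```python
def sjis_front(row: int) -> int:
    """S-JIS front byte for a JIS decimal front part (row 01-94)."""
    # Two consecutive rows share one front byte: (row + 1) // 2.
    # Rows 01-62 land on 81-9f; from row 63 the sequence resumes at e0.
    base = 0x80 if row <= 62 else 0xC0
    return (row + 1) // 2 + base

for rows in (range(1, 8), range(16, 32), range(80, 84)):
    print(" ".join(f"{sjis_front(r):02x}" for r in rows))
```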





===========Converting the End Part: JIS DEC <=> JIS HEX and JIS <=> EUC



 

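                             JIS DEC **01 - **94

                             JIS HEX **a1 - **fe

                       End Part EUC =  End Part JIS HEX

 (Only 01-94 are real end parts; the table below also lists the unused positions 00 and 95, i.e. a0 and ff.)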



  EUC end byte, by JIS DECIMAL tens (rows) and units (columns)

  TENS | *0 *1 *2 *3 *4 *5 *6 *7 *8 *9

 ------+------------------------------

  **0* | a0 a1 a2 a3 a4 a5 a6 a7 a8 a9

  **1* | aa ab ac ad ae af b0 b1 b2 b3

  **2* | b4 b5 b6 b7 b8 b9 ba bb bc bd

  **3* | be bf c0 c1 c2 c3 c4 c5 c6 c7

  **4* | c8 c9 ca cb cc cd ce cf d0 d1

  **5* | d2 d3 d4 d5 d6 d7 d8 d9 da db

  **6* | dc dd de df e0 e1 e2 e3 e4 e5

  **7* | e6 e7 e8 e9 ea eb ec ed ee ef

  **8* | f0 f1 f2 f3 f4 f5 f6 f7 f8 f9

  **9* | fa fb fc fd fe ff

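In this JIS HEX notation the end portion is written exactly as in EUC (JIS decimal end part + a0). Strictly speaking, 7-bit JIS (as used in ISO-2022-JP) clears the high bit of both bytes, so its end part is 80 lower, **21 - **7e; EUC is simply 7-bit JIS with 80 added to both bytes.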
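The end-part table is plain arithmetic: the EUC end byte is the decimal end part plus a0. The sketch below also shows the standard 7-bit JIS end byte, assumed here to be the EUC value minus 80 (range 21-7e); note that the document's own JIS HEX column uses the EUC-style value instead. Both helper names are hypothetical.

```python
def euc_end(cell: int) -> int:
    """EUC end byte for a JIS decimal end part (cell 01-94): a1-fe."""
    return cell + 0xA0

def jis7_end(cell: int) -> int:
    """End byte in 7-bit JIS (e.g. ISO-2022-JP): 21-7e, the EUC value minus 0x80."""
    return cell + 0x20

print(f"{euc_end(81):02x} {jis7_end(81):02x}")   # prints "f1 71"
```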



The S-JIS encoding follows the same order as the EUC and JIS encodings. However, its front bytes are not contiguous: the sequence stops and then resumes at a different code point. We can see this in the S-JIS transition from 9f**, which then carries on at e0**. The skipped range a0-df is where Shift-JIS keeps its single-byte half-width katakana, so those values cannot serve as front bytes.

Also, whereas each DEC JIS group (front part) maps onto exactly one HEX EUC front byte, two JIS groups share one S-JIS front byte. One of these two JIS groups is odd numbered and the other is even numbered, and this determines the end part. The ending @@ of **@@ is found in one of the two tables below: one for odd numbered JIS front parts and one for even numbered ones.


===========JIS to S-JIS conversion for the end part

                      





===========For ODD front part JIS numbers

      *0 *1 *2 *3 *4 *5 *6 *7 *8 *9 

**0*     40 41 42 43 44 45 46 47 48 

**1*  49 4a 4b 4c 4d 4e 4f 50 51 52 

**2*  53 54 55 56 57 58 59 5a 5b 5c

**3*  5d 5e 5f 60 61 62 63 64 65 66

**4*  67 68 69 6a 6b 6c 6d 6e 6f 70

**5*  71 72 73 74 75 76 77 78 79 7a

**6*  7b 7c 7d 7e 80 81 82 83 84 85 

**7*  86 87 88 89 8a 8b 8c 8d 8e 8f 

**8*  90 91 92 93 94 95 96 97 98 99 

**9*  9a 9b 9c 9d 9e 



===========For EVEN front part JIS numbers

      *0 *1 *2 *3 *4 *5 *6 *7 *8 *9 

**0*     9f a0 a1 a2 a3 a4 a5 a6 a7 

**1*  a8 a9 aa ab ac ad ae af b0 b1 

**2*  b2 b3 b4 b5 b6 b7 b8 b9 ba bb 

**3*  bc bd be bf c0 c1 c2 c3 c4 c5 

**4*  c6 c7 c8 c9 ca cb cc cd ce cf 

**5*  d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 

**6*  da db dc dd de df e0 e1 e2 e3 

**7*  e4 e5 e6 e7 e8 e9 ea eb ec ed 

**8*  ee ef f0 f1 f2 f3 f4 f5 f6 f7 

**9*  f8 f9 fa fb fc 



You will notice that 7f is missing from the ODD table. This is by design; 7f is the ASCII DEL control code, which is probably why it was left out.
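Both end-part tables can be regenerated arithmetically: for an ODD front part the end byte is the decimal end part + 3f, stepping over 7f; for an EVEN front part it is the end part + 9e. A minimal sketch; `sjis_end` is a hypothetical helper name, and row and cell are assumed to be valid JIS decimal values (01-94).

```python
def sjis_end(row: int, cell: int) -> int:
    """S-JIS end byte for a JIS decimal front part (row) and end part (cell)."""
    if row % 2 == 1:                               # ODD table: 40-7e, then 80-9e
        end = cell + 0x3F
        return end + 1 if end >= 0x7F else end     # step over 7f
    return cell + 0x9E                             # EVEN table: 9f-fc

odd = [sjis_end(1, c) for c in range(1, 95)]
even = [sjis_end(2, c) for c in range(1, 95)]
print(f"{odd[0]:02x}..{odd[-1]:02x}", f"{even[0]:02x}..{even[-1]:02x}", 0x7F in odd)
# prints "40..9e 9f..fc False"
```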

Let us return to the Kuni, Goku example used previously. The characters are encoded in Shift-JIS.


        JIS DEC   JIS HEX   EUC    S-JIS

X       2581      39f1      b9f1   8d91

Y       5191      53fb      d3fb   9a9b

Z       5202      54a2      d4a2   9aa0

In examples X and Y, the front parts are 25** and 51** respectively; note that these are both odd numbers. In example Z, the front part is 52**, which is even. Looking up the front parts in the S-JIS tables above gives 8d**, 9a** and 9a** for X, Y and Z respectively.

To find the end parts: because X and Y have odd JIS front parts, their end parts are looked up in the ODD table, so the end parts 81 and 91 give 91 and 9b respectively. Because Z has an even JIS front part, we look up its end part, 02, in the EVEN table, which gives a0.
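Putting the front and end rules together, the sketch below converts a JIS decimal code (front part, end part) to S-JIS and to EUC, then runs the three examples. The function names are hypothetical; as a cross-check it uses Python's built-in `shift_jis` and `euc_jp` codecs, which should decode both byte pairs to the same character.

```python
def kuten_to_sjis(row: int, cell: int) -> bytes:
    """JIS decimal (front part, end part) -> two-byte S-JIS."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("front and end parts must both be in 01-94")
    front = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:                      # ODD front part table
        end = cell + 0x3F
        if end >= 0x7F:
            end += 1                      # 7f is never used
    else:                                 # EVEN front part table
        end = cell + 0x9E
    return bytes([front, end])

def kuten_to_euc(row: int, cell: int) -> bytes:
    """JIS decimal (front part, end part) -> two-byte EUC: add a0 to each part."""
    return bytes([row + 0xA0, cell + 0xA0])

for name, row, cell in [("X", 25, 81), ("Y", 51, 91), ("Z", 52, 2)]:
    sjis, euc = kuten_to_sjis(row, cell), kuten_to_euc(row, cell)
    same = sjis.decode("shift_jis") == euc.decode("euc_jp")
    print(name, f"{row:02d}{cell:02d}", euc.hex(), sjis.hex(), same)
```

The first line printed is `X 2581 b9f1 8d91 True`. The EUC values here are simply the JIS HEX front byte plus 80, as in the front-part table.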



© Dylan W.H. Sung 1999.