Need advice: MicroPython gb2312 encoding bug? Thank you

Post by stategrid » Fri Feb 07, 2020 7:36 am

Hi guys, I'm a newbie to MicroPython.
I have an SSD1306 OLED display with a GT20L16S1Y SPI font chip. Since MicroPython does not officially support the GT20L16S1Y, I have started writing a driver for the font chip. According to the GT20L16S1Y datasheet, it is an SPI chip, and Chinese GB2312-encoded font data is stored in it.

gb2312 = text.encode('gb2312')
# add this line
print("len(gb2312):",len(gb2312))


On the PC the result is:
len(gb2312): 2

On MicroPython the result is:
len(gb2312): 3

Is this a bug in MicroPython? How can I fix it? Thank you for your help.
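
One possible explanation, which nothing in this thread confirms: the MicroPython build may not include a GB2312 codec, so `str.encode()` may return the string's UTF-8 bytes whatever encoding name is passed. "啊" takes 3 bytes in UTF-8 and 2 bytes in GB2312, which would match the two lengths reported above. A minimal sketch, meant to be run on a PC with CPython, compares the two encodings:

```python
# Run on a PC (CPython), which ships a real GB2312 codec.
ch = "啊"

utf8_bytes = ch.encode("utf-8")   # UTF-8 bytes of the character
gb_bytes = ch.encode("gb2312")    # GB2312 bytes of the character

print("utf-8 :", utf8_bytes, len(utf8_bytes))
print("gb2312:", gb_bytes, len(gb_bytes))
```

If the bytes printed on the ESP32 match the UTF-8 line, the length of 3 comes from the encoding step rather than from the driver code.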
#############################################
I am following this guide:
https://github.com/pengfexue2/printPlay ... intPlay.py

The code runs perfectly on a PC.

def printPlay(textStr, line, background):
    for text in textStr:
        # Get the GB2312 encoding; each Chinese character encodes to 2 bytes.
        gb2312 = text.encode('gb2312')

        # Line added for debugging:
        print("len(gb2312):", len(gb2312))

On the PC the result is:
len(gb2312): 2
#############################################
Then I modified the code and ran it on an ESP32 with MicroPython 1.12 (build 20200206):

    def getAddr2(self, chn_str):
        global Address

        # Encode the Chinese character as 'gb2312'
        print("input is:", chn_str)

        gb2312 = chn_str.encode('gb2312')
        print("gb2312:", gb2312)
        print("len(gb2312)", len(gb2312))

        # On MicroPython the result is:
        # len(gb2312) 3

        # Get the hex representation of the encoded bytes
        # (requires `import ubinascii` at the top of the module)
        hex_str = ubinascii.hexlify(gb2312)
        print("hex_str:", hex_str)
        print("len(hex_str)", len(hex_str))

        # Split the hex string by slicing. Slice indices work like this:
        #
        #  +---+---+---+---+---+---+
        #  | P | y | t | h | o | n |
        #  +---+---+---+---+---+---+
        #  0   1   2   3   4   5   6
        # -6  -5  -4  -3  -2  -1
        k = hex_str[:2]
        kkk = str(k, 'utf8')
        l = hex_str[2:]
        lll = str(l, 'utf8')
        print("kkk: ", kkk)
        print("lll: ", lll)

        # High and low bytes of the code. The range checks below compare
        # against raw byte values (0xA1, 0xB0, ...), so 0xA0 must not be
        # subtracted here. For single-byte (ASCII) input there is no second
        # byte, so the only byte is used as the low byte.
        hb = int(kkk, 16)
        lb = int(lll, 16) if lll else hb

        print("HB:")
        print(hb)
        print("LB:")
        print(lb)

        # Predefined so that a value is always returned
        Address = 0x00

        if hb > 0xA0 and lb > 0xA0:  # GB2312 double-byte character
            BaseAdd = 0
            # Chinese standard (GB2312) 15x16 font in the font IC. The address
            # of each character is calculated with this formula:
            #   Address = ((MSB - 0xA1) * 94 + (LSB - 0xA1)) * 32 + BaseAdd
            # The reference code for 8-bit MCUs splits this into three steps
            # to avoid overflow; Python integers do not overflow.
            # Zones 01~09: symbols and numerals
            if hb == 0xA9 and lb >= 0xA1:
                Address = (282 + (lb - 0xA1)) * 32 + BaseAdd
            elif 0xA1 <= hb <= 0xA3 and lb >= 0xA1:
                Address = ((hb - 0xA1) * 94 + (lb - 0xA1)) * 32 + BaseAdd
            # Zones 16~87 (0xB0~0xF7): Chinese characters
            elif 0xB0 <= hb <= 0xF7 and lb >= 0xA1:
                Address = ((hb - 0xB0) * 94 + (lb - 0xA1) + 846) * 32 + BaseAdd
        else:  # ASCII
            BaseAdd = 0x03b7c0
            if 0x20 <= lb <= 0x7E:
                Address = (lb - 0x20) * 16 + BaseAdd
        print("Address is:", Address)
        return Address
#########################
On MicroPython the result is:
len(gb2312) 3
Last edited by stategrid on Wed Feb 12, 2020 7:25 pm, edited 1 time in total.
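
If the board really has no GB2312 codec, one workaround (an assumption, not something tested in this thread) is to precompute the GB2312 bytes on the PC for the characters the display needs, and to look them up on the device instead of calling `encode()`. `build_gb2312_table` and `to_gb2312` are hypothetical helper names used only for this sketch:

```python
def build_gb2312_table(chars):
    """Run on a PC (CPython): map each character to its GB2312 bytes."""
    return {ch: ch.encode("gb2312") for ch in chars}


def to_gb2312(ch, table):
    """Run on the device: look the bytes up instead of calling encode()."""
    return table[ch]


# Generate the table on the PC, then paste the printed dict into the device code.
GB2312_TABLE = build_gb2312_table("啊中文")
print(GB2312_TABLE)
print(to_gb2312("啊", GB2312_TABLE))
```

The table only covers the characters listed when it is built, so it suits displays with a fixed set of messages.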

Post by jimmo » Wed Feb 12, 2020 3:23 am

I'm not sure I quite understand the question you're asking...

Can you provide a simple case (and a link to the libraries you're using) that shows the error you're seeing?

Post by stategrid » Wed Feb 12, 2020 8:39 pm

Thank you for your help. Here is an explanation of what GB 2312 is:
https://en.wikipedia.org/wiki/GB_2312

Because Chinese has far more characters than English, engineers working on microcontroller systems use dot-matrix fonts to store, read and display Chinese. The font can be stored in an SPI flash chip or in a dedicated font chip (which also uses an SPI interface but is cheaper).
Each Chinese character has a 2-byte address in the font chip: the high byte is the Index and the low byte is the Area.
For example, take the Chinese character "啊" (which means "oh", pronounced "ah").
In the Chinese font chip, the data for "啊" is:
0000b040 00 04 2f 7e f9 04 a9 04 aa 14 aa 7c ac 54 aa 54

This is the start of the 16*16 dot-matrix data for the character: the first 16 of its 32 bytes, 2 bytes per row.
The full 16*16 dot-matrix glyph of "啊" is:

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ■ ○ ○
○ ○ ■ ○ ■ ■ ■ ■ ○ ■ ■ ■ ■ ■ ■ ○
■ ■ ■ ■ ■ ○ ○ ■ ○ ○ ○ ○ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ○ ■ ○ ○ ○ ○ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ■ ○ ○ ○ ○ ■ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ■ ○ ○ ■ ■ ■ ■ ■ ○ ○
■ ○ ■ ○ ■ ■ ○ ○ ○ ■ ○ ■ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ■ ○ ○ ■ ○ ■ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ■ ○ ○ ■ ○ ■ ○ ■ ○ ○
■ ○ ■ ○ ■ ○ ○ ■ ○ ■ ○ ■ ○ ■ ○ ○
■ ■ ■ ○ ■ ○ ○ ■ ○ ■ ■ ■ ○ ■ ○ ○
■ ○ ■ ○ ■ ■ ○ ■ ○ ■ ○ ■ ○ ■ ○ ○
○ ○ ○ ○ ■ ○ ■ ○ ○ ○ ○ ○ ○ ■ ○ ○
○ ○ ○ ○ ■ ○ ○ ○ ○ ○ ○ ○ ○ ■ ○ ○
○ ○ ○ ○ ■ ○ ○ ○ ○ ○ ○ ■ ○ ■ ○ ○
○ ○ ○ ○ ■ ○ ○ ○ ○ ○ ○ ○ ■ ■ ○ ○
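
The top half of this grid can be reproduced from the 16 bytes shown in the dump. A minimal sketch, assuming each row is stored as 2 bytes with the most significant bit as the leftmost pixel (which is what the bytes in this post suggest); `render_rows` is a hypothetical helper name:

```python
def render_rows(data, on="■", off="○"):
    """Turn row-major glyph bytes (2 bytes per 16-pixel row) into text rows."""
    rows = []
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        # Bit 15 is the leftmost pixel, bit 0 the rightmost.
        cells = [on if word & (1 << (15 - bit)) else off for bit in range(16)]
        rows.append(" ".join(cells))
    return rows


# The first 16 bytes of the glyph, copied from the dump in this post.
first_half = bytes([0x00, 0x04, 0x2F, 0x7E, 0xF9, 0x04, 0xA9, 0x04,
                    0xAA, 0x14, 0xAA, 0x7C, 0xAC, 0x54, 0xAA, 0x54])

for row in render_rows(first_half):
    print(row)
```

Printing the rows this way is a quick check that the bytes read back from the chip belong to the expected character.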

The address of "啊" is 1601 (decimal), which means Index == 16 and Area == 01.
Written in hexadecimal, Index == 0x10 and Area == 0x01.

After GB2312 encoding, "啊" becomes the two bytes 0xB0 0xA1, that is, "啊".encode('gb2312') == b'\xb0\xa1'.
The high byte of the GB2312 code is 0xB0 and the low byte is 0xA1.

So the conversion between the GB2312 encoding and the address is:
high byte of the GB2312 code - Index == 0xB0 - 0x10 == 0xA0
low byte of the GB2312 code - Area == 0xA1 - 0x01 == 0xA0

0xA0 is a constant.
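
The subtraction described above can be written directly. This is a sketch using the terms of this post (Index for the high byte, Area for the low byte); `gb2312_to_index_area` is a hypothetical name, and the input is assumed to be a correct 2-byte GB2312 code:

```python
def gb2312_to_index_area(code):
    """Split a 2-byte GB2312 code into (Index, Area) by subtracting 0xA0."""
    if len(code) != 2:
        raise ValueError("expected 2 GB2312 bytes, got %d" % len(code))
    return code[0] - 0xA0, code[1] - 0xA0


index, area = gb2312_to_index_area(b"\xb0\xa1")  # GB2312 bytes of "啊"
print(index, area)  # 16 1
```

The length check makes a 3-byte result, like the one seen on MicroPython, fail loudly instead of producing a wrong address.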

offset = (94 * (Index - 1) + (Area - 1)) * 32 == 0xB040

Starting at the font's base address plus this offset, we read 32 bytes from the font chip (16 rows of 2 bytes each)
and get the data for "啊", which begins:

0000b040 00 04 2f 7e f9 04 a9 04 aa 14 aa 7c ac 54 aa 54
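
As a worked check of the offset formula in this post, here it is applied to the Index and Area of "啊" (16 and 1). `glyph_offset` is a hypothetical name, and this is only the linear formula given here; the chip-specific address code earlier in the thread uses different base values:

```python
GLYPH_BYTES = 32  # 16 rows x 2 bytes per row


def glyph_offset(index, area):
    """Byte offset of a 16x16 glyph using the formula from this post."""
    return (94 * (index - 1) + (area - 1)) * GLYPH_BYTES


offset = glyph_offset(16, 1)
print(hex(offset))  # 0xb040
```

The result matches the 0000b040 address shown at the start of the dump line.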

In conclusion, if we get the correct GB2312 encoding result, we can calculate the address of each Chinese character in the font chip.
