[Python] convert Unicode String to int

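If you have a Unicode code point and want to see the character it represents, use chr (note that chr takes the code point itself, not UTF-8 encoded bytes):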
https://docs.python.org/3/library/functions.html#chr

For example:

>>> chr(24465)
'徑'

To find the code point that a character maps to, use ord:
https://docs.python.org/3/library/functions.html#ord

For example:

>>> ord('徑')
24465
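
As a small sketch of how chr and ord undo each other, the snippet below round-trips a single character and then a whole string. The variable names are only illustrative; ord accepts exactly one character, so a longer string has to be processed character by character.

```python
# ord() maps one character to its Unicode code point (an int),
# and chr() maps a code point back to the character.
code_point = ord('徑')
char = chr(code_point)
print(code_point, char)  # 24465 徑

# ord() only accepts a single character, so iterate over longer strings.
code_points = [ord(c) for c in '徑耄']
print(code_points)  # [24465, 32772]

# Rebuild the original string from the list of code points.
text = ''.join(chr(n) for n in code_points)
print(text)  # 徑耄
```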

Note that when you encrypt and decrypt text, you usually first encode the text to a binary representation with a character encoding. Unicode text can be encoded with different encodings, each with its own advantages and disadvantages. These days the most commonly used encoding for Unicode text is UTF-8, but others exist too.

In Python 3, binary data is represented in the bytes object, and you encode text to bytes with the str.encode() method and go back by using bytes.decode():

>>> 'Hello World!'.encode('utf8')
b'Hello World!'
>>> b'Hello World!'.decode('utf8')
'Hello World!'

bytes values are really just sequences, like lists and tuples and strings, but consisting of integer numbers from 0-255:

>>> list('Hello World!'.encode('utf8'))
[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
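
For ASCII text the UTF-8 byte values happen to match the code points, which can hide the difference between the two. The sketch below uses a non-ASCII character to show that the code point returned by ord and the encoded UTF-8 bytes are different things; the variable names are only illustrative.

```python
char = '徑'

# The Unicode code point is a single integer.
code_point = ord(char)
print(code_point)  # 24465

# The UTF-8 encoding of the same character is three bytes.
utf8_bytes = char.encode('utf-8')
byte_values = list(utf8_bytes)
print(byte_values)  # [229, 190, 145]

# A sequence of integers in the range 0-255 can be turned back into bytes
# and decoded to recover the original text.
decoded = bytes(byte_values).decode('utf-8')
print(decoded)  # 徑
```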

When encrypting, you generally want to encode the text first and then encrypt the resulting bytes.
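
A minimal sketch of that order of operations follows. The xor_bytes helper and the key are hypothetical and exist only to show where encode and decode sit in the pipeline; a repeating-key XOR is a toy and is not secure, so a real program should use an established cryptography library instead.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only (NOT secure): XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = 'Hello 徑!'
key = b'k3y'  # hypothetical key, for demonstration only

# 1. Encode the text to bytes, 2. "encrypt" the bytes.
ciphertext = xor_bytes(plaintext.encode('utf-8'), key)

# To go back: 1. "decrypt" the bytes, 2. decode them to text.
recovered = xor_bytes(ciphertext, key).decode('utf-8')
print(recovered == plaintext)  # True
```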

If all this seems overwhelming or hard to follow, perhaps these articles on Unicode and character encodings can help out:


After the basic conversion, you may also need to turn the int into hex. For example:

>>> ord('耄')
32772
>>> hex(ord('耄'))
'0x8004'
>>> str(hex(ord('耄')))
'0x8004'
>>> str(hex(ord('耄')))[2:]
'8004'
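
Going the other way, from hex digits back to the character, can be sketched as below. Since hex already returns a str, the extra str() call is not needed; the variable names here are only illustrative.

```python
# hex() already returns a str, so slicing off the '0x' prefix works directly.
hex_digits = hex(ord('耄'))[2:]
print(hex_digits)  # 8004

# int(..., 16) parses the hex digits back into the code point.
code_point = int(hex_digits, 16)
print(code_point)  # 32772

# chr() turns the code point back into the character.
char = chr(code_point)
print(char)  # 耄

# The same hex digits are what a \u escape in a string literal uses.
print('\u8004' == char)  # True
```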

Try:

"0x%x" % 255 # => 0xff

or

"0x%X" % 255 # => 0xFF

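The same results can also be produced with format() or an f-string, which avoids slicing off the 0x prefix by hand. This is a sketch of equivalent spellings, not the only way to do it; the variable names are illustrative.

```python
# %-formatting, as in the examples above.
lower = "0x%x" % 255
upper = "0x%X" % 255
print(lower, upper)  # 0xff 0xFF

# format() with the 'x' type gives the hex digits without a prefix.
digits = format(ord('耄'), 'x')
print(digits)  # 8004

# An f-string with zero padding, handy for the usual U+XXXX notation.
label = f"U+{ord('徑'):04X}"
print(label)  # U+5F91
```
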
The Python documentation says: "keep this under your pillow": http://docs.python.org/library/index.html


Related articles:

CJK Unified Ideographs
https://en.wikipedia.org/wiki/CJK_Unified_Ideographs
