If you have a Unicode code point (an integer, not a UTF-8 byte sequence) and want to know which character it produces, use chr:
https://docs.python.org/3/library/functions.html#chr
For example:
>>> chr(24465)
'徑'
To find the code point that a character maps to, use ord:
https://docs.python.org/3/library/functions.html#ord
For example:
>>> ord('徑')
24465
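As a quick check that the two built-ins are inverses of each other, here is a minimal sketch. The variable names are only illustrative, and the second character (U+1F600) is an arbitrary example chosen to show that chr also works beyond the Basic Multilingual Plane.

```python
# ord() maps a one-character string to its Unicode code point (an int).
code_point = ord('徑')        # 24465, i.e. U+5F91

# chr() maps a code point back to the character.
char = chr(code_point)        # '徑'

# chr() accepts any code point in range(0x110000), not only BMP characters.
emoji = chr(0x1F600)

print(code_point, char, len(emoji))
```

In Python 3 a str is a sequence of code points, so len(emoji) is 1 even though the character needs four bytes in UTF-8.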
Note that when you encrypt and decrypt text, you usually encode the text to a binary representation with a character encoding first. Unicode text can be encoded with different encodings, each with its own advantages and disadvantages. These days the most commonly used encoding for Unicode text is UTF-8, but others exist too.
In Python 3, binary data is represented in the bytes object, and you encode text to bytes with the str.encode() method and go back by using bytes.decode():
>>> 'Hello World!'.encode('utf8')
b'Hello World!'
>>> b'Hello World!'.decode('utf8')
'Hello World!'
bytes values are really just sequences, like lists and tuples and strings, but consisting of integers in the range 0-255:
>>> list('Hello World!'.encode('utf8'))
[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
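The ASCII example above hides the difference between a code point and its encoded bytes, because for ASCII the two are identical. The following sketch uses the character from the earlier ord example to show that the same code point yields different byte sequences under different encodings; the choice of UTF-16 (big-endian) as the second encoding is only for illustration.

```python
text = '徑'

# The code point is a single integer, independent of any encoding.
code_point = ord(text)                      # 24465 (0x5F91)

# The encoded form depends on the encoding chosen.
utf8_bytes = text.encode('utf-8')           # three bytes
utf16_bytes = text.encode('utf-16-be')      # two bytes

print(code_point)
print(list(utf8_bytes))     # [229, 190, 145]
print(list(utf16_bytes))    # [95, 145]
```

This is why chr and ord work with code points rather than with UTF-8 bytes: to go from bytes back to text you need bytes.decode() with the right encoding.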
In practice, when encrypting, you want to encode the text first and then encrypt the resulting bytes; after decrypting, decode the bytes back to text.
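The order of operations can be sketched as follows. The xor_bytes helper and the key are hypothetical stand-ins used only to show where encoding and decoding sit around the cipher step; a repeating-key XOR is not secure and is not a substitute for a real encryption library.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy, insecure stand-in for a cipher: XOR each byte with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = 'Hello 徑!'
key = b'demo-key'   # hypothetical key, for illustration only

# 1. Encode text to bytes, 2. "encrypt" the bytes.
ciphertext = xor_bytes(plaintext.encode('utf-8'), key)

# 3. "Decrypt" the bytes, 4. decode the bytes back to text.
recovered = xor_bytes(ciphertext, key).decode('utf-8')

print(recovered == plaintext)
```

The ciphertext is arbitrary binary data and generally cannot be decoded as UTF-8, so it should be kept as bytes (or hex/base64 encoded) rather than treated as text.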
If all this seems overwhelming or hard to follow, perhaps these articles on Unicode and character encodings can help out:
- What every developer needs to know about Unicode
- Ned Batchelder’s Pragmatic Unicode
- Python’s Unicode HOWTO
After the basic conversion, you also need to turn the int into hex. For example:
>>> ord('耄')
32772
>>> hex(ord('耄'))
'0x8004'
>>> str(hex(ord('耄')))
'0x8004'
>>> str(hex(ord('耄')))[2:]
'8004'
Try:
"0x%x" % 255 # => 0xff
or
"0x%X" % 255 # => 0xFF
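As an alternative to slicing off the '0x' prefix or using %-formatting, the built-in format() and f-strings can produce the hex digits directly. This is a minimal sketch; the 'U+' prefix and the variable names are illustrative choices, not something the snippets above require.

```python
code_point = ord('耄')                    # 32772

# format() with 'x' gives lowercase hex digits without the '0x' prefix.
hex_lower = format(code_point, 'x')       # '8004'

# An f-string with '04X' gives uppercase hex, zero-padded to at least 4 digits.
hex_label = f'U+{code_point:04X}'         # 'U+8004'

# int(..., 16) parses the hex digits back, and chr() recovers the character.
back = chr(int(hex_lower, 16))            # '耄'

print(hex_lower, hex_label, back)
```

Since hex() already returns a str, wrapping it in str() is not needed; hex(code_point)[2:] gives the same result as format(code_point, 'x') for non-negative integers.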
Python Documentation says: “keep this under your pillow: http://docs.python.org/library/index.html”
Related articles:
CJK Unified Ideographs
https://en.wikipedia.org/wiki/CJK_Unified_Ideographs