Convert bytes to a string

c10106 2022. 3. 9. 09:47

I'm using this code to get standard output from an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I'd like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert the bytes value back to a string? I mean, using the "batteries" instead of doing it manually. And I'd like it to be OK with Python 3.

Decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'

See https://docs.python.org/3/library/stdtypes.html#bytes.decode

You need to decode the byte string and turn it into a character (Unicode) string.

On Python 2

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

On Python 3

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

I think this way is easy:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
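One caveat worth illustrating: joining chr() over the byte values treats each byte as a code point, which matches Latin-1 decoding, so it only gives the expected text for ASCII or Latin-1 data. A minimal sketch, assuming Python 3 and sample values chosen here purely for illustration:

```python
# Byte values of "é" encoded as UTF-8 (illustrative sample data).
bytes_data = [195, 169]

# chr() maps each byte to the code point with the same number,
# which is exactly what decoding as Latin-1 does.
joined = "".join(map(chr, bytes_data))

# Building a bytes object and decoding it honours the real encoding.
decoded = bytes(bytes_data).decode("utf-8")

print(repr(joined))   # 'Ã©'
print(repr(decoded))  # 'é'
```

For multi-byte encodings such as UTF-8, bytes(...).decode(...) is therefore the safer choice.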

If you don't know the encoding, then to read binary input into a string in a way that is compatible with both Python 3 and Python 2, use the old MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

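Because the encoding is unknown, expect non-English symbols to be translated to characters of cp437 (English characters are not translated, because they match in most single-byte encodings and in UTF-8).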
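The reason cp437 never fails is that it assigns a character to every one of the 256 byte values. A small sketch of that property, assuming Python 3 (the variable names are illustrative only):

```python
# Every possible byte value, including ones that are invalid in UTF-8.
raw = bytes(range(256))

# cp437 has a character for each byte value,
# so decoding cannot raise UnicodeDecodeError.
text = raw.decode("cp437")

# Encoding with the same codec restores the original bytes.
restored = text.encode("cp437")

print(len(text))        # 256
print(restored == raw)  # True
```

The text is readable only where the data really was ASCII; the other characters are placeholders that happen to be reversible.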

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in Codepage Layout; that is where Python chokes with the infamous "ordinal not in range".

UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy, but it needs conversion tests, [binary] -> [str] -> [binary], to validate it and its reliability.
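A [binary] -> [str] -> [binary] check of this kind can be sketched as follows. This is only a minimal illustration for Python 3 using the sample bytes from the traceback above, not a full validation:

```python
raw = b"\x00\x01\xffsd"

# Undecodable bytes become lone surrogates (0xff -> U+DCFF) instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the bytes.
restored = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))      # '\x00\x01\udcffsd'
print(restored == raw)  # True
```

Note that a string containing lone surrogates cannot be encoded with the default strict handler, so it has to be encoded back with surrogateescape as well.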

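UPDATE 20170116: Thanks to the comment by Nearoo, there is also the possibility to slash-escape all unknown bytes with the backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions: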

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See Python's Unicode Support for details.
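To make the effect of backslashreplace concrete, here is a short sketch for Python 3.5 or later (where this handler is accepted for decoding); the sample bytes are illustrative:

```python
line = b"\x80abc"

# The invalid byte is rendered as the four characters \x80
# rather than raising UnicodeDecodeError.
text = line.decode("utf-8", errors="backslashreplace")

print(text)       # \x80abc
print(len(text))  # 7
```

The result is lossy in one respect: an escaped byte cannot be told apart from input that already contained a literal backslash sequence.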

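UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.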

# --- preparation

import codecs

def slashescape(err):
    """Codecs error handler. err is a UnicodeDecodeError instance. Return
    a tuple with a replacement for the undecodable part of the input
    and a position where decoding should continue."""
    # bytearray yields ints on both Python 2 and Python 3, so this also
    # handles error spans that are longer than one byte.
    thebytes = bytearray(err.object[err.start:err.end])
    repl = u''.join(u'\\x%02x' % b for b in thebytes)
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

On the other hand, in Python 2, the encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

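Aaron's answer was correct, except that you need to know which encoding to use. And I believe that Windows uses 'windows-1252'. It will only matter if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.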

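By the way, the fact that it does matter is the reason that Python moved to using two different types for binary and text data: it can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way you would know is to read the Windows documentation (or read it here).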

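Since this question is actually asking about subprocess output, you have more direct approaches available. The most modern would be using subprocess.check_output and passing text=True (Python 3.7+) to automatically decode stdout using the system default encoding: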

text = subprocess.check_output(["ls", "-l"], text=True)

For Python 3.6, Popen accepts an encoding keyword:

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
str
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

If you're not dealing with subprocess output, the general answer to the question in the title is to decode bytes to text:

>>> b'abcde'.decode()
'abcde'

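With no argument, sys.getdefaultencoding() will be used. If your data is not in the encoding reported by sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call: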

>>> b'caf\xe9'.decode('cp1250')
'café'

Set universal_newlines to True, i.e.:

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

To interpret a byte sequence as text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can't be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong, incompatible encoding:

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted, but your program remains unaware that a failure has occurred.
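When no bytes were lost along the way, this kind of mojibake can sometimes be undone by reversing the two steps. The sketch below is only an illustration under that assumption; it will not work if the wrong decoding raised errors or dropped data:

```python
original = "\u2014"  # EM DASH

# Mis-decode the UTF-8 bytes as cp1252 to produce the mojibake.
garbled = original.encode("utf-8").decode("cp1252")

# Undo it by applying the inverse steps in the opposite order.
repaired = garbled.encode("cp1252").decode("utf-8")

print(ascii(garbled))        # '\xe2\u20ac\u201d'
print(repaired == original)  # True
```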

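In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out-of-band. Some outcomes are more likely than others, and therefore the chardet module exists, which can guess the character encoding. A single Python script may use multiple character encodings in different places.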
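As a much cruder stand-in for statistical guessing (this is not how chardet works), one can try a short list of candidate encodings in order. The helper name decode_with_fallback and the candidate list are assumptions made for this sketch:

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never fails.
    return data.decode("latin-1"), "latin-1"


print(decode_with_fallback(b"\xc2\xb5"))  # ('µ', 'utf-8')
print(decode_with_fallback(b"caf\xe9"))   # ('café', 'cp1252')
```

A clean decode only shows that the bytes are valid in that encoding, not that the encoding is the right one, so out-of-band information remains preferable.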

ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (it uses sys.getfilesystemencoding() and the surrogateescape error handler on Unix):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes back, you could use os.fsencode().
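The round trip between the two functions can be sketched like this, assuming a Unix-like system (where the surrogateescape handler is used); the file name is made up for the example and no file is created:

```python
import os

# A file name containing a byte (0xff) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# On Unix this uses the filesystem encoding with surrogateescape,
# so the undecodable byte is preserved as a lone surrogate.
name = os.fsdecode(raw_name)

# os.fsencode() reverses the conversion exactly.
restored = os.fsencode(name)

print(type(name).__name__)   # str
print(restored == raw_name)  # True
```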

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode bytes, e.g., it can be cp1252 on Windows.

To decode the byte stream on the fly, io.TextIOWrapper() can be used.
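One way to decode a binary stream incrementally is to wrap it in io.TextIOWrapper. In this sketch an io.BytesIO object stands in for a real pipe such as a process's stdout, which is an assumption made to keep the example self-contained:

```python
import io

# io.BytesIO stands in for a binary pipe such as Popen(...).stdout.
byte_stream = io.BytesIO("\u00b5 first\nsecond\n".encode("utf-8"))

# The wrapper decodes the bytes as lines are requested.
text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8")

lines = [line.rstrip("\n") for line in text_stream]
print(lines)  # ['µ first', 'second']
```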

Different commands may use different character encodings for their output, e.g., the dir internal command (cmd) may use cp437. To decode its output, you could pass the encoding explicitly (Python 3.6+):

output = subprocess.check_output('dir', shell=True, encoding='cp437')

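The filenames may differ from os.listdir() (which uses the Windows Unicode API), e.g., '\xb6' can be substituted with '\x14': Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see "Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string".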

While @Aaron Maenpaa's answer just works, a user recently asked:

Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!

You can use:

command_stdout.decode()

decode() has standard default arguments:

codecs.decode(obj, encoding='utf-8', errors='strict')

If you get the following when trying decode():

AttributeError: 'str' object has no attribute 'decode'

You can also specify the encoding type directly in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'

If you have had this error:

utf-8 codec can't decode byte 0x8a

then it is better to use the following code to convert bytes to a string:

# Names chosen so the built-in bytes type is not shadowed.
data = b"abcdefg"
text = data.decode("utf-8", "ignore")

Enjoy!
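It may help to see what the "ignore" handler actually does to the data compared with "replace". A brief sketch with an illustrative byte string containing the 0x8a byte from the error message:

```python
data = b"abc\x8adef"  # 0x8a is not a valid UTF-8 start byte

# "ignore" silently drops the offending byte.
ignored = data.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD so the loss stays visible.
replaced = data.decode("utf-8", errors="replace")

print(repr(ignored))    # 'abcdef'
print(ascii(replaced))  # 'abc\ufffddef'
```

Both handlers lose information, so they are best kept for cases where the damaged bytes really do not matter.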

I made a function to clean a list:

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

When working with data from Windows systems (with \r\n line endings), my answer is:

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

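All your line endings will be doubled (to \r\r\n), leading to extra empty lines. Python's text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,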

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
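The normalization described above can be checked without touching the file system. This sketch, with made-up sample bytes, compares the manual replace with the translation that Python's text layer performs on reading (newline=None, the default):

```python
import io

raw = b"first\r\nsecond\r\n"  # bytes as produced on Windows

# Manual approach: decode, then normalize the line endings.
manual = raw.decode("utf-8").replace("\r\n", "\n")

# Text-mode reading with newline=None performs the same translation.
wrapped = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None).read()

print(repr(manual))       # 'first\nsecond\n'
print(manual == wrapped)  # True
```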

For Python 3, this is a much safer and more Pythonic approach to convert from byte to string:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

For your specific case of "run a shell command and get its output as text instead of bytes", on Python 3.7 you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

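text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True.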
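A portable way to try this out is to run a child Python interpreter instead of ls. The following is a small sketch for Python 3.7+, with the child command chosen only for illustration:

```python
import subprocess
import sys

# Run a child Python interpreter so the example does not depend on `ls`.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    text=True,
)

print(type(result.stdout).__name__)  # str
print(repr(result.stdout))           # 'hello\n'
```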

From the sys module documentation (System-specific parameters and functions):

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

Decode with .decode(). This will decode the string. Pass in 'utf-8' as the value inside.

def toString(string):
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # Not bytes (no .decode method) or not valid UTF-8: return unchanged.
        return string

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

If you want to convert any bytes, not just a string that was converted to bytes:

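import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode() returns ASCII-only bytes; decode them to get a str.
    text = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    # Alternatively, serialize the byte values as a JSON list of integers.
    text2 = json.dumps(list(infile.read()))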

This is not very efficient, however. It will turn a 2 MB picture into 9 MB.
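The size overhead of the two representations can be compared on a small in-memory sample. This is only a rough sketch with synthetic data, so the exact ratios will differ for real files:

```python
import base64
import json

data = bytes(range(256)) * 4  # 1024 bytes of sample binary data

# Base85 emits 5 ASCII characters for every 4 input bytes.
b85_text = base64.b85encode(data).decode("ascii")

# A JSON list spends several characters per byte (digits plus separators).
json_text = json.dumps(list(data))

print(len(data))      # 1024
print(len(b85_text))  # 1280
print(len(json_text) > len(b85_text))  # True

# Both representations can be turned back into the original bytes.
assert base64.b85decode(b85_text) == data
assert bytes(json.loads(json_text)) == data
```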

Try this:

bytes.fromhex('c3a9').decode('utf-8')

Try using this one. This function ignores all bytes that do not belong to the character set (utf-8, for example) and returns a clean string. It is tested on Python 3.6 and above.

def bin2str(text, encoding = 'utf-8'):
    """Converts a binary to Unicode string by removing all non Unicode char
    text: binary string to work on
    encoding: output encoding *utf-8"""

    return text.decode(encoding, 'ignore')

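Here, the function takes the binary and decodes it (converts binary data to characters using a Python predefined character set), and the ignore argument drops all data in your binary that does not belong to the character set, finally returning your desired string value.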

If you are not sure about the encoding, use sys.getdefaultencoding() to get the default encoding of your device.

We can decode the bytes object to produce a string using bytes.decode(encoding='utf-8', errors='strict'). See the bytes.decode documentation for details.

Python3 example:

byte_value = b"abcde"
print("Initial value = {}".format(byte_value))
print("Initial value type = {}".format(type(byte_value)))
string_value = byte_value.decode("utf-8")
# utf-8 is used here because it is a very common encoding, but you need to use the encoding your data is actually in.
print("------------")
print("Converted value = {}".format(string_value))
print("Converted value type = {}".format(type(string_value)))

Output:

Initial value = b'abcde'
Initial value type = <class 'bytes'>
------------
Converted value = abcde
Converted value type = <class 'str'>

Note: In Python3, the default encoding type is utf-8, so <byte_string>.decode("utf-8") can also be written as <byte_string>.decode().

Reference URL: https://stackoverflow.com/questions/606191/convert-bytes-to-a-string
