'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
'''

import codecs
import io
import pickle
import re
import sys

__all__ = ['dis', 'genops', 'optimize']

bytes_types = pickle.bytes_types

# Other ideas:

# - A pickle verifier: read a pickle and check it exhaustively for
#   well-formedness. dis() does a lot of this already.

# - A protocol identifier: examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#   dis() already prints this info at the end.

# - A pickle optimizer: for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive. Or lots of times a PUT is generated that's never accessed
#   by a later GET.


# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
# called an unpickling machine). It's a sequence of opcodes, interpreted by the
# PM, building an arbitrarily complex Python object.

# For the most part, the PM is very simple: there are no looping, testing, or
# conditional instructions, no arithmetic and no function calls. Opcodes are
# executed once each, from first to last, until a STOP opcode is reached.
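# Illustration (not part of the original module): a minimal sketch of the
# "sequence of opcodes" idea described above. It assumes it is run as a
# separate script against the standard-library pickle and pickletools
# modules, and uses genops() to list the opcodes the PM would execute.

```python
import pickle
import pickletools

# Pickle a small object with protocol 2 so the stream starts with PROTO.
data = pickle.dumps({"a": 1}, protocol=2)

# genops() yields (opcode, arg, position) triples; keep only the names.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names)
```

# The first name is PROTO and the last is STOP; everything in between
# builds the dict on the PM's stack.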

# The PM has two data areas, "the stack" and "the memo".

# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
# integer object on the stack, whose value is read from a decimal string
# literal immediately following the INT opcode in the pickle bytestream. Other
# opcodes take Python objects off the stack. The result of unpickling is
# whatever object is left on the stack when the final STOP opcode is executed.
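# Illustration (not part of the original module): a minimal sketch of the
# stack behavior described above, using two hand-written protocol 0 pickles.
# The byte strings are assumptions made up for this example; only
# pickle.loads() from the standard library is used.

```python
import pickle

# b"I" is the INT opcode, b"42\n" its decimal-string argument, b"." is STOP.
single_int = pickle.loads(b"I42\n.")

# b"(" pushes a MARK, two INTs are pushed, and b"t" (TUPLE) pops everything
# above the mark and pushes a tuple built from those objects.
int_pair = pickle.loads(b"(I1\nI2\nt.")

print(single_int, int_pair)
```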

# The memo is simply an array of objects, or it can be implemented as a dict
# mapping little integers to objects. The memo serves as the PM's "long term
# memory", and the little integers indexing the memo are akin to variable
# names. Some opcodes pop a stack object into the memo at a given index,
# and others push a memo object at a given index onto the stack again.
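# Illustration (not part of the original module): a minimal sketch showing
# the memo in use. It assumes a separate script using the standard-library
# pickle and pickletools modules; the variable names are made up.

```python
import pickle
import pickletools

shared = []
# The same list object appears twice, so the pickler stores it in the memo
# once (PUT) and refers back to it the second time (GET).
data = pickle.dumps([shared, shared], protocol=0)

memo_ops = [opcode.name
            for opcode, arg, pos in pickletools.genops(data)
            if opcode.name in ("PUT", "GET")]
print(memo_ops)
```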

# At heart, that's all the PM has. Subtleties arise for these reasons:

# + Object identity. Objects can be arbitrarily complex, and subobjects
#   may be shared (for example, the list [a, a] refers to the same object a
#   twice). It can be vital that unpickling recreate an isomorphic object
#   graph, faithfully reproducing sharing.

# + Recursive objects. For example, after "L = []; L.append(L)", L is a
#   list, and L[0] is the same list. This is related to the object identity
#   point, and some sequences of pickle opcodes are subtle in order to
#   get the right result in all cases.

# + Things pickle doesn't know everything about. Examples of things pickle
#   does know everything about are Python's builtin scalar and container
#   types, like ints and tuples. They generally have opcodes dedicated to
#   them. For things like module references and instances of user-defined
#   classes, pickle's knowledge is limited. Historically, many enhancements
#   have been made to the pickle protocol in order to do a better (faster,
#   and/or more compact) job on those.

# + Backward compatibility and micro-optimization. As explained below,
#   pickle opcodes never go away, not even when better ways to do a thing
#   get invented. The repertoire of the PM just keeps growing over time.
#   For example, protocol 0 had two opcodes for building Python integers (INT
#   and LONG), protocol 1 added three more for more-efficient pickling of short
#   integers, and protocol 2 added two more for more-efficient pickling of
#   long integers (before protocol 2, the only ways to pickle a Python long
#   took time quadratic in the number of digits, for both pickling and
#   unpickling). "Opcode bloat" isn't so much a subtlety as a source of
#   wearying complication.

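# Illustration (not part of the original module): a minimal sketch of the
# object-identity and recursion points above, assuming only the
# standard-library pickle module. The variable names are made up.

```python
import pickle

# Shared subobject: both slots of the list refer to the same object.
a = ["x"]
pair = pickle.loads(pickle.dumps([a, a]))
sharing_preserved = pair[0] is pair[1]

# Recursive object: the list contains itself.
L = []
L.append(L)
L2 = pickle.loads(pickle.dumps(L))
recursion_preserved = L2[0] is L2

print(sharing_preserved, recursion_preserved)
```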

# Pickle protocols:

# For compatibility, the meaning of a pickle opcode never changes. Instead new
# pickle opcodes get added, and each version's unpickler can handle all the
# pickle opcodes in all protocol versions to date. So old pickles continue to
# be readable forever. The pickler can generally be told to restrict itself to
# the subset of opcodes available under previous protocol versions too, so that
# users can create pickles under the current version readable by older
# versions. However, a pickle does not contain its version number embedded
# within it. If an older unpickler tries to read a pickle using a later
# protocol, the result is most likely an exception due to seeing an unknown (in
# the older unpickler) opcode.

# The original pickle used what's now called "protocol 0", and what was called
# "text mode" before Python 2.3. The entire pickle bytestream is made up of
# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
# That's why it was called text mode. Protocol 0 is small and elegant, but
# sometimes painfully inefficient.
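# Illustration (not part of the original module): a minimal sketch checking
# the "7-bit ASCII" property of protocol 0 on one sample object. The sample
# data is an assumption chosen for the example.

```python
import pickle

text_pickle = pickle.dumps([1, "a", 2.5], protocol=0)

# Every byte of a protocol 0 pickle of this object is below 128.
is_ascii = all(byte < 128 for byte in text_pickle)
print(text_pickle, is_ascii)
```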

# The second major set of additions is now called "protocol 1", and was called
# "binary mode" before Python 2.3. This added many opcodes with arguments
# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
# bytes. Binary mode pickles can be substantially smaller than equivalent
# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
# int as 4 bytes following the opcode, which is cheaper to unpickle than the
# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
# a number of opcodes that operate on many stack elements at once (like APPENDS
# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
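# Illustration (not part of the original module): a minimal sketch comparing
# pickle sizes under protocol 0 and protocol 1 for a list of small ints.
# The input is an assumption chosen for the example; exact sizes are not
# claimed, only that the binary form is smaller here.

```python
import pickle

numbers = list(range(1000))

text_size = len(pickle.dumps(numbers, protocol=0))
binary_size = len(pickle.dumps(numbers, protocol=1))

print(text_size, binary_size)
```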

# The third major set of additions came in Python 2.3, and is called "protocol
# 2". This added:

# - A better way to pickle instances of new-style classes (NEWOBJ).

# - A way for a pickle to identify its protocol (PROTO).

# - Time- and space-efficient pickling of long ints (LONG{1,4}).

# - Shortcuts for small tuples (TUPLE{1,2,3}).

# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).

# - The "extension registry", a vector of popular objects that can be pushed
#   efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
#   the registry contents are predefined (there's nothing akin to the memo's
#   PUT).
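# Illustration (not part of the original module): a minimal sketch that
# looks for some of the protocol 2 opcodes listed above in a real pickle.
# It assumes a separate script using the standard-library pickle and
# pickletools modules; the sample tuple is made up.

```python
import pickle
import pickletools

data = pickle.dumps((True, 1, 2), protocol=2)
names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]

# A protocol 2 pickle starts with the PROTO opcode (0x80) and the
# protocol number as a one-byte argument.
header = data[:2]
print(names, header)
```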

# Another independent change with Python 2.3 is the abandonment of any
# pretense that it might be safe to load pickles received from untrusted
# parties -- no sufficient security analysis has been done to guarantee
# this and there isn't a use case that warrants the expense of such an
# analysis.

# To this end, all tests for __safe_for_unpickling__ or for
# copyreg.safe_constructors are removed from the unpickling code.
# References to these variables in the descriptions below are to be seen
# as describing unpickling in Python 2.2 and before.


# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1  = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4  = -3   # num bytes is 4-byte signed little-endian int
TAKEN_FROM_ARGUMENT4U = -4   # num bytes is 4-byte unsigned little-endian int
TAKEN_FROM_ARGUMENT8U = -5   # num bytes is 8-byte unsigned little-endian int

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT{1,4,4U,8U} are negative values for
        # variable-length cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n in (UP_TO_NEWLINE,
                                             TAKEN_FROM_ARGUMENT1,
                                             TAKEN_FROM_ARGUMENT4,
                                             TAKEN_FROM_ARGUMENT4U,
                                             TAKEN_FROM_ARGUMENT8U))
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc
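# Illustration (not part of the original module): a minimal sketch of how an
# ArgumentDescriptor instance can be used. It assumes a separate script that
# imports the standard-library pickletools module, where `uint2` is one of
# the descriptor instances defined further down in this file.

```python
import io
import pickletools

desc = pickletools.uint2

# The descriptor records a name, a byte length and a reader function.
# The reader consumes the argument from a file-like object.
value = desc.reader(io.BytesIO(b"\xff\x00"))
print(desc.name, desc.n, value)
```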

from struct import unpack as _unpack

def read_uint1(f):
    r"""
    >>> import io
    >>> read_uint1(io.BytesIO(b'\xff'))
    255
    """

    data = f.read(1)
    if data:
        return data[0]
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    r"""
    >>> import io
    >>> read_uint2(io.BytesIO(b'\xff\x00'))
    255
    >>> read_uint2(io.BytesIO(b'\xff\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    r"""
    >>> import io
    >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")


def read_uint4(f):
    r"""
    >>> import io
    >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<I", data)[0]
    raise ValueError("not enough data in stream to read uint4")

uint4 = ArgumentDescriptor(
            name='uint4',
            n=4,
            reader=read_uint4,
            doc="Four-byte unsigned integer, little-endian.")

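# Illustration (not part of the original module): a minimal sketch of why
# int4 and uint4 are separate descriptors. The same four bytes decode to
# different values depending on the struct format ("<i" signed, "<I"
# unsigned, both little-endian). Only the standard-library struct module
# is used.

```python
import struct

raw = b"\x00\x00\x00\x80"

signed = struct.unpack("<i", raw)[0]
unsigned = struct.unpack("<I", raw)[0]

print(signed, unsigned)
```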

def read_uint8(f):
    r"""
    >>> import io
    >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
    255
    >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
    True
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack("<Q", data)[0]
    raise ValueError("not enough data in stream to read uint8")

uint8 = ArgumentDescriptor(
            name='uint8',
            n=8,
            reader=read_uint8,
            doc="Eight-byte unsigned integer, little-endian.")


def read_stringnl(f, decode=True, stripquotes=True):
    r"""
    >>> import io
    >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
    'abcd'

    >>> read_stringnl(io.BytesIO(b"\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around b''

    >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
    ''

    >>> read_stringnl(io.BytesIO(b"''\n"))
    ''

    >>> read_stringnl(io.BytesIO(b'"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
    'a\n\\b\x00c\td'
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in (b'"', b"'"):
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    if decode:
        data = codecs.escape_decode(data)[0].decode("ascii")
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)

def read_stringnl_noescape(f):
    return read_stringnl(f, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes. It should consist solely of
                        printable ASCII characters.
                        """)

def read_stringnl_noescape_pair(f):
    r"""
    >>> import io
    >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes. They should
                             consist solely of printable ASCII characters.
                             The pair is returned as a single string, with
                             a single blank separating the two strings.
                             """)


def read_string1(f):
    r"""
    >>> import io
    >>> read_string1(io.BytesIO(b"\x00"))
    ''
    >>> read_string1(io.BytesIO(b"\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_string1,
              doc="""A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_string4(f):
    r"""
    >>> import io
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    ''
    >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    'abc'
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT4,
              reader=read_string4,
              doc="""A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
              """)

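# Illustration (not part of the original module): a minimal sketch of the
# "counted" argument pattern shared by the string1/string4 readers above.
# `read_counted` is a hypothetical helper invented for this example; it
# takes the struct format of the length prefix as a parameter rather than
# having one reader function per prefix width.

```python
import io
import struct


def read_counted(stream, prefix_format):
    """Hypothetical helper: read a length prefix, then that many bytes."""
    prefix_size = struct.calcsize(prefix_format)
    prefix = stream.read(prefix_size)
    if len(prefix) != prefix_size:
        raise ValueError("not enough data for the length prefix")

    (count,) = struct.unpack(prefix_format, prefix)
    if count < 0:
        raise ValueError("negative byte count: %d" % count)

    payload = stream.read(count)
    if len(payload) != count:
        raise ValueError("expected %d bytes, but only %d remain" %
                         (count, len(payload)))
    return payload


# "<B" is a 1-byte unsigned prefix, "<i" a 4-byte signed little-endian one.
short_payload = read_counted(io.BytesIO(b"\x03abcdef"), "<B")
long_payload = read_counted(io.BytesIO(b"\x03\x00\x00\x00abcdef"), "<i")
print(short_payload, long_payload)
```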

def read_bytes1(f):
    r"""
    >>> import io
    >>> read_bytes1(io.BytesIO(b"\x00"))
    b''
    >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
    b'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
                     (n, len(data)))

bytes1 = ArgumentDescriptor(
             name="bytes1",
             n=TAKEN_FROM_ARGUMENT1,
             reader=read_bytes1,
             doc="""A counted bytes string.

             The first argument is a 1-byte unsigned int giving the number
             of bytes in the string, and the second argument is that many
             bytes.
             """)