/*    regcomp.h
 *
 *    Copyright (C) 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
 *    2000, 2001, 2002, 2003, 2005, 2006, 2007, by Larry Wall and others
 *
 *    You may distribute under the terms of either the GNU General Public
 *    License or the Artistic License, as specified in the README file.
 *
 */
#include "regcharclass.h"

/* Convert branch sequences to more efficient trie ops? */
#define PERL_ENABLE_TRIE_OPTIMISATION 1

/* Be really aggressive about optimising patterns with trie sequences? */
#define PERL_ENABLE_EXTENDED_TRIE_OPTIMISATION 1

/* Should the optimiser take positive assertions into account? */
#define PERL_ENABLE_POSITIVE_ASSERTION_STUDY 0

/* Not for production use: */
#define PERL_ENABLE_EXPERIMENTAL_REGEX_OPTIMISATIONS 0

/* Activate offsets code - enabled only in DEBUGGING builds */
#ifdef DEBUGGING
#define RE_TRACK_PATTERN_OFFSETS
#endif

/*
 * The "internal use only" fields in regexp.h are present to pass info from
 * compile to execute that permits the execute phase to run lots faster on
 * simple cases.  They are:
 *
 * regstart     sv that must begin a match; NULL if none obvious
 * reganch      is the match anchored (at beginning-of-line only)?
 * regmust      string (pointer into program) that match must include, or NULL
 *  [regmust changed to SV* for bminstr()--law]
 * regmlen      length of regmust string
 *  [regmlen not used currently]
 *
 * Regstart and reganch permit very fast decisions on suitable starting points
 * for a match, cutting down the work a lot.  Regmust permits fast rejection
 * of lines that cannot possibly match.  The regmust tests are costly enough
 * that pregcomp() supplies a regmust only if the r.e. contains something
 * potentially expensive (at present, the only such thing detected is * or +
 * at the start of the r.e., which can involve a lot of backup).  Regmlen is
 * supplied because the test in pregexec() needs it and pregcomp() is computing
 * it anyway.
 * [regmust is now supplied always.  The tests that use regmust have a
 * heuristic that disables the test if it usually matches.]
 *
 * [In fact, we now use regmust in many cases to locate where the search
 * starts in the string, so if regback is >= 0, the regmust search is never
 * wasted effort.  The regback variable says how many characters back from
 * where regmust matched is the earliest possible start of the match.
 * For instance, /[a-z].foo/ has a regmust of 'foo' and a regback of 2.]
 */

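/* Illustration only, not part of regcomp.h: a minimal sketch of how a
 * required substring ("regmust") and a back-off distance ("regback") can
 * bound where a match may start, using the /[a-z].foo/ case described above.
 * The helper name sketch_earliest_start() and its signature are hypothetical;
 * the real engine works on SVs with its own search routines. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the earliest offset in 'subject' at which a
 * match could begin, or -1 if the required substring cannot occur far enough
 * into the string for any match to exist. */
static long sketch_earliest_start(const char *subject, const char *must,
                                  long back)
{
    const char *hit;

    assert(subject != NULL && must != NULL && back >= 0);

    /* The required substring must sit at least 'back' characters in, so
     * shorter subjects are rejected immediately. */
    if (strlen(subject) < (size_t)back)
        return -1;

    /* Search only from offset 'back' onward; an earlier hit would leave no
     * room for the characters that precede the required substring. */
    hit = strstr(subject + back, must);
    if (hit == NULL)
        return -1;                      /* fast rejection */

    return (long)(hit - subject) - back;
}
```

/* For the subject "xx abfoo" with regmust "foo" and regback 2, the sketch
 * reports offset 3, which is where "abfoo" begins. */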
/*
 * Structure for regexp "program".  This is essentially a linear encoding
 * of a nondeterministic finite-state machine (aka syntax charts or
 * "railroad normal form" in parsing technology).  Each node is an opcode
 * plus a "next" pointer, possibly plus an operand.  "Next" pointers of
 * all nodes except BRANCH implement concatenation; a "next" pointer with
 * a BRANCH on both ends of it is connecting two alternatives.  (Here we
 * have one of the subtle syntax dependencies:  an individual BRANCH (as
 * opposed to a collection of them) is never concatenated with anything
 * because of operator precedence.)  The operand of some types of node is
 * a literal string; for others, it is a node leading into a sub-FSM.  In
 * particular, the operand of a BRANCH node is the first node of the branch.
 * (NB this is *not* a tree structure:  the tail of the branch connects
 * to the thing following the set of BRANCHes.)  The opcodes are defined
 * in regnodes.h which is generated from regcomp.sym by regcomp.pl.
 */

/*
 * A node is one char of opcode followed by two chars of "next" pointer.
 * "Next" pointers are stored as two 8-bit pieces, high order first.  The
 * value is a positive offset from the opcode of the node containing it.
 * An operand, if any, simply follows the node.  (Note that much of the
 * code generation knows about this implicit relationship.)
 *
 * Using two bytes for the "next" pointer is vast overkill for most things,
 * but allows patterns to get big without disasters.
 *
 * [The "next" pointer is always aligned on an even
 * boundary, and reads the offset directly as a short.]
 */

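/* Illustration only, not part of regcomp.h: a sketch of the byte-wise
 * encoding described above (one opcode byte, then a 16-bit "next" offset
 * stored high-order byte first).  As the bracketed note says, the current
 * code reads the offset directly as a short instead, so the function names
 * and the 3-byte buffer below are assumptions made purely for illustration. */

```c
#include <assert.h>
#include <stddef.h>

/* Store a 16-bit "next" offset into bytes 1 and 2 of a node, high byte
 * first.  Byte 0 is left alone; it holds the opcode. */
static void sketch_set_next(unsigned char *node, unsigned off)
{
    assert(node != NULL && off <= 0xFFFFu);
    node[1] = (unsigned char)(off >> 8);
    node[2] = (unsigned char)(off & 0xFFu);
}

/* Read the offset back by recombining the two 8-bit pieces. */
static unsigned sketch_get_next(const unsigned char *node)
{
    assert(node != NULL);
    return ((unsigned)node[1] << 8) | (unsigned)node[2];
}
```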
/* This is the stuff that used to live in regexp.h that was truly
   private to the engine itself. It now lives here. */

typedef struct regexp_internal {
    int name_list_idx;      /* Optional data index of an array of paren names */
    union {
        U32 *offsets;       /* offset annotations 20001228 MJD
                               data about mapping the program to the
                               string -
                               offsets[0] is proglen when this is used
                               */
        U32 proglen;
    } u;

    regnode *regstclass;    /* Optional startclass as identified or constructed
                               by the optimiser */
    struct reg_data *data;  /* Additional miscellaneous data used by the program.
                               Used to make it easier to clone and free arbitrary
                               data that the regops need. Often the ARG field of
                               a regop is an index into this structure */
    struct reg_code_blocks *code_blocks;/* positions of literal (?{}) */
    regnode program[1];     /* Unwarranted chumminess with compiler. */
} regexp_internal;

#define RXi_SET(x,y) (x)->pprivate = (void*)(y)
#define RXi_GET(x)   ((regexp_internal *)((x)->pprivate))
#define RXi_GET_DECL(r,ri) regexp_internal *ri = RXi_GET(r)
/*
 * Flags stored in regexp->intflags
 * These are used only internally to the regexp engine
 *
 * See regexp.h for flags used externally to the regexp engine
 */
#define RXp_INTFLAGS(rx)        ((rx)->intflags)
#define RX_INTFLAGS(prog)       RXp_INTFLAGS(ReANY(prog))

#define PREGf_SKIP              0x00000001
#define PREGf_IMPLICIT          0x00000002 /* Converted .* to ^.* */
#define PREGf_NAUGHTY           0x00000004 /* how exponential is this pattern? */
#define PREGf_VERBARG_SEEN      0x00000008
#define PREGf_CUTGROUP_SEEN     0x00000010
#define PREGf_USE_RE_EVAL       0x00000020 /* compiled with "use re 'eval'" */
/* these used to be extflags, but are now intflags */
#define PREGf_NOSCAN            0x00000040
                                /* spare */
#define PREGf_GPOS_SEEN         0x00000100
#define PREGf_GPOS_FLOAT        0x00000200

#define PREGf_ANCH_MBOL         0x00000400
#define PREGf_ANCH_SBOL         0x00000800
#define PREGf_ANCH_GPOS         0x00001000
#define PREGf_RECURSE_SEEN      0x00002000

#define PREGf_ANCH              \
    ( PREGf_ANCH_SBOL | PREGf_ANCH_GPOS | PREGf_ANCH_MBOL )

/* this is where the old regcomp.h started */

struct regnode_string {
    U8  str_len;
    U8  type;
    U16 next_off;
    char string[1];
};

/* Argument bearing node - workhorse,
   arg1 is often for the data field */
struct regnode_1 {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;
};

/* Similar to a regnode_1 but with an extra signed argument */
struct regnode_2L {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;
    I32 arg2;
};

/* 'Two field' -- Two 16 bit unsigned args */
struct regnode_2 {
    U8  flags;
    U8  type;
    U16 next_off;
    U16 arg1;
    U16 arg2;
};

/* This gives the number of code points that can be in the bitmap of an ANYOF
 * node.  The shift number must currently be one of: 8..12.  It can't be less
 * than 8 (256) because some code relies on it being at least that.  Above 12
 * (4096), and you start running into warnings that some data structure widths
 * have been exceeded, though the test suite as of this writing still passes
 * for up through 16, which is as high as anyone would ever want to go,
 * encompassing all of the Unicode BMP, and thus including all the economically
 * important world scripts.  At 12 most of them are: including Arabic,
 * Cyrillic, Greek, Hebrew, Indian subcontinent, Latin, and Thai; but not Han,
 * Japanese, nor Korean.  (The regarglen structure in regnodes.h is a U8, and
 * the trie types TRIEC and AHOCORASICKC are larger than U8 for shift values
 * above 12.)  Be sure to benchmark before changing, as larger sizes do
 * significantly slow down the test suite */
#define NUM_ANYOF_CODE_POINTS   (1 << 8)

#define ANYOF_BITMAP_SIZE       (NUM_ANYOF_CODE_POINTS / 8)   /* 8 bits/Byte */

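/* Illustration only, not part of regcomp.h: a sketch of how a bitmap of
 * NUM_ANYOF_CODE_POINTS bits can be indexed by code point.  The SKETCH_*
 * names and both helpers are hypothetical, the byte/bit ordering shown
 * (byte = cp >> 3, bit = cp & 7) is an assumption, and the sketch uses
 * unsigned char where the structs in this file declare the bitmap as char. */

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NUM_CODE_POINTS (1 << 8)
#define SKETCH_BITMAP_SIZE     (SKETCH_NUM_CODE_POINTS / 8)  /* 32 bytes */

/* Mark code point 'cp' as matching. */
static void sketch_bitmap_set(unsigned char *bitmap, unsigned cp)
{
    assert(bitmap != NULL && cp < SKETCH_NUM_CODE_POINTS);
    bitmap[cp >> 3] |= (unsigned char)(1u << (cp & 7));
}

/* Return 1 if 'cp' is marked, 0 otherwise.  Code points too large for the
 * bitmap are reported as not matching here; a real ANYOF node needs a
 * separate mechanism for those. */
static int sketch_bitmap_test(const unsigned char *bitmap, unsigned cp)
{
    assert(bitmap != NULL);
    if (cp >= SKETCH_NUM_CODE_POINTS)
        return 0;
    return (bitmap[cp >> 3] >> (cp & 7)) & 1;
}
```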
/* Note that these form structs which are supersets of the next smaller one, by
 * appending fields.  Alignment problems can occur if one of those optional
 * fields requires stricter alignment than the base struct.  And formal
 * parameters that can really be two or more of the structs should be
 * declared as the smallest one it could be.  See commit message for
 * 7dcac5f6a5195002b55c935ee1d67f67e1df280b.  Regnode allocation is done
 * without regard to alignment, and changing it to do so would also require
 * changing the code that inserts and deletes regnodes.  The basic
 * single-argument regnode has a U32, which is what reganode() allocates as a
 * unit.  Therefore no field can require stricter alignment than U32. */

/* also used by trie */
struct regnode_charclass {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;                           /* set by set_ANYOF_arg() */
    char bitmap[ANYOF_BITMAP_SIZE];     /* only compile-time */
};

/* has runtime (locale) \d, \w, ..., [:posix:] classes */
struct regnode_charclass_class {
    U8  flags;                          /* ANYOF_MATCHES_POSIXL bit must go here */
    U8  type;
    U16 next_off;
    U32 arg1;
    char bitmap[ANYOF_BITMAP_SIZE];     /* both compile-time ... */
    U32 classflags;                     /* ... and run-time */
};

/* A synthetic start class (SSC); is a regnode_charclass_posixl_fold, plus an
 * extra SV*, used only during its construction and which is not used by
 * regexec.c.  Note that the 'next_off' field is unused, as the SSC stands
 * alone, so there is never a next node.  Also, there is no alignment issue,
 * because these are declared or allocated as a complete unit so the compiler
 * takes care of alignment.  This is unlike the other regnodes which are
 * allocated in terms of multiples of a single-argument regnode.  SSC nodes can
 * have a pointer field because there is no alignment issue, and because it is
 * set to NULL after construction, before any cloning of the pattern */
struct regnode_ssc {
    U8  flags;                          /* ANYOF_MATCHES_POSIXL bit must go here */
    U8  type;
    U16 next_off;
    U32 arg1;
    char bitmap[ANYOF_BITMAP_SIZE];     /* both compile-time ... */
    U32 classflags;                     /* ... and run-time */

    /* Auxiliary, only used during construction; NULL afterwards: list of code
     * points matched */
    SV* invlist;
};

/* We take advantage of 'next_off' not otherwise being used in the SSC by
 * actually using it: by setting it to 1.  This allows us to test and
 * distinguish between an SSC and other ANYOF node types, as 'next_off' cannot
 * otherwise be 1, because it is the offset to the next regnode expressed in
 * units of regnodes.  Since an ANYOF node contains extra fields, it adds up
 * to 12 regnode units on 32-bit systems (hence the minimum this can be, if
 * not 0, is 11 there).  Even if things get tightly packed on a 64-bit system,
 * it still would be more than 1. */
#define set_ANYOF_SYNTHETIC(n) STMT_START{ OP(n) = ANYOF;              \
                                           NEXT_OFF(n) = 1;            \
                               } STMT_END
#define is_ANYOF_SYNTHETIC(n) (PL_regkind[OP(n)] == ANYOF && NEXT_OFF(n) == 1)

/* XXX fix this description.
   Impose a limit of REG_INFTY on various pattern matching operations
   to limit stack growth and to avoid "infinite" recursions.
*/
/* The default size for REG_INFTY is I16_MAX, which is the same as
   SHORT_MAX (see perl.h).  Unfortunately I16 isn't necessarily 16 bits
   (see handy.h).  On the Cray C90, sizeof(short)==4 and hence I16_MAX is
   ((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
   ((1<<63)-1).  To limit stack growth to reasonable sizes, supply a
   smaller default.
        --Andy Dougherty  11 June 1998
*/
#if SHORTSIZE > 2
#  ifndef REG_INFTY
#    define REG_INFTY ((1<<15)-1)
#  endif
#endif

#ifndef REG_INFTY
#  define REG_INFTY I16_MAX
#endif

#define ARG_VALUE(arg) (arg)
#define ARG__SET(arg,val) ((arg) = (val))

#undef ARG
#undef ARG1
#undef ARG2

#define ARG(p) ARG_VALUE(ARG_LOC(p))
#define ARG1(p) ARG_VALUE(ARG1_LOC(p))
#define ARG2(p) ARG_VALUE(ARG2_LOC(p))
#define ARG2L(p) ARG_VALUE(ARG2L_LOC(p))

#define ARG_SET(p, val) ARG__SET(ARG_LOC(p), (val))
#define ARG1_SET(p, val) ARG__SET(ARG1_LOC(p), (val))
#define ARG2_SET(p, val) ARG__SET(ARG2_LOC(p), (val))
#define ARG2L_SET(p, val) ARG__SET(ARG2L_LOC(p), (val))

#undef NEXT_OFF
#undef NODE_ALIGN

#define NEXT_OFF(p) ((p)->next_off)
#define NODE_ALIGN(node)
/* the following define was set to 0xde in 075abff3
 * as part of some linting logic. I have set it to 0
 * as otherwise in every place where we /might/ set flags
 * we have to set it 0 explicitly, which duplicates
 * assignments and IMO adds an unacceptable level of
 * surprise to working in the regex engine. If this
 * is changed from 0 then at the very least make sure
 * that SBOL for /^/ sets the flags to 0 explicitly.
 * -- Yves */
#define NODE_ALIGN_FILL(node) ((node)->flags = 0)

#define SIZE_ALIGN NODE_ALIGN

#undef OP
#undef OPERAND
#undef MASK
#undef STRING

#define OP(p)           ((p)->type)
#define FLAGS(p)        ((p)->flags)    /* Caution: Doesn't apply to all      \
                                           regnode types.  For some, it's the \
                                           character set of the regnode */
#define OPERAND(p)      (((struct regnode_string *)p)->string)
#define MASK(p)         ((char*)OPERAND(p))
#define STR_LEN(p)      (((struct regnode_string *)p)->str_len)
#define STRING(p)       (((struct regnode_string *)p)->string)
#define STR_SZ(l)       ((l + sizeof(regnode) - 1) / sizeof(regnode))
#define NODE_SZ_STR(p)  (STR_SZ(STR_LEN(p))+1)

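/* Illustration only, not part of regcomp.h: a worked sketch of the rounding
 * done by STR_SZ and NODE_SZ_STR above.  It assumes sizeof(regnode) is 4
 * (regnode itself is declared elsewhere, so that size is an assumption), and
 * the sketch_* function names are hypothetical. */

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REGNODE_SIZE ((size_t)4)   /* assumed sizeof(regnode) */

/* Number of regnode-sized units needed to hold 'len' string bytes,
 * rounding up to a whole unit. */
static size_t sketch_str_sz(size_t len)
{
    assert(len <= (size_t)-1 - (SKETCH_REGNODE_SIZE - 1));
    return (len + SKETCH_REGNODE_SIZE - 1) / SKETCH_REGNODE_SIZE;
}

/* Total size of a string node: the string units plus one unit for the
 * node header itself. */
static size_t sketch_node_sz_str(size_t len)
{
    return sketch_str_sz(len) + 1;
}
```

/* Under that assumption a 5-byte string needs 2 units for its bytes, so the
 * whole node occupies 3 units. */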
#undef NODE_ALIGN
#undef ARG_LOC
#undef NEXTOPER
#undef PREVOPER

#define NODE_ALIGN(node)
#define ARG_LOC(p)      (((struct regnode_1 *)p)->arg1)
#define ARG1_LOC(p)     (((struct regnode_2 *)p)->arg1)
#define ARG2_LOC(p)     (((struct regnode_2 *)p)->arg2)
#define ARG2L_LOC(p)    (((struct regnode_2L *)p)->arg2)

#define NODE_STEP_REGNODE       1       /* sizeof(regnode)/sizeof(regnode) */
#define EXTRA_STEP_2ARGS        EXTRA_SIZE(struct regnode_2)

#define NODE_STEP_B     4

#define NEXTOPER(p)     ((p) + NODE_STEP_REGNODE)
#define PREVOPER(p)     ((p) - NODE_STEP_REGNODE)

#define FILL_ADVANCE_NODE(ptr, op) STMT_START { \
    (ptr)->type = op; (ptr)->next_off = 0; (ptr)++; } STMT_END
#define FILL_ADVANCE_NODE_ARG(ptr, op, arg) STMT_START { \
    ARG_SET(ptr, arg); FILL_ADVANCE_NODE(ptr, op); (ptr) += 1; } STMT_END
#define FILL_ADVANCE_NODE_2L_ARG(ptr, op, arg1, arg2)               \
                STMT_START {                                        \
                    ARG_SET(ptr, arg1);                             \
                    ARG2L_SET(ptr, arg2);                           \
                    FILL_ADVANCE_NODE(ptr, op);                     \
                    (ptr) += 2;                                     \
                } STMT_END

#define REG_MAGIC 0234

#define SIZE_ONLY cBOOL(RExC_emit == (regnode *) & RExC_emit_dummy)
#define PASS1 SIZE_ONLY
#define PASS2 (! SIZE_ONLY)

/* An ANYOF node is basically a bitmap with the index being a code point.  If
 * the bit for that code point is 1, the code point matches;  if 0, it doesn't
 * match (complemented if inverted).  There is an additional mechanism to deal
 * with cases where the bitmap is insufficient in and of itself.  This #define
 * indicates if the bitmap does fully represent what this ANYOF node can match.
 * The ARG is set to this special value (since 0, 1, ... are legal, but will
 * never reach this high). */
#define ANYOF_ONLY_HAS_BITMAP   ((U32) -1)

/* When the bitmap isn't completely sufficient for handling the ANYOF node,
 * flags (in node->flags of the ANYOF node) get set to indicate this.  These
 * are perennially in short supply.  Beyond several cases where warnings need
 * to be raised under certain circumstances, currently, there are six cases
 * where the bitmap alone isn't sufficient.  We could use six flags to
 * represent the 6 cases, but to save flags bits, we play some games.  The
 * cases are:
 *
 *  1)  The bitmap has a compiled-in very finite size.  So something else needs
 *      to be used to specify if a code point that is too large for the bitmap
 *      actually matches.  The mechanism currently is a swash or inversion
 *      list.  ANYOF_ONLY_HAS_BITMAP, described above, being TRUE indicates
 *      there are no matches of too-large code points.  But if it is FALSE,
 *      then almost certainly there are matches too large for the bitmap.  (The
 *      other cases, described below, either imply this one or are extremely
 *      rare in practice.)  So we can just assume that a too-large code point
 *      will need something beyond the bitmap if ANYOF_ONLY_HAS_BITMAP is
 *      FALSE, instead of having a separate flag for this.
 *  2)  A subset of item 1) is if all possible code points outside the bitmap
 *      match.  This is a common occurrence when the class is complemented,
 *      like /[^ij]/.  Therefore a bit is reserved to indicate this,
 *      rather than having an expensive swash created,
 *      ANYOF_MATCHES_ALL_ABOVE_BITMAP.
 *  3)  Under /d rules, it can happen that code points that are in the upper
 *      latin1 range (\x80-\xFF or their equivalents on EBCDIC platforms) match
 *      only if the runtime target string being matched against is UTF-8.  For
 *      example /[\w[:punct:]]/d.  This happens only for posix classes (with a
 *      couple of exceptions, like \d where it doesn't happen), and all such
 *      ones also have above-bitmap matches.  Thus, 3) implies 1) as well.
 *      Note that /d rules are no longer encouraged; 'use 5.14' or higher
 *      deselects them.  But a flag is required so that they can be properly
 *      handled.  But it can be a shared flag: see 5) below.
 *  4)  Also under /d rules, something like /[\Wfoo]/ will match everything in
 *      the \x80-\xFF range, unless the string being matched against is UTF-8.
 *      A swash could be created for this case, but this is relatively common,
 *      and it turns out that it's all or nothing:  if any one of these code
 *      points matches, they all do.  Hence a single bit suffices.  We use a
 *      shared flag that doesn't take up space by itself:
 *      ANYOF_SHARED_d_MATCHES_ALL_NON_UTF8_NON_ASCII_non_d_WARN_SUPER.
 *      This also implies 1), with one exception: [:^cntrl:].
 *  5)  A user-defined \p{} property may not have been defined by the time the
 *      regex is compiled.  In this case, we don't know until runtime what it
 *      will match, so we have to assume it could match anything, including
 *      code points that ordinarily would be in the bitmap.  A flag bit is
 *      necessary to indicate this, though it can be shared with the item 3)
 *      flag, as that only occurs under /d, and this only occurs under non-d.
 *      This case is quite uncommon in the field, and the /(?[ ...])/ construct
 *      is a better way to accomplish what this feature does.  This case also
 *      implies 1).
 *      ANYOF_SHARED_d_UPPER_LATIN1_UTF8_STRING_MATCHES_non_d_RUNTIME_USER_PROP
 *      is the shared flag.
 *  6)  /[foo]/il may have folds that are only valid if the runtime locale is a
 *      UTF-8 one.  These are quite rare, so it would be good to avoid the
 *      expense of looking for them.  But /l matching is slow anyway, and we've
 *      traditionally not worried too much about its performance.  And this
 *      condition requires the ANYOFL_FOLD flag to be set, so testing for
 *      that flag would be sufficient to rule out most cases of this.  So it is
 *      unclear if this should have a flag or not.  But, this flag can be
 *      shared with another, so it doesn't occupy extra space.
 *
 * At the moment, there is one spare bit, but this could be increased by
 * various tricks.
 *
 * If just one more bit is needed, at this writing it seems to khw that the
 * best choice would be to make ANYOF_MATCHES_ALL_ABOVE_BITMAP not a flag, but
 * something like
 *
 *      #define ANYOF_MATCHES_ALL_ABOVE_BITMAP      ((U32) -2)
 *
 * and access it through the ARG like ANYOF_ONLY_HAS_BITMAP is.  This flag is
 * used by all ANYOF node types, and it could be used to avoid calling the
 * handler function, as the macro REGINCLASS in regexec.c does now for other
 * cases.
 *
 * Another possibility is to instead (or additionally) rename the ANYOF_POSIXL
 * flag to be ANYOFL_LARGE, to mean that the ANYOF node has an extra 32 bits
 * beyond what a regular one does.  That's what it effectively means now, with
 * the extra space all for the POSIX class flags.  But those classes actually
 * only occupy 30 bits, so the ANYOFL_FOLD and
 * ANYOFL_SHARED_UTF8_LOCALE_fold_HAS_MATCHES_nonfold_REQD flags could be moved
 * to that extra space.  The 30 bits in the extra word would indicate if a
 * posix class should be looked up or not.  The downside of this is that ANYOFL
 * nodes with folding would always have to have the extra space allocated, even
 * if they didn't use the 30 posix bits.  There isn't an SSC problem as all
 * SSCs are this large anyway.
 *
 * One could completely remove ANYOFL_LARGE and make all ANYOFL nodes large.
 * REGINCLASS would have to be modified so that if the node type were this, it
 * would call reginclass(), as the flag bit that indicates to do this now would
 * be gone.
 *
 * All told, 5 bits could be available for other uses if all of the above were
 * done.
 *
 * Some flags are not used in synthetic start class (SSC) nodes, so could be
 * shared should new flags be needed for SSCs, like SSC_MATCHES_EMPTY_STRING
 * now. */

/* If this is set, the result of the match should be complemented.  regexec.c
 * is expecting this to be in the low bit.  Never in an SSC */
#define ANYOF_INVERT            0x01

/* For the SSC node only, which cannot be inverted, so is shared with that bit.
 * This is used only during regex compilation. */
#define SSC_MATCHES_EMPTY_STRING    ANYOF_INVERT

/* Set if this is a regnode_charclass_posixl vs a regnode_charclass.  This
 * is used for runtime \d, \w, [:posix:], ..., which are used only in locale
 * and the optimizer's synthetic start class.  Non-locale \d, etc are resolved
 * at compile-time.  Only set under /l; can be in SSC */
#define ANYOF_MATCHES_POSIXL    0x02

/* The fold is calculated and stored in the bitmap where possible at compile
 * time.  However under locale, the actual folding varies depending on
 * what the locale is at the time of execution, so it has to be deferred until
 * then.  Only set under /l; never in an SSC */
#define ANYOFL_FOLD             0x04

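/* Illustration only, not part of regcomp.h: a sketch of why it is convenient
 * for the invert flag to live in the low bit.  With a 0/1 match result, the
 * complement can be applied with a single XOR against the masked flags.  The
 * SKETCH_ANYOF_INVERT name and sketch_apply_invert() are hypothetical, and
 * this is not claimed to be how regexec.c is written. */

```c
#include <assert.h>

#define SKETCH_ANYOF_INVERT 0x01u   /* mirrors the value of ANYOF_INVERT */

/* 'matched' is the raw bitmap result (0 or 1); the return value is the
 * result after honouring the invert bit in 'flags'.  Other flag bits are
 * masked off and have no effect. */
static int sketch_apply_invert(unsigned flags, int matched)
{
    assert(matched == 0 || matched == 1);
    return matched ^ (int)(flags & SKETCH_ANYOF_INVERT);
}
```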