Regular Expressions... simplified???



RE - regular expression.  - extracted from:  man -s 7 regex | col -b > (this file) 

 A piece is an atom possibly followed by a single '*', '+', '?', or bound.  

 An atom followed by '*' matches a sequence of 0 or more matches of the atom.   (translated: the atom doesn't have to match at all, i.e. 0 times)
 An atom followed by '+' matches a sequence of 1 or more matches of the atom.   (translated: the atom MUST match at least once)
 An atom followed by '?' matches a sequence of 0 or 1 matches of the atom.
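
 A quick way to see the three repetition operators side by side. This is only a sketch using Python's re module, which is not a POSIX implementation but treats '*', '+' and '?' the same way for simple cases like these. The helper name matches() is made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# The atom being repeated is the single character 'b'.
star = [matches("ab*c", s) for s in ("ac", "abc", "abbbc")]   # 0 or more
plus = [matches("ab+c", s) for s in ("ac", "abc", "abbbc")]   # 1 or more
opt = [matches("ab?c", s) for s in ("ac", "abc", "abbc")]     # 0 or 1

print(star)  # [True, True, True]
print(plus)  # [False, True, True]
print(opt)   # [True, True, False]
```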

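 A bound is '{', a number, optionally ',', optionally another number, always followed by '}'.  The numbers must lie between 0 and 255 (RE_DUP_MAX) inclusive, and the first may not exceed the second.   e.g.   {2,85}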

 An atom followed by a bound with one number i and no comma matches a sequence of exactly i matches of the atom. 

 An atom followed by a bound with one number i and a comma matches a sequence of i or more matches of the atom. 
 An atom followed by a bound with two numbers i and j matches a sequence of i through j (inclusive) matches of the atom.
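
 The three bound forms can be checked by asking which run lengths of 'a' each one accepts. A minimal sketch in Python's re module (an assumption: it is not a POSIX engine, though it uses the same {i}, {i,} and {i,j} notation). The helper accepted_lengths() is hypothetical.

```python
import re

def accepted_lengths(pattern: str, max_len: int = 6) -> list:
    """Lengths n (0..max_len) for which 'a' * n matches pattern entirely."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "a" * n)]

exactly = accepted_lengths("a{2}")     # one number, no comma: exactly 2
at_least = accepted_lengths("a{2,}")   # one number and a comma: 2 or more
between = accepted_lengths("a{2,4}")   # two numbers: 2 through 4 inclusive

print(exactly)   # [2]
print(at_least)  # [2, 3, 4, 5, 6]
print(between)   # [2, 3, 4]
```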

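 An atom is one of:
 an RE enclosed in "()" (matching a match for the RE),
 an empty set of "()" (matching the null string), 
 a bracket expression (see below),
 '.' (matching any single character), 
 '^' (matching the null string at the beginning of a line), 
 '$' (matching the null string at the end of a line), 
 a '\' followed by one of the characters "^.[$()|*+?{\" (matching that character as an ordinary character, i.e. the '\' escapes its special meaning),
 a '\' followed by any other character (matching that character, as if the '\' were not there),
 or a single character with no other significance (matching that character).
 A '{' followed by a character other than a digit is an ordinary character, not the beginning of a bound.  (if a digit follows, it's a bound)
 It is illegal to end an RE with '\'.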
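
 Some of the atom rules above, tried out in Python's re module. This is a sketch under the assumption that Python behaves like the man page for these particular cases; it is a different engine, and its error for a trailing '\' is its own (re.error), not a POSIX error code.

```python
import re

def full(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

dot_any = full(r"a.c", "abc")        # '.' matches any single character
dot_escaped = full(r"a\.c", "abc")   # '\.' matches only a literal dot
dot_literal = full(r"a\.c", "a.c")

anchored = re.search(r"^abc$", "abc") is not None    # '^' and '$' match null strings
shifted = re.search(r"^abc$", "xabc") is not None    # 'abc' is not at the start here

grouped = full(r"(ab)*", "ababab")   # "()" turns an RE into one atom
empty_group = full(r"a()b", "ab")    # empty "()" matches the null string
brace = full(r"a{x", "a{x")          # '{' not followed by a digit is ordinary

# A pattern that ends in a lone '\' is rejected at compile time.
try:
    re.compile("abc\\")
    trailing_backslash_ok = True
except re.error:
    trailing_backslash_ok = False

print(dot_any, dot_escaped, dot_literal)   # True False True
print(anchored, shifted)                   # True False
print(grouped, empty_group, brace)         # True True True
print(trailing_backslash_ok)               # False
```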

 A bracket expression is a list of characters enclosed in "[]". 
  It normally matches any single character from the list.
  If the list begins with '^', it matches any single character not from the rest of the list. 
  If two characters in the list are separated by '-', this is shorthand for the full range of characters between those two (inclusive), e.g. "[0-9]" matches any decimal digit.
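
 The three basic bracket-expression forms (list, negated list, range) in a short Python sketch. Python's re module is assumed here only because it handles these simple forms the same way; it does not support the collating elements or equivalence classes described further down.

```python
import re

in_list = re.fullmatch("[aeiou]", "e") is not None       # 'e' is in the list
negated = re.fullmatch("[^aeiou]", "e") is not None      # 'e' is excluded by '^'
negated_other = re.fullmatch("[^aeiou]", "z") is not None

# A range matches one character at a time, so findall returns single digits.
digits = re.findall("[0-9]", "a1b22c")

print(in_list, negated, negated_other)  # True False True
print(digits)                           # ['1', '2', '2']
```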

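 Within a bracket expression, a collating element (a character, or a multicharacter sequence that collates as if it were a single character) enclosed in "[." and ".]" stands for the sequence of characters of that collating element.
 A bracket expression with a multicharacter collating element can thus match more than one character, 
 e.g., if the collating sequence includes a "ch" collating element, then the RE "[[.ch.]]*c" matches the first five characters of "chchcc".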

 Within a bracket expression, a collating element enclosed in "[=" and "=]" is an equivalence class, 
 standing for the sequences of characters of all collating elements equivalent to that one, including itself.  

(If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were "[." and ".]".) 
 e.g., if o and ô are the members of an equivalence class, then "[[=o=]]", "[[=ô=]]", and "[oô]" are all synonymous. 
An equivalence class may not be an endpoint of a range.

 Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that
 class. Standard character class names are:   alnum  digit  punct alpha  graph  space blank  lower  upper cntrl  print  xdigit
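
 Python's re module does not understand "[:name:]", so the sketch below expands each class name into an explicit list before compiling. Everything in it is an assumption for illustration: the table gives the ASCII ("C" locale) membership only, and expand_posix_classes() is a made-up helper, not part of any library. In other locales the real classes can contain more characters.

```python
import re

# ASCII-only contents of each class, written as bracket-expression fragments.
POSIX_CLASSES = {
    "alnum": r"0-9A-Za-z",
    "alpha": r"A-Za-z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": r"0-9",
    "graph": r"\x21-\x7e",
    "lower": r"a-z",
    "print": r"\x20-\x7e",
    "punct": r"!-/:-@\[-`{-~",
    "space": r" \t\n\r\f\v",
    "upper": r"A-Z",
    "xdigit": r"0-9A-Fa-f",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace every [:name:] with its ASCII fragment (KeyError if unknown)."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

# "[[:alpha:]_][[:alnum:]_]*" describes a C-style identifier.
identifier = expand_posix_classes("[[:alpha:]_][[:alnum:]_]*")
print(identifier)                                   # [A-Za-z_][0-9A-Za-z_]*
print(re.fullmatch(identifier, "x1_y") is not None)  # True
print(re.fullmatch(identifier, "1abc") is not None)  # False
```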

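 If an RE could match more than one substring of a given string, it matches the one starting earliest in the string; 
 if it could match more than one substring starting at that point, it matches the longest.
 Subexpressions also match the longest possible substrings, subject to the whole match being as long as possible, with subexpressions starting earlier in the RE taking priority.
 Match lengths are measured in characters, not collating elements. 
 A null string is considered longer than no match at all.  
 e.g., "bb*" matches the three middle characters of "abbbc", "(wee|week)(knights|nights)" matches all ten characters of "weeknights", 
 when "(.*).*" is matched against "abc" the parenthesized subexpression matches all three characters, 
 and when "(a*)*" is matched against "bc" both the whole RE and the parenthesized subexpression match the null string.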
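
 The first three examples can be reproduced in Python, with a caveat. POSIX picks the earliest match and, among those, the longest; Python's re module instead tries alternatives left to right and keeps the first one that works. The two agree on the man page examples, but the last line of this sketch shows a case where Python stops short of the longest match, so don't take it as a POSIX reference.

```python
import re

middle = re.search("bb*", "abbbc")
print(middle.span())          # (1, 4)  -> the three middle characters

whole = re.search("(wee|week)(knights|nights)", "weeknights")
print(len(whole.group(0)))    # 10

greedy = re.match("(.*).*", "abc")
print(greedy.group(1))        # abc  -> the earlier subexpression takes everything

# Where Python differs: the first alternative wins even though 'ab' is longer.
# A POSIX matcher would report 'ab' here.
first_alt = re.match("a|ab", "ab")
print(first_alt.group(0))     # a
```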

 If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that
 exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression
 with both cases, for example, 'x' becomes "[xX]". When it appears inside a bracket expression, all case counterparts of it are added to the
 bracket expression, so that, for example, "[x]" becomes "[xX]" and "[^x]" becomes "[^xX]".
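
 A small check of the "x becomes [xX]" description, using Python's re.IGNORECASE flag as a stand-in for POSIX case-independent matching (REG_ICASE in regcomp). That equivalence is an assumption; it holds for plain ASCII letters like the ones used here.

```python
import re

samples = ("x", "X", "y")

# Ordinary character with the flag vs. the explicit two-case bracket expression.
with_flag = [re.fullmatch("x", s, re.IGNORECASE) is not None for s in samples]
explicit = [re.fullmatch("[xX]", s) is not None for s in samples]

# Negated bracket expression: both cases of 'x' are excluded under the flag.
negated_flag = [re.fullmatch("[^x]", s, re.IGNORECASE) is not None for s in samples]
negated_explicit = [re.fullmatch("[^xX]", s) is not None for s in samples]

print(with_flag, explicit)             # [True, True, False] [True, True, False]
print(negated_flag, negated_explicit)  # [False, False, True] [False, False, True]
```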

 Obsolete ("basic") regular expressions differ in several respects. 
'|', '+', and '?' are ordinary characters and there is no equivalent for their functionality. 

The delimiters for bounds are "\{" and "\}", with '{' and '}' by themselves ordinary characters.  

    e.g.  a\{2,85\}

The parentheses for nested sub-expressions are "\(" and "\)", with '(' and ')' by themselves ordinary characters. 
'^' is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, 
'$' is an ordinary character except at the end of the RE or the end of  a parenthesized subexpression, and
'*' is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading '^').
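
 To make the escaped-vs-bare difference concrete, here is a rough sketch that rewrites a basic RE into the modern-style syntax Python's re module uses, by swapping which form is escaped. bre_to_python() is a hypothetical helper and deliberately incomplete: it does not treat bracket expressions specially, and it ignores the position-dependent rules for '^', '$' and a leading '*' described above.

```python
import re

def bre_to_python(bre: str) -> str:
    """Swap BRE escaping conventions for modern ones (simplified sketch)."""
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)        # \( \) \{ \} are the special forms in a BRE
            else:
                out.append(ch + nxt)   # other escapes (\1, \., ...) stay as they are
            i += 2
        elif ch in "(){}|+?":
            out.append("\\" + ch)      # ordinary in a BRE, so escape it for Python
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(bre_to_python(r"a\{2,3\}"))      # a{2,3}
print(bre_to_python(r"\(ab\)*c"))      # (ab)*c
print(bre_to_python("a+b(c)"))         # a\+b\(c\)
print(re.fullmatch(bre_to_python("a+b"), "a+b") is not None)  # True: '+' is literal
```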

 Finally, there is one new type of atom, a back reference: '\' followed by a nonzero decimal digit d matches the same sequence of characters
 matched by the dth parenthesized subexpression (numbering subexpressions by the positions of their opening parentheses, left to right), so that,
 for example, "\([bc]\)\1" matches "bb" or "cc" but not "bc".
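
 The back reference example, written in Python's syntax where the parentheses are not escaped (so the basic RE "\([bc]\)\1" becomes "([bc])\1"). Python is assumed here only as a convenient way to try it; the \1 behaves the same way for this case.

```python
import re

# \1 must repeat exactly what group 1 matched, not just match [bc] again.
doubled = re.compile(r"([bc])\1")
results = {s: doubled.fullmatch(s) is not None for s in ("bb", "cc", "bc")}

print(results)  # {'bb': True, 'cc': True, 'bc': False}
```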