Search Syntax using GCC Regular Expression

Regular expression searches provide a way to do simple or complex searches for strings that match a pattern or set of patterns (branches) separated by vertical bars "|". While a pattern can be built to look for a word or phrase, a simple pattern that consists of a word does not look for only that word, but for every place where that string of letters occurs. A search for "right" will return verses that contain the word "right", but also "righteous", "righteousness", "unrighteous", "upright" and even "bright". A search for "hall not" is not a search for "hall" AND "not" but for the string "hall not" with a space between the second "l" and the "n". The search for "hall not" will find occurrences of "shall not".
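The substring behaviour described above can be tried outside the program. The following is a minimal sketch that uses Python's re module as a stand-in for the program's own regular expression engine (an assumption; the two engines differ in other respects), with a made-up word list:

```python
import re

# Hypothetical word list, chosen only for illustration.
words = ["right", "righteous", "unrighteous", "upright", "bright", "light"]

# A plain word is matched wherever its letters occur, not only as a whole word.
hits = [w for w in words if re.search("right", w)]
print(hits)

# "hall not" is one literal string (space included), not "hall" AND "not".
in_shall_not = re.search("hall not", "thou shall not") is not None
in_hall_is_not = re.search("hall not", "the hall is not") is not None
print(in_shall_not, in_hall_is_not)
```

Only "light" is left out of the first result, and the second search succeeds only where the exact string "hall not" appears.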

The power of Regular Expressions is in the patterns (or templates) used to define a search. A pattern consists of ordinary characters and some special characters that are used and interpreted by a set of rules. Special characters include .\[^*$?+. Ordinary (or simple) characters are any characters that are not special. The backslash, "\", is used to convert special characters to ordinary and ordinary characters to special.

Example: the pattern "i. love\." will find sentences that end with "his love" or "in love" or "is love" followed by a period. The first period in "i. love\." is a special character that means allow any character in this position. The backslash in "i. love\." means that the period following it is not to be considered a special character, but is an ordinary period.
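As a rough check of this example, the sketch below runs the same pattern through Python's re module (used here as an assumed stand-in for the program's engine) against a few invented sample sentences:

```python
import re

pattern = re.compile(r"i. love\.")

# Invented sample sentences, for illustration only.
samples = [
    "abide in his love.",   # contains "is love."
    "walk in love.",        # contains "in love."
    "God is love.",         # contains "is love."
    "in love, not fear",    # a comma follows "love", not a period
    "he is loved.",         # "d" follows "love", not a period
]

matches = [s for s in samples if pattern.search(s)]
print(matches)
```

The first period accepts any character after the "i", while the escaped period at the end accepts only a literal period, so the last two samples are rejected.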

Rules for GCC Regular Expression Search Requests

  • . The period matches any character.

  • * The asterisk matches zero or more occurrences of the preceding set, character or indicated character.

  • + The plus sign matches one or more occurrences of the preceding set, character or indicated character.

  • ? The question mark matches zero or one occurrence of the preceding set, character or indicated character.

  • [ ] Square brackets match any one of the characters specified inside [ ].

  • ^ A caret as the first character inside [ ] means NOT.

  • ^ A caret beginning a pattern anchors the beginning of a line.

  • $ A dollar at the end of a pattern anchors the end of a line.

  • | A vertical bar means logical OR.

  • ( ) Parentheses enclose expressions for grouping. Not supported!

  • \ A backslash can be used prior to any special character to match that character.

  • \ A backslash can be used prior to an ordinary character to make it a special character.

The Period .

The Period "." will match any single character, even a space or other non-alphabet character.

  • s.t matches sit, set, sot, etc., which could be located in sitting, compasseth and sottish.

  • b..t matches boot, boat and beat.

  • foot.tool matches footstool and foot tool.
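The period examples can be checked with a short sketch. It assumes Python's re module behaves like the program's engine for this construct, and the helper name found is hypothetical:

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

# "s.t" needs exactly one character between the "s" and the "t".
s_t = [w for w in ["sitting", "compasseth", "sottish", "st", "salt"]
       if found(r"s.t", w)]

# The period also accepts a space, so both spellings are found.
foot = [w for w in ["footstool", "foot tool", "foottool"]
        if found(r"foot.tool", w)]

print(s_t)
print(foot)
```

"st" and "salt" fail because the period stands for exactly one character, no fewer and no more.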

The Asterisk *

The asterisk matches zero or more occurrences of the preceding set, character or indicated character. Using a period and asterisk combination ".*" after a commonly found pattern can cause the search to take a very long time, making the program seem to freeze.

  • be*n matches beeen, been, ben, and bn, which could locate Reuben and Shebna.

The Plus Sign +

The Plus Sign matches one or more occurrences of the preceding set, character or indicated character. Using a period and plus sign combination ".+" after a commonly found pattern can cause the search to take a very long time, making the program seem to freeze.

  • be+n matches beeen, been and ben, but not bn.

The Question Mark ?

The Question Mark matches zero or one occurrence of the preceding set, character or indicated character.

  • be?n matches ben and bn but not been.

  • trees? matches trees or tree.
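The asterisk, plus sign and question mark differ only in how many repetitions they allow. A minimal side-by-side sketch follows, assuming Python's re module as a stand-in for the program's engine. It uses re.fullmatch so that each candidate must match in its entirety, which makes the differences easier to see; whole_matches is a hypothetical helper:

```python
import re

candidates = ["bn", "ben", "been", "beeen"]

def whole_matches(pattern):
    """List the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = whole_matches(r"be*n")      # zero or more "e"
plus = whole_matches(r"be+n")      # one or more "e"
optional = whole_matches(r"be?n")  # zero or one "e"
print(star, plus, optional)

# In an ordinary (substring) search, "be*n" also locates the "bn" in Shebna,
# while "be+n" does not because no "e" follows the "b".
star_in_shebna = re.search(r"be*n", "Shebna") is not None
plus_in_shebna = re.search(r"be+n", "Shebna") is not None
print(star_in_shebna, plus_in_shebna)
```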

The Square Brackets [ ]

The Square Brackets enclose a set of characters that can match. The period, asterisk, plus sign and question mark are not special inside the brackets. A minus sign can be used to indicate a range. If you want a caret "^" to be part of the set, do not place it first after the left bracket or it will be a special character. To include a "]" in the set, make it the first (or second after a special "^") character in the set. To include a minus sign in the set, make it the first (or second after a special "^") or last character in the set.

  • s[eia]t matches set, sit, and sat, but not sot.

  • s[eia]+t matches as above but also seat, seet, siet, etc.

  • [a-d] matches a, b, c, or d.

  • [A-Z] matches any uppercase letter.

  • [.;:?!] matches ., ;, :, ?, or ! but not a comma.

  • []^-] matches ] or ^ or -
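The placement rules for "]", "^" and "-" are easy to get wrong, so a small sketch may help. It assumes Python's re module treats these sets the same way as the program's engine, which holds for the cases shown:

```python
import re

# One character from the set between "s" and "t".
one_vowel = [w for w in ["set", "sit", "sat", "sot"] if re.fullmatch(r"s[eia]t", w)]

# With "+", one or more characters from the set are allowed.
many_vowels = [w for w in ["set", "seat", "siet", "sot"] if re.fullmatch(r"s[eia]+t", w)]

# Inside brackets the period and question mark are ordinary characters.
punctuation = re.findall(r"[.;:?!]", "Why? Stop, now; go.")

# "]" placed first, "^" not first and "-" placed last are all literal.
literals = re.findall(r"[]^-]", "a]b^c-d")

print(one_vowel, many_vowels, punctuation, literals)
```

The comma in the sample sentence is skipped because it is not listed in the set.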

The Caret first in Square Brackets [^xxx]

If the Caret is the first character after the left bracket it means NOT.

  • s[^io]t matches set, sat, etc., but not sit and sot.
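A negated set accepts every character that is not listed, which includes spaces and punctuation. The sketch below illustrates this with Python's re module, assumed here to behave like the program's engine for this construct:

```python
import re

words = ["set", "sat", "sit", "sot"]
not_i_or_o = [w for w in words if re.search(r"s[^io]t", w)]
print(not_i_or_o)

# A space is also "not i and not o", so the pattern can match across words.
across_words = re.search(r"s[^io]t", "his throne")
print(repr(across_words.group()))
```

If matches across word breaks are unwanted, a positive set such as s[a-hj-np-z]t (any lowercase letter except i and o) is one possible alternative.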

The Caret as Start of Line Anchor ^xxx

If the Caret is the first character in a pattern it anchors the pattern to the start of a line. Any match must be at the beginning of a line. Because of unfiltered formatting characters in some texts, this feature does not always work, but may if a few periods are placed after the caret to account for the formatting characters.

  • ^In the beginning matches lines that start with "In the beginning". (May need to use: ^.....In the beginning)

The Dollar Sign as End of Line Anchor xxx$

If the Dollar Sign is the last character in a pattern it anchors the pattern to the end of a line. Any match must be at the end of a line. Because of unfiltered formatting characters in some texts, this feature does not always work, but may if a few periods are placed before the dollar sign to account for the formatting characters.

  • Amen\.$ matches lines that end with "Amen." (May need to use Amen\....$, Amen\..........$, or even Amen\....................$)
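The padding workaround for both anchors can be sketched as follows. Python's re module stands in for the program's engine, and the formatting codes "<v1> " and "</v>" are invented purely for illustration; the real texts may carry different codes of different lengths:

```python
import re

# Hypothetical lines with made-up formatting codes around the visible text.
marked_start = "<v1> In the beginning God created"  # 5 extra leading characters
marked_end = "be with you all. Amen.</v>"           # 4 extra trailing characters

# The plain anchored patterns fail because the codes sit at the line edges.
start_plain = re.search(r"^In the beginning", marked_start) is not None
end_plain = re.search(r"Amen\.$", marked_end) is not None

# One period per formatting character lets the anchors succeed.
start_padded = re.search(r"^.....In the beginning", marked_start) is not None
end_padded = re.search(r"Amen\.....$", marked_end) is not None

print(start_plain, start_padded, end_plain, end_padded)
```

In the last pattern the escaped period matches the literal "." after "Amen", and the four unescaped periods that follow absorb the four trailing formatting characters.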

The Vertical Bar |

The Vertical Bar between patterns means OR.

  • John|Peter matches John or Peter.

  • John .*Peter|Peter .*John matches John ... Peter or Peter ... John. (.* slows a search)

  • pain|suffering|sorrow matches pain, or suffering, or sorrow.
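Because parentheses are not available, a search for two names in either order has to spell out both orders as separate branches. A minimal sketch with invented sample lines, using Python's re module as an assumed stand-in for the program's engine:

```python
import re

either = re.compile(r"John|Peter")
both = re.compile(r"John .*Peter|Peter .*John")

# Invented sample lines, for illustration only.
lines = [
    "Peter and John went up together",
    "Then John answered Peter",
    "John went alone",
    "Andrew followed",
]

has_either = [s for s in lines if either.search(s)]
has_both = [s for s in lines if both.search(s)]
print(has_either)
print(has_both)
```

The vertical bar splits the whole pattern into branches, so the second pattern reads as "John .*Peter" OR "Peter .*John".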

The Parentheses ( )

The use of Parentheses ( ) is not supported!

The Backslash Prior to a Special Character \*

The Backslash prior to a special character indicates that the character is not being used in its special meaning, but is just to match itself.

  • amen\. matches amen. but not ament and will not locate firmament.
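The difference the backslash makes shows up when the escaped and unescaped patterns are compared. A short sketch, assuming Python's re module as a stand-in for the program's engine:

```python
import re

words = ["amen.", "ament", "firmament"]

# Escaped: the final character must be a literal period.
escaped = [w for w in words if re.search(r"amen\.", w)]

# Unescaped: the period accepts any character, including the "t" in firmament.
unescaped = [w for w in words if re.search(r"amen.", w)]

print(escaped)
print(unescaped)
```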

The Backslash Prior to an Ordinary Character \s

The Backslash prior to an ordinary character indicates that the character is not being used to match itself, but has special meaning.

  • \b if used outside [ ] means word boundary. If used inside [ ] it means backspace. \brighteous\b matches righteous but not unrighteous or righteousness.

  • \B means non-word boundary. \Brighteous\B matches unrighteousness and unrighteously but not righteous, unrighteous or righteousness.

  • \d means digit; same as [0-9].

  • \D means non-digit, same as [^0-9].

  • \s means a whitespace character (space, tab, etc.).

  • \S means not a whitespace character.

  • \w means alphanumeric or underscore; same as [a-zA-Z0-9_].

  • \W means not alphanumeric or underscore; same as [^a-zA-Z0-9_].
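The backslash sequences above can be exercised with a short sketch. It relies on Python's re module, which is only assumed to agree with the program's engine; on plain ASCII text such as these samples, the sequences behave as the list describes:

```python
import re

words = ["righteous", "unrighteous", "righteousness",
         "unrighteousness", "unrighteously"]

# \b requires a word boundary on each side of "righteous".
whole_word = [w for w in words if re.search(r"\brighteous\b", w)]

# \B requires that neither side is a word boundary.
inner_only = [w for w in words if re.search(r"\Brighteous\B", w)]

# \d+ collects runs of digits; \w+ collects runs of word characters.
numbers = re.findall(r"\d+", "Psalm 119:105")
tokens = re.findall(r"\w+", "Amen, and amen.")

# \s+ matches a run of whitespace, used here to split a line into pieces.
pieces = re.split(r"\s+", "In the  beginning")

print(whole_word, inner_only, numbers, tokens, pieces)
```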
