Mosr's Scripts

Perl Regular Expression pattern matching (aka RegEx)

Within Achaea, RegEx is used for pattern matching in clients like Zmud, Cmud, MUSHclient, Mudlet, and more. It is an extremely powerful tool for the job. This tutorial covers the basics of this type of pattern matching.

Wildcards, Anchors, and Escaping

There are a few major symbols in RegEx to keep an eye out for.

 "^"  -- this tells the client to only match the pattern if it is at the very beginning of the line.
 "$"  -- this tells the client to only match the pattern if it is at the very end of the line.
 "\w" -- this will match a single word character (a letter, a digit, or an underscore). (wildcard)
 "\d" -- this will match a single numeric digit. (wildcard)
 "\s" -- this matches a single whitespace character (a space or a tab). (wildcard)
 "."  -- this will match pretty much any single character. (wildcard)
 "\"  -- this is the escape character. It tells the client that whatever symbol comes right after
         it HAS to be there literally, so \. matches an actual period (this only applies when
         the backslash is not part of a wildcard like \w or \d).

 "( )" -- Sets a capture group (to be explained later).
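
If you want to try these symbols outside of a MUD client, here is a minimal sketch. It assumes Python's re module as a stand-in for the client's Perl-style engine (the two agree on everything used here), and the game line is made up for illustration.

```python
import re

# Hypothetical game line, invented for this sketch.
line = "You have 25 gold."

# "^" anchors the match to the start of the line, "$" to the end.
starts_with_you = re.search(r"^You", line) is not None    # True
ends_with_gold = re.search(r"gold$", line) is not None     # False: the line ends with "."
ends_with_dot = re.search(r"gold\.$", line) is not None    # True: "\." is a literal period

# "\d" matches one digit, "\w" one word character, "\s" one whitespace character.
first_digit = re.search(r"\d", line).group()               # "2"
first_word_char = re.search(r"\w", line).group()           # "Y"
first_space = re.search(r"\s", line).group()               # " "

# An unescaped "." matches any single character, so "gold." also matches "gold!".
dot_is_wildcard = re.search(r"gold.$", "You have 25 gold!") is not None   # True
escaped_dot = re.search(r"gold\.$", "You have 25 gold!") is not None      # False
```

The last two lines are the reason to escape periods in triggers: an unescaped period will quietly accept any character in that spot.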


All of those matching symbols are great and all, but eventually you'll want to match more than just 1 letter, number, or character. That is where quantifiers come in handy. A quantifier attaches to the wildcard (or other element) right before it and says how many times that element may appear.

 "+"  -- This means 1 or more. So, \w+ would mean 1 or more letters (a word)

 "*"  -- This means 0 or more. Almost the same thing but it will still match the pattern if that
        specific piece of the pattern is missing (and the rest of the pattern matches).

 "?"  -- Means 0 or 1, and can be used anywhere in the pattern. It only affects the element right
         before it, so ^hello?$ would match "hello" or "hell".

(item 1|item 2|item 3|etc) - this is a list of possible things that you can have in a pattern.
                             (Mosr|Vadol|Darknight) would match "Mosr", "Vadol", or "Darknight".
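
A quick sketch of those four in action, again assuming Python's re module as a stand-in for the client's engine. The test strings other than the names above are made up.

```python
import re

# "+" needs at least one word character; "*" is happy with none at all.
plus_on_word = re.search(r"^\w+$", "Mosr") is not None    # True
plus_on_empty = re.search(r"^\w+$", "") is not None       # False
star_on_empty = re.search(r"^\w*$", "") is not None       # True

# "?" makes only the element right before it optional (here, the final "o").
hello_results = [bool(re.search(r"^hello?$", s)) for s in ("hello", "hell", "helloo")]
# [True, True, False]

# A list of alternatives matches any one of its items.
names = re.compile(r"^(Mosr|Vadol|Darknight)$")
name_results = [bool(names.search(s)) for s in ("Mosr", "Vadol", "Darknight", "Mosra")]
# [True, True, True, False]
```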


These four are the most common. Some other more advanced ones you'll run across are:

"(?!...)" -- which means "not followed by". It is written inside parentheses, and the pattern will
        only match if whatever you put after the ?! does NOT come next in the line. So Emp(?!ress)
        matches the "Emp" in "Emperor" but not the one in "Empress".

"{n}" - means that there are exactly n of the preceding element. \d{5} needs exactly 5 digits in a
        row to match. (Without anchors it can still match 5 digits inside a longer number.)

"{n,}" - means n or more of the preceding element. \d{5,} would only match if there were 5 or more
         digits.

"{m,n}" - means that there are between m and n of the preceding element. \d{3,5} would only match
          if there were between 3 and 5 digits.

"{m,n}?" - means as few as possible. The trailing ? makes the quantifier lazy, so \d{3,5}? stops at
           3 digits as long as the rest of the pattern can still match.

That's pretty much as in-depth as I'm going to go for those. I'm not going to do any examples for those last five.
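
For anyone who wants to poke at those last five anyway, here is a rough sketch. It assumes Python's re module as a stand-in for the client's engine, and the test strings are made up for illustration.

```python
import re

# (?!...) negative lookahead: "Emp" only counts when "ress" does NOT come next.
not_empress = re.compile(r"^Emp(?!ress)")
lookahead_hits = [bool(not_empress.search(s)) for s in ("Emperor", "Empress")]
# [True, False]

# {n}: exactly five digits between the anchors.
exactly_five = [bool(re.search(r"^\d{5}$", s)) for s in ("1234", "12345", "123456")]
# [False, True, False]

# {n,}: five or more digits.
five_or_more = [bool(re.search(r"^\d{5,}$", s)) for s in ("1234", "12345", "1234567")]
# [False, True, True]

# {m,n}: between three and five digits.
three_to_five = [bool(re.search(r"^\d{3,5}$", s)) for s in ("12", "1234", "123456")]
# [False, True, False]

# {m,n}? is lazy: it takes as few as it can, while plain {m,n} takes as many as it can.
greedy = re.search(r"\d{3,5}", "123456").group()    # "12345"
lazy = re.search(r"\d{3,5}?", "123456").group()     # "123"
```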

Capture Groups

Using Mudlet as an example, capture groups are what tell Mudlet to save the wildcard to a variable (or matches table). By simply putting anything into parentheses, you turn it into a capture group and the text it matched is sent to the saved wildcard table (matches[2-99] in Mudlet, %1-%99 in Zmud/Cmud). In Mudlet, matches[1] holds the entire matched text, which is why the first capture group lands in matches[2].

To still be able to match literal parentheses, you have to escape them, as in \( and \). Or if you want to put something into a group but not save it, you just put ?: right after the opening parenthesis, as in (?:his|her).
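
Here is a small sketch of all three kinds of parentheses side by side. It assumes Python's re module as a stand-in for the client's engine, and both the game line and the pattern are invented for illustration. Note that Python numbers its groups from 1, so group(1) lines up with matches[2] in Mudlet.

```python
import re

# Hypothetical game line, invented for this sketch.
line = "You get 3 vials (empty) from the pack."

# (\d+)           capturing group: the number is saved
# (?:vial|vials)  non-capturing group: grouped for the alternation, not saved
# \((\w+)\)       escaped literal parentheses wrapped around a capturing group
pattern = r"^You get (\d+) (?:vial|vials) \((\w+)\) from the pack\.$"

m = re.search(pattern, line)
whole_match = m.group(0)   # the entire matched line (matches[1] in Mudlet)
count = m.group(1)         # "3"     (matches[2] in Mudlet)
state = m.group(2)         # "empty" (matches[3] in Mudlet)
saved = m.groups()         # ("3", "empty"): only the capturing groups are kept
```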

A Few Examples

Vadol has successfully inscribed the image of the Star on his Tarot card.

^(\w+) has successfully inscribed the image of the (.+?) on (?:his|her) Tarot card\.$

This would send "Vadol" and "Star" to the matches table (as matches[2] and matches[3]) and disregard "his", but still match if it was a female that was inscribing.
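
A sketch to check this one, assuming Python's re module as a stand-in for Mudlet's engine. The "her" line is a made-up variation. Everything outside the wildcards has to be spelled exactly as the game prints it, so the sketch also shows a pattern missing a single letter failing to match.

```python
import re

pattern = r"^(\w+) has successfully inscribed the image of the (.+?) on (?:his|her) Tarot card\.$"

his_line = "Vadol has successfully inscribed the image of the Star on his Tarot card."
# Hypothetical variation of the same game line.
her_line = "Darknight has successfully inscribed the image of the Moon on her Tarot card."

his_groups = re.search(pattern, his_line).groups()   # ("Vadol", "Star")
her_groups = re.search(pattern, her_line).groups()   # ("Darknight", "Moon")

# The same pattern with "inscribe" instead of "inscribed" never matches the line.
typo_pattern = r"^(\w+) has successfully inscribe the image of the (.+?) on (?:his|her) Tarot card\.$"
typo_match = re.search(typo_pattern, his_line)       # None
```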



You set the bomb's timer for 18 seconds.

^You set the bomb's timer for (\d+) seconds\.$ 

This would send the number "18" to the matches table as matches[2].
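
One thing worth seeing for yourself: the capture is text, not a number. A minimal sketch, assuming Python's re module as a stand-in for Mudlet's engine:

```python
import re

pattern = r"^You set the bomb's timer for (\d+) seconds\.$"
m = re.search(pattern, "You set the bomb's timer for 18 seconds.")

captured = m.group(1)      # "18", still a string
seconds = int(captured)    # 18, now usable in arithmetic
half_time = seconds // 2   # 9
```

In a Mudlet script the same conversion would be done with Lua's tonumber() on matches[2].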


Mosr tells you, "Empress."
Mosr tells you, "Empress please."
Mosr tells you, "Emp?"
Mosr tells you, "Emp."

^(\w+) tells you, "Emp(?:ress)? ?(?:please)?(?:\.|\?)"$

Sometimes you need to capture multiple ways of somebody telling you something. This is an example of how to do it.
"Mosr" will be sent to matches[2]. Simple enough.

What the Emp(?:ress)? does is check for "Emp". That much HAS to be there. Putting "ress" in parentheses groups the letters together, which on its own would tell Mudlet to send "ress" to the matches table. However, placing ?: after the opening parenthesis tells Mudlet NOT to send it to the matches table, but to keep the letters grouped. The ? after the closing parenthesis means that the preceding group (the stuff in parentheses) may or may not be there, so the pattern can match "Emp" or "Empress". We do the same thing for the "please" portion. To cover the space between the words, we just stick a question mark after the space. Finally, (?:\.|\?) accepts either a period or a question mark before the closing quote, since a plain \. would miss the "Emp?" line. This way it will match all the possibilities listed above.
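
To check the pattern against all four tells, here is a sketch assuming Python's re module as a stand-in for Mudlet's engine. The pattern in the sketch ends with (?:\.|\?) so that both a period and a question mark are accepted before the closing quote. For comparison it also runs a version that ends with \. only, and the "Emperor" line is made up for illustration.

```python
import re

pattern = r'^(\w+) tells you, "Emp(?:ress)? ?(?:please)?(?:\.|\?)"$'

tells = [
    'Mosr tells you, "Empress."',
    'Mosr tells you, "Empress please."',
    'Mosr tells you, "Emp?"',
    'Mosr tells you, "Emp."',
]

# Every tell matches, and the only saved group is the speaker's name.
speakers = [re.search(pattern, t).group(1) for t in tells]
# ["Mosr", "Mosr", "Mosr", "Mosr"]

# A version that only allows a period misses the tell ending in "?".
period_only = r'^(\w+) tells you, "Emp(?:ress)? ?(?:please)?\."$'
period_only_hits = [bool(re.search(period_only, t)) for t in tells]
# [True, True, False, True]

# Hypothetical tell that should NOT fire the trigger.
other_match = re.search(pattern, 'Mosr tells you, "Emperor."')   # None
```

Because the space and "please" are each optional on their own, the pattern is a little loose: it would also accept "Emp please." and "Empressplease.". That is usually harmless for a trigger like this.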