Capturing groups
Capturing groups are an extremely useful feature of regular
expression matching. They let us query the Matcher to find out
which part of the string matched a particular part of the
regular expression.
Let's look directly at an example. Say we want to parse dates
in the format DD/MM/YYYY.
We can write an expression to match this format as follows:
[0-9]{2}/[0-9]{2}/[0-9]{4}
(In principle, we could make the regular expression put further constraints
on the date. For example, the month part could be [0-1][0-9], since
the first digit of a month number must be 0 or 1. But for this example, we'll accept
any number with the correct number of digits and assume that further range checking
takes place once a match has been found.)
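To make the trade-off in that aside concrete, here is a minimal sketch comparing the loose pattern with a stricter month part. The class and field names are hypothetical. Note that even the stricter version still accepts a month of 13, which is why range checking after the match is needed anyway.

```java
import java.util.regex.Pattern;

public class StricterDateCheck {
    // The loose pattern from the text: any two digits for day and month.
    static final Pattern LOOSE = Pattern.compile("[0-9]{2}/[0-9]{2}/[0-9]{4}");
    // Hypothetical stricter variant: the month's first digit must be 0 or 1.
    static final Pattern STRICT_MONTH = Pattern.compile("[0-9]{2}/[0-1][0-9]/[0-9]{4}");

    public static void main(String[] args) {
        System.out.println(LOOSE.matcher("15/23/2021").matches());        // true
        System.out.println(STRICT_MONTH.matcher("15/23/2021").matches()); // false
        // Month 13 still gets through the stricter pattern, so a numeric
        // range check after matching is still required.
        System.out.println(STRICT_MONTH.matcher("15/13/2021").matches()); // true
    }
}
```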
As it stands, this expression will tell us if a given string matches
the required date format, but it won't help us read what the date is.
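As a quick illustration of that limitation, the sketch below uses the convenience method String.matches(), which tests the whole string against the expression and returns nothing but a boolean. The class and method names are hypothetical.

```java
public class FormatOnlyCheck {
    // Returns true only if the whole string is in DD/MM/YYYY form.
    // The result is a plain boolean: the day, month and year are not available.
    static boolean isDate(String s) {
        return s.matches("[0-9]{2}/[0-9]{2}/[0-9]{4}");
    }

    public static void main(String[] args) {
        System.out.println(isDate("31/12/2021"));    // true
        System.out.println(isDate("31-12-2021"));    // false: wrong separator
        System.out.println(isDate("on 31/12/2021")); // false: the whole string must match
    }
}
```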
This is where capturing groups come in. We rewrite the expression as follows:
([0-9]{2})/([0-9]{2})/([0-9]{4})
The brackets surround the parts of the expression whose corresponding string
we want to "remember". These bracketed expressions are called groups,
and are numbered from 1 upwards, from left to right.
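One way to see this numbering is to ask the Matcher how many groups the pattern has and print each one. This is a minimal sketch with a hypothetical class name; groupCount() reports the number of capturing groups and does not count group 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    static final Pattern DATE = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    public static void main(String[] args) {
        Matcher m = DATE.matcher("31/12/2021");
        // groupCount() depends only on the pattern, so it can be called before matching.
        System.out.println(m.groupCount()); // 3
        if (m.matches()) {
            // Groups are numbered left to right, starting at 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i)); // 1: 31, 2: 12, 3: 2021
            }
        }
    }
}
```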
Now, we can "pull out" these elements of the string with the following code:
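import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern datePatt = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");
...
Matcher m = datePatt.matcher(dateStr);
if (m.matches()) {                           // the whole of dateStr must match
  int day = Integer.parseInt(m.group(1));    // text captured by group 1 (DD)
  int month = Integer.parseInt(m.group(2));  // group 2 (MM)
  int year = Integer.parseInt(m.group(3));   // group 3 (YYYY)
}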
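For readers who want to run the snippet above, here is a self-contained sketch that wraps it in a method. The class name DateParser, the method parseDate and the int[] return convention are assumptions made for illustration. The Pattern is compiled once into a static field, since Pattern instances are immutable and safe to share, whereas a Matcher is created per input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateParser {
    // Compiled once and reused for every call.
    private static final Pattern DATE_PATT =
        Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    // Hypothetical helper: returns {day, month, year}, or null if dateStr
    // is not exactly in DD/MM/YYYY form.
    static int[] parseDate(String dateStr) {
        Matcher m = DATE_PATT.matcher(dateStr);
        if (!m.matches()) {
            return null;
        }
        int day = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        return new int[] {day, month, year};
    }

    public static void main(String[] args) {
        int[] d = parseDate("07/03/2021");
        System.out.println(d[0] + " " + d[1] + " " + d[2]); // 7 3 2021
        System.out.println(parseDate("7/3/2021"));          // null: needs two digits
    }
}
```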
Note that to use capturing groups, we have to use the
explicit Pattern/Matcher means of matching: a convenience method such as
String.matches() only tells us whether the string matched.
In more advanced uses of capturing groups,
we can even refer to a captured
group from inside the expression itself.
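Such a reference inside the expression is written \1, \2 and so on (a backreference), and matches the same text that the group captured. The sketch below is a deliberately contrived example, with hypothetical names, that only accepts dates whose day and month strings are identical.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // \1 must match exactly the text that group 1 captured.
    // In Java source the backslash itself has to be escaped, hence "\\1".
    static final Pattern SAME_DAY_AND_MONTH =
        Pattern.compile("([0-9]{2})/\\1/[0-9]{4}");

    public static void main(String[] args) {
        System.out.println(SAME_DAY_AND_MONTH.matcher("05/05/2021").matches()); // true
        System.out.println(SAME_DAY_AND_MONTH.matcher("05/06/2021").matches()); // false
    }
}
```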
When performing a search and replace
with a regular expression, we can also refer to groups by their number
from inside the replacement string.
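In Java, a group is referred to from the replacement string as $1, $2 and so on. As a minimal sketch (the class and method names are hypothetical), String.replaceAll() can use the date expression to rewrite every DD/MM/YYYY date as YYYY-MM-DD by reordering the groups.

```java
public class ReplaceDemo {
    // $3, $2 and $1 insert the text captured by groups 3, 2 and 1.
    static String toIso(String text) {
        return text.replaceAll("([0-9]{2})/([0-9]{2})/([0-9]{4})", "$3-$2-$1");
    }

    public static void main(String[] args) {
        System.out.println(toIso("Due 31/12/2021, paid 05/01/2022"));
        // Due 2021-12-31, paid 2022-01-05
    }
}
```

Because $ has this special meaning, a literal dollar sign (or backslash) in a replacement string must be escaped; Matcher.quoteReplacement() does this for an arbitrary string.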
Group 0
Capturing groups start at group number 1, as in the example above.
There is also a group 0, which always contains the entire portion of the string
that matched the whole expression.
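The difference between "the entire match" and "the entire input" shows up when searching with find() rather than matching the whole string. In this minimal sketch (hypothetical class name), group 0 holds only the date, not the surrounding text; calling group() with no argument is equivalent to group(0).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZeroDemo {
    static final Pattern DATE = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    public static void main(String[] args) {
        Matcher m = DATE.matcher("Invoice date: 31/12/2021.");
        // find() looks for the pattern anywhere in the input.
        if (m.find()) {
            System.out.println(m.group(0)); // 31/12/2021
            System.out.println(m.group());  // same as group(0)
            System.out.println(m.group(3)); // 2021
        }
    }
}
```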
See also the section on search and replace
using the Matcher.find() method.
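Capturing groups combine naturally with repeated calls to find(), each of which resumes the search after the previous match. The sketch below, with a hypothetical class and method name, collects the year (group 3) of every date in a piece of text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDates {
    static final Pattern DATE = Pattern.compile("([0-9]{2})/([0-9]{2})/([0-9]{4})");

    // Returns the year captured by group 3 for each date found, in order.
    static List<String> years(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            result.add(m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(years("From 01/02/2020 to 03/04/2021")); // [2020, 2021]
    }
}
```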
Alternatives and optional capturing groups
On the following page, we look at using
alternatives in capturing groups.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.