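Regular Expressions in Java

A regular expression is a single string that describes a set of strings following some syntactic rule. It is typically used to search for, or replace, text that matches a pattern.

In Java, the two classes you mainly work with are java.util.regex.Pattern and java.util.regex.Matcher.

1. Using a regular expression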
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "www.baidu.com";

        // Compile the regular expression that defines the matching rule
        Pattern p = Pattern.compile("[a-z]+");

        // Create a matcher that binds the pattern to the text
        Matcher m = p.matcher(str);

        // Find each match in turn
        while (m.find()) {
            // Print the matched substring
            System.out.println(m.group(0));
        }
    }
}

Output:
www
baidu
com

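2. The find and group methods
2.1 The find method
1) Scans the text according to the pattern and locates the next substring that matches.
2) Stores the start index of that substring in groups[0] of the Matcher's int[] groups field, and the end index + 1 in groups[1]. The +1 is there because substring extraction includes the start index but excludes the end index.
3) Records the end index + 1 (OpenJDK's Matcher keeps it in fields such as last and oldLast) so that the next call to find resumes from that position.

2.2 The group method
Key code: return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
CharSequence getSubSequence(int beginIndex, int endIndex) {
    return text.subSequence(beginIndex, endIndex);
}
Here text is the CharSequence that was passed to the matcher (a String in this case). In other words, group uses the start and end indices stored in int[] groups to cut the matched substring out of the original text and returns it as a String.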
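
The groups array itself is internal to Matcher, but the same indices are visible through the public start() and end() methods. The following is a minimal sketch, assuming a hypothetical helper class MatchPositions (not part of the original example), that records each match together with its start index and its exclusive end index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {
    // Returns one "text@start-end" entry per match; end is exclusive,
    // which is why text.substring(start, end) yields the match itself.
    static List<String> locate(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [www@0-3, baidu@4-9, com@10-13]
        System.out.println(locate("[a-z]+", "www.baidu.com"));
    }
}
```

Each end index is also where the next search begins, which is why successive calls to find never return overlapping matches.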

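3. Matching rules
3.1 \\ -> Escape. The special character that follows loses its special meaning and is treated as plain text. Other languages write the regex escape as a single \, but in a Java string literal \ is itself special (it introduces \r, \n, \t and so on), so the literal "\\" produces one backslash character. The regex engine then uses that backslash as its own escape. As a result, a regex escape is written \\ in Java source, and matching a literal backslash takes \\\\.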

Scenario 1:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: a.b.c
        String str = "a.b.c";

        // Pattern with the dot escaped: matches a literal '.'
        Pattern p1 = Pattern.compile("\\.");
        // Pattern with the dot unescaped: matches any character
        Pattern p2 = Pattern.compile(".");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        while (m1.find()) {
            System.out.println(m1.group(0));
        }

        System.out.println("======================================");

        while (m2.find()) {
            System.out.println(m2.group(0));
        }
    }
}
Output:
.
.
======================================
a
.
b
.
c

Scenario 2:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: C:\Users.\ (each backslash must be escaped in the string literal)
        String str = "C:\\Users.\\";

        // The first \\ is the regex escape; the second \\ is the backslash being matched literally
        Pattern p = Pattern.compile("\\\\");

        // Create a matcher that binds the pattern to the text
        Matcher m = p.matcher(str);

        while (m.find()) {
            System.out.println(m.group(0));
        }
    }
}
Output:
\
\

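Characters that must be escaped to be matched literally: . * + ? ( ) [ ] { } ^ $ | \
(Java patterns have no delimiters, so / does not need escaping.)
For example, \\. in Java source matches a literal period.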
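
When the text to match is not known in advance, escaping each metacharacter by hand is error-prone. The sketch below uses Pattern.quote, which turns an arbitrary string into a pattern that matches it literally. The class QuoteDemo and the method splitOnLiteral are illustrative names, not part of the original examples:

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Splits text on a delimiter that is treated as plain text, not as a regex.
    static String[] splitOnLiteral(String text, String literal) {
        return text.split(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        // Unquoted, "." matches every character, so nothing is left between delimiters.
        System.out.println("a.b.c".split(".").length);                     // prints 0
        // Quoted, "." is an ordinary period.
        System.out.println(String.join("|", splitOnLiteral("a.b.c", "."))); // prints a|b|c
    }
}
```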

3.2 [] -> Matches a single character; the candidate characters are listed inside the brackets
3.2.1 [abc] -> Matches a single character that is one of 'a', 'b' or 'c'
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: a.b.c.d
        String str = "a.b.c.d";

        // Compile the pattern [abc]
        Pattern p = Pattern.compile("[abc]");

        // Create a matcher that binds the pattern to the text
        Matcher m = p.matcher(str);

        while (m.find()) {
            System.out.println(m.group(0));
        }
    }
}
Output:
a
b
c

3.2.2 [a-z] -> Matches a single character in the range of the 26 lowercase letters a to z. Likewise, A-Z covers the 26 uppercase letters and 0-9 covers the 10 digits. Narrower ranges such as b-d are also allowed.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: a.b.C.d.e.1
        String str = "a.b.C.d.e.1";

        Pattern p1 = Pattern.compile("[a-z]");
        Pattern p2 = Pattern.compile("[A-Z]");
        Pattern p3 = Pattern.compile("[0-9]");
        Pattern p4 = Pattern.compile("[b-d]");

        // A reversed range such as [d-b] is rejected when the pattern is compiled
        // (Pattern.compile throws PatternSyntaxException)
        //Pattern p5 = Pattern.compile("[d-b]");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);
        Matcher m3 = p3.matcher(str);
        Matcher m4 = p4.matcher(str);

        System.out.print("[a-z] matches:");
        while (m1.find()) {
            System.out.print(m1.group(0) + " ");
        }
        System.out.println();
        System.out.print("[A-Z] matches:");
        while (m2.find()) {
            System.out.print(m2.group(0) + " ");
        }
        System.out.println();
        System.out.print("[0-9] matches:");
        while (m3.find()) {
            System.out.print(m3.group(0) + " ");
        }
        System.out.println();
        System.out.print("[b-d] matches:");
        while (m4.find()) {
            System.out.print(m4.group(0) + " ");
        }
    }
}
Output:
[a-z] matches:a b d e
[A-Z] matches:C
[0-9] matches:1
[b-d] matches:b d

3.2.3 [^a-z] -> Matches a single character that is not in the range of the 26 lowercase letters a to z. Similarly, [^abC] matches any character other than 'a', 'b' and 'C'.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: a.b.C.d.e.1
        String str = "a.b.C.d.e.1";

        Pattern p1 = Pattern.compile("[^a-z]");
        Pattern p2 = Pattern.compile("[^abC]");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        System.out.print("[^a-z] matches:");
        while (m1.find()) {
            System.out.print(m1.group(0) + " ");
        }
        System.out.println();
        System.out.print("[^abC] matches:");
        while (m2.find()) {
            System.out.print(m2.group(0) + " ");
        }
    }
}
Output:
[^a-z] matches:. . C . . . 1
[^abC] matches:. . . d . e . 1

Ranges can also be combined inside one pair of brackets: [a-zA-Z] matches any one of the 52 uppercase and lowercase letters.

3.2.4 . matches any single character except line terminators such as \r and \n
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: ab9C~!@#$%^&*()_
        String str = "ab9C~!@#$%^&*()_";

        // Compile the pattern .
        Pattern p1 = Pattern.compile(".");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        System.out.println(". matches:");
        while (m1.find()) {
            // Print each matched character on its own line
            System.out.println(m1.group(0));
        }
    }
}
Output:
. matches:
a
b
9
C
~
!
@
#
$
%
^
&
*
(
)
_
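
The exception for line terminators can be checked directly. The sketch below counts matches of . in a string containing \n, first with default settings and then with the Pattern.DOTALL flag, which makes . match line terminators as well. The class DotAllDemo and its countMatches helper are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Counts how many times the pattern is found in the text.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a\nb";
        // Default: '.' skips the newline, so only 'a' and 'b' match.
        System.out.println(countMatches(Pattern.compile("."), text));                 // prints 2
        // DOTALL: '.' also matches the newline.
        System.out.println(countMatches(Pattern.compile(".", Pattern.DOTALL), text)); // prints 3
    }
}
```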

3.2.5 \\d and \\D match a single digit and a single non-digit character respectively, equivalent to [0-9] and [^0-9]
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: 09abc
        String str = "09abc";

        // Compile the patterns \d and \D
        Pattern p1 = Pattern.compile("\\d");
        Pattern p2 = Pattern.compile("\\D");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        System.out.println("\\\\d matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
        System.out.println("\\\\D matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }
    }
}
Output:
\\d matches:
0
9
\\D matches:
a
b
c

3.2.6 \\w matches a single word character (a digit, an uppercase or lowercase letter, or an underscore), equivalent to [0-9a-zA-Z_]. \\W matches any single character outside that set, equivalent to [^0-9a-zA-Z_].
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: 09abcABC#$%_
        String str = "09abcABC#$%_";

        // Compile the patterns \w and \W
        Pattern p1 = Pattern.compile("\\w");
        Pattern p2 = Pattern.compile("\\W");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        System.out.println("\\\\w matches:");
        while (m1.find()) {
            System.out.print(m1.group(0));
        }
        System.out.println();
        System.out.println("\\\\W matches:");
        while (m2.find()) {
            System.out.print(m2.group(0));
        }
    }
}
Output:
\\w matches:
09abcABC_
\\W matches:
#$%

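3.2.7 \\s and \\S match a single whitespace character (including \r and \n) and a single non-whitespace character respectively
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process: contains a space, \r, \n, \t and \f
        String str = "b @#\r\n\t\f";

        // Compile the patterns \s and \S
        Pattern p1 = Pattern.compile("\\s");
        Pattern p2 = Pattern.compile("\\S");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        int i = 0;

        System.out.println("\\\\s matches:");
        while (m1.find()) {
            System.out.println("found");
            // Print the raw whitespace character followed by a running index
            System.out.print(m1.group(0) + i++);
            System.out.println(" whitespace");
        }
        System.out.println();
        System.out.println("\\\\S matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }
    }
}
Output (the whitespace characters themselves are printed raw, so the exact layout depends on the terminal):
\\s matches:
found
0 whitespace
found
1 whitespace
found

2 whitespace
found
3 whitespace
found
4 whitespace

\\S matches:
b
@
#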

3.3 Matching multiple characters
3.3.1 "abc" matches the three-character string abc and is case-sensitive by default. Adding "(?i)" makes the part of the pattern that follows it case-insensitive (up to the end of the enclosing group).
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abcdABCeaBCfaBcg";

        Pattern p1 = Pattern.compile("abc");
        Pattern p2 = Pattern.compile("(?i)abc");
        Pattern p3 = Pattern.compile("a(?i)bc");
        Pattern p4 = Pattern.compile("a((?i)b)c");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);
        Matcher m3 = p3.matcher(str);
        Matcher m4 = p4.matcher(str);

        System.out.println("abc matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
        System.out.println("(?i)abc matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }
        System.out.println("a(?i)bc matches:");
        while (m3.find()) {
            System.out.println(m3.group(0));
        }
        System.out.println("a((?i)b)c matches:");
        while (m4.find()) {
            System.out.println(m4.group(0));
        }
    }
}
Output:
abc matches:
abc
(?i)abc matches:
abc
ABC
aBC
aBc
a(?i)bc matches:
abc
aBC
aBc
a((?i)b)c matches:
abc
aBc

Pattern p = Pattern.compile("abc", Pattern.CASE_INSENSITIVE); is equivalent to Pattern p2 = Pattern.compile("(?i)abc");
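
The equivalence between the flag and the inline form can be checked by running both against the same text. A minimal sketch, assuming a hypothetical helper class CaseFlagDemo with a findAll method that collects every match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseFlagDemo {
    // Collects every substring of text matched by the pattern.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "abcdABCeaBCfaBcg";
        List<String> viaFlag = findAll(Pattern.compile("abc", Pattern.CASE_INSENSITIVE), str);
        List<String> viaInline = findAll(Pattern.compile("(?i)abc"), str);
        System.out.println(viaFlag);                   // prints [abc, ABC, aBC, aBc]
        System.out.println(viaFlag.equals(viaInline)); // prints true
    }
}
```

The flag form applies to the whole pattern, while the inline form can be placed partway through a pattern, as the a(?i)bc case above shows.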

3.3.2 | means "or". a|bcd matches when either alternative, a or bcd, matches.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abcabdabc";

        // Compile the pattern ab|ca
        Pattern p1 = Pattern.compile("ab|ca");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        System.out.println("ab|ca matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
    }
}
Output:
ab|ca matches:
ab
ca
ab

3.3.3 {m} matches the preceding expression exactly m times. {m,} matches it at least m times. {m,n} matches it at least m times and at most n times.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "aababaaabaaaab";

        Pattern p1 = Pattern.compile("a{2}");
        Pattern p2 = Pattern.compile("a{2,}");
        Pattern p3 = Pattern.compile("a{2,3}");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);
        Matcher m3 = p3.matcher(str);

        System.out.println("a{2} matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }

        System.out.println("a{2,} matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }

        System.out.println("a{2,3} matches:");
        while (m3.find()) {
            System.out.println(m3.group(0));
        }
    }
}
Output:
a{2} matches:
aa
aa
aa
aa
a{2,} matches:
aa
aaa
aaaa
a{2,3} matches:
aa
aaa
aaa

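+ matches the preceding expression one or more times, equivalent to {1,}. ? matches it zero or one time, equivalent to {0,1}. * matches it zero or more times, equivalent to {0,}.
Java's quantifiers (+, *, ?, {m,}, {m,n}) are greedy by default: they take as many repetitions as the rest of the pattern allows. Appending a ? to a quantifier makes it non-greedy (reluctant), so it takes as few repetitions as possible.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "baaaaaac";

        // Greedy matching (the default)
        Pattern p1 = Pattern.compile("a{3,4}");
        // Non-greedy matching
        Pattern p2 = Pattern.compile("a{3,4}?");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        System.out.println("a{3,4} matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
        System.out.println("a{3,4}? matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }
    }
}
Output:
a{3,4} matches:
aaaa
a{3,4}? matches:
aaa
aaa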
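
The difference matters most with open-ended quantifiers such as .+, where a greedy match can swallow far more than intended. The sketch below contrasts <.+> with <.+?> on a small tag-like string. The class GreedyDemo, the findAll helper and the sample text are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Collects every substring of text matched by the regex.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "<b>bold</b>";
        // Greedy: .+ runs to the last '>' in the text.
        System.out.println(findAll("<.+>", text));  // prints [<b>bold</b>]
        // Reluctant: .+? stops at the first '>' it can.
        System.out.println(findAll("<.+?>", text)); // prints [<b>, </b>]
    }
}
```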

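3.4. Anchors: these describe a position in the text rather than a character
3.4.1 ^ marks the start position and $ marks the end position
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "b2b3";

        // Rule: starts with one lowercase letter, continues with zero or more
        // word characters, and ends with a digit
        Pattern p1 = Pattern.compile("^[a-z]\\w*\\d$");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        System.out.println("^[a-z]\\\\w*\\\\d$ matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
    }
}
Output:
^[a-z]\\w*\\d$ matches:
b2b3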

3.4.2 \\b matches at a word boundary; \\B matches at a position that is not a word boundary
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abc def";

        // Rules that place \b or \B at different positions
        Pattern p1 = Pattern.compile("abc\\b def");
        Pattern p2 = Pattern.compile("abc def\\b");

        Pattern p3 = Pattern.compile("abc def");
        Pattern p4 = Pattern.compile("abc def\\B");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);
        Matcher m3 = p3.matcher(str);
        Matcher m4 = p4.matcher(str);

        System.out.println("abc\\\\b def matches:");
        while (m1.find()) {
            System.out.println(m1.group(0));
        }
        System.out.println("abc def\\\\b matches:");
        while (m2.find()) {
            System.out.println(m2.group(0));
        }
        System.out.println("abc def matches:");
        while (m3.find()) {
            System.out.println(m3.group(0));
        }
        // The end of the text is a word boundary, so \B cannot match there
        System.out.println("abc def\\\\B matches:");
        while (m4.find()) {
            System.out.println(m4.group(0));
        }
    }
}
Output:
abc\\b def matches:
abc def
abc def\\b matches:
abc def
abc def matches:
abc def
abc def\\B matches:
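
A common use of \\b is matching a whole word rather than a fragment of a longer one. The following sketch compares the plain pattern cat with \\bcat\\b. The class WordBoundaryDemo, its startIndices helper and the sample sentence are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Returns the start index of every match of regex in text.
    static List<Integer> startIndices(String regex, String text) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "cat concat cats cat.";
        // Without boundaries, "cat" is also found inside "concat" and "cats".
        System.out.println(startIndices("cat", text));       // prints [0, 7, 11, 16]
        // With \b on both sides, only the standalone words match.
        System.out.println(startIndices("\\bcat\\b", text)); // prints [0, 16]
    }
}
```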

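4. Groups
4.1 Capturing groups -> Parentheses divide each matched substring into groups; the part of the text matched by each group is captured and stored so it can be retrieved later
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abc1234def2234";

        // Two capturing groups of two digits each
        Pattern p1 = Pattern.compile("(\\d\\d)(\\d\\d)");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        int i = 1;

        System.out.println("Matches:");
        while (m1.find()) {
            System.out.println("Match " + (i++) + ":");
            // Group 0 is the whole match; groups 1 and 2 are the captured parts
            System.out.println(m1.group(0));
            System.out.println(m1.group(1));
            System.out.println(m1.group(2));
        }
    }
}
Output:
Matches:
Match 1:
1234
12
34
Match 2:
2234
22
34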

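4.1.1 How find and group behave with capturing groups
4.1.1.1 The find method
1) Scans the text according to the pattern and locates the next substring that matches.
2) Stores the start index of that substring in groups[0] of the Matcher's int[] groups field, and the end index + 1 in groups[1].
Stores the start index of the first group in groups[2] and the first group's end index + 1 in groups[3].
...and so on for each further group.
3) Records the end index + 1 of the whole match (OpenJDK's Matcher keeps it in fields such as last and oldLast) so that the next call to find resumes from that position.

4.1.1.2 The group method
Key code: return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
CharSequence getSubSequence(int beginIndex, int endIndex) {
    return text.subSequence(beginIndex, endIndex);
}
m.group(0) returns the whole matched substring.
m.group(1) returns the part of the matched substring captured by the first group.
...and so on for each further group.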
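
The per-group index pairs are exposed publicly through groupCount(), start(int) and end(int). The sketch below lists them for the first match of the pattern used above. GroupSpans and spansOfFirstMatch are hypothetical names introduced for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSpans {
    // For the first match, returns "group:text@start-end" for group 0 and each capturing group.
    static List<String> spansOfFirstMatch(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            // groupCount() excludes group 0, hence the <= in the loop condition
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(g + ":" + m.group(g) + "@" + m.start(g) + "-" + m.end(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0:1234@3-7, 1:12@3-5, 2:34@5-7]
        System.out.println(spansOfFirstMatch("(\\d\\d)(\\d\\d)", "abc1234def2234"));
    }
}
```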

4.1.2 Unnamed and named capturing groups: a capturing group can be given a name when the pattern is written, which makes it easier to look up
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abc1234def2234";

        // A named capturing group g1 followed by an unnamed capturing group
        Pattern p1 = Pattern.compile("(?<g1>\\d\\d)(\\d\\d)");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        int i = 1;

        System.out.println("Matches:");
        while (m1.find()) {
            System.out.println("Match " + (i++) + ":");
            System.out.println(m1.group(0));
            // A named group can be read by number or by name
            System.out.println(m1.group(1));
            System.out.println(m1.group("g1"));
            System.out.println(m1.group(2));
        }
    }
}
Output:
Matches:
Match 1:
1234
12
12
34
Match 2:
2234
22
22
34
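
Captured groups are also useful for the replacement use case mentioned at the start of this article. In a replacement string, $n refers to group n and ${name} refers to a named group. The sketch below swaps the two digit pairs of every match. The class GroupReplaceDemo and the method swapPairs are illustrative names:

```java
import java.util.regex.Pattern;

class GroupReplaceDemo {
    private static final Pattern PAIRS = Pattern.compile("(?<g1>\\d\\d)(\\d\\d)");

    // Rewrites every four-digit match so that the second pair comes first.
    static String swapPairs(String text) {
        // $2 is the unnamed second group; ${g1} is the named first group
        return PAIRS.matcher(text).replaceAll("$2${g1}");
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("abc1234def2234")); // prints abc3412def3422
    }
}
```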

4.2 Non-capturing groups
4.2.1 (?:Pattern) matches Pattern but does not capture the matched subexpression
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "abc12def#2234#32ij";

        // A capturing group (\d\d), a non-capturing group (?:\d\d|[a-z]{2}),
        // and an ungrouped [a-z]*
        Pattern p1 = Pattern.compile("(\\d\\d)(?:\\d\\d|[a-z]{2})[a-z]*");

        // Create a matcher that binds the pattern to the text
        Matcher m1 = p1.matcher(str);

        int i = 1;

        System.out.println("Matches:");
        while (m1.find()) {
            System.out.println("Match " + (i++) + ":");
            // Only group 1 exists; the non-capturing group has no number
            System.out.println(m1.group(0));
            System.out.println(m1.group(1));
        }
    }
}
Output:
Matches:
Match 1:
12def
12
Match 2:
2234
22
Match 3:
32ij
32

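4.2.2 Pattern1(?=Pattern2): when the text matches Pattern1, the text immediately after it is also checked against Pattern2, and Pattern1 matches only if Pattern2 matches there.
Pattern1(?!Pattern2): when the text matches Pattern1, the text immediately after it is also checked against Pattern2, and Pattern1 matches only if Pattern2 does not match there.
In both cases the text checked by Pattern2 is not part of the match.
Scenario:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The text to process
        String str = "Windows 98#Windows XP#Windows 7";

        // "Windows " followed by 98 or 7, and "Windows " not followed by 98 or 7
        Pattern p1 = Pattern.compile("Windows (?=98|7)");
        Pattern p2 = Pattern.compile("Windows (?!98|7)");

        // Create matchers that bind each pattern to the text
        Matcher m1 = p1.matcher(str);
        Matcher m2 = p2.matcher(str);

        int i = 1;

        System.out.println("(?=98|7) matches:");
        while (m1.find()) {
            System.out.println("Match " + (i++) + ":");
            System.out.println(m1.group(0));
        }

        i = 1;
        System.out.println("(?!98|7) matches:");
        while (m2.find()) {
            System.out.println("Match " + (i++) + ":");
            System.out.println(m2.group(0));
        }
    }
}
Output (each match is "Windows " including the trailing space, and never the version that follows):
(?=98|7) matches:
Match 1:
Windows
Match 2:
Windows
(?!98|7) matches:
Match 1:
Windows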
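
Because the lookahead only checks what follows and does not consume it, it is handy for extracting a value that must be followed by a particular suffix. A minimal sketch, assuming a hypothetical LookaheadDemo class and a made-up sample string of CSS-like lengths:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Collects every substring of text matched by the regex.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Digits that are immediately followed by "px"; the "px" itself is not returned.
        System.out.println(findAll("\\d+(?=px)", "10px 20em 30px")); // prints [10, 30]
    }
}
```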
