Java String internals: study notes on the underlying implementation of String

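1. Definition of String

public final class String implements java.io.Serializable, Comparable<String>, CharSequence { }

String is a final class, so it cannot be subclassed.

String implements the java.io.Serializable interface, so String objects can be serialized.

String implements Comparable<String>, so strings can be ordered. The comparison walks both strings in order and compares their chars (UTF-16 code unit values).

String implements the CharSequence interface, meaning it is an ordered sequence of characters; in JDK 8 a String is backed by a char array.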

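2. Fields

/** The char array that stores the characters. This is what a String really is: a sequence of chars. The field is final, so the reference can never be reassigned, and String never modifies or exposes the array, which is what makes a String immutable. */
private final char value[];

/** Cache of the hash code, 0 by default */
private int hash; // Default to 0

/** Serialization identifier */
private static final long serialVersionUID = -6849794470754667710L;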

3. Constructors

String has 16 constructors.

/**
 * The no-arg constructor creates an empty string. We rarely instantiate a String this way:
 * since strings are immutable, we can simply write String a = "aaaaa";
 */
public String() {
    this.value = "".value;
}

/**
 * Constructor taking a String object.
 * It assigns the argument's value and hash to the new instance.
 * The result is a new String object, but only a shallow copy: it shares the argument's
 * value array, which is safe because that array is never modified.
 * Constructing strings this way is still not recommended.
 */
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

/**
 * Constructor taking a char array.
 * The array is copied with Arrays.copyOf instead of being referenced directly, so later
 * changes to the caller's array cannot change the String.
 * In short: build a new String object from a char array.
 */
public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}

/**
 * Constructor taking a char array, offset (start position) and count (number of chars).
 * It takes count chars of the array starting at offset and builds a new String from them,
 * much like cutting a slice of count chars out of the array.
 */
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    // Everything above validates the arguments and throws on bad input.
    // The key step: copy the requested range, again with an array copy.
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}

/**
 * Constructor taking an int array, offset and count.
 * It works like the previous (char[], offset, count) constructor, but takes an int array
 * instead of a char array; each int is a Unicode code point.
 * Example: new String(new int[]{97, 98, 99}, 0, 3); // output: abc
 */
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    final int end = offset + count;

    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;  // a supplementary code point needs two chars (a surrogate pair)
        else throw new IllegalArgumentException(Integer.toString(c));
    }

    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];
    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }

    this.value = v;
}

Deprecated constructors are not listed here.

/**
 * Constructor taking a byte array, offset (start position), length and a charset name.
 * It decodes length bytes starting at offset, using the encoding named by charsetName, e.g. UTF-8.
 * Example: new String(bytes, 2, 3, "UTF-8");
 */
public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charsetName, bytes, offset, length);
}

// Same as the one above, but takes a Charset object
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}

/**
 * Constructor taking a byte array and a charset name.
 * It decodes the whole byte array with the charset named by charsetName and builds a String from it.
 */
public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}

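// Similar to the one above, but takes a Charset object
public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

/**
 * Constructor taking a byte array, offset (start position) and length (number of bytes).
 * Constructs a new String by decoding the given byte subarray with the platform's default charset.
 */
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}

/**
 * Constructor taking a byte array.
 * Decodes the whole byte array with the platform's default charset; no slicing.
 */
public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}

/**
 * Constructor taking a StringBuffer.
 * Builds a new String from the StringBuffer. What is special here is the synchronized block:
 * it holds the buffer's lock while copying, so no other thread can modify the buffer
 * (StringBuffer's own methods synchronize on the same object) while its contents are copied.
 */
public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

/**
 * Constructor taking a StringBuilder.
 * Same as the one above, but for StringBuilder; the difference is that there is no
 * synchronization, so it is not thread-safe.
 */
public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}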

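/**
 * This constructor is special: the only meaningful parameter is the char array value.
 * It is package-private (no access modifier), so it is not exposed publicly.
 * The share parameter exists only to distinguish it from String(char[] value) for overloading.
 * Why provide it? For performance: no copy is needed. Why not expose it? Because exposing it
 * would break the guarantee that value never changes: if a String were tied to an external
 * array, modifying that array would modify the String. So it must not be public.
 */
String(char[] value, boolean share) {
    // assert share : "unshared not supported";
    this.value = value;
}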

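The code above shows 14 constructors in total, leaving out the two deprecated ones. A String can be constructed:

as an empty string object, i.e. ""

from a String, StringBuilder or StringBuffer

from a char array or a subarray of it

from an int array of code points or a subarray of it

by decoding a byte array, or a subarray of it, with a given charset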
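To see a few of these constructors in action, here is a small demo that uses only the public JDK 8 API. The class and method names (ConstructorDemo, defensiveCopy and so on) are made up for illustration, so treat it as a sketch rather than a look at the JDK internals. It shows in particular that changing the source char array or StringBuilder after construction does not affect the String, which is what the array copies above guarantee.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {

    // String(char[]) copies the array, so later writes to the array are not visible
    static String defensiveCopy() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'X';                 // modifies only the caller's array
        return s;                       // still "abc"
    }

    // String(char[], offset, count): takes count chars starting at offset
    static String slice() {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        return new String(chars, 1, 3); // "ell"
    }

    // String(int[], offset, count): each int is a Unicode code point
    static String fromCodePoints() {
        return new String(new int[]{97, 98, 99}, 0, 3); // "abc"
    }

    // String(byte[], Charset): decodes the bytes with an explicit charset
    static String fromUtf8() {
        byte[] bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // UTF-8 encoding of U+4E2D
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // String(StringBuilder): copies the builder's current contents
    static String fromBuilder() {
        StringBuilder sb = new StringBuilder("ab");
        String s = new String(sb);
        sb.append('c');                 // does not affect s
        return s;                       // "ab"
    }

    public static void main(String[] args) {
        System.out.println(defensiveCopy());                  // abc
        System.out.println(slice());                          // ell
        System.out.println(fromCodePoints());                 // abc
        System.out.println(fromUtf8().equals("\u4E2D"));      // true
        System.out.println(fromBuilder());                    // ab
    }
}
```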

4. length() and isEmpty()

public int length() {
    return value.length;
}

public boolean isEmpty() {
    return value.length == 0;
}

5. charAt, codePointAt and related methods

/** Returns the char at position index of the String's char array */
public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

// Returns the Unicode code point (as an int) at position index;
// if the char at index is a high surrogate followed by a low surrogate, the pair is combined into one code point
public int codePointAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, index, value.length);
}

// Returns the Unicode code point (as an int) just before position index
public int codePointBefore(int index) {
    int i = index - 1;
    if ((i < 0) || (i >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointBeforeImpl(value, index, 0);
}

/**
 * Returns the number of code points, i.e. the number of actual characters, in [beginIndex, endIndex); similar to length().
 * For an ordinary String, length() and codePointCount() give the same result.
 * They differ when the String contains supplementary characters, each of which is stored as a surrogate pair of two chars.
 * Example: for String str = "\uD835\uDD6B" (a single supplementary character, U+1D56B),
 * length() = 2 but codePointCount(0, str.length()) = 1
 */
public int codePointCount(int beginIndex, int endIndex) {
    if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
        throw new IndexOutOfBoundsException();
    }
    return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
}

/**
 * Also relevant only for supplementary characters: starting at index, moves forward by codePointOffset
 * code points and returns the resulting index.
 * Example: index = 2, codePointOffset = 3 returns 5 if there are no surrogate pairs in that range;
 * every surrogate pair crossed adds one more char.
 */
public int offsetByCodePoints(int index, int codePointOffset) {
    if (index < 0 || index > value.length) {
        throw new IndexOutOfBoundsException();
    }
    return Character.offsetByCodePointsImpl(value, 0, value.length,
            index, codePointOffset);
}
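A quick way to see the difference between chars and code points is to build a string that contains one surrogate pair. The demo below is an illustrative sketch (CodePointDemo, SUPP and lengths are hypothetical names) that reuses the "\uD835\uDD6B" example from the comment above.

```java
class CodePointDemo {

    // "\uD835\uDD6B" is one supplementary character (U+1D56B) stored as two chars
    static final String SUPP = "\uD835\uDD6B";

    // length() counts chars (UTF-16 code units); codePointCount counts code points
    static int[] lengths(String s) {
        return new int[]{s.length(), s.codePointCount(0, s.length())};
    }

    public static void main(String[] args) {
        String s = "a" + SUPP + "b";
        int[] l = lengths(s);
        System.out.println(l[0] + " " + l[1]);                      // 4 3
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1d56b: the pair is combined
        System.out.println((int) s.charAt(1) == 0xD835);            // true: charAt sees only the high surrogate
        System.out.println(s.codePointBefore(3) == 0x1D56B);        // true
        System.out.println(s.offsetByCodePoints(0, 2));             // 3: skips "a" and the whole pair
    }
}
```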

6. getChars and getBytes

/**
 * This method is not public; it is meant for String's internal use. It has no access modifier,
 * so only classes in the same package can call it.
 * Parameters: dst[] is the destination array; dstBegin is the offset in the destination,
 * i.e. the position in dst where the copy starts (where dst starts being overwritten).
 * It copies the String's whole value array into dst, starting at dst[dstBegin].
 */
void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, 0, dst, dstBegin, value.length);
}

/**
 * Gets chars: getChars() copies characters of this string into a destination char array.
 * Parameters: srcBegin is the start position in the string; srcEnd is the position just after
 * the last char to copy (the copied range does not include srcEnd);
 * dst[] is the destination char array; dstBegin is the offset in dst where the copied chars
 * start overwriting it.
 */
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

/** Encodes the string into a byte array using the charset named charsetName */
public byte[] getBytes(String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null) throw new NullPointerException();
    return StringCoding.encode(charsetName, value, 0, value.length);
}

/** Same as the previous method, but takes a Charset object */
public byte[] getBytes(Charset charset) {
    if (charset == null) throw new NullPointerException();
    return StringCoding.encode(charset, value, 0, value.length);
}

/** Encodes the string into a byte array using the platform's default charset */
public byte[] getBytes() {
    return StringCoding.encode(value, 0, value.length);
}
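The public getChars overload and getBytes can be exercised like this. The class and helper names are hypothetical; the sketch passes StandardCharsets constants instead of charset names, which avoids the checked UnsupportedEncodingException, and shows that the number of bytes depends on the encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GetCharsDemo {

    // getChars(srcBegin, srcEnd, dst, dstBegin): copies [srcBegin, srcEnd) into dst at dstBegin
    static char[] copyMiddle() {
        char[] dst = {'*', '*', '*', '*', '*'};
        "hello".getChars(1, 4, dst, 1);   // copies "ell" into dst[1..3]
        return dst;
    }

    // The same text produces a different number of bytes in different encodings
    static int[] byteCounts(String s) {
        return new int[]{
            s.getBytes(StandardCharsets.UTF_8).length,    // 'a', 'b': 1 byte each; U+4E2D: 3 bytes
            s.getBytes(StandardCharsets.UTF_16BE).length  // 2 bytes per char
        };
    }

    public static void main(String[] args) {
        System.out.println(new String(copyMiddle()));                // *ell*
        System.out.println(Arrays.toString(byteCounts("ab\u4E2D")));  // [5, 6]
    }
}
```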

7. equals and related methods

// First compare references: the same object is trivially equal. Then check that the argument is a String,
// compare the lengths, and only if the lengths are equal compare the chars one by one
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

// Case-insensitive content comparison: check the reference first, then the length,
// and finally call regionMatches with ignoreCase = true
public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true
            : (anotherString != null)
            && (anotherString.value.length == value.length)
            && regionMatches(true, 0, anotherString, 0, value.length);
}

/**
 * A public comparison method taking a StringBuffer.
 * It simply calls contentEquals(CharSequence cs), so you could call it the StringBuffer edition;
 * the method it delegates to synchronizes on the buffer.
 */
public boolean contentEquals(StringBuffer sb) {
    return contentEquals((CharSequence)sb);
}

/**
 * A private helper used specifically for comparing with StringBuffer and StringBuilder,
 * e.g. inside contentEquals; its parameter is a subclass of the abstract class AbstractStringBuilder.
 */
private boolean nonSyncContentEquals(AbstractStringBuilder sb) {
    char v1[] = value;
    char v2[] = sb.getValue();
    int n = v1.length;
    if (n != sb.length()) {
        return false;
    }
    for (int i = 0; i < n; i++) {
        if (v1[i] != v2[i]) {
            return false;
        }
    }
    return true;
}

/**
 * Commonly used to compare a String object with a StringBuffer or StringBuilder.
 * The argument can be a StringBuffer, a StringBuilder, a String or any other CharSequence;
 * StringBuffer, StringBuilder and String all implement the CharSequence interface.
 */
public boolean contentEquals(CharSequence cs) {
    // Argument is a StringBuffer, StringBuilder
    if (cs instanceof AbstractStringBuilder) {
        if (cs instanceof StringBuffer) {
            synchronized(cs) {
                return nonSyncContentEquals((AbstractStringBuilder)cs);
            }
        } else {
            return nonSyncContentEquals((AbstractStringBuilder)cs);
        }
    }
    // Argument is a String
    if (cs instanceof String) {
        return equals(cs);
    }
    // Argument is a generic CharSequence
    char v1[] = value;
    int n = v1.length;
    if (n != cs.length()) {
        return false;
    }
    for (int i = 0; i < n; i++) {
        if (v1[i] != cs.charAt(i)) {
            return false;
        }
    }
    return true;
}

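Key points about the code above:

equals() is a commonly used method whose layered structure is worth learning from: it first checks whether both references point to the same object, then whether the argument has the right type, then whether the two lengths are equal.

In other words, it first filters out non-matching objects with cheap, broad checks, and only then compares the remaining candidates character by character.

equalsIgnoreCase() complements equals() with a case-insensitive comparison.

contentEquals() compares a String with four kinds of argument (String, StringBuilder, StringBuffer and a generic CharSequence). It is typically used against StringBuilder and StringBuffer and is another complement to equals().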
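A minimal sketch contrasting ==, equals, equalsIgnoreCase and contentEquals; EqualsDemo and compareWithBuilder are illustrative names only.

```java
class EqualsDemo {

    // equals only accepts another String; contentEquals accepts any CharSequence
    static boolean[] compareWithBuilder(String s, CharSequence cs) {
        return new boolean[]{s.equals(cs), s.contentEquals(cs)};
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");           // different object, same contents
        StringBuilder sb = new StringBuilder("hello");

        System.out.println(a == b);               // false: different references
        System.out.println(a.equals(b));          // true: same chars
        System.out.println(a.equals(sb));         // false: sb is not a String
        System.out.println(a.contentEquals(sb));  // true: compares the contents
        System.out.println(a.equalsIgnoreCase("HeLLo")); // true
    }
}
```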

8. regionMatches()

/**
 * Similar to equals, but compares regions of the strings, i.e. only parts of them.
 * toffset is the start position (offset) of the region in this string; other is the String to compare with;
 * ooffset is the start position of the region in other; len is the length of the regions to compare.
 * Example: String str1 = "0123456", str2 = "0123456789";
 * str1.regionMatches(0, str2, 0, 6); compares the 6 chars of str1 starting at 0 with the 6 chars of str2 starting at 0.
 * Returns true if the regions are equal, false otherwise.
 */
public boolean regionMatches(int toffset, String other, int ooffset,
        int len) {
    char ta[] = value;
    int to = toffset;
    char pa[] = other.value;
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0)
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

/**
 * Same as the method above, with one extra parameter, ignoreCase, which controls whether case is ignored.
 * It is the region version of equalsIgnoreCase(); in fact equalsIgnoreCase() is implemented by calling regionMatches.
 */
public boolean regionMatches(boolean ignoreCase, int toffset,
        String other, int ooffset, int len) { … }

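From the above we can see:

Region comparison takes a String argument, so to compare against a StringBuffer or StringBuilder, remember to call toString() on it first.

To compare parts of two strings, use regionMatches; to compare whole strings, just use equals.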
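The regionMatches example from the comment above, together with the ignoreCase variant and the toString() advice, can be checked with a short program (RegionMatchesDemo is a hypothetical name):

```java
class RegionMatchesDemo {
    public static void main(String[] args) {
        String str1 = "0123456";
        String str2 = "0123456789";
        // 6 chars of str1 from index 0 vs 6 chars of str2 from index 0
        System.out.println(str1.regionMatches(0, str2, 0, 6));              // true
        // Region runs past the end of str1: returns false instead of throwing
        System.out.println(str1.regionMatches(5, str2, 0, 6));              // false
        // ignoreCase variant: "ell" vs "ELL"
        System.out.println("Hello".regionMatches(true, 1, "XELLX", 1, 3));  // true
        // For a StringBuilder, convert it with toString() first
        StringBuilder sb = new StringBuilder("xx0123");
        System.out.println(str1.regionMatches(0, sb.toString(), 2, 4));     // true
    }
}
```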

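9. compareTo and the CaseInsensitiveComparator static nested class

/**
 * Compares two strings character by character. String implements the Comparable interface, so it overrides compareTo.
 * Comparable is the ordering interface: a class that implements it supports sorting.
 * Lists or arrays of objects whose class implements Comparable can be sorted automatically with Collections.sort or Arrays.sort.
 * The parameter is the other String to compare with.
 * Returns an int: positive means this string is greater, negative means smaller; the comparison is based on char (UTF-16 code unit) values.
 * Examples: "abc".compareTo("bac") is -1, because 'a' - 'b' = -1
 * "abc".compareTo("abcggff") is -4, because the length difference is 3 - 7 = -4
 */
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) { // iterate up to the length of the shorter string
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2; // at the first differing position, return the difference of the two char values (an int)
        }
        k++;
    }
    return len1 - len2; // all compared positions are equal: return the length difference, 0 if the strings are identical
}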

/**
 * Similar in function to compareTo, but it is not the Comparable method; it is String's own method.
 * So far I only know that it can compare strings case-insensitively; I do not know how to make
 * the utility classes Collections and Arrays use it.
 */
public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}

/**
 * An eagerly initialized singleton: String's case-insensitive comparator.
 * It is provided for the sort methods of Collections and Arrays.
 * Example: Arrays.sort(strs, String.CASE_INSENSITIVE_ORDER);
 * This sorts the strings in the array strs while ignoring case.
 */
public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                     = new CaseInsensitiveComparator();

/**
 * A private static nested class, usable only inside String itself.
 * It implements Serializable and Comparator (note that Comparable and Comparator are different interfaces)
 * and overrides compare; this nested class is in effect a comparator for Strings.
 */
private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }

    /** Replaces the de-serialized object. */
    private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}

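From the code above:

① String implements Comparable and overrides compareTo, so strings can be ordered in your own code, and arrays or collections of strings can be sorted with the sort methods of Collections and Arrays. Sorting with these utility methods without passing a Comparator only works when the elements implement Comparable and provide compareTo.

② CASE_INSENSITIVE_ORDER is a singleton comparator that String exposes to outside code. It compares strings while ignoring case; we can pass it as the comparator argument to the sort methods of Collections or Arrays.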
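The sketch below shows compareTo's return values and three ways of sorting. The notes above mention not knowing how to hand compareToIgnoreCase to the sort utilities; on Java 8 one option, assumed here, is the method reference String::compareToIgnoreCase, which fits the Comparator<String> interface. Class and method names are illustrative.

```java
import java.util.Arrays;

class CompareDemo {

    // Natural ordering uses compareTo, which compares char values, so 'C' (67) < 'a' (97)
    static String[] sortNatural(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Case-insensitive ordering with the CASE_INSENSITIVE_ORDER comparator
    static String[] sortIgnoreCase(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    // compareToIgnoreCase passed as a Comparator through a method reference (Java 8+)
    static String[] sortWithMethodRef(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy, String::compareToIgnoreCase);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("abc".compareTo("bac"));      // -1
        System.out.println("abc".compareTo("abcggff"));  // -4
        String[] words = {"banana", "Cherry", "apple"};
        System.out.println(Arrays.toString(sortNatural(words)));       // [Cherry, apple, banana]
        System.out.println(Arrays.toString(sortIgnoreCase(words)));    // [apple, banana, Cherry]
        System.out.println(Arrays.toString(sortWithMethodRef(words))); // [apple, banana, Cherry]
    }
}
```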

10. startsWith and endsWith

/**
 * Checks whether the part of this string that begins at index toffset starts with prefix.
 */
public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = toffset;
    char pa[] = prefix.value;
    int po = 0;
    int pc = prefix.value.length;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > value.length - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

/**
 * Checks whether this string starts with prefix.
 * Returns true if it does, false otherwise.
 */
public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

/**
 * Checks whether this string ends with suffix.
 * Returns true if it does, false otherwise.
 */
public boolean endsWith(String suffix) {
    // suffix is the string to test as the tail;
    // value.length - suffix.value.length is where suffix would start in this string
    return startsWith(suffix, value.length - suffix.value.length);
}
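A short demo of startsWith and endsWith, including the offset overload that endsWith is built on (PrefixDemo is a made-up name):

```java
class PrefixDemo {
    public static void main(String[] args) {
        String s = "HelloWorld.java";
        System.out.println(s.startsWith("Hello"));       // true
        System.out.println(s.startsWith("World", 5));    // true: the check begins at index 5
        System.out.println(s.endsWith(".java"));         // true
        // endsWith(suffix) is startsWith(suffix, length() - suffix.length())
        String suffix = ".java";
        System.out.println(s.startsWith(suffix, s.length() - suffix.length())); // true
        System.out.println(s.startsWith(""));            // true: an empty prefix always matches
        System.out.println("ab".endsWith("abc"));        // false: the computed offset is negative
    }
}
```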

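11. hashCode()

/**
 * String overrides Object's hashCode method.
 * It is used by hash-table-based data structures, for example when String objects are put into a HashMap.
 * If hashCode were not overridden consistently with equals, equal strings could land in different buckets
 * and lookups would fail; if it produced poorly distributed values, many strings would collide and
 * performance would degrade.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        // The key part: String's hash function, looping len times
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i]; // each step: 31 * the h from the previous step + the value of the i-th char
        }
        hash = h; // cache the result in the hash field
    }
    return h;
}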

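So we can see:

The heart of hashCode is the hash function.

String's hash function loops len times; each iteration computes h = 31 * h + value[i], where h is the result of the previous iteration and value[i] is the i-th char. Unrolled, for a string s of length n this is h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int overflow. The result is cached in the hash field, so it is normally computed only once (unless the hash itself happens to be 0).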
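To make the hash function concrete, here is a small re-implementation of the loop from hashCode() using only public methods; polyHash is a hypothetical helper, not part of String. Since the String.hashCode() contract specifies exactly this polynomial, the results should agree with the JDK. For example, for "ab" the loop gives 97 * 31 + 98 = 3105.

```java
class StringHashDemo {

    // Recomputes the String hash: h = 31 * h + c for each char, with int overflow
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ab"));                                     // 3105
        System.out.println(polyHash("ab") == "ab".hashCode());                  // true
        System.out.println(polyHash("hello world") == "hello world".hashCode()); // true
        System.out.println("".hashCode());                                      // 0
    }
}
```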

12. indexOf and lastIndexOf

/**
 * Returns the index of the first occurrence of the character ch in the string, scanning from index 0;
 * returns -1 if it does not occur.
 */
public int indexOf(int ch) {
    return indexOf(ch, 0);
}

public int indexOf(int ch, int fromIndex) {
    final int max = value.length;
    if (fromIndex < 0) {
        fromIndex = 0;
    } else if (fromIndex >= max) {
        // Note: fromIndex might be near -1>>>1.
        return -1;
    }

    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // most cases: ch is a BMP code point (or an invalid negative value)
        final char[] value = this.value;
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == ch) {
                return i;
            }
        }
        return -1;
    } else {
        return indexOfSupplementary(ch, fromIndex);
    }
}

/**
 * Handles (rare) calls of indexOf with a supplementary character.
 */
private int indexOfSupplementary(int ch, int fromIndex) {
    if (Character.isValidCodePoint(ch)) {
        final char[] value = this.value;
        final char hi = Character.highSurrogate(ch);
        final char lo = Character.lowSurrogate(ch);
        final int max = value.length - 1;
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == hi && value[i + 1] == lo) {
                return i;
            }
        }
    }
    return -1;
}

/**
 * Scans from the end towards the start, with value.length - 1 as the starting point,
 * and returns the index of the first match found.
 * Put more simply: it returns the index of the last occurrence of the character ch in the string.
 * ch is the integer value of the character.
 */
public int lastIndexOf(int ch) {
    return lastIndexOf(ch, value.length - 1);
}

/**
 * Scans from the end towards the start, beginning at fromIndex, and returns the index of the first match found;
 * equivalently, it returns the index of the last occurrence of ch at or before fromIndex.
 */
public int lastIndexOf(int ch, int fromIndex) {
    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // most cases: ch is a BMP code point (or an invalid negative value)
        final char[] value = this.value;
        int i = Math.min(fromIndex, value.length - 1);
        for (; i >= 0; i--) {
            if (value[i] == ch) {
                return i;
            }
        }
        return -1;
    } else {
        return lastIndexOfSupplementary(ch, fromIndex);
    }
}

/**
 * Handles (rare) calls of lastIndexOf with a supplementary character.
 */
private int lastIndexOfSupplementary(int ch, int fromIndex) {
    if (Character.isValidCodePoint(ch)) {
        final char[] value = this.value;
        char hi = Character.highSurrogate(ch);
        char lo = Character.lowSurrogate(ch);
        int i = Math.min(fromIndex, value.length - 2);
        for (; i >= 0; i--) {
            if (value[i] == hi && value[i + 1] == lo) {
                return i;
            }
        }
    }
    return -1;
}

// Returns the index of the first occurrence of str, scanning from index 0
public int indexOf(String str) {
    return indexOf(str, 0);
}

/**
 * Returns the index of the first occurrence of str, scanning from fromIndex
 */
public int indexOf(String str, int fromIndex) {
    return indexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

/**
 * A static function that is not public.
 * source is the source string's chars; sourceOffset is the offset (start position) in source;
 * sourceCount is the length of the source string; target is the string to search for;
 * fromIndex is the index in the source string where the search starts.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        String target, int fromIndex) {
    return indexOf(source, sourceOffset, sourceCount,
                   target.value, 0, target.value.length,
                   fromIndex);
}

/**
 * Another non-public static function, more general than the one above:
 * it adds targetOffset and targetCount, so the string being searched for can also be a slice.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

// Returns the index of the last occurrence of str
public int lastIndexOf(String str) {
    return lastIndexOf(str, value.length);
}

public int lastIndexOf(String str, int fromIndex) {
    return lastIndexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
        String target, int fromIndex) {
    return lastIndexOf(source, sourceOffset, sourceCount,
                   target.value, 0, target.value.length,
                   fromIndex);
}

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    /*
     * Check arguments; return immediately where possible. For
     * consistency, don't check for null str.
     */
    int rightIndex = sourceCount - targetCount;
    if (fromIndex < 0) {
        return -1;
    }
    if (fromIndex > rightIndex) {
        fromIndex = rightIndex;
    }
    /* Empty string always matches. */
    if (targetCount == 0) {
        return fromIndex;
    }

    int strLastIndex = targetOffset + targetCount - 1;
    char strLastChar = target[strLastIndex];
    int min = sourceOffset + targetCount - 1;
    int i = min + fromIndex;

startSearchForLastChar:
    while (true) {
        while (i >= min && source[i] != strLastChar) {
            i--;
        }
        if (i < min) {
            return -1;
        }
        int j = i - 1;
        int start = j - (targetCount - 1);
        int k = strLastIndex - 1;

        while (j > start) {
            if (source[j--] != target[k--]) {
                i--;
                continue startSearchForLastChar;
            }
        }
        return start - sourceOffset + 1;
    }
}
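The core idea of the static indexOf(char[] ...) helper is: scan for the first char of the target, then compare the rest. The sketch below is a simplified, hypothetical re-implementation of that idea over whole strings (no sourceOffset or targetOffset), written with public methods only so that it can be checked against String.indexOf; it is not the JDK code itself.

```java
class IndexOfDemo {

    // Hypothetical helper: first-char scan followed by a comparison of the remaining chars
    static int naiveIndexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex >= n) {
            return m == 0 ? n : -1;          // same edge-case rule as the JDK helper
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (m == 0) {
            return fromIndex;                // empty target matches immediately
        }
        char first = target.charAt(0);
        int max = n - m;                     // last index where target can still fit
        for (int i = fromIndex; i <= max; i++) {
            if (source.charAt(i) != first) {
                continue;                    // look for the first character
            }
            int k = 1;
            while (k < m && source.charAt(i + k) == target.charAt(k)) {
                k++;                         // compare the rest of target
            }
            if (k == m) {
                return i;                    // found the whole target
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "abcabc";
        System.out.println(s.indexOf('c'));            // 2
        System.out.println(s.indexOf('c', 3));         // 5
        System.out.println(s.lastIndexOf('a'));        // 3
        System.out.println(s.lastIndexOf('a', 2));     // 0: searches backwards from index 2
        System.out.println(s.indexOf("bc", 2));        // 4
        System.out.println(s.lastIndexOf("bc"));       // 4
        System.out.println(naiveIndexOf(s, "bc", 2));  // 4, same as s.indexOf("bc", 2)
    }
}
```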

13. substring()

/**
 * Cuts a slice out of this string and returns it as a new String object.
 * beginIndex is the start of the slice; the slice runs to the end of the string (index len - 1 inclusive).
 */
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}

/**
 * Cuts out a range:
 * [beginIndex, endIndex), endIndex not included
 */
public String substring(int beginIndex, int endIndex) { … }

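From the above we can see:

substring uses a half-open interval [beginIndex, endIndex): the char at endIndex is not included.

substring is implemented with the String(char[], offset, count) constructor, which copies the selected range into a new array (when the whole string is requested, this is returned instead).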
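The half-open interval is easiest to see with a string of digits, where each char equals its index (SubstringDemo is an illustrative name; the beginIndex == 0 shortcut is an implementation detail of the source above, not a documented guarantee):

```java
class SubstringDemo {
    public static void main(String[] args) {
        String s = "0123456";
        System.out.println(s.substring(2));               // 23456: from index 2 to the end
        System.out.println(s.substring(2, 5));            // 234: [2, 5), index 5 excluded
        System.out.println(s.substring(2, 5).length());   // 3 = endIndex - beginIndex
        System.out.println(s.substring(3, 3).isEmpty());  // true: empty range
        System.out.println(s.substring(0) == s);          // true in the JDK 8 source: returns this
    }
}
```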

14. concat()

/**
 * String's concatenation method.
 * Example: String str = "abc"; str.concat("def") // output: "abcdef"
 */
public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    // Copy value into a larger array buf of length len + str.length()
    char buf[] = Arrays.copyOf(value, len + otherLen);
    // Then copy str into buf starting at position len, giving the complete char array
    str.getChars(buf, len);
    // Build the new String with the package-private String(char[], boolean) constructor, which uses buf without copying it
    return new String(buf, true);
}
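Outside java.lang we cannot call the package-private String(char[], boolean) constructor or the package-private getChars(char[], int), so the sketch below reproduces the same steps with public APIs: Arrays.copyOf to grow the array and the public getChars(int, int, char[], int) to fill in the second string. ConcatSketch is a hypothetical name, and its final new String(buf) makes one extra copy that the JDK version avoids.

```java
import java.util.Arrays;

class ConcatSketch {

    // Same steps as concat above, using only public methods
    static String concat(String self, String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return self;                                   // nothing to append
        }
        int len = self.length();
        char[] buf = Arrays.copyOf(self.toCharArray(), len + otherLen); // room for both strings
        str.getChars(0, otherLen, buf, len);               // write str starting at index len
        return new String(buf);                            // public constructor copies buf again
    }

    public static void main(String[] args) {
        System.out.println(concat("abc", "def"));  // abcdef
        System.out.println("abc".concat("def"));   // abcdef
    }
}
```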

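15. replace, replaceAll and related methods

// Replaces every occurrence of the char oldChar in the string with newChar
public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */

        // First find the position of the first oldChar, then do the replacement
        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        if (i < len) {
            char buf[] = new char[len];
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}

// Replaces the first substring that matches the regular expression regex
public String replaceFirst(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}

// Replaces every substring that matches the regular expression regex. The first argument is always treated
// as a regex; only when it contains no regex metacharacters does this behave like replace.
// In replacement, '$' and '\' also have a special meaning.
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

// Replaces every occurrence of the literal string target with replacement (no regex interpretation)
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}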

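From the replace methods above we can see four usages: replace every occurrence of a char with another char, replace every match of a regex (replaceAll), replace only the first match of a regex (replaceFirst), and replace every occurrence of a literal string with another string.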
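The four variants differ mainly in whether the first argument is read as a regular expression. A short illustrative demo (ReplaceDemo is a made-up name):

```java
class ReplaceDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace('.', '-'));         // a-b-c: char -> char, all occurrences
        System.out.println(s.replace(".", "-"));         // a-b-c: literal string, all occurrences
        System.out.println(s.replaceAll(".", "-"));      // -----: "." is a regex matching any char
        System.out.println(s.replaceAll("\\.", "-"));    // a-b-c: the escaped dot matches only '.'
        System.out.println(s.replaceFirst("\\.", "-"));  // a-b.c: only the first match
        System.out.println("x1y22".replaceAll("\\d+", "#"));   // x#y#
        System.out.println("price".replace("price", "$5"));    // $5: '$' is literal in replace
    }
}
```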

16. matches() and contains()

/**
 * matches() checks whether the whole string matches the given regular expression.
 * regex: the regular expression to match the string against.
 * Example: String Str = new String("www.snailmann.com");
 * System.out.println(Str.matches("(.*)snailmann(.*)")); // output: true
 * System.out.println(Str.matches("www(.*)")); // output: true
 */
public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}

// Checks whether the string contains the given CharSequence (any implementation, often a StringBuffer or StringBuilder)
public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}
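matches requires the whole string to match the regex, while contains is a plain substring search. A quick illustrative check (MatchesDemo is a made-up name), reusing the example from the comment above:

```java
class MatchesDemo {
    public static void main(String[] args) {
        String str = "www.snailmann.com";
        System.out.println(str.matches("(.*)snailmann(.*)"));          // true
        System.out.println(str.matches("www(.*)"));                   // true
        System.out.println(str.matches("snailmann"));                 // false: the whole string must match
        System.out.println(str.contains("snailmann"));                // true: substring search, no regex
        System.out.println(str.contains(new StringBuilder(".com")));  // true: any CharSequence works
    }
}
```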

17. split()

// Splits the string around matches of the regular expression regex
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

public String[] split(String regex) {
    return split(regex, 0);
}
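The effect of limit and of the fast path can be seen with a few inputs. This is an illustrative demo (SplitDemo is a made-up name); note in particular that with the default limit 0 trailing empty strings are removed, and that an unescaped "." is a regex matching every character.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        String s = "a,b,,c,,";
        System.out.println(Arrays.toString(s.split(",")));      // [a, b, , c]: limit 0 drops trailing empties
        System.out.println(Arrays.toString(s.split(",", -1)));  // [a, b, , c, , ]: negative limit keeps them
        System.out.println(Arrays.toString(s.split(",", 2)));   // [a, b,,c,,]: at most 2 parts
        System.out.println(Arrays.toString("1.2.3".split("\\.")));  // [1, 2, 3]: "\\." takes the fast path
        System.out.println(Arrays.toString("1.2.3".split(".")));    // []: every part is empty and dropped
    }
}
```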

18. join()

/**
 * join is a static method added in JDK 1.8.
 * It is roughly the opposite of split, except that join is static.
 * delimiter is the separator; the remaining varargs are the elements to join.
 * Example: String message = String.join("-", "Java", "is", "cool");
 * // message returned is: "Java-is-cool"
 * @since 1.8
 */
public static String join(CharSequence delimiter, CharSequence... elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    // Number of elements not likely worth Arrays.stream overhead.
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

/**
 * Same functionality, different parameter type:
 * the second argument is usually a collection holding CharSequence elements,
 * e.g. String.join(",", list).
 * list can be any Iterable (such as a Collection implementation) whose element type is a CharSequence,
 * e.g. String, StringBuilder or StringBuffer.
 */
public static String join(CharSequence delimiter,
        Iterable<? extends CharSequence> elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

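join is new in Java 1.8; it is a static method and works roughly as the opposite of split.

There are two overloads: one takes the elements directly as varargs (or an array) of CharSequence, the other takes an Iterable such as a collection. The collection version is particularly handy: it makes it easy to join the string elements of a collection into a single string.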
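Both overloads in a small illustrative demo (JoinDemo is a made-up name); it also shows the Iterable overload accepting StringBuilder elements and split undoing join for a simple delimiter.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    public static void main(String[] args) {
        // varargs overload
        String message = String.join("-", "Java", "is", "cool");
        System.out.println(message);                           // Java-is-cool

        // Iterable overload with String elements
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(String.join(",", list));            // a,b,c

        // Iterable overload with StringBuilder elements (any CharSequence works)
        List<StringBuilder> builders = Arrays.asList(new StringBuilder("x"), new StringBuilder("y"));
        System.out.println(String.join("+", builders));        // x+y

        // split reverses join when the delimiter does not occur inside the elements
        System.out.println(Arrays.asList(message.split("-"))); // [Java, is, cool]
    }
}
```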

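19. trim()

/**
 * Removes blank characters such as ' ' from both ends of the string (this is about blank characters, not the empty string "").
 * It is implemented with substring, using one pointer at each end:
 * the front pointer moves forward (++) while it sees a blank, the back pointer moves backward (--) while it sees a blank.
 * ' ' has the int value 32, so trim actually does more than strip spaces: it removes every leading and trailing char whose value is <= 32.
 */
public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}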
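Because trim compares against ' ' instead of looking for spaces only, it also strips tabs, line breaks and other control characters, but not characters above U+0020 such as the non-breaking space U+00A0. A small illustrative demo (TrimDemo is a made-up name):

```java
class TrimDemo {
    public static void main(String[] args) {
        System.out.println("[" + "  hi  ".trim() + "]");          // [hi]
        // every leading/trailing char <= ' ' (U+0020) is removed, including control chars
        System.out.println("\t\n\0hi\r".trim().equals("hi"));    // true
        // the non-breaking space U+00A0 is greater than ' ', so trim keeps it
        System.out.println("\u00A0hi".trim().length());           // 3
        // interior blanks are untouched
        System.out.println("a b".trim());                         // a b
    }
}
```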

20. toString()

public String toString() {
    return this;
}

21. toCharArray()

/** Converts the String into a new char array and returns it */
public char[] toCharArray() {
    // Cannot use Arrays.copyOf because of class initialization order issues
    char result[] = new char[value.length];
    System.arraycopy(value, 0, result, 0, value.length); // copy
    return result;
}
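toCharArray returns a fresh copy, so modifying the returned array cannot change the String. Illustrative demo (ToCharArrayDemo is a made-up name):

```java
class ToCharArrayDemo {
    public static void main(String[] args) {
        String s = "abc";
        char[] chars = s.toCharArray();          // a new copy of the internal chars
        chars[0] = 'X';                          // changing the copy...
        System.out.println(s);                   // abc: ...does not change the String
        System.out.println(new String(chars));   // Xbc
    }
}
```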

22. toLowerCase() and toUpperCase()

These methods are fairly long, so they are omitted here.

23. format()

// Java string formatting: returns a new string formatted from the given format string and arguments, using the default locale.
public static String format(String format, Object... args) {
    return new Formatter().format(format, args).toString();
}

Example:

String format = String.format("Result of 3>7: %b %n", 3 > 7);

System.out.println("format :" + format); // format :Result of 3>7: false

24. valueOf

// Converts an Object to a String
public static String valueOf(Object obj) {
    return (obj == null) ? "null" : obj.toString();
}

// Converts a char array to a String
public static String valueOf(char data[]) {
    return new String(data);
}

// Converts a subarray of a char array to a String
public static String valueOf(char data[], int offset, int count) {
    return new String(data, offset, count);
}

// Converts a boolean to a String
public static String valueOf(boolean b) {
    return b ? "true" : "false";
}

// Converts a single char to a String
public static String valueOf(char c) {
    char data[] = {c};
    return new String(data, true);
}

// Converts an int to a String; long, float and double work the same way
public static String valueOf(int i) {
    return Integer.toString(i);
}

In Java 8, copyValueOf and valueOf(char[]) are effectively identical: both simply construct a new String from the array. The other valueOf overloads delegate to toString() or to the wrapper classes, or return the literals "true", "false" and "null".
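A few valueOf calls side by side (ValueOfDemo is a made-up name). One pitfall worth knowing: a bare null argument selects the more specific char[] overload rather than valueOf(Object), so the sketch casts to Object to get "null".

```java
class ValueOfDemo {
    public static void main(String[] args) {
        Object nothing = null;
        System.out.println(String.valueOf(nothing));   // null: the Object overload returns "null"
        System.out.println(String.valueOf(true));      // true
        System.out.println(String.valueOf('x'));       // x
        System.out.println(String.valueOf(42) + 1);    // 421: string concatenation
        char[] data = {'h', 'i'};
        System.out.println(String.valueOf(data).equals(String.copyValueOf(data))); // true
        // String.valueOf(null) would pick valueOf(char[]) and throw NullPointerException
    }
}
```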

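25. intern()

public native String intern();

This is the only native method in the String class, i.e. the only one not implemented in Java. For str.intern(): it looks for a string equal to str in the string constant pool; if one is there, it returns the reference to the pooled string; otherwise it adds str to the pool and returns a reference to it.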
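A minimal demo of intern() (InternDemo is a made-up name). String literals are interned automatically, so interning a newly constructed equal string yields the very same object as the literal:

```java
class InternDemo {
    public static void main(String[] args) {
        String literal = "java";                          // literals are placed in the pool
        String created = new String("java");              // a new object on the heap
        System.out.println(literal == created);           // false: different objects
        System.out.println(literal == created.intern());  // true: intern returns the pooled instance
        System.out.println(literal.equals(created));      // true: same contents
    }
}
```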

Original article:

https://blog.csdn.net/snailmann/article/details/80882719
