18.字符串-[译] 写给不耐烦程序员的 JavaScript

[译] 写给不耐烦程序员的 JavaScript

18.字符串

原文： http://exploringjs.com/impatient-js/ch_strings.html

字符串是 JavaScript 中的原始值和不可变的。也就是说，与字符串相关的操作总是会生成新的字符串，并且永远不会更改

18.1。普通字符串字面值

纯字符串字面值由单引号或双引号分隔：

const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);

单引号更常用，因为它更容易用双引号提及 HTML。

下一章涵盖 _ 模板字面值 _，它们为您提供：

字符串插值
多行
原始字符串字面值（反斜杠没有特殊含义）

18.1.1。逃离

反斜杠可让您创建特殊字符：

Unix 换行符：'\n'
Windows 换行符：'\r\n'
标签：'\t'
反斜杠：'\\'

反斜杠还允许您使用该字面值内的字符串字面值的分隔符：

'She said: "Let\'s go!"'
"She said: \"Let's go!\""

18.2。访问字符和代码点

18.2.1。访问 JavaScript 角色

JavaScript 没有字符的额外数据类型 - 字符总是作为字符串传输。

const str = 'abc';

// Reading a character at a given index
assert.equal(str[1], 'b');

// Counting the characters in a string:
assert.equal(str.length, 3);

18.2.2。通过`for-of`访问 Unicode 代码点并进行传播

通过for-of或扩展（...）迭代字符串访问 Unicode 代码点。每个代码点由 1-2 个 JavaScript 字符表示。有关详细信息，请参阅本章中有关原子的部分。

这是通过for-of迭代字符串的代码点的方法：

for (const ch of 'abc') {
  console.log(ch);
}
// Output:
// 'a'
// 'b'
// 'c'

这就是你通过传播将字符串转换为代码点数组的方法：

assert.deepEqual([...'abc'], ['a', 'b', 'c']);

18.3。通过`+`进行字符串连接

如果至少有一个操作数是字符串，则加号运算符（+）将任何非字符串转换为字符串并连接结果：

assert.equal(3 + ' times ' + 4, '3 times 4');

如果要逐个组合字符串，则赋值运算符+=非常有用：

let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';

assert.equal(str, 'Say it one more time');

顺便说一句，这种组合字符串的方式非常有效，因为大多数 JavaScript 引擎都在内部优化它。

练习：连接字符串

exercises/strings/concat_string_array_test.js

18.4。转换为字符串

这是将值x转换为字符串的三种方法：

String(x)
''+x
x.toString()（对undefined和null不起作用）

建议：使用描述性和安全性String()。

例子：

assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');

assert.equal(String(false), 'false');
assert.equal(String(true), 'true');

assert.equal(String(123.45), '123.45');

布尔值的陷阱：如果通过String()将布尔值转换为字符串，则无法通过Boolean()将其转换回来。

> String(false)
'false'
> Boolean('false')
true

18.4.1。字符串化对象

普通对象具有一个不太有用的默认表示：

> String({a: 1})
'[object Object]'

数组有更好的字符串表示，但它仍然隐藏了很多信息：

> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'

> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'

> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'

字符串化函数返回其源代码：

> String(function f() {return 4})
'function f() {return 4}'

18.4.2。自定义对象的字符串化

您可以通过实现方法toString()来覆盖字符串化对象的内置方式：

const obj = {
  toString() {
    return 'hello';
  }
};

assert.equal(String(obj), 'hello');

18.4.3。字符串化值的另一种方法

JSON 数据格式是 JavaScript 值的文本表示。因此，JSON.stringify()也可用于字符串化数据：

> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'

需要注意的是，JSON 仅支持null，布尔值，数字，字符串，数组和对象（它总是将它们视为由对象字面值创建）。

提示：第三个参数允许您打开多行输出并指定缩进量。例如：

console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));

该语句产生以下输出。

{
  "first": "Jane",
  "last": "Doe"
}

18.5。比较字符串

可以通过以下运算符比较字符串：

< <= > >=

需要考虑一个重要的警告：这些运算符基于 JavaScript 字符的数值进行比较。这意味着 JavaScript 用于字符串的顺序与字典和调用簿中使用的顺序不同：

> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false

正确比较文本超出了本书的范围。它通过支持 ECMAScript 国际化 API （Intl）。

18.6。文本原子：JavaScript 字符，代码点，字形集群

快速回顾关于 Unicode 的章节：

代码点：Unicode 字符，范围为21 位。
UTF-16 代码单元：JavaScript 字符，范围为16 位。代码点编码为 1-2 个 UTF-16 代码单元。
字形簇：_ 字素 _ 是书面符号，显示在屏幕或纸上。 _ 字形簇 _ 是编码字素的 1 个或多个代码点的序列。

要在 JavaScript 字符串中表示代码点，请使用一个或两个 JavaScript 字符。通过.length计算字符时可以看到：

// 3 Unicode code points, 3 JavaScript characters:
assert.equal('abc'.length, 3);

// 1 Unicode code point, 2 JavaScript characters:
assert.equal('🙂'.length, 2);

18.6.1。使用代码点

让我们探索 JavaScript 用于处理代码点的工具。

_ 代码点转义 _ 允许您以十六进制的方式指定代码点。它们扩展为一个或两个 JavaScript 字符。

> '\u{1F642}'
'🙂'

从代码点转换：

> String.fromCodePoint(0x1F642)
'🙂'

转换为代码点：

> '🙂'.codePointAt(0).toString(16)
'1f642'

迭代表彰代码点。例如，基于迭代的for-of循环：

const str = '🙂a';
assert.equal(str.length, 3);

for (const codePoint of str) {
  console.log(codePoint);
}

// Output:
// '🙂'
// 'a'

或基于迭代的 _ 传播 _（...）：

> [...'🙂a']
[ '🙂', 'a' ]

因此，传播是计算代码点的好工具：

> [...'🙂a'].length
2
> '🙂a'.length
3

18.6.2。使用代码单元

字符串的索引和长度基于 JavaScript 字符（即代码单元）。

要以数字方式指定代码单位，可以使用 _ 代码单元转义 _：

> '\uD83D\uDE42'
'🙂'

你可以使用所谓的 _ 字符代码 _：

> String.fromCharCode(0xD83D) + String.fromCharCode(0xDE42)
'🙂'

要获取角色的 char 代码，请使用.charCodeAt()：

> '🙂'.charCodeAt(0).toString(16)
'd83d'

18.6.3。警告：字形簇

处理可能使用任何人类语言编写的文本时，最好在字形集群的边界处进行拆分，而不是在代码单元的边界处进行拆分。

TC39 正在开发 Intl.Segmenter ，这是一项支持 ECMAScript 国际化 API 的提议，支持 Unicode 分段（沿着字形簇边界，字边界，句子边界等）。

在该提案成为标准之前，您可以使用几个可用库中的一个（对“JavaScript 字形”进行 Web 搜索）。

18.7。快速参考：字符串

字符串是不可变的，没有任何字符串方法可以修改它们的字符串。

18.7.1。转换为字符串

TBL。 16 描述了各种值如何转换为字符串。

Table 16: Converting values to strings.

`x`	`String(x)`
`undefined`	`'undefined'`
`null`	`'null'`
布尔值	`false` `→` `'false'`，`true` `→` `'true'`
数值	示例：`123` `→` `'123'`
字符串值	`x`（输入，未更改）
一个东西	可配置通过，例如，`toString()`

18.7.2。字符和代码点的数字值

字符代码： Unicode UTF-16 代码单位为数字
- String.fromCharCode()，String.prototype.charCodeAt()
- 精度：16 位，无符号
代码点： Unicode 代码指向数字
- String.fromCodePoint()，String.prototype.codePointAt()
- 精度：21 位，无符号（17 个平面，每个 16 位）

18.7.3。字符串运算符

// Access characters via []
const str = 'abc';
assert.equal(str[1], 'b');

// Concatenate strings via +
assert.equal('a' + 'b' + 'c', 'abc');
assert.equal('take ' + 3 + ' oranges', 'take 3 oranges');

18.7.4。 `String.prototype`：寻找和匹配

.endsWith(searchString: string, endPos=this.length): boolean [ES6]

如果字符串以searchString结束，如果其长度为endPos，则返回true。否则返回false。
```
> 'foo.txt'.endsWith('.txt')
true
> 'abcde'.endsWith('cd', 4)
true
```
.includes(searchString: string, startPos=0): boolean [ES6]

如果字符串包含searchString和false，则返回true。搜索从startPos开始。
```
> 'abc'.includes('b')
true
> 'abc'.includes('b', 2)
false
```
.indexOf(searchString: string, minIndex=0): number [ES1]

返回字符串中searchString出现的最低索引，否则返回-1。任何返回的索引都是minIndex或更高。
```
> 'abab'.indexOf('a')
0
> 'abab'.indexOf('a', 1)
2
> 'abab'.indexOf('c')
-1
```
.lastIndexOf(searchString: string, maxIndex=Infinity): number [ES1]

返回字符串中出现searchString的最高索引，否则返回-1。任何返回的索引都是maxIndex或更低。
```
> 'abab'.lastIndexOf('ab', 2)
2
> 'abab'.lastIndexOf('ab', 1)
0
> 'abab'.lastIndexOf('ab')
2
```
.match(regExp: string | RegExp): RegExpMatchArray | null [ES3]

如果regExp是未设置标志/g的正则表达式，则.match()返回字符串中regExp的第一个匹配项。或者null如果没有匹配。如果regExp是字符串，则在执行前面的步骤之前，它用于创建正则表达式。

结果具有以下类型：
```
interface RegExpMatchArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}
```
编号的捕获组成为数组索引。命名捕获组（ES2018）成为.groups的属性。在此模式下，.match()的作用类似于RegExp.prototype.exec()。

例子：
```
> 'ababb'.match(/a(b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: undefined }
> 'ababb'.match(/a(?<foo>b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: { foo: 'b' } }
> 'abab'.match(/x/)
null
```
.match(regExp: RegExp): string[] | null [ES3]

如果设置了regExp的标志/g，.match()将返回一个包含所有匹配项的数组，如果没有匹配则返回null。
```
> 'ababb'.match(/a(b+)/g)
[ 'ab', 'abb' ]
> 'ababb'.match(/a(?<foo>b+)/g)
[ 'ab', 'abb' ]
> 'abab'.match(/x/g)
null
```
.search(regExp: string | RegExp): number [ES3]

返回字符串中regExp出现的索引。如果regExp是字符串，则用于创建正则表达式。
```
> 'a2b'.search(/[0-9]/)
1
> 'a2b'.search('[0-9]')
1
```
.startsWith(searchString: string, startPos=0): boolean [ES6]

如果searchString出现在索引startPos的字符串中，则返回true。否则返回false。
```
> '.gitignore'.startsWith('.')
true
> 'abcde'.startsWith('bc', 1)
true
```

18.7.5。 `String.prototype`：提取

.slice(start=0, end=this.length): string [ES3]

返回从（包括）索引start开始并以（不包括）索引end结束的字符串的子字符串。您可以使用负指数，其中-1表示this.length-1（等）。
```
> 'abc'.slice(1, 3)
'bc'
> 'abc'.slice(1)
'bc'
> 'abc'.slice(-2)
'bc'
```
.split(separator: string | RegExp, limit?: number): string[] [ES3]

将字符串拆分为子字符串数组 - 分隔符之间出现的字符串。分隔符可以是字符串或正则表达式。正则表达式中由组进行的捕获包含在结果中。
```
> 'abc'.split('')
[ 'a', 'b', 'c' ]
> 'a | b | c'.split('|')
[ 'a ', ' b ', ' c' ]

> 'a : b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a : b : c'.split(/( *):( *)/)
[ 'a', ' ', ' ', 'b', ' ', ' ', 'c' ]
```
.substring(start: number, end=this.length): string [ES1]

使用.slice()代替此方法。 .substring()未在旧引擎中一致地实施，并且不支持负指数。

18.7.6。 `String.prototype`：合并

.concat(...strings: string[]): string [ES3]

返回字符串和strings的串联。 'a'+'b'相当于'a'.concat('b')，更简洁。
```
> 'ab'.concat('cd', 'ef', 'gh')
'abcdefgh'
```
.padEnd(len: number, fillString=' '): string [ES2017]

将fillString追加到字符串，直到它具有所需的长度len。
```
> '#'.padEnd(2)
'# '
> 'abc'.padEnd(2)
'abc'
> '#'.padEnd(5, 'abc')
'#abca'
```
.padStart(len: number, fillString=' '): string [ES2017]

将fillString添加到字符串，直到它具有所需的长度len。
```
> '#'.padStart(2)
' #'
> 'abc'.padStart(2)
'abc'
> '#'.padStart(5, 'abc')
'abca#'
```
.repeat(count=0): string [ES6]

返回一个字符串，重复count次。
```
> '*'.repeat()
''
> '*'.repeat(3)
'***'
```

18.7.7。 `String.prototype`：转型

.normalize(form: 'NFC'|'NFD'|'NFKC'|'NFKD' = 'NFC'): string [ES6]

根据 Unicode 规范化表单规范化字符串。
.replace(searchValue: string | RegExp, replaceValue: string): string [ES3]

将searchValue的匹配替换为replaceValue。如果searchValue是一个字符串，则只替换第一个逐字出现。如果searchValue是没有标志/g的正则表达式，则仅替换第一个匹配项。如果searchValue是带有/g的正则表达式，则替换所有匹配项。
```
> 'x.x.'.replace('.', '#')
'x#x.'
> 'x.x.'.replace(/./, '#')
'#.x.'
> 'x.x.'.replace(/./g, '#')
'####'
```
replaceValue中的特殊字符是：
- $$：成为$
- $n：成为编号组n的捕获（唉，$0不起作用）
- $&：成为完全匹配
- `$``：在比赛前成为一切
- $'：比赛结束后成为一切
例子：
```
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$2|')
'a |04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$&|')
'a |2020-04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$`|')
'a |a | b'
```
也支持命名捕获组（ES2018）：
- $<name>成为命名组name的捕获
例：
```
> 'a 2020-04 b'.replace(/(?<year>[0-9]{4})-(?<month>[0-9]{2})/, '|$<month>|')
'a |04| b'
```
.replace(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES3]

如果第二个参数是函数，则将其返回的字符串替换。其参数args是：
- matched: string：完全匹配
- g1: string|undefined：捕获编号的组 1
- g2: string|undefined：捕获编号的组 2
- （等等。）
- offset: number：在输入字符串中找到匹配的位置？
- input: string：整个输入字符串
```
const regexp = /([0-9]{4})-([0-9]{2})/;
const replacer = (all, year, month) => '|' + all + '|';
assert.equal(
  'a 2020-04 b'.replace(regexp, replacer),
  'a |2020-04| b');
```
也支持命名捕获组（ES2018）。如果有，则最后一个参数包含一个对象，其属性包含捕获：
```
const regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
const replacer = (...args) => {
  const groups=args.pop();
  return '|' + groups.month + '|';
};
assert.equal(
  'a 2020-04 b'.replace(regexp, replacer),
  'a |04| b');
```
.toUpperCase(): string [ES1]

返回字符串的副本，其中所有小写字母字符都转换为大写。适用于各种字母表的效果取决于 JavaScript 引擎。
```
> '-a2b-'.toUpperCase()
'-A2B-'
> 'αβγ'.toUpperCase()
'ΑΒΓ
```
.toLowerCase(): string [ES1]

返回字符串的副本，其中所有大写字母字符都转换为小写。适用于各种字母表的效果取决于 JavaScript 引擎。
```
> '-A2B-'.toLowerCase()
'-a2b-'
> 'ΑΒΓ'.toLowerCase()
'αβγ'
```
.trim(): string [ES5]

返回字符串的副本，其中所有前导和尾随空格（空格，制表符，行终止符等）都消失了。
```
> '\r\n#\t  '.trim()
'#'
```
.trimEnd(): string [ES2019]

与.trim()类似，但仅修剪字符串的末尾：
```
> '  abc  '.trimEnd()
'  abc'
```
.trimStart(): string [ES2019]

与.trim()类似，但仅修剪字符串的开头：
```
> '  abc  '.trimStart()
'abc  '
```

18.7.8。 `String.prototype`：字符，字符代码，代码点

.charAt(pos: number): string [ES1]

返回索引pos处的字符，作为字符串（JavaScript 没有字符的数据类型）。 str[i]等同于str.charAt(i)并且更简洁（警告：可能不适用于旧引擎）。
```
> 'abc'.charAt(1)
'b'
```
.charCodeAt(pos: number): number [ES1]

返回索引pos处的 UTF-16 代码单元（字符）的 16 位数字（0-65535）。
```
> 'abc'.charCodeAt(1)
98
```
.codePointAt(pos: number): number | undefined [ES6]

返回索引pos处 1-2 个字符的 Unicode 代码点的 21 位数。如果没有这样的索引，则返回undefined。

18.7.9。来源

练习：使用字符串方法

exercises/strings/remove_extension_test.js

测验

参见测验应用程序。

[译] 写给不耐烦程序员的 JavaScript

18.字符串

18.1。普通字符串字面值

18.1.1。逃离

18.2。访问字符和代码点

18.2.1。访问 JavaScript 角色

18.2.2。通过`for-of`访问 Unicode 代码点并进行传播

18.3。通过`+`进行字符串连接

18.4。转换为字符串

18.4.1。字符串化对象

18.4.2。自定义对象的字符串化

18.4.3。字符串化值的另一种方法

18.5。比较字符串

18.6。文本原子：JavaScript 字符，代码点，字形集群

18.6.1。使用代码点

18.6.2。使用代码单元

18.6.3。警告：字形簇

18.7。快速参考：字符串

18.7.1。转换为字符串

18.7.2。字符和代码点的数字值

18.7.3。字符串运算符

18.7.4。 `String.prototype`：寻找和匹配

18.7.5。 `String.prototype`：提取

18.7.6。 `String.prototype`：合并

18.7.7。 `String.prototype`：转型

18.7.8。 `String.prototype`：字符，字符代码，代码点

18.7.9。来源

JSON风格指南

前端开发规范手册

ECMAScript 6入门

React 学习之道

对开发人员有用的定律、理论、原则和模式

Atom飞行手册

[译] 写给不耐烦程序员的 JavaScript

18.字符串

18.1。普通字符串字面值

18.1.1。逃离

18.2。访问字符和代码点

18.2.1。访问 JavaScript 角色

18.2.2。通过for-of访问 Unicode 代码点并进行传播

18.3。通过+进行字符串连接

18.4。转换为字符串

18.4.1。字符串化对象

18.4.2。自定义对象的字符串化

18.4.3。字符串化值的另一种方法

18.5。比较字符串

18.6。文本原子：JavaScript 字符，代码点，字形集群

18.6.1。使用代码点

18.6.2。使用代码单元

18.6.3。警告：字形簇

18.7。快速参考：字符串

18.7.1。转换为字符串

18.7.2。字符和代码点的数字值

18.7.3。字符串运算符

18.7.4。 String.prototype：寻找和匹配

18.7.5。 String.prototype：提取

18.7.6。 String.prototype：合并

18.7.7。 String.prototype：转型

18.7.8。 String.prototype：字符，字符代码，代码点

18.7.9。来源

JSON风格指南

前端开发规范手册

ECMAScript 6入门

React 学习之道

对开发人员有用的定律、理论、原则和模式

Atom飞行手册

18.2.2。通过`for-of`访问 Unicode 代码点并进行传播

18.3。通过`+`进行字符串连接

18.7.4。 `String.prototype`：寻找和匹配

18.7.5。 `String.prototype`：提取

18.7.6。 `String.prototype`：合并

18.7.7。 `String.prototype`：转型

18.7.8。 `String.prototype`：字符，字符代码，代码点