UnicodeSet


open class UnicodeSet : UnicodeFilter, Comparable<UnicodeSet!>, Freezable<UnicodeSet!>, MutableIterable<String!>

kotlin.Any
↳	android.icu.text.UnicodeFilter
	↳	android.icu.text.UnicodeSet

A mutable set of Unicode characters and multicharacter strings. Objects of this class represent character classes used in regular expressions. A character class specifies a subset of Unicode code points. Legal code points are U+0000 to U+10FFFF, inclusive. Note: freeze() not only makes the set immutable, it also makes several important methods much faster: contains(c), containsNone(...), span(...), spanBack(...), and so on. After the object is frozen, any subsequent call that tries to change it throws UnsupportedOperationException.
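The freeze behavior described above can be sketched as follows. This is a minimal illustration rather than an official sample; `freezeRejectsMutation` is a hypothetical helper name, and only freeze(), isFrozen(), cloneAsThawed(), add() and contains() from this class are used.

```kotlin
import android.icu.text.UnicodeSet

// Hypothetical helper: returns true when a frozen set behaves as documented.
fun freezeRejectsMutation(): Boolean {
    // freeze() returns the set itself, now immutable.
    val digits = UnicodeSet("[0-9]").freeze()

    val rejected = try {
        digits.add('a'.code) // mutators are expected to throw on a frozen set
        false
    } catch (e: UnsupportedOperationException) {
        true
    }

    // cloneAsThawed() gives a mutable copy; the frozen original is not changed.
    val editable = digits.cloneAsThawed().add('a'.code)

    return rejected && digits.isFrozen() && !editable.isFrozen() &&
        editable.contains('a'.code) && !digits.contains('a'.code)
}
```

Freezing a set once and sharing it is a reasonable default for sets that are built at startup and only queried afterwards.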

The UnicodeSet class is not designed to be subclassed.

UnicodeSet supports two APIs. The first is the operand API, which allows the caller to modify the value of a UnicodeSet object. It conforms to Java 2's java.util.Set interface, although UnicodeSet does not actually implement that interface. All methods of Set are supported, with the modification that they take a character range or single character instead of an Object, and they take a UnicodeSet instead of a Collection. The operand API may be thought of in terms of boolean logic: a boolean OR is implemented by add, a boolean AND is implemented by retain, a boolean XOR is implemented by complement taking an argument, and a boolean NOT is implemented by complement with no argument. In terms of traditional set theory function names, add is a union, retain is an intersection, remove is an asymmetric difference, and complement with no argument is a set complement with respect to the superset range MIN_VALUE-MAX_VALUE.
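As a rough illustration of the boolean-logic view, the helpers below wrap the set-valued variants (addAll, retainAll, removeAll, complementAll) and the no-argument complement(). The helper names are hypothetical; each one copies its first operand because the operand API mutates the receiver.

```kotlin
import android.icu.text.UnicodeSet

// OR: union of a and b
fun union(a: UnicodeSet, b: UnicodeSet): UnicodeSet = UnicodeSet(a).addAll(b)

// AND: intersection of a and b
fun intersection(a: UnicodeSet, b: UnicodeSet): UnicodeSet = UnicodeSet(a).retainAll(b)

// Asymmetric difference: elements of a that are not in b
fun difference(a: UnicodeSet, b: UnicodeSet): UnicodeSet = UnicodeSet(a).removeAll(b)

// XOR: elements in exactly one of a and b
fun symmetricDifference(a: UnicodeSet, b: UnicodeSet): UnicodeSet =
    UnicodeSet(a).complementAll(b)

// NOT: complement with respect to MIN_VALUE..MAX_VALUE
fun complementOf(a: UnicodeSet): UnicodeSet = UnicodeSet(a).complement()
```

For example, `symmetricDifference(UnicodeSet("[a-c]"), UnicodeSet("[b-d]"))` should equal the set `[ad]`.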

The second API is the applyPattern()/toPattern() API from the java.text.Format-derived classes. Unlike the methods that add characters, add categories, and control the logic of the set, the method applyPattern() sets all attributes of a UnicodeSet at once, based on a string pattern.
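A small sketch of the pattern API, assuming nothing beyond the methods listed on this page. `replaceWithPattern` and `roundTrip` are hypothetical helper names. The first shows that applyPattern() replaces the whole set rather than adding to it; the second rebuilds a set from its own toPattern() output.

```kotlin
import android.icu.text.UnicodeSet

// applyPattern() sets all contents at once, so the earlier 'x' is discarded.
fun replaceWithPattern(): UnicodeSet {
    val set = UnicodeSet().add('x'.code)
    set.applyPattern("[a-c]")
    return set
}

// toPattern() returns a pattern string that the constructor can parse again.
fun roundTrip(set: UnicodeSet): UnicodeSet = UnicodeSet(set.toPattern(false))
```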

Pattern syntax

Patterns are accepted by the constructors and the applyPattern() methods and returned by the toPattern() method. These patterns follow a syntax similar to that employed by version 8 regular expression character classes. Here are some simple examples:

`[]`	No characters
`[a]`	The character 'a'
`[ae]`	The characters 'a' and 'e'
`[a-e]`	The characters 'a' through 'e' inclusive, in Unicode code point order
`[\\u4E01]`	The character U+4E01
`[a{ab}{ac}]`	The character 'a' and the multicharacter strings "ab" and "ac"
`[\p{Lu}]`	All characters in the general category Uppercase Letter
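The examples in the table can be tried directly with the pattern constructor. The snippet below is an illustrative sketch; the variable names are arbitrary. Note that in Kotlin source a backslash in a string literal has to be doubled, so the pattern `[\p{Lu}]` is written `"[\\p{Lu}]"`.

```kotlin
import android.icu.text.UnicodeSet

val empty = UnicodeSet("[]")                 // no characters
val range = UnicodeSet("[a-e]")              // 'a' through 'e'
val byEscape = UnicodeSet("[\\u4E01]")       // the single code point U+4E01
val withStrings = UnicodeSet("[a{ab}{ac}]")  // 'a' plus the strings "ab" and "ac"
val uppercase = UnicodeSet("[\\p{Lu}]")      // general category Uppercase Letter

// A multicharacter string is one element of the set; its individual
// characters are not members unless they are listed separately.
val hasAb = withStrings.contains("ab")       // the string element
val hasB = withStrings.contains('b'.code)    // 'b' alone was never added
```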

Any character may be preceded by a backslash in order to remove any special meaning. White space characters, as defined by the Unicode Pattern_White_Space property, are ignored, unless they are escaped.

Property patterns specify a set of characters having a certain property as defined by the Unicode standard. Both the POSIX-like "[:Lu:]" and the Perl-like syntax "\p{Lu}" are recognized. For a complete list of supported property patterns, see the User's Guide for UnicodeSet at https://unicode-org.github.io/icu/userguide/strings/unicodeset. Actual determination of property data is defined by the underlying Unicode database as implemented by UCharacter.

Patterns specify individual characters, ranges of characters, and Unicode property sets. When elements are concatenated, they specify their union. To complement a set, place a '^' immediately after the opening '['. Property patterns are inverted by modifying their delimiters: "[:^foo:]" and "\P{foo}". In any other location, '^' has no special meaning.

Since ICU 70, "[^...]", "[:^foo:]", "\P{foo}", and "[:binaryProperty=No:]" perform a “code point complement” (all code points minus the original set), removing all multicharacter strings, equivalent to .complement().removeAllStrings(). The complement() API function continues to perform a symmetric difference with all code points and thus retains all multicharacter strings.
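The difference between the two kinds of complement can be sketched with the API calls alone, which avoids depending on which ICU version parses the pattern. The helper names are hypothetical.

```kotlin
import android.icu.text.UnicodeSet

// complement() toggles code points only; multicharacter strings are kept.
fun apiComplement(set: UnicodeSet): UnicodeSet = UnicodeSet(set).complement()

// The "code point complement" described above: complement, then drop strings.
fun codePointComplement(set: UnicodeSet): UnicodeSet =
    UnicodeSet(set).complement().removeAllStrings()
```

For the set `[a{ab}]`, the first helper should still contain the string "ab" while the second should contain no strings at all.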

Ranges are indicated by placing a '-' between two characters, as in "a-z". This specifies the range of all characters from the left to the right, in Unicode order. If the left character is greater than or equal to the right character, it is a syntax error. If a '-' occurs as the first character after the opening '[' or '[^', or if it occurs as the last character before the closing ']', then it is taken as a literal. Thus "[a\\-b]", "[-ab]", and "[ab-]" all indicate the same set of three characters, 'a', 'b', and '-'.
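The three equivalent spellings of a literal '-' can be checked with a short sketch. `dashForms` is a hypothetical helper name, and the doubled backslash is Kotlin string escaping for the pattern `[a\-b]`.

```kotlin
import android.icu.text.UnicodeSet

// Builds one set per spelling; all three should describe {'a', 'b', '-'}.
fun dashForms(): List<UnicodeSet> =
    listOf("[a\\-b]", "[-ab]", "[ab-]").map { UnicodeSet(it) }
```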

Sets may be intersected using the '&' operator, or the asymmetric set difference may be taken using the '-' operator. For example, "[[:L:]&[\\u0000-\\u0FFF]]" indicates the set of all Unicode letters with values less than 4096. Operators ('&' and '-') have equal precedence and bind left-to-right. Thus "[[:L:]-[a-z]-[\\u0100-\\u01FF]]" is equivalent to "[[[:L:]-[a-z]]-[\\u0100-\\u01FF]]". This only really matters for difference; intersection is commutative.
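A brief sketch of the operators inside patterns follows. The variable names are illustrative only, and the last two sets demonstrate the left-to-right binding with a small made-up example.

```kotlin
import android.icu.text.UnicodeSet

// Letters below U+1000, as in the example above.
val lowLetters = UnicodeSet("[[:L:]&[\\u0000-\\u0FFF]]")

// Asymmetric difference: lowercase ASCII letters minus the vowels.
val consonants = UnicodeSet("[[a-z]-[aeiou]]")

// Left-to-right binding: ([a-z] - [a-m]) - [x-z], which leaves n through w.
val chained = UnicodeSet("[[a-z]-[a-m]-[x-z]]")
val grouped = UnicodeSet("[[[a-z]-[a-m]]-[x-z]]")
```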

`[a]`	The set containing 'a'
`[a-z]`	The set containing 'a' through 'z' and all letters in between, in Unicode order
`[^a-z]`	The set containing all characters but 'a' through 'z', that is, U+0000 through 'a'-1 and 'z'+1 through U+10FFFF
`[[pat1][pat2]]`	The union of sets specified by pat1 and pat2
`[[pat1]&[pat2]]`	The intersection of sets specified by pat1 and pat2
`[[pat1]-[pat2]]`	The asymmetric difference of sets specified by pat1 and pat2
`[:Lu:] or \p{Lu}`	The set of characters having the specified Unicode property; in this case, Unicode uppercase letters
`[:^Lu:] or \P{Lu}`	The set of characters not having the given Unicode property

Formal syntax

`pattern :=`	`('[' '^'? item* ']') \| property`
`item :=`	`char \| (char '-' char) \| pattern-expr`
`pattern-expr :=`	`pattern \| pattern-expr pattern \| pattern-expr op pattern`
`op :=`	`'&' \| '-'`
`special :=`	`'[' \| ']' \| '-'`
`char :=`	any character that is not`special \| ('\\'`any character`) \| ('\u' hex hex hex hex)`
`hex :=`	`'0' \| '1' \| '2' \| '3' \| '4' \| '5' \| '6' \| '7' \| '8' \| '9' \| 'A' \| 'B' \| 'C' \| 'D' \| 'E' \| 'F' \| 'a' \| 'b' \| 'c' \| 'd' \| 'e' \| 'f'`
`property :=`	a Unicode property set pattern

Legend:

`a := b`		`a` may be replaced by `b`
`a?`		zero or one instance of `a`
`a*`		zero or more instances of `a`
`a \| b`		either `a` or `b`
`'a'`		the literal string between the quotes

To iterate over contents of UnicodeSet, the following are available:

to iterate over the ranges: ranges(), rangeStream()
to iterate over the strings: strings(), stringStream()
to iterate over the code points: codePoints(), codePointStream()
to iterate over the entire contents in a single loop: this class itself is Iterable, or use stream().
All of these methods are, however, not particularly efficient, since they convert each individual code point to a String.

The iterator and stream methods work as expected in idiomatic Java usage.
The UnicodeSetIterator cannot be used in for loops, and it is not very Java-idiomatic, because it is old. But it might be faster in certain use cases. We recommend that you measure in performance sensitive code.
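The iteration options listed in the method summary on this page (ranges(), strings(), the index-based range getters, and the class's own iterator) can be sketched as below. The helper names are hypothetical, and the field names `codepoint` and `codepointEnd` on EntryRange are taken from ICU's EntryRange class. As far as I understand the ICU documentation, ranges() may reuse a single EntryRange object between steps, so the sketch copies the fields out immediately instead of storing the range objects.

```kotlin
import android.icu.text.UnicodeSet

// Code point ranges as (start, end) pairs, both inclusive.
fun rangePairs(set: UnicodeSet): List<Pair<Int, Int>> =
    set.ranges().map { it.codepoint to it.codepointEnd }

// The same information through the older index-based accessors.
fun rangePairsByIndex(set: UnicodeSet): List<Pair<Int, Int>> =
    (0 until set.getRangeCount()).map { i -> set.getRangeStart(i) to set.getRangeEnd(i) }

// The set itself is Iterable: code points (as strings) first, then strings.
fun elements(set: UnicodeSet): List<String> = set.toList()
```

For `UnicodeSet("[a-cx{hello}]")`, `rangePairs` should give `[(97, 99), (120, 120)]` and `elements` should give `[a, b, c, x, hello]`.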

To replace, count elements, or delete spans, see UnicodeSetSpanner.

Summary

Nested classes
	`ComparisonStyle` Comparison style enums used by `UnicodeSet.compareTo(UnicodeSet, ComparisonStyle)`.
open	`EntryRange` A struct-like class used for iteration through ranges, for faster iteration than by String.
	`SpanCondition` Argument values for whether span() and similar functions continue while the current character is contained vs.

Constants
static Int	`ADD_CASE_MAPPINGS` Adds all case mappings for each element in the set.
static Int	`CASE` Alias for `CASE_INSENSITIVE`.
static Int	`CASE_INSENSITIVE` Enable case insensitive matching.
static Int	`IGNORE_SPACE` Bitmask for constructor and applyPattern() indicating that white space should be ignored.
static Int	`MAX_VALUE` Maximum value that can be stored in a UnicodeSet.
static Int	`MIN_VALUE` Minimum value that can be stored in a UnicodeSet.
static Int	`SIMPLE_CASE_INSENSITIVE` Enable case insensitive matching.

Public constructors
`UnicodeSet()` Constructs an empty set.
`UnicodeSet(other: UnicodeSet!)` Constructs a copy of an existing set.
`UnicodeSet(start: Int, end: Int)` Constructs a set containing the given range.
`UnicodeSet(vararg pairs: Int)` Quickly constructs a set from a set of ranges <s0, e0, s1, e1, s2, e2, .
`UnicodeSet(pattern: String!)` Constructs a set from the given pattern.
`UnicodeSet(pattern: String!, ignoreWhitespace: Boolean)` Constructs a set from the given pattern.
`UnicodeSet(pattern: String!, options: Int)` Constructs a set from the given pattern.
`UnicodeSet(pattern: String!, pos: ParsePosition!, symbols: SymbolTable!)` Constructs a set from the given pattern.
`UnicodeSet(pattern: String!, pos: ParsePosition!, symbols: SymbolTable!, options: Int)` Constructs a set from the given pattern.

Public methods
open StringBuffer!	`_generatePattern(result: StringBuffer!, escapeUnprintable: Boolean)` Generate and append a string representation of this set to result.
open StringBuffer!	`_generatePattern(result: StringBuffer!, escapeUnprintable: Boolean, includeStrings: Boolean)` Generate and append a string representation of this set to result.
UnicodeSet!	`add(c: Int)` Adds the specified character to this set if it is not already present.
open UnicodeSet!	`add(start: Int, end: Int)` Adds the specified range to this set if it is not already present.
UnicodeSet!	`add(s: CharSequence!)` Adds the specified multicharacter to this set if it is not already present.
open UnicodeSet!	`add(source: MutableIterable<*>!)` Add the contents of the collection (as strings) into this UnicodeSet.
open UnicodeSet!	`addAll(c: UnicodeSet!)` Adds all of the elements in the specified set to this set if they're not already present.
open UnicodeSet!	`addAll(start: Int, end: Int)` Adds all characters in range (uses preferred naming convention).
UnicodeSet!	`addAll(s: CharSequence!)` Adds each of the characters in this string to the set.
open UnicodeSet!	`addAll(source: MutableIterable<*>!)` Add a collection (as strings) into this UnicodeSet.
open UnicodeSet!	`addAll(vararg collection: T)`
open T	`addAllTo(target: T)` Add the contents of the UnicodeSet (as strings) into a collection.
open Unit	`addMatchSetTo(toUnionTo: UnicodeSet!)` Implementation of UnicodeMatcher API.
open UnicodeSet!	`applyIntPropertyValue(prop: Int, value: Int)` Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by UCharacter.
UnicodeSet!	`applyPattern(pattern: String!)` Modifies this set to represent the set specified by the given pattern.
open UnicodeSet!	`applyPattern(pattern: String!, ignoreWhitespace: Boolean)` Modifies this set to represent the set specified by the given pattern, optionally ignoring whitespace.
open UnicodeSet!	`applyPattern(pattern: String!, options: Int)` Modifies this set to represent the set specified by the given pattern, optionally ignoring whitespace.
open UnicodeSet!	`applyPropertyAlias(propertyAlias: String!, valueAlias: String!)` Modifies this set to contain those code points which have the given value for the given property.
open UnicodeSet!	`applyPropertyAlias(propertyAlias: String!, valueAlias: String!, symbols: SymbolTable!)` Modifies this set to contain those code points which have the given value for the given property.
open Int	`charAt(index: Int)` Returns the character at the given index within this set, where the set is ordered by ascending code point.
open UnicodeSet!	`clear()` Removes all of the elements from this set.
open Any	`clone()` Return a new set that is equivalent to this one.
open UnicodeSet!	`cloneAsThawed()` Clone a thawed version of this class, according to the Freezable interface.
open UnicodeSet!	`closeOver(attribute: Int)` Close this set over the given attribute.
open UnicodeSet!	`compact()` Reallocate this object's internal structures to take up the least possible space, without changing this object's value.
open Int	`compareTo(other: UnicodeSet!)` Compares UnicodeSets, where shorter come first, and otherwise lexicographically (according to the comparison of the first characters that differ).
open Int	`compareTo(o: UnicodeSet!, style: UnicodeSet.ComparisonStyle!)` Compares UnicodeSets, in three different ways.
open Int	`compareTo(other: MutableIterable<String!>!)`
open UnicodeSet!	`complement()` This is equivalent to `complement(MIN_VALUE, MAX_VALUE)`.
UnicodeSet!	`complement(c: Int)` Complements the specified character in this set.
open UnicodeSet!	`complement(start: Int, end: Int)` Complements the specified range in this set.
UnicodeSet!	`complement(s: CharSequence!)` Complement the specified string in this set.
open UnicodeSet!	`complementAll(c: UnicodeSet!)` Complements in this set all elements contained in the specified set.
UnicodeSet!	`complementAll(s: CharSequence!)` Complement EACH of the characters in this string.
open Boolean	`contains(c: Int)` Returns true if this set contains the given character.
open Boolean	`contains(start: Int, end: Int)` Returns true if this set contains every character of the given range.
Boolean	`contains(s: CharSequence!)` Returns true if this set contains the given multicharacter string.
open Boolean	`containsAll(b: UnicodeSet!)` Returns true if this set contains all the characters and strings of the given set.
open Boolean	`containsAll(collection: MutableIterable<T>!)`
open Boolean	`containsAll(s: String!)` Returns true if there is a partition of the string such that this set contains each of the partitioned strings.
open Boolean	`containsNone(b: UnicodeSet!)` Returns true if this set contains none of the characters and strings of the given set.
open Boolean	`containsNone(start: Int, end: Int)` Returns true if this set contains none of the characters of the given range.
open Boolean	`containsNone(s: CharSequence!)` Returns true if this set contains none of the characters of the given string.
open Boolean	`containsNone(collection: MutableIterable<T>!)`
Boolean	`containsSome(s: UnicodeSet!)` Returns true if this set contains one or more of the characters and strings of the given set.
Boolean	`containsSome(start: Int, end: Int)` Returns true if this set contains one or more of the characters in the given range.
Boolean	`containsSome(s: CharSequence!)` Returns true if this set contains one or more of the characters of the given string.
Boolean	`containsSome(collection: MutableIterable<T>!)`
open Boolean	`equals(other: Any?)` Compares the specified object with this set for equality.
open UnicodeSet!	`freeze()` Freeze this class, according to the Freezable interface.
open static UnicodeSet!	`from(s: CharSequence!)` Makes a set from a multicharacter string.
open static UnicodeSet!	`fromAll(s: CharSequence!)` Makes a set from each of the characters in the string.
open Int	`getRangeCount()` Iteration method that returns the number of ranges contained in this set.
open Int	`getRangeEnd(index: Int)` Iteration method that returns the last character in the specified range of this set.
open Int	`getRangeStart(index: Int)` Iteration method that returns the first character in the specified range of this set.
open Boolean	`hasStrings()`
open Int	`hashCode()` Returns the hash code value for this set.
open Int	`indexOf(c: Int)` Returns the index of the given character within this set, where the set is ordered by ascending code point.
open Boolean	`isEmpty()` Returns true if this set contains no elements.
open Boolean	`isFrozen()` Is this frozen, according to the Freezable interface?
open MutableIterator<String!>	`iterator()` Returns a string iterator.
open Int	`matches(text: Replaceable!, offset: IntArray!, limit: Int, incremental: Boolean)` Implementation of UnicodeMatcher.
open Boolean	`matchesIndexValue(v: Int)` Implementation of UnicodeMatcher API.
open MutableIterable<UnicodeSet.EntryRange!>!	`ranges()` Provide for faster iteration than by String.
UnicodeSet!	`remove(c: Int)` Removes the specified character from this set if it is present.
open UnicodeSet!	`remove(start: Int, end: Int)` Removes the specified range from this set if it is present.
UnicodeSet!	`remove(s: CharSequence!)` Removes the specified string from this set if it is present.
open UnicodeSet!	`removeAll(c: UnicodeSet!)` Removes from this set all of its elements that are contained in the specified set.
UnicodeSet!	`removeAll(s: CharSequence!)` Remove EACH of the characters in this string.
open UnicodeSet!	`removeAll(collection: MutableIterable<T>!)`
UnicodeSet!	`removeAllStrings()` Remove all strings from this UnicodeSet.
UnicodeSet!	`retain(c: Int)` Retain the specified character from this set if it is present.
open UnicodeSet!	`retain(start: Int, end: Int)` Retain only the elements in this set that are contained in the specified range.
UnicodeSet!	`retain(cs: CharSequence!)` Retain the specified string in this set if it is present.
open UnicodeSet!	`retainAll(c: UnicodeSet!)` Retains only the elements in this set that are contained in the specified set.
UnicodeSet!	`retainAll(s: CharSequence!)` Retains EACH of the characters in this string.
open UnicodeSet!	`retainAll(collection: MutableIterable<T>!)`
open UnicodeSet!	`set(other: UnicodeSet!)` Make this object represent the same set as `other`.
open UnicodeSet!	`set(start: Int, end: Int)` Make this object represent the range `start - end`.
open Int	`size()` Returns the number of elements in this set (its cardinality). Note that the elements of a set may include both individual code points and strings.
open Int	`span(s: CharSequence!, spanCondition: UnicodeSet.SpanCondition!)` Span a string using this UnicodeSet.
open Int	`span(s: CharSequence!, start: Int, spanCondition: UnicodeSet.SpanCondition!)` Span a string using this UnicodeSet.
open Int	`spanBack(s: CharSequence!, spanCondition: UnicodeSet.SpanCondition!)` Span a string backwards (from the end) using this UnicodeSet.
open Int	`spanBack(s: CharSequence!, fromIndex: Int, spanCondition: UnicodeSet.SpanCondition!)` Span a string backwards (from the fromIndex) using this UnicodeSet.
open MutableCollection<String!>!	`strings()` For iterating through the strings in the set.
open String!	`toPattern(escapeUnprintable: Boolean)` Returns a string representation of this set.
open String	`toString()` Return a programmer-readable string representation of this object.

Properties
static UnicodeSet!	`ALL_CODE_POINTS` Constant for the set of all code points.
static UnicodeSet!	`EMPTY` Constant for the empty set.

Parameters
`start`	Int: first character, inclusive, of range
`end`	Int: last character, inclusive, of range

Parameters
`pattern`	String!: a string specifying what characters are in the set
`ignoreWhitespace`	Boolean: if true, ignore Unicode Pattern_White_Space characters

Parameters
`pattern`	String!: a string specifying what characters are in the set
`options`	Int: a bitmask indicating which options to apply. Valid options are `IGNORE_SPACE` and at most one of `CASE_INSENSITIVE`, `ADD_CASE_MAPPINGS`, `SIMPLE_CASE_INSENSITIVE`. These case options are mutually exclusive.
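A minimal sketch of the options bitmask, combining IGNORE_SPACE with one of the mutually exclusive case options. `caseInsensitive` is a hypothetical helper name.

```kotlin
import android.icu.text.UnicodeSet

// Parses the pattern ignoring white space and closes the result over case.
fun caseInsensitive(pattern: String): UnicodeSet =
    UnicodeSet(pattern, UnicodeSet.IGNORE_SPACE or UnicodeSet.CASE_INSENSITIVE)
```

With this helper, `caseInsensitive("[a - c]")` should contain both 'b' and 'B', and should not contain the space character.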

Parameters
`pattern`	String!: a string specifying what characters are in the set
`pos`	ParsePosition!: on input, the position in pattern at which to start parsing. On output, the position after the last character parsed.
`symbols`	SymbolTable!: a symbol table mapping variables to char[] arrays and chars to UnicodeSets

Parameters
`result`	StringBuffer!: the buffer into which to generate the pattern
`escapeUnprintable`	Boolean: escape unprintable characters if true

Parameters
`start`	Int: The index of where to start on adding all characters.
`end`	Int: The index of where to end on adding all characters.

Parameters
`prop`	Int: a property in the range UProperty.BIN_START..UProperty.BIN_LIMIT-1 or UProperty.INT_START..UProperty.INT_LIMIT-1 or UProperty.MASK_START..UProperty.MASK_LIMIT-1.
`value`	Int: a value in the range UCharacter.getIntPropertyMinValue(prop).. UCharacter.getIntPropertyMaxValue(prop), with one exception. If prop is UProperty.GENERAL_CATEGORY_MASK, then value should not be a UCharacter.getType() result, but rather a mask value produced by logically ORing (1 << UCharacter.getType()) values together. This allows grouped categories such as [:L:] to be represented.
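The GENERAL_CATEGORY_MASK exception can be illustrated as follows. This is a sketch under the assumption that UCharacter.getType(Int) returns the general category value of a code point; `casedLetters` is a hypothetical helper name. The category values are read from sample characters rather than from named constants, to keep the example short.

```kotlin
import android.icu.lang.UCharacter
import android.icu.lang.UProperty
import android.icu.text.UnicodeSet

// Builds the union of two general categories with a single mask value.
fun casedLetters(): UnicodeSet {
    val upperMask = 1 shl UCharacter.getType('A'.code) // Uppercase Letter
    val lowerMask = 1 shl UCharacter.getType('a'.code) // Lowercase Letter
    return UnicodeSet().applyIntPropertyValue(
        UProperty.GENERAL_CATEGORY_MASK,
        upperMask or lowerMask
    )
}
```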

Parameters
`propertyAlias`	String!: A string of the property alias.
`valueAlias`	String!: A string of the value alias.
`symbols`	SymbolTable!: if not null, then symbols are first called to see if a property is available. If true, then everything else is skipped.
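For comparison, the two-argument applyPropertyAlias() takes the property and value as strings. The sketch below assumes the long property names "General_Category" and "Script" and the value aliases "Lu" and "Greek", which are standard Unicode aliases; the helper names are hypothetical.

```kotlin
import android.icu.text.UnicodeSet

// Equivalent in intent to the pattern [:Lu:].
fun uppercaseLetters(): UnicodeSet =
    UnicodeSet().applyPropertyAlias("General_Category", "Lu")

// All code points whose Script property is Greek.
fun greekScript(): UnicodeSet =
    UnicodeSet().applyPropertyAlias("Script", "Greek")
```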

Exceptions
`java.lang.NullPointerException`	if the specified object is null
`java.lang.ClassCastException`	if the specified object's type prevents it from being compared to this object.

Parameters
`start`	Int: first character, inclusive, of the range
`end`	Int: last character, inclusive, of the range

Parameters
`obj`	the reference object with which to compare.
`o`	Object to be compared for equality with this set.

Parameters
`text`	Replaceable!: the text to be matched
`offset`	IntArray!: on input, the index into text at which to begin matching. On output, the limit of the matched text. The number of matched characters is the output value of offset minus the input value. Offset should always point to the HIGH SURROGATE (leading code unit) of a pair of surrogates, both on entry and upon return.
`limit`	Int: the limit index of text to be matched. Greater than offset for a forward direction match, less than offset for a backward direction match. The last character to be considered for matching will be text.charAt(limit-1) in the forward direction or text.charAt(limit+1) in the backward direction.
`incremental`	Boolean: if true, then assume further characters may be inserted at limit and check for partial matching. Otherwise assume the text as given is complete.

Parameters
`start`	Int: first character in the set, inclusive
`end`	Int: last character in the set, inclusive

Parameters
`s`	CharSequence!: The string to be spanned
`spanCondition`	UnicodeSet.SpanCondition!: The span condition

Parameters
`s`	CharSequence!: The string to be spanned
`start`	Int: The index at which the span begins
`spanCondition`	UnicodeSet.SpanCondition!: The span condition

Parameters
`s`	CharSequence!: The string to be spanned
`fromIndex`	Int: The index of the char (exclusive) from which the string should be spanned backwards
`spanCondition`	UnicodeSet.SpanCondition!: The span condition
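The span() and spanBack() overloads can be sketched on a date-like string. The helper names and sample text are illustrative only. span() returns the index just past the spanned run, and spanBack() returns the index where the trailing run starts.

```kotlin
import android.icu.text.UnicodeSet
import android.icu.text.UnicodeSet.SpanCondition

// Frozen once, since span() and spanBack() are faster on frozen sets.
val digits: UnicodeSet = UnicodeSet("[0-9]").freeze()

// End (exclusive) of the run of digits starting at index `start`.
fun digitsEnd(text: CharSequence, start: Int = 0): Int =
    digits.span(text, start, SpanCondition.CONTAINED)

// Start (inclusive) of the run of digits that ends the text.
fun trailingDigitsStart(text: CharSequence): Int =
    digits.spanBack(text, SpanCondition.CONTAINED)

// Length of the leading run of characters that are not digits.
fun leadingNonDigits(text: CharSequence): Int =
    digits.span(text, SpanCondition.NOT_CONTAINED)
```

For "2024-05-17", `digitsEnd(text)` should be 4, `digitsEnd(text, 5)` should be 7, and `trailingDigitsStart(text)` should be 8.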
