UnicodeSet.SpanCondition


public static final enum UnicodeSet.SpanCondition
extends Enum<UnicodeSet.SpanCondition>

java.lang.Object
   ↳ java.lang.Enum<android.icu.text.UnicodeSet.SpanCondition>
     ↳ android.icu.text.UnicodeSet.SpanCondition


Argument values for whether span() and similar functions continue while the current character is contained vs. not contained in the set.

The functionality is straightforward for sets with only single code points, without strings (which is the common case):

  • CONTAINED and SIMPLE work the same.
  • CONTAINED and SIMPLE are inverses of NOT_CONTAINED.
  • span() and spanBack() partition any string the same way when alternating between span(NOT_CONTAINED) and span(either "contained" condition).
  • Using a complemented (inverted) set and the opposite span conditions yields the same results.
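
The code-point-only case can be sketched in plain Java. This is a minimal model of the behavior described above, not the ICU implementation: the class `CodePointSpanModel` and its methods are hypothetical names, and the set is represented as a `java.util.Set<Integer>` of code points rather than a `UnicodeSet`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Plain-Java model of a set that holds only single code points (assumption:
// this mirrors the documented behavior; it is not ICU code).
class CodePointSpanModel {
    // Builds a code point set from the characters of a string.
    static Set<Integer> setOf(String chars) {
        return chars.codePoints().boxed().collect(Collectors.toSet());
    }

    // Returns the end of the span that starts at `start` and continues while
    // membership of the current code point equals `contained`.
    static int span(Set<Integer> set, String s, int start, boolean contained) {
        int i = start;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (set.contains(cp) != contained) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    // Alternates not-contained and contained spans and records each boundary.
    static List<Integer> boundaries(Set<Integer> set, String s) {
        List<Integer> result = new ArrayList<>();
        result.add(0);
        int pos = 0;
        boolean contained = false;
        while (pos < s.length()) {
            int next = span(set, s, pos, contained);
            if (next > pos) {
                result.add(next);
            }
            pos = next;
            contained = !contained;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> vowels = setOf("aeiou");
        System.out.println(boundaries(vowels, "street")); // prints [0, 3, 5, 6]
    }
}
```

Because every code point is either in the set or not, one of the two alternating spans always makes progress, so the partition is the same no matter which condition is tried first.
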
When a set contains multi-code-point strings, these statements may not hold, depending on the strings in the set (for example, whether they overlap with each other) and on the string being processed. For a set with strings:
  • The complement of the set contains the opposite set of code points, but the same set of strings. Therefore, complementing both the set and the span conditions may yield different results.
  • When starting spans at different positions in a string (span(s, start, ...) vs. span(s, start + 1, ...)) the ends of the spans may be different because a set string may start before the later position.
  • span(SIMPLE) may be shorter than span(CONTAINED) because it will not recursively try all possible paths. For example, with a set which contains the three strings "xy", "xya" and "ax", span("xyax", CONTAINED) will return 4 but span("xyax", SIMPLE) will return 3. span(SIMPLE) will never be longer than span(CONTAINED).
  • With either "contained" condition, span() and spanBack() may partition a string in different ways. For example, with a set which contains the two strings "ab" and "ba", and when processing the string "aba", span() will yield contained/not-contained boundaries of { 0, 2, 3 } while spanBack() will yield boundaries of { 0, 1, 3 }.
Note: If it is important to get the same boundaries whether iterating forward or backward through a string, then either only span() should be used and the boundaries cached for backward operation, or an ICU BreakIterator could be used.
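
The forward and backward boundaries in the "ab"/"ba" example can be reproduced with a small plain-Java model. This is a sketch under stated assumptions, not ICU code: `DirectionalSpanModel` and its methods are hypothetical, every set element (including a single code point) is modeled as a non-empty String, and the input is assumed to be well-formed UTF-16.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java model of span()/spanBack() over a set with strings (not ICU code).
// Assumption: all elements are non-empty strings.
class DirectionalSpanModel {
    // CONTAINED, forward: farthest index reachable by concatenating elements.
    static int spanContained(List<String> set, String s, int start) {
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[start] = true;
        int end = start;
        for (int i = start; i <= s.length(); i++) {
            if (!reachable[i]) continue;
            end = i;
            for (String e : set) {
                if (s.startsWith(e, i)) reachable[i + e.length()] = true;
            }
        }
        return end;
    }

    // CONTAINED, backward: lowest index reachable by concatenating elements.
    static int spanBackContained(List<String> set, String s, int limit) {
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[limit] = true;
        int begin = limit;
        for (int i = limit; i >= 0; i--) {
            if (!reachable[i]) continue;
            begin = i;
            for (String e : set) {
                // startsWith returns false for a negative offset.
                if (s.startsWith(e, i - e.length())) reachable[i - e.length()] = true;
            }
        }
        return begin;
    }

    // True if some element starts at i (forward) or ends at i (backward).
    static boolean anyMatch(List<String> set, String s, int i, boolean forward) {
        for (String e : set) {
            if (s.startsWith(e, forward ? i : i - e.length())) return true;
        }
        return false;
    }

    // NOT_CONTAINED, forward: step by code point until an element starts here.
    static int spanNotContained(List<String> set, String s, int start) {
        int i = start;
        while (i < s.length() && !anyMatch(set, s, i, true)) {
            i += Character.charCount(s.codePointAt(i));
        }
        return i;
    }

    // NOT_CONTAINED, backward: step by code point until an element ends here.
    static int spanBackNotContained(List<String> set, String s, int limit) {
        int i = limit;
        while (i > 0 && !anyMatch(set, s, i, false)) {
            i -= Character.charCount(s.codePointBefore(i));
        }
        return i;
    }

    // Boundaries found by alternating contained/not-contained spans forward.
    static List<Integer> forwardBoundaries(List<String> set, String s) {
        List<Integer> out = new ArrayList<>(List.of(0));
        int pos = 0;
        while (pos < s.length()) {
            int next = spanContained(set, s, pos);
            if (next == pos) next = spanNotContained(set, s, pos);
            out.add(next);
            pos = next;
        }
        return out;
    }

    // Boundaries found by alternating spans backward, reported in ascending order.
    static List<Integer> backwardBoundaries(List<String> set, String s) {
        List<Integer> out = new ArrayList<>(List.of(s.length()));
        int pos = s.length();
        while (pos > 0) {
            int next = spanBackContained(set, s, pos);
            if (next == pos) next = spanBackNotContained(set, s, pos);
            out.add(next);
            pos = next;
        }
        Collections.reverse(out);
        return out;
    }

    public static void main(String[] args) {
        List<String> set = List.of("ab", "ba");
        System.out.println(forwardBoundaries(set, "aba"));  // prints [0, 2, 3]
        System.out.println(backwardBoundaries(set, "aba")); // prints [0, 1, 3]
    }
}
```

Caching the list returned by the forward pass and walking it in reverse is one way to follow the advice in the note above.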

Note: Unpaired surrogates are treated like surrogate code points. Similarly, set strings match only on code point boundaries, never in the middle of a surrogate pair.

Summary

Enum values

UnicodeSet.SpanCondition  CONDITION_COUNT

One more than the last span condition. 

UnicodeSet.SpanCondition  CONTAINED

Spans the longest substring that is a concatenation of set elements (characters or strings). 

UnicodeSet.SpanCondition  NOT_CONTAINED

Continues a span() while there is no set element at the current position. 

UnicodeSet.SpanCondition  SIMPLE

Continues a span() while there is a set element at the current position. 

Public methods

static UnicodeSet.SpanCondition valueOf(String name)
static final SpanCondition[] values()

Inherited methods

From class java.lang.Enum

final Object clone()

Throws CloneNotSupportedException.

final int compareTo(UnicodeSet.SpanCondition o)

Compares this enum with the specified object for order.

final boolean equals(Object other)

Returns true if the specified object is equal to this enum constant.

final void finalize()

enum classes cannot have finalize methods.

final Class<UnicodeSet.SpanCondition> getDeclaringClass()

Returns the Class object corresponding to this enum constant's enum type.

final int hashCode()

Returns a hash code for this enum constant.

final String name()

Returns the name of this enum constant, exactly as declared in its enum declaration.

final int ordinal()

Returns the ordinal of this enumeration constant (its position in its enum declaration, where the initial constant is assigned an ordinal of zero).

String toString()

Returns the name of this enum constant, as contained in the declaration.

static <T extends Enum<T>> T valueOf(Class<T> enumClass, String name)

Returns the enum constant of the specified enum class with the specified name.

From class java.lang.Object

Object clone()

Creates and returns a copy of this object.

boolean equals(Object obj)

Indicates whether some other object is "equal to" this one.

void finalize()

Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.

final Class<?> getClass()

Returns the runtime class of this Object.

int hashCode()

Returns a hash code value for the object.

final void notify()

Wakes up a single thread that is waiting on this object's monitor.

final void notifyAll()

Wakes up all threads that are waiting on this object's monitor.

String toString()

Returns a string representation of the object.

final void wait(long timeoutMillis, int nanos)

Causes the current thread to wait until it is awakened, typically by being notified or interrupted, or until a certain amount of real time has elapsed.

final void wait(long timeoutMillis)

Causes the current thread to wait until it is awakened, typically by being notified or interrupted, or until a certain amount of real time has elapsed.

final void wait()

Causes the current thread to wait until it is awakened, typically by being notified or interrupted.

From interface java.lang.Comparable

abstract int compareTo(UnicodeSet.SpanCondition o)

Compares this object with the specified object for order.

Enum values

CONDITION_COUNT

Added in API level 24
public static final UnicodeSet.SpanCondition CONDITION_COUNT

One more than the last span condition.

CONTAINED

Added in API level 24
public static final UnicodeSet.SpanCondition CONTAINED

Spans the longest substring that is a concatenation of set elements (characters or strings). (For characters only, this is like while contains(current)==true.)

When span() returns, the substring between where it started and the position it returned consists only of set elements (characters or strings).

If a set contains strings, then the span will be the longest substring for which there exists at least one non-overlapping concatenation of set elements (characters or strings). This is equivalent to a POSIX regular expression for (OR of each set element)*. (Java/ICU/Perl regex stops at the first match of an OR.)
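
The difference from a Java regular expression can be seen with the standard java.util.regex classes. The sketch below is illustrative only: `ContainedVsRegex` and `spanContained` are hypothetical names, and `spanContained` is a plain-Java model of the "longest concatenation" rule rather than the ICU implementation. The regex alternation is deliberately ordered so that its first successful alternative leads to a shorter match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainedVsRegex {
    // Model of CONTAINED: the farthest index reachable from `start` by any
    // concatenation of set elements (assumption: elements are non-empty).
    static int spanContained(List<String> set, String s, int start) {
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[start] = true;
        int end = start;
        for (int i = start; i <= s.length(); i++) {
            if (!reachable[i]) continue;
            end = i;
            for (String e : set) {
                if (s.startsWith(e, i)) reachable[i + e.length()] = true;
            }
        }
        return end;
    }

    // End index of the prefix that a java.util.regex pattern matches.
    static int regexPrefixEnd(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        List<String> set = List.of("xya", "xy", "ax");
        // Longest concatenation: "xy" + "ax" covers the whole string.
        System.out.println(spanContained(set, "xyax", 0));            // prints 4
        // The regex takes "xya" first, then finds nothing at index 3.
        System.out.println(regexPrefixEnd("(?:xya|xy|ax)*", "xyax")); // prints 3
    }
}
```

Reordering the alternation as (?:xy|xya|ax)* happens to reach index 4 for this input, which shows that the regex result depends on the order of alternatives while the modeled CONTAINED span does not.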

NOT_CONTAINED

Added in API level 24
public static final UnicodeSet.SpanCondition NOT_CONTAINED

Continues a span() while there is no set element at the current position. Increments by one code point at a time. Stops before the first set element (character or string). (For code points only, this is like while contains(current)==false.)

When span() returns, the substring between where it started and the position it returned consists only of characters that are not in the set, and none of the set's strings overlap with the span.
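
The code-point stepping can be sketched in plain Java as follows. This is a model under assumptions, not ICU code: `NotContainedSpanModel` is a hypothetical class, set elements are modeled as non-empty Strings, and the input is assumed to be well-formed UTF-16.

```java
import java.util.List;

class NotContainedSpanModel {
    // True if any set element (character or string) starts at index i.
    static boolean elementStartsAt(List<String> set, String s, int i) {
        for (String e : set) {
            if (s.startsWith(e, i)) return true;
        }
        return false;
    }

    // Advances one code point at a time and stops before the first element.
    static int spanNotContained(List<String> set, String s, int start) {
        int i = start;
        while (i < s.length() && !elementStartsAt(set, s, i)) {
            // A supplementary code point occupies two chars.
            i += Character.charCount(s.codePointAt(i));
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> set = List.of("z", "ab");
        String s = "x\uD83D\uDE00abz"; // 'x', U+1F600, "ab", 'z'
        System.out.println(spanNotContained(set, s, 0)); // prints 3
    }
}
```

The span stops at index 3, where the set string "ab" begins, after skipping 'x' (one char) and U+1F600 (two chars).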

SIMPLE

Added in API level 24
public static final UnicodeSet.SpanCondition SIMPLE

Continues a span() while there is a set element at the current position. Increments by the longest matching element at each position. (For characters only, this is like while contains(current)==true.)

When span() returns, the substring between where it started and the position it returned consists only of set elements (characters or strings).

If a set contains only single characters, then this is the same as CONTAINED.

If a set contains strings, then the span is built by matching, at each position, the longest single set element (character or string) and advancing past it; the span ends at the first position where no element matches.

Use this span condition together with other longest-match algorithms, such as ICU converters (ucnv_getUnicodeSet()).
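
The longest-match rule can be sketched in plain Java. This is an illustrative model, not the ICU implementation: `SimpleSpanModel` is a hypothetical class and set elements are modeled as non-empty Strings.

```java
import java.util.List;

class SimpleSpanModel {
    // At each position, consume the longest matching element; stop when
    // no element matches. No alternative (shorter) matches are retried.
    static int spanSimple(List<String> set, String s, int start) {
        int i = start;
        while (i < s.length()) {
            int longest = 0;
            for (String e : set) {
                if (e.length() > longest && s.startsWith(e, i)) {
                    longest = e.length();
                }
            }
            if (longest == 0) {
                break;
            }
            i += longest;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> set = List.of("xy", "xya", "ax");
        // "xya" is the longest match at index 0; nothing matches at index 3.
        System.out.println(spanSimple(set, "xyax", 0)); // prints 3
    }
}
```

Choosing "xy" at index 0 instead would allow "ax" to follow and reach index 4, but the longest-match rule never revisits that choice, which is why this span can be shorter than the CONTAINED span.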

Public methods

valueOf

public static UnicodeSet.SpanCondition valueOf (String name)

Parameters
name String

values

public static final SpanCondition[] values ()

Returns
SpanCondition[]
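
Both methods follow the standard behavior that the Java language defines for every enum. The sketch below uses a hypothetical stand-in enum so that it runs without ICU on the classpath; the constant order shown is an assumption, and `parseOrDefault` is an illustrative helper, not part of the API.

```java
import java.util.Arrays;

class SpanConditionEnumDemo {
    // Hypothetical stand-in for UnicodeSet.SpanCondition (order is assumed).
    enum SpanCondition { NOT_CONTAINED, CONTAINED, SIMPLE, CONDITION_COUNT }

    // valueOf is case-sensitive and throws IllegalArgumentException when no
    // constant has the given name; this helper falls back to a default.
    static SpanCondition parseOrDefault(String name, SpanCondition fallback) {
        try {
            return SpanCondition.valueOf(name);
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(SpanCondition.valueOf("SIMPLE"));                   // prints SIMPLE
        System.out.println(parseOrDefault("simple", SpanCondition.CONTAINED)); // prints CONTAINED
        // values() returns a new array on each call, in declaration order.
        System.out.println(Arrays.toString(SpanCondition.values()));
    }
}
```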