Class UScript


  • public final class UScript
    extends java.lang.Object
    Constants for ISO 15924 script codes, and related functions.

    The current set of script code constants supports at least all scripts that are encoded in the version of Unicode which ICU currently supports. The names of the constants are usually derived from the Unicode script property value aliases. See UAX #24 Unicode Script Property (http://www.unicode.org/reports/tr24/) and http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt .

    In addition, constants for many ISO 15924 script codes are included, for use with language tags, CLDR data, and similar. Some of those codes are not used in the Unicode Character Database (UCD). For example, there are no characters that have a UCD script property value of Hans or Hant. All Han ideographs have the Hani script property value in Unicode.
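
The Hani point can be illustrated with a short sketch that uses only the JDK. It assumes `java.lang.Character.UnicodeScript` as a stand-in for UScript. That enum is built from the same UCD Script data but is not ICU's API, and the class name `HanScriptDemo` is hypothetical. The sketch shows two things:
- A simplified Han character and a traditional Han character both report HAN.
- The JDK rejects "Hans" as a script alias, because Hans is not a UCD Script property value.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical demo: simplified and traditional Han characters share one Script value. */
final class HanScriptDemo {
    static final int SIMPLIFIED_HAN = 0x6C49;   // simplified form of the character "han"
    static final int TRADITIONAL_HAN = 0x6F22;  // traditional form of the same character

    /** Script property value of a code point, as reported by the JDK. */
    static UnicodeScript scriptOf(int codePoint) {
        return UnicodeScript.of(codePoint);
    }

    /** True if the JDK accepts the string as a UCD Script name or alias. */
    static boolean isUcdScriptAlias(String alias) {
        try {
            UnicodeScript.forName(alias);
            return true;
        } catch (IllegalArgumentException notAScriptValue) {
            return false;
        }
    }
}
```

Both code points map to HAN. "Hani" is accepted as an alias, while "Hans" and "Hant" exist only as ISO 15924 codes.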

    Private-use codes Qaaa..Qabx are not included, except as used in the UCD or in CLDR.

    Starting with ICU 55, script codes are only added when their scripts have been or will certainly be encoded in Unicode, and have been assigned Unicode script property value aliases, to ensure that their script names are stable and match the names of the constants. Script codes like Latf and Aran that are not subject to separate encoding may be added at any time.

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  UScript.ScriptUsage
      Script usage constants.
    • Method Summary

      Modifier and Type Method Description
      static boolean breaksBetweenLetters(int script)
      Returns true if the script allows line breaks between letters (excluding hyphenation).
      static int[] getCode(ULocale locale)
      Gets the script codes associated with the given locale.
      static int[] getCode(java.lang.String nameOrAbbrOrLocale)
      Gets the script codes associated with the given locale or ISO 15924 abbreviation or name.
      static int[] getCode(java.util.Locale locale)
      Gets the script codes associated with the given locale.
      static int getCodeFromName(java.lang.String nameOrAbbr)
      Returns the script code associated with the given Unicode script property alias (name or abbreviation).
      static java.lang.String getName(int scriptCode)
      Returns the long Unicode script name, if there is one.
      static java.lang.String getSampleString(int script)
      Returns the script sample character string.
      static int getScript(int codepoint)
      Gets the script code associated with the given code point.
      static int getScriptExtensions(int c, java.util.BitSet set)
      Sets code point c's Script_Extensions as script code integers into the output BitSet.
      static java.lang.String getShortName(int scriptCode)
      Returns the 4-letter ISO 15924 script code, which is the same as the short Unicode script name if Unicode has names for the script.
      static UScript.ScriptUsage getUsage(int script)
      Returns the script usage according to UAX #31 Unicode Identifier and Pattern Syntax.
      static boolean hasScript(int c, int sc)
      Do the Script_Extensions of code point c contain script sc? If c does not have explicit Script_Extensions, then this tests whether c has the Script property value sc.
      static boolean isCased(int script)
      Returns true if case distinctions are customary in the modern (or most recent) usage of the script.
      static boolean isRightToLeft(int script)
      Returns true if the script is written right-to-left.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getCode

        public static final int[] getCode(java.util.Locale locale)
        Gets the script codes associated with the given locale. Returns an array containing LATIN given the locale "en" or "en_US".
        Parameters:
        locale - Locale
        Returns:
        The script codes array, or null if the code cannot be found.
      • getCode

        public static final int[] getCode(ULocale locale)
        Gets the script codes associated with the given locale. Returns an array containing LATIN given the locale "en" or "en_US".
        Parameters:
        locale - ULocale
        Returns:
        The script codes array, or null if the code cannot be found.
      • getCode

        public static final int[] getCode(java.lang.String nameOrAbbrOrLocale)
        Gets the script codes associated with the given locale or ISO 15924 abbreviation or name. Returns MALAYALAM given "Malayalam" or "Mlym". Returns LATIN given "en" or "en_US".

        Note: To search by short or long script alias only, use getCodeFromName(String) instead. That method does a fast lookup without accessing locale data.

        Parameters:
        nameOrAbbrOrLocale - name of the script, ISO 15924 code, or locale
        Returns:
        The script codes array, or null if the code cannot be found.
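
The JDK has no direct counterpart to getCode. What follows is a minimal sketch of the behaviour described above, not ICU's implementation. It rests on these assumptions:
- `java.lang.Character.UnicodeScript` stands in for UScript constants.
- The input is tried as a script name or alias first, and only then as a locale.
- `DEFAULT_SCRIPTS` is a hypothetical one-entry table used where ICU would consult its locale data.

An explicit script subtag, as in "sr-Cyrl", is read with `java.util.Locale.getScript()`.

```java
import java.lang.Character.UnicodeScript;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a getCode-style lookup using only the JDK. */
final class ScriptCodeLookup {
    // Hypothetical, deliberately tiny table of default scripts per language.
    private static final Map<String, UnicodeScript[]> DEFAULT_SCRIPTS =
            Map.of("en", new UnicodeScript[] {UnicodeScript.LATIN});

    static UnicodeScript[] getCode(String nameOrAbbrOrLocale) {
        // 1. Script name ("Malayalam") or 4-letter alias ("Mlym")?
        try {
            return new UnicodeScript[] {UnicodeScript.forName(nameOrAbbrOrLocale)};
        } catch (IllegalArgumentException notAScriptName) {
            // Not a script name; fall through to locale handling.
        }
        // 2. Treat the input as a locale; accept "en_US" as well as "en-US".
        Locale locale = Locale.forLanguageTag(nameOrAbbrOrLocale.replace('_', '-'));
        if (!locale.getScript().isEmpty()) {
            // An explicit script subtag such as "sr-Cyrl" wins.
            try {
                return new UnicodeScript[] {UnicodeScript.forName(locale.getScript())};
            } catch (IllegalArgumentException unknownScript) {
                return null;
            }
        }
        // 3. Otherwise use the (hypothetical) per-language default.
        UnicodeScript[] defaults = DEFAULT_SCRIPTS.get(locale.getLanguage());
        return defaults == null ? null : defaults.clone();
    }
}
```

With this sketch, the results are:
- getCode("Mlym") yields MALAYALAM.
- getCode("en_US") yields LATIN via the table.
- An unrecognized input such as "xx" yields null, matching the documented result.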
      • getCodeFromName

        public static final int getCodeFromName(java.lang.String nameOrAbbr)
        Returns the script code associated with the given Unicode script property alias (name or abbreviation). Short aliases are ISO 15924 script codes. Returns MALAYALAM given "Malayalam" or "Mlym".
        Parameters:
        nameOrAbbr - name of the script or ISO 15924 code
        Returns:
        The script code value, or INVALID_CODE if the code cannot be found.
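
A rough JDK-only analogue of getCodeFromName is sketched below. `Character.UnicodeScript.forName` already accepts long names and 4-letter aliases case-insensitively, without touching locale data. However, it throws IllegalArgumentException rather than returning a sentinel, so the sketch wraps it. Two caveats apply:
- `INVALID` is a hypothetical stand-in for INVALID_CODE.
- The returned enum ordinals are JDK-specific, not ICU script code values.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical sketch of a getCodeFromName-style lookup that never throws. */
final class ScriptNameLookup {
    // Hypothetical sentinel standing in for UScript.INVALID_CODE.
    static final int INVALID = -1;

    /** Returns the JDK enum ordinal for the script, or INVALID if the name is unknown. */
    static int codeFromName(String nameOrAbbr) {
        try {
            // Matches long names ("Malayalam") and short aliases ("Mlym") case-insensitively.
            return UnicodeScript.forName(nameOrAbbr).ordinal();
        } catch (IllegalArgumentException unknownName) {
            return INVALID;
        }
    }
}
```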
      • getScript

        public static final int getScript(int codepoint)
        Gets the script code associated with the given code point. Returns UScript.MALAYALAM given 0x0D02.
        Parameters:
        codepoint - the code point
        Returns:
        The script code
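
A common use of per-code-point script lookup is splitting text into script runs. The sketch below is a simplified, hypothetical illustration that uses `Character.UnicodeScript.of` in place of getScript. Common and Inherited code points (spaces, digits, combining marks) are absorbed into the current run rather than starting a new one. Production script-run segmentation also handles paired punctuation and Script_Extensions, both of which this sketch ignores.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split text into runs of a single script. */
final class ScriptRuns {
    /** Script of one code point; in ICU, UScript.getScript plays this role. */
    static UnicodeScript scriptOf(int codePoint) {
        return UnicodeScript.of(codePoint);
    }

    /**
     * Splits text into maximal single-script runs. Common and Inherited code
     * points never start a new run; leading ones join the first real run.
     */
    static List<String> runs(String text) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;  // null until a specific script is seen
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            UnicodeScript sc = scriptOf(cp);
            boolean neutral = sc == UnicodeScript.COMMON || sc == UnicodeScript.INHERITED;
            if (!neutral && currentScript != null && sc != currentScript) {
                result.add(current.toString());  // close the previous run
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = sc;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }
}
```

For example, "abc " followed by U+0D12 (MALAYALAM LETTER O) splits into a Latin run that keeps the trailing space and a one-character Malayalam run.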
      • hasScript

        public static final boolean hasScript(int c,
                                              int sc)
        Do the Script_Extensions of code point c contain script sc? If c does not have explicit Script_Extensions, then this tests whether c has the Script property value sc.

        Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.

        Parameters:
        c - code point
        sc - script code
        Returns:
        true if sc is in Script_Extensions(c)
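
The JDK does not expose the Script_Extensions property, so this sketch of the hasScript rule uses a hypothetical one-entry table. The entry is U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, whose Script_Extensions are Hiragana and Katakana. Code points not in the table fall back to comparing their Script value via `Character.UnicodeScript.of`. This illustrates the documented rule and is not ICU's implementation.

```java
import java.lang.Character.UnicodeScript;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the hasScript rule. */
final class HasScriptSketch {
    // Hypothetical one-entry Script_Extensions table (the JDK lacks this property).
    // U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK is used in both kana scripts.
    private static final Map<Integer, Set<UnicodeScript>> SCRIPT_EXTENSIONS =
            Map.of(0x30FC, Set.of(UnicodeScript.HIRAGANA, UnicodeScript.KATAKANA));

    static boolean hasScript(int c, UnicodeScript sc) {
        Set<UnicodeScript> scx = SCRIPT_EXTENSIONS.get(c);
        if (scx != null) {
            return scx.contains(sc);  // explicit Script_Extensions decide
        }
        return UnicodeScript.of(c) == sc;  // otherwise compare the Script value
    }
}
```

Under this rule, U+30FC tests true for Hiragana and Katakana but false for Common, because its explicit Script_Extensions replace its Script value.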
      • getScriptExtensions

        public static final int getScriptExtensions(int c,
                                                    java.util.BitSet set)
        Sets code point c's Script_Extensions as script code integers into the output BitSet.
        • If c does have Script_Extensions, then the return value is the negative number of Script_Extensions codes (= -set.cardinality()); in this case, the Script property value (normally Common or Inherited) is not included in the set.
        • If c does not have Script_Extensions, then the one Script code is put into the set and also returned.
        • If c is not a valid code point, then the one UNKNOWN code is put into the set and also returned.
        In other words, if the return value is non-negative, it is c's single Script code and the set contains exactly this Script code. If the return value is -n, then the set contains c's n>=2 Script_Extensions script codes.

        Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.

        Parameters:
        c - code point
        set - set of script code integers; will be cleared, then bits are set corresponding to c's Script_Extensions
        Returns:
        negative number of script codes in c's Script_Extensions, or the non-negative single Script value
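
The return-value convention can be sketched the same way. This is a hypothetical reimplementation of the documented contract, not ICU code. It assumes JDK `UnicodeScript` ordinals as the integer codes and the same hypothetical one-entry Script_Extensions table.

```java
import java.lang.Character.UnicodeScript;
import java.util.BitSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the getScriptExtensions return convention. */
final class ScriptExtensionsSketch {
    // Hypothetical one-entry table; real data lives in ScriptExtensions.txt.
    private static final Map<Integer, Set<UnicodeScript>> SCRIPT_EXTENSIONS =
            Map.of(0x30FC, Set.of(UnicodeScript.HIRAGANA, UnicodeScript.KATAKANA));

    /** Returns -n if c has n explicit Script_Extensions, else c's single Script code. */
    static int getScriptExtensions(int c, BitSet set) {
        set.clear();
        if (!Character.isValidCodePoint(c)) {
            int unknown = UnicodeScript.UNKNOWN.ordinal();  // invalid input maps to UNKNOWN
            set.set(unknown);
            return unknown;
        }
        Set<UnicodeScript> scx = SCRIPT_EXTENSIONS.get(c);
        if (scx != null) {
            for (UnicodeScript sc : scx) {
                set.set(sc.ordinal());
            }
            return -set.cardinality();  // Script value itself is not included
        }
        int script = UnicodeScript.of(c).ordinal();
        set.set(script);
        return script;
    }
}
```

A caller can branch on the sign of the result:
- A non-negative result is the single Script code.
- A negative result -n means the set holds n codes, which can be visited with `set.nextSetBit(...)`.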
      • getName

        public static final java.lang.String getName(int scriptCode)
        Returns the long Unicode script name, if there is one. Otherwise returns the 4-letter ISO 15924 script code. Returns "Malayalam" given MALAYALAM.
        Parameters:
        scriptCode - int script code
        Returns:
        long script name as given in PropertyValueAliases.txt, or the 4-letter code
        Throws:
        java.lang.IllegalArgumentException - if the script code is not valid
      • getShortName

        public static final java.lang.String getShortName(int scriptCode)
        Returns the 4-letter ISO 15924 script code, which is the same as the short Unicode script name if Unicode has names for the script. Returns "Mlym" given MALAYALAM.
        Parameters:
        scriptCode - int script code
        Returns:
        short script name (4-letter code)
        Throws:
        java.lang.IllegalArgumentException - if the script code is not valid
      • getSampleString

        public static final java.lang.String getSampleString(int script)
        Returns the script sample character string. This string normally consists of one code point but might be longer. The string is empty if the script is not encoded.
        Parameters:
        script - script code
        Returns:
        the sample character string
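
ICU's sample strings come from its own data. The sketch below is a brute-force JDK-only approximation, a different technique shown only to illustrate the idea of a per-script sample. It returns the first letter, in code point order, whose Script matches the given script. If the script has no letters, it returns an empty string, in the spirit of the documented empty result.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical brute-force sketch of a per-script sample character. */
final class SampleStringSketch {
    /** First letter (by code point order) whose Script is sc, or "" if there is none. */
    static String sampleString(UnicodeScript sc) {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isLetter(cp) && UnicodeScript.of(cp) == sc) {
                return new String(Character.toChars(cp));
            }
        }
        return "";
    }
}
```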
      • isRightToLeft

        public static final boolean isRightToLeft(int script)
        Returns true if the script is written right-to-left. For example, Arab and Hebr.
        Parameters:
        script - script code
        Returns:
        true if the script is right-to-left
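
isRightToLeft answers its question per script, whereas the JDK exposes only the per-character Bidi_Class. This hypothetical sketch therefore works in two steps:
- It checks a code point for the strong right-to-left classes R and AL.
- As a rough script-level guess, it applies that test to the first letter found for a script.

The results may differ from ICU's data for some scripts.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical sketch relating scripts to right-to-left bidi classes. */
final class RtlSketch {
    /** True if the code point's Bidi_Class is R or AL (strong right-to-left). */
    static boolean isStrongRtl(int codePoint) {
        byte dir = Character.getDirectionality(codePoint);
        return dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** Rough guess: is the first letter (by code point order) of script sc strong RTL? */
    static boolean scriptLooksRtl(UnicodeScript sc) {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isLetter(cp) && UnicodeScript.of(cp) == sc) {
                return isStrongRtl(cp);
            }
        }
        return false;  // no letters found for this script
    }
}
```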
      • breaksBetweenLetters

        public static final boolean breaksBetweenLetters(int script)
        Returns true if the script allows line breaks between letters (excluding hyphenation). Such a script typically requires dictionary-based line breaking. For example, Hani and Thai.
        Parameters:
        script - script code
        Returns:
        true if the script allows line breaks between letters
      • isCased

        public static final boolean isCased(int script)
        Returns true if case distinctions are customary in the modern (or most recent) usage of the script. For example, Latn and Cyrl.
        Parameters:
        script - script code
        Returns:
        true if the script is cased
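
A rough JDK-only approximation of isCased checks whether any code point of the script counts as uppercase or lowercase according to `Character.isUpperCase` and `Character.isLowerCase`. This substitute rests on an assumption: it looks at all encoded characters rather than at modern usage. As a result, it can disagree with ICU's answer for some scripts.

```java
import java.lang.Character.UnicodeScript;

/** Hypothetical approximation of isCased based on character case properties. */
final class CasedScriptSketch {
    /** True if any code point of script sc is uppercase or lowercase. */
    static boolean hasCasedLetters(UnicodeScript sc) {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if ((Character.isUpperCase(cp) || Character.isLowerCase(cp))
                    && UnicodeScript.of(cp) == sc) {
                return true;
            }
        }
        return false;
    }
}
```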