Description
Characters can be matched by mentioning their Unicode character
properties inside of \p{}
:
Proposal
RegExp Unicode Property Escapes1
Syntax
Properties
Examples
Name
A unique name, composed of uppercase letters, digits, hyphens and spaces.
- A: Name = LATIN CAPITAL LETTER A
- 🙂: Name = SLIGHTLY SMILING FACE
GeneralCategory
categorizes characters
- x: General
Category= LowercaseLetter - $: General
Category= CurrencySymbol
WhiteScpace
Used for marking invisible spacing characters, such as spaces, tabs and newlines.
- :͡ White
Space= True - π: White
Space= False
Age
Version of the Unicode Standard in which a character was introduced. For example: The Euro sign € was added in version 2.1 of the Unicode standard.
- €: Age = 2.1
Block
A contiguous range of code points. Blocks don’t overlap and their names are unique.
- S: Block = Basic
Latin(range U+0000..U+007F) - 🙂: Block = Emoticons (range U+1F600..U+1F64F)
Script
A collection of characters used by one or more writing systems.
- Some scripts support several writing systems. For example, the Latin script supports the writing systems English, French, German, Latin, etc.
- Some languages can be written in multiple alternate writing systems that are supported by multiple scripts. For example, Turkish used the Arabic script before it transitioned to the Latin script in the early 20th century.
-
Examples
- α: Script = Greek
- Д: Script = Cyrillic