
Blog


    May 02

    Regular Expressions

    Metacharacter     Description



    .      Matches any single character. For example, the regular expression r.t matches the strings rat, rut, and r t, but not root.

    $      Matches the end of a line. For example, the regular expression weasel$ matches the end of the string "He's a weasel", but not the string "They are a bunch of weasels.".

    ^      Matches the beginning of a line. For example, the regular expression ^When in matches the beginning of the string "When in the course of human events", but not "What and When in the".

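    *      Matches zero or more occurrences of the character immediately preceding it. For example, the regular expression .* matches any number of any characters.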

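    \      The quoting character. It makes the metacharacters listed here match as ordinary characters. For example, the regular expression \$ matches a dollar sign rather than the end of a line; similarly, \. matches a literal dot rather than acting as the any-character wildcard.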

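    [ ]
    [c1-c2]
    [^c1-c2]  Matches any one of the characters between the brackets. For example, the regular expression r[aou]t matches rat, rot, and rut, but not ret. A hyphen inside the brackets specifies a range of characters: [0-9] matches any digit. Several ranges can be given at once: [A-Za-z] matches any upper- or lowercase letter. Another important use is exclusion: to match any character except those in the specified set (the complement), put a ^ between the left bracket and the first character. For example, [^269A-Z] matches any character except 2, 6, 9, and the uppercase letters.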

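    \< \>   Match the beginning (\<) and end (\>) of a word. For example, the regular expression \<the matches the "the" in the string "for the wise", but not the "the" in the string "otherwise". Note: not all software supports this metacharacter.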

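    \( \)   Define the expression between \( and \) as a "group" and save the characters matched by it in a temporary holding area (up to nine groups can be saved in a single regular expression). They can be referenced as \1 through \9.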

    |     Logical "or" of two match conditions. For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but not "it belongs to them.". Note: not all software supports this metacharacter.

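    +     Matches one or more occurrences of the character immediately preceding it. For example, the regular expression 9+ matches 9, 99, 999, and so on. Note: not all software supports this metacharacter.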

    ?     Matches zero or one occurrence of the character immediately preceding it. Note: not all software supports this metacharacter.

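    \{i\}
    \{i,j\}  Match a specific number of occurrences of the expression immediately preceding them. For example, the regular expression A[0-9]\{3\} matches the character "A" followed by exactly 3 digits, such as A123 or A348, but not A1234. The regular expression [0-9]\{4,6\} matches any run of 4, 5, or 6 consecutive digits. Note: not all software supports this metacharacter.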
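
    The table above uses the backslash-heavy syntax of tools such as grep and sed. As a rough sketch of how the same examples could be tried out, here they are in Python's re module, which spells a few things differently: groups are ( ) instead of \( \), counts are {i,j} instead of \{i,j\}, and \b stands in for both \< and \>. The helper name matches, the CASES list, and the patterns colou?r and (ab)\1 are my own illustrations, not part of the table.

```python
import re


def matches(pattern: str, text: str, whole: bool = False) -> bool:
    """True if pattern matches somewhere in text, or all of text when whole=True."""
    finder = re.fullmatch if whole else re.search
    return finder(pattern, text) is not None


# Each entry: (pattern, text, must match the whole string?, expected result)
CASES = [
    # .  any single character
    (r"r.t", "rat", True, True),
    (r"r.t", "r t", True, True),
    (r"r.t", "root", False, False),
    # $ and ^  end and beginning of a line
    (r"weasel$", "He's a weasel", False, True),
    (r"weasel$", "They are a bunch of weasels.", False, False),
    (r"^When in", "When in the course of human events", False, True),
    (r"^When in", "What and When in the", False, False),
    # \  quotes a metacharacter
    (r"\$", "it costs $5", False, True),
    (r"\.", "no dot here", False, False),
    # [ ]  character sets, ranges, and complements
    (r"r[aou]t", "rot", True, True),
    (r"r[aou]t", "ret", True, False),
    (r"[^269A-Z]", "7", True, True),
    (r"[^269A-Z]", "Q", True, False),
    # word boundary: \b here, \< and \> in the table
    (r"\bthe", "for the wise", False, True),
    (r"\bthe", "otherwise", False, False),
    # group plus back-reference: ( ) and \1 here, \( \) and \1 in the table
    (r"(ab)\1", "abab", True, True),
    (r"(ab)\1", "abcd", False, False),
    # |  alternation
    (r"(him|her)", "it belongs to her", False, True),
    (r"(him|her)", "it belongs to them.", False, False),
    # + and ?  one-or-more and zero-or-one
    (r"9+", "999", True, True),
    (r"colou?r", "color", True, True),
    # counts: {3} and {4,6} here, \{3\} and \{4,6\} in the table
    (r"A[0-9]{3}", "A123", True, True),
    (r"A[0-9]{3}", "A1234", True, False),
    (r"[0-9]{4,6}", "12345", True, True),
    (r"[0-9]{4,6}", "123", True, False),
]

for pattern, text, whole, expected in CASES:
    assert matches(pattern, text, whole) == expected, (pattern, text)
```

    Whole-string matching is used for cases like A1234, because a plain search would happily find A123 inside it.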


    October 10

    An Interesting Idea!

      This idea comes from a WWW2007 paper.
      People often have different needs when they send the same query to a search engine. A simple example: when querying 'MSRA', one person may want to know the history of the institute, while another wants to read about interns' experiences there. Unfortunately, the search engine gives everyone the same answer. How stupid the SE is! Why doesn't it understand me?
      Stop complaining now: this problem is known as 'personalized search'. There are two ways to solve it: implicit and explicit.
      Implicit way: the user does nothing, and the search engine mines the user's query history to infer their recent interests. This method needs a query log, from which the search engine predicts a search trend. This is Google's method.
      Explicit way: we know clearly what we want when we query, so we simply say so. In their project, they put a slider under the keyword text box for expressing your preference (here, between information and affection). To get better answers, the search engine can add more controls to help users express their preferences.
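
      Just to make the slider idea concrete, here is a toy sketch of how a slider value might re-order results. Everything in it is my own assumption, not the paper's method: the Result class, the two per-result scores, the made-up numbers, and the linear blend in rerank are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    informational: float  # hypothetical score in [0, 1]
    affective: float      # hypothetical score in [0, 1]


def rerank(results: list[Result], slider: float) -> list[Result]:
    """Order results by a linear blend of the two scores.

    slider = 0.0 means purely informational, 1.0 means purely affective.
    The linear blend is an assumption chosen for simplicity.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0.0 and 1.0")

    def blended(r: Result) -> float:
        return (1.0 - slider) * r.informational + slider * r.affective

    return sorted(results, key=blended, reverse=True)


# Made-up results for the 'MSRA' query from the example above.
results = [
    Result("History of MSRA", informational=0.9, affective=0.2),
    Result("My internship experience at MSRA", informational=0.3, affective=0.8),
]

facts_first = rerank(results, slider=0.0)
feelings_first = rerank(results, slider=1.0)
```

      With the slider at one end the history page comes first, and at the other end the internship story does.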

    A simple beta idea, just wait and see! :P