“【自我说明】包含敏感字符,请重新输入。” ([Profile] contains sensitive
characters, please try again.) Note that for censorship of text chat,
triggering messages do not display this warning.
Censorship Analysis
YY 7.1 downloads three different keyword lists:
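- Finance Keyword List (48 words). Source: http://do.yy.duowan.com/financekwordlist
These are keywords related to phishing scams. When a received message contains one of them, YY prints the following warning in the chat window:
YY安全提示:聊天中若有涉及财产的操作,请一定要先核实好友身份,谨防受骗!
(YY security tip: if a chat involves any operation concerning money or property, be sure to verify your friend's identity first to avoid being scammed!)
The keywords are downloaded as plain text in UTF-8-encoded XML.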
- Decoded "Normal" Keyword List (22 words). Source: http://do.yy.duowan.com/NormalKWordlist.txt
These are sensitive keywords that are replaced with asterisks in the message before it is sent; they also trigger a surveillance message back to YY's servers.
These keywords are downloaded as a base64-encoded list of UTF-16-encoded keywords, each separated by a carriage return followed by a line feed (CRLF).
- Decoded "High" Keyword List (13,461 words). Source: http://do.yy.duowan.com/HighKWordlist.txt
These are sensitive keywords that prevent the containing message from ever being sent; they also trigger a surveillance message back to YY's servers. If a message containing one of these keywords is somehow received anyway, it is displayed in the chat window as a blank message.
These keywords are downloaded as a base64-encoded list of UTF-16-encoded keywords, each separated by a carriage return followed by a line feed (CRLF).
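To make the three tiers concrete, here is a minimal Python model of the client-side behavior described above. It is an illustration, not YY's code: the function names are hypothetical, matching is assumed to be plain substring matching, an asterisked keyword is assumed to become one `*` per character, "High" is assumed to take precedence over "Normal", and only the first matching keyword is assumed to be reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    sent_text: Optional[str]         # None: the message is never sent
    reported_keyword: Optional[str]  # keyword that would trigger a report, if any


def filter_outgoing(message: str, normal: List[str], high: List[str]) -> Outcome:
    """Hypothetical model of outgoing-message filtering (assumptions in lead-in)."""
    # "High" list: the whole message is suppressed and a report is triggered.
    for kw in high:
        if kw and kw in message:
            return Outcome(sent_text=None, reported_keyword=kw)
    # "Normal" list: each occurrence is asterisked out before sending,
    # and a report is triggered (assumed: first matching keyword only).
    text = message
    reported = None
    for kw in normal:
        if kw and kw in text:
            reported = reported or kw
            text = text.replace(kw, "*" * len(kw))
    return Outcome(sent_text=text, reported_keyword=reported)


def needs_finance_warning(received: str, finance: List[str]) -> bool:
    """Hypothetical receive-side check: show the phishing warning if any
    "Finance" keyword appears in an incoming message."""
    return any(kw and kw in received for kw in finance)
```

For example, with `normal=["badword"]`, the message `"hello badword"` would go out as `"hello *******"` and `"badword"` would be reported, while any message containing a "High" keyword would not be sent at all.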
Surveillance Analysis
When a user sends a message containing a word from the "Normal" or "High" lists above, YY sends a surveillance message via an HTTP GET request to a URL of the form:
http://sere.hiido.com/do.action?id=<id>&content=<content>
<id> is a hash computed as md5(⌊<seconds since Unix epoch> / 1000⌋ + ";username=report;password=pswd@1234"), hex-encoded. Note that the username and password in the hashed string are hardcoded; they are not the username and password of the sender or receiver of the triggering message.
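The <id> formula can be reproduced in a few lines of Python. This is a sketch, not YY's implementation: it assumes the floored value is rendered as a plain decimal integer, concatenated directly with the credential string, hashed as ASCII bytes, and hex-encoded in lowercase. The function name `report_id` is hypothetical.

```python
import hashlib
import time
from typing import Optional

# Hardcoded string appended before hashing, as described above.
CREDENTIAL_SUFFIX = ";username=report;password=pswd@1234"


def report_id(now: Optional[float] = None) -> str:
    """Compute md5(floor(seconds / 1000) + CREDENTIAL_SUFFIX) as lowercase hex.

    Assumes decimal rendering of the integer and ASCII encoding of the string.
    """
    seconds = time.time() if now is None else now
    bucket = int(seconds // 1000)  # floor(seconds since Unix epoch / 1000)
    data = f"{bucket}{CREDENTIAL_SUFFIX}".encode("ascii")
    return hashlib.md5(data).hexdigest()
```

Because the timestamp is divided by 1000 before flooring, the value changes only once every 1000 seconds (about 17 minutes), so under these assumptions every report sent within the same window carries the same <id>.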
<content> is a base64-encoded string of the following form:
type=2;uid=<sending user id #>;touid=<receiving user id #>;keyword=<triggering keyword>;txt=<triggering message in its entirety>
In the version of YY analyzed, type is hardcoded to 2.
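For anyone inspecting captured traffic, a report URL can be unpacked as below. This is a sketch under assumptions: the function name `decode_report` is hypothetical, the text inside the base64 layer is assumed to be UTF-8 (adjustable via `text_encoding`), and the query string is parsed by hand with `unquote` rather than `parse_qs`, because `parse_qs` turns `+` into a space and `+` is a legal base64 character.

```python
import base64
from typing import Dict
from urllib.parse import unquote, urlsplit


def decode_report(url: str, text_encoding: str = "utf-8") -> Dict[str, str]:
    """Extract and decode the <content> parameter of a captured report URL."""
    content = None
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")  # keeps base64 '=' padding in value
        if key == "content":
            content = unquote(value)  # undo %XX escapes without touching '+'
    if content is None:
        raise ValueError("no content parameter in URL")
    text = base64.b64decode(content).decode(text_encoding)
    # txt is the last field and may itself contain ';' or '=', so split
    # into at most five parts and split each part on its first '=' only.
    fields: Dict[str, str] = {}
    for part in text.split(";", 4):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields
```

The result maps `type`, `uid`, `touid`, `keyword`, and `txt` to their string values, with `txt` holding the full triggering message even if it contains semicolons.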
Code
decode.py is a Python script that automates decoding the "Normal" and "High" lists into plain-text UTF-8.
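The decoding step itself is short. The following is a minimal reimplementation sketch, not decode.py. It assumes the whole downloaded file is a single base64 string whose decoded bytes are UTF-16 text with CRLF between keywords, and that the text is little-endian when no byte-order mark is present; both are assumptions. The function names `decode_keyword_list` and `decode_file` are hypothetical.

```python
import base64
from typing import List


def decode_keyword_list(blob: bytes) -> List[str]:
    """Decode a base64 blob of CRLF-separated UTF-16 keywords."""
    # b64decode (validate=False by default) discards stray line breaks.
    raw = base64.b64decode(blob)
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        text = raw.decode("utf-16")  # BOM present: the codec picks the byte order
    else:
        text = raw.decode("utf-16-le")  # assumption: little-endian without a BOM
    # Drop empty entries, e.g. from a trailing CRLF.
    return [kw for kw in text.split("\r\n") if kw]


def decode_file(src_path: str, dst_path: str) -> int:
    """Write one keyword per line to dst_path as UTF-8; return the count."""
    with open(src_path, "rb") as src:
        keywords = decode_keyword_list(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(keywords) + "\n")
    return len(keywords)
```

Usage would be along the lines of `decode_file("HighKWordlist.txt", "high_decoded.txt")`. If the lists turn out to be encoded line by line (one base64 string per keyword), the split would have to happen before the base64 step instead.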