More Google Tech Talks

Today I watched the Erlang talk from Google Videos. A member of the audience raised a very good question: why should Erlang include all of these primitives at the language level? I didn't find the answer all that convincing, but it was still a good talk. Erlang has some nice features, such as hot code replacement. In the example, Lennart showed how you can tell a node to reload its code with a new version. In the code, he added a switch command that, once executed, switches the node over to the new version. Even after you have loaded the new code, it does not take effect until you tell the node to make a non-local call. I wasn't impressed by the set_cookie part. It seems rather hokey, like X11 authorization. Perhaps there are alternative ways to provide authorization. At the end, he mentioned the book "Programming Erlang", which I have been thinking about reading.

I also watched "You Are What You Say: Privacy Risks of Public Mentions". Based on the title, I thought it was going to be a boring talk about how you don't want to leak obviously private data to the public. However, it was actually about information retrieval when using multiple data sets. Dan looked at movielens.org, a site that has movie ratings. The ratings that users give to movies are private and cannot be viewed outside of the site. However, movielens.org also publishes an "anonymized" list of movie ratings. I think he implied that these lists contain all of the ratings for a user, with the name or other identifier anonymized. The idea was to take the anonymized list and other public information, such as the movie forum posts, and try to link the two together. If you could do that, then you would be able to see all of the private ratings for that user.
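To make the linking idea concrete for myself, here is a minimal sketch in Python. It is not the method from the talk, just the simplest thing I could think of: score each anonymized user by how many of the publicly mentioned movies they rated. The function name link_user, the ids, and the movie data are all made up for illustration.

```python
def link_user(mentions, anonymized):
    """Rank anonymized ids by how many of the mentioned movies they rated.

    mentions:   set of movie titles one forum user mentioned publicly
    anonymized: dict mapping an anonymized id to the set of movies rated
    """
    scores = {
        anon_id: len(mentions & rated)
        for anon_id, rated in anonymized.items()
    }
    # Highest overlap first; the top entry is the best guess for the forum user.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: three anonymized rating lists and one forum user's mentions.
anonymized = {
    "u1": {"Alien", "Brazil", "Casablanca"},
    "u2": {"Alien", "Titanic"},
    "u3": {"Brazil", "Primer", "Casablanca"},
}
forum_mentions = {"Brazil", "Primer"}

ranking = link_user(forum_mentions, anonymized)
best_id, best_score = ranking[0]
```

With this toy data the best match is the only list that contains both mentioned movies. A real attack would presumably need a smarter score than a raw overlap count.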

The talk centered on attacking privacy (identifying a user in an anonymized data set) and then on how to preserve privacy at the system and/or user level. There are a lot of intuitive points, such as the fact that if you rate a rare movie then in general you're more likely to be identified. One aspect that I found surprising was a way to partially protect your privacy. In the section on misdirection, it makes sense that mentioning more obscure movies won't help hide you very much, since it doesn't implicate enough people. Surprisingly, the best way was to mention very popular movies. I guess the intuition is that even the popular movies are sparse, so it's not a wash when you mention one. In effect, you're implicating a large group of people, but not so large that the movie pick could be discarded. I would probably have to read the paper to get better insight into this.
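The rare-movie intuition can be sketched with a simple weight. This is my own assumption, not the measure used in the talk: give each movie the weight log(N / n), where N is the number of users and n is how many of them rated the movie. The function name rarity_weights and the data are hypothetical.

```python
import math


def rarity_weights(anonymized):
    """Weight each movie by log(N / n): N users in total, n of whom rated it."""
    total_users = len(anonymized)
    counts = {}
    for rated in anonymized.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    return {
        movie: math.log(total_users / count)
        for movie, count in counts.items()
    }


# Hypothetical data: everyone rated "Titanic", only one user rated "Primer".
anonymized = {
    "u1": {"Titanic", "Alien"},
    "u2": {"Titanic", "Alien"},
    "u3": {"Titanic", "Primer"},
    "u4": {"Titanic"},
}

weights = rarity_weights(anonymized)
```

Under this weighting, a movie that everyone rated carries no identifying information at all, while a movie rated by a single user carries the most. It does not capture the misdirection result about popular movies, which is why I would want to read the paper.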

I like Google talks because there usually aren't stupid questions. The audience members are probably not experts on the topic at hand, but they can ask meaningful questions. In the talk, Dan mentioned that they discarded the idea of having users rate genres instead of individual movies because the essence of the site would be lost. It's not that useful to rate genres when you are really interested in specific works in that genre. At the end, an audience member asked about grouping the users together in groups of perhaps 2 or 10. Dan said they had not thought about that.