Tuesday, August 31, 2010

A Belated Note on the Shanghai Expo

In May I went to Shanghai for a conference. The last day's sessions ended at noon. After lunch, we headed straight for the Expo.

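We arrived at the Expo at exactly two o'clock and queued for the China Pavilion with our reservation tickets. The queuing was organized quite sensibly. First you walk through a winding path fenced off with iron railings, which spaces people out and sorts the crowd into a line. Then the line is divided into groups of about a hundred people, each waiting under a shade canopy. This meets the needs of queuing while also allowing for the hot weather, sparing people long exposure to the blazing sun. The waiting groups move one at a time into the area beneath the China Pavilion to wait for the elevators. This step can be regarded as part of the visit itself: along the way, visitors take in the full visual impact of climbing the pavilion's steps and looking up at the building from below. To use a perhaps not entirely apt analogy, it is like entering the palace and waiting outside the hall before an audience with the emperor.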

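Counting from when we joined the queue, we waited about an hour and a half before stepping into the elevator. The elevator's design is quite inventive. Large screens are mounted on both sides. When the elevator starts moving, the screens show the scenery outside the window of a moving train. Although the elevator actually rises from the ground to the second floor, it feels to the visitors as if it were driving forward into the China Pavilion.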

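Several designs inside the China Pavilion are quite creative. The upside-down city flips a city over: looking up, visitors see clusters of city buildings hanging from the ceiling. Looking down at the floor, they find zebra crossings made up of the Chinese characters of Chinese place names. As for the big-screen film, it has visual impact but no surprises, since it does not break out of Zhang Yimou's usual formula. The Along the River During the Qingming Festival display is somewhat interesting but does not reward long viewing; it feels a little short on variety.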

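After leaving the China Pavilion, we stopped by the Chinese provincial pavilions on the way. There were many of them, and plenty of foot traffic. We had not planned in advance what to see and what to skip, so we just glanced around and dropped into a few at random; I remember Xinjiang and Fujian, among others. Overall, not many people were simply browsing, while a lot were queuing to watch films.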

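By the time we came out of the China pavilions, it was already past five. Many people had spent the whole day at the Expo and were starting to look tired. The queues at some of the popular pavilions were also getting shorter. We took the opportunity to queue for the UAE Pavilion and got in after less than two hours. The cinema in the UAE Pavilion is not large, but the high-definition screen and well-judged staging create something close to a three-dimensional effect. The whole theme follows a child's growing up to tell of the hardships of founding the UAE and its strategy for building the nation. Very interesting.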

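It was dark when we came out of the UAE Pavilion. Since the queues were now much shorter, we managed to see the UK Pavilion, the Morocco Pavilion, the Belarus Pavilion, the Australia Pavilion, and the Urban Planet Pavilion. The Australia Pavilion and the Urban Planet Pavilion deserve special mention. The Australia Pavilion's film screens can be raised and lowered, which, together with the music, creates a rich sense of motion. I like Australia a lot, so I also came away with a good impression of its pavilion. The centerpiece of the Urban Planet Pavilion is its large hemispherical screen, which shows different films depending on whether you look at it from above or from below. Very interesting. I especially admire how the images from multiple cameras are stitched together so smoothly that the seams are almost invisible.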

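It was already ten in the evening when we left the Expo. In eight hours we had seen the China Pavilion, the Chinese provincial pavilions, the UAE Pavilion, the Morocco Pavilion, the UK Pavilion, the Belarus Pavilion, the Australia Pavilion, and the Urban Planet Pavilion. Because we largely avoided the crowds, we were quite efficient and the visit was fairly comfortable. To sum up, the whole Expo is like a film village: hundreds of cinemas, large and small and each designed differently, drawing hundreds of thousands of visitors every day. My experience is that starting in the afternoon gets you twice the result for half the effort.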

Thursday, August 5, 2010

So you want to study IR?

The following article, written by Jonathan Elsas, was extracted from http://windowoffice.tumblr.com/post/898277337/so-you-want-to-study-ir

I occasionally get questions from aspiring IR students asking for advice on getting started as an IR researcher.* Here’s an attempt at some pointers for foundation material and resources:

Reading

There’s a lot of background material that any IR researcher should be familiar with:

1. A good textbook to get your head around the fundamentals of the field is essential. There’s a long history of finely tuned mathematical models of information retrieval, which still strongly influence most modern IR research. Understand these models and become familiar with the issues they address. Recently at CMU, the graduate-level introductory IR classes have used Introduction to Information Retrieval by Manning, Raghavan & Schütze, which I highly recommend. But there are other good books out there, too: Search Engines: Information Retrieval in Practice by Croft, Metzler & Strohman is geared more towards the undergraduate audience. When I took the IR class at CMU, we used Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto (which despite the name may be a little dated now). I still refer back to Managing Gigabytes by Witten, Moffat & Bell when digging into the guts of an indexing problem.

2. Read some classic IR papers. There are some gems listed in the 2005 SIGIR Forum article “Recommended reading for IR research students”. Lots of these are somewhat dated (e.g., the IR world has pretty much moved beyond LSI), but some are still very heavily cited (e.g., the PageRank paper, Lavrenko’s relevance models paper).

3. A solid foundation in machine learning is becoming increasingly important. Tom Mitchell’s classic book is great. I’m also a big fan of Andrew Moore’s tutorial slides. (There’s clearly a CMU bias here — I was fortunate enough to take a machine learning class taught by both Tom and Andrew.)

4. Pay attention to what’s hot. Follow the top conferences (SIGIR, CIKM, ECIR, WSDM, WWW, ASIST) and journals (Information Retrieval, TOIS, JASIST), read all the papers that get best paper awards, and attend the conferences if you can. (I know I’m missing some great conferences in this list.)

5. IR as a field has always strongly valued solid evaluation methodologies. The Text REtrieval Conference, run by NIST, has provided many researchers with invaluable datasets and a forum for testing retrieval algorithms. Familiarize yourself with the tasks, datasets and tools used at TREC.

6. Don’t forget that IR is not just a CS research topic — it has its origins in Library Science. There is a strong (and IMO under-appreciated) Information & Library Science IR research community. IR isn’t just about the mathematics behind retrieval models — we need to understand the searchers and the user interfaces.

7. Check out Video Lectures’s archive of IR talks. Not really reading, but you can see what kind of research is presented at IR conferences, as well as some IR tutorials by well-known IR researchers.

Communicating

Effective presentation of your ideas is essential in any field. As a reviewer, I’ve seen a lot of papers that haven’t been anywhere close to publication quality. As a conference attendee, I’ve heard a lot of terrible talks. This advice applies to any technical field, not just IR.

1. If English is not your first language, find (befriend, hire) a native speaker who can edit your work and/or tutor you. For better or worse, you need to write fluently in English to get published.

2. Writing well takes practice. Try to write some each week. Trevor Strohman recommends writing some every day, along with giving a lot of other good general research tips. Find other researchers to swap paper drafts with, and act as reviewers for each other. Read Writing for Computer Science by Zobel (via @ssn).

3. Giving good presentations takes practice. If you can give a good conference presentation, people will remember you and your work. Find some advice on good presentation skills and follow it. I like this guy’s advice. I also like to keep words and equations to a minimum on my slides. But, everyone has their own presentation style. Most importantly, practice your talks. And please, don’t just read your slides.

Doing

You aren’t going to get far in IR research without getting your hands dirty. This is an applied field and it’s rare for someone to get a PhD without actually creating some software; dealing with some large, messy datasets; or *shudder* users!


1. Learn how to perform a retrieval experiment, and do it. You’ll need a document collection, a set of queries, and a set of relevance judgements. Terrier’s “Quick Start” guide gives a good overview of the process.

2. As mentioned above, TREC is the premier forum for IR evaluation. Participate in a TREC track if possible, and go to the conference. TREC is an excellent venue for hands-on experimentation and is very low-risk: you can’t get rejected, and you’ll have a (non-refereed) publication on your CV.

3. You don’t need to write a search engine from scratch, although this can be a great learning exercise. There are quite a few very good open source research search engines. See Jeff Dalton’s reasonably up-to-date list. Many of these have been used over the years at TREC and support TREC-formatted document collections.

4. Understanding the searcher is still a wide-open area of IR research. Perform a user study to explore how people formulate queries, interpret results, etc.

5. There are many unsolved real-world IR problems all around us. Although we see a lot of publications from large web search companies, web search isn’t the only search problem. Find a tractable problem and attempt a solution. For example, build a search engine for your department’s publications; build a better search interface for Wikipedia; tackle people-search in Twitter.
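To make the first item in the list above a little more concrete, here is a minimal sketch of the scoring step of a retrieval experiment. It is not from the original article and is not how Terrier or the TREC tools do it; it only illustrates one common measure, mean average precision (MAP). The names `average_precision`, `mean_average_precision`, `run` and `qrels`, and the toy document and query IDs, are all assumptions made up for this example.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over every query that has judgements."""
    scores = [average_precision(run.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: relevance judgements and a system's rankings.
    qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
    run = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}
    print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

In a real experiment the rankings would come from a search engine run over a document collection and the judgements from assessors; established evaluation tools are the safer choice for anything you intend to publish.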

Hope this helps answer some of the new IR researcher’s questions. And, seasoned researchers, please leave a comment if you spot any omissions.

* I’m probably not the best person to ask this question, but I do get a request every couple of months. Why they ask me is a mystery — maybe all grad students at top US universities get these requests?

Wednesday, August 4, 2010

Night Swim

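It hasn't rained for several weeks in a row, and the weather has been hot and muggy. After dinner the heat was truly unbearable, so I dived straight into the pool. What a relief!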

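This summer our little pool has brought the family a lot of fun. Even my mother often swims for a while in the afternoon, and she feels a bit more nimble for it. All the work of looking after the pool since last winter has been worth it.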