What would social deduction games look like with data?

Introduction: Board Game!

Avalon, or The Resistance, is an exceptionally popular board game. Players are randomly and secretly assigned a role belonging to one of two possible forces (Good or Evil in Avalon; Resistance or Spy in the original Resistance game), and should act based on their affiliation in order to win.

Although the game can sometimes be quite intricate, the underlying concept is straightforward, enabling players to learn the rules and enjoy the gameplay in just a few minutes. …

What kind of story does the crime data tell about NYC in 2020 so far?

Introduction: What is going on with NYC?

These days, there are very conflicting claims about the crime rate in New York City. Some politicians say violent crime is rampant in the Big Apple, while others say the city is safer than ever because of the lockdown. With the election coming up, I think this is just another public issue being used for political gain. But I have also personally heard many versions of the story.

I have friends and family who say the city has become uninhabitable. Meanwhile, I haven’t heard anything different from my foreign friends or colleagues still in the city. …

Cleaning and feature engineering data based on multiple columns in both single commands and ETL processes

Let’s suppose your manager gives you a random dataset and tells you to do a “basic” cleaning: “Keep only the records that have values in columns A, B, and C, or the records that don’t have any value in those three columns at all.” How would you approach that?

Photo by ThisisEngineering RAEng on Unsplash


Feature engineering can be really basic, such as scaling and encoding, but it can sometimes be mysterious. Recently, I came across this type of cross-column requirement, and it got me thinking: how can cleaning logic sound so simple in words yet be inherently complicated to implement in data cleaning and feature engineering?
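As a rough illustration of the manager's rule, here is a minimal sketch in pandas. It assumes the rule means "all three of A, B, and C are present, or all three are missing", which is one reading of the request. The sample data, the extra column D, and the variable names are hypothetical, and the use of pandas is itself an assumption.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing value.
df = pd.DataFrame(
    {
        "A": [1.0, None, 3.0, None],
        "B": ["x", None, None, "w"],
        "C": [10.0, None, 30.0, None],
        "D": ["keep", "keep", "drop", "drop"],  # label for illustration only
    }
)

cols = ["A", "B", "C"]

# Count the non-missing values per row across the three columns.
n_present = df[cols].notna().sum(axis=1)

# Keep rows where all three are present or all three are missing
# (assumed reading of the rule).
mask = (n_present == len(cols)) | (n_present == 0)
cleaned = df[mask]
```

Counting the non-missing values per row keeps the condition readable and extends to more columns by changing only `cols`.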

Straightforward in words, not so intuitive in code

Most of the…

An attempt to investigate the “curve” through visualization

Photo by Fabian Quintero on Unsplash

[2020/6/4 Update | I have built a web app for tracking the latest numbers with Dash and Plotly. I will write up another article about it, but please feel free to try it now.]

At the time of writing (26th March 2020), Coronavirus (COVID-19) has wreaked havoc on our society. Though there was already some news about a new kind of virus spotted in Wuhan, China at the end of 2019, the outbreak in Western countries has only just started. During self-quarantine, it came to my mind that it would be interesting and possibly inspiring if we use the data to…

A Review of Python/DS Online Courses in Mandarin

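At the end of 2018, I finished the introductory Python for Everybody course series (my review and recommendation) and had gained some understanding of Python. By then my master's applications had also wrapped up, so my learning goals shifted a little. Since I no longer needed to strengthen my résumé, I stopped prioritizing speed and efficiency and instead aimed to "build coding skills and hands-on Data Science experience". Because I had a good experience with the Python course on Coursera, I went straight on to the next series from the same school, the University of Michigan: the Applied Data Science with Python Specialization.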


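This post differs slightly from my previous course review: it does not cover how to choose an online course and instead focuses on the content of the courses themselves (difficulty …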

Using NYPD Complaint Data to analyze crime events in different boroughs.

Photo by Mark Asthoff on Unsplash


When I moved to New York City, I kept hearing different people say the same thing: I should watch out for my personal safety in the city. Even though NYC has historically become a much safer place than it used to be (see the troll tourist guide to New York in the 1970s: ‘Welcome to Fear City’ – the inside story of New York’s civil war, 40 years on), the city still leaves a fun but somewhat terrifying impression on locals and on people all around the world.
E.g. …

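In mid-2018, in order to apply to Business Analytics/BA/MIS master's programs, I started teaching myself Data Science/Business Analytics. But as a business school student who barely touched math in college and knew only basic statistics and R, I was truly paralyzed by choice at first when facing the abundance of learning resources…. I believe picking any course at random would have worked reasonably well, but the key question was: how could I most quickly and efficiently help my MS applications while also building my BA skills?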

Photo by Clément H on Unsplash


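This post mainly shares my thoughts on choosing online courses at the time, along with the Python for Everybody course on Coursera, and it suits beginners who are likewise just entering the field. I should say up front that, being a newcomer myself, I chose to share at a stage when I do not fully understand everything either, hoping to quickly record the ideas I find valuable at this stage while I still remember how a beginner feels and what confuses them. (If you have any feedback or anything to add …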

