Modeling Linguistic Variation in Online Social Media

Language on the Internet and social media varies with time, geography and social factors. For example, consider an online chat forum where people from different regions of the world interact. In such scenarios, it is important to track and detect regional variation in language. A person from the UK in conversation with someone from the USA could say “he is stuck in the lift” to mean “he is stuck in an elevator”, since the word “lift” means an elevator in the UK but not in the US. Modeling such variation allows applications to suggest the intended meaning to the other participants in the conversation.

In this talk, I will present computational methods to track and detect changes in word usage, including semantic and syntactic variation. I will examine two dimensions of linguistic variation: time and geography. Specifically, I will outline methods that use distributional word representations (word embeddings) to detect semantic variation in word usage. Our methods scale to large datasets, making them particularly suited to online social media.
 
These methods have broad applications in fields such as information retrieval, semantic web applications, socio-variational linguistics and computational social science.

Bio

Vivek is a PhD candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Steven Skiena. Vivek's research interests lie at the intersection of Text Mining, Machine Learning and Computational Social Science. Vivek is particularly interested in statistical models to detect and analyze linguistic variation in online social media and is the author of 8 peer-reviewed publications. Visit http://vivekkulkarni.net for more details.

Speaker

Vivek Kulkarni

Date

Wednesday, October 5, 2016

Time

1:15 pm - 2:15 pm

Location

IACS Seminar Room