A new service called TweetPsych launched this week, built by a web developer named Dan Zarrella. Zarrella is also a marketing manager for HubSpot, an online marketing firm. Zarrella calls himself a “scientist,” I guess because it sounds sexier than “web developer” or “marketing manager,” but he doesn’t list any academic credentials. (I wouldn’t mention the scientist or credentials part except that Zarrella makes specific scientific claims about his new service.)
The interesting new service is marketed as offering “psychological profiling” based upon what you post to Twitter. But it’s really just a content analysis service, using two psychological dictionaries and your past 1,000 tweets. Zarrella claims this analysis “builds a psychological profile of a person.” Real psychological profiling is a science, and is usually done with a lot more than just one piece of a person’s life (such as what they write on a micro-blogging service). TweetPsych then makes the contradictory claim that it is for “entertainment purposes only.” Which is it?
There are problems with one of the dictionaries Zarrella uses in the analysis as well. One dictionary — the LIWC — is a valid psycholinguistic database. But the other, the English Regressive Imagery Dictionary (RID), is far less so. The RID is composed of about 3,200 words and roots assigned to 43 categories of thought and mood. The primary problem with the RID is that it has essentially no research backing (despite Zarrella’s quoting a website that makes it seem otherwise). It was developed by a single professional, who then went on to write a number of books about it and other psychoanalytic processes. A book isn’t the same as a peer-reviewed research journal article (as researchers know), and the RID lacks empirical validation. That means half the analysis is suspect before we even begin.
The second dictionary, the Linguistic Inquiry and Word Count (LIWC), is based primarily on the written word — people’s writings — or the spoken word — like a therapy session or a conversation between two people. It was not developed to analyze artificially short, 140-character entries such as those found on Twitter. People abbreviate words when tweeting because of the character limit, and it’s not clear that simple stemming will accurately capture words written with on-the-fly, nonstandard abbreviations. And what about retweets? A person who retweets something isn’t necessarily “speaking,” but rather acting as a mouthpiece for someone else’s words. Does the service differentiate? Without studying these issues first, you’d have no idea whether your analysis is artificially biased in some manner. These issues are addressable, but they haven’t been addressed in this service.
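To make the preprocessing problem concrete, here is a minimal sketch (in Python) of the kind of cleanup a tweet analyzer would need before any dictionary matching. TweetPsych’s actual pipeline is undisclosed, so the retweet filter and the abbreviation table below are purely illustrative:

```python
import re

# Hypothetical abbreviation table; a real one would need many more entries.
ABBREVIATIONS = {
    "u": "you", "ur": "your", "r": "are", "b4": "before",
    "gr8": "great", "thx": "thanks", "pls": "please",
}

def preprocess(tweets):
    """Drop retweets and expand common abbreviations before analysis."""
    cleaned = []
    for tweet in tweets:
        # A retweet repeats someone else's words, not the author's own.
        if tweet.startswith("RT ") or " RT @" in tweet:
            continue
        words = [ABBREVIATIONS.get(w.lower(), w)
                 for w in re.findall(r"[A-Za-z0-9']+", tweet)]
        cleaned.append(" ".join(words))
    return cleaned
```

Nothing about this is difficult; the point is that there’s no indication TweetPsych does any of it.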
Zarrella’s ability to quickly analyze 1,000 tweets and compare all of the text therein to these two dictionaries in a few seconds is an admirable feat of linguistic programming. The challenge that follows is, “How do I present the results of the analysis in a thoughtful, intuitive and actionable manner?” This is where TweetPsych simply fails to deliver.
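For what it’s worth, the counting itself is the easy part. A minimal LIWC-style counter might look like the sketch below; the categories here are invented stand-ins, since the real LIWC dictionary is proprietary, far larger, and uses stemmed entries such as “work*”:

```python
# Invented stand-in categories, purely for illustration.
CATEGORIES = {
    "work": {"work", "job", "boss", "office", "meeting"},
    "posemo": {"happy", "great", "love", "good", "nice"},
}

def category_scores(words):
    """Return each category's share of total words, as a percentage."""
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        name: 100.0 * sum(w.lower() in vocab for w in words) / total
        for name, vocab in CATEGORIES.items()
    }
```

The hard part is everything downstream: choosing categories that matter and presenting the numbers so they mean something.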
Since Zarrella apparently has little psychology background, the psychology results are pretty unsatisfying. You receive a list of “features” (of your personality? your tweeting?) that include things like “Occupation and Work.” Next to it is the helpful description, “You talk a lot about jobs and your work” and a score.
Gee, thanks for the great insight.
You have no idea what the score means, because there’s no context for it. Is a 47.87 for work good or bad? What’s the average? Other features include “Present Tense,” “Upward Motion,” “Positive Emotions,” “Negative Emotions,” and three dozen other categories.
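Giving a score context isn’t hard, if you have reference data. The LIWC manual, for instance, publishes norms for various genres of text; a score could at least be reported relative to those. The mean and standard deviation values below are made up purely for illustration:

```python
# Hypothetical (mean, standard deviation) reference norms.
NORMS = {"work": (2.3, 1.5), "posemo": (3.6, 1.8)}

def z_score(category, score):
    """How many standard deviations a score sits from the reference mean."""
    mean, sd = NORMS[category]
    return (score - mean) / sd
```

Even a single line like “well above average for personal writing” would be more informative than a bare 47.87.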
This part of the analysis, based upon the LIWC, is also only as good as the LIWC’s underlying dictionary. While categories such as work, achievement and leisure are all “current concerns” the LIWC can identify, it has no category for something like “relationship concerns.” But you wouldn’t know that unless you knew the LIWC. It’s the sort of limitation worth disclosing to people who run the analysis. Other popular topics tweeted about regularly — like politics, technology and celebrity — are also not part of the LIWC. So they’ll never show up in the analysis, even if that’s all you talk about. The information the LIWC — and by extension, TweetPsych — can provide is therefore limited. (A customized dictionary would solve some of these issues, but it’s not one TweetPsych offers.)
The “Primordial, Conceptual and Emotional Content” results from the RID come with absolutely no descriptions, and again, nothing to put your scores into any kind of context. But since the RID isn’t a scientific dictionary to begin with, you can pretty much ignore these scores anyway. They could have been generated randomly and would provide just as much helpful information.
The last part of the current analysis is “Others like you,” a common component of any social networking service. Curiously, this component was missing from the first version of the tool. Based solely upon what you tweet, it offers “Some people that think like you” and provides a list of other people who’ve gone to TweetPsych and entered their username to be analyzed.
Of course, it’s not people who think like you; it’s people who tweet like you. This is an important distinction. A service that analyzes a tiny slice of what you write every day, using an analysis that may be skewed by its users’ widespread abbreviations, surely cannot claim to analyze what you think.
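As for how the matching might work, one plausible (but unconfirmed; TweetPsych doesn’t say) approach is to treat each user’s category scores as a vector and rank users by cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Note what such a metric would actually measure: similarity of word use, not similarity of thought, which is exactly the distinction above.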
How reliable is TweetPsych? Well, today as I was writing this article, I noticed that all of Dan Zarrella’s own scores changed because of a single tweet (he’s only tweeted once today). His “occupation and work” score dropped 20%, his “present tense” score went up 16%, and his abstract thought score went down 16%. How could all of this happen from just one tweet? One tweet, compared to his 999 others, shouldn’t be able to shift his scores that much. Unless something else is going on. (Compare the screenshot below, taken at 2:55 pm ET, to the one above, taken at 9:00 am ET today.)
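If the scores were simple word-share percentages over a rolling window of the last 1,000 tweets, a single new tweet could only move them by a fraction of a point. A back-of-envelope check, assuming roughly 15 words per tweet:

```python
# Assuming ~15 words per tweet, adding one tweet (and dropping the oldest)
# changes at most 2 tweets' worth of words out of a 1,000-tweet window.
avg_words_per_tweet = 15
total_words = 1000 * avg_words_per_tweet
changed_words = 2 * avg_words_per_tweet
max_shift = 100.0 * changed_words / total_words
print(f"Max absolute shift in any category: {max_shift:.2f} points")  # 0.20
```

If his “occupation and work” score is anywhere near the 47.87 range shown above, a 20% drop is roughly ten points, far more than one tweet should produce under that model. Swings that large suggest either a much smaller effective sample or some undisclosed weighting.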
My own results from analyzing Zarrella’s past 1,000 tweets in the LIWC2007 program show something very different. I don’t know Zarrella’s methodology (since he didn’t share it), but I took the text of his past 1,000 tweets and processed them through the LIWC in two forms — stemmed and “as is.” Neither produced scores anything close to what appears on TweetPsych. This could be due to his using an older version of the dictionary, or to some undisclosed transformation he applies within TweetPsych. You can view the results of this LIWC2007 analysis here. (I’ve highlighted in yellow the items TweetPsych highlights, and in green areas no longer highlighted by TweetPsych; note the significant difference in scoring.) It makes you wonder exactly what is going on with the service. If its psychometric reliability and validity are questionable, how useful is it?
TweetPsych is getting plenty of positive press, with only a dash of skepticism thrown in. CNet’s Josh Lowensohn wrote about the service and only noted in passing, “This makes it less about psychology and more about your personal lexicon, but the results are still quite fun.” Yay, fun! Ben Patterson over at Yahoo! Tech said, “Unfortunately, the psychological profiles that TweetPsych dispenses aren’t the coherent, narrative variety you might hear from the resident psychiatrist on ‘Law & Order: Criminal Intent.’” And yet, isn’t a coherent narrative far more useful than some nondescript categories? None of the reporters, nor the original article on Mashable (where, surprise, Zarrella is a contributor), noted Zarrella’s lack of psychology background. None connected the dots as to why the results are so unsatisfying. Apparently tech journalists are great at repackaging positive press releases, but not so great at being actual journalists who dig into the claimed science of such a service.
Of course, Zarrella himself admits he didn’t put much thought into the service, telling the NY Post: “People just love to compare themselves against other people and to try to ‘get inside’ other people’s heads. It’s like being a fly on the wall at a therapy session.” A therapy session? Is it really that insightful to learn that someone talks about “upward motion”? In the rush to put the service online, Zarrella apparently never asked the question, “Is any of this information actually useful?” The service, as it exists today, is an unfinished thought that few will revisit.
TweetPsych, despite its limitations, has opened the door to future services that could actually provide usable, useful and actionable information with greater validity. Imagine taking not only a person’s tweets, but also the information in their Facebook profile, blog, etc., and placing it all into one huge analysis engine… Such an engine might then be capable of providing true psychological insight into an individual based upon what they say online.
Until that time, we have freshman efforts like TweetPsych, which really should be called “TweetFun!” Because while it is indeed fun to play with, it provides little psychological insight — except of the most shallow kind — into anyone.
(You can read additional concerns about TweetPsych from Tyler Hayes here.)