A new service called TweetPsych launched this week, created by a web developer named Dan Zarrella. Zarrella is also a marketing manager for HubSpot, an online marketing firm. Zarrella calls himself a “scientist,” I guess because it sounds sexier than “web developer” or “marketing manager,” but he doesn’t list any academic credentials. (I wouldn’t mention the scientist or credentials part except that Zarrella makes specific scientific claims about his new service.)
The interesting new service is marketed as offering “psychological profiling” based upon what you post to Twitter. But it’s really just a content analysis service, using two psychological dictionaries and your past 1,000 tweets. Zarrella claims this analysis “builds a psychological profile of a person.” Real psychological profiling is a science, and is usually done with a lot more than just one piece of a person’s life (such as what they write on a micro-blogging service). TweetPsych then makes the contradictory claim that it is for “entertainment purposes only.” Which is it?
There are problems with one of the dictionaries Zarrella is using in the analysis as well. One dictionary — the LIWC — is a valid psychological linguistics database. But the other, the English Regressive Imagery Dictionary (RID), is far less so. The RID is composed of about 3,200 words and roots assigned to 43 categories of thought and mood. The primary problem with the RID is that it has basically no research backing (despite Zarrella quoting a website that makes it seem otherwise). It was developed by a single professional, who then went on to write a bunch of books about it and other psychoanalytic processes. A book isn’t the same as a peer-reviewed research journal article (as researchers know), and the RID is completely lacking any empirical backing. This suggests that half the analysis is invalid before we even begin.
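To make the mechanics concrete: both dictionaries are essentially lists of words and wildcard roots mapped to categories, and scoring a text means counting matches. A minimal sketch of that kind of matching follows; the categories and entries are invented for illustration and are not the actual RID or LIWC contents:

```python
# Invented example entries; NOT the actual RID or LIWC contents.
# A trailing '*' marks a root that matches any suffix, the convention
# these dictionaries use.
DICTIONARY = {
    "water":  ["sea", "ocean", "river", "rain"],
    "ascent": ["climb*", "soar*", "rising"],
}

def score_text(text, dictionary):
    """Count how many words in the text fall into each category."""
    counts = {category: 0 for category in dictionary}
    for word in text.lower().split():
        for category, entries in dictionary.items():
            for entry in entries:
                matched = (word.startswith(entry[:-1]) if entry.endswith("*")
                           else word == entry)
                if matched:
                    counts[category] += 1
    return counts

print(score_text("The rain fell and the river kept climbing its banks", DICTIONARY))
# {'water': 2, 'ascent': 1}
```

That is all a “psychological profile” of this sort amounts to: word counting against category lists.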
The second dictionary, the Linguistic Inquiry and Word Count (LIWC), is based primarily on the written word — people’s writings — or the spoken word — like a therapy session or conversation between two people. It was not developed to analyze artificially short 140-character entries, such as those found on Twitter. People abbreviate words when tweeting because of the character limit, and it’s not clear that simple stemming will accurately analyze words written with on-the-fly nonstandard abbreviations. And what about retweets? A person who retweets something isn’t necessarily “speaking,” but instead acting as a mouthpiece for someone else’s words. Does the service differentiate? Without specifically studying these kinds of issues first, you would have no idea whether your analysis is artificially biased in some manner. These issues are addressable, but haven’t been addressed in this service.
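None of this preprocessing would be hard to build. Here is a minimal sketch of the kind of cleanup one might expect before dictionary matching, skipping retweets and expanding abbreviations; the abbreviation table is invented and deliberately tiny, and nothing here is known to be part of TweetPsych:

```python
import re

# Invented, deliberately tiny abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great", "b4": "before"}

def preprocess(tweets):
    """Skip retweets, strip URLs and @mentions, and expand known abbreviations."""
    cleaned = []
    for tweet in tweets:
        if tweet.startswith("RT "):                  # a retweet is someone else's words
            continue
        tweet = re.sub(r"https?://\S+", "", tweet)   # drop URLs
        tweet = re.sub(r"@\w+", "", tweet)           # drop @mentions
        words = [ABBREVIATIONS.get(w.lower(), w) for w in tweet.split()]
        cleaned.append(" ".join(words))
    return cleaned

print(preprocess(["RT @someone: u r gr8", "met u b4 the talk http://t.co/x"]))
# ['met you before the talk']
```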
Zarrella’s ability to quickly analyze 1,000 tweets and compare all the text contained therein to these two dictionaries in a few seconds is an admirable feat of linguistic programming. The challenge that follows is, “How do I present the results of the analysis in a thoughtful, intuitive and actionable manner?” This is where TweetPsych simply fails to deliver.
Since Zarrella apparently has little psychology background, the psychology results are pretty unsatisfying. You receive a list of “features” (of your personality? your tweeting?) that include things like “Occupation and Work.” Next to it is the helpful description, “You talk a lot about jobs and your work” and a score.
Gee, thanks for the great insight.
You have no idea what the score means, because there’s no context for it. Is a 47.87 for work good or bad? What’s the average? Other features include “Present Tense,” “Upward Motion,” “Positive Emotions,” “Negative Emotions,” and three dozen other categories.
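For contrast, here is roughly what context could look like: reporting a score as its deviation from a baseline sample of other users. The baseline numbers below are invented purely for illustration; TweetPsych reports nothing of the kind.

```python
import statistics

def contextualize(user_score, baseline_scores):
    """Report a score alongside its distance from a baseline sample's mean."""
    mean = statistics.mean(baseline_scores)
    sd = statistics.stdev(baseline_scores)
    z = (user_score - mean) / sd
    return f"{user_score:.2f} ({z:+.1f} SD from the baseline mean of {mean:.2f})"

# Invented "work" scores from a hypothetical sample of Twitter users.
baseline = [31.2, 44.8, 39.5, 50.1, 36.7, 42.3, 47.0, 33.9]
print(contextualize(47.87, baseline))
# 47.87 (+1.1 SD from the baseline mean of 40.69)
```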
This part of the analysis, based upon the LIWC, is also only as good as the LIWC’s basic dictionary. While categories such as work, achievement and leisure are all “current concerns” the LIWC can identify, it has no category for something like “relationship concerns.” But you wouldn’t know that unless you knew the LIWC; it’s the kind of limitation you’d want to disclose to people who run the analysis. Other popular topics tweeted about regularly — like politics, technology and celebrity — are also not a part of the LIWC. So they’ll never show up in the analysis, even if that’s all you talk about. The information the LIWC — and by extension, TweetPsych — can provide is therefore limited. (A customized dictionary solves some of these issues, but TweetPsych doesn’t offer one.)
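For what it’s worth, a custom category is trivial to add to this kind of system; it’s just another list of roots in the same format. Continuing the matching sketch from earlier, with invented entries:

```python
# Invented custom categories in the same format as the earlier sketch;
# the LIWC's stock dictionary has no equivalents for these topics.
CUSTOM_CATEGORIES = {
    "politics":   ["senat*", "congress*", "vote*", "elect*"],
    "technology": ["iphone*", "software", "gadget*", "laptop*"],
    "celebrity":  ["paparazzi", "tabloid*", "hollywood"],
}

# Reusing score_text() from the sketch above:
print(score_text("I will vote for whoever fixes my laptop", CUSTOM_CATEGORIES))
# {'politics': 1, 'technology': 1, 'celebrity': 0}
```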
The “Primordial, Conceptual and Emotional Content” results from the RID come with absolutely no descriptions, and again, nothing to put your scores into any type of context. But since the RID isn’t a scientific dictionary to begin with, you can pretty much ignore these scores anyway. They could have been generated randomly and would provide just as much helpful information.
The last part of the current analysis is “Others like you,” a common component of any social networking service. Curiously, this component was missing from the first version of the tool. Based solely upon what you tweet, it lists “Some people that think like you”: other people who’ve gone to TweetPsych and entered their usernames to be analyzed.
Of course it’s not people that think like you — it’s people that tweet like you. This is an important distinction. A service that analyzes a tiny portion of what you write every day, and based upon an analysis that may be flawed by its users’ widespread use of abbreviations, surely cannot claim to analyze what you think.
How reliable is TweetPsych? Well, today as I was writing this article, I noticed that all of Dan Zarrella’s own scores changed because of a single tweet (he’s only tweeted once today). His “occupation and work” score dropped 20%, and his “present tense” score went up 16%. His abstract thought score went down 16%. How could all of this happen from just one tweet? One tweet — compared to his 999 other tweets — shouldn’t be able to affect one’s score so much. Unless something else is going on. (Compare the screenshot below, taken at 2:55 pm ET, to the one above, taken at 9:00 am ET today.)
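Some back-of-the-envelope arithmetic shows why such swings are suspicious. If a score were simply the fraction of dictionary hits across a rolling window of the last 1,000 tweets (an assumption on my part; the actual formula is undisclosed), even a maximally loaded new tweet could barely move it:

```python
# Back-of-the-envelope check: how far can ONE tweet move a score computed
# over a rolling window of 1,000 tweets? This assumes the score is simply
# the fraction of dictionary hits among all words in the window; the actual
# TweetPsych formula is undisclosed.
old_hits, old_words = 4787, 10_000   # hypothetical window: score = 47.87%
dropped_hits, dropped_words = 0, 10  # the oldest tweet leaves the window
added_hits, added_words = 10, 10     # worst case: every word in the new tweet hits
new_score = (old_hits - dropped_hits + added_hits) / (old_words - dropped_words + added_words)
print(f"{old_hits / old_words:.2%} -> {new_score:.2%}")
# 47.87% -> 47.97%
```

A shift of a tenth of a point in the worst case. A swing of 16% to 20% from a single tweet suggests the score depends on something besides the user’s own tweets, such as a moving baseline or a changing algorithm.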
My own analysis of Zarrella’s past 1,000 tweets in the LIWC2007 program shows something very different. I don’t know Zarrella’s methodology (since he didn’t share it), but I took the text of his past 1,000 tweets and processed them through the LIWC in two forms — stemmed and “as is.” Neither produced scores anything close to what appears on TweetPsych. This could be due to his using an older version of the dictionary, or to some sort of transformation he’s applying within TweetPsych that he didn’t disclose. You can view the results of this LIWC2007 analysis here. (I’ve highlighted in yellow things TweetPsych has highlighted, and in green other areas no longer highlighted by TweetPsych; note the significant difference in scoring.) It makes you wonder exactly what is going on with the service. If its psychometric reliability and validity are questionable, how useful is it?
TweetPsych is getting plenty of positive press, with only a dash of skepticism thrown in. CNET’s Josh Lowensohn wrote about the service and only noted in passing, “This makes it less about psychology and more about your personal lexicon, but the results are still quite fun.” Yay, fun! Ben Patterson over at Yahoo! Tech said, “Unfortunately, the psychological profiles that TweetPsych dispenses aren’t the coherent, narrative variety you might hear from the resident psychiatrist on ‘Law & Order: Criminal Intent.’” And yet, isn’t a coherent narrative far more useful than some nondescript categories? None of the reporters, nor the original article on Mashable (where, surprise, Zarrella is a contributor), noted the lack of psychology background Zarrella brings to the table. None connected the dots as to why the results are so unsatisfying. Apparently tech journalists are great at recirculating positive press releases, but not so great at being actual journalists who dig into the claimed science of such a service.
Of course, Zarrella himself admits he didn’t put much thought into the service, telling the NY Post: “People just love to compare themselves against other people and to try to ‘get inside’ other people’s heads. It’s like being a fly on the wall at a therapy session.” A therapy session? Is it really that insightful to find someone is talking about “upward motion”? In the rush to put the service online, Zarrella apparently never asked the question, “Is any of this information actually useful?” The service, as it exists today, is an unfinished thought that few will revisit.
TweetPsych, despite its limitations, has opened the door to future services that could actually provide usable, useful and actionable information with greater validity. Imagine taking not only a person’s tweets, but also the information contained within their Facebook profile, blog, etc., and placing it all into one huge analysis engine… Such an engine might then have the capability of providing true psychological insight into an individual based upon what they say online.
Until that time, we have freshman efforts like TweetPsych, which really should be called “TweetFun!”, because while it is indeed fun to play with, it provides little psychological insight into anyone, except of the most shallow kind.
(You can read additional concerns of TweetPsych by Tyler Hayes here.)
11 comments
Thanks for the credit at the end, John!
Your analysis & commentary are way more in-depth than mine, and I wish I could add something beneficial to it. But you nailed it! Spot on, good sir.
John:
Ad hominems aside (my apologies for not conforming to your definition of what a scientist should be), I appreciate the time you’ve put into analyzing TweetPsych, and I apologize for not responding right away to your offers to help (I’ve been busy trying to make the app scale to the traffic it’s been getting).
A few points of clarification. TweetPsych compares each individual user’s results with a baseline and displays the variation of each user from that baseline, so naturally the output of LIWC2007 would not be the same.
Scores are changing because I’ve made changes to how the baseline is calculated as well as some minor changes to improve how I was applying the two dictionaries.
I completely agree that the service is a “freshman” effort because it’s in beta, and I plan on evolving it. I’ve started to add some descriptions for each of the codes, and will continue to do so as I make the system stable at high traffic levels.
I do plan to follow up with you on your offer to help make the output more readable.
Thank you again for your thoughts; critical attention is invaluable to improving the service.
@Dan – As I said, I wouldn’t have mentioned your background except that you’re making specific scientific claims regarding the service. So one would expect such claims to be based in the scientific method. It’s not my definition; it’s the definition that’s widely accepted within society. (Like social networking itself and its rebranding of “friends,” you’re welcome to redefine words as you like. But then don’t act surprised when others ask, “So what’s your scientific training and background? Published anything in a journal?”)
What “baseline”? From all users who try the service? From some other population? From a random sampling of “normal” Twitter users? Could you describe the specific methodology a little more? I’m not sure I understand what you’re referring to here.
I think it’s fine to keep refining the analysis engine and what-not while you’re in “beta,” but wouldn’t it be nice to let your users know that you’re the one actually changing their scores at this point — that it’s not a result of their additional tweets or anything? A content analysis service is a lot less meaningful if someone is constantly tweaking the analysis algorithm, which can result in such significant changes in a single day.
Last, I didn’t add this to the post because it wasn’t central to my analysis, but I noticed you don’t have a privacy policy on the site either. Are you keeping a copy of every analysis done? And if so, are you using said data (or planning to use it) for other marketing purposes? Such a database could be invaluable to a marketing company…
PS – Let me again emphasize the valuable contribution Zarrella has made with the introduction of this service. Despite its current limitations, he has shown the potential of what could be done with this sort of analysis, and that’s a very important step.
Dan, I’d love to see you release the analysis engine component itself as open source. You could open the door to other researchers conducting their own analyses on the tweetstream.
Really great dialogue about this new tool. John, you bring up some excellent points and criticisms about TweetPsych. It is good to be skeptical. It will be interesting to see where this goes.
Wow, a real psychologist versus one of a million web developers who are running a “business” out of their parents’ basement. Professionals don’t understand the internet, or kids these days, who are so wrapped up in themselves and their media-gluttony-induced abstractions of the world that they don’t give a damn about the former generation’s idea of protocol. Kids these days, especially web developers, are trying to strike it rich with silly little programs, but for every Twitter, there are a thousand emulators, and for every Twitter aggregation service, there are a thousand web developers who should be working at Burger King instead of spewing their code all over the internet.
Final take: Professionals don’t understand that young people don’t take anything seriously. (Come on, this is the generation AFTER Generation X. The post-nihilists.) And the liberating, wonderful, decentralized nature of the internet lets every other hack with a downloaded copy of Photoshop set up a “service” to get his fifteen minutes of fame.
I get annoyed when people use the word psychology as if it’s something anyone can do. No. It’s a four-year-plus degree at university, and even then you need more training. No random person can just say, ‘I’m going to psychoanalyse you and tell you your personality.’ It doesn’t work like that.
John, as always, an insightful and thoughtful analysis. My take is that some of the claims — and in particular the name — are less than ideal; I imagine they were made and the name chosen specifically because of the valence the term psychology has in our culture. Putting those things aside (and Dan, I think you could and still be successful), what I like about the tool (and a few others, like Klout) is that they are stepping away from the old ways of measuring things. That has its drawbacks, but so does having to defend a web-log-generated measure that one could argue has even less direct correlation to the kinds of behaviors that seem to be populating the 2.0 space. So Dan, a little more transparency and maybe a little less trading on psychology as your anchor point, but keep trying to push the envelope re: measurement of online social behavior.
I’m not sure why anyone would take this more seriously than those online personality tests, horoscope predictions, career assessments, etc.