He Said, She Said

Words Men Use More Than Women And Vice Versa

For nine in ten of us, the most fundamental mystery of life is that of the Opposite Sex.

We’d like to understand, but it just doesn’t make sense!  Women wonder: what’s the story with the Ultimate Fighting and Star Trek, and why can’t he put the toilet seat down?  The men puzzle likewise.  Um, aromatherapy?  Greeting cards?  Why does she care what I do with the toilet seat?  And so on…

Yup, we don’t entirely get it – but keep trying we must!  Each hard-earned insight, no matter how tiny, could be the difference between, in feminine terms, a spiritual connection and crossed arms, or, in male-speak, nookie and the dog house.

So, with Valentine’s Day fast approaching, we rolled our combine into the ripe fields of the Internet Dot Com and harvested 14,000,000 words from over 2,000 randomly selected weblogs – per their profiles, all stateside, half written by women and the rest by men.  Then, we fed them into the Corpusculator, our custom in-house suite of text analysis software.  For hours, it jittered and hummed, as if tenderizing meat and smashing atoms, then out popped two lists: one of words that ladies use more than gentlemen, and the other vice versa.
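For the curious: the Corpusculator's innards aren't described in this post, so what follows is only a minimal sketch of one way to produce such a pair of lists. It assumes that "uses more" means a higher relative frequency, and it ranks words by the ratio of the two frequencies. The names `tokenize` and `overused_words`, the `min_count` cutoff, and the add-one smoothing are all illustrative assumptions, not the actual method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (straight apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def overused_words(corpus_a, corpus_b, min_count=2, top=10):
    """Return (word, ratio) pairs for words corpus_a uses more than corpus_b.

    Assumption: "uses more" means a higher relative frequency. Add-one
    smoothing keeps words that are missing from corpus_b from dividing by zero.
    """
    counts_a = Counter(tokenize(corpus_a))
    counts_b = Counter(tokenize(corpus_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = len(set(counts_a) | set(counts_b))

    ratios = {}
    for word, count in counts_a.items():
        if count < min_count:
            continue  # too rare in corpus_a to say anything about
        freq_a = (count + 1) / (total_a + vocab)
        freq_b = (counts_b[word] + 1) / (total_b + vocab)
        if freq_a > freq_b:
            ratios[word] = freq_a / freq_b
    # Largest ratio first, truncated to the requested list length.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top]
```

Calling `overused_words(her_text, his_text)` and then `overused_words(his_text, her_text)` would give the two lists, one per direction.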

That night, we left a kettle of chamomile on the stove, set out a plate of snickerdoodles, put the words under our pillow, and slipped into slumber.  The next morning, lo and behold, the kettle was empty, the plate barren, and Glory Be!  The Data Fairy had come during the night and replaced the words with an infographic, entitled He Said, She Said: Words That Men Bloggers Use More Than Women, and Vice Versa:

He Said, She Said.  Click it to see the full-sized version!

The Data Fairy left a note, explaining that she’d scaled each word by the degree of preference and omitted very common words (and, it, my, etc.) and contractions (I’ll, we’re, etc.).  She also noticed a seasonal bias: since we sampled the blogs on February 5th, the data reflects the psychology and events of the month or two beforehand, more or less.
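The Data Fairy didn't leave her source code, so here is a hedged sketch of how that scaling and omitting might go. It assumes each word already comes with a preference ratio (above 1 means this group favors it), drops a tiny hypothetical stop list plus anything containing an apostrophe, and maps the surviving ratios to font sizes on a log scale. `COMMON_WORDS`, `is_contraction`, `cloud_sizes`, and the size range are all made up for illustration.

```python
import math

# Hypothetical, tiny stop list; a real one would be much longer.
COMMON_WORDS = {"and", "it", "my", "the", "a", "of", "to"}


def is_contraction(word):
    """Crude assumption: any word with an apostrophe counts as a contraction."""
    return "'" in word or "\u2019" in word


def cloud_sizes(preferences, min_size=10, max_size=72):
    """Map each word's preference ratio to a font size.

    `preferences` maps word -> ratio (> 1 means preferred by this group).
    The log-scale interpolation is an assumption; the post only says that
    words were scaled by degree of preference.
    """
    kept = {
        word: ratio
        for word, ratio in preferences.items()
        if word not in COMMON_WORDS and not is_contraction(word) and ratio > 1
    }
    if not kept:
        return {}
    logs = {word: math.log(ratio) for word, ratio in kept.items()}
    low, high = min(logs.values()), max(logs.values())
    span = (high - low) or 1.0  # avoid dividing by zero when all ratios match
    return {
        word: round(min_size + (value - low) / span * (max_size - min_size))
        for word, value in logs.items()
    }
```

The log scale is a design choice: it keeps one wildly lopsided word from shrinking every other word to a speck.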

From the lead off of love, the women’s words rollick past Christmas and then bounce about the warm, fuzzy territory of family, food, and fun.  They’re utterly heart-warming, like that classic second-season episode of Friends where Phoebe told Monica that she overheard Chandler say to Rachel that Ross had kissed Joey in the meat locker and hahaha hahah hah haha!  Clearly, if throwing a party, you’ll want to invite as many women as possible.

Quite honestly, I expected the men’s words to be just as entertaining: maybe a corny mix of sports phrases and beer terminology, or something of that genre.  Could I have been less right?  Witness a dry mélange of American politics, government, business, power, influence, and money, spritzled with the cardinal directions, counting numbers, place names, Biblical material, references to other men, and to top it all off, a smatterin’ of the fightin’ and the killin’ words.

As a whole, even to my masculine self, it’s so unyieldingly, analytically, megalomaniacally weird that I’ve gotta say:  Whoa.  Hold on a sec, guys.  Let’s take a few deep breaths, lighten up, and mellow out, lest we involuntarily commandeer a banana republic or something awful like that.

Yes, yes, I over-dramatize, but the men’s words really do read like the uptight offspring of a G8 meeting and the Dubuque City Council!

So, at this point, Dude Association Bylaws require me to inform Better Halves about how they might best achieve the aforementioned “Relaxation” of their Significant Others.  However, I cannot, for the DA recently revoked my advisor privileges because of an unfortunate misunderstanding: on February 7th, GenderAnalyzer.com reported that this blog is written by a woman!

The GenderAnalyzer's verdict.

As I’ve assured the Association, my behavior falls within the Safe Harbor carved out in Subsection 4.3.6(a):

Dudes may be female only on Halloween and/or within the privacy of their primary domicile.

I hope to prevail soon, and counsel advises me to refrain from any further comment.  Now, my lovely wife, do you know what happened to that size-24 black spandex mini?

Of Mason And Dixon

Yankees And Southerners Are Different!

The six formative years that I spent in the Southern U.S. gave me many things: a deep understanding of cockroaches, impeccable water skills, and a year-round tan.  And, last but not least, the precious, lingering gift of the word y’all.

I could sing the praises of y’all ’til the end of time!  Short for “you all,” it’s a simple, monosyllabic utterance that evokes lemonade on the veranda, strolls through oaks and Spanish moss, and warm, uncomplicated, friendly times.  The essence of the South wrapped into four tidy letters and an apostrophe!  How could you help but love y’all, y’all?

Despite these feelings, my thoughts sometimes wander, and I find myself asking: could there be another such quirky little word buried in the Southern lexicon?

At such questions, I’m predisposed to throwing algorithms, and always on the lookout for an excuse to do some hard-core statistical data-mining.  So, as they say, the game was on!  An urgent signal went out to my crack team of computer scientists, and at our first meeting, we formulated a slightly-more-scientific query:

Could we quantify the differences between Southerner and Yankee, by analyzing the everyday communications of the average Joe?

Hell yeah!  First, we defined the Northeast as New Jersey, New York, Maine, and everything in between, and the Deep South as Louisiana, Mississippi, Alabama, Georgia, and the Carolinas.  Then, we gathered our raw data, on sale at a discount, from the aisles of the Internet Dot Com, in the form of 4,000 random blog feeds from a major social networking site, tied to our regions via user profiles.  After a bit of text extraction and some filtering to handle the degenerate cases (e.g. a post with a thousand repeats of “I love guinea pigs!”), we had a 5,000,000-word sample from the Yankees, and another of similar size for the Southerners.
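The post doesn't say how the degenerate cases were caught, so treat this as one plausible sketch rather than the real filter. It flags a post when the share of distinct words falls below a threshold, which a thousand repeats of one sentence certainly would. The function names and both thresholds are assumptions, and the thresholds would need tuning against real posts.

```python
import re


def is_degenerate(post, min_words=50, min_unique_share=0.05):
    """Flag posts that are mostly the same few words repeated.

    Assumption: a post is degenerate when distinct words make up less than
    `min_unique_share` of all its words. Posts shorter than `min_words`
    are never flagged, since the ratio means little for tiny samples.
    """
    words = re.findall(r"[a-z']+", post.lower())
    if len(words) < min_words:
        return False
    return len(set(words)) / len(words) < min_unique_share


def keep_posts(posts):
    """Return only the posts that pass the degeneracy check."""
    return [post for post in posts if not is_degenerate(post)]
```

Very long but legitimate posts naturally repeat more words, so a stricter threshold could start throwing out honest ramblers along with the guinea pig enthusiasts.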

We fed these into the Corpusculator, a custom suite of text-analysis software.  For several minutes, it rumbled, as regional differences percolated, and our bloggy inputs, in mutual opposition, slowly neutralized the smells of teen spirit.

Then, Eureka!  Out popped two lists: one for North and one for South, each cataloging the words that appeared in excess relative to their frequency in the other region.

Via the wondrous Wordle, I built a word cloud for each, and assembled them into a two-chapter novella that I call “A Tale Of Two Regional, Multi-State Areas.”  Click on the picture below to see the whole thing, with the caveat that Northeasterners are quite fond of dropping the F-bomb, which appears prominently:

A section of "A Tale Of Two Regional, Multi-State Areas."

What we have here are two solid blocks of differential Zeitgeist, chock full of inter-regional revelations.  Yankees refer more to summer and winter – probably because in Dixie, the seasons are rarely more than a curiosity, but to the north, the difference between August and January is fundamental.  Northerners tend to reference books, while the South seems more preoccupied with the doctor.  Then, there’s the aforementioned profanity – with Yankees partial to the F-word, and my dear Southerners given to damn, frankly.

As for my precious y’all?  Yup, there it is on the southern side.  A quick scan revealed that its kissin’ cousins – the other quirky Dixie colloquialisms – were all texting shorthand such as lol and omg.  Color me disappointed, but I suppose that’s the price of progress, y’all!

If you liked this post, more of the same will be coming down the pike, so stay tuned!