Facebook Application for User Profile Data Analysis

As part of a university research project, a data mining tool was developed in April 2008 and deployed as an application on the popular social networking site Facebook. Between March and April 2008, data was extracted from 58 Facebook profiles. Analysis of this data identified relationships between individual data fields.

Screenshots of the project: fedaks research project screens 1 to 3.

Analysis of collected research data

When analysis of the collected data began in April, a total of 52 Facebook users had installed the application. The first step was to categorise the collected information into quantitative and qualitative data. Quantitative data included date of birth, gender, wall count, note count, relationship status and religion. Qualitative fields such as activities, interests and favourite music were also processed quantitatively, by recording only whether each field contained any data.
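The categorisation step can be sketched as follows. This is a minimal Python sketch, not the original tool's code. The profile dictionary layout and the field names (`birthday`, `sex`, `wall_count`, `notes_count`, `relationship_status`, `religion`, `activities`, `interests`, `music`) are hypothetical, because the text does not describe how the application stored its data. Structured fields are kept as they are, and each free-text field is reduced to a 0/1 flag recording only whether it contains anything.

```python
# Hypothetical field names: the original application's data format is not described.
QUANTITATIVE_FIELDS = ["birthday", "sex", "wall_count", "notes_count",
                       "relationship_status", "religion"]
QUALITATIVE_FIELDS = ["activities", "interests", "music"]


def has_data(value) -> bool:
    """Return True if a profile field contains anything other than whitespace."""
    if value is None:
        return False
    return str(value).strip() != ""


def encode_profile(profile: dict) -> dict:
    """Keep structured fields as-is; reduce free-text fields to a presence flag."""
    row = {field: profile.get(field) for field in QUANTITATIVE_FIELDS}
    for field in QUALITATIVE_FIELDS:
        # Only whether the field was filled in is recorded, not its content.
        row[field + "_present"] = int(has_data(profile.get(field)))
    return row


example = {"sex": "female", "interests": "photography", "music": "   "}
row = encode_profile(example)
print(row["interests_present"], row["music_present"], row["activities_present"])  # 1 0 0
```

A field containing only spaces counts as empty, and a missing field is treated the same way as a blank one.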

While browsing through the raw data, clear trends in gender appeared, and gender became the prime focus of the analysis. While 88% of users had published their gender in their profile, only 37% were female. This imbalance in participation could suggest that men are more likely than women to disclose identity information. Unfortunately, little is known about how the application spread from user to user, so there is no evidence to test this theory. The application could simply have been passed on by more men than women, but this explanation is equally impossible to verify without the necessary data. Setting the participation imbalance aside, gender was then used as a fixed grouping variable and tested against the other data variables.
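The text does not say whether the 37% female figure is a share of all users or only of users who published a gender. The sketch below therefore computes both. It is a minimal Python sketch that assumes each profile is a dictionary with a hypothetical `sex` key. The sample data is invented for illustration and is not the project's data.

```python
def gender_summary(profiles):
    """Share of users who published a gender, and the female share (two ways)."""
    if not profiles:
        raise ValueError("no profiles to summarise")
    total = len(profiles)
    # 'sex' is a hypothetical key; an empty or missing value means not published.
    published = [p["sex"] for p in profiles if p.get("sex")]
    females = sum(1 for s in published if s == "female")
    return {
        "published_pct": 100 * len(published) / total,
        "female_pct_of_published": 100 * females / len(published) if published else 0.0,
        "female_pct_of_all": 100 * females / total,
    }


sample = [{"sex": "male"}, {"sex": "female"}, {"sex": "male"}, {}]
summary = gender_summary(sample)
print(round(summary["published_pct"], 1), round(summary["female_pct_of_published"], 1))  # 75.0 33.3
```

Reporting both denominators makes it clear which population a percentage refers to.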

First, gender was tested against religion, and the analysis uncovered another gender-related trend. While 44% of men published a religion in their profile, only 26% of women did. As before, this could suggest that men are more likely than women to disclose information. However, the theory is hard to prove, given that the majority of the profiles analysed were male. A more decisive point when judging the claim may be that only 34% of users published a religion at all.
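Testing gender against another field amounts to computing a disclosure rate within each gender group. The following is a minimal Python sketch of that cross-tabulation. It assumes hypothetical `sex` and `religion` keys and uses invented sample data. Users without a published gender cannot be grouped, so they are skipped. This is one reason an overall rate (such as the 34% for religion) need not be a simple blend of the per-gender rates.

```python
from collections import defaultdict


def disclosure_by_gender(profiles, field):
    """Percentage of users in each published gender who filled in `field`."""
    counts = defaultdict(lambda: [0, 0])  # gender -> [disclosed, total]
    for p in profiles:
        sex = p.get("sex")
        if not sex:
            continue  # no published gender: excluded from the split
        counts[sex][1] += 1
        if str(p.get(field) or "").strip():
            counts[sex][0] += 1
    return {sex: 100 * disclosed / total for sex, (disclosed, total) in counts.items()}


profiles = [
    {"sex": "male", "religion": "Christian"},
    {"sex": "male"},
    {"sex": "female", "religion": ""},
    {"sex": "female", "religion": "Buddhist"},
    {"sex": "female"},
    {"religion": "Agnostic"},  # counted in an overall rate, but not in the split
]
rates = disclosure_by_gender(profiles, "religion")
print(rates["male"], round(rates["female"], 1))  # 50.0 33.3
```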

Second, gender was tested against date of birth, the most significant of the remaining variables with regard to identity disclosure. Overall, 80% of users published the date of their birthday in their profile, and 60% also published the year in which they were born. This time, however, the analysis showed that 84% of females published their birthday, as opposed to only 75% of males. Despite the imbalance in gender participation, this was unsurprising, because previous Facebook research had already highlighted that females are more likely to disclose their full birthday. Another interesting finding was that the same proportion of males and females, 63%, entered their full birthday, including the year of birth.
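The distinction between publishing a birthday and publishing a full birthday can be expressed as three disclosure levels. The following is a minimal Python sketch. It assumes a hypothetical `MM/DD` or `MM/DD/YYYY` string format, because the format the original application actually stored is not stated. Both percentages are computed against the whole gender group, which is one possible reading of the figures above.

```python
from collections import Counter
from typing import Optional


def birthday_level(value: Optional[str]) -> str:
    """Classify a birthday as 'none', 'day_month', 'full' or 'unknown'.

    Assumes a hypothetical 'MM/DD' or 'MM/DD/YYYY' format.
    """
    if not value or not value.strip():
        return "none"
    parts = value.strip().split("/")
    if len(parts) == 3 and parts[2].isdigit() and len(parts[2]) == 4:
        return "full"  # day, month and year published
    if len(parts) == 2:
        return "day_month"  # birthday published without the year
    return "unknown"  # unrecognised format


def birthday_rates(profiles, sex):
    """Share of one gender group publishing any birthday, and a full birthday."""
    group = [p for p in profiles if p.get("sex") == sex]
    if not group:
        raise ValueError(f"no profiles with sex={sex!r}")
    levels = Counter(birthday_level(p.get("birthday")) for p in group)
    return {
        "birthday_pct": 100 * (levels["day_month"] + levels["full"]) / len(group),
        "full_pct": 100 * levels["full"] / len(group),
    }


profiles = [
    {"sex": "female", "birthday": "03/14/1988"},
    {"sex": "female", "birthday": "07/02"},
    {"sex": "male", "birthday": "11/30/1987"},
    {"sex": "male"},
]
print(birthday_rates(profiles, "female"))  # {'birthday_pct': 100.0, 'full_pct': 50.0}
```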

Like note count and wall count, the qualitative fields activities, TV, music, movies, quotes, books, about me and interests were excluded from the analysis. These fields were initially processed quantitatively and plotted during analysis. However, the data was too inconsistent for the method used to reveal any regular patterns.

Another important variable whose relationship to gender was not tested is relationship status. Over 70% of users had published their relationship status, and nearly half (48%) were listed as either in a relationship or engaged. No formal tests were run on this data. However, an in-depth look at the raw profile data suggested that users listed as single were more likely to publish qualitative data, such as interests and about me, in their profile. Had this theory been tested, it might have given insight into how Facebook profile data relates to relationship status, and suggested possible reasons for the pattern.
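A first, descriptive check of the untested single-versus-attached theory could compare how many qualitative fields each group fills in. The following is a minimal Python sketch. The keys (`relationship_status`, `about_me`, `tv` and so on) and the status string `"Single"` are hypothetical. Every other published status is grouped as not single. A difference in means on its own would not establish the theory, and a significance test would still be needed.

```python
from statistics import mean

# Hypothetical keys for the qualitative fields listed above.
TEXT_FIELDS = ["activities", "tv", "music", "movies", "quotes",
               "books", "about_me", "interests"]


def filled_count(profile):
    """Number of qualitative fields that contain something other than whitespace."""
    return sum(1 for f in TEXT_FIELDS if str(profile.get(f) or "").strip())


def mean_filled_by_status(profiles):
    """Mean number of filled qualitative fields for single vs not-single users."""
    groups = {}
    for p in profiles:
        status = p.get("relationship_status")
        if not status:
            continue  # relationship status not published
        key = "single" if status == "Single" else "not_single"
        groups.setdefault(key, []).append(filled_count(p))
    return {key: mean(counts) for key, counts in groups.items()}


profiles = [
    {"relationship_status": "Single", "interests": "hiking", "about_me": "Student"},
    {"relationship_status": "Single", "music": "jazz"},
    {"relationship_status": "In a Relationship"},
    {"relationship_status": "Engaged", "books": "Dune"},
    {"interests": "chess"},  # no published status: excluded
]
print(mean_filled_by_status(profiles))  # {'single': 1.5, 'not_single': 0.5}
```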