dating profile examples for men:8 Impressive Courting Profile Suggestions Regarding Adult men

We will create a Jupyter notebook to follow the exercise step by step, and we will import a CSV input file. We will use the scikit-learn, pandas, matplotlib and numpy packages.

We import the CSV file (for simplicity, we assume that the file is in the same directory as the notebook) and display its first 5 records as a table.
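As a minimal sketch of this loading step: the file name is not given in the text, so the example below reads a few made-up rows from an in-memory string instead. The `user` and `category` column names are assumptions; only `op`, `ex` and `ag` come from the article.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real input file: invented rows that use the
# trait columns named in the text (op, ex, ag) plus assumed "user" and
# "category" columns.
csv_text = """user,op,ex,ag,category
user_a,34.3,41.9,22.7,7
user_b,44.9,37.6,23.8,7
user_c,41.9,37.0,27.2,4
user_d,35.8,44.5,21.3,2
user_e,30.1,49.4,20.6,1
user_f,39.0,40.2,25.5,1
"""

# With a real file next to the notebook, pass its path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

# First 5 records, shown as a table.
first_rows = df.head()
print(first_rows)
```

In a notebook, leaving `df.head()` as the last expression of a cell renders the same table without `print`.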

In this case we select 3 dimensions (op, ex and ag) and plot them against each other in pairs to see whether they give us any clue about how the users group and how those groups relate to their categories.

Reviewing the graph, it does not appear that there is any kind of grouping or correlation between users and their categories.

We define the data structure that we will feed to the algorithm. As you can see, we load only the columns op, ex and ag into our variable X.
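A sketch of how X might be built, using a synthetic DataFrame in place of the real data. The value ranges, the `category` column name and its labels are invented purely so that the example runs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 140 users; all values here are invented.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "op": rng.uniform(30, 50, 140),
    "ex": rng.uniform(30, 50, 140),
    "ag": rng.uniform(15, 30, 140),
    "category": rng.integers(1, 10, 140),  # assumed integer labels
})

# Only the three trait columns feed the algorithm.
X = df[["op", "ex", "ag"]].to_numpy()

# The categories are kept aside for later comparison; K-means never sees them.
y = df["category"].to_numpy()

print(X.shape)
```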

We are going to find the value of K by plotting a graph and looking for the “elbow point” that we discussed earlier. This is our result:

The curve is actually quite “smooth”. I consider 5 a good value for K, although you might reasonably choose a different one.
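One common way to produce an elbow curve is to fit K-means for a range of K values and record the inertia (the within-cluster sum of squared distances) of each fit. The sketch below does this on synthetic data. The range 1 to 10, `n_init=10` and `random_state=0` are assumptions, not the article's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (140, 3) matrix X of op, ex and ag values.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(140, 3))

ks = range(1, 11)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances of samples to their closest centroid.
    inertias.append(model.inertia_)

# Plotting ks against inertias gives the elbow curve; here we just list the values.
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

The elbow is the K after which adding clusters stops reducing inertia by much. On a smooth curve like the one described here, that choice is a judgment call.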

Now we will plot the result in a 3D graph, with one color per group, to see whether the groups are distinct (the stars mark the center of each cluster):
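A possible sketch of this step, fitting K-means with K = 5 on synthetic data and drawing the points and centroids in 3D. The colormap, marker sizes and random seed are arbitrary choices, not taken from the article.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Synthetic stand-in for the (140, 3) matrix X of op, ex and ag values.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(140, 3))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_              # cluster index (0..4) of each user
centroids = kmeans.cluster_centers_  # one row per cluster, shape (5, 3)

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")

# One color per cluster for the users.
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=labels, cmap="tab10", s=30)

# Stars mark the center of each cluster.
ax.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2],
           marker="*", c="black", s=300)

ax.set_xlabel("op")
ax.set_ylabel("ex")
ax.set_zlabel("ag")
```

In a notebook the figure renders inline. In a plain script, call `plt.show()` afterwards.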

Here we can see that the K-Means algorithm with K = 5 has grouped the 140 Twitter users by personality, taking into account the 3 dimensions we used: Openness, Extraversion and Agreeableness. The groups do not necessarily correspond to the celebrities' areas of activity.

We will make three 2-dimensional graphs, projections of our 3D graph, to help us visualize the groups and their classification:

We can also see how diverse the work areas are within each group. For example, group 0 (red) contains every work activity, although activities 1 and 2 (Actors and Singers) predominate, with 11 and 15 celebrities respectively.
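A breakdown like this can be obtained by cross-tabulating cluster labels against categories. The sketch below uses synthetic data with randomly assigned categories, so its counts will not match the figures quoted above. The `category` and `cluster` column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the 140 users; traits and categories are invented.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "op": rng.uniform(30, 50, 140),
    "ex": rng.uniform(30, 50, 140),
    "ag": rng.uniform(15, 30, 140),
    "category": rng.integers(1, 10, 140),
})

X = df[["op", "ex", "ag"]].to_numpy()
df["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Rows are clusters, columns are categories, cells are user counts.
counts = pd.crosstab(df["cluster"], df["category"])
print(counts)
```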

We will look for the user closest to the centroid of each group. These users can be said to have the personality traits that best represent their cluster:

In the centers we see that we have a model, a politician, a TV presenter, a radio host, and an athlete.
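One way to find these representative users is scikit-learn's `pairwise_distances_argmin_min`, which returns, for each centroid, the index of the nearest sample and its distance. The sketch below runs on synthetic data, and the `user` names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Synthetic stand-in for the 140 users; names and traits are invented.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user": [f"user_{i}" for i in range(140)],
    "op": rng.uniform(30, 50, 140),
    "ex": rng.uniform(30, 50, 140),
    "ag": rng.uniform(15, 30, 140),
})

X = df[["op", "ex", "ag"]].to_numpy()
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# For each centroid (row of the first argument), find the closest row of X.
closest, distances = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)

# One representative user per cluster, in cluster order 0..4.
representatives = df.iloc[closest]["user"].tolist()
print(representatives)
```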

Finally, we can take new Twitter users, described by the same characteristics, and assign them to a group. As an example, we classify David Guetta’s user, and the model returns that he belongs to group 1 (green).

The K-means algorithm helps us create clusters when we have large sets of unlabeled data, when we want to discover new relationships between features, or when we want to test or reject hypotheses about our business.
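Classifying a new user amounts to calling `predict` on a fitted model, which assigns the sample to the nearest centroid. In this sketch both the training data and the new user's trait values are made up, so the resulting group number carries no meaning.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the (140, 3) matrix X of op, ex and ag values.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(140, 3))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Hypothetical new user: one row with op, ex and ag in the same column order.
new_user = np.array([[45.0, 30.0, 20.0]])

# predict returns the index of the nearest centroid for each row.
group = kmeans.predict(new_user)
print(group[0])

# The same result computed by hand, to show what predict does.
nearest = np.argmin(np.linalg.norm(kmeans.cluster_centers_ - new_user, axis=1))
```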
