Getting started with Python for data science


Payal Singh


are excellent FOSS tools for beginners and experts alike. SQL is great for querying databases, but for complex, resource-intensive data science operations, storing the data in a NumPy ndarray boosts efficiency and speed (just make sure you have ample RAM when working with large datasets).

    >>> crime_stats['Description'].unique()
    array([..., 'ASSAULT', 'LARCENY FROM AUTO', 'HOMICIDE', 'BURGLARY',
           'AUTO THEFT', 'ROBBERY - RESIDENCE', 'ROBBERY - COMMERCIAL',
           'ROBBERY - CARJACKING', 'ASSAULT BY THREAT', 'SHOOTING', 'RAPE',
           'ARSON'], dtype=object)

which returns a NumPy array (ndarray):

    >>> type(crime_stats['Description'].unique())
    <class 'numpy.ndarray'>

Next, let's feed this data into a neural network to see how accurately it can predict the type of weapon used, given data such as the time the crime was committed, the type of crime, and the neighborhood in which it happened:

    >>> from sklearn.neural_network import MLPClassifier
    >>> import numpy as np
    >>>
    >>> prediction = crime_stats[['Weapon']]
    >>> predictors = crime_stats[['CrimeTime', 'CrimeCode', 'Neighborhood']]
    >>>
    >>> nn_model = MLPClassifier(solver='lbfgs', alpha=1e-5,
    ...                          hidden_layer_sizes=(5, 2), random_state=1)
    >>>
    >>> predict_weapon = nn_model.fit(predictors, prediction)

Note that selecting multiple columns requires a list inside the indexing brackets, and that fit() takes the predictors first and the target second. Also, MLPClassifier requires numeric input, so the categorical columns must first be converted to numbers with LabelEncoder.

Now that the learning model is ready, we can perform several tests to determine its quality and reliability. In this example, we can use the inverse_transform function of LabelEncoder() to see what the encoded weapon values correspond to:

    >>> preprocessing.LabelEncoder().inverse_transform(encoded_weapons)
    array(['HANDS', 'FIREARM', 'HANDS', ..., 'FIREARM', 'FIREARM',
           'FIREARM'], dtype=object)

This is fun to see, but to get an idea of how accurate this model is, let's calculate its score as a percentage:

    >>> nn_model.score(X, y)
    0.81999999999999995

This shows that our neural network model is ~82% accurate. Although the model scores well, it is not very useful for general crime datasets, because this particular dataset has a disproportionate number of rows that list 'FIREARM' as the weapon used.
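To tie the steps above together, here is a minimal, self-contained sketch of the same workflow: label-encode the categorical columns, train an MLPClassifier, score it on held-out rows, and map predictions back to weapon names. The column names follow the article, but the rows below are made-up toy data, not the real crime dataset, so the resulting accuracy is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the crime dataset (hypothetical values).
crime_stats = pd.DataFrame({
    'CrimeTime':    ['23:30', '01:15', '13:00', '23:30', '01:15', '13:00'] * 10,
    'CrimeCode':    ['4E', '4A', '6D', '4E', '4A', '6D'] * 10,
    'Neighborhood': ['Downtown', 'Fells Point', 'Canton'] * 20,
    'Weapon':       ['HANDS', 'FIREARM', 'KNIFE', 'FIREARM', 'HANDS', 'KNIFE'] * 10,
})

# MLPClassifier needs numeric input, so label-encode every categorical column,
# keeping each fitted encoder around so we can decode predictions later.
encoders = {col: LabelEncoder().fit(crime_stats[col]) for col in crime_stats}
encoded = pd.DataFrame({col: enc.transform(crime_stats[col])
                        for col, enc in encoders.items()})

X = encoded[['CrimeTime', 'CrimeCode', 'Neighborhood']]
y = encoded['Weapon']

# Hold out a test set so the score reflects unseen rows, not training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

nn_model = MLPClassifier(solver='lbfgs', alpha=1e-5,
                         hidden_layer_sizes=(5, 2), random_state=1)
nn_model.fit(X_train, y_train)

print(nn_model.score(X_test, y_test))  # accuracy on the held-out rows
# Map the encoded integer predictions back to weapon names:
print(encoders['Weapon'].inverse_transform(nn_model.predict(X_test)))
```

Scoring on a held-out split, rather than on the training rows, is what reveals whether a high accuracy comes from real signal or, as the article notes, simply from one class (such as 'FIREARM') dominating the dataset.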



