Sunday, 30 December 2018

Naive Bayes in Machine Learning


The dataset describes patients who had undergone surgery for breast cancer.
Features of the dataset:
  • Age - Age of the patient at the time of operation.
  • Year - Year of the operation (year - 1900).
  • Nodes - Number of positive axillary nodes detected.
  • Class (Survived):
    1 - the patient survived 5 years or longer
    2 - the patient died within 5 years

Given the details of a patient, we need to predict whether the patient survived or not.

    Import required libraries

    
    # For mathematical calculation
    import numpy as np
    
    # For handling datasets
    import pandas as pd
    
    # For plotting graphs
    from matplotlib import pyplot as plt
    
    # Import the Gaussian Naive Bayes classifier from scikit-learn
    from sklearn.naive_bayes import GaussianNB
    

    Import dataset

    
    # Import the csv file
    df = pd.read_csv('data.csv')
    
    print(df.head())
    '''
    Output:
       Age  Year  Nodes  Survived
    0   30    64      1         1
    1   30    62      3         1
    2   30    65      0         1
    3   31    59      2         1
    4   31    65      4         1
    '''
    

    Plot the classes against features.

    
    # We plot the data to see dependency of any 
    # feature on the class
    plt.xlabel('Feature')
    plt.ylabel('Survived') 
    
    X = df.loc[:,'Age']
    Y = df.loc[:,'Survived']
    plt.scatter(X, Y,color='blue',label='Age')
    
    X = df.loc[:,'Year']
    Y = df.loc[:,'Survived']
    plt.scatter(X, Y,color='green',label='Year')
    
    X = df.loc[:,'Nodes']
    Y = df.loc[:,'Survived']
    plt.scatter(X, Y,color='red',label='Nodes')
    
    plt.legend(loc=4, prop={'size': 7})
    plt.show()
    

    Prepare data for training

    
    # Prepare the training set
    X = df.loc[:,'Age':'Nodes']
    Y = df.loc[:,'Survived']
    

    Train the model

    
    clf = GaussianNB()
    
    # Train the model
    clf.fit(X,Y)
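
To make the training step less of a black box, here is a minimal sketch of what a Gaussian Naive Bayes classifier estimates: a prior for each class, plus a mean and a variance for each feature within each class. A new sample gets the class with the highest log prior plus summed Gaussian log-likelihoods. The function names (fit_gaussian_nb, predict_one) and the toy rows are made up for illustration, and the fixed 1e-9 variance floor is a simplification; scikit-learn applies its own smoothing, so this is not its actual implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate prior, per-feature mean and per-feature variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant avoids division by zero (an assumption of this sketch)
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_one(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # "Naive" assumption: features are independent given the class,
        # so the per-feature Gaussian log-likelihoods simply add up.
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy rows in the order [Age, Year, Nodes]; values are invented, not from data.csv
X_toy = [[30, 64, 1], [35, 63, 0], [40, 66, 2],
         [50, 60, 20], [55, 62, 15], [60, 59, 25]]
y_toy = [1, 1, 1, 2, 2, 2]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_one(model, [33, 65, 1]))    # -> 1
print(predict_one(model, [58, 60, 22]))   # -> 2
```

Working in log space keeps the product of many small probabilities from underflowing, which is why the scores are summed rather than multiplied.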
    

    Test the model

    
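    # Test the model on two made-up patients (returns the predicted class).
    # The samples are wrapped in a DataFrame so that the column names
    # match the ones used during training.
    samples = pd.DataFrame([[12, 70, 12],
                            [13, 20, 13]], columns=X.columns)
    prediction = clf.predict(samples)
    
    print(prediction)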
    '''
    Output:
    [1 2]
    '''
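
Two hand-picked samples only show that the model returns a class. To get a rough idea of how well it generalises, one option is to hold back part of the data and score the predictions on it. The sketch below uses synthetic data, because data.csv is not included here: the random values and the labelling rule (nodes > 10) are assumptions for illustration only, and the names X_demo, y_demo and clf_demo are hypothetical. With the real dataset you would pass X and Y from above to train_test_split instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: columns are Age, Year, Nodes
n = 200
age = rng.integers(30, 80, size=n)
year = rng.integers(58, 70, size=n)
nodes = rng.integers(0, 30, size=n)
X_demo = np.column_stack([age, year, nodes])

# Made-up labelling rule, used only so the example has two classes
y_demo = np.where(nodes > 10, 2, 1)

# Keep 25% of the rows aside for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

clf_demo = GaussianNB()
clf_demo.fit(X_train, y_train)

# Accuracy on rows the model has not seen during training
y_pred = clf_demo.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(acc)

# Class probabilities for the first two test rows (one column per class)
proba = clf_demo.predict_proba(X_test[:2])
print(proba)
```

predict_proba is useful when you want to know how confident the model is and not just which class it picked.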
