NumPy: NaN As A Placeholder

In this guide we’ll briefly revisit NaN values. As a reminder, NaN stands for “not a number,” and its primary role is to act as a placeholder for missing numerical values in an array.

While we’ve already covered a couple of ways to handle NaN values, I’d like to go into a little more depth on some of the NaN functions in NumPy. Most of the data you’ll work with will be given to you and, as we’ve seen, when we use Pandas to import a data frame, any missing value is automatically replaced with NaN as a placeholder. We can mimic the same behavior directly in NumPy. So let’s start by importing both the Pandas and NumPy libraries, then importing a data frame with some missing data and converting it to a NumPy array.
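
Here’s a minimal sketch of that step. The CSV contents, the column names, and the variable names are all made up for illustration (the original file isn’t shown here), and I’m reading from an in-memory string so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV data standing in for a real file; two grades are missing.
csv_text = "student,grade\ns1,82\ns2,\ns3,91\ns4,\ns5,75\n"

# Pandas fills the empty fields with NaN while parsing.
df = pd.read_csv(io.StringIO(csv_text))

# Convert the grade column to a NumPy array.
grades = df["grade"].to_numpy()

print(df)
print(grades)
print(grades.dtype)  # float64
```

With a real file you would pass its path to read_csv instead of the StringIO object.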


Now, let’s make an identical array that we’ll call array_grades, using the array function to pass in the corresponding values. We start with 82; then, to create a NaN element, we pass in np.nan. We continue with the next five integers, another NaN element, and then our last two integers.
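
The array described above would look something like this. Only the first value, 82, and the positions of the two NaN elements come from the walkthrough; the other grades are placeholders I’ve picked for illustration.

```python
import numpy as np

# 82, NaN, five integers, NaN, two integers.
# All values other than 82 are hypothetical.
array_grades = np.array([82, np.nan, 91, 75, 68, 88, 79, np.nan, 94, 85])

print(array_grades)
```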


The important thing I’d like you to take away from this is that all of our integers have been converted to floats. That’s because NaN is a floating-point value with no integer equivalent, so NumPy implicitly upcasts the whole array to a float data type in order to hold it.
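
A quick way to see the upcasting is to compare the dtype of an array with and without a NaN element. The array names and values here are just for illustration.

```python
import numpy as np

ints_only = np.array([82, 91, 75])
with_nan = np.array([82, np.nan, 91])

print(ints_only.dtype)  # an integer dtype (int64 on most platforms)
print(with_nan.dtype)   # float64
print(type(np.nan))     # <class 'float'>
```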

NaN also propagates through mathematical operations: any arithmetic that involves a NaN element results in NaN. I’ll show you what I mean by taking the sum of all the elements in our array, and also by multiplying a NaN element by a float. In both cases, what we get back is NaN.
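
Here’s a small sketch of both operations. The grades other than 82 and the float 2.5 are arbitrary values chosen for the example.

```python
import numpy as np

# Hypothetical grades with NaN at index positions 1 and 7.
array_grades = np.array([82, np.nan, 91, 75, 68, 88, 79, np.nan, 94, 85])

# Summing an array that contains NaN gives NaN.
total = array_grades.sum()

# Multiplying a NaN element by a float also gives NaN.
product = array_grades[1] * 2.5

print(total)    # nan
print(product)  # nan
```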


Now, there are a couple of ways you can check your array for any unexpected NaN values, one of which returns a boolean array and the other an array of indices. We’ll start with the boolean array, which we get by passing array_grades into the NumPy function isnan. Next, we’ll assign index_array the result of the NumPy function argwhere, passing in exactly the isnan call we have above, so I’ll just copy and paste.
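
Both checks can be sketched as follows. The name boolean_array is my assumption for the first variable, and the grades other than 82 are placeholders.

```python
import numpy as np

# Hypothetical grades with NaN at index positions 1 and 7.
array_grades = np.array([82, np.nan, 91, 75, 68, 88, 79, np.nan, 94, 85])

# True wherever an element is NaN, False everywhere else.
boolean_array = np.isnan(array_grades)

# Index positions of the True entries, one row per NaN element.
index_array = np.argwhere(np.isnan(array_grades))

print(boolean_array)
print(index_array)  # rows [1] and [7]
```

Note that argwhere returns a two-dimensional array with one row per match, so for a one-dimensional input each index comes back wrapped in its own row.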


What we get back makes sense, as the two results confirm one another. Our boolean array returns True for the second and eighth elements in the array, which correspond to index positions 1 and 7 in the index array.

As you can imagine, it's incredibly important to check for NaN elements first, and then make sure you have either removed or replaced every one of them during the preprocessing phase. Missing a single NaN element can cause major problems in your final result because of its ability to propagate throughout your data.
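
As a rough sketch of those two options, here is one way to remove NaN elements and one way to replace them. Filling with the mean of the remaining grades is just one possible choice, not a recommendation from this guide, and the grades other than 82 are placeholders.

```python
import numpy as np

# Hypothetical grades with NaN at index positions 1 and 7.
array_grades = np.array([82, np.nan, 91, 75, 68, 88, 79, np.nan, 94, 85])

nan_mask = np.isnan(array_grades)

# Option 1: remove the NaN elements with a boolean mask.
removed = array_grades[~nan_mask]

# Option 2: replace each NaN with the mean of the non-NaN values.
# np.nanmean ignores NaN elements when computing the mean.
replaced = np.where(nan_mask, np.nanmean(array_grades), array_grades)

print(removed)
print(replaced)
```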