Micron’s 3D XPoint Could Revolutionize AI – Micron Technology Inc. (NASDAQ:MU)

On May 21, 2018, Micron (MU) held their annual analyst and investor conference. The conference had many exciting bombshells including a $10 billion stock buyback program and much discussion about Micron’s positioning in the DRAM and NAND market. Many articles on Seeking Alpha have already covered the conference in depth so we are not going to do so here. However, we do encourage you to read the conference transcript that can be found here, and an excellent article by Joe Albano entitled “Micron: The $10 Billion Shot Heard ‘Round The World.”

The one thing that was conspicuously missing from the investor conference was any details or specifics about the company’s 3D XPoint technology. Sanjay Mehrotra, Micron’s CEO, in his presentation did address 3D XPoint and hinted that Micron intends to start shipping 3D XPoint in 2019.

In terms of 3D XPoint, it’s a technology that has exciting potential – 10 times better chip density is achievable compared to DRAM and thousand times better endurance light capability compared to NAND, and thousand times faster than NAND as well. These specifications really create a significant value proposition for 3D XPoint for solutions that are well placed between the DRAM and the NAND in memory and the storage hierarchy. We are working with our customers in terms of product development. And as we have said earlier, we will be having products in 3D XPoint in 2019, launching those products in the latter part of 2019 timeframe.

– Sanjay Mehrotra, CEO, Conference Transcript

Sumit Sadana, the company’s chief business officer, also reiterated that they’re not ready to talk about the technical details behind 3D XPoint implementation for now as they are still working with partners on implementation.

Our 3D XPoint products, I am not going to provide more details on them today, because we will be introducing these products next year and for competitive reasons I don’t want to preempt some of the work that is going on between us and our customers.

– Sumit Sadana, Chief Business Officer, Conference Transcript

In this article we are going to briefly explain how artificial intelligence and more specifically machine learning works in real life and speculate about the future of 3D XPoint. Our speculation is based on our personal, in-depth knowledge of machine learning combined with public statements made by the Micron team.

What is AI and Machine Learning?

First, let’s get on the same page on what we mean by artificial intelligence and machine learning. AI is a general purpose term that applies to any kind of technology that allows a computer to perform tasks that are usually performed by a human. Examples of this can include anything from playing chess to sorting mail to recognizing pictures of cats and dogs to driving a vehicle.

Machine learning, or ML for short, is a subset of AI. It’s a technique for creating AI by showing a computer sets of inputs and desired outputs and allowing the computer to “learn” from these sets about how to perform a specific task.

For instance, let’s say you want to train a computer to recognize photographs of cats. One way to approach this task would be to describe very detailed heuristic rules for determining whether something is or isn’t a picture of a cat.

So for instance, you might specify that cats have fur and pointy ears. But then of course, not every cat has pointy ears or even fur. This makes the rule based approach to AI rather fragile, especially when it comes to edge cases.

By contrast, the ML approach relies on showing the computer thousands and thousands of pictures of different cats and having the computer come up with its own rules of what a cat is. For image recognition the most common models in use today rely on a neural network technique. Detailed discussion on how neural networks work is beyond the scope of this article, however a simplified diagram of one such network is provided below for reference and to help you think about how this can relate to memory and storage.

Diagram of a simple neural network with one input layer, two hidden layers, and one output layer. Prepared by Zynath Capital.

The above diagram is simplified to show a picture of 9×9 pixels. In reality much larger input sets are used with hundreds of thousands or even millions of features.

So now that we have a bit of an idea of how a neural network looks, at least conceptually, let’s jump back to our cat example.

The cat pictures from the training example are normalized to a specific size and then broken down into their individual pixels and the values of those pixels are fed into an ML model. The model performs forward propagation (the model considers whether or not the inputs given to it correspond to a picture of a cat) and outputs a probability of the likelihood that the picture provided to it is of a cat.

During the training phase of the model, it is told whether or not it has answered correctly. So if the picture given to the model was indeed of a cat and the model has gotten the answer correctly, then the model is reinforced. If the model has gotten the answer wrong then the penalty (difference between the correct answer and the answer given by the model) is back propagated through the model to adjust the individual weights, so hopefully it can do better next time.
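The training loop described above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not a production model: it uses a single artificial neuron instead of a multi-layer network, four made-up "pixel" values per picture, and hypothetical names such as forward and train_step.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1 so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, pixels):
    """Forward propagation: weighted sum of the pixel values, turned into a probability."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)

def train_step(weights, bias, pixels, label, learning_rate=0.5):
    """One training step: predict, measure the penalty, adjust the weights."""
    prediction = forward(weights, bias, pixels)
    error = prediction - label  # penalty: model's answer minus the correct answer
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, pixels)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

# Made-up training set: label 1 means "cat", label 0 means "not a cat".
training_set = [
    ([0.9, 0.8, 0.1, 0.2], 1),
    ([0.8, 0.9, 0.2, 0.1], 1),
    ([0.1, 0.2, 0.9, 0.8], 0),
    ([0.2, 0.1, 0.8, 0.9], 0),
]

weights, bias = [0.0] * 4, 0.0
for _ in range(200):  # show the model every picture 200 times
    for pixels, label in training_set:
        weights, bias = train_step(weights, bias, pixels, label)

cat_probability = forward(weights, bias, [0.9, 0.8, 0.1, 0.2])
not_cat_probability = forward(weights, bias, [0.1, 0.2, 0.9, 0.8])
print(f"cat picture: {cat_probability:.3f}, non-cat picture: {not_cat_probability:.3f}")
```

A real image model repeats the same predict, compare, adjust cycle, only with millions of weights spread across several layers.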

Hardware Requirements

Math involved in calculating the cost of a given prediction in a neural network. Photo provided by Zynath Capital.

The rather complicated mathematics involved in forward and back propagation are beyond the scope of this article, but the important thing to note is that the calculations involved require thousands of linear algebra operations. If you remember linear algebra from high school or college, it’s basically mathematical operations with large data sets organized into matrices and vectors. Hopefully, this gives you some intuition on why GPUs have been so popular of late for machine learning applications. Linear algebra can be easily parallelized and GPUs are excellent at parallel mathematical computations.

What’s a little bit less intuitive is the amount of memory that’s involved in this process. Take, for instance, our picture of a cat. Let’s say it’s a picture of 1,000 x 1,000 pixels, a pretty small picture by today’s standards. Even a picture that small has one million pixels, and therefore at least one million individual features, and each of those pixels has to be processed by the CPU in order to assess the “catness” of the picture.

Now that you’ve pictured just how much computation and processing the model has to do on one picture, imagine doing the same on datasets of millions and millions of pictures. In the real world, it’s not uncommon to have datasets that are as large as 2 or 3 Terabytes or more, especially if we are talking about fields like genetics and astrophysics.
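To put rough numbers on this, here is a back-of-the-envelope calculation in Python. The storage format is our assumption, not something stated above: three color channels per pixel, each held as a 4-byte floating point value, and a hypothetical server with 256 GB of DRAM.

```python
GB = 10**9
TB = 10**12

def image_bytes(width, height, channels=3, bytes_per_value=4):
    """Memory needed for one decoded picture (assumed: RGB, 32-bit float per value)."""
    return width * height * channels * bytes_per_value

def images_that_fit(capacity_bytes, per_image_bytes):
    """How many whole pictures fit into a given amount of memory."""
    return capacity_bytes // per_image_bytes

per_image = image_bytes(1000, 1000)                    # one 1,000 x 1,000 picture
dataset_images = images_that_fit(3 * TB, per_image)    # pictures in a 3 TB dataset
dram_images = images_that_fit(256 * GB, per_image)     # pictures that fit in 256 GB of DRAM

print(f"one picture: {per_image / 10**6:.0f} MB")
print(f"3 TB dataset: {dataset_images:,} pictures")
print(f"256 GB of DRAM holds: {dram_images:,} pictures")
```

Under these assumptions a single decoded picture takes 12 MB, so even a generously equipped server can hold less than a tenth of a 3 TB dataset in DRAM at once.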

To train the model quickly you need to load as much of your data set as possible into memory (RAM) so that you can load up your powerful GPUs and CPUs with parallelized computational tasks. CPUs are so powerful nowadays that we are getting to the point where feeding the CPU with data is becoming the bottleneck. To date we have been solving this problem by increasing the amount of DRAM in the system and preloading the DRAM with the datasets we are working with. Sumit Sadana addressed this exact issue in his remarks during the conference:

It’s a well-known fact inside cloud companies that processors spend a lot of their time simply waiting for data. And as the core count in all of these newer processors has increased substantially over the years, the amount of memory you can attach to these processors hasn’t gone up as much and consequently the amount of bandwidth available by per core has actually fallen.

DRAM also has one significant drawback – it’s volatile. Imagine spending several days and ungodly amounts of CPU and power resources to compute new weights for your new and revolutionary cat recognition ML model, only to have the power in your building interrupted or the computer halt for some hardware or software related reason. With DRAM you would lose everything, and your model would be back to thinking that tables are cats (they both have four legs, after all). This is exactly where 3D XPoint comes in.
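Today the usual defense against this kind of loss is to checkpoint the weights to storage every so often. The sketch below shows one minimal way to do that in Python; the helper names, the JSON format, and the file name are our own illustrative choices, not part of any particular ML framework.

```python
import json
import os
import tempfile

def save_checkpoint(weights, path):
    """Write weights to a temporary file, force them to disk, then rename into place.

    The rename is atomic, so a crash mid-save leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(weights, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)

def load_checkpoint(path, default):
    """Return the saved weights, or the default if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default

# Usage: resume from the last checkpoint, do some "training", save again.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "cat_model.json")
weights = load_checkpoint(checkpoint_path, default=[0.0, 0.0, 0.0])
weights = [w + 0.25 for w in weights]  # stand-in for a real training step
save_checkpoint(weights, checkpoint_path)
restored = load_checkpoint(checkpoint_path, default=None)
print(restored)
```

Every checkpoint costs a trip through the storage stack, which is why training jobs save only periodically and still lose the work done since the last save.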

The Beauty of 3D XPoint

3D XPoint bridges the chasm between NAND memory (SSD storage) and DRAM memory (RAM). As Sumit Sadana puts it: “3D XPoint is persistent memory, it’s not as fast as DRAM but substantially faster than NAND and unlike DRAM it retains its state without power.”

Test results screenshot from Linus Tech Tips video.

When it comes to raw read and write speeds, 3D XPoint is much closer to regular NAND memory than to DRAM; the two are almost identical. In the tests performed by Linus Tech Tips, a popular YouTube hardware review channel, Intel’s (INTC) Optane drives, which use the same 3D XPoint technology, scored roughly 2GB/s read and write speeds, which is in line with the latest Samsung (OTC:SSNLF) NAND SSDs. By contrast, a RAMdisk (a virtual disk created from a DRAM module) can read or write at speeds exceeding 8GB/s. However, where 3D XPoint behaves a lot more like DRAM is in its latency.

Latency is a measure of how fast a given storage media can respond to requests. So if a CPU requests a picture of a cat, NAND and 3D XPoint will both be able to provide the CPU with that picture at a rate of roughly 2GB/s, but the 3D XPoint module will start the transfer of information much, much, sooner (on a CPU time scale) than a comparable NAND module. 3D XPoint’s response time is close to that of DRAM.

Test results screenshot from Linus Tech Tips video.

Here is another way to think about it. If you want to read 60 GB of contiguous data from storage, both NAND and 3D XPoint will perform roughly equivalently in terms of their raw speed. However, if you want to make 120,000 individual read requests in random order, for instance to read 120,000 individual 500 KB cat pictures, a 3D XPoint module will finish processing those 120,000 requests faster than a NAND module, because it pays a smaller latency penalty on every request.
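The comparison above can be turned into a simple model: every request pays the device latency once, plus the time to move its bytes. The bandwidth is the roughly 2 GB/s figure cited earlier; the two latency values are placeholders we assumed for illustration, not measurements, and the model ignores queuing and parallel requests.

```python
def transfer_seconds(num_requests, bytes_per_request, latency_s, bandwidth_bytes_per_s):
    """Time to serve requests one after another: latency plus transfer time for each."""
    per_request = latency_s + bytes_per_request / bandwidth_bytes_per_s
    return num_requests * per_request

BANDWIDTH = 2 * 10**9     # roughly 2 GB/s for both device types
NAND_LATENCY = 100e-6     # ASSUMED: 100 microseconds per request
XPOINT_LATENCY = 10e-6    # ASSUMED: 10 microseconds per request

# One contiguous 60 GB read: latency is paid once and is negligible.
sequential_nand = transfer_seconds(1, 60 * 10**9, NAND_LATENCY, BANDWIDTH)
sequential_xpoint = transfer_seconds(1, 60 * 10**9, XPOINT_LATENCY, BANDWIDTH)

# 120,000 random reads of 500 KB each: latency is paid 120,000 times.
random_nand = transfer_seconds(120_000, 500 * 10**3, NAND_LATENCY, BANDWIDTH)
random_xpoint = transfer_seconds(120_000, 500 * 10**3, XPOINT_LATENCY, BANDWIDTH)

# The same 120,000 reads at 4 KB each: latency dominates almost completely.
small_nand = transfer_seconds(120_000, 4 * 10**3, NAND_LATENCY, BANDWIDTH)
small_xpoint = transfer_seconds(120_000, 4 * 10**3, XPOINT_LATENCY, BANDWIDTH)

print(f"sequential 60 GB: NAND {sequential_nand:.1f}s, XPoint {sequential_xpoint:.1f}s")
print(f"random 500 KB:    NAND {random_nand:.1f}s, XPoint {random_xpoint:.1f}s")
print(f"random 4 KB:      NAND {small_nand:.1f}s, XPoint {small_xpoint:.1f}s")
```

With these assumed latencies the sequential read takes about 30 seconds on either device, the 500 KB random reads take about 42 seconds on NAND versus about 31 on 3D XPoint, and the gap widens sharply as individual requests get smaller.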

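The other significant advantage of 3D XPoint is its durability. Modern NAND cells can typically be rewritten somewhere between a few thousand and, for the most durable single-level cells, around a hundred thousand times before they degrade. As Mehrotra noted in the remarks quoted above, 3D XPoint promises roughly a thousand times better endurance than NAND. Its endurance is still finite, but in practical terms it behaves much more like DRAM under repeated writes.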

3D XPoint Implementation

By now we know a little bit about how machine learning works and understand the performance characteristics of 3D XPoint. Now, let’s look at how 3D XPoint can be used very effectively to speed up, and dare I say revolutionize, machine learning. But first, let’s look at another quote from Sumit Sadana for hints of just what Micron might be working on when it comes to 3D XPoint (emphasis ours):

It’s a well-known fact inside cloud companies that processors spend a lot of their time simply waiting for data. And as the core count in all of these newer processors has increased substantially over the years, the amount of memory you can attach to these processors hasn’t gone up as much and consequently, the amount of bandwidth available by per core has actually fallen.

And that is the reason why, having the ability to use 3D XPoint to expand the addressable memory for these processes, is so important because it actually gives you a bigger payoff and performance than simply going to the next version of the processor or faster speed of processor alone. Future processors are going to allow for more memory to be attached to the processor and that is going to be another driver of average capacity in these servers.

The key phrase in the above quote is “addressable memory.” What exactly does that mean? You see, a CPU can’t directly address all the memory in the computer. It can talk directly to the DRAM, but not to the hard drives or SSD (NAND) drives.

Diagram of memory access in a system with and without 3D XPoint. In the diagram above a photo of Intel’s Optane module is used for illustration purposes only. Currently available modules do not have adequately fast interfaces and cannot be directly addressed by the CPU. Diagram provided by Zynath Capital.

In the above diagram, notice that while a CPU can directly address any data stored in a DRAM module, it cannot do the same with an SSD. Instead, to access data on the SSD the CPU has to communicate with the storage controller and ask it to take a chunk of data from the drive and place it into RAM. Only after that operation is performed can the CPU access the requested data by reaching out and grabbing it from RAM. Writing to the SSD is the reverse of the reading procedure. The CPU has to write some data into RAM and then ask the storage controller to grab that data from RAM and write it back to the SSD. As you can see, there’s a significant level of overhead involved.

By contrast, you can see the right hand side of the diagram showing what an implementation that uses both DRAM memory and 3D XPoint memory in conjunction could look like. In that model the CPU can directly access memory pages in both DRAM and 3D XPoint storage.
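The difference between the two access models is visible even at the programming level. The Python sketch below is only an analogy under our own assumptions: read_via_copy mimics the "copy a chunk into RAM first" model, while read_via_mapping uses the operating system’s memory mapping so that the data can be indexed like ordinary memory. On a conventional SSD the mapping still goes through the storage controller behind the scenes, so this illustrates the programming model rather than the hardware path.

```python
import mmap
import os
import tempfile

def read_via_copy(path, offset, length):
    """Storage-style access: ask the OS to copy a chunk into a buffer, then use it."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)

def read_via_mapping(path, offset, length):
    """Memory-style access: map the file into the address space and index it directly."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]

# Build a small 4,096-byte stand-in for a dataset file.
demo_path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
with open(demo_path, "wb") as handle:
    handle.write(bytes(range(256)) * 16)

copied = read_via_copy(demo_path, 1000, 4)
mapped = read_via_mapping(demo_path, 1000, 4)
print(copied == mapped)
```

Both calls return the same four bytes. The point is that with directly addressable persistent memory, the second style could in principle be served without the intermediate copy into DRAM.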

Linus Tech Tips did a video testing just this concept, in which they used Intel’s Optane drive to supplement the memory on a test machine. The results showed that even in its current implementation, without special OS level kernel provisions, and connected over an M.2 interface, the Optane drive, which uses 3D XPoint memory, is fast enough to fully saturate a top of the line CPU.

Test results screenshot from Linus Tech Tips video.

To properly implement this system for maximum performance, Micron would have to work with OS developers (Linux and Windows, with the emphasis on Linux as it is used for most machine learning tasks) to develop essentially a new level of memory. In a computer system, you have Level 1, 2, and sometimes 3 cache memory, followed by what’s commonly known as RAM or DRAM that we all know and love. Micron would have to work on drivers that implement another layer of memory, slightly slower but persistent and less expensive than DRAM, and that memory would of course be based on the 3D XPoint technology.

This can be implemented relatively transparently to the rest of the system: the system would see the entirety of the random access memory, but the kernel would allocate memory pages for actively running applications in DRAM while pushing less frequently used data, and applications that are still running but less often used, into the 3D XPoint range of the total addressable memory.
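A toy version of such a placement policy is sketched below in Python. It is our own simplification, not a description of how any real kernel works: the TieredMemory class is hypothetical, it tracks whole "pages" in dictionaries, and it uses a plain least-recently-used rule to decide which page gets pushed from the DRAM tier into the 3D XPoint tier.

```python
from collections import OrderedDict

class TieredMemory:
    """Toy two-tier page store: a small fast tier backed by a larger, slower tier."""

    def __init__(self, dram_pages):
        self.dram_pages = dram_pages  # how many pages the fast tier can hold
        self.dram = OrderedDict()     # page id -> data, ordered coldest to hottest
        self.xpoint = {}              # page id -> data for demoted pages

    def write(self, page_id, data):
        """Place a page in the fast tier, demoting the coldest page if it is full."""
        self.xpoint.pop(page_id, None)
        self.dram[page_id] = data
        self.dram.move_to_end(page_id)
        self._demote_cold_pages()

    def read(self, page_id):
        """Return a page, promoting it to the fast tier if it was demoted earlier."""
        if page_id in self.dram:
            self.dram.move_to_end(page_id)
            return self.dram[page_id]
        data = self.xpoint.pop(page_id)  # raises KeyError for unknown pages
        self.write(page_id, data)
        return data

    def _demote_cold_pages(self):
        while len(self.dram) > self.dram_pages:
            cold_id, cold_data = self.dram.popitem(last=False)
            self.xpoint[cold_id] = cold_data

memory = TieredMemory(dram_pages=2)
for page in ("a", "b", "c"):
    memory.write(page, f"data-{page}")
value = memory.read("a")  # "a" was demoted earlier and is now promoted again

print(sorted(memory.dram), sorted(memory.xpoint), value)
```

The application only ever calls read and write; which tier a page lives in is decided behind its back, which is the transparency described above.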

This would be extremely beneficial for machine learning models by allowing the server to load the entirety of the dataset into addressable memory and then have the CPU go at it and start forward and back propagation on the training set for training purposes.

More specifically, if you refer to the neural network diagram in the AI section above, the ideal implementation would load the dataset, represented by X1, X2, and so on, into the 3D XPoint memory while keeping the main parts of the model, hidden layers 2 and 3 in our diagram, in the main DRAM. The weights of the model, commonly represented by theta, Θ, would be stored in DRAM and mirrored to 3D XPoint for backup in case of a hardware or software crash.

Direct access by the CPU to vast amounts of fast and low latency memory will allow the CPU to be fully loaded most of the time. This translates into better return on investment, shorter model training sessions, and overall significant improvements in machine learning tasks.

Investor Takeaway

Micron has shown with this latest conference that they can execute and execute well. They are firing on all cylinders and we believe that they deserve a much higher multiple than the market is giving them right now. If they are able to execute on 3D XPoint in the manner discussed above, we think we should see a significant renewed interest in Micron and a departure from the old narrative of “cycle, cycle, commodity supplier.” If they can deliver on nonvolatile addressable memory which is well integrated with operating systems such as Linux and Windows, they can create an entirely new memory class and address the growing demands of machine learning.

Our previous price target for Micron has been $70 to $100. Those targets have not changed for now as much depends on the software support and implementation of 3D XPoint as an addressable system memory. However, with smooth software implementation and AI industry uptake Micron can easily skyrocket much like Nvidia (NVDA) has in the past few years. We try to stay away from making short-term predictions because Mr. Market is maniacal and often acts insanely especially when it comes to this stock but our long term outlook on Micron remains extremely positive.

Disclosure: I am/we are long MU, INTC.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.