Friday, September 27, 2024

My Software Developer Machine for LLM Inference

I had to assemble a new PC for LLM inference work because my existing development machine lacked a discrete GPU, so running any local LLM was extremely slow.  My aim was a dedicated development machine running Linux that will not be used for any gaming.  I wanted to keep the budget between $1500 and $2500, and I wanted the machine to be quiet and not take up much space.  In the end, I had to make some compromises and this is what I ended up with:

Components

Gigabyte GeForce RTX 4070 Ti Super with 16GB GPU

Traditionally, a development machine didn't need a graphics card, but that changes when working with LLMs, so I started with the GPU and picked the RTX 4070 Ti Super.  The two makers are Nvidia and AMD, but AMD cards seem to be less well supported and would require more effort to get working.  The general consensus is that VRAM is the most important thing for running LLMs and that you want to load as much of the model into VRAM as possible, so I aimed for 12-16GB*.  I picked the Gigabyte card because it is on the shorter side and doesn't require as large a case.

*A rough estimate of the memory needed by a model is the number of parameters multiplied by 2 bytes (a 16-bit parameter type = 2 bytes).  A 7B model would use 14GB, an 8B model 16GB, etc.
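The rule of thumb can be written as a small shell function.  This is only a sketch: estimate_weights_gb is a name I made up for illustration, it takes the parameter count in billions and the number of bits per parameter, and it only covers the weights themselves (no context or runtime overhead).

```shell
# Rough size of a model's weights in GB.
# Hypothetical helper: $1 = parameters in billions, $2 = bits per parameter (default 16).
estimate_weights_gb() {
    awk -v params="$1" -v bits="${2:-16}" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_weights_gb 7 16    # 16-bit 7B model
estimate_weights_gb 8 16    # 16-bit 8B model
estimate_weights_gb 8 4     # 8B model quantized to 4 bits
```

The first two calls print 14.0 and 16.0, matching the figures above, while the 4-bit case comes out at 4.0.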

**"How come my GPU only has 12GB of VRAM but I can run an 8B model, which would use 16GB?"  You might be using a quantized (compressed) model, or you're using both the GPU's VRAM and system RAM.  For example, if you download the default Llama 8B parameter model from Ollama, it is quantized to 4 bits, so it doesn't take the full 16GB of memory.  If you don't have enough VRAM, Ollama will use both system RAM and VRAM:

Loading the 27B parameter Gemma model with 4-bit quantization shows that it requires 18GB of memory and that Ollama loaded 82% of it into the GPU's VRAM (~14.7 GB):

> ollama ps
NAME            ID              SIZE    PROCESSOR       UNTIL              
gemma2:27b      53261bc9c192    18 GB   18%/82% CPU/GPU 4 minutes from now

nvidia-smi shows that about 14GB of the card's 16GB is in use, matching what Ollama says:

> nvidia-smi
...
 0   N/A  N/A      2338      C   ...unners/cuda_v12/ollama_llama_server    14020MiB 
...
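To avoid doing the percentage math by hand, the GPU share can be pulled out of the ollama ps output with awk.  This is a rough sketch: gpu_share_gb is a hypothetical helper, and it assumes the column layout shown above (including that a fully loaded model is reported as "100% GPU").

```shell
# Estimate how much of each loaded model sits in VRAM, from `ollama ps` output on stdin.
# Hypothetical helper; assumes the column layout shown above.
gpu_share_gb() {
    awk 'NR > 1 {
        size = $3                      # SIZE value, e.g. 18 (its unit is in $4)
        if ($6 == "CPU/GPU") {         # split load, e.g. "18%/82% CPU/GPU"
            split($5, pct, "/")
            gpu = pct[2]
        } else if ($6 == "GPU") {      # fully on the GPU, e.g. "100% GPU"
            gpu = $5
        } else {                       # anything else is treated as CPU only
            gpu = 0
        }
        sub(/%/, "", gpu)
        printf "%s %.1f %s\n", $1, size * gpu / 100, $4
    }'
}

# Sample copied from the output above; on a live system use: ollama ps | gpu_share_gb
gpu_share_gb <<'EOF'
NAME            ID              SIZE    PROCESSOR       UNTIL
gemma2:27b      53261bc9c192    18 GB   18%/82% CPU/GPU 4 minutes from now
EOF
```

For the sample line this prints "gemma2:27b 14.8 GB".  Since the 18 GB figure is itself rounded, that is in the same ballpark as the 14020MiB that nvidia-smi reports.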

AMD Ryzen 7 7700X CPU (8-core, 16-threads, 4.5GHz base, 5.5Ghz Max Boost)

A solid performer for development work with integrated graphics.  AMD's integrated graphics are pretty good, and running my window manager and GUI through them lets me save the Nvidia GPU for the LLMs without using any of its VRAM for the desktop.

ASUS B650M-PLUS WIFI AM5 Motherboard 

I didn't want a fancy motherboard with RGB, but I did want something that is compatible with the other parts and supports modern peripherals.

Corsair Vengeance DDR5 64GB Memory

I got DDR5 for the speed, and 64GB because when the VRAM isn't enough for the LLM, part of the model is loaded into system memory.

be quiet! Pure Rock 2 CPU Cooler

I don't plan to overclock this system, and the Pure Rock is well rated for being quiet and affordable.

Corsair RM750e (2023) Power Supply

Corsair 4000D Airflow Mid-Tower ATX Case

Although I would much prefer a small form factor case, that would limit the options for the GPU and other components, so this is a compromise.  The 4000D case comes with two fans and has good airflow.  When the system doesn't get as hot, the fans don't have to work as hard and the system is quieter.

Crucial P3 Plus 2TB M.2 SSD

The 1TB was out-of-stock and the 2TB was still relatively well priced.

The total price for the system came under $1900 so I was able to stay in my budget range.

Assembly and Usage

All the components came together with no compatibility problems.  I was able to get it to POST and installed a fresh copy of Fedora 40.  Wifi, sound, video, etc. all worked on the first boot.

When I first installed Ollama, it installed with support for the AMD integrated GPU since I hadn't installed the Nvidia drivers yet.  Once the drivers were installed, Ollama recognized and used the Nvidia GPU.  Make sure you plug your display cable into the motherboard's video port and not one of the Nvidia card's ports.  If you do the latter, the window manager and everything else will use the Nvidia card by default.

You can check which GPU is used using nvidia-smi to see what is running on the Nvidia card.  For AMD:

glxinfo | grep "OpenGL renderer" # See what the system is using
sudo lsof /dev/dri/* # Shows what is running on it.

Saturday, September 21, 2024

Installing Nvidia and CUDA drivers on Fedora for Ollama

Ollama, a tool that lets you run Large Language Models (LLMs) on a local machine, supports the use of GPUs for better performance if you have a supported graphics card and the corresponding drivers.  Having recently gotten a supported Nvidia card, I wanted to get it working with Ollama but found the available documentation on how to install the Nvidia and CUDA drivers confusing because there are multiple ways to install them.  Depending on where you start your search for instructions, you can be taken down different paths.

If you start on the Ollama install instructions page, it directs you to the Nvidia CUDA Toolkit Downloads page, which has you add Nvidia's CUDA repository to your Fedora instance.  From the repository you can install the CUDA toolkit, modules and drivers (CUDA and Nvidia).  For some reason, the repositories are currently tagged for Fedora 37 and 39, but they seem to work for Fedora 40.  I'm not sure if that will always be the case or if they will work with future versions of Fedora.

If you first go to Nvidia's site to search for the driver, it will direct you to their drivers download page where you can download a .run script to install the Nvidia drivers (not CUDA).  This works but bypasses your package manager, so I'm not sure if conflicts will arise in the future.  It also seems to be separate from what is in the CUDA repository, so I'm not certain if there might be conflicts now or in the future.  As of this writing, installing the drivers from the .run script and installing the CUDA toolkit from the repository does work, but I didn't install the Nvidia drivers from the repository.

If you start with a web search or the Fedora forums, the answer there is to install Nvidia from RPMFusion, which has both the Nvidia and CUDA drivers.  This seems to be the most compatible version for Fedora.  If you're already using RPMFusion then it is really your only option, since RPMFusion and Nvidia's repo are not compatible and you would need some DNF magic to get the two working together.  I also like this option because Ollama only needs the CUDA drivers and not the whole toolkit (I think you might be able to just grab the CUDA part from Nvidia's repo, but their instructions direct you to download the whole toolkit).

Installing Nvidia and CUDA for Fedora

Here is how I installed a fresh new instance of Fedora 40 with Nvidia and CUDA drivers to work with Ollama.

I created a Fedora 40 Cinnamon Spin boot drive with the Fedora Media Writer and booted up the machine with it to do a clean install.  Once it finished with the installation, I rebooted the machine and set up a network connection so I can download updates and the drivers.

Open up a terminal and change the run level to 3 (multi-user command line --no GUI--)

sudo init 3

The first time you run sudo dnf update, it'll probably update a whole bunch of the windowing system packages and might cause your current window manager to crash.  Switching to run level 3 first keeps the GUI and windowing system from running while you're doing the update.
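On Fedora, which uses systemd, the old run levels map onto targets: run level 3 corresponds to multi-user.target and run level 5 to graphical.target.  As a sketch, the same switch can be made with systemctl directly.  The function names below are only for illustration.

```shell
# Drop to the text-only multi-user target (the systemd counterpart of run level 3).
to_text_mode() {
    sudo systemctl isolate multi-user.target
}

# Return to the graphical target (run level 5) without rebooting.
to_gui_mode() {
    sudo systemctl isolate graphical.target
}

# Show which target the machine boots into by default.
show_default_target() {
    systemctl get-default
}
```

Calling to_text_mode from a terminal inside the GUI ends the graphical session, so save your work first.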

Once in command-line mode, update the system with the latest packages and kernel:

sudo dnf update

Once it's been updated, reboot the system to be running on the latest kernel.  

sudo /sbin/reboot now

I went back into run level 3 since I'd be updating the graphics drivers, but this time I did it from the GRUB boot menu.  When the boot menu comes up, hit the 'e' key to edit, add '3' at the end of the linux line, and then press CTRL-X to continue booting.  This change is not permanent.

Install the developer tools needed to compile the modules:

sudo dnf groupinstall "Development Tools"

Now it's time to add the RPMFusion free and nonfree repos so Fedora knows where to download the drivers and modules.

Import the GPG keys for the RPMFusion free and nonfree repos so that the repo release packages you install can be verified as genuine:

sudo dnf install distribution-gpg-keys

sudo rpmkeys --import /usr/share/distribution-gpg-keys/rpmfusion/RPM-GPG-KEY-rpmfusion-free-fedora-$(rpm -E %fedora)

sudo rpmkeys --import /usr/share/distribution-gpg-keys/rpmfusion/RPM-GPG-KEY-rpmfusion-nonfree-fedora-$(rpm -E %fedora)

Add the repositories to Fedora:

sudo dnf --setopt=localpkg_gpgcheck=1 install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm

sudo dnf --setopt=localpkg_gpgcheck=1 install https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

Install the Nvidia drivers with:

sudo dnf install akmod-nvidia

Make sure to give the system time to compile the modules AFTER the package install!

Check that the akmod-nvidia module is fully built with:

modinfo -F version nvidia
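Since the module build runs in the background, it can take a few minutes before modinfo finds it.  Rather than re-running the command by hand, a small polling loop can do the waiting.  This is just a sketch: wait_for_nvidia_module is a made-up helper, and the default of 30 attempts spaced 10 seconds apart is an arbitrary assumption.

```shell
# Poll until modinfo can find the nvidia kernel module, or give up.
# Hypothetical helper: $1 = max attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_nvidia_module() {
    attempts="${1:-30}"
    delay="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # modinfo exits non-zero while the module has not been built yet.
        if version="$(modinfo -F version nvidia 2>/dev/null)"; then
            echo "nvidia module ready: version $version"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "nvidia module not available after $attempts attempts" >&2
    return 1
}
```

Running wait_for_nvidia_module with no arguments waits up to about five minutes before giving up.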

Install the CUDA drivers

sudo dnf install xorg-x11-drv-nvidia-cuda

Reboot the machine again and let it automatically go to the GUI (runlevel 5).

In a terminal, check that the Nvidia driver is being used:

nvidia-smi

Now you can install Ollama which should tell you that it has Nvidia GPU support at the end of the install.

Thursday, May 30, 2024

Upgrading from Fedora 39 to 40

Upgraded to Fedora 40 following the normal procedure, hoping that it might resolve an annoying issue that started in April where SELinux kept alerting that dns_resolver is trying to setattr.  The issue is very similar to this bug except it's setting a different key.

The upgrade went without any problems but the same alerting continued.  It seems to be triggered when mounting an SMB shared drive.  For issues like this, I can usually wait a few days and a fix is issued, but in this case the bug remains open and the frequency of the alerts seems to have increased.  :-(


Saturday, April 13, 2024

Upgrading Fedora 38 to Fedora 39

Even though Fedora 40 is coming out in a few days (or because of it), it was time to upgrade from 38 to 39.

No immediately noticeable problems.  It downloaded the packages, installed and rebooted without trouble.

One reason for upgrading is that my Fedora 38 ran into problems with NUT not being able to load the driver for the UPS.  This seemed to be because Fedora 38 had upgraded the NUT packages and the newer version has an issue.  I noticed that Fedora 39 actually uses the previous package version, and when the system came up after the upgrade, NUT was working again.

Friday, April 12, 2024

Remembering Multi-Monitor Layout on Linux

Problem

I currently have three monitors daisy-chained: the PC is connected to one monitor through USB-C DisplayPort, and that monitor is daisy-chained to the others through standard DisplayPort cables.  This setup has a problem when the PC suspends or restarts.  Sometimes the first display gets no signal from the PC (and thus none of the others do either), or just the last display gets no signal.  In either case, turning them off and on brings all the displays back up (which I'm guessing allows the PC and monitor to do their handshake properly), but the layout of the displays is forgotten.

Diagnosis

There seem to be two problems:

  1. When the system comes up, the monitor is sometimes not able to communicate with it, so the monitor doesn't wake up.  This could be some kind of race condition, but I'm not certain.
  2. The display output names change: Monitor A is sometimes referred to as DisplayPort-4 and at other times as DisplayPort-1.  The info I've found points to this being caused by the video driver, which is responsible for setting the output names (e.g. DisplayPort-1, DP-1, etc.).

Solution(s)

I don't have a solution for #1, but the solution/workaround for #2 can mitigate it.

Linux uses RandR (the X Resize and Rotate extension), and the xrandr tool can be used to set the layout, but the parameters become long when there are multiple monitors with different rotations and positions relative to each other.  Another tool that helps is arandr, a GUI front-end to xrandr.  Once you have your layout set up, it can export it to a script that you can re-run, but I ran into two problems:

  1. It doesn't capture the order of the displays (which is on the right, middle and left) so while all three might be in the right rotation, they are not in the right order.
  2. The name of the display sometimes changes (e.g. a display once called DisplayPort-1 might get turned into DisplayPort-4) and then the script doesn't work either.

Finally, I found a tool called autorandr which fingerprints each display so that even when the name changes, it still knows which monitor is which.  It can also save different profiles, such as one for a laptop with no external monitor and another with one external monitor, and it will recognize the setup and load the appropriate profile automatically.  On Fedora, installing the autorandr package installs /usr/lib/udev/rules.d/40-monitor-hotplug.rules, which tells udev to run autorandr.

Since the issue I'm having is that the output names change, I needed to tell autorandr to match on the fingerprint, which is done with the '--match-edid' flag.  One option is to modify the udev rule to add this flag (remember to use systemd-delta to check for future differences from the distribution's packaged version); the other is to run autorandr manually with the flag.  I did the latter (/usr/bin/autorandr -c --match-edid --default default) since I'm still testing to see if it does what I hope it'll do.
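Put together, the manual workflow is: arrange the monitors once, save that arrangement as a profile, and re-apply it whenever the layout is lost.  Here is a minimal sketch of that; the function names are mine, and using "default" as the profile name is an assumption you can change.

```shell
# Save the current, correctly arranged layout under a profile name.
save_layout() {
    autorandr --save "${1:-default}"
}

# Re-apply the matching profile, identifying monitors by EDID instead of by
# output name, and fall back to the named profile if nothing matches.
restore_layout() {
    autorandr --change --match-edid --default "${1:-default}"
}
```

Run save_layout once after arranging the displays, then restore_layout after a suspend or restart scrambles them.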

Wednesday, March 6, 2024

Gyudon (Beef Bowl)

Originally from Adam Liaw:

  • Thinly sliced beef
  • 3/4 cup chicken stock
  • 1/4 cup soy sauce
  • 2 tbsp sake
  • 2 tbsp mirin
  • 3 tsp sugar
  • 1 brown onion

Add chicken stock, soy sauce, sake, mirin and sugar to a pot and bring to a boil.
Add onion and cook until softened.
Add beef and stir until beef is cooked.

Monday, March 4, 2024

Buying Plywood - Cuts, Cores, Matches and More

Buying lumber from a lumber yard can be intimidating, but surely plywood is simpler... right?  Plywood is a manufactured product with a more controlled process and more standardization than harvested lumber, but there are still a lot of variations in plywood that make buying it more complicated than buying a PlayStation off the shelf.

I don't buy plywood frequently, so each time I do I have to refresh myself on all the different terminology and options that I get back from the lumber dealer.  I decided to write a post to myself to save time re-searching the internet for what each thing means.

Core Materials

Plywood is made of layers of wood material sandwiched between the wood veneers that give it its look.  Walnut plywood is made from two walnut veneers with enough material (the "core") between them to reach the desired thickness.  The layers give plywood its stability, so it doesn't have the wood movement that lumber typically has.

Face and Back Grades

The face and back of plywood have grades that describe their quality:

Face grades:

  • AA - Premium, architectural quality for interiors, case goods and high end furniture.
  • A - Not as high as AA but still excellent appearance. 
  • B - Less perfect and consistent than A panels but more economical.
  • C - More defects and variations.  Not as attractive so good for less visible applications.
  • "Shop" grade - Panels that have some imperfection that causes the sheet to not meet the grade (e.g. A1 or C2).

Back grades:

These go from 1 (best) to 4 (worst).  Grades 3 and 4 allow open defects.

Baltic Birch Grades

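Baltic Birch uses a different grading system, described here in terms of how many "footballs" (oval patches that replace knots and other defects) are allowed on each face:

  • B/BB - One face free of footballs.
  • BB/BB - An average of 4 to 6 footballs per face.
  • BB/CP - An average of 4 to 6 footballs on one face and unlimited footballs on the back.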
(BB/CP example)


Veneer Core


Veneer core means that the layers between the face veneers (an odd number of them) are sheets of wood stacked in alternating grain direction for stability.  It is lighter and has strong screw-holding power, but it might not be as flat, and imperfections in the core layers can show through the face (aka "telegraph").

There are processes to address this, such as Columbia Forest Products' MPX core, which makes veneer core smoother and reduces telegraphing.  MPX is Columbia's registered trademark for, basically, using smooth hardwood crossbands in the core to smooth out the veneer.

MDF Core

MDF core uses medium-density fiberboard between the face veneers, which is very stable and uniform.  MDF can be heavy, and it can swell up and dissolve when wet.

Combination Core

This core uses a combination of MDF and wood veneers between the faces.  

Veneer Cuts

The way the veneer is cut affects the appearance, properties and cost of the veneer.  For example, plain slicing produces veneers with "cathedral" patterns, while a quarter cut produces a finer, straight-line pattern.



A rotary cut can produce Whole Piece Face (WPF) veneer, where the entire face of the plywood is a single piece of veneer, and it can be more economical to produce.  To make a full sheet of plywood from the other cuts, strips of veneer are placed side-by-side, so there can be fine seams between the strips.  How the strips are placed is called matching, which is discussed more below.


Veneer Match

Unless it is a whole piece face, the veneer panels need to be placed side-by-side in order to create a full sheet of plywood.  How these panels are ordered is called veneer matching.  There are multiple ways to do this, and the links in the references below describe them, but the common ones are:

Slip Matching

Slip matching places each panel next to each other without turning or flipping them over.  This creates a repeated look.

Sequence Match 

Sequence matching requires that the panels come from the same log, so that they are more consistent panel-to-panel.





Book Matching

Book matching turns every other panel over so that two adjacent panels mirror each other, much like the facing pages of an open book.

References

  1. https://www.decorativehardwoods.org/sites/default/files/2022-02/HWPW%20Handbook.pdf
  2. https://chesapeakeplywood.com/architectural-plywood/
  3. https://www.columbiaforestproducts.com/library/reference-guides/grading-guide/veneer-cuts-and-matching/
  4. https://www.columbiaforestproducts.com/2015/08/29/matching-confusion-uncomplicating-an-overused-term/
  5. https://www.columbiaforestproducts.com/library/reference-guides/core-types/
  6. https://www.archtoolbox.com/wood-veneer-matching/
  7. https://awiqcp.org/news-and-blog/wood-veneers-matching/ -- sequence matching has a higher standard for matching than slip matching (which also lays the panels in sequence).
  8. https://www.decorativehardwoods.org/pdfs-available-download

Sunday, January 7, 2024

Reacher Season 2 - My Reaction So Far

Season 1 of Amazon's Reacher was a surprisingly entertaining show with a great cast that showed clear chemistry with each other.  The banter between the characters was fun to watch rather than annoying, and the pace at which each character's background was revealed kept me engaged through the entire season.  Unfortunately, season 2 has not had those same ingredients.

Most of the new characters already had a developed relationship, so the character development happened mainly through flashbacks, and the chemistry between them was lacking or lacked tension.  The pacing also feels off this season: there is no mystery to the events and each episode feels a bit like the previous ones.

Two more episodes remain in this season, and hopefully it picks up the pace and provides a satisfactory ending that will hold me over until season 3.

Friday, January 5, 2024

Anime to Start the New Year - The Apothecary Diaries

For the first post of 2024, I'm starting with a positive review of The Apothecary Diaries.

The Apothecary Diaries was originally a Japanese light novel and then a manga before being released as an anime starting in October 2023.  It takes place in a fictional imperial China and follows a young Chinese girl who loves studying and making medicine.  With a pragmatic acceptance of the realities and social norms of feudal China, the protagonist nevertheless ends up rising in prominence within the imperial court.

I enjoyed the characters and mysteries surrounding our heroine and the relationships she establishes with members across the social spectrum.

Unlike many modern anime, The Apothecary Diaries immediately secured not just one season but two seasons of episodes (24), and as of this writing it is halfway through the initial 24-episode run.  I've been fully enjoying the anime and would recommend it.


Friday, December 22, 2023

Upgrading to Fedora 38

Upgraded from Fedora 37 to 38 following the standard instructions.  There were no errors indicated during the upgrade.  The packages were downloaded and installed, and the system rebooted.  When I came back to the machine after the upgrade and got to the login screen, my USB mouse was functioning sporadically.  I could still log in with the keyboard but it felt slow and Fedora had a warning that something didn't load correctly.

I powered down the system and then turned it back on (the usual "if the hardware isn't working, first give it a kick") and everything seemed to be working normally.  It might be because my mouse is connected to the desktop through the monitor's USB input?

Once I was back on, I did a 

sudo dnf update

to see if I was current and it gave the following error

Problem 1: cannot install the best update candidate for package libheif-freeworld-1.15.1-4.fc38.x86_64
  - nothing provides libheif(x86-64) = 1.17.5 needed by libheif-freeworld-1.17.5-1.fc38.x86_64 from rpmfusion-free-updates
 Problem 2: problem with installed package libheif-freeworld-1.15.1-4.fc38.x86_64
  - package libheif-freeworld-1.15.1-4.fc38.x86_64 from @System requires libheif(x86-64) = 1.15.1, but none of the providers can be installed
  - package libheif-freeworld-1.15.1-4.fc38.x86_64 from rpmfusion-free requires libheif(x86-64) = 1.15.1, but none of the providers can be installed
  - cannot install both libheif-1.16.2-2.fc38.x86_64 from updates and libheif-1.15.1-2.fc38.x86_64 from @System
  - cannot install both libheif-1.16.2-2.fc38.x86_64 from updates and libheif-1.15.1-2.fc38.x86_64 from fedora
  - cannot install the best update candidate for package libheif-1.15.1-2.fc38.x86_64
  - nothing provides libheif(x86-64) = 1.17.5 needed by libheif-freeworld-1.17.5-1.fc38.x86_64 from rpmfusion-free-updates
================================================================================
 Package             Arch     Version            Repository                Size
================================================================================
Skipping packages with conflicts:
(add '--best --allowerasing' to command line to force their upgrade):
 libheif             x86_64   1.16.2-2.fc38      updates                  298 k
Skipping packages with broken dependencies:
 libheif-freeworld   x86_64   1.17.5-1.fc38      rpmfusion-free-updates    59 k

Transaction Summary
================================================================================
Skip  2 Packages

Nothing to do.
Complete!

The output's suggestion of using --best and --allowerasing didn't work, so I searched and found two threads about this issue from months ago:
  • https://discussion.fedoraproject.org/t/unknown-update-error-with-libheif/81302
  • https://discussion.fedoraproject.org/t/rpmfusion-free-updates-libheif-freeworld-and-libheif-version-conflict/82240/7
Although the threads implied that it had been resolved and people had proposed different workarounds, the simplest solution for me was to remove the libheif-freeworld package:

sudo dnf remove libheif-freeworld

This seemed to resolve the issue.

The above threads indicated that a fix was submitted, but it might now be broken again: when I searched for the various package versions on Fedora and rpmfusion, the working versions of the packages weren't there:

sudo dnf search --showduplicates libheif

Maybe it's the timing of my upgrade that ran into this problem.  Fortunately, it was a quick and easy fix, but sadly I can't say that this was a completely seamless upgrade.