How to achieve live encoder stability

Stability is everything and more

This article focuses on how you can interpret error codes and use them to fix stability issues.

This article will help you address stability issues on your live encoder. With a stable encoder you can focus on the event you are live streaming and on actually producing great live content. Make sure that you have read the article about how to spot instability issues on a live encoder. It helps uncover the potential issues that we are now about to fix. Also, if you are not tech savvy yourself, team up with someone who is.

You might not have experienced system failures on your encoder, so why worry about system stability? The thing is that live streaming can put your system under tremendous strain. Taking in an uncompressed video signal, encoding it and live streaming it, while recording a local version, are quite resource-intensive tasks. Whether you are using Wirecast, vMix, OBS or closed box solutions from Digital Rapids, Imagine Communications or ATEME, it is of utmost importance that the underlying system is completely stable.

We previously covered how to stress test a live encoder in order to recognize signs of instability. Now we will take a look at how to address instability issues reported by the stress testing software.

MemTest86 errors

We will start out by having a look at potential errors reported by MemTest86. MemTest86 reports errors as “address errors”, and such errors usually, if not always, come in large numbers. Errors are a sign that the RAM is configured incorrectly, that it is not compatible with the system, or that the memory sticks are simply bad. Check the specs of the sticks and compare them against the settings in the BIOS/UEFI of your motherboard.

Check the QVL

If the settings are correct, move on to check whether the memory sticks are listed in the Qualified Vendor List (QVL) provided by the motherboard manufacturer. Sadly, such QVLs are often not maintained very well, which means that many memory sticks that could be perfectly compatible have never been tested with the particular motherboard. What you should take away from this is that your memory is not automatically incompatible if it is not listed; it might simply not have been officially tested. To save yourself the headache of a potential incompatibility issue, we recommend simply buying memory from the QVL.

Next steps

Now, if you find that the memory is indeed compatible with the motherboard and that it has been configured correctly, it is safe to assume that the memory is defective. In rare cases, though, the motherboard can be the faulty part. Replace the memory and run the test again to make sure that the issues are no longer present. If they persist, replace the motherboard.

Understanding Prime95 errors

When stress testing in Prime95 you might see errors like the following:

“FATAL ERROR: Rounding was 0.5, expected less than 0.4”

“Hardware failure detected, consult stress.txt file.”

These errors are often caused by incorrect configuration of the CPU (e.g. frequency set too high or voltage levels set too low). They might also indicate incorrect configuration of the RAM or a faulty motherboard. We recommend testing extensively in MemTest86 to rule out the memory as the cause of the errors.
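If a stress test runs unattended, it can be handy to search the saved Prime95 output for these messages afterwards. The following is a minimal sketch, not an official tool: the function name `find_prime95_errors` is hypothetical, the patterns are based only on the two messages quoted above, and the sample text is made up for illustration. The exact wording in your Prime95 version may differ.

```python
import re

# Patterns based on the two messages quoted in this article.
# Assumption: your Prime95 version words its errors similarly.
ERROR_PATTERNS = [
    re.compile(r"rounding was", re.IGNORECASE),
    re.compile(r"hardware failure detected", re.IGNORECASE),
]


def find_prime95_errors(log_text):
    """Return the log lines that match any of the known error patterns."""
    hits = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Illustrative sample text, not a real log.
    sample = (
        "Self-test passed\n"
        "FATAL ERROR: Rounding was 0.5, expected less than 0.4\n"
        "Hardware failure detected, consult stress.txt file.\n"
    )
    for hit in find_prime95_errors(sample):
        print(hit)
```

To use it on a real run, read the text of the log file Prime95 wrote and pass it to the function; an empty result only means that none of these particular messages were found.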

Next steps

Start by ensuring that the CPU is set according to specs in the BIOS/UEFI of your motherboard. If it is, replace the CPU with another one. If the issue persists, replace your motherboard. At this point you have most likely corrected the error, but if you haven’t, the issue might relate to a defective PSU that fails to deliver stable power.

How to troubleshoot a “Blue Screen of Death” (BSOD)

The cause of a BlueScreen can often be pinpointed by looking at when it happens. If it happens during a stress testing session in Prime95, it is likely to be caused by a malfunctioning CPU, RAM or motherboard. If it doesn’t happen in Prime95 but instead happens consistently in FurMark or Dota 2, it is likely related to a malfunctioning graphics card or a bad driver.
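The rule of thumb above can be written down as a small decision function. This is only a sketch of the reasoning in this article; the function name `likely_bsod_cause` and its two parameters are hypothetical, and real-world failures can of course have other causes.

```python
def likely_bsod_cause(fails_in_prime95, fails_in_gpu_test):
    """Map where the BlueScreen occurs to the suspects named in the text.

    fails_in_prime95:  the BSOD happens during a Prime95 stress test.
    fails_in_gpu_test: the BSOD happens consistently in FurMark or Dota 2.
    """
    if fails_in_prime95:
        return "CPU, RAM or motherboard"
    if fails_in_gpu_test:
        return "graphics card or graphics driver"
    return "unclear: note the error code and investigate further"


if __name__ == "__main__":
    print(likely_bsod_cause(fails_in_prime95=False, fails_in_gpu_test=True))
```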

Error codes

Error codes provide a great way to narrow down what is causing the BlueScreen. They are also often the only way of troubleshooting if the BSOD happens during general use and not during a stress test. An error code will look like the following:

  • BSOD 0x0000007B
  • BSOD 0x00000001
  • BSOD 0x00000024
  • BSOD 0x00000116
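Each of these codes has a symbolic name in Microsoft's bug check documentation, and that name is often a better search term than the number. The sketch below normalizes a code as you might have jotted it down and looks it up in a small table. The table covers only the four examples above, and the function name `describe_stop_code` is hypothetical.

```python
# Names for the four example codes, as listed in Microsoft's bug check reference.
# Deliberately tiny: this is an illustration, not a complete table.
KNOWN_STOP_CODES = {
    0x00000001: "APC_INDEX_MISMATCH",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x00000116: "VIDEO_TDR_FAILURE",
}


def describe_stop_code(raw):
    """Normalize a stop code such as 'BSOD 0x7b' and look up its name."""
    text = raw.strip().upper().replace("BSOD", "").strip()
    value = int(text, 16)  # base 16 accepts an optional 0x/0X prefix
    name = KNOWN_STOP_CODES.get(value, "unknown (search for the code online)")
    return f"0x{value:08X}: {name}"


if __name__ == "__main__":
    for code in ["BSOD 0x0000007B", "0x116", "0x24"]:
        print(describe_stop_code(code))
```

Because the code is parsed as a number, a slip such as one zero too many or too few still resolves to the same entry.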

Make sure that Windows is set to not restart after a BlueScreen. You do this by following these steps, which are largely the same on Windows 7 through 10:

  1. Click Start.
  2. Right-click Computer and select Properties.
  3. Select Advanced system settings to the left.
  4. On the Advanced tab, find Startup and Recovery and click Settings.
  5. Under System failure, disable Automatically restart.
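If you manage several encoders, you may want to verify this setting without clicking through the dialogs. The sketch below reads the `AutoReboot` value from the `CrashControl` registry key, which is where Windows stores the option to our knowledge; treat that as an assumption and verify it on your own system. It only reads the value and changes nothing. The function name `auto_restart_enabled` is hypothetical.

```python
import sys

# Assumption: the automatic restart option is stored here as a DWORD.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def auto_restart_enabled():
    """Return True if Windows is set to restart automatically after a BSOD."""
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "AutoReboot")
    return value != 0


if __name__ == "__main__":
    if sys.platform != "win32":
        print("This check only works on Windows.")
    elif auto_restart_enabled():
        print("Automatic restart is ON; disable it using the steps above.")
    else:
        print("Automatic restart is already OFF.")
```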

These settings will give you time to note the error code and the relevant driver. Searching Google for those two will often help you identify what is causing the issue and find out what you should do to troubleshoot it. If you are using a closed system, like a laptop or a prebuilt computer from a known manufacturer, you can Google the error code together with the name of the model, e.g. “0x0000007B dell inspiron 2200”. You will often see that others have experienced very similar errors.

Performance issues on a live encoder can be caused by limited heat dissipation

If you see that the CPU or GPU frequency is consistently lowered while under heavy load, it is often a sign of inadequate cooling. If your live encoder generates more heat than the cooling can cope with, the speed of the components will be lowered to protect the system. That can be a serious problem, because you might have tested something like a Wirecast project to put 60 % load on the CPU. If the CPU speed is then throttled down from 3000 MHz to 1800 MHz, that load will rise above what is recommended. Don’t try to disable such throttling; improve the cooling instead. That might not be possible in a laptop, which you might need to replace, but if you are using a workstation you have plenty of options among aftermarket coolers and fans.
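To get a feel for the numbers, here is a rough back-of-the-envelope calculation. It assumes that CPU load scales inversely with clock speed, which is a simplification (real workloads also depend on memory, cache and the encoder's own behavior), and the function name `projected_load` is hypothetical.

```python
def projected_load(load_pct, tested_mhz, throttled_mhz):
    """Estimate the CPU load after throttling.

    Simplifying assumption: the same work takes proportionally longer
    at a lower clock speed, so load scales with tested_mhz / throttled_mhz.
    """
    return load_pct * tested_mhz / throttled_mhz


if __name__ == "__main__":
    # Numbers from the example above: 60 % load at 3000 MHz, throttled to 1800 MHz.
    print(projected_load(60, 3000, 1800))  # 100.0
```

Under this assumption, a project that looked comfortable at 60 % would saturate the CPU completely once throttled, leaving no headroom at all.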

Copenhagen Streaming is ready to help with your live encoder

We have many years of experience in building live encoders and we are constantly stress testing encoders for ourselves and our partners. If you need our help testing your live encoder, or assistance in building one, do not hesitate to contact us. Use the contact form below to reach out to us.



About the author and Copenhagen Streaming:

Johan is a live streaming expert and has built and stress tested countless encoders. Copenhagen Streaming is a video production bureau with a focus on live streaming, corporate video and video content tailored to social platforms. Copenhagen Streaming has more than 20 years of experience with live streaming for companies like Danske Bank and Danmarks Radio.

MemTest86

Errors in MemTest86. Copenhagen Streaming uses MemTest86 to stability test the RAM in live encoders. The image shows MemTest86+ (plus), which used to be very popular but is unfortunately no longer maintained.

An example of an error in MemTest86. The red lines show faulty addresses in the memory modules.

Prime95

Errors in Prime95. Prime95 is one of our favorite tools for stability testing encoders. It stresses the CPU, RAM and motherboard alike.

An example of a fatal error in Prime95.

FurMark

Errors in FurMark often show up as artifacts. This is often a sign of incorrect settings on the graphics card, but it can also be caused by hardware faults.

The colored artifacts indicate a faulty graphics card. In this case the maker decided to factory overclock the graphics card, resulting in instability.

Dota 2

Dota 2 excels at revealing driver-level instability, and at Copenhagen Streaming we see it as an excellent and quite efficient way to get away from work for a while.

BlueScreens on a live encoder are often caused by driver instability. In this case, nvlddmkm.sys refers to an unstable GeForce driver version.

Games are great at testing graphics cards.

In this case Dota 2 is failing, which is the result of a driver error. The BSOD is related to nvlddmkm.sys. A Google search shows that this file is from the Nvidia GeForce driver package. Downgrading to an earlier version of the driver fixed the issue.

 

After a BlueScreen, Windows will show this dialog box on the encoder.

When Windows recovers after a BSOD you will see a message like this.