TechnoCurmudgeon's Lair



Wailings

Here's where pontification comes into play.

Date/Index              Topic
Microsoft               Microsoft Woes
Industrial Site Review  Thoughts on Industrial Website Design. Website Reviews.
Mind Melt               Mind-melting Documentation
UEM Index Page          Useless Error Messages
05/06/2000              Hell in a Hand Basket (a modern fable)
04/17/2000              The State of Industrial Controls

05/06/2000 - Hell in a Hand Basket (a modern fable)

Different system; same systemic problems.

In plastics processing, a 'blender' is used to ratiometrically mix materials (virgin and color concentrate pellet stock, potentially other additives, and reclaimed material) and feed them into an extruder. A typical blender system can run in either volumetric mode (where the speeds of the conveyor augers or vibrators are set in ratio) or gravimetric mode (where each station is weighed with a load cell, and the loss in weight of each station is held in ratio).
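As a rough illustration of the two modes, here is a minimal Python sketch. It is not the vendor's code: the function names, the recipe numbers, and the idea of reporting a per-station "share error" are all assumptions made purely for illustration.

```python
def volumetric_speeds(ratios, line_speed_pct):
    """Volumetric mode: give each station a share of the overall speed
    in proportion to its recipe ratio (hypothetical helper)."""
    total = sum(ratios.values())
    return {name: line_speed_pct * r / total for name, r in ratios.items()}


def gravimetric_errors(ratios, weight_loss_kg):
    """Gravimetric mode: compare each station's measured loss in weight
    with its recipe share (hypothetical helper).

    Returns measured share minus target share per station; a positive
    value means that station is overfeeding.
    """
    total_ratio = sum(ratios.values())
    total_loss = sum(weight_loss_kg.values())
    if total_loss <= 0:
        raise ValueError("no material has been dispensed yet")
    return {
        name: weight_loss_kg[name] / total_loss - ratios[name] / total_ratio
        for name in ratios
    }


# Made-up recipe, in parts per hundred.
recipe = {"virgin": 80.0, "color": 4.0, "regrind": 16.0}

speeds = volumetric_speeds(recipe, 50.0)
errors = gravimetric_errors(
    recipe, {"virgin": 7.9, "color": 0.5, "regrind": 1.6}
)
```

In this made-up run the color station has dispensed a larger share than the recipe calls for, so its error comes out positive. A real controller would presumably trim that auger down, but how this particular blender does so is not documented.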

In this particular fable, the names have been changed to protect the innocent, and to keep the litigious faction of society at bay.

I'm among those attempting to find out why one such blender is shutting itself down seemingly at random, and hence stopping an entire process line. 

This system uses a computer controller, digital I/O cards, load cell (strain gauge)-to-serial communication modules, and auger drive boards - all of which are considered proprietary. The user interface consists of several 16-pad keyboards and a multi-line VF (vacuum fluorescent) display mounted on the CPU board.

There are three different comm networks from the CPU - the keyboards and digital I/O are on one I2C link, the drive motor controllers are on a second I2C link, and the weigh boards are on a third serial link called the 'global' comm (I'm not quite sure what it really is).

I've learned through experience that the load cell modules, I/O cards, and keyboards are OK, and don't give much in the way of trouble. The CPU isn't bad, either, except that load cell calibration and various setpoint data are stored in battery-backed SRAM, and there's no way of transferring them to a replacement CPU when one is swapped. This makes for extra work - the stations must be drained, then a full calibration performed, and all the setpoint information re-entered - but isn't overwhelming.

However, the motor drive boards don't appear to have any current limiting, and the chances of the traces burning up rather than the armature fuse cutting loose are on the order of 1 in 8 (or worse) when the motor goes into a locked rotor condition.

They have several other flaws, but one that has some currency in our situation is that if they lose the 120 VAC supply (even for a little while), the communications network between them and the master controller hangs up. There are probably other phenomena that can cause the same result. The computer apparently doesn't check for this condition: using the 'Mind of God' troubleshooting approach, one would reasonably assume that if the computer knew, it would attempt to re-establish contact and, if it could not reconnect, would generate and display a suitable error message (e.g. "drive board #2 comm fault").
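The behavior one would hope for is easy enough to sketch. The Python below is only an illustration of "poll, retry, then report": `poll` is a hypothetical callable standing in for whatever the real firmware does on the drive board link, and the retry count is an arbitrary assumption.

```python
def check_drive_boards(poll, board_ids, retries=3):
    """Poll each drive board; return a fault message for every board
    that stays silent after `retries` attempts.

    `poll(board_id)` is a hypothetical callable that returns True when
    the board answers and False (or raises OSError) when it does not.
    """
    faults = []
    for board_id in board_ids:
        for _attempt in range(retries):
            try:
                if poll(board_id):
                    break  # board answered; move on to the next one
            except OSError:
                pass  # treat a bus error like a missing reply and retry
        else:
            # The retry loop finished without a reply from this board.
            faults.append(f"drive board #{board_id} comm fault")
    return faults


# Simulated bus in which board 2 never answers.
def fake_poll(board_id):
    return board_id != 2


messages = check_drive_boards(fake_poll, [1, 2, 3])
```

With the simulated bus above, `messages` holds a single entry naming board 2, which is exactly the kind of message the operator never gets to see on the real system.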

What we have is a system that runs OK for anywhere from several minutes to perhaps a day before locking up. We see that the watchdog LEDs on the keyboards and CPU stop functioning, so we suspect it's a comm error.

If the system hasn't locked up with this symptom, we can disconnect the comm line that runs from the CPU to the keyboards and digital I/O, and communications are re-established immediately after plugging it back in. However, this is ineffective once the lockup symptom has manifested itself.

The system has been run in volumetric mode with the comm line to the load cell boards disconnected, and the symptom persisted, so the load cell link is unlikely to be relevant to the problem.

We've rewired the cabinet to conform more closely with standard practices (it was originally a mass of daisy-chained grounds), installed a CVT (constant voltage transformer) on the control voltage side, and now use the original standard duty control transformer to feed only the drive controllers. Other steps too numerous to mention here have also been taken.

Nothing we have done so far has significantly affected the core shutdown symptom. We have reworked part of the control circuit so we can monitor the watchdog status using a brick PLC, and drop power to the CPU to reboot it in the event of a stuck watchdog condition. This works, but now the proportion of "weird things" has increased - we receive messages pointing to load cell overload (sometimes with what appears to be a genuinely high weight, other times not).

This evening I had a conversation regarding some of the symptoms with the blender company's system programmer, and learned a number of facts that aren't mentioned at all in the documentation. Had they been, some of these (and other) symptoms, and their possible solutions, would have been clear to me.

One of these facts is that the CPU needs a certain amount of time to 'settle' into a defined state after power-up (for example, to establish communications with the load cell boards). The PLC monitor (programmed by another individual at the blender company) wasn't programmed to allow for this, and he suspects that waking the CPU up with the 'run' input enabled is causing the new symptoms.
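For what it's worth, the monitor logic with a settle period added might look something like the following. This is a Python sketch of the idea, not the actual PLC program: the state names and all three timing values are assumptions (the required settle time was never given to me), and a real brick PLC would express this in ladder logic with timers.

```python
class WatchdogMonitor:
    """Supervises a CPU watchdog and power-cycles the CPU when it stalls.

    All timings are assumed values for illustration only.
    """

    def __init__(self, stall_s=5.0, off_s=2.0, settle_s=30.0):
        self.stall_s = stall_s    # no watchdog pulse for this long = stuck
        self.off_s = off_s        # how long to hold CPU power off
        self.settle_s = settle_s  # assumed settle time after power-up
        self.state = "RUNNING"
        self.mark = 0.0           # time of last pulse or state change

    def step(self, now, pulse_seen):
        """Advance the monitor; return (cpu_power, run_enable)."""
        if self.state == "RUNNING":
            if pulse_seen:
                self.mark = now
            elif now - self.mark >= self.stall_s:
                self.state, self.mark = "POWER_OFF", now
        elif self.state == "POWER_OFF":
            if now - self.mark >= self.off_s:
                self.state, self.mark = "SETTLING", now
        elif self.state == "SETTLING":
            # Power is back on, but 'run' stays off until the CPU settles.
            if now - self.mark >= self.settle_s:
                self.state, self.mark = "RUNNING", now
        power = self.state != "POWER_OFF"
        run = self.state == "RUNNING"
        return power, run


monitor = WatchdogMonitor()
trace = [
    monitor.step(t, pulse)
    for t, pulse in [(0.0, True), (6.0, False), (8.0, False),
                     (20.0, False), (38.0, False)]
]
```

The point of the SETTLING state is simply that the 'run' input is held off while the CPU boots, so the CPU never wakes up with 'run' already enabled.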

When I mentioned that he should write up a white paper explaining basic system operation (both for end users, and for others within his company), I was rebuffed with the comment (paraphrased) that he only wrote the control code, and that the board manufacturer didn't think such documentation was necessary, nor would it allow him to disseminate it.

Huh?

This is the kind of crap that is annoying me to the point of hysterical screaming laughter, fits of crying rage, and a flub-a-dub-dub of the lips.

I don't want to know secret information (the particular algorithms used to calculate mass flow, for example), just a detailed overview of the system that makes effective troubleshooting possible.

I'd settle for the commented C code (sans any aforementioned 'secrets'), and puzzle out the details myself. In any case, with the exception of things like the calculation of mass flow in the above example, there aren't any secrets! 

The small display used on this system allows the user to see only one point of information at a time, so it is next to impossible for the user to develop a 'real time' sense of what the system is doing. For instance, ideally a user should be able to see at least the following values simultaneously for all stations,

and preferably have access to trend graphs for these things, in order to see how they vary over time.

To be fair, the control system in question is the 'budget' offering, so I don't expect any bells and whistles - in fact, given the display hardware, they can't do much more than they already do. I do, however, expect it to work reliably!

The thing is, if the manufacturer is unwilling to adequately document their system, and intrinsically the system cannot present a cogent view of the process to the end user, then that end user is in a bad place.

 

04/17/2000 - The State of Industrial Controls

And it's a sorry state, indeed. 

Here's how it used to be ...

It was a mixed blessing, because, due to the limited potential of relay logic, advanced functions like 'self-diagnostic' logic to trap errors (intermittent e-stop or guard door opens, among a host of other things) couldn't be done in any practical way, so you didn't expect to see them.

On the other hand, if the troubleshooter had a grip on basic relay logic and the nature of the limit switches and other components used in the system, it was possible, if not easy, to troubleshoot without documentation. It was all there in front of him, so (although wire pulling, tracing using a meter, etc. is ugly and time consuming) he could develop an understanding of how the parts all worked together.

Currently, we seem to be in a 'middle world' between what traditional relay logic was, and what PLC/computer/etc. 'high tech' tools promise to become.

I was working on a piece of equipment today that has partially incorrect electrical prints, partially complete component documentation, and essentially undocumented control system software. In truth, many of the control systems shipped with OEM built equipment (many different types of equipment, mind you) seem to have been built with the same philosophy in mind.

I don't have any meta-answers, and this thread of thought needs to be continued, but not tonight ... I have to come up with some specific answers to what it was I debugged today!

My belief is that, with computers and PLCs providing the logic of newer control systems, there needs to be a concomitant increase in the amount and quality of process and diagnostic information presented to the end user.


Created and maintained by Bob Welker for his own personal amusement. All trademarks, and so on that appear belong to their respective owners. None of the information contained within is guaranteed in any way.

Original work copyright 1999-2005 by Robert A. Welker.