Summary of the content on page 1
                    
                        Application Report 
SPRAA56 – September 2004 
DSP/BIOS Real-Time Analysis (RTA) and Debugging 
Applied to a Video Application 
Brian Jeff, DSP Field Software Applications
Arnie Reynoso, Software Development Systems
ABSTRACT 
DSP/BIOS and the Reference Frameworks allow developers to non-intrusively instrument 
real-time applications. The software provided with this application note applies real-time 
analysis (RTA) services to a working application: an H.263 encode/decode loopback
example for the 
                    
Summary of the content on page 2

Figures
Figure 1. Basic Data Flow of the Video Application ... 4
Figure 2. Detailed Application Data Flow Showing Memory Buffers ... 8
Figure 3. Task Partitioning in the Modified Application ... 9
Figure 4. CPU Load Measurement at Run-Time ... 15
Fig
                    
Summary of the content on page 3

Quantization is the process of dividing a continuous range of input values into a finite number of subranges. Each subrange is assigned a specific output value. The Q factor, or quantization factor, describes the level of quantization used to store the frequency domain representation of the encoded image. The Q factor often varies dynamically in an encoder when a constant bitrate is targeted, so it is useful to display the Q factor dynamically with the video stream. Frame type designat
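The definition of quantization above can be illustrated with a minimal sketch of a uniform quantizer. This is not the quantizer used by the H.263 encoder; the helper names `quantize_index` and `quantize_value`, the equal-width subranges, and the choice of the subrange midpoint as the output value are all assumptions made for illustration.

```c
/* Hypothetical helper (not from the application): map x in [lo, hi)
 * to the index of one of `levels` equal-width subranges.
 * Assumes levels >= 1 and hi > lo. Out-of-range inputs are clamped. */
static int quantize_index(double x, double lo, double hi, int levels)
{
    if (x <= lo) {
        return 0;
    }
    if (x >= hi) {
        return levels - 1;
    }
    return (int)((x - lo) * levels / (hi - lo));
}

/* Hypothetical helper: the output value assigned to a subrange.
 * The midpoint is an assumed choice; other assignments are possible. */
static double quantize_value(int index, double lo, double hi, int levels)
{
    double step = (hi - lo) / levels;
    return lo + (index + 0.5) * step;
}
```

With four levels over [0, 1), an input of 0.5 falls in subrange 2, whose assigned output in this sketch is 0.625. Fewer levels mean coarser subranges and a larger error between input and output.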
                    
Summary of the content on page 4

Figure 1 shows a simplified view of the sequential flow of capture, processing, and display tasks in the application.

[Figure 1. Basic Data Flow of the Video Application. Diagram labels: Camera; Device Driver (twice); TSK tskInput, TSK tskVideoProcess, TSK tskOutput; SCOM.]

Before video data reaches the first stage, it must be converted to digital data, a process that is managed by the input device driver. Analog video input is converted by an on-board NTSC decoder chip into a digital
                    
Summary of the content on page 5

2.1 DSP/BIOS and RF5 Components Used

The base application leverages various DSP/BIOS real-time analysis components to support debugging capabilities that are not intrusive to the system performance. The following three modules are included with the core DSP/BIOS library, and can be used in any application that uses DSP/BIOS and on any TI DSP supported by DSP/BIOS:

 • LOG  Logging events
 • STS  Statistics accumulators
 • TRC  Control of real-time capture

In addition to these DS
                    
Summary of the content on page 6

2.1.2 STS

An STS object accumulates the following statistical information about an arbitrary 32-bit wide data series: count, total, and maximum.

Statistics are accumulated in 32-bit variables on the target DSP and in 64-bit variables on the host PC. When the host polls the target for real-time statistics, it resets the variables on the target. This minimizes space requirements on the target, while allowing you to keep statistics for long test runs.

As part of using the DSP/BIOS
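The split between 32-bit target accumulators and 64-bit host accumulators can be modeled in portable C. The sketch below is only a model of the behavior described above: the types `TargetStats` and `HostStats` and all function names are hypothetical, and none of this is the DSP/BIOS STS API or its object layout.

```c
#include <stdint.h>

/* Hypothetical model of the target-side accumulator (32-bit fields). */
typedef struct {
    int32_t count;
    int32_t total;
    int32_t max;
} TargetStats;

/* Hypothetical model of the host-side accumulator (64-bit fields). */
typedef struct {
    int64_t count;
    int64_t total;
    int64_t max;
} HostStats;

static void target_reset(TargetStats *t)
{
    t->count = 0;
    t->total = 0;
    t->max = INT32_MIN;
}

static void host_init(HostStats *h)
{
    h->count = 0;
    h->total = 0;
    h->max = INT64_MIN;
}

/* Target side: fold one sample into count, total, and maximum. */
static void target_add(TargetStats *t, int32_t value)
{
    t->count += 1;
    t->total += value;
    if (value > t->max) {
        t->max = value;
    }
}

/* Host side: merge the target's values, then reset the target so its
 * 32-bit fields only ever hold the samples seen since the last poll. */
static void host_poll(HostStats *h, TargetStats *t)
{
    h->count += t->count;
    h->total += t->total;
    if (t->count > 0 && t->max > h->max) {
        h->max = t->max;
    }
    target_reset(t);
}
```

Because the target is reset on every poll, the small target-side fields are far less likely to overflow, while the host keeps the long-run totals.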
                    
Summary of the content on page 7

2.2 Requirements for Viewing RTA Benchmarks

In order for any of the DSP/BIOS-based RTA tools to be visible, the DSP/BIOS components in Code Composer Studio version 2.30 or earlier and version 3.0 require that the application's .cdb configuration file be accessible and consistent with the executable .out file.

This requirement is easily met during development. It can also be satisfied in demonstrations or delivered test examples. If you do not want to deliver source code with the ap
                    
Summary of the content on page 8

[Figure residue: data-flow diagram with memory buffers. Legible labels: 720x576; Device Driver; Buffer, 3 frames; Yuv422to420; H.263 enc; H.263 dec; bitBuf; YAfter420, CbAfter420, CrAfter420; Shared Scratch; scratch1, scratch2; Instance memory; 14 KB = 20 lines. Legible sizes: 414 KB, 512 KB, 207 KB, 6 KB, 92 KB, 1.5 KB, 14 KB. Key: Internal Memory; DMA Read/Write (background); External]
                    
Summary of the content on page 9

if (controlVideoProc.frameRateChanged) {
    txMsg.cmd  = FRAMERATECHANGED;
    txMsg.arg1 = chanNum;
    txMsg.arg2 = controlVideoProc.frameRateTarget;

    controlVideoProc.frameRateChanged = FALSE;
    MBX_post(&mbxProcess, &txMsg, 0);
}

While implementing control via the host PC did not specifically require a separate task in the modified application, adding a discrete control task makes the application more scalable. For example, a user interface or communications link from a
                    
Summary of the content on page 10

This call returns a status structure of type IH263ENC_Status that contains the number of bits sent to the encoder, the frame type, and other data.

The features implemented in the control API can vary widely from one algorithm to another. The bitrate and frame type measured by this API may not be available with all third-party video algorithms unless specifically requested. Thus, it is important that the encoder and decoder algorithms used by your application have the necessary hook
                    
Summary of the content on page 11

4 RTA Techniques for Performance Measurement

The RTA techniques described in this section are largely application-specific calls to DSP/BIOS RTA services via APIs in the run-time code. These API calls can be added to any application without modifying its logical structure.

In the case of the video application, performance overhead of the RTA tools is expected to be minimal because the calls are made at the frame rate of 30 or 25 Hz, or even in some cases every 30 or 25 frames, a v
                    
Summary of the content on page 12

4.2 Measuring Task Scheduling Latencies

Scheduling latency is defined as the time between a wakeup signal (semaphore post) to a pending task and the actual start of that task's execution.

DSP/BIOS provides a mechanism for measuring scheduling latency with the TSK_settime and TSK_deltatime APIs. These functions accumulate the difference in time from when a task is made ready to the time TSK_deltatime is called. The placement of the TSK_deltatime API therefore determines what is act
                    
Summary of the content on page 13

The low-resolution CLK_getltime API is used instead of the high-resolution CLK_gethtime because the range of the latency is known to be on the order of one or more frame times, where a frame time is 33.33 ms in NTSC systems. The low-resolution timing measurement provided by CLK_getltime is more cycle-efficient and is reported in milliseconds. Since the data is displayed in milliseconds, the lower-resolution time base results in a faster measurement, with sufficient accuracy for the latency
                    
Summary of the content on page 14

last30frame.current = CLK_getltime();

// check to see if we dropped any frames
benchVid.framesDropped.current = last30frame.current - last30frame.previous;
benchVid.framesDropped.current -= 1000*(frameCnt / DISPLAYRATE);
benchVid.framesDropped.current /= DISPLAYRATE;

last30frame.previous = last30frame.current;

if (benchVid.framesDropped.current > 0 && frameRateTarget == DISPLAYRATE ) {
    LOG_error("Dropped %d frames", benchVid.framesDropped.current);
    UTL_logDebug2(
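The arithmetic in the excerpt above can be tried off-target with a small portable sketch. The helper name `dropped_frames_estimate` and its signed `long` parameters are hypothetical; the three steps simply mirror the integer arithmetic of the excerpt, with the CLK_getltime readings replaced by plain millisecond arguments. The values used in the note below (a display rate of 30 and a check every 30 frames) are assumptions.

```c
/* Hypothetical helper mirroring the excerpt's integer arithmetic.
 * previous_ms / current_ms stand in for two low-resolution clock
 * readings in milliseconds; frame_cnt and display_rate stand in for
 * frameCnt and DISPLAYRATE. */
static long dropped_frames_estimate(long previous_ms, long current_ms,
                                    long frame_cnt, long display_rate)
{
    /* Elapsed time since the previous check. */
    long excess_ms = current_ms - previous_ms;

    /* Subtract the time the frames should have taken at the display rate. */
    excess_ms -= 1000 * (frame_cnt / display_rate);

    /* Scale the excess time as the excerpt does. */
    return excess_ms / display_rate;
}
```

Under those assumptions, 30 frames that take 1100 ms instead of 1000 ms leave an excess of 100 ms, and the helper returns 3. If the 30 frames take exactly 1000 ms, it returns 0 and no error would be logged.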
                    
Summary of the content on page 15

[Figure 4. CPU Load Measurement at Run-Time. Diagram labels: 'minloop' (in units of ~cycles); t0, t1; 'count' is the number of hits of LOAD_idlefxn in the window; window = 500 ms (default); IDL load; 100 - IDL load gives the application CPU load.]

cpuload = (100 - ((100 * (count * minloop)) / total))

The LOAD module relies on an IDL thread to be inserted in an application to calibrate the amount of time needed to run a single iteration of the DSP/BIOS idle loop. It estimates the CPU load by dividing the idled tim
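The cpuload formula above can be worked through with a short sketch. The function name `cpu_load_percent` is hypothetical, `total` is assumed to be the length of the measurement window in the same units as `minloop`, and the 64-bit intermediate and the clamping are additions made here to keep the arithmetic safe; they are not taken from the LOAD module.

```c
#include <stdint.h>

/* Hypothetical helper: cpuload = 100 - (100 * count * minloop) / total.
 * count   = idle-loop iterations seen in the window,
 * minloop = calibrated cost of one idle-loop iteration,
 * total   = assumed window length, same units as minloop. */
static uint32_t cpu_load_percent(uint32_t count, uint32_t minloop,
                                 uint32_t total)
{
    if (total == 0) {
        return 0; /* no window measured yet; assumed fallback */
    }

    /* 64-bit intermediate avoids overflow of 100 * count * minloop. */
    uint64_t idle_percent = (100ULL * count * minloop) / total;
    if (idle_percent > 100) {
        idle_percent = 100; /* clamp against calibration error */
    }
    return (uint32_t)(100 - idle_percent);
}
```

For example, 1000 idle-loop hits at 50 units each in a window of 100000 units account for half of the window, so the estimated load is 50 percent. A window with no idle-loop hits gives 100 percent.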
                    
Summary of the content on page 16

In video applications that handle the full resolution of 720x480, each frame contains about 675 KB of data. Such applications must constantly move video frames from internal working memory buffers to external frame buffers and back. This often results in several MB of memory transfers through the external bus for each frame. At 30 frames per second, the memory transfer bandwidth requirement can be a significant CPU resource requirement. As resolutions increase to high-definition siz
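The frame-size and bandwidth figures above can be checked with a short calculation. The sketch assumes 2 bytes per pixel (as in a 4:2:2 format), which is an assumption that happens to reproduce the 675 KB figure when 1 KB is taken as 1024 bytes. The helper names and the `passes` parameter (how many times each frame crosses the external bus) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: size of one frame in bytes. */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Hypothetical helper: bytes moved per second when every frame is
 * transferred `passes` times over the external bus. */
static uint64_t bus_bytes_per_second(uint64_t bytes_per_frame,
                                     uint32_t frames_per_second,
                                     uint32_t passes)
{
    return bytes_per_frame * frames_per_second * passes;
}
```

Under these assumptions, a 720x480 frame is 691,200 bytes, or 675 KB. A single pass at 30 frames per second moves 20,736,000 bytes per second, and each additional copy between internal and external memory adds the same amount again.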
                    
Summary of the content on page 17

These estimates are fairly accurate for the color conversion functions in the input and display tasks, but the estimates are less accurate for the encoder and decoder algorithms in the processing task. Ideally, the memory bus utilization should be available in the status structure or estimated on the data sheet of an algorithm. It is recommended that you request this information from third-party algorithm providers during application development, particularly for applications above
                    
Summary of the content on page 18

Most current encoders use three primary frame types: Intracoded frames, Predicted frames, and Bidirectional predicted frames. These are referred to as I, P, and B frames. The H.263 encoder supplied with the example application encodes I and P frames only, but you can configure the ratio of I to P frames. Often this ratio is used in the quality vs. bitrate tradeoff. The H.263 encoder has hooks to allow for monitoring or selecting the frame type. This example application only monitor
                    
Summary of the content on page 19

The benchmarking routines send out selected benchmark data at a prescribed interval: every 30th frame, every I (Intracoded) frame, or only on a dropped frame. The interval can be selected by controlling the .rtaMode variable within the control structure.

Benchmark data is transmitted to CCStudio on the host PC via RTDX (Real-Time Data eXchange), which is used behind the scenes by the DSP/BIOS RTA tools. RTDX allows Code Composer Studio to read from or write to target buffers i
                    
Summary of the content on page 20

The application supplied with this note references board support software and libraries installed with the DM642 EVM. The project options assume this software is installed in $TI_DIR$\boards\evmdm642.

The project also references the H.263 encoder algorithm, which is provided as object code with the DM642 EVM's Board Support Package. Therefore, that package and all its associated components must be installed before running or building the supplied example as delivered.

Tconf script