32-PARALLEL SAD TREE HARDWIRED ENGINE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION IN HDTV1080P REAL-TIME ENCODING APPLICATION

The H.264/AVC coding standard incorporates variable block size (VBS) motion estimation (ME) to improve compression efficiency. For HDTV 1080p applications, the massive computation and the huge memory bandwidth caused by the large video frame size and the wide search range are two critical impediments to the design of a real-time hardwired VBSME engine. In this paper, we present six techniques to circumvent these difficulties. First, the inter modes below 8x8 are eliminated in our design to reduce the hardware cost. Second, the low-pass filter based 4:1 down-sampling algorithm reduces about 75% of the arithmetic computation at each search position. Third, a coarse-to-fine search scheme is used to reduce the search candidates by 25%-50%. Fourth, the C+ memory organization is adopted to reduce the external I/O bandwidth. Fifth, the horizontal zigzag scan mode optimizes the search window memories. Finally, in circuit design, the 4:2 compressor based CSA tree, multi-cycle path delay, and 2-pipeline-stage SAD tree techniques are used to improve the speed and reduce the hardware of each SAD tree. The hardwired integer motion estimation (IME) engine with 192x128 search range for HDTV 1080p@30Hz

memory partitions than the Propagate Partial SAD and SAD Tree counterparts. When 256 PEs are configured, Propagate Partial SAD has the most efficient datapath, so it is suitable for low- and middle-resolution video sequences. For HDTV applications, the massive computation makes a high degree of parallelism essential. For example, in ref. [5], 8 sets of SAD Tree are required for the real-time HDTV 720p encoder when the clock speed is 108 MHz. Because the reference pixel registers can be shared by adjacent SAD Trees, the parallel SAD Tree has less chip area than the parallel Propagate Partial SAD. However, in comparison with the other two counterparts, the original parallel SAD Tree architecture has some drawbacks. First, its critical path is longer than those of Propagate Partial SAD and Parallel Sub-Tree, which prevents it from working at a high clock frequency. Second, from the analysis in Section 3, we observe that its snake scan method causes the dilemma of large partitions and low I/O utilization for the search window memories.

Our design target is HDTV 1080p@30Hz with a 192x128 search range and one reference frame. With the exhaustive integer motion estimation (IME) search algorithm, the computation complexity is 6.8 times that of the design in [5]. The straightforward implementation is
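
Two figures quoted in the text, the roughly 75% arithmetic saving from 4:1 down-sampling and the 6.8x complexity relative to [5], can be checked with a small Python sketch. It is illustrative only and does not describe the hardware datapath. The 2x2 averaging filter stands in for the low-pass filter, whose exact form is not given here. The 1280x720 frame size and the 128x64 search range attributed to [5] are assumptions, chosen because they reproduce the quoted factor. All function names are hypothetical.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r)
               for cur_row, ref_row in zip(cur, ref)
               for c, r in zip(cur_row, ref_row))


def downsample_4to1(block):
    """4:1 down-sampling: one output sample per 2x2 input group.

    ASSUMPTION: a rounded 2x2 average is used as the low-pass filter;
    the filter used in the paper is not specified in this text.
    """
    h, w = len(block), len(block[0])
    return [[(block[y][x] + block[y][x + 1]
              + block[y + 1][x] + block[y + 1][x + 1] + 2) >> 2
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def abs_diff_count(block):
    """Number of absolute-difference operations for one SAD of this block."""
    return len(block) * len(block[0])


def ime_candidates_per_second(width, height, fps, sr_w, sr_h):
    """Full-search candidates per second: macroblocks x frame rate x search positions."""
    mbs = (width // 16) * ((height + 15) // 16)  # frame height padded to a multiple of 16
    return mbs * fps * sr_w * sr_h


# Deterministic 16x16 macroblocks, for illustration only.
cur = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(16)]
ref = [[(3 * x + 5 * y + 7) % 256 for x in range(16)] for y in range(16)]

full_ops = abs_diff_count(cur)                  # 16 x 16 = 256
sub_ops = abs_diff_count(downsample_4to1(cur))  # 8 x 8 = 64
saving = 1 - sub_ops / full_ops                 # 0.75

sad_full = sad(cur, ref)
sad_sub = sad(downsample_4to1(cur), downsample_4to1(ref))

# ASSUMPTION: [5] is taken as 1280x720 at 30 frames/s with a 128x64 search range.
ratio = (ime_candidates_per_second(1920, 1080, 30, 192, 128)
         / ime_candidates_per_second(1280, 720, 30, 128, 64))

print(f"SAD operations per candidate: {full_ops} -> {sub_ops} ({saving:.0%} saved)")
print(f"Full-search complexity ratio under the stated assumptions: {ratio:.1f}")
```

Under these assumptions, the macroblock count grows from 3600 to 8160 per frame and the number of search positions triples, which together give the factor of 6.8.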