Paper information - INFORMATION-THEORETIC CONTENT SELECTION FOR AUTOMATED HOME VIDEO EDITING

INFORMATION-THEORETIC CONTENT SELECTION FOR AUTOMATED HOME VIDEO EDITING

In automated home video editing, selecting the most informative content from redundant footage is challenging. This paper proposes an information-theoretic approach to content selection that explores the dependence relations between who (characters) and where (scenes) in the video. First, the footage is segmented into basic units, each covering the same characters at the same scene. To compactly represent the dependence relations between scenes and characters, a contingency table is used to model their co-occurrence statistics. Treating which characters appear at which scene as governed by two random variables, an optimal selection criterion based on joint entropy is proposed. To improve computational efficiency, a pruned N-Best heuristic algorithm is presented to search for the most informative video units. Experimental results demonstrate that the proposed approach is flexible and effective for automated content selection.

Personal video authoring has been a major research issue in multimedia content indexing and management. Existing automated home video editing systems are generally composed of three main steps: analysis, selection, and composition (1). Selection, whose goal is to pick out the most informative and important content from long and redundant raw footage, is the primary and most challenging step in releasing end-users from laborious manual editing.

For professionally edited video, numerous attempts have been made over the years to find the most representative content in a large collection. Examples include applying Singular Value Decomposition to refine the feature space and create news report summaries (2); using a temporal graph to model the dynamic evolution of the video stream and produce cartoon highlights (3); and defining a utility model of audio-visual complexity and grammar for film skimming (4).
For unprofessional home video, although Lienhart pointed out early on that good-quality rules include balanced coverage, shortened shots, focused selection, and variable editing patterns (5), further investigations have been carried out more recently. For instance, Hua et al. proposed a non-linear programming algorithm to formulate the problem of attentional highlight selection (1), and Zhao et al. combined audio and visual cues to pick out events involving laughter, applause, and screams as the abstraction results (6).

Selecting the most important and informative content from home video is much harder than from professional video such as sports, news, or movies, because home video suffers from a loose structure and the absence of a storyline. Basically, the selection result is determined by three key issues: i) the selected unit: shots in home video are usually too long and contain a lot of redundancy, while sub-shots still lack a clear and agreed definition (1); ii) the optimized criterion: which content is more important than the rest depends greatly on subjective judgment, and picking out the most informative content still lacks a theoretical solution beyond assessment rules (5); and iii) the implemented strategy: without pair-wise comparison and iterative search, it is hard even for end-users to tell how long the result should be and which content should be included (3).
Through investigation, we found that: i) end-users are generally interested in who is involved in the video and where, so it is acceptable to treat the content about the same characters at the same scene as the selected unit without loss of generality; ii) most people who capture personal video would like the content about every character at every scene to be completely and uniformly included, leaving special requests aside, so it is reasonable to measure the redundancy of information as the optimized criterion; iii) finding the most informative content through pair-wise comparison and iterative search is a combinatorial problem, so it is appropriate to design a heuristic search algorithm that approximates the user's selection behavior.

In this paper, we propose a novel approach to content selection for automated home video editing. The approach proceeds by segmenting the raw footage into scenes taken at the same place, where each scene is further segmented into sub-scenes capturing the same characters. Based on these basic units, we represent the dependence relations between characters and scenes by modeling their co-occurrence statistics with a contingency table. Then, by treating who appears where in the video as governed by two random variables, we apply joint entropy to measure the redundancy of information conditional on a set of video units. Finally, to improve the
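The contingency-table and joint-entropy idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unit format `(characters, scene, duration)`, the use of duration as the co-occurrence weight, and the function names `joint_entropy` and `n_best_select` are all assumptions made for illustration. The search is a generic beam search standing in for the pruned N-Best heuristic, whose actual pruning rules are not given in this excerpt.

```python
import math
from collections import Counter


def joint_entropy(units):
    """Joint entropy H(C, S) in bits of a character/scene contingency table.

    Each unit is a hypothetical (characters, scene, duration) tuple; the
    duration is used as the co-occurrence weight (an assumption).
    """
    table = Counter()
    for characters, scene, duration in units:
        table[(characters, scene)] += duration
    total = sum(table.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in table.values() if n > 0)


def n_best_select(units, k, n_best=3):
    """Pick k units by beam search, keeping the n_best partial selections
    with the highest joint entropy at every step (generic sketch)."""
    beams = [((), 0.0)]
    for _ in range(k):
        candidates = {}
        for chosen, _ in beams:
            for i in range(len(units)):
                if i in chosen:
                    continue
                key = tuple(sorted(chosen + (i,)))
                if key not in candidates:
                    candidates[key] = joint_entropy([units[j] for j in key])
        if not candidates:
            break  # fewer than k units available
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beams = ranked[:n_best]  # prune to the N best partial selections
    best, score = beams[0]
    return list(best), score


# Illustrative footage: two redundant units plus two distinct ones.
units = [
    ("mom", "garden", 10),
    ("mom", "garden", 10),
    ("dad", "kitchen", 10),
    ("kids", "beach", 10),
]
selected, score = n_best_select(units, k=3)
print(selected, round(score, 3))
```

Under these assumptions, a selection that covers the character/scene pairs evenly scores higher than one that repeats the same pair, which matches the stated preference for complete and uniform coverage.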

Jianguo Li | Yimin Zhang

[1] Lie Lu, et al. AVE: automated home video editing, 2003, ACM Multimedia.

[2] Xin Liu, et al. Video summarization using singular value decomposition, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662).