Items of some interest:

These are my recent Pin​board​.in links:

  • Nicholas Rombes: Punk | berfrois

    “Most iron­i­cally, being based in the hope­lessly lost cul­tural void of Ann Arbor, a noto­ri­ous mecca for the last sur­viv­ing rem­nants of the pseudo-​​intellectual street peo­ple move­ment that said much and accom­plished little…”

    punk history-​​is-​​a-​​feature-​​not-​​a-​​bug cultural-​​dynamics ha-​​ha-​​only-​​semiserious
  • [1112.5309] POWERPLAY: Train­ing an Increas­ingly Gen­eral Prob­lem Solver by Con­tin­u­ally Search­ing for the Sim­plest Still Unsolv­able Problem

    An amus­ing col­lec­tion of what seem to be half-​​remembered ideas gleaned from his visit to the GPTP work­shop in Ann Arbor two years ago, pre­sented as his own inven­tions and with­out cita­tion or men­tion of the dozen peo­ple who actu­ally do this work. His keynote, as I remem­ber it, essen­tially revolved around him point­ing out how influ­en­tial his work should have been all along, if only we had both­ered to cite him as we should have done, because he thought up the core con­cepts of genetic pro­gram­ming well before any of us claimed we had. This is pretty much a camel’s-back straw for me. If there is a bet­ter argu­ment for com­pletely boy­cotting the cita­tion sys­tem and rely­ing on per­sonal asso­ci­a­tion and named schools rather than pub­li­ca­tion, I have not yet encoun­tered it. So remem­ber poor oppressed grad­u­ate and post­doc kids: when I cite your work by sim­ply nam­ing you per­son­ally, and not your advi­sor or your insti­tu­tion, and not even your pub­li­ca­tion or jour­nal but merely YOU PERSONALLY, it’s because you per­son­ally deserve the credit, not any of those other leeches. Got that?

    now-​​this-​​really-​​pisses-​​me-​​off-​​to-​​no-​​end
  • [1203.0856] Online Dis­crim­i­na­tive Dic­tio­nary Learn­ing for Image Clas­si­fi­ca­tion Based on Block-​​Coordinate Descent Method

    “Pre­vi­ous researches have demon­strated that the frame­work of dic­tio­nary learn­ing with sparse cod­ing, in which sig­nals are decom­posed as lin­ear com­bi­na­tions of a few atoms of a learned dic­tio­nary, is well adept to recon­struc­tion issues. This frame­work has also been used for dis­crim­i­na­tion tasks such as image clas­si­fi­ca­tion. To achieve bet­ter per­for­mances of clas­si­fi­ca­tion, experts develop sev­eral meth­ods to learn a dis­crim­i­na­tive dic­tio­nary in a super­vised man­ner. How­ever, another issue is that when the data become extremely large in scale, these meth­ods will be no longer effec­tive as they are all batch-​​oriented approaches. For this rea­son, we pro­pose a novel online algo­rithm for dis­crim­i­na­tive dic­tio­nary learn­ing, dubbed textbf{ODDL} in this paper. First, we intro­duce a lin­ear clas­si­fier into the con­ven­tional dic­tio­nary learn­ing for­mu­la­tion and derive a dis­crim­i­na­tive dic­tio­nary learn­ing prob­lem. Then, we exploit an online algo­rithm to solve the derived prob­lem. Unlike the most exist­ing approaches which update dic­tio­nary and clas­si­fier alter­nately via iter­a­tively solv­ing sub-​​problems, our approach directly explores them jointly. Mean­while, it can largely shorten the run­time for train­ing and is also par­tic­u­larly suit­able for large-​​scale clas­si­fi­ca­tion issues. To eval­u­ate the per­for­mance of the pro­posed ODDL approach in image recog­ni­tion, we con­duct some exper­i­ments on three well-​​known bench­marks, and the exper­i­men­tal results demon­strate ODDL is fairly promis­ing for image clas­si­fi­ca­tion tasks.”

    image-​​analysis image-​​segmentation algo­rithms nudge-​​targets
  • [1203.3271] The ther­mo­dy­nam­ics of prediction

    “A sys­tem respond­ing to a sto­chas­tic dri­ving sig­nal can be inter­preted as com­put­ing, by means of its dynam­ics, an (implicit) model of the envi­ron­men­tal vari­ables. The system’s state retains infor­ma­tion about past envi­ron­men­tal fluc­tu­a­tions, and a frac­tion of this infor­ma­tion is pre­dic­tive of future ones. The remain­ing non­pre­dic­tive infor­ma­tion reflects model com­plex­ity that does not improve pre­dic­tive power, and rep­re­sents the inef­fec­tive­ness of the model. We expose the fun­da­men­tal equiv­a­lence between this model inef­fi­ciency and ther­mo­dy­namic inef­fi­ciency, mea­sured by the energy dis­si­pated dur­ing the inter­ac­tion between sys­tem and envi­ron­ment. Our results hold arbi­trar­ily far from ther­mo­dy­namic equi­lib­rium and are applic­a­ble to a wide range of sys­tems, includ­ing bio­mol­e­c­u­lar machines. They high­light a pro­found con­nec­tion between the effec­tive use of infor­ma­tion and effi­cient ther­mo­dy­namic oper­a­tion: any sys­tem con­structed to keep mem­ory about its envi­ron­ment and to oper­ate ener­get­i­cally effi­ciently has to be predictive.”

    mod­el­ing philosophy-​​of-​​science information-​​theory physics ther­mo­dy­nam­ics talking-​​about-​​a-​​model-​​is-​​a-​​model pragmatism-it-ain’t
  • [1203.3434] On the Impact of Infor­ma­tion Tech­nolo­gies on Soci­ety: an His­tor­i­cal Per­spec­tive through the Game of Chess

    “The game of chess as always been viewed as an iconic rep­re­sen­ta­tion of intel­lec­tual prowess. Since the very begin­ning of com­puter sci­ence, the chal­lenge of being able to pro­gram a com­puter capa­ble of play­ing chess and beat­ing humans has been alive and used both as a mark to mea­sure hardware/​software pro­gresses and as an ongo­ing pro­gram­ming chal­lenge lead­ing to numer­ous dis­cov­er­ies. In the early days of com­puter sci­ence it was a topic for spe­cial­ists. But as com­put­ers were democ­ra­tized, and the strength of chess engines began to increase, chess play­ers started to appro­pri­ate to them­selves these new tools. We show how these inter­ac­tions between the world of chess and infor­ma­tion tech­nolo­gies have been her­ald of broader social impacts of infor­ma­tion tech­nolo­gies. The game of chess, and more broadly the world of chess (chess play­ers, lit­er­a­ture, com­puter soft­wares and web­sites ded­i­cated to chess, etc.), turns out to be a sur­pris­ingly and par­tic­u­larly sharp indi­ca­tor of the changes induced in our every­day life by the infor­ma­tion tech­nolo­gies. More­over, in the same way that chess is a mod­eliza­tion of war that cap­tures the raw fea­tures of strate­gic think­ing, chess world can be seen as small soci­ety mak­ing the study of the infor­ma­tion tech­nolo­gies impact eas­ier to ana­lyze and to grasp.”

    touch­stones his­tory algo­rithms history-​​of-​​science computer-​​science
  • Share Books | berfrois

    “Libraries are a recog­ni­tion that schol­ar­ship and cul­ture are more than the busi­ness of cre­at­ing and con­sum­ing. They are a human con­ver­sa­tion, and libraries pro­vide com­mon ground where that con­ver­sa­tion can take place and be remem­bered. By tak­ing aim at the right for the pub­lic to main­tain this con­ver­sa­tion and its mem­ory, pub­lish­ers have shown us what we have to lose. It’s time we resisted the out­sourc­ing of our com­mon her­itage by occu­py­ing the library.”

    Occupy libraries intellectual-​​property open-​​access public-​​policy activism
  • [1112.3307] Poly­tope Codes Against Adver­saries in Networks

    “Net­work cod­ing is stud­ied when an adver­sary con­trols a sub­set of nodes in the net­work of lim­ited quan­tity but unknown loca­tion. This prob­lem is shown to be more dif­fi­cult than when the adver­sary con­trols a given num­ber of edges in the net­work, in that lin­ear codes are insuf­fi­cient. To solve the node prob­lem, the class of Poly­tope Codes is intro­duced. Poly­tope Codes are con­stant com­po­si­tion codes oper­at­ing over bounded poly­topes in inte­ger vec­tor fields. The poly­tope struc­ture cre­ates addi­tional com­plex­ity, but it induces prop­er­ties on mar­ginal dis­tri­b­u­tions of code vec­tors so that validi­ties of code­words can be checked by inter­nal nodes of the net­work. It is shown that Poly­tope Codes achieve a cut-​​set bound for a class of pla­nar net­works. It is also shown that this cut-​​set bound is not always tight, and a tighter bound is given for an exam­ple network.”

    cryp­tog­ra­phy pri­vacy algo­rithms nudge-​​targets network-​​theory com­mu­ni­ca­tion
  • [1203.3353] Solv­ing Struc­ture with Sparse, Randomly-​​Oriented X-​​ray Data

    “Single-​​particle imag­ing exper­i­ments of bio­mol­e­cules at x-​​ray free-​​electron lasers (XFELs) require pro­cess­ing of hun­dreds of thou­sands (or more) of images that con­tain very few x-​​rays. Each low-​​flux image of the dif­frac­tion pat­tern is pro­duced by a sin­gle, ran­domly ori­ented par­ti­cle, such as a pro­tein. We demon­strate the fea­si­bil­ity of col­lect­ing data at these extremes, aver­ag­ing only 2.5 pho­tons per frame, where it seems doubt­ful there could be infor­ma­tion about the state of rota­tion, let alone the image con­trast. This is accom­plished with an expec­ta­tion max­i­miza­tion algo­rithm that processes the low-​​flux data in aggre­gate, and with­out any prior knowl­edge of the object or its ori­en­ta­tion. The ver­sa­til­ity of the method promises, more gen­er­ally, to rede­fine what mea­sure­ment sce­nar­ios can pro­vide use­ful sig­nal in the high-​​noise regime.”

    structural-​​biology image-​​analysis crys­tal­log­ra­phy algo­rithms inverse-​​problems nudge-​​targets sta­tis­tics
  • [1203.3203] An effi­cient algo­rithm for gen­er­at­ing AoA networks

    “The activ­i­ties, in project sched­ul­ing, can be rep­re­sented graph­i­cally in two dif­fer­ent ways, by either assign­ing the activ­i­ties to the nodes ‘AoN’ directed acyclic graph (dag) or to the arcs ‘AoA dag’. In this paper, a new algo­rithm is pro­posed for gen­er­at­ing, for a given project sched­ul­ing prob­lem, an Activity-​​on-​​Arc dag start­ing from the Activity-​​on-​​Node dag using the con­cepts of line graphs of graphs.”

    sched­ul­ing operations-​​research algo­rithms graph-​​theory
  • [1203.3341] A Com­par­i­son of Multi-​​Parametric Pro­gram­ming, Mixed-​​Integer Pro­gram­ming, Gra­di­ent Descent Based, and the Embed­ding Approach on Four Pub­lished Hybrid Opti­mal Con­trol Examples

    “…Com­mon mis­con­cep­tions regard­ing the embed­ding approach are addressed includ­ing whether or not it results in an aver­age value con­trol model (no), is nec­es­sary to “tweak” the algo­rithm to get bang-​​bang solu­tions (no), requires infi­nite switch­ing (no), has real-​​time capa­bil­ity (yes), or reduc­tion to a clas­si­cal non­lin­ear opti­miza­tion prob­lem (a desir­able yes).”

    control-​​theory operations-​​research algo­rithms numerical-​​methods philosophy-​​of-​​engineering design-​​patterns nudge-​​targets
  • [1203.3270] Extrac­tion of Facial Fea­ture Points Using Cumu­la­tive Histogram

    “This paper pro­poses a novel adap­tive algo­rithm to extract facial fea­ture points auto­mat­i­cally such as eye­brows cor­ners, eyes cor­ners, nos­trils, nose tip, and mouth cor­ners in frontal view faces, which is based on cumu­la­tive his­togram approach by vary­ing dif­fer­ent thresh­old val­ues. At first, the method adopts the Viola-​​Jones face detec­tor to detect the loca­tion of face and also crops the face region in an image. From the con­cept of the human face struc­ture, the six rel­e­vant regions such as right eye­brow, left eye­brow, right eye, left eye, nose, and mouth areas are cropped in a face image. Then the his­togram of each cropped rel­e­vant region is com­puted and its cumu­la­tive his­togram value is employed by vary­ing dif­fer­ent thresh­old val­ues to cre­ate a new fil­ter­ing image in an adap­tive way. The con­nected com­po­nent of inter­ested area for each rel­e­vant fil­ter­ing image is indi­cated our respec­tive fea­ture region. A sim­ple lin­ear search algo­rithm for eye­brows, eyes and mouth fil­ter­ing images and con­tour algo­rithm for nose fil­ter­ing image are applied to extract our desired cor­ner points auto­mat­i­cally. The method was tested on a large BioID frontal face data­base in dif­fer­ent illu­mi­na­tions, expres­sions and light­ing con­di­tions and the exper­i­men­tal results have achieved aver­age suc­cess rates of 95.27%.”

    image-​​segmentation image-​​analysis face-​​recognition algo­rithms nudge-​​targets
  • [1203.3284] Effi­cient Enu­mer­a­tion of the Directed Binary Per­fect Phy­lo­ge­nies from Incom­plete Data

    “We study a character-​​based phy­logeny recon­struc­tion prob­lem when an incom­plete set of data is given. More specif­i­cally, we con­sider the sit­u­a­tion under the directed per­fect phy­logeny assump­tion with binary char­ac­ters in which for some species the states of some char­ac­ters are miss­ing. Our main object is to give an effi­cient algo­rithm to enu­mer­ate (or list) all per­fect phy­lo­ge­nies that can be obtained when the miss­ing entries are com­pleted. While a sim­ple branch-​​and-​​bound algo­rithm (B&B) shows a the­o­ret­i­cally good per­for­mance, we pro­pose another approach based on a zero-​​suppressed binary deci­sion dia­gram (ZDD). Exper­i­men­tal results on ran­domly gen­er­ated data exhibit that the ZDD approach out­per­forms B&B. We also prove that count­ing the num­ber of phy­lo­ge­netic trees con­sis­tent with a given data is #P-​​complete, thus pro­vid­ing an evi­dence that an effi­cient ran­dom sam­pling seems hard.”

    phy­lo­ge­net­ics inverse-​​problems genet­ics algo­rithms sta­tis­tics nudge-​​targets
  • [1203.0879] Design­ing and using prior knowl­edge for phase retrieval

    “In this work we develop an algo­rithm for sig­nal recon­struc­tion from the mag­ni­tude of its Fourier trans­form in a sit­u­a­tion where some (non-​​zero) parts of the sought sig­nal are known. Although our method does not assume that the known part com­prises the bound­ary of the sought sig­nal, this is often the case in microscopy: a spec­i­men is placed inside a known mask, which can be thought of as a known light source that sur­rounds the unknown sig­nal. There­fore, in the past, sev­eral algo­rithms were sug­gested that solve the phase retrieval prob­lem assum­ing known bound­ary val­ues. Unlike our method, these meth­ods do rely on the fact that the known part is on the bound­ary. Besides the recon­struc­tion method we give an expla­na­tion of the phe­nom­ena observed in pre­vi­ous work: the recon­struc­tion is much faster when there is more energy con­cen­trated in the known part. Quite sur­pris­ingly, this can be explained using our pre­vi­ous results on phase retrieval with approx­i­mately known Fourier phase.”

    image-​​analysis image-​​processing learn­ing inverse-​​problems algo­rithms nudge-​​targets
  • [1203.3415] A New Approach to Count Pat­tern Motifs Using Com­bi­na­to­r­ial Techniches

    “We pro­posed two new exact algo­rithms to detect net­work motifs of size 3 and 4. Con­sid­er­ing that motifs need to count the iso­mor­phic pat­terns in the orig­i­nal graph $G(V,E)$ and in a set of ran­dom­ized graphs, the fol­low­ing com­plex­i­ties con­cern about count iso­mor­phic pat­terns in a sin­gle graph. Let $m=|E|$ and let $a(G)$ be the arboric­ity of $G$. Assume $|E|geq|V|$. We describe a $O(a(G)m)$ time com­plex­ity algo­rithm to count iso­mor­phic pat­terns of size 3. The com­plex­ity is a $O({msqrt{m}})$ in the worst graph. The sec­ond algo­rithm is a $O(m^2)$ com­plex­ity algo­rithm to count iso­mor­phic pat­terns of size 4. The final result was expres­sive faster when com­pared with other imple­mented algorithms.”

    network-​​theory graph-​​theory algo­rithms nudge-​​targets

Items of some interest:

These are my recent Pin​board​.in links:

Items of some interest:

These are my recent Pin​board​.in links:

  • Edge Per­spec­tives with John Hagel: Finite and Infi­nite Games — Which Game Shall We Play in the New Year?

    Far bet­ter, if pos­si­ble, to avoid direct con­fronta­tion and find ways to pur­sue infi­nite game play on the mar­gins or edges of finite game insti­tu­tions or in the white spaces not yet occu­pied by finite game insti­tu­tions.  By draw­ing atten­tion to hori­zons that have not yet been explored and demon­strat­ing the abil­ity to make progress in draw­ing out more poten­tial and pos­si­bil­ity, infi­nite game play­ers have a greater chance of shift­ing the game and attract­ing other play­ers. By build­ing par­al­lel insti­tu­tions and prac­tices that pull oth­ers into their game, infi­nite game play­ers can attract enough crit­i­cal mass so that they can pur­sue their quests with lower risk of inter­ven­tion from the finite game play­ers who view such actions as deeply sub­ver­sive.  At our research cen­ter, JSB and I are now explor­ing these kinds of approaches as a way of achiev­ing orga­ni­za­tional change within large institutions.

    what-​​I-​​do
  • [1107.0056] Fixed para­me­ter algo­rithms for restricted col­or­ing problems

    In this paper, we obtain poly­no­mial time algo­rithms to deter­mine the acyclic chro­matic num­ber, the star chro­matic num­ber, the Thue chro­matic num­ber, the har­mo­nious chro­matic num­ber and the clique chro­matic num­ber of $P_4$-tidy graphs and $(q,q-4)$-graphs, for every fixed $q$. These classes include cographs, $P_4$-sparse and $P_4$-lite graphs. All these col­or­ing prob­lems are known to be NP-​​hard for gen­eral graphs. These algo­rithms are fixed para­me­ter tractable on the para­me­ter $q(G)$, which is the min­i­mum $q$ such that $G$ is a $(q,q-4)$-graph. We also prove that every con­nected $(q,q-4)$-graph with at least $q$ ver­tices is 2-​​clique-​​colorable and that every acyclic col­or­ing of a cograph is also nonrepetitive.

    algo­rithms graph-​​theory discrete-​​mathematics nudge-​​targets
  • [1112.6045] Com­par­ing inter­mit­tency and net­work mea­sure­ments of words and their depen­dency on authorship

    Many fea­tures from texts and lan­guages can now be inferred from sta­tis­ti­cal analy­ses using con­cepts from com­plex net­works and dynam­i­cal sys­tems. In this paper we quan­tify how topo­log­i­cal prop­er­ties of word co-​​occurrence net­works and inter­mit­tency (or bursti­ness) in word dis­tri­b­u­tion depend on the style of authors. Our data­base con­tains 40 books from 8 authors who lived in the 19th and 20th cen­turies, for which the fol­low­ing net­work mea­sure­ments were obtained: clus­ter­ing coef­fi­cient, aver­age short­est path lengths, and between­ness. We found that the two fac­tors with stronger depen­dency on the authors were the skew­ness in the dis­tri­b­u­tion of word inter­mit­tency and the aver­age short­est paths. Other fac­tors such as the betwee­ness and the Zipf’s law expo­nent show only weak depen­dency on author­ship. Also assessed was the con­tri­bu­tion from each mea­sure­ment to author­ship recog­ni­tion using three machine learn­ing meth­ods. The best per­for­mance was a ca. 65 % accu­racy upon com­bin­ing com­plex net­work and inter­mit­tency fea­tures with the near­est neigh­bor algo­rithm. From a detailed analy­sis of the inter­de­pen­dence of the var­i­ous met­rics it is con­cluded that the meth­ods used here are com­ple­men­tary for pro­vid­ing short– and long-​​scale per­spec­tives of texts, which are use­ful for appli­ca­tions such as iden­ti­fi­ca­tion of top­i­cal words and infor­ma­tion retrieval.

    natural-​​language-​​processing document-​​clustering clus­ter­ing feature-​​selection algo­rithms nudge-​​targets
  • [1108.1170] Con­vex Opti­miza­tion with­out Pro­jec­tion Steps

    For the gen­eral prob­lem of min­i­miz­ing a con­vex func­tion over a com­pact con­vex domain, we will inves­ti­gate a sim­ple iter­a­tive approx­i­ma­tion algo­rithm based on the method by Frank & Wolfe 1956, that does not need pro­jec­tion steps in order to stay inside the opti­miza­tion domain. Instead of a pro­jec­tion step, the lin­earized prob­lem defined by a cur­rent sub­gra­di­ent is solved, which gives a step direc­tion that will nat­u­rally stay in the domain. Our frame­work gen­er­al­izes the sparse greedy algo­rithm of Frank & Wolfe and its primal-​​dual analy­sis by Clark­son 2010 (and the low-​​rank SDP approach by Hazan 2008) to arbi­trary con­vex domains. We give a con­ver­gence proof guar­an­tee­ing {epsilon}-small dual­ity gap after O(1/{epsilon}) iter­a­tions. The method allows us to under­stand the spar­sity of approx­i­mate solu­tions for any l1-​​regularized con­vex opti­miza­tion prob­lem (and for opti­miza­tion over the sim­plex), expressed as a func­tion of the approx­i­ma­tion qual­ity. We obtain match­ing upper and lower bounds of {Theta}(1/{epsilon}) for the spar­sity for l1-​​problems. The same bounds apply to low-​​rank semi­def­i­nite opti­miza­tion with bounded trace, show­ing that rank O(1/{epsilon}) is best pos­si­ble here as well. As another appli­ca­tion, we obtain sparse matri­ces of O(1/{epsilon}) non-​​zero entries as {epsilon}-approximate solu­tions when opti­miz­ing any con­vex func­tion over a class of diag­o­nally dom­i­nant sym­met­ric matri­ces. We show that our pro­posed first-​​order method also applies to nuclear norm and max-​​norm matrix opti­miza­tion prob­lems. For nuclear norm reg­u­lar­ized opti­miza­tion, such as matrix com­ple­tion and low-​​rank recov­ery, we demon­strate the prac­ti­cal effi­ciency and scal­a­bil­ity of our algo­rithm for large matrix prob­lems, as e.g. the Net­flix dataset. For gen­eral con­vex opti­miza­tion over bounded matrix max-​​norm, our algo­rithm is the first with a con­ver­gence guar­an­tee, to the best of our knowledge.

    operations-​​research opti­miza­tion algo­rithms nudge-​​targets
  • [1112.6235] Detect­ing a Vec­tor Based on Lin­ear Measurements

    We con­sider a sit­u­a­tion where the state of a sys­tem is rep­re­sented by a real-​​valued vec­tor. Under nor­mal cir­cum­stances, the vec­tor is zero, while an event man­i­fests as non-​​zero entries in this vec­tor, pos­si­bly few. Our inter­est is in the design of algo­rithms that can reli­ably detect events (i.e., test whether the vec­tor is zero or not) with the least amount of infor­ma­tion. We place our­selves in a sit­u­a­tion, now com­mon in the sig­nal pro­cess­ing lit­er­a­ture, where infor­ma­tion about the vec­tor comes in the form of noisy lin­ear mea­sure­ments. We derive infor­ma­tion bounds in an active learn­ing setup and exhibit some sim­ple near-​​optimal algo­rithms. In par­tic­u­lar, our results show that the task of detec­tion within this set­ting is at once much eas­ier, sim­pler and dif­fer­ent than the tasks of esti­ma­tion and sup­port recovery.

    signal-​​processing sta­tis­tics algo­rithms nudge-​​targets
  • [1109.2215] Find­ing miss­ing edges and com­mu­ni­ties in incom­plete networks

    Many algo­rithms have been pro­posed for pre­dict­ing miss­ing edges in net­works, but they do not usu­ally take account of which edges are miss­ing. We focus on net­works which have miss­ing edges of the form that is likely to occur in real net­works, and com­pare algo­rithms that find these miss­ing edges. We also inves­ti­gate the effect of this kind of miss­ing data on com­mu­nity detec­tion algorithms.

    network-​​theory algo­rithms infer­ence sta­tis­tics nudge-​​targets
  • [1010.4735] Explor­ing the Energy Land­scapes of Pro­tein Fold­ing Sim­u­la­tions with Bayesian Computation

    Nested sam­pling is a Bayesian sam­pling tech­nique devel­oped to explore prob­a­bil­ity dis­tri­b­u­tions lo– calised in an expo­nen­tially small area of the para­me­ter space. The algo­rithm pro­vides both pos­te­rior sam­ples and an esti­mate of the evi­dence (mar­ginal like­li­hood) of the model. The nested sam­pling algo– rithm also pro­vides an effi­cient way to cal­cu­late free ener­gies and the expec­ta­tion value of ther­mo­dy­namic observ­ables at any tem­per­a­ture, through a sim­ple post-​​processing of the out­put. Pre­vi­ous appli­ca­tions of the algo­rithm have yielded large effi­ciency gains over other sam­pling tech­niques, includ­ing par­al­lel tem­per­ing (replica exchange). In this paper we describe a par­al­lel imple­men­ta­tion of the nested sam­pling algo­rithm and its appli­ca­tion to the prob­lem of pro­tein fold­ing in a Go-​​type force field of empir­i­cal poten­tials that were designed to sta­bi­lize sec­ondary struc­ture ele­ments in room-​​temperature sim­u­la­tions. We demon­strate the method by con­duct­ing fold­ing sim­u­la­tions on a num­ber of small pro­teins which are com­monly used for test­ing pro­tein fold­ing pro­ce­dures: pro­tein G, the SH3 domain of Src tyro­sine kinase and chy­motrypsin inhibitor 2. A topo­log­i­cal analy­sis of the pos­te­rior sam­ples is per­formed to pro­duce energy land­scape charts, which give a high level descrip­tion of the poten­tial energy sur­face for the pro­tein fold­ing sim­u­la­tions. These charts pro­vide qual­i­ta­tive insights into both the fold­ing process and the nature of the model and force field used.

    structural-​​biology bio­chem­istry mod­el­ing algo­rithms sta­tis­tics meta­mod­el­ing
  • [1109.2618] Fast and Accu­rate Mod­el­ing of Mol­e­c­u­lar Atom­iza­tion Ener­gies with Machine Learning

    We intro­duce a machine learn­ing model to pre­dict atom­iza­tion ener­gies of a diverse set of organic mol­e­cules, based on nuclear charges and atomic posi­tions only. The prob­lem of solv­ing the mol­e­c­u­lar Schr“odinger equa­tion is mapped onto a non-​​linear sta­tis­ti­cal regres­sion prob­lem of reduced com­plex­ity. Regres­sion mod­els are trained on and com­pared to atom­iza­tion ener­gies com­puted with hybrid density-​​functional the­ory. Cross-​​validation over more than seven thou­sand small organic mol­e­cules yields a mean absolute error of ~10 kcal/​mol. Applic­a­bil­ity is demon­strated for the pre­dic­tion of mol­e­c­u­lar atom­iza­tion poten­tial energy curves.

    machine-​​learning learning-​​from-​​data bio­chem­istry computational-​​science nudge-​​targets
  • [1101.2135] Bounded con­fi­dence model: addressed infor­ma­tion main­tain diver­sity of opinions

    A com­mu­nity of agents is sub­ject to a stream of mes­sages, which are rep­re­sented as points on a plane of issues. Mes­sages are sent by media and by agents them­selves. Mes­sages from media shape the pub­lic opin­ion. They are unbi­ased, i.e. pos­i­tive and neg­a­tive opin­ions on a given issue appear with equal fre­quen­cies. In our pre­vi­ous work, the only cri­te­rion to receive a mes­sage by an agent is if the dis­tance between this mes­sage and the ones received ear­lier does not exceed the given value of the tol­er­ance para­me­ter. Here we intro­duce a pos­si­bil­ity to address a mes­sage to a given neigh­bour. We show that this option reduces the una­nim­ity effect, what improves the col­lec­tive performance.

    agent-​​based com­mu­ni­ca­tion network-​​theory machine-​​learning diver­sity

Items of some interest…

These are my recent Pin​board​.in links:

  • [1110.0585] Dis­crim­i­nately Decreas­ing Dis­crim­inabil­ity with Learned Image Filters

    “In machine learn­ing and com­puter vision, input images are often fil­tered to increase data dis­crim­inabil­ity. In some sit­u­a­tions, how­ever, one may wish to pur­posely decrease dis­crim­inabil­ity of one clas­si­fi­ca­tion task (a “dis­trac­tor” task), while simul­ta­ne­ously pre­serv­ing infor­ma­tion rel­e­vant to another (the task-​​of-​​interest): For exam­ple, it may be impor­tant to mask the iden­tity of per­sons con­tained in face images before sub­mit­ting them to a crowd­sourc­ing site (e.g., Mechan­i­cal Turk) when label­ing them for cer­tain facial attrib­utes. Another exam­ple is inter-​​dataset gen­er­al­iza­tion: when train­ing on a dataset with a par­tic­u­lar covari­ance struc­ture among mul­ti­ple attrib­utes, it may be use­ful to sup­press one attribute while pre­serv­ing another so that a trained clas­si­fier does not learn spu­ri­ous cor­re­la­tions between attrib­utes. In this paper we present an algo­rithm that finds opti­mal fil­ters to give high dis­crim­inabil­ity to one task while simul­ta­ne­ously giv­ing low dis­crim­inabil­ity to a dis­trac­tor task. We present results show­ing the effec­tive­ness of the pro­posed tech­nique on both sim­u­lated data and nat­ural face images.”

    machine-​​learning data-​​preparation fil­ter­ing algo­rithms nudge-​​targets
  • [1110.5183] Dif­fu­sion of Infor­ma­tion in Robot Swarms

    “This work is devoted to com­mu­ni­ca­tion approaches, which spread infor­ma­tion in robot swarms. These mech­a­nisms are use­ful for large-​​scale sys­tems and also for such cases when a lim­ited com­mu­ni­ca­tion equip­ment does not allow rout­ing of infor­ma­tion pack­ages. We focus on two approaches such as vir­tual fields and epi­demic algo­rithms, dis­cuss sev­eral aspects of hard­ware imple­men­ta­tion and demon­strate exper­i­ments per­formed with micro­ro­bots “Jasmine”.”

    agent-​​based swarms com­mu­ni­ca­tion complex-​​systems epi­demi­ol­ogy dynamical-​​systems exper­i­ment
  • [1109.5229] Dis­trib­uted Algo­rithms for Opti­mal Power Flow Problem

    “Opti­mal power flow (OPF) is an impor­tant prob­lem for power gen­er­a­tion and it is in gen­eral non-​​convex. With the employ­ment of renew­able energy, it will be desir­able if OPF can be solved very effi­ciently so its solu­tion can be used in real time. With some spe­cial net­work struc­ture, e.g. trees, the prob­lem has been shown to have a zero dual­ity gap and the con­vex dual prob­lem yields the opti­mal solu­tion. In this paper, we pro­pose a pri­mal and a dual algo­rithm to coor­di­nate the smaller sub­prob­lems decom­posed from the con­vex­i­fied OPF. We can arrange the sub­prob­lems to be solved sequen­tially and cumu­la­tively in a cen­tral node or solved in par­al­lel in dis­trib­uted nodes. We test the algo­rithms on IEEE radial dis­tri­b­u­tion test feed­ers, some ran­dom tree-​​structured net­works, and the IEEE trans­mis­sion sys­tem bench­marks. Sim­u­la­tion results show that the com­pu­ta­tion time can be improved dra­mat­i­cally with our algo­rithms over the cen­tral­ized approach of solv­ing the prob­lem with­out decom­po­si­tion, espe­cially in tree-​​structured prob­lems. The com­pu­ta­tion time grows lin­early with the prob­lem size with the cumu­la­tive approach while the dis­trib­uted one can have size-​​independent com­pu­ta­tion time.”

    operations-​​research algo­rithms network-​​theory infra­struc­ture com­po­si­tion nudge-​​targets
  • Coupon collector’s prob­lem — Wikipedia, the free encyclopedia

    “In prob­a­bil­ity the­ory, the coupon collector’s prob­lem describes the “col­lect all coupons and win” con­tests. It asks the fol­low­ing ques­tion: Sup­pose that there are n coupons, from which coupons are being col­lected with replace­ment. What is the prob­a­bil­ity that more than t sam­ple tri­als are needed to col­lect all n coupons? An alter­na­tive state­ment is: Given n coupons, how many coupons do you expect you need to draw with replace­ment before hav­ing drawn each coupon at least once. The math­e­mat­i­cal analy­sis of the prob­lem reveals that the expected num­ber of tri­als needed grows as Θ(nlog(n)). For exam­ple, when n = 50 it takes about 225 tri­als to col­lect all 50 coupons.”

    probability-​​theory slic-​​representation nudge-​​targets
  • [1107.2379] Data Sta­bil­ity in Clus­ter­ing: A Closer Look

    “This paper con­sid­ers the model intro­duced by Bilu and Linial (2010), who study prob­lems for which the opti­mal clus­ter­ing does not change when the dis­tances are per­turbed by mul­ti­plica­tive fac­tors. They show that even when a prob­lem is NP-​​hard, it is some­times pos­si­ble to obtain polynomial-​​time algo­rithms for instances resilient to large per­tur­ba­tions, e.g. on the order of $O(sqrt{n})$ for max-​​cut clus­ter­ing. Awasthi et al. (2010) extend this line of work by con­sid­er­ing center-​​based objec­tives, and Bal­can and Liang (2011) con­sider the $k$-median and min-​​sum objec­tives, giv­ing effi­cient algo­rithms for instances resilient to cer­tain con­stant mul­ti­plica­tive per­tur­ba­tions. Here, we are moti­vated by the ques­tion of to what extent these assump­tions can be relaxed while allow­ing for effi­cient algo­rithms. We show there is lit­tle room to improve these results by giv­ing NP-​​hardness lower bounds for both the $k$-median and min-​​sum objec­tives. On the other hand, we show that mul­ti­plica­tive resilience para­me­ters, even only on the order of $Theta(1)$, can be so strong as to make the clus­ter­ing prob­lem triv­ial, and we exploit these assump­tions to present a sim­ple one pass stream­ing algo­rithm for the $k$-median objec­tive. We also con­sider a model of addi­tive per­tur­ba­tions and give a cor­re­spon­dence between addi­tive and mul­ti­plica­tive notions of sta­bil­ity. Our results pro­vide a close exam­i­na­tion of the con­se­quences of assum­ing, even con­stant, sta­bil­ity in data.”

    clus­ter­ing sta­tis­tics algo­rithms robust­ness nudge-​​targets
  • [1109.5641] Energy-​​Aware Load Bal­anc­ing in Con­tent Deliv­ery Networks

    “Internet-​​scale dis­trib­uted sys­tems such as con­tent deliv­ery net­works (CDNs) oper­ate hun­dreds of thou­sands of servers deployed in thou­sands of data cen­ter loca­tions around the globe. Since the energy costs of oper­at­ing such a large IT infra­struc­ture are a sig­nif­i­cant frac­tion of the total oper­at­ing costs, we argue for redesign­ing CDNs to incor­po­rate energy opti­miza­tions as a first-​​order prin­ci­ple. We pro­pose tech­niques to turn off CDN servers dur­ing peri­ods of low load while seek­ing to bal­ance three key design goals: max­i­mize energy reduc­tion, min­i­mize the impact on client-​​perceived ser­vice avail­abil­ity (SLAs), and limit the fre­quency of on-​​off server tran­si­tions to reduce wear-​​and-​​tear and its impact on hard­ware reli­a­bil­ity. We pro­pose an opti­mal offline algo­rithm and an online algo­rithm to extract energy sav­ings both at the level of local load bal­anc­ing within a data cen­ter and global load bal­anc­ing across data cen­ters. We eval­u­ate our algo­rithms using real pro­duc­tion work­load traces from a large com­mer­cial CDN. Our results show that it is pos­si­ble to reduce the energy con­sump­tion of a CDN by more than 55% while ensur­ing a high level of avail­abil­ity that meets cus­tomer SLA require­ments and incur­ring an aver­age of one on-​​off tran­si­tion per server per day. Fur­ther, we show that keep­ing even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with lit­tle impact on avail­abil­ity SLAs. Finally, we show that redis­trib­ut­ing load across prox­i­mal data cen­ters can enhance ser­vice avail­abil­ity sig­nif­i­cantly, but has only a mod­est impact on energy savings.”

    algo­rithms load-​​balancing infra­struc­ture nudge-​​targets operations-​​research
  • [1110.0264] Face Recog­ni­tion using Opti­mal Rep­re­sen­ta­tion Ensemble

    “Recently, the face rec­og­niz­ers based on lin­ear rep­re­sen­ta­tions have been shown to deliver state-​​of-​​the-​​art per­for­mance. In real-​​world appli­ca­tions, how­ever, face images usu­ally suf­fer from expres­sions, dis­guises and ran­dom occlu­sions. The prob­lem­atic facial parts under­mine the valid­ity of the linear-​​subspace assump­tion and thus the recog­ni­tion per­for­mance dete­ri­o­rates sig­nif­i­cantly. In this work, we address the prob­lem in a learning-​​inference-​​mixed fash­ion. By observ­ing that the linear-​​subspace assump­tion is more reli­able on cer­tain face patches rather than on the holis­tic face, some Bayesian Patch Rep­re­sen­ta­tions (BPRs) are ran­domly gen­er­ated and inter­preted accord­ing to the Bayes’ the­ory. We then train an ensem­ble model over the patch-​​representations by min­i­miz­ing the empir­i­cal risk w.r.t the “leave-​​one-​​out mar­gins”. The obtained model is termed Opti­mal Rep­re­sen­ta­tion Ensem­ble (ORE), since it guar­an­tees the opti­mal­ity from the per­spec­tive of Empir­i­cal Risk Min­i­miza­tion. To han­dle the unknown pat­terns in test faces, a robust ver­sion of BPR is pro­posed by tak­ing the non-​​face cat­e­gory into con­sid­er­a­tion. Equipped with the Robust-​​BPRs, the infer­ence abil­ity of ORE is increased dra­mat­i­cally and sev­eral record-​​breaking accu­ra­cies (99.9% on Yale-​​B and 99.5% on AR) and desir­able effi­cien­cies (below 20 ms per face in Mat­lab) are achieved. It also over­whelms other mod­u­lar heuris­tics on the faces with ran­dom occlu­sions, extreme expres­sions and dis­guises. Fur­ther­more, to accom­mo­date immense BPRs sets, a boosting-​​like algo­rithm is also derived. The boosted model, a.k.a Boosted-​​ORE, obtains sim­i­lar per­for­mance to its pro­to­type. Besides the empir­i­cal supe­ri­or­i­ties, two desir­able fea­tures of the pro­posed meth­ods, namely, the training-​​determined model-​​selection and the data-​​weight-​​free boost­ing pro­ce­dure, are also the­o­ret­i­cally verified.”

    image-​​analysis face-​​recognition algo­rithms nudge-​​targets
  • [1110.0957] Dic­tio­nary Learn­ing for Deblur­ring and Dig­i­tal Zoom

    “This paper pro­poses a novel approach to image deblur­ring and dig­i­tal zoom­ing using sparse local mod­els of image appear­ance. These mod­els, where small image patches are rep­re­sented as lin­ear com­bi­na­tions of a few ele­ments drawn from some large set (dic­tio­nary) of can­di­dates, have proven well adapted to sev­eral image restora­tion tasks. A key to their suc­cess has been to learn dic­tio­nar­ies adapted to the recon­struc­tion of small image patches. In con­trast, recent works have pro­posed instead to learn dic­tio­nar­ies which are not only adapted to data recon­struc­tion, but also tuned for a spe­cific task. We intro­duce here such an approach to deblur­ring and dig­i­tal zoom, using pairs of blurry/​sharp (or low-​​/​high-​​resolution) images for train­ing, as well as an effec­tive sto­chas­tic gra­di­ent algo­rithm for solv­ing the cor­re­spond­ing opti­miza­tion task. Although this learn­ing prob­lem is not con­vex, once the dic­tio­nar­ies have been learned, the sharp/​high-​​resolution image can be recov­ered via con­vex opti­miza­tion at test time. Exper­i­ments with syn­thetic and real data demon­strate the effec­tive­ness of the pro­posed approach, lead­ing to state-​​of-​​the-​​art per­for­mance for non-​​blind image deblur­ring and dig­i­tal zoom.”

    image-​​processing image-​​analysis algo­rithms nudge-​​targets hyper­res­o­lu­tion
  • [1110.0477] Dis­trib­uted Evo­lu­tion­ary Graph Partitioning

    “We present a novel dis­trib­uted evo­lu­tion­ary algo­rithm, KaFF­PaE, to solve the Graph Par­ti­tion­ing Prob­lem, which makes use of KaFFPa (Karl­sruhe Fast Flow Par­ti­tioner). The use of our mul­ti­level graph par­ti­tioner KaFFPa pro­vides new effec­tive crossover and muta­tion oper­a­tors. By com­bin­ing these with a scal­able com­mu­ni­ca­tion pro­to­col we obtain a sys­tem that is able to improve the best known par­ti­tion­ing results for many inputs in a very short amount of time. For exam­ple, in Walshaw’s well known bench­mark tables we are able to improve or recom­pute 76% of entries for the tables with 1%, 3% and 5% imbalance.”

    algo­rithms graph-​​theory evolutionary-​​algorithms nudge-​​targets
  • [1104.1605] Effi­cient Top-​​K Retrieval in Online Social Tag­ging Networks

    “We con­sider in this paper top-​​k query answer­ing in social tag­ging sys­tems, also known as folk­sonomies. This prob­lem requires a sig­nif­i­cant depar­ture from exist­ing, socially agnos­tic tech­niques. In a network-​​aware con­text, one can (and should) exploit the social links, which can indi­cate how users relate to the seeker and how much weight their tag­ging actions should have in the result build-​​up. We pro­pose an algo­rithm that has the poten­tial to scale to cur­rent appli­ca­tions. While the prob­lem has already been con­sid­ered in pre­vi­ous lit­er­a­ture, this was done either under strong sim­pli­fy­ing assump­tions or under choices that can­not scale to even moderate-​​size real world appli­ca­tions. We first con­sider a key aspect of the prob­lem, which is access­ing the clos­est or most rel­e­vant users for a given seeker. We describe how this can be done on the fly (with­out any pre-​​computations) for sev­eral pos­si­ble choices — arguably the most nat­ural ones — of prox­im­ity com­pu­ta­tion in a user net­work. Based on this, our top-​​k algo­rithm is sound and com­plete, while address­ing the scal­a­bil­ity issues of the exist­ing ones. Impor­tantly, our tech­nique is instance opti­mal in the case when the search relies exclu­sively on the social weight of tag­ging actions. To fur­ther reduce response times, we then con­sider direc­tions for effi­ciency by approx­i­ma­tion. Exten­sive exper­i­ments on real world data show that our tech­niques can dras­ti­cally improve the response time, with­out sac­ri­fic­ing precision.”

    folk­son­omy tag­ging network-​​theory search-​​algorithms nudge-​​targets data-​​access

Items of some interest…

These are my recent Pin​board​.in links:

  • [1110.1462] Dynamic Clus­ter­ing of His­togram Data Based on Adap­tive Squared Wasser­stein Distances

    “…To clus­ter sets of his­togram data, we pro­pose to use Dynamic Clus­ter­ing Algo­rithm, (based on adap­tive squared Wasser­stein dis­tances) that is a k-​​means-​​like algo­rithm for clus­ter­ing a set of indi­vid­u­als into K classes that are apri­ori fixed. The main aim of this research is to pro­vide a tool for clus­ter­ing his­tograms, empha­siz­ing the dif­fer­ent con­tri­bu­tions of the his­togram vari­ables, and their com­po­nents, to the def­i­n­i­tion of the clus­ters. We demon­strate that this can be achieved using adap­tive dis­tances. Two kind of adap­tive dis­tances are con­sid­ered: the first takes into account the vari­abil­ity of each com­po­nent of each descrip­tor for the whole set of indi­vid­u­als; the sec­ond takes into account the vari­abil­ity of each com­po­nent of each descrip­tor in each clus­ter. We fur­nish inter­pre­ta­tive tools of the obtained par­ti­tion based on an exten­sion of the clas­si­cal mea­sures (indexes) to the use of adap­tive dis­tances in the clus­ter­ing cri­te­rion func­tion. Appli­ca­tions on syn­thetic and real-​​world data cor­rob­o­rate the pro­posed procedure.”

    clas­si­fi­ca­tion sta­tis­tics his­tograms met­rics clus­ter­ing
  • [1110.1412] Quan­ti­fy­ing loopy net­work architectures

    “Biol­ogy presents many exam­ples of pla­nar dis­tri­b­u­tion and struc­tural net­works hav­ing dense sets of closed loops. An arche­type of this form of net­work orga­ni­za­tion is the vas­cu­la­ture of dicotyle­do­nous leaves, which show­cases a hierarchically-​​nested archi­tec­ture con­tain­ing closed loops at many dif­fer­ent lev­els. Although a num­ber of meth­ods have been pro­posed to mea­sure aspects of the struc­ture of such net­works, a robust met­ric to quan­tify their hier­ar­chi­cal orga­ni­za­tion is still lack­ing. We present an algo­rith­mic frame­work, the hier­ar­chi­cal loop decom­po­si­tion, that allows map­ping loopy net­works to binary trees, pre­serv­ing in the con­nec­tiv­ity of the trees the archi­tec­ture of the orig­i­nal graph. We apply this frame­work to inves­ti­gate com­puter gen­er­ated graphs, such as arti­fi­cial mod­els and opti­mal dis­tri­b­u­tion net­works, as well as nat­ural graphs extracted from dig­i­tized images of dicotyle­do­nous leaves and vas­cu­la­ture of rat cere­bral neo­cor­tex. We cal­cu­late var­i­ous met­rics based on the Asym­me­try, the cumu­la­tive size dis­tri­b­u­tion and the Strahler bifur­ca­tion ratios of the cor­re­spond­ing trees and dis­cuss the rela­tion­ship of these quan­ti­ties to the archi­tec­tural orga­ni­za­tion of the orig­i­nal graphs. This algo­rith­mic frame­work decou­ples the geo­met­ric infor­ma­tion (exact loca­tion of edges and nodes) from the met­ric topol­ogy (con­nec­tiv­ity and edge weight) and it ulti­mately allows us to per­form a quan­ti­ta­tive sta­tis­ti­cal com­par­i­son between pre­dic­tions of the­o­ret­i­cal mod­els and nat­u­rally occur­ring loopy graphs.”

    com­plex­ol­ogy bio­physics network-​​theory met­rics
  • [1110.1393] High-​​Precision Tun­ing of State for Mem­ris­tive Devices by Adapt­able Variation-​​Tolerant Algorithm

    “Using mem­ris­tive prop­er­ties com­mon for the tita­nium diox­ide thin film devices, we designed a sim­ple write algo­rithm to tune device con­duc­tance at a spe­cific bias point to 1% rel­a­tive accu­racy (which is roughly equiv­a­lent to 7-​​bit pre­ci­sion) within its dynamic range even in the pres­ence of large vari­a­tions in switch­ing behav­ior. The high pre­ci­sion state is non­volatile and the results are likely to be sus­tained for nanoscale mem­ris­tive devices because of the inher­ent fil­a­men­tary nature of the resis­tive switch­ing. The pro­posed func­tion­al­ity of mem­ris­tive devices is espe­cially attrac­tive for ana­log com­put­ing with low pre­ci­sion data. As one rep­re­sen­ta­tive exam­ple we demon­strate hybrid cir­cuitry con­sist­ing of CMOS sum­ming ampli­fier and two mem­ris­tive devices to per­form ana­log mul­ti­ply and accu­mu­late com­pu­ta­tion, which is a typ­i­cal bot­tle­neck oper­a­tion in infor­ma­tion processing.”

    mem­ris­tors engineering-​​design sim­u­la­tion control-​​systems nudge-​​targets
  • [1110.1521] Nodal domains of a non-​​separable prob­lem — the right angled isosce­les triangle

    “Our result may be gen­er­al­ized to other domains where sim­i­lar algo­rithms may apply. Our algo­rithm is based on the fact that the eigen­func­tions are pre­sented as a lin­ear com­bi­na­tion of sim­ple plane waves. It is there­fore tempt­ing to try and gen­er­al­ize it for other drums with sim­i­lar prop­erty. The equi­lat­eral tri­an­gle is an imme­di­ate can­di­date (see [29] and ref­er­ences within). A fur­ther, and quite sur­pris­ing, result is the recur­sive for­mula for the num­ber of nodal loops. To our knowl­edge this is the first known exact for­mula for the nodal count of a non-​​separable pla­nar man­i­fold (for cer­tain eigen­func­tions of tori exact for­mu­las have been given in [22]). The for­mula was found by direct inspec­tion of large tables and has been ver­i­fied for a large bulk of data com­pu­ta­tion­ally. An obvi­ous chal­lenge is to prove this for­mula. In par­tic­u­lar, the recur­sive part of the for­mula resem­bles the famous Euclid algo­rithm for the great­est com­mon divi­sor. A fur­ther inves­ti­ga­tion of the men­tioned for­mula might there­fore expose some new num­ber the­o­ret­i­cal prop­er­ties of the nodal count.”

    physics algo­rithms analytical-​​results open-​​questions geom­e­try acoustics exact-​​form nudge-​​targets
  • [1110.1485] A Face Recog­ni­tion Scheme using Wavelet Based Dom­i­nant Features

    “In this paper, a multi-​​resolution fea­ture extrac­tion algo­rithm for face recog­ni­tion is pro­posed based on two-​​dimensional dis­crete wavelet trans­form (2D-​​DWT), which effi­ciently exploits the local spa­tial vari­a­tions in a face image. For the pur­pose of fea­ture extrac­tion, instead of con­sid­er­ing the entire face image, an entropy-​​based local band selec­tion cri­te­rion is devel­oped, which selects high-​​informative hor­i­zon­tal seg­ments from the face image. In order to cap­ture the local spa­tial vari­a­tions within these high­in­for­ma­tive hor­i­zon­tal bands pre­cisely, the hor­i­zon­tal band is seg­mented into sev­eral small spa­tial mod­ules. Dom­i­nant wavelet coef­fi­cients cor­re­spond­ing to each local region resid­ing inside those hor­i­zon­tal bands are selected as fea­tures. In the selec­tion of the dom­i­nant coef­fi­cients, a thresh­old cri­te­rion is pro­posed, which not only dras­ti­cally reduces the fea­ture dimen­sion but also pro­vides high within-​​class com­pact­ness and high between-​​class sep­a­ra­bil­ity. A prin­ci­pal com­po­nent analy­sis is per­formed to fur­ther reduce the dimen­sion­al­ity of the fea­ture space. Exten­sive exper­i­men­ta­tion is car­ried out upon stan­dard face data­bases and a very high degree of recog­ni­tion accu­racy is achieved by the pro­posed method in com­par­i­son to those obtained by some of the exist­ing methods.”

    face-​​recognition algo­rithms image-​​processing wavelets nudge-​​targets
  • [1110.1553] Hier­ar­chi­cal QR fac­tor­iza­tion algo­rithms for multi-​​core clus­ter systems

    “This paper describes a new QR fac­tor­iza­tion algo­rithm which is espe­cially designed for mas­sively par­al­lel plat­forms com­bin­ing par­al­lel dis­trib­uted multi-​​core nodes. These plat­forms make the present and the fore­see­able future of high-​​performance com­put­ing. Our new QR fac­tor­iza­tion algo­rithm falls in the cat­e­gory of the tile algo­rithms which nat­u­rally enables good data local­ity for the sequen­tial ker­nels exe­cuted by the cores (high sequen­tial per­for­mance), low num­ber of mes­sages in a par­al­lel dis­trib­uted set­ting (small latency term), and fine gran­u­lar­ity (high parallelism).”

    parallel-​​computing operations-​​research fac­tor­iza­tion algo­rithms nudge-​​targets meta-​​algorithms
  • [1110.1560] On the Col­or­ing of Grid Wire­less Sen­sor Net­works: the Vector-​​Based Col­or­ing Method

    “Graph col­or­ing is used in wire­less net­works to opti­mize net­work resources: band­width and energy. Nodes access the medium accord­ing to their color. It is the respon­si­bil­ity of the col­or­ing algo­rithm to ensure that inter­fer­ing nodes do not have the same color. In this research report, we focus on wire­less sen­sor net­works with grid topolo­gies. How does a col­or­ing algo­rithm take advan­tage of the reg­u­lar­ity of grid topol­ogy to pro­vide an opti­mal peri­odic col­or­ing, that is a col­or­ing with the min­i­mum num­ber of col­ors? We pro­pose the Vector-​​Based Col­or­ing Method, denoted VCM, a new method that is able to pro­vide an opti­mal peri­odic col­or­ing for any radio trans­mis­sion range and for any h-​​hop col­or­ing, h>=1. This method con­sists in deter­min­ing at which grid nodes a color can be repro­duced with­out cre­at­ing inter­fer­ences between these nodes while min­i­miz­ing the num­ber of col­ors used. We com­pare the num­ber of col­ors pro­vided by VCM with the num­ber of col­ors obtained by a dis­trib­uted col­or­ing algo­rithm with line and col­umn pri­or­ity assign­ments. We also pro­vide bounds on the num­ber of col­ors of opti­mal gen­eral col­or­ings of the infi­nite grid, and show that peri­odic col­or­ings (and thus VCM) are asymp­tot­i­cally opti­mal. Finally, we dis­cuss the applic­a­bil­ity of this method to a real wire­less network.”

    graph-​​theory algo­rithms operations-​​research nudge-​​targets
  • [1110.1580] A Polylogarithmic-​​Competitive Algo­rithm for the k-​​Server Problem

    “We give the first polylogarithmic-​​competitive ran­dom­ized online algo­rithm for the $k$-server prob­lem on an arbi­trary finite met­ric space. In par­tic­u­lar, our algo­rithm achieves a com­pet­i­tive ratio of O(log^3 n log^2 k log log n) for any met­ric space on n points. Our algo­rithm improves upon the deter­min­is­tic (2k-1)-competitive algo­rithm of Kout­sou­pias and Papadim­itriou [J.ACM’95] when­ever n is sub-​​exponential in k.”

    sched­ul­ing operations-​​research algo­rithms nudge-​​targets
  • [1110.1590] PSA: The Packet Sched­ul­ing Algo­rithm for Wire­less Sen­sor Networks

    “The main cause of wasted energy con­sump­tion in wire­less sen­sor net­works is packet col­li­sion. The packet sched­ul­ing algo­rithm is there­fore intro­duced to solve this prob­lem. Some packet sched­ul­ing algo­rithms can also influ­ence and delay the data trans­mit­ting in the real-​​time wire­less sen­sor net­works. This paper presents the packet sched­ul­ing algo­rithm (PSA) in order to reduce the packet con­ges­tion in MAC layer lead­ing to reduce the over­all of packet col­li­sion in the sys­tem The PSA is com­pared with the sim­ple CSMA/​CA and other approaches using net­work topol­ogy bench­marks in math­e­mat­i­cal method. The per­for­mances of our PSA are bet­ter than the stan­dard (CSMA/​CA). The PSA pro­duces bet­ter through­put than other algo­rithms. On other hand, the aver­age delay of PSA is higher than pre­vi­ous works. How­ever, the PSA uti­lizes the chan­nel bet­ter than all algorithms.”

    sensor-​​networks distributed-​​processing sched­ul­ing rout­ing operations-​​research algo­rithms nudge-​​targets
  • [1110.0725] A Sur­vey of Dis­trib­uted Data Aggre­ga­tion Algorithms

    “Dis­trib­uted data aggre­ga­tion has been an active field of research in the last decade, and a huge diverse amount of tech­niques can be found in the lit­er­a­ture. For this rea­sons, this sur­vey intends to be an impor­tant time sav­ing instru­ment, for those that desire to get a quick and com­pre­hen­sive overview of the state of the art on dis­trib­uted data aggre­ga­tion. More­over, by care­fully high­light­ing the strength and lim­i­ta­tions of the more per­ti­nent approaches, this study can pro­vide a use­ful assis­tance to help read­ers choose which tech­nique to apply in spe­cific set­tings. Cur­rently, there is no ideal gen­eral solu­tion to the dis­trib­uted com­pu­ta­tion of an aggre­ga­tion func­tion, all exist­ing tech­niques have its pit­falls (some more than oth­ers). There­fore, more research in this field will be expected in the next few years. In par­tic­u­lar, due to the added value of com­put­ing com­plex aggre­gates, new algo­rithms might arise to esti­mate the sta­tis­ti­cal dis­tri­b­u­tion of val­ues, as the few exist­ing approaches exhibit some lim­i­ta­tions in terms of accu­racy and resource con­sump­tion. Addi­tional research efforts should be made to improve the sup­port to churn, mes­sage loss, and con­tin­u­ous esti­ma­tion of muta­ble input values.”

    sta­tis­tics reviews distributed-​​processing com­mu­ni­ca­tion coor­di­na­tion nudge-​​targets
  • [0911.3482] Com­plex­ity of Net­works (reprise)

    “Net­work or graph struc­tures are ubiq­ui­tous in the study of com­plex sys­tems. Often, we are inter­ested in com­plex­ity trends of these sys­tem as it evolves under some dynamic. An exam­ple might be look­ing at the com­plex­ity of a food web as species enter an ecosys­tem via migra­tion or spe­ci­a­tion, and leave via extinc­tion. In a pre­vi­ous paper, a com­plex­ity mea­sure of net­works was pro­posed based on the {em com­plex­ity is infor­ma­tion con­tent} par­a­digm. To apply this par­a­digm to any object, one must fix two things: a rep­re­sen­ta­tion lan­guage, in which strings of sym­bols from some alpha­bet describe, or stand for the objects being con­sid­ered; and a means of deter­min­ing when two such descrip­tions refer to the same object. With these two things set, the infor­ma­tion con­tent of an object can be com­puted in prin­ci­ple from the num­ber of equiv­a­lent descrip­tions describ­ing a par­tic­u­lar object. The pre­vi­ously pro­posed rep­re­sen­ta­tion lan­guage had the defi­ciency that the fully con­nected and empty net­works were the most com­plex for a given num­ber of nodes. A vari­a­tion of this mea­sure, called zcom­plex­ity, applied a com­pres­sion algo­rithm to the result­ing bit­string rep­re­sen­ta­tion, to solve this prob­lem. Unfor­tu­nately, zcom­plex­ity proved too com­pu­ta­tion­ally expen­sive to be prac­ti­cal. In this paper, I pro­pose a new rep­re­sen­ta­tion lan­guage that encodes the num­ber of links along with the num­ber of nodes and a rep­re­sen­ta­tion of the lin­klist. This, like zcom­plex­ity, exhibits min­i­mal com­plex­ity for fully con­nected and empty net­works, but is as tractable as the orig­i­nal measure.”

    network-​​theory com­plex­ol­ogy complex-​​systems mea­sure­ment per­form structure-​​function-​​relations discrete-​​mathematics
  • [1108.4279] Detec­tion and emergence

    “Two dif­fer­ent con­cep­tions of emer­gence are rec­on­ciled as two instances of the phe­nom­e­non of detec­tion. In the process of com­par­ing these two con­cep­tions, we find that the notions of com­plex­ity and detec­tion allow us to form a uni­fied def­i­n­i­tion of emer­gence that clearly delin­eates the role of the observer.”

    com­plex­ol­ogy emer­gence pragmatism-it-ain’t but-​​soon
  • [1001.4278] Weight Opti­miza­tion for Dis­trib­uted Aver­age Con­sen­sus Algo­rithm in Sym­met­ric, CCS & KCS Star Networks

    “This paper addresses weight opti­miza­tion prob­lem in dis­trib­uted con­sen­sus aver­ag­ing algo­rithm over net­works with sym­met­ric star topol­ogy. We have deter­mined opti­mal weights and con­ver­gence rate of the net­work in terms of its topo­log­i­cal para­me­ters. In addi­tion, two alter­na­tive topolo­gies with more rapid con­ver­gence rates have been intro­duced. The new topolo­gies are Complete-​​Cored Sym­met­ric (CCS) star and K-​​Cored Sym­met­ric (KCS) star topolo­gies. It has been shown that the opti­mal weights for the edges of cen­tral part in sym­met­ric and CCS star con­fig­u­ra­tions are inde­pen­dent of their branches. By sim­u­la­tion opti­mal­ity of obtained weights under quan­ti­za­tion con­straints have been verified.”

    operations-​​research decision-​​making network-​​theory nudge-​​targets
  • [1109.5389] Water dri­ves pep­tide con­for­ma­tional transitions

    “Tran­si­tions between metastable con­for­ma­tions of a dipep­tide are inves­ti­gated using clas­si­cal mol­e­c­u­lar dynam­ics sim­u­la­tion with explicit water mol­e­cules. The dis­tri­b­u­tion of the sur­round­ing water at dif­fer­ent moments before the tran­si­tions and the dynam­i­cal cor­re­la­tions of water with the peptide’s con­fig­u­ra­tional motions indi­cate that water is the main dri­ving force of the con­for­ma­tional changes.”

    molecular-​​design systems-​​biology sim­u­la­tion intracellular-​​dynamics kinda-​​knew-​​this-​​a-​​long-​​time-​​ago bio­chem­istry
  • [1105.1445] Vehic­u­lar traf­fic flow at an inter­sec­tion with the pos­si­bil­ity of turning

    “We have devel­oped a Nagel-​​Schreckenberg cel­lu­lar automata model for describ­ing of vehic­u­lar traf­fic flow at a sin­gle inter­sec­tion. A set of traf­fic lights oper­at­ing in fixed-​​time scheme con­trols the traf­fic flow. Open bound­ary con­di­tion is applied to the streets each of which con­duct a uni-​​directional flow. Streets are single-​​lane and cars can turn upon reach­ing to the inter­sec­tion with pre­scribed prob­a­bil­i­ties. Exten­sive Monte Carlo sim­u­la­tions are car­ried out to find the model flow char­ac­ter­is­tics. In par­tic­u­lar, we inves­ti­gate the flows depen­dence on the sig­nal­i­sa­tion para­me­ters, turn­ing prob­a­bil­i­ties and input rates. It is shown that for each set of para­me­ters, there exist a plateau region inside which the total out­flow from the inter­sec­tion remains almost con­stant. We also com­pute total wait­ing time of vehi­cles per cycle behind red lights for var­i­ous con­trol parameters.”

    cellular-​​automata com­plex­ol­ogy traffic-​​models agent-​​based sim­u­la­tion nudge-​​substrates
  • [1110.1391] A Com­par­i­son of Dif­fer­ent Machine Translit­er­a­tion Models

    “Machine translit­er­a­tion is a method for auto­mat­i­cally con­vert­ing words in one lan­guage into pho­net­i­cally equiv­a­lent ones in another lan­guage. Machine translit­er­a­tion plays an impor­tant role in nat­ural lan­guage appli­ca­tions such as infor­ma­tion retrieval and machine trans­la­tion, espe­cially for han­dling proper nouns and tech­ni­cal terms. Four machine translit­er­a­tion mod­els — grapheme-​​based translit­er­a­tion model, phoneme-​​based translit­er­a­tion model, hybrid translit­er­a­tion model, and correspondence-​​based translit­er­a­tion model — have been pro­posed by sev­eral researchers. To date, how­ever, there has been lit­tle research on a frame­work in which mul­ti­ple translit­er­a­tion mod­els can oper­ate simul­ta­ne­ously. Fur­ther­more, there has been no com­par­i­son of the four mod­els within the same frame­work and using the same data. We addressed these prob­lems by 1) mod­el­ing the four mod­els within the same frame­work, 2) com­par­ing them under the same con­di­tions, and 3) devel­op­ing a way to improve machine translit­er­a­tion through this com­par­i­son. Our com­par­i­son showed that the hybrid and correspondence-​​based mod­els were the most effec­tive and that the four mod­els can be used in a com­ple­men­tary man­ner to improve machine translit­er­a­tion performance.”

    natural-​​language-​​processing machine-​​learning review nudge-​​targets
  • [1108.5508] A Pat­tern Measure

    “In this paper we pro­pose numer­i­cal mea­sures for eval­u­at­ing the aes­thetic inter­est of sim­ple pat­terns. The pat­terns con­sist of ele­ments (sym­bols, pix­els, etc.) in reg­u­lar square arrays. The mea­sures depend on two char­ac­ter­is­tics of the pat­terns: the num­ber of dif­fer­ent types of ele­ment, and the num­ber of sym­me­tries in their arrange­ment. We define two com­ple­men­tary com­pos­ite mea­sures L and C for the degree of pat­tern in a design, and com­pute them here for 2×2 and 6×6 arrays. The results dis­tin­guish sim­ple from high-​​variation cases. We sus­pect that the mea­sure L cor­re­sponds to the degree that human beings intu­itively feel a design to be “inter­est­ing”, so this model would aid in quan­ti­fy­ing the visual con­nec­tion of two– dimen­sional designs with view­ers. The other com­pos­ite mea­sure C based on these numer­i­cal prop­er­ties char­ac­ter­izes the extent of ran­dom­ness of an array. Com­bin­ing sym­bol vari­ety with sym­me­try cal­cu­la­tions allows us to employ hier­ar­chi­cal scal­ing to count the rel­a­tive impact of dif­fer­ent lev­els of scale. By iden­ti­fy­ing sub­struc­tures we can dis­tin­guish between orga­nized pat­terns and dis­or­ga­nized com­plex­ity. The mea­sures described here are related to ver­bal descrip­tors derived from work by psy­chol­o­gists on responses to visual environments.”

    cog­ni­tion aes­thet­ics experimental-​​psychology nudge-​​targets learning-​​by-​​watching
  • [1106.5264] Acquir­ing Cor­rect Knowl­edge for Nat­ural Lan­guage Generation

    “Nat­ural lan­guage gen­er­a­tion (NLG) sys­tems are com­puter soft­ware sys­tems that pro­duce texts in Eng­lish and other human lan­guages, often from non-​​linguistic input data. NLG sys­tems, like most AI sys­tems, need sub­stan­tial amounts of knowl­edge. How­ever, our expe­ri­ence in two NLG projects sug­gests that it is dif­fi­cult to acquire cor­rect knowl­edge for NLG sys­tems; indeed, every knowl­edge acqui­si­tion (KA) tech­nique we tried had sig­nif­i­cant prob­lems. In gen­eral terms, these prob­lems were due to the com­plex­ity, nov­elty, and poorly under­stood nature of the tasks our sys­tems attempted, and were wors­ened by the fact that peo­ple write so dif­fer­ently. This meant in par­tic­u­lar that corpus-​​based KA approaches suf­fered because it was impos­si­ble to assem­ble a siz­able cor­pus of high-​​quality con­sis­tent man­u­ally writ­ten texts in our domains; and struc­tured expert-​​oriented KA tech­niques suf­fered because experts dis­agreed and because we could not get enough infor­ma­tion about spe­cial and unusual cases to build robust sys­tems. We believe that such prob­lems are likely to affect many other NLG sys­tems as well. In the long term, we hope that new KA tech­niques may emerge to help NLG sys­tem builders. In the shorter term, we believe that under­stand­ing how indi­vid­ual KA tech­niques can fail, and using a mix­ture of dif­fer­ent KA tech­niques with dif­fer­ent strengths and weak­nesses, can help devel­op­ers acquire NLG knowl­edge that is mostly correct.”

    natural-​​language-​​processing artificial-​​intelligence interesting-​​problems high-​​hanging-​​fruit machine-​​learning nudge-​​targets
  • [1105.2423] Cytoskele­ton and Cell Motility

    “The present arti­cle is an invited con­tri­bu­tion to the Ency­clo­pe­dia of Com­plex­ity and Sys­tem Sci­ence, Robert A. Mey­ers Ed., Springer New York (2009). It is a review of the bio­phys­i­cal mech­a­nisms that underly cell motility.…”

    bio­physics biol­ogy review i-​​used-​​to-​​do-​​this-​​stuff lovely
  • [1110.0671] Width Dis­tri­b­u­tions for Con­vex Reg­u­lar Polyhedra

    “The mean width is a mea­sure on three-​​dimensional con­vex bod­ies that enjoys equal sta­tus with vol­ume and sur­face area [Rota]. As the phrase sug­gests, it is the mean of a prob­a­bil­ity den­sity f. We ver­ify for­mu­las for mean widths of the reg­u­lar tetra­he­dron and the cube. Higher-​​order moments of f_​tetra and f_​cube have not been exam­ined until now. Assume that each poly­he­dron has edges of unit length. We deduce that the mean square width of the reg­u­lar tetra­he­dron is 1/3+(3+sqrt(3))/(3*pi) and the mean square width of the cube is 1+4/pi.”

    geom­e­try mathematical-​​recreations nudge-​​targets
  • [cs/​0305036] Using Dynamic Sim­u­la­tion in the Devel­op­ment of Con­struc­tion Machinery

    “As in the car indus­try for quite some time, dynamic sim­u­la­tion of com­plete vehi­cles is being prac­ticed more and more in the devel­op­ment of off-​​road machin­ery. How­ever, spe­cific ques­tions arise due not only to com­pany struc­ture and size, but espe­cially to the type of prod­uct. Tightly cou­pled, non-​​linear sub­sys­tems of dif­fer­ent domains make pre­dic­tion and opti­mi­sa­tion of the com­plete system’s dynamic behav­iour a chal­lenge. Fur­ther­more, the demand for ver­sa­tile machines leads to some­times con­tra­dic­tory tar­get require­ments and can turn the design process into a hunt for the least painful com­pro­mise. This can be avoided by pro­found sys­tem knowl­edge, assisted by simulation-​​driven prod­uct devel­op­ment. This paper gives an overview of joint research into this issue by Volvo Wheel Load­ers and Linkop­ing Uni­ver­sity on that mat­ter, lists the results of a related lit­er­a­ture review and intro­duces the term “oper­ate­abil­ity”. Rather than giv­ing detailed answers, the prob­lem space for ongo­ing and future research is exam­ined and pos­si­ble solu­tions are sketched.”

    engineering-​​design design-​​automation mod­el­ing dynamical-​​systems man­u­fac­tur­ing nudge-​​targets