Items of some interest…

These are my recent Pin​board​.in links:

  • [1108.4135] Complex-​​Valued Autoencoders

    “Autoen­coders are unsu­per­vised machine learn­ing cir­cuits whose learn­ing goal is to min­i­mize a dis­tor­tion mea­sure between inputs and out­puts. Lin­ear autoen­coders can be defined over any field and only real-​​valued lin­ear autoen­coder have been stud­ied so far. Here we study complex-​​valued lin­ear autoen­coders where the com­po­nents of the train­ing vec­tors and adjustable matri­ces are defined over the com­plex field with the $L_​2$ norm. We pro­vide sim­pler and more gen­eral proofs that unify the real-​​valued and complex-​​valued cases, show­ing that in both cases the land­scape of the error func­tion is invari­ant under cer­tain groups of trans­for­ma­tions. The land­scape has no local min­ima, a fam­ily of global min­ima asso­ci­ated with Prin­ci­pal Com­po­nent Analy­sis, and many fam­i­lies of sad­dle points asso­ci­ated with orthog­o­nal pro­jec­tions onto sub-​​space spanned by sub-​​optimal sub­sets of eigen­vec­tors of the covari­ance matrix. The the­ory yields sev­eral iter­a­tive, con­ver­gent, learn­ing algo­rithms, a clear under­stand­ing of the gen­er­al­iza­tion prop­er­ties of the trained autoen­coders, and can equally be applied to the hetero-​​associative case when exter­nal tar­gets are pro­vided. Par­tial results on deep archi­tec­ture as well as the dif­fer­en­tial geom­e­try of autoen­coders are also pre­sented. The gen­eral frame­work described here is use­ful to clas­sify autoen­coders and iden­tify gen­eral com­mon prop­er­ties that ought to be inves­ti­gated for each class, illu­mi­nat­ing some of the con­nec­tions between infor­ma­tion the­ory, unsu­per­vised learn­ing, clus­ter­ing, Heb­bian learn­ing, and auto encoders.”

    neural-​​networks machine-​​learning clas­si­fi­ca­tion encod­ing algo­rithms nudge-​​targets
  • [1108.5685] Pre­dict­ing flow rever­sals in chaotic nat­ural con­vec­tion using data assimilation

    “A sim­pli­fied model of nat­ural con­vec­tion, sim­i­lar to the Lorenz (1963) sys­tem, is com­pared to com­pu­ta­tional fluid dynam­ics sim­u­la­tions in order to test data assim­i­la­tion meth­ods and bet­ter under­stand the dynam­ics of con­vec­tion. The ther­mosyphon is rep­re­sented by a long time flow sim­u­la­tion, which serves as a ref­er­ence “truth”. Fore­casts are then made using the Lorenz-​​like model and syn­chro­nized to noisy and lim­ited obser­va­tions of the truth using data assim­i­la­tion. The result­ing analy­sis is observed to infer dynam­ics absent from the model when using short assim­i­la­tion win­dows. Fur­ther­more, chaotic flow rever­sal occur­rence and res­i­dency times in each rota­tional state are fore­cast using analy­sis data. Flow rever­sals have been suc­cess­fully fore­cast in the related Lorenz sys­tem, as part of a per­fect model exper­i­ment, but never in the pres­ence of sig­nif­i­cant model error or unob­served vari­ables. Finally, we pro­vide new details con­cern­ing the fluid dynam­i­cal processes present in the ther­mosyphon dur­ing these flow reversals.”

    chaos dynamical-​​systems exper­i­ment pre­dic­tion numerical-​​methods algo­rithms nudge-​​targets
  • [1108.1320] Com­pressed Matrix Multiplication

    “Moti­vated by the prob­lems of com­put­ing sam­ple covari­ance matri­ces, and of trans­form­ing a col­lec­tion of vec­tors to a basis where they are sparse, we present a sim­ple algo­rithm that com­putes an approx­i­ma­tion of the prod­uct of two n-​​by-​​n real matri­ces A and B.…”

    approx­i­ma­tion algo­rithms nudge-​​targets
  • [1110.5296] Com­put­ing a Longest Com­mon Palin­dromic Subsequence

    “The {em longest com­mon sub­se­quence (LCS)} prob­lem is a clas­sic and well-​​studied prob­lem in com­puter sci­ence. Palin­drome is a word which reads the same for­ward as it does back­ward. The {em longest com­mon palin­dromic sub­se­quence (LCPS)} prob­lem is an inter­est­ing vari­ant of the clas­sic LCS prob­lem which finds the longest com­mon sub­se­quence between two given strings such that the com­puted sub­se­quence is also a palin­drome. In this paper, we study the LCPS prob­lem and give effi­cient algo­rithms to solve this prob­lem. To the best of our knowl­edge, this is the first attempt to study and solve this inter­est­ing problem.”

    com­bi­na­torics strings algo­rithms nudge-​​targets

Items of some interest…

These are my recent Pin​board​.in links:

  • [1110.0585] Dis­crim­i­nately Decreas­ing Dis­crim­inabil­ity with Learned Image Filters

    “In machine learn­ing and com­puter vision, input images are often fil­tered to increase data dis­crim­inabil­ity. In some sit­u­a­tions, how­ever, one may wish to pur­posely decrease dis­crim­inabil­ity of one clas­si­fi­ca­tion task (a “dis­trac­tor” task), while simul­ta­ne­ously pre­serv­ing infor­ma­tion rel­e­vant to another (the task-​​of-​​interest): For exam­ple, it may be impor­tant to mask the iden­tity of per­sons con­tained in face images before sub­mit­ting them to a crowd­sourc­ing site (e.g., Mechan­i­cal Turk) when label­ing them for cer­tain facial attrib­utes. Another exam­ple is inter-​​dataset gen­er­al­iza­tion: when train­ing on a dataset with a par­tic­u­lar covari­ance struc­ture among mul­ti­ple attrib­utes, it may be use­ful to sup­press one attribute while pre­serv­ing another so that a trained clas­si­fier does not learn spu­ri­ous cor­re­la­tions between attrib­utes. In this paper we present an algo­rithm that finds opti­mal fil­ters to give high dis­crim­inabil­ity to one task while simul­ta­ne­ously giv­ing low dis­crim­inabil­ity to a dis­trac­tor task. We present results show­ing the effec­tive­ness of the pro­posed tech­nique on both sim­u­lated data and nat­ural face images.”

    machine-​​learning data-​​preparation fil­ter­ing algo­rithms nudge-​​targets
  • [1110.5183] Dif­fu­sion of Infor­ma­tion in Robot Swarms

    “This work is devoted to com­mu­ni­ca­tion approaches, which spread infor­ma­tion in robot swarms. These mech­a­nisms are use­ful for large-​​scale sys­tems and also for such cases when a lim­ited com­mu­ni­ca­tion equip­ment does not allow rout­ing of infor­ma­tion pack­ages. We focus on two approaches such as vir­tual fields and epi­demic algo­rithms, dis­cuss sev­eral aspects of hard­ware imple­men­ta­tion and demon­strate exper­i­ments per­formed with micro­ro­bots “Jasmine”.”

    agent-​​based swarms com­mu­ni­ca­tion complex-​​systems epi­demi­ol­ogy dynamical-​​systems exper­i­ment
  • [1109.5229] Dis­trib­uted Algo­rithms for Opti­mal Power Flow Problem

    “Opti­mal power flow (OPF) is an impor­tant prob­lem for power gen­er­a­tion and it is in gen­eral non-​​convex. With the employ­ment of renew­able energy, it will be desir­able if OPF can be solved very effi­ciently so its solu­tion can be used in real time. With some spe­cial net­work struc­ture, e.g. trees, the prob­lem has been shown to have a zero dual­ity gap and the con­vex dual prob­lem yields the opti­mal solu­tion. In this paper, we pro­pose a pri­mal and a dual algo­rithm to coor­di­nate the smaller sub­prob­lems decom­posed from the con­vex­i­fied OPF. We can arrange the sub­prob­lems to be solved sequen­tially and cumu­la­tively in a cen­tral node or solved in par­al­lel in dis­trib­uted nodes. We test the algo­rithms on IEEE radial dis­tri­b­u­tion test feed­ers, some ran­dom tree-​​structured net­works, and the IEEE trans­mis­sion sys­tem bench­marks. Sim­u­la­tion results show that the com­pu­ta­tion time can be improved dra­mat­i­cally with our algo­rithms over the cen­tral­ized approach of solv­ing the prob­lem with­out decom­po­si­tion, espe­cially in tree-​​structured prob­lems. The com­pu­ta­tion time grows lin­early with the prob­lem size with the cumu­la­tive approach while the dis­trib­uted one can have size-​​independent com­pu­ta­tion time.”

    operations-​​research algo­rithms network-​​theory infra­struc­ture com­po­si­tion nudge-​​targets
  • Coupon collector’s prob­lem — Wikipedia, the free encyclopedia

    “In prob­a­bil­ity the­ory, the coupon collector’s prob­lem describes the “col­lect all coupons and win” con­tests. It asks the fol­low­ing ques­tion: Sup­pose that there are n coupons, from which coupons are being col­lected with replace­ment. What is the prob­a­bil­ity that more than t sam­ple tri­als are needed to col­lect all n coupons? An alter­na­tive state­ment is: Given n coupons, how many coupons do you expect you need to draw with replace­ment before hav­ing drawn each coupon at least once. The math­e­mat­i­cal analy­sis of the prob­lem reveals that the expected num­ber of tri­als needed grows as Θ(nlog(n)). For exam­ple, when n = 50 it takes about 225 tri­als to col­lect all 50 coupons.”

    probability-​​theory slic-​​representation nudge-​​targets
  • [1107.2379] Data Sta­bil­ity in Clus­ter­ing: A Closer Look

    “This paper con­sid­ers the model intro­duced by Bilu and Linial (2010), who study prob­lems for which the opti­mal clus­ter­ing does not change when the dis­tances are per­turbed by mul­ti­plica­tive fac­tors. They show that even when a prob­lem is NP-​​hard, it is some­times pos­si­ble to obtain polynomial-​​time algo­rithms for instances resilient to large per­tur­ba­tions, e.g. on the order of $O(sqrt{n})$ for max-​​cut clus­ter­ing. Awasthi et al. (2010) extend this line of work by con­sid­er­ing center-​​based objec­tives, and Bal­can and Liang (2011) con­sider the $k$-median and min-​​sum objec­tives, giv­ing effi­cient algo­rithms for instances resilient to cer­tain con­stant mul­ti­plica­tive per­tur­ba­tions. Here, we are moti­vated by the ques­tion of to what extent these assump­tions can be relaxed while allow­ing for effi­cient algo­rithms. We show there is lit­tle room to improve these results by giv­ing NP-​​hardness lower bounds for both the $k$-median and min-​​sum objec­tives. On the other hand, we show that mul­ti­plica­tive resilience para­me­ters, even only on the order of $Theta(1)$, can be so strong as to make the clus­ter­ing prob­lem triv­ial, and we exploit these assump­tions to present a sim­ple one pass stream­ing algo­rithm for the $k$-median objec­tive. We also con­sider a model of addi­tive per­tur­ba­tions and give a cor­re­spon­dence between addi­tive and mul­ti­plica­tive notions of sta­bil­ity. Our results pro­vide a close exam­i­na­tion of the con­se­quences of assum­ing, even con­stant, sta­bil­ity in data.”

    clus­ter­ing sta­tis­tics algo­rithms robust­ness nudge-​​targets
  • [1109.5641] Energy-​​Aware Load Bal­anc­ing in Con­tent Deliv­ery Networks

    “Internet-​​scale dis­trib­uted sys­tems such as con­tent deliv­ery net­works (CDNs) oper­ate hun­dreds of thou­sands of servers deployed in thou­sands of data cen­ter loca­tions around the globe. Since the energy costs of oper­at­ing such a large IT infra­struc­ture are a sig­nif­i­cant frac­tion of the total oper­at­ing costs, we argue for redesign­ing CDNs to incor­po­rate energy opti­miza­tions as a first-​​order prin­ci­ple. We pro­pose tech­niques to turn off CDN servers dur­ing peri­ods of low load while seek­ing to bal­ance three key design goals: max­i­mize energy reduc­tion, min­i­mize the impact on client-​​perceived ser­vice avail­abil­ity (SLAs), and limit the fre­quency of on-​​off server tran­si­tions to reduce wear-​​and-​​tear and its impact on hard­ware reli­a­bil­ity. We pro­pose an opti­mal offline algo­rithm and an online algo­rithm to extract energy sav­ings both at the level of local load bal­anc­ing within a data cen­ter and global load bal­anc­ing across data cen­ters. We eval­u­ate our algo­rithms using real pro­duc­tion work­load traces from a large com­mer­cial CDN. Our results show that it is pos­si­ble to reduce the energy con­sump­tion of a CDN by more than 55% while ensur­ing a high level of avail­abil­ity that meets cus­tomer SLA require­ments and incur­ring an aver­age of one on-​​off tran­si­tion per server per day. Fur­ther, we show that keep­ing even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with lit­tle impact on avail­abil­ity SLAs. Finally, we show that redis­trib­ut­ing load across prox­i­mal data cen­ters can enhance ser­vice avail­abil­ity sig­nif­i­cantly, but has only a mod­est impact on energy savings.”

    algo­rithms load-​​balancing infra­struc­ture nudge-​​targets operations-​​research
  • [1110.0264] Face Recog­ni­tion using Opti­mal Rep­re­sen­ta­tion Ensemble

    “Recently, the face rec­og­niz­ers based on lin­ear rep­re­sen­ta­tions have been shown to deliver state-​​of-​​the-​​art per­for­mance. In real-​​world appli­ca­tions, how­ever, face images usu­ally suf­fer from expres­sions, dis­guises and ran­dom occlu­sions. The prob­lem­atic facial parts under­mine the valid­ity of the linear-​​subspace assump­tion and thus the recog­ni­tion per­for­mance dete­ri­o­rates sig­nif­i­cantly. In this work, we address the prob­lem in a learning-​​inference-​​mixed fash­ion. By observ­ing that the linear-​​subspace assump­tion is more reli­able on cer­tain face patches rather than on the holis­tic face, some Bayesian Patch Rep­re­sen­ta­tions (BPRs) are ran­domly gen­er­ated and inter­preted accord­ing to the Bayes’ the­ory. We then train an ensem­ble model over the patch-​​representations by min­i­miz­ing the empir­i­cal risk w.r.t the “leave-​​one-​​out mar­gins”. The obtained model is termed Opti­mal Rep­re­sen­ta­tion Ensem­ble (ORE), since it guar­an­tees the opti­mal­ity from the per­spec­tive of Empir­i­cal Risk Min­i­miza­tion. To han­dle the unknown pat­terns in test faces, a robust ver­sion of BPR is pro­posed by tak­ing the non-​​face cat­e­gory into con­sid­er­a­tion. Equipped with the Robust-​​BPRs, the infer­ence abil­ity of ORE is increased dra­mat­i­cally and sev­eral record-​​breaking accu­ra­cies (99.9% on Yale-​​B and 99.5% on AR) and desir­able effi­cien­cies (below 20 ms per face in Mat­lab) are achieved. It also over­whelms other mod­u­lar heuris­tics on the faces with ran­dom occlu­sions, extreme expres­sions and dis­guises. Fur­ther­more, to accom­mo­date immense BPRs sets, a boosting-​​like algo­rithm is also derived. The boosted model, a.k.a Boosted-​​ORE, obtains sim­i­lar per­for­mance to its pro­to­type. Besides the empir­i­cal supe­ri­or­i­ties, two desir­able fea­tures of the pro­posed meth­ods, namely, the training-​​determined model-​​selection and the data-​​weight-​​free boost­ing pro­ce­dure, are also the­o­ret­i­cally verified.”

    image-​​analysis face-​​recognition algo­rithms nudge-​​targets
  • [1110.0957] Dic­tio­nary Learn­ing for Deblur­ring and Dig­i­tal Zoom

    “This paper pro­poses a novel approach to image deblur­ring and dig­i­tal zoom­ing using sparse local mod­els of image appear­ance. These mod­els, where small image patches are rep­re­sented as lin­ear com­bi­na­tions of a few ele­ments drawn from some large set (dic­tio­nary) of can­di­dates, have proven well adapted to sev­eral image restora­tion tasks. A key to their suc­cess has been to learn dic­tio­nar­ies adapted to the recon­struc­tion of small image patches. In con­trast, recent works have pro­posed instead to learn dic­tio­nar­ies which are not only adapted to data recon­struc­tion, but also tuned for a spe­cific task. We intro­duce here such an approach to deblur­ring and dig­i­tal zoom, using pairs of blurry/​sharp (or low-​​/​high-​​resolution) images for train­ing, as well as an effec­tive sto­chas­tic gra­di­ent algo­rithm for solv­ing the cor­re­spond­ing opti­miza­tion task. Although this learn­ing prob­lem is not con­vex, once the dic­tio­nar­ies have been learned, the sharp/​high-​​resolution image can be recov­ered via con­vex opti­miza­tion at test time. Exper­i­ments with syn­thetic and real data demon­strate the effec­tive­ness of the pro­posed approach, lead­ing to state-​​of-​​the-​​art per­for­mance for non-​​blind image deblur­ring and dig­i­tal zoom.”

    image-​​processing image-​​analysis algo­rithms nudge-​​targets hyper­res­o­lu­tion
  • [1110.0477] Dis­trib­uted Evo­lu­tion­ary Graph Partitioning

    “We present a novel dis­trib­uted evo­lu­tion­ary algo­rithm, KaFF­PaE, to solve the Graph Par­ti­tion­ing Prob­lem, which makes use of KaFFPa (Karl­sruhe Fast Flow Par­ti­tioner). The use of our mul­ti­level graph par­ti­tioner KaFFPa pro­vides new effec­tive crossover and muta­tion oper­a­tors. By com­bin­ing these with a scal­able com­mu­ni­ca­tion pro­to­col we obtain a sys­tem that is able to improve the best known par­ti­tion­ing results for many inputs in a very short amount of time. For exam­ple, in Walshaw’s well known bench­mark tables we are able to improve or recom­pute 76% of entries for the tables with 1%, 3% and 5% imbalance.”

    algo­rithms graph-​​theory evolutionary-​​algorithms nudge-​​targets
  • [1104.1605] Effi­cient Top-​​K Retrieval in Online Social Tag­ging Networks

    “We con­sider in this paper top-​​k query answer­ing in social tag­ging sys­tems, also known as folk­sonomies. This prob­lem requires a sig­nif­i­cant depar­ture from exist­ing, socially agnos­tic tech­niques. In a network-​​aware con­text, one can (and should) exploit the social links, which can indi­cate how users relate to the seeker and how much weight their tag­ging actions should have in the result build-​​up. We pro­pose an algo­rithm that has the poten­tial to scale to cur­rent appli­ca­tions. While the prob­lem has already been con­sid­ered in pre­vi­ous lit­er­a­ture, this was done either under strong sim­pli­fy­ing assump­tions or under choices that can­not scale to even moderate-​​size real world appli­ca­tions. We first con­sider a key aspect of the prob­lem, which is access­ing the clos­est or most rel­e­vant users for a given seeker. We describe how this can be done on the fly (with­out any pre-​​computations) for sev­eral pos­si­ble choices — arguably the most nat­ural ones — of prox­im­ity com­pu­ta­tion in a user net­work. Based on this, our top-​​k algo­rithm is sound and com­plete, while address­ing the scal­a­bil­ity issues of the exist­ing ones. Impor­tantly, our tech­nique is instance opti­mal in the case when the search relies exclu­sively on the social weight of tag­ging actions. To fur­ther reduce response times, we then con­sider direc­tions for effi­ciency by approx­i­ma­tion. Exten­sive exper­i­ments on real world data show that our tech­niques can dras­ti­cally improve the response time, with­out sac­ri­fic­ing precision.”

    folk­son­omy tag­ging network-​​theory search-​​algorithms nudge-​​targets data-​​access

Items of some interest…

These are my recent Pin​board​.in links:

  • Hebrew Typog­ra­phy

    beau­ti­ful lettering

    typog­ra­phy hebrew graphic-​​design cal­lig­ra­phy let­ter­ing
  • [1110.5376] A Quan­ti­ta­tive Test of Pop­u­la­tion Genet­ics Using Spatio-​​Genetic Pat­terns in Bac­te­r­ial Colonies

    “It is widely accepted that pop­u­la­tion genet­ics the­ory is the cor­ner­stone of evo­lu­tion­ary analy­ses. Empir­i­cal tests of the the­ory, how­ever, are chal­leng­ing because of the com­plex rela­tion­ships between space, dis­per­sal, and evo­lu­tion. Crit­i­cally, we lack quan­ti­ta­tive val­i­da­tion of the spa­tial mod­els of pop­u­la­tion genet­ics. Here we com­bine ana­lyt­ics, on and off-​​lattice sim­u­la­tions, and exper­i­ments with bac­te­ria to per­form quan­ti­ta­tive tests of the the­ory. We study two bac­te­r­ial species, the gut microbe Escherichia coli and the oppor­tunis­tic pathogen Pseudomonas aerug­i­nosa, and show that spatio-​​genetic pat­terns in colony biofilms of both species are accu­rately described by an exten­sion of the one-​​dimensional stepping-​​stone model. We use one empir­i­cal mea­sure, genetic diver­sity at the colony periph­ery, to para­me­ter­ize our mod­els and show that we can then accu­rately pre­dict another key vari­able: the degree of short-​​range cell migra­tion along an edge. More­over, the model allows us to esti­mate other key para­me­ters includ­ing effec­tive pop­u­la­tion size (den­sity) at the expan­sion fron­tier. While our exper­i­men­tal sys­tem is a sim­pli­fi­ca­tion of nat­ural micro­bial com­mu­nity, we argue it is a proof of prin­ci­ple that the spa­tial mod­els of pop­u­la­tion genet­ics can quan­ti­ta­tively cap­ture organ­is­mal evolution.”

    bacterial-​​genetics evo­lu­tion micro­bi­ol­ogy exper­i­ment cute
  • NDFD Data­base Contents

    “You can access NDFD ele­ments via file trans­fer pro­to­col (ftp), http, eXten­si­ble Markup Lan­guage (XML), or web browser. Links to the data, sup­port­ing infor­ma­tion and soft­ware are listed below:…”

    weather data raw-​​data-​​now government2.0 nudge-​​targets ref­er­ence fore­casts
  • The Per­for­ma­tiv­ity of Net­works — Kieran Healy

    “The “per­for­ma­tiv­ity the­sis” is the claim that parts of con­tem­po­rary eco­nom­ics and finance, when car­ried out into the world by pro­fes­sion­als and pop­u­lar­iz­ers, refor­mat and reor­ga­nize the phe­nom­ena they pur­port to describe, in ways that bring the world into line with the­ory. Prac­ti­cal tech­nolo­gies, cal­cu­la­tive devices and portable algo­rithms give actors tools to imple­ment par­tic­u­lar mod­els of action. I argue that social net­work analy­sis is per­for­ma­tive in the same sense as the cases stud­ied in this lit­er­a­ture. Social net­work analy­sis and finance the­ory are sim­i­lar in key aspects of their devel­op­ment and effects. For the case of eco­nom­ics, evi­dence for weaker ver­sions of the per­for­ma­tiv­ity the­sis in quite good, and the strong for­mu­la­tion is cir­cum­stan­tially sup­ported. Net­work the­ory eas­ily meets the evi­den­tial thresh­old for the weaker ver­sions; I offer empir­i­cal exam­ples that sup­port the strong (or “Bar­ne­sian”) for­mu­la­tion. Whether these par­al­lels are a mark in favor of the the­sis or a strike against it is an open ques­tion. I argue that the social net­work tech­nolo­gies and mod­els now being “per­formed” build out sys­tems of gen­er­al­ized reci­procity, con­nec­tiv­ity, and commons-​​based pro­duc­tion. This is in con­trast both to an ear­lier net­work imagery that empha­sized self-​​interest and entre­pre­neur­ial exploita­tion of struc­tural oppor­tu­ni­ties, and to the model of action typ­i­cally con­sid­ered to be per­formed by eco­nomic technologies.”

    network-​​theory network-​​culture eco­nom­ics cultural-​​dynamics theory-​​and-​​practice-​​sitting-​​in-​​a-​​tree

Items of some interest…

These are my recent Pin​board​.in links:

Items of some interest…

These are my recent Pin​board​.in links:

  • Vaguery/​collegeJournalOfMedicalScienceSeptember1857 — GitHub

    “I’ve scanned pages of the Sep­tem­ber 1857 issue of The Col­lege Jour­nal of Med­ical Sci­ence, OCRed these to html (to pre­serve some for­mat­ting) with ABBYY FineReader 9, and am con­vert­ing those html files into LaTeX files. The lat­est PDF ver­sion (as con­structed on my com­puter) can be down­loaded (via “view raw”) from http://​github​.com/​V​a​g​u​e​r​y​/​c​o​l​l​e​g​e​J​o​u​r​n​a​l​O​f​M​e​d​i​c​a​l​S​c​i​e​n​c​e​S​e​p​t​e​m​b​e​r​1​8​5​7​/​b​l​o​b​/​m​a​s​t​e​r​/​w​o​r​k​.​pdf A col­lec­tion of scripts and check­lists is com­ing out of this: scripts to do the heavy lift­ing trans­lat­ing self-​​contained HTML to TeX intended to be strung together into a sin­gle work, and check­lists of small proof­read­ing and hand-​​formatting tasks that need to be com­pleted on each page. The indi­vid­ual page TeX files are stitched together and type­set in ./work.tex, using XeLa­TeX. Be sure to check the font assign­ments; I’m using pur­chased post­script fonts I own.”

    project type­set­ting dig­i­ti­za­tion exper­i­ment proof­read­ing