These are my recent Pinboard.in links:
- A little color theory for you
hsl hsv lab colorbrewer design graphic-design color-theory color visualization via:nelson
These are my recent Pinboard.in links:
These are my recent Pinboard.in links:
[1112.1116] Approximating the Diameter of Planar Graphs in Near Linear Time
“We present a $(1+epsilon)$-approximation algorithm running in $O(f(epsilon)cdot n log^2 n)$ time for finding the diameter of an undirected planar graph with non-negative edge lengths.”[1111.3996] Complexity of the path avoiding forbidden pairs problem revisited
“Let G = (V, E) be a directed acyclic graph with two distinguished vertices s, t and let F be a set of forbidden pairs of vertices. We say that a path in G is safe, if it contains at most one vertex from each pair {u, v} in F. Given G and F, the path avoiding forbidden pairs (PAFP) problem is to find a safe s-t path in G. We systematically study the complexity of different special cases of the PAFP problem defined according to the mutual positions of fobidden pairs. Fix one topological ordering of vertices; we say that pairs {u, v} and {x, y} are disjoint, if u, v < x, y, nested, if u < x, y < v, and halving, if u < x < v < y. The PAFP problem is known to be NP-hard in general or if no two pairs are disjoint; we prove that it remains NP-hard even when no two forbidden pairs are nested. On the other hand, if no two pairs are halving, the problem is known to be solvable in cubic time. We simplify and improve this result by showing an O(M(n)) time algorithm, where M(n) is the time to multiply two n times n boolean matrices.”These are my recent Pinboard.in links:
[1110.0585] Discriminately Decreasing Discriminability with Learned Image Filters
“In machine learning and computer vision, input images are often filtered to increase data discriminability. In some situations, however, one may wish to purposely decrease discriminability of one classification task (a “distractor” task), while simultaneously preserving information relevant to another (the task-of-interest): For example, it may be important to mask the identity of persons contained in face images before submitting them to a crowdsourcing site (e.g., Mechanical Turk) when labeling them for certain facial attributes. Another example is inter-dataset generalization: when training on a dataset with a particular covariance structure among multiple attributes, it may be useful to suppress one attribute while preserving another so that a trained classifier does not learn spurious correlations between attributes. In this paper we present an algorithm that finds optimal filters to give high discriminability to one task while simultaneously giving low discriminability to a distractor task. We present results showing the effectiveness of the proposed technique on both simulated data and natural face images.”[1109.5229] Distributed Algorithms for Optimal Power Flow Problem
“Optimal power flow (OPF) is an important problem for power generation and it is in general non-convex. With the employment of renewable energy, it will be desirable if OPF can be solved very efficiently so its solution can be used in real time. With some special network structure, e.g. trees, the problem has been shown to have a zero duality gap and the convex dual problem yields the optimal solution. In this paper, we propose a primal and a dual algorithm to coordinate the smaller subproblems decomposed from the convexified OPF. We can arrange the subproblems to be solved sequentially and cumulatively in a central node or solved in parallel in distributed nodes. We test the algorithms on IEEE radial distribution test feeders, some random tree-structured networks, and the IEEE transmission system benchmarks. Simulation results show that the computation time can be improved dramatically with our algorithms over the centralized approach of solving the problem without decomposition, especially in tree-structured problems. The computation time grows linearly with the problem size with the cumulative approach while the distributed one can have size-independent computation time.”Coupon collector’s problem — Wikipedia, the free encyclopedia
“In probability theory, the coupon collector’s problem describes the “collect all coupons and win” contests. It asks the following question: Suppose that there are n coupons, from which coupons are being collected with replacement. What is the probability that more than t sample trials are needed to collect all n coupons? An alternative statement is: Given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once. The mathematical analysis of the problem reveals that the expected number of trials needed grows as Θ(nlog(n)). For example, when n = 50 it takes about 225 trials to collect all 50 coupons.”[1109.5641] Energy-Aware Load Balancing in Content Delivery Networks
“Internet-scale distributed systems such as content delivery networks (CDNs) operate hundreds of thousands of servers deployed in thousands of data center locations around the globe. Since the energy costs of operating such a large IT infrastructure are a significant fraction of the total operating costs, we argue for redesigning CDNs to incorporate energy optimizations as a first-order principle. We propose techniques to turn off CDN servers during periods of low load while seeking to balance three key design goals: maximize energy reduction, minimize the impact on client-perceived service availability (SLAs), and limit the frequency of on-off server transitions to reduce wear-and-tear and its impact on hardware reliability. We propose an optimal offline algorithm and an online algorithm to extract energy savings both at the level of local load balancing within a data center and global load balancing across data centers. We evaluate our algorithms using real production workload traces from a large commercial CDN. Our results show that it is possible to reduce the energy consumption of a CDN by more than 55% while ensuring a high level of availability that meets customer SLA requirements and incurring an average of one on-off transition per server per day. Further, we show that keeping even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with little impact on availability SLAs. Finally, we show that redistributing load across proximal data centers can enhance service availability significantly, but has only a modest impact on energy savings.”[1110.0264] Face Recognition using Optimal Representation Ensemble
“Recently, the face recognizers based on linear representations have been shown to deliver state-of-the-art performance. In real-world applications, however, face images usually suffer from expressions, disguises and random occlusions. The problematic facial parts undermine the validity of the linear-subspace assumption and thus the recognition performance deteriorates significantly. In this work, we address the problem in a learning-inference-mixed fashion. By observing that the linear-subspace assumption is more reliable on certain face patches rather than on the holistic face, some Bayesian Patch Representations (BPRs) are randomly generated and interpreted according to the Bayes’ theory. We then train an ensemble model over the patch-representations by minimizing the empirical risk w.r.t the “leave-one-out margins”. The obtained model is termed Optimal Representation Ensemble (ORE), since it guarantees the optimality from the perspective of Empirical Risk Minimization. To handle the unknown patterns in test faces, a robust version of BPR is proposed by taking the non-face category into consideration. Equipped with the Robust-BPRs, the inference ability of ORE is increased dramatically and several record-breaking accuracies (99.9% on Yale-B and 99.5% on AR) and desirable efficiencies (below 20 ms per face in Matlab) are achieved. It also overwhelms other modular heuristics on the faces with random occlusions, extreme expressions and disguises. Furthermore, to accommodate immense BPRs sets, a boosting-like algorithm is also derived. The boosted model, a.k.a Boosted-ORE, obtains similar performance to its prototype. Besides the empirical superiorities, two desirable features of the proposed methods, namely, the training-determined model-selection and the data-weight-free boosting procedure, are also theoretically verified.”[1110.0957] Dictionary Learning for Deblurring and Digital Zoom
“This paper proposes a novel approach to image deblurring and digital zooming using sparse local models of image appearance. These models, where small image patches are represented as linear combinations of a few elements drawn from some large set (dictionary) of candidates, have proven well adapted to several image restoration tasks. A key to their success has been to learn dictionaries adapted to the reconstruction of small image patches. In contrast, recent works have proposed instead to learn dictionaries which are not only adapted to data reconstruction, but also tuned for a specific task. We introduce here such an approach to deblurring and digital zoom, using pairs of blurry/sharp (or low-/high-resolution) images for training, as well as an effective stochastic gradient algorithm for solving the corresponding optimization task. Although this learning problem is not convex, once the dictionaries have been learned, the sharp/high-resolution image can be recovered via convex optimization at test time. Experiments with synthetic and real data demonstrate the effectiveness of the proposed approach, leading to state-of-the-art performance for non-blind image deblurring and digital zoom.”[1110.0477] Distributed Evolutionary Graph Partitioning
“We present a novel distributed evolutionary algorithm, KaFFPaE, to solve the Graph Partitioning Problem, which makes use of KaFFPa (Karlsruhe Fast Flow Partitioner). The use of our multilevel graph partitioner KaFFPa provides new effective crossover and mutation operators. By combining these with a scalable communication protocol we obtain a system that is able to improve the best known partitioning results for many inputs in a very short amount of time. For example, in Walshaw’s well known benchmark tables we are able to improve or recompute 76% of entries for the tables with 1%, 3% and 5% imbalance.”[1104.1605] Efficient Top-K Retrieval in Online Social Tagging Networks
“We consider in this paper top-k query answering in social tagging systems, also known as folksonomies. This problem requires a significant departure from existing, socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose an algorithm that has the potential to scale to current applications. While the problem has already been considered in previous literature, this was done either under strong simplifying assumptions or under choices that cannot scale to even moderate-size real world applications. We first consider a key aspect of the problem, which is accessing the closest or most relevant users for a given seeker. We describe how this can be done on the fly (without any pre-computations) for several possible choices — arguably the most natural ones — of proximity computation in a user network. Based on this, our top-k algorithm is sound and complete, while addressing the scalability issues of the existing ones. Importantly, our technique is instance optimal in the case when the search relies exclusively on the social weight of tagging actions. To further reduce response times, we then consider directions for efficiency by approximation. Extensive experiments on real world data show that our techniques can drastically improve the response time, without sacrificing precision.”These are my recent Pinboard.in links:
These are my recent Pinboard.in links: