Projects:graph openstack

From IIIS-Systems

Project Name

Detecting OpenStack Faults with a Graph-Based Approach

Project Description

Systems like OpenStack, which integrate many independently developed modules across multiple levels of abstraction, are hard to operate and debug. A major challenge is navigating the complex dependencies and relationships among the states of different modules or subsystems to ensure that these states are correct and consistent. We present a system that captures runtime states and events from the entire OpenStack-Ceph stack and automatically organizes this data into a graph that we call the system operation state graph (SOSG). With SOSG, we can use intuitive graph traversal techniques to solve problems such as reasoning about the state of a virtual machine. Using graph-based anomaly detection, we can also automatically discover hidden problems in OpenStack. We have a scalable implementation of SOSG and evaluate the approach on a 125-node production OpenStack cluster, finding a number of interesting problems.
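To make the two ideas above concrete, here is a minimal Python sketch of a state graph with a traversal query and a toy structural check. It is not the project's implementation: the node names, kinds, states, and the helper functions `build_adjacency`, `related_states`, and `structural_outliers` are all hypothetical, and the outlier rule (compare each node's neighbor kinds against the most common pattern) is only a simple stand-in for graph-based anomaly detection.

```python
from collections import Counter, deque

# Hypothetical state graph: node -> (kind, state). All names are illustrative
# and do not come from a real OpenStack deployment.
NODES = {
    "vm-1": ("vm", "ACTIVE"),
    "vm-2": ("vm", "ACTIVE"),
    "vm-3": ("vm", "ERROR"),
    "host-a": ("host", "up"),
    "host-b": ("host", "up"),
    "vol-1": ("volume", "in-use"),
    "vol-2": ("volume", "in-use"),
    "port-1": ("port", "ACTIVE"),
    "port-2": ("port", "ACTIVE"),
    "port-3": ("port", "DOWN"),
}
# Undirected "depends on / attached to" relationships (assumed for the sketch).
EDGES = [
    ("vm-1", "host-a"), ("vm-1", "vol-1"), ("vm-1", "port-1"),
    ("vm-2", "host-a"), ("vm-2", "vol-2"), ("vm-2", "port-2"),
    ("vm-3", "host-b"), ("vm-3", "port-3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def related_states(adj, start, max_hops=1):
    """Breadth-first traversal: states of all nodes within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(adj.get(node, ())):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return {n: NODES[n][1] for n in hops}


def structural_outliers(adj, kind):
    """Flag nodes of `kind` whose neighbor kinds differ from the most common pattern."""
    signatures = {
        n: tuple(sorted(NODES[m][0] for m in adj.get(n, ())))
        for n, (k, _) in NODES.items()
        if k == kind
    }
    common, _ = Counter(signatures.values()).most_common(1)[0]
    return sorted(n for n, sig in signatures.items() if sig != common)


adj = build_adjacency(EDGES)
print(related_states(adj, "vm-3"))     # states of vm-3 and its direct neighbors
print(structural_outliers(adj, "vm"))  # VMs whose neighborhood looks unusual
```

In this toy data, the traversal from `vm-3` surfaces a port in state `DOWN` next to the failed VM, and the structural check flags `vm-3` because, unlike the other VMs, it has no attached volume. A real deployment would need far richer node types and edge semantics than this sketch assumes.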

Participants

Yong Xiang

Wei Xu

Related Materials

Xy1.jpg

File:XY.pdf

Project Progress

November 15, 2014

Project started

August 5, 2016

In Proceedings of the ACM SIGOPS Asia-Pacific Workshop on Systems (APSys '16), Hong Kong, China, 2016 [BEST PAPER AWARD]

References

[1] https://www.openstack.org/

[2] https://www.rabbitmq.com/

[3] https://libvirt.org/

[4] http://www.linux-kvm.org/page/Main_Page

[5] http://openvswitch.org/

[6] http://ceph.com/

[7] Xu W, Huang L, Fox A, et al. “Detecting large-scale system problems by mining console logs.” Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 2009: 117-132.

[8] Yuan, Ding, et al. “SherLog: error diagnosis by connecting clues from run-time logs.” ACM SIGARCH Computer Architecture News. Vol. 38. No. 1. ACM, 2010.

[9] “Finding Patterns in Static Analysis Alerts: Improving Actionable Alert Ranking.” 2014.

[10] http://neo4j.com/

[11] Gonzalez, Joseph E., et al. “GraphX: Graph processing in a distributed dataflow framework.” 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.

[12] Turner D, Levchenko K, Snoeren A C, et al. “California fault lines: understanding the causes and impact of network failures.” ACM SIGCOMM Computer Communication Review, 2011, 41(4): 315-326.

[13] Mace, Jonathan, Ryan Roelke, and Rodrigo Fonseca. “Pivot tracing: dynamic causal monitoring for distributed systems.” Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 2015.