Burning Bitcoins for Censorship Resistance

 : : : : : : : : : : : : : : v LIST OF FIGURES : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : vi ABSTRACT : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : vii CHAPTER 1. INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : 1 1.1 Scope : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 1.2 Contribution : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 CHAPTER 2. BACKGROUND AND RELEVANT LITERATURE : : : : : 6 2.1 Existing censorship circumvention techniques : : : : : : : : : : : : : 6 2.1.1 DynaWeb : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 2.1.2 UltraReach : : : : : : : : : : : : : : : : : : : : : : : : : : : 8 2.1.3 Message in the bottle : : : : : : : : : : : : : : : : : : : : : : 9 2.1.4 Tor : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10 2.2 Public Key Steganography : : : : : : : : : : : : : : : : : : : : : : : 12 2.3 Bootstraping Mechanism : : : : : : : : : : : : : : : : : : : : : : : : 14 2.3.1 TELEX : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 14 2.3.2 Cirripedde : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 2.4 Bitcoin Network : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 17 2.4.1 Bitcoin Blockchain : : : : : : : : : : : : : : : : : : : : : : : 17 2.4.2 Bitcoin Transactions : : : : : : : : : : : : : : : : : : : : : : 19 2.5 Cryptography Assumptions : : : : : : : : : : : : : : : : : : : : : : 25 2.5.1 Cryptographic Hash Function : : : : : : : : : : : : : : : : : 26 2.5.2 Di_e-Hellman Assumption : : : : : : : : : : : : : : : : : : : 26 CHAPTER 3. CONCEPT : : : : : : : : : : : : : : : : : : : : : : : : : : : : 28 3.1 System Model : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 28 3.2 Threat Model : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 28 3.3 Goals : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 29 3.4 Design : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 CHAPTER 4. FRAMEWORK AND METHODOLOGY : : : : : : : : : : : 32 4.1 Storing Data in Bitcoin Transaction : : : : : : : : : : : : : : : : : : 32 4.2 Encryption Scheme : : : : : : : : : : : : : : : : : : : : : : : : : : : 35 4.3 Implementation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 36 CHAPTER 5. SECURITY ANALYSIS : : : : : : : : : : : : : : : : : : : : : 41 iv Page CHAPTER 6. FUTURE WORK : : : : : : : : : : : : : : : : : : : : : : : : 43 6.1 Associating with existing anti-censorship systems : : : : : : : : : : 43 6.2 Other crypto-currency systems : : : : : : : : : : : : : : : : : : : : : 44 CHAPTER 7. SUMMARY : : : : : : : : : : : : : : : : : : : : : : : : : : : 46 BIBLIOGRAPHY : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 47 v LIST OF TABLES 2.1 List of opcodes [53] : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 22 4.1 MultiSig Transaction: Parameter 2 [43] : : : : : : : : : : : : : : : : : : 33 4.2 MultiSig Transaction: Parameter 1 [43] : : : : : : : : : : : : : : : : : : 34 4.3 Comparison between data storing/hiding techniques : : : : : : : : : : : 37 vi LIST OF FIGURES 1.1 Simple illustration of Censorship Circumvention : : : : : : : : : : : : : 2 2.1 Bitcoin Blockchain : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 18 2.2 Bitcoin Transaction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 21 3.1 Protocol design : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 4.1 Transaction rate per second : : : : : : : : : : : : : : : : : : : : : : : : 39 vii ABSTRACT This thesis proposes a new technique to circumvent Internet censorship by using the bitcoin network. It provides a bootstrapping mechanism for existing circumvention tools, that enable access to the restricted information and communication over the Internet, at the cost of “burning” bitcoins. It deploys the concept of public key steganography to hide data in the bitcoin transactions and make them look as legitimate transactions hence, making it impossible for the censor to detect. It aims at designing an e_cient cryptographic scheme to store messages in the bitcoin blockchain with a strong focus on space e_ciency and implement the scheme to design protocol for steganographic communication. 1 CHAPTER 1. INTRODUCTION The Article 19 of the Universal Declaration of Human rights adopted by the United Nations General Assembly in the year 1948 states that – \Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers” [1]. Today, the invasion of Internet over the information landscape has completely changed the way we seek, receive and impart information. And with this growth of Internet there have been attempts made to control the use of it. The authority who controls the Internet and the censorship of its content can be wide ranging driven by various factors including social, political or economic. A prominent example of political inuence is the censorship of Internet in various countries of the Arab League after the events of Arab Spring [2]. Meanwhile, China has been a strong propagator of Internet censorship on its citizens since 1994 and had deployed a very e_ective tool called the “Great Firewall of China” [3] for that purpose. Free exchange of information and ideas through Internet is also a_ected by the religious ideologies of various authorities. For example, in Saudi Arabia all Internet tra_c _rst goes through a government run _lter which, according to the country’s Internet Services Unit, blocks all material of an “o_ensive or harmful nature to the society, and which violate the tenants of the religion or societal norms” [4]. Apart from this, blocking various tools and applications of the Internet that can be used to assist users in accessing and sharing sensitive material is common in many countries [5], this includes blocking very popular platforms like Youtube, Facebook, Twitter, Google and many more. Despite all these restrictions imposed on the free use of Internet, there exist numerous technologies that can be used under di_erent circumstances to circumvent 2 Figure 1.1. Simple illustration of Censorship Circumvention The _gures shows the various components involved the process of censorship circumvention.There is a user inside the censored network which is trying to access a blocked destination with the help of a forwarder. the censorship. Some of the simple and trivial methods include { cached pages, mirror and archive sites, Web to E{mail services, RSS aggregators, Alternate IP addresses and Domain names, Proxy websites, VPNs, etc.. Other forms of sophisticated and more complex tools for circumventing Internet censorship include encrypted tunnels and proxies, such as Dynaweb [6], Ultrasurf [7] and Tor [8]. Figure 1.1 gives a simple demonstration of how a censorship circumvention works. Here, we have a user whose tra_c is being monitored and blocked by a censor. The user attempts to access a blocked destination by _rst establishing a connection with a forwarder, which is not blocked by the censor and can be accessed by the user. While these designs can be quite e_ective at sneaking client connections past the censor, these systems inevitably lead to a cat-and-mouse game in which the censor attempts to discover and block these services [9]. A classic example of this problem is the conict between TOR and the Great Firewall of China, which in numerous ways, appropriately represents the larger world of censorship and circumvention. A series of counter actions began when Tor- a tool initially meant to 3 be used for providing anonymity, became an e_ective tool for circumvention; And as a result, it was being used to evade the Great Firewall; The _rewall in return, blocked the Tor website and the servers that were part of its network. Tor responded by using mirrors [10], bridges [11] and secret entry servers [12]; the _rewall went a step ahead and started identifying Tor by protocol features. Tor deployed protocol encapsulation; the _rewall started active scanning for Tor servers to which Tor responded by introducing protocols immune to scanning [13]. To overcome the above mentioned problem, this thesis proposes a technique to circumvent Internet censorship using the Bitcoin Network and public key steganography which is based on providing a bootstrapping mechanism for existing circumvention techniques. A similar technique has been previously used by a circumvention tool called Telex [9]. Telex operates as an end-to-middle proxy that has no IP address and is located withing the network infrastructure,e.g. in an ISP, and uses the public key steganography to bypass the censor. Bitcoin maintains a public append-only log called the block chain, which is essentially stored by every full Bitcoin node in the world. This makes transactions in the Bitcoin blockchain di_cult to censor under the assumption that blocking read access to the Bitcoin network comes with (economic) cost for the censor. Furthermore, data written to the blockchain cannot be removed, because it is stored and replicated by a lot of Bitcoin nodes. This thesis proposes to design a cryptographic scheme that can be used to store messages in the bitcoin blockchain. Keeping in mind the lack of space to store data in the block chain, this scheme will give a strong emphasis on space-e_ciency. Since the transaction involve actual money and cost fees, this scheme will also inculcate an interesting trade-o_, i.e. burning bitcoins to get additional storage per transaction. The fact that the bitcoin transactions use the secp256k1 [14] signatures that are made by using ECDSA [15] cryptographic formula will help this scheme to be further used to design protocols for steganographic communication over the blockchain. This steganographic communication can be used in particular for 4 bootstrapping access to other censorship circumvention systems, i.e using the steganographic channel to provide access to other existing censorship circumvention systems so that they cannot be detected by the censor and do not come under under the threat of getting blocked. 1.1 Scope This thesis proposes to design a cryptographic scheme that can be used to store messages in the bitcoin blockchain. Keeping in mind the lack of space to store data in the block chain, this scheme will give a strong emphasis on space-e_ciency. Since the transaction involve actual money and cost fees, this scheme will also inculcate an interesting trade-o_, i.e. burning bitcoins to get additional storage per transaction. The fact that the bitcoin transactions use the secp256k1 [14] signatures that are made by using ECDSA [15] cryptographic formula will help this scheme to be further used to design protocols for steganographic communication over the blockchain. This steganographic communication can be used in particular for bootstrapping access to other censorship circumvention systems, i.e using the steganographic channel to provide access to other existing censorship circumvention systems so that they cannot be detected by the censor and do not come under under the threat of getting blocked. 1.2 Contribution Michael Carl Tschantz, Sadia Afroz and Vern Paxson in their work SoK:Towards Grounding Censorship Circumvention in Empiricism claim that”system designers tend to emphasize censor capabilities that may become important in the future, but not seen in practice today, with little assessment on actual detection techniques used by current censors” [13]. Based on their research it can be concluded that the attacks carried out by real censors target the process of discovering and setting up channels , however, researchers tend to align their 5 research on channel usage. Also, the means adapted by the real sensors include cheap passive monitoring or active probing, contrary to it, the researchers have been digging deep into complex monitoring and tra_c manipulations. In simple words, censors(almost) only try to block identity distribution (e.g. bridge addresses, public keys)and connection establishment but academic work often focuses on all the other complicated aspects. This indeed provides a very strong motivation for our project as our bootstrapping mechanism aims at providing a very e_ective solution to the problem of setting up a channel for censorship free exchange of information over the Internet. 6 CHAPTER 2. BACKGROUND AND RELEVANT LITERATURE 2.1 Existing censorship circumvention techniques There is a list of 7 dimensions for an anti-censorship de_ned in ‘A Taxonomy of Internet Censorship and Anti Censorship’ [16]. These include: _ Availability: The anti-censorship technology should be readily accessible to the users. _ User friendliness: This is an often under-explored dimension [16] as many a times we tend to ignore the large population of users that are not technology-savvy. _ Veri_ability: the user should be able to verify that the software that they are using is not a monitoring tool. _ Scope: modes of communication that are being covered. _ Security: The success of an anti-censorship technology is highly dependent on the level of security it provides. _ Deniability: the user should be able to deny his involvement, if caught. _ Performance: throughput and delay caused by the use of this technology. Apart from these seven dimensions, the goals of an anti-censorship technology should include unobservability and undetectability. Undetectability of an item of interest (IOI) from an attacker’s perspective means that the attacker cannot su_ciently distinguish whether it exists or not [17]. Unobservability of an item of interest (IOI) means: 7 _ Undetectability of the IOI against all subjects uninvolved in it and _ Anonymity of the subject(s) involved in the IOI even against the other subject(s) involved in that IOI [17]. Anonymity of a subject from an attacker’s perspective means that the attacker cannot su_ciently identify the subject within a set of subjects, the anonymity set [17]. In this Section, we discuss some of the most prominent tools that are build upon these dimensions and goals and are being used to circumvent Internet censorship. These include Dynaweb, UltraReach, Tor and TELEX. We look at how these existing tools work and what are their advantages and disadvantages over each other. 2.1.1 DynaWeb DynaWeb FreeGate is an HTTP proxying tool designed speci_cally for circumvention use in China. It is a product of Dynamic Internet Technology Inc. (DIT). DIT was founded originally in 2001 to provide email delivery services to China for U.S. government agencies and NGOs. In 2002, DIT started to provide anti-censorship services under the framework of DynaWeb and became a top contender of the Great Firewall penetration e_ort [18]. The DynaWeb project hosts its own collection of proxy servers and uses a variety of proprietary methods to keep its proxy servers unblocked in the face of aggressive, ongoing attempts to block them. DynaWeb FreeGate provides an easy to use interface that successfully bypasses most types of _ltering in the face of active blocking attempts. Today DynaWeb o_ers the widest range of options for users to access Internet freely, and supports more than 50 million web hits per day on average from Chinese users alone [18]. It is a web-based portal and requires no use of any software, nor any changes to the settings on the user’s system. The DynaWeb needs to work in a very dynamic way in order to avoid the interference by Chinese authorities. They keep a 8 continuous watch on the Dynaweb’s portal websites and block them as soon as they are identi_ed. DynaWeb also tackles the issue of IP blocking and DNS hijacking by maintaining hundreds of mirror sites at any time, with each one of them having a di_erent IP and DNS domain name. To keep users connected to such a dynamic infrastructure, DynaWeb has a variety of channels to keep users updated. For example, a user can send a message to one of DynaWeb’s instant messenger (IM) accounts, and will get an instant reply showing the newest addresses of DynaWeb portals. Similar things are being done with emails. By these many, dynamic channels, DynaWeb outsmarts any attempt to collect all DynaWeb addresses by the censors, because each user receives only a (di_erent) subset of DynaWeb’s addresses [18]. Having said that, DynaWeb is de_nelty not the perfect tools available to us as it has a very poor performance for image heavy sites but okay for simpler text oriented sites. Most importantly, the DynaWeb tool uses an error prone encryption mechanism that fails to encrypt some of its tra_c, allowing snooping by the network and making the tool vulnerable to keyword block _lters. [19] 2.1.2 UltraReach UltraReach has been developing anticensorship technologies since the year 2002. It uses its unique Global Internet Freedom platform to expand its o_erings and its anticensorship tool is called UltraSurf. UltraSurf also uses HTTP proxying and is designed speci_cally for circumvention use in China. More commonly known by its Chinese name ‘Wujie’- which means borderless, it has become very popular among the Internet users in China. UltraSurf provides an easy to use interface that is able to access all content in the face of active blocking attempts. UltraSurf’s performance is the best of any tool tested in _ltering countries, the only tool to display okay speed for both image heavy and simple, text oriented sites [19]. UltraSurf has its own criteria for blocking o_ensive material and for that purpose uses its own _lters. UltraSurf can be recommended for widespread use as the best 9 performing of all the tested tools, though users concerned about anonymity should be warned to disable browser support for active content. UltraSurf has evolved from an enduring conict between GreatFirewall and UltraReach and is considered to be one of the most robust tools available today. The freely available software has been analyzed, mutilated and spoofed, and the supporting network infrastructure has been constantly attacked. Without doubt, these facTors have accelerated UltraSurf reaching its level of sophistication and fame [18]. The latest version of UtraSurf, deploys a complex proxy system which provides thorough transparency and uses a high level encryption but can only be used on Microsoft Internet Explorer Platform. It has strong support for load balancing and fault tolerance, and it even employs a decoying mechanism to thwart any tracing e_ort of its communication with its infrastructure. [18] 2.1.3 Message in the bottle “Message In A Bottle: Sailing Past Censorship” [20] follows a very di_erent approach to circumvent Internet censorship. If A is a user inside the censored network and B is the blocked destination it wants to communicate to, then the protocol is designed in such a way that A knows the least about B and there is no rendezvous point between them. This is done to prevent the censor from discovering the entry points, which is a very fundamental problem faced by existing censorship-resistant tools. The protocol uses the concepts of blog pings [21] and digital photo metadata [22] to circumvent censorship. Blog ping is a mechanism which is used to notify the server of updated or new content on a blog. These pings are messages sent to the centralized network server from a blog. Introduced in 2001, its popularity has been on a rise and had lead to the development of various protocols that compete for performance and scalability [20]. Some of these include FriendFeed’s SUP [23] , Google’s PubSubHubbub [24] and rssCloud [25]. Blog pings provide an e_ective way for the 10 search engines to index new content in real time. Hence, the use of pings helps the blog writers increase their exposure. As a result, almost all the blogging platforms support this mechanism. Digital Photo Metadata is information that Digital cameras have embedded within each photo. This metadata keeps records of a variety of information, including the condition of the shot, GPS coordinates, user comments, copyright, post-processing, and thumbnails. Using these two components, The message in the bottle (MIAB) protocol works as follows [20]: _ A authors a blog post of arbitrary content, trying to keep the content as innocuous as possible. _ A snaps one or more photos to include in the post. _ A uses the MIAB client software to embed a message M into the photos. The message is hidden using shared-key steganography [26]. The shared key is identi_ed as Ks and is chosen arbitrarily by A. _ Ks is encrypted with the public key of B Kp. This cipher-text is hidden in the photos’ metadata, using tags that appear inconspicuous to the Censor. _ A publishes the blog post, containing the processed photos. Alice can choose the blog arbitrarily, provided it supports blog pings. _ The blog emits a ping to some ping servers. _ Meanwhile, B is moniToring some of the ping servers, looking for steganographic content encrypted with his public key. Within minutes, he discovers A’s post, and decrypts the message. _ Bob reads the message, and acts upon its content. 2.1.4 Tor Tor is an anonymity tool that uses randomized, encrypted re-routing to hide the combination of the source and destination of each data packet from all parts of 11 the network, including the individual proxies. Tor randomly assigns a set of routers to each connection. No single Tor proxy knows the origin, the destination, or even what other proxies are involved in the network other than the ones immediately connected to it. Even though the primary purpose of the tool is to provide anonymity for the user, the tool also is an e_ective circumvention tool, since hiding of source and destination and encrypting the content e_ectively prevents _ltering. The Tor re-routing servers are hosted by volunteers. Anyone can host a Tor server { the anonymity of the connection relies not on the trustworthiness of the individual servers but rather on the network design, which prevents a given router from knowing both the origin and the destination or even which other routers it would need to cooperate with to get that information. Tor provides an easy to use interface that allows the user to bypass all tested forms of _ltering, though the performance for international connections is very poor for both image intensive and simpler text oriented sites [19]. Tor provides strong anonymity only if the user is careful to submit data to HTTPS protected servers. Tor can be recommended for widespread circumvention use, but only for expert users concerned about anonymity. These tools have so far managed to stay ahead of the sensors. However, there is a worry that such systems will not be able to withstand continued research and development on the part of censors (e.g., Sybil attacks for proxy IP discovery) [9]. The Great Firewall of China has been making continuous e_orts to stop Tor services in China. It took the _rst action against Tor in the year 2008, when it blocked the Tor website- “www.Torproject.org” [27]. In late 2009, the Great Firewall also blocked the IP addresses of direcTory authorities and relays [28] by using very simple measures – _rstly, download the list of all relays, which is publicly available, and add their addresses to a blacklist and then, dynamically block any access to those addresses whenever they are observed in a tra_c stream [13]. Tor responded to this by introducing Bridges [11] and Pluggable tranports [29]. Tor Bridges are Tor relays that have not been listed in the public direcTory. The ISP will mostly target and _lter the connection to all the known Tor relays and probably 12 won’t be able block the bridges as these are not listed in a public direcTory. A typical bridge entry consists of three parts: IP address of the bridge, Port number on which it is operating, and a unique identi_er of the bridge, often known as the _ngerprint of the bridge. Even though Tor bridges have provided a good alternative when the Tor relays have been blocked by the censor. But over the last few years, the continuous research and e_orts on the part of censors have enabled them to _nd new ways to block the bridges as well. They usually do this by installing special boxes at ISPs that peek into network tra_c, often called deep Packet Inspection [30], and detect Tor; when Tor is detected they block the tra_c ow [11]. The Great Firewall went after the bridges, enumerating them from the centralized bridge database one by one [31]. In order to tackle suck level of sophisticated censorship, Tor went a step ahead and started manipulating all the tra_c between the user and the _rst hop by using Pluggable Transports. These manipulate the tra_c in such a way that it is not identi_ed as Tor connection and is able to trick the censor/ISP. As a result, the censor cannot decide if the tra_c belongs to a Tor connection and is less likely to block it. However, even the use of pluggable transports is not the perfect solution as these are also vulnerable to detection. In the past, obfs and obfs2 were promoted as safe transports but these are now deprecated and were replaced by obfs3, scramblesuit, fte, and obfs4 [11]. Sadly, pluggable transports are not immune to detection, if a censor is given enough time. In the past, obfs and obfs2 were promoted as safe transports. These are now deprecated and were replaced by obfs3, scramblesuit, fte, and obfs4. 2.2 Public Key Steganography Steganography refers to the problem of sending messages hidden in \innocent looking” communications over a public channel so that an adversary eavesdropping on the channel cannot even detect the presence of the hidden messages [32]. 13 De_nition 2.2.1 Public-key Steganography: A public-key stegosystem is a triple of probabilistic algorithms S = (SG, SE, SD). SG(1k) generates a key pair (PK, SK) 2 PK x SK. SE takes a (public) key PK 2 PK, a string m 2 f0; 1g(the hiddentext), and a message history h. SE also has access to a channel oracle for some channel C, which can sample from Ch for any h. SE(PK, m, h) returns a sequence of documents s1; s2; :::; sl (the stegotext) from the support of Clh . SD takes a (secret) key SK 2 SK, a sequence of documentss1; s2; :::; sl, and a message history h, and returns a hiddentext m. Additionally, for every polynomial p there must exist a negligible _ such that 8m 2 f0; 1gp(k) : Pr(PK;SK) SG(1k)[SD(SK; SE(PK; m; h); h) = m] _ 1 _(k), where the randomization is also over any coin tosses of SE, SD, SG and the oracle to Ch. The de_nition above is the _rst important contribution in this _eld, which can be traced back to the model for steganography by Cachin [33] and some similar models [34{36]. However, these frameworks were restricted in a way that performing steganography between two parties was not possible without a shared key. Hopper, Langford, and von Ahn [37] came up with a theoretical framework that was based on computational security that addressed only the shared-key settings. Later Hopper and von Ahn [32] came up with a formal framework to prove that public key steganography is practically possible. They show that the public key steganographic key exchange is possible under the Decisional Di_e-Hellman [38] assumption and provide a protocol that is secure in the random oracle model against adversaries that have access to a decoding oracle. The de_nition given by them is stated below. De_nition 2.2.2 Stegosystem: A steganographic protocol, or stegosystem, is a pair of probabilistic algorithms S = (SE, SD). SE takes a key K 2 f0; 1gk, a string m 2 f0; 1g* (the hiddentext), a message history h, and an oracle M(h) which samples blocks according to a channel distribution Cbh . SEM(K, m, h) returns a sequence of blocks c1 k c2 k ::: k cl (the stegotext) from the support of Cl_b h . SD takes a key K, a 14 sequence of blocks c1 k c2 k ::: k cl, a message history h, and an oracle M(h), and returns a hiddentext m. There must be a polynomial p(k) > k such that SEM and SDM also satisfy the relationship: 8m; jmj < p(k) : Pr(SDM(K; SEM(K; m; h); h) = m) _ 2 3 , where the randomization is over any coin tosses of SEM; SDM; and M. 2.3 Bootstraping Mechanism As mentioned earlier, our scheme aims to provide a bootstrapping mechanism for anti-censorship systems. Here we have a look at two systems – Telex [9] and Cirripede [39] which use the concept of stegnography and provide an e_ective mechanism to allow access to blocked websites by working at the ISP level. 2.3.1 TELEX Telex operates in the network infrastructure at any ISP between the censor’s network and non-blocked portions of the Internet. Unlike following the typical \end-to-end” approach, it is an \end-to-middle” proxy with no IP address, located within the network infrastructure. Clients invoke the proxy by using public-key steganography to \tag” otherwise ordinary sessions destined for uncensored websites [9]. Telex applies this idea to a widely used encrypted transport, such as TLS, and carefully avoiding observable deviations from the behavior of non-proxied connections and can construct a service that allows users to robustly bypass network-level censorship without being detected. To test functionality of Telex on a real censor’s network, a Telex client was ran on a PlanetLab [40] node located in Beijing and attempted connections to each of the Alexa top 100 websites [41] using the model Telex station located at the University of Michigan. As a control, _rst these sites were loaded without using Telex and apparent censorship behavior for 17 of them was noted, including 4 from the top 10: facebook.com, youtube.com, 15 blogspot.com and twitter.com. The blocking techniques that were observed included forged RST packets, false DNS results, and destination IP black holes, which are consistent with previous _ndings [42] . Telex was able to load all 100 sites [9]. 2.3.2 Cirripedde Cirripede is an anti-censorship system that is used for unobservable communication with the blocked destinations. Similar to Telex, it is also deployed at the ISPs. It works by intercepting the connections to the allowed destinations, and then redirecting it to the blocked destination requested by the user. The concept of steganograhy is used here as well and the data is encoded in such a way that it is impossible for the censor to distinguish these communications from normal ones, without having the master secret key. The use of public key cryptography also ensures that that there is no need to share any secret information with the Cirripede user. It is a scalable system, which is designed to handle large volumes of tra_c and at the same time ensuring minimal overhead on ISPs and not disrupting the day to day tra_c. This allows Cirripede proxies to be strategically deployed at central locations, making access to Cirripede very di_cult to block. The following are the main components of the Cirripede anti-censorship system [39]: 1. Client(C): The user who wants to access a covert destination(CD). 2. Warden ISP: Network provider hosting C. 3. Covert Destination(CD): a website, access to which is blocked. 4. Overt destination(OD): any website, access to which is allowed. 5. participating ISP: an ISP who participates in Cirripede by deecting network tra_c to Cirripede servers. This ISP must be on the (forward) network path from C to OD. 16 6. Cirripede registration server (RS): it is used for registering clients who wish to avail Cirripede service. 7. Cirripede service proxy (SP): : a server that connects with CD and proxies communications with it over the overt communication stream between C and OD. 8. Deecting router (DR): an Internet router, owned by a participating ISP, that deects tra_c from a registered client C to the service proxy SP. The entire process that Cirripede follows consists of three major operations [39]: 1. Client registration: Before a client starts using Cirripede, he need to register _rst with Cirripede. The client uses a covert channel in TCP headers to show its willingness to register and establish a secret key. This is done because the tra_c is constantly monitored by the warden. The key generated is shared with Cirripede and is used later for communication. DR mirrors part of all TCP tra_c it sees to the registration server RS, which detects the covert registration message and instructs DR to redirect HTTPS tra_c from C to the service proxy SP. 2. Cover connection:C makes a TCP connection with the overt destination OD, which is deected by DR to SP. 3. Covert communication: C performs a TLS handshake with OD, which SP interposes upon. After the handshake, SP disconnects from OD and takes over the TLS connection, which is used to tunnel communication between C and CD. 17 2.4 Bitcoin Network In this section, we have a look at how the bitcoin network operates. We shed a light on two of its most important components- the bitcoin blockchain and the bitcoin transactions. Out of the two, the later one is of utmost signi_cance for us. 2.4.1 Bitcoin Blockchain The blockchain is Bitcoin’s public ledger, which is record of transactions maintained in an orderly manner with timestamps. Each full node in the Bitcoin network independently stores a block chain containing only blocks validated by that node. The nodes are considered to be in consensus all of them have the same blocks in their block chain, The validation rules these nodes follow to maintain consensus are called consensus rules. This section describes many of the consensus rules used by Bitcoin Core. Every block must include one or more transactions. The _rst one of these transactions must be a coinbase transaction, also called a generation transaction, which should collect and spend the block reward (comprised of a block subsidy and any transaction fees paid by transactions included in this block). All transactions, including the coinbase transaction, are encoded into blocks in binary raw transaction format. The raw transaction format is hashed to create the transaction identi_er (txid). From these txids, a merkle tree is constructed by pairing each txid with one other txid and then hashing them together.If there are an odd number of txids, the txid without a partner is hashed with a copy of itself.The resulting hashes themselves are each paired with one other hash and hashed together. Any hash without a partner is hashed with itself. The process repeats until only one hash remains, the merkle root [43].A Merkle tree is a hash based data structure that is a generalization of the hash list. It is a tree structure in which each leaf node is a hash of a block of data, and each non-leaf node is a hash of its children. Typically, Merkle trees have a branching factor of 2, meaning that each node has up to 2 children [44]. In case of blockchain, the leaves are almost always 18 Figure 2.1. Bitcoin Blockchain The _gure illustrates a simpli_ed version of a block chain. A block of one or more new transactions is collected into the transaction data part of a block. Copies of each transaction are hashed, and the hashes are then paired, hashed, paired again, and hashed again until a single hash remains, the merkle root of a merkle tree [43] transactions from a single block. The root node of a merkle tree is called a merkle root, a descendant of all the hashed pairs in the tree. Block headers must include a valid merkle root descended from all transactions in that block [45]. Proof-of-work: Bitcoin uses the concept of proof-of-work as a requirement for adding new sets of transactions to the transaction database. Proof-of-work is basically data that is di_cult to produce, in terms of cost and time, but easy to verify. Bitcoin uses the Hashcash [46] proof of work system. Hashcash proofs of work are used in Bitcoin for block generation. In order for a block to be accepted by network participants, miners must complete a proof of work which covers all of the data in the block. The di_culty of this work is adjusted so as to limit the rate at which new blocks can be generated by the network to one every 10 minutes. Due to 19 the very low probability of successful generation, this makes it unpredictable which worker computer in the network will be able to generate the next block [47]. The proof of work used in Bitcoin takes advantage of the apparently random nature of cryptographic hashes. A good cryptographic hash algorithm converts arbitrary data into a seemingly-random number. If the data is modi_ed in any way and the hash re-run, a new seemingly-random number is produced, so there is no way to modify the data to make the hash number predictable [43]. New blocks will only be added to the block chain if their hash is at least as challenging as a di_culty value expected by the consensus protocol. 2.4.2 Bitcoin Transactions A bitcoin is a chain of signed messages. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership [48]. Each transaction is constructed out of several parts which enable both simple direct payments and complex transactions. Each transaction has at least one input and one output. Each input spends the bitcoins paid to a previous output. Each output then waits as an Unspent Transaction Output (UTXO) until a later input spends it [43]. As shown in Fig 2.1, a transaction consists of four main parts: _ Version: There exist di_erent set of rules on which a transaction is validated. Version is a four-byte number, pre_xed to the transaction, which indicates the peers which set of rules to follow for that particular transaction. This is useful in the way thay new rules can be created for future transactions without invalidating the previous transactions. _ Inputs: refers to the data that is fed into the transaction. It further has three di_erent _eld: 20 1. Outpoint: A data structure used to refer to a particular transaction output, consisting of a 32-byte TXID and a 4-byte output index number (vout) [49] 2. Signature Script: Data generated by a spender which sets the conditions that must be ful_lled for the satoshis to be spent. This is always used as variables to satisfy a pubkey script. Signature Scripts are called scriptSig in code. [50] 3. Sequence Number: It is a number that was intended to allow an uncon_rmed time-locked transaction to be updated before being _nalized. However, currently its only use is to disable locktime in a transaction. [51] A number intended to allow uncon_rmed time-locked transactions to be updated before being _nalized; not currently used except to disable locktime in a transaction _ Outputs: The ouput of an transaction consists of two _elds: 1. Value _eld: It speci_es the amount of satoshis intended to be transfer via the transaction. 2. Pubkey Script: A script included in outputs which sets the conditions that must be ful_lled for those satoshis to be spent. Data for ful_lling the conditions can be provided in a signature script. Pubkey Scripts are called a scriptPubKey in code. [52] _ Locktime: Locktime indicates the earliest time a transaction can be added to the blockchain. This allows the users to create transactions which are time-locked and only become valid in the future. Hence, proving the signers a scope to cancel the transaction if they change their minds. A user must _rst generate a private/public key pair before a transaction cab be created. Bitcoin uses the Elliptic Curve Digital Signature Algorithm (ECDSA) [15] with secp256k1 [14] curve. Secp256k1 private keys are 256 bits of 21 Figure 2.2. Bitcoin Transaction The _gure above shows the structure of a bitcoin transaction, consisting of four main parts- version, inputs, outputs and locktime. The outputs of one transactions are used as inputs for the next transaction. [43] random data. A copy of that data is deterministically transformed into a secp256k1 public key. Because the transformation can be reliably repeated later, the public key does not need to be stored [43]. After this, the public key is cryptographically hashed. There is no need to store this public key hash as well, as it can also be reliably repeated later. The hash shortens and obfuscates the public key, making manual transcription easier and providing security against unanticipated problems which might allow reconstruction of private keys from public key data at some later point [43]. Public key hashes are always sent encoded as Bitcoin addresses, which are base58-encoded strings containing an address version number, the hash, and an error-detection checksum to catch typos. The address can be transmitted through any medium, including one-way mediums which prevent the spender from communicating with the receiver, and it can be further encoded into another format [43]. 22 Table 2.1. List of opcodes [53] Opcode Description OP RIPEMD160 The input is hashed using RIPEMD-160. OP SHA1 The input is hashed using SHA-1. OP SHA256 The input is hashed using SHA-256. OP HASH160 The input is hashed twice: _rst with SHA-256 and then with RIPEMD-160. OP HASH256 The input is hashed two times with SHA-256. OP CODESEPARATOR All of the signature checking words will only match signatures to the data after the most recently-executed OP CODESEPARATOR. OP CHECKSIG The entire transaction’s outputs, inputs, and script are hashed. The signature used by OP CHECKSIG must be a valid signature for this hash and public key. If it is, 1 is returned, 0 otherwise. OP CHECKSIGVERIFY Same as OP CHECKSIG, but OP VERIFY is executed afterward. OP CHECKMULTISIG Compares the _rst signature against each public key until it _nds an ECDSA match. OP CHECKMULTISIGVERIFY Same as OP CHECKMULTISIG, but OP VERIFY is executed afterward. Bitcoin transactions use a scripting system. Much like Forth [54], these scripts are also very simple, stack-based and are processed from left to right. A script is recorded with each and every transaction and is nothing but a list of instructions that describe how the next person wanting to spend the bitcoins being transferred can gain access to them [53]. The script included in a bitcoin transaction restricts the future spending of the bitcoins by asking the spender to provide the following two things: 1. A public key, which is basically the destination address, after it is hashed. 2. A signature as a proof of the private key corresponding to the public key just provided. 23 The scripting system incorporated by bitcoin transaction works in a very exible way, allowing the spender to change the required perimeters(for a successful transaction) within the script. A transaction is valid if nothing in the combined script triggers failure and the top stack item is true (non-zero). The party who originally sent the Bitcoins now being spent, dictates the script operations that will occur last in order to release them for use in another transaction. The party wanting to spend them must provide the input(s) to the previously recorded script that results in those operations occurring last leaving behind true (non-zero) [53]. A script uses various ‘words’, commonly known as opcodes to perform actions. A list of commonly used cryto opcodes is presented in table 2.1. Based on the type of script, there exist four di_erent types of standard bitcoin transaction: _ Pay-To-Public-Key-Hash (P2PKH): The most commonly used transaction type is the common form of the standard Pay-To-Public-Key-Hash (P2PKH) transaction type. When spending a UTXO, a signature script is created, which is a collection of data parameters which satisfy the conditions placed in the previous output’s public key script. Public key scripts and signature scripts then combine secp256k1 public keys and signatures with conditional logic, creating a programmable authorization mechanism [43]. The public key hash ,i.e the address always starts with 1. The format of the transaction is as follows: Pubkey script: OP DUP OP HASH160 <PubKeyHash> OP EQUALVERIFY OP CHECKSIG Signature script: <sig> <pubkey> _ Pay To Script Hash (P2SH): Instead of sending sending transactions to public key hash, these allow transactions to be sent to a script hash, which is similar to public key hash but starts with 3. To spend bitcoins sent via P2SH, the recipient must provide a script matching the script hash and data which makes the script evaluate to true. Using P2SH, you can send bitcoins to an 24 address that is secured in various unusual ways without knowing anything about the details of how the security is set up. You just send bitcoins to the 34-character P2SH address. The recipient might need the signatures of several people to spend these bitcoins, or a password might be required, or the requirements could be completely unique [55]. Each of the standard pubkey scripts can be used as a P2SH redeem script, but in practice only the multisig pubkey script makes sense until more transaction types are made standard [43]. The format of the transaction is as follows: Pubkey script: OP HASH160 <Hash160(redeemScript)> OP EQUAL Signature script: <sig> [sig] [sig…] <redeemScript> _ Multisig: Multisig or Multi-signature transactions are those which require more than one key to authorize a transaction. In multisig pubkey scripts, called m-of-n, m is the minimum number of signatures which must match a public key; n is the number of public keys being provided [43]. The format of the transaction is as follows: Pubkey script: <m> <A pubkey> [B pubkey] [C pubkey…] <n> OP CHECKMULTISIG Signature script: OP 0 <A sig> [B sig] [C sig…] _ Null Data: These transactions are used for the purpose of adding arbitrary data to a provable unspendable public key script so that the full nodes don’t have to store any data in their database. Current rules by consensus allow null data outputs up to the maximum allowed pubkey script size of 10,000 bytes provided they follow all other consensus rules, such as not having any data pushes larger than 520 bytes [43]. The format of the transaction is as follows: Pubkey Script: OP RETURN <0 to 40 bytes of data> There is no signature script for null data transactions as these are unspendable. 25 2.5 Cryptography Assumptions De_nition 2.5.1 Negligible Function: We call a function _ : N ! R negligible if for every positive polynomial p(_) there exists an N such that for all n > N, _(n) < 1 p(n) A one-way function is a function that is easy to compute but hard to invert. However, there are several de_nition of a one-way function. De_nition 2.5.2 Strong one-way function: A function f : f0; 1g*!f0; 1g* is called (strongly) one-way if the following two conditions hold: 1. Easy to compute: There exists a (deterministic) polynomial-time algorithm A such that on input x algorithm A outputs f (x) (i.e., A(x) = f (x)). 2. Hard to invert:For every probabilistic polynomial-time algorithm A, every positive polynomial p(_), and all su_ciently large n’s, Pr[A'(f(Up),1n) 2 f 1(f(Un))] < 1 p(n) De_nition 2.5.3 Weak one-way function: A function f : f0; 1g*!f0; 1g* is called a weakly one-way if the following two conditions hold: 1. Easy to compute: There exists a (deterministic) polynomial-time algorithm A such that on input x algorithm A outputs f (x) (i.e., A(x) = f (x)). 2. Slightly Hard to invert: There exists a polynomial p(_) such that for every probabilistic polynomial-time algorithm A and all su_ciently large n’s,, Pr[A'(f(Up),1n) =2 f 1(f(Un))] > 1 p(n) 26 2.5.1 Cryptographic Hash Function A cryptographic Hash function(h: X !Y) is a function that maps a message of an arbitrary length(x) to a m-bit output(y) with the following security requirements: _ Pre-image resistant (one-way): if given y2Y it is computationally infeasible to _nd a value x2X such that h(x) = y _ 2nd Pre-image resistant (weak collision resistant): if given x2X it is computationally infeasible to _nd a value x’2X, such that x’6=x and h(x’) = h(x) _ Collision resistant (strong collision resistant): if it is computationally infeasible to _nd two distinct values x’,x 2 X, s.t. h(x’) = h(x) 2.5.2 Di_e-Hellman Assumption The security of Di_e-Hellman [38] is based on three hard problems: _ Discrete Log Problem: Let G be a multiplicative group and for g 2 G, let hgi be a cyclic subgroup generated by g. The discrete logarithm problem for group G is stated as: Given g 2 G and a 2 hgi, compute an integer x such that gx= a _ Computational Di_e-Hellman problem is stated as: Given g 2 G and (ga,gb), where (a, b) 2 hgi, compute gab _ Decision Di_e-Hellman problem is stated as: Given g 2 G, distinguish (ga,gb,gab) from (ga,gb,gc), where (a, b, a) 2 hgi and are chosen randomly and independently. 27 In addition to these three problems, these is a new class of problems de_ned by Okamoto and Pointcheval [56], called the Gap problem. The Gap Di_e Hellman problem is stated as: Given g 2 G and (ga,gb), compute gab with the help of Decision Di_e Hellman oracle (which answers whether a given quadruple of (g,ga,gb,gc) is a Di_e-Hellman quadruple or not) 28 CHAPTER 3. CONCEPT 3.1 System Model Bitcoin network is a decentralized payment system that maintains a public ledger in a distributed manner. The ledger is maintained by participants, whom we call \miners” and whose identity is anonymous. These miners execute a protocol that maintains and extends a distributed data structure { block chain. It is required for the miners to solve a \proof of work” which essentially amounts to brute-forcing a hash inequality based on SHA-256, in order to generate new blocks for block chain. Miners receive bitcoins as rewards for maintaining the block chain. Apart from miners, there are other participants who use the bitcoin network to spend or receive bitcoins. Our systems consists of the miners and users, inside and outside of the censored network. 3.2 Threat Model The adversary in this case is the censor that represents the authority that holds the power to inhibit access to online information and communication. This can be achieved by IP and DNS blacklisting as well as blocking connections based on the observed content.Bitcoin maintains a public append-only log (the blockchain), which is essentially stored by every full Bitcoin node in the world. This makes transactions in the Bitcoin blockchain di_cult to censor under the assumption that blocking read access to the Bitcoin network comes with economic cost for the censor. Hence we assume that this economic costs provides some motivation for the censor for connecting to the internet all. Furthermore, it is ensured that the censor 29 has no control over the data in the block chain as data written to the block chain cannot be removed, because it is stored and replicated by a lot of Bitcoin nodes. Along with that, the censor is in control of the entire infrastructure of the network and has the ability to eavesdrop, block, alter or inject tra_c in between. Although the censor is subjected to realistic constraints while performing such activities. Another general assumption taken into account is that the end hosts within the network completely abide to their user’s commands and the censor has no control over them. Also, the supremacy of the censor is limited outside the scope of the network and it has absolutely no control over any external entities including the bitcoin network and bitcoin tansactions. 3.3 Goals Our protocol should satisfy the following properties: Space-e_cient Given the fact that Bitcoin transactions have a very less space to store any arbitrary data, the protocol should allow to store and send as much useful information as possible in a bitcoin transaction. Con_dentiality As per the de_nition by ISO “Con_dentiality is a characteristic that applies to information. To protect and preserve the con_dentiality of information means to ensure that it is not made available or disclosed to unauthorized entities. In this context, entities include both individuals and processes” [57]. In accordance with this de_nition, our protocol should be able to store and transmit data in such a way that the transaction looks like any other bitcoin transaction and and the data in the transaction should not be available or disclosed to the censor. Availability The sensor should not be able to deny the ow of information through this protocol. Since the protocol stores the data in the bitcoin transactions, it becomes practically impossible for the censor to block our information from passing through it. 30 Figure 3.1. Protocol design The _gure illustrates the working of our protocol in context to Figure 1.1. The user here aims to reach a blocked destination with the help of an anti censorship system. By hiding the data in the bitcoin transactions the user is able to bypass the censor and further establish a connection to an anti censorship system(in this case Tor). Undetectability The data should be hidden in such a way that the censor is not able to distinguish between and normal transaction and transaction storing the data. Easy to use The protocol should not require the user to perform tasks that are di_cult to execute and hinder with normal network or system operations. 3.4 Design The protocol needs to be designed keeping in mind all the constraints imposed by the threat model and also ful_lling all the goals mentioned in the previous section. Figure 3.1 shows an overview of the design as the steps proceed as follows: _ The user, inside the censored network, uses the bitcoin network and creates a new bitcoin transaction. _ This transaction contains the data that is needed to establish a connection to an anti censorhip system. The data is encrypted and then stored in the 31 transaction by the user. The details of storing data have been explained in section 4.1. _ This transaction is now on the blockchain and is publicly accessible. _ There is forwarder outside the censored network, which extracts the hidden data from the transaction and decrypts it. _ The forwarded then uses the credentials of the TOR bridge to create a connection to it. 32 CHAPTER 4. FRAMEWORK AND METHODOLOGY In this Chapter, we discuss the three steps on which our protocol is based upon. First, we ananlyze the bitcoin transaction to search for possible _elds for storing/hiding our data. Then, we explain the encryption algorithm that we will use to encrypt our data and analyze how it suits for our purpose. At last, we provide a detailed comparison of the four techniques that can be used to send our data. 4.1 Storing Data in Bitcoin Transaction As shown in _g 3.2, the bitcoin transaction consists of an input and an output _eld. The input is something that it receives from the previous transactions but the output is something which it generates itself. Hence, this output _eld provides us with the best options to store or hide our data as a part of the bitcoin transaction. In this section, we try to explore the _elds in a bitcoin transaction that can o_er us a space e_cient and a con_dential way of storing our data. 1. The value _eld: Every bitcoin transaction has a value _eld where the amount of bitcoins(BTC) being spent is speci_ed. The length of this _eld is 8 bytes. Although very small, this space can be used to store some arbitrary data. The data in value _eld is stored as signed integer type with width of exactly 64 bits with no padding bits and using 2’s complement for negative values,i.e.int64 t. It is possible to use 3 bytes from this 8 byte _eld, if we are willing to burn 0.16 BTC (aprox. $117)1. This is the maximum amount one will have to burn to get the space of 3 bytes. 1Coversion rate of 1 BTC = $732.60 as on 11/28/2016 33 2. Public Key: Every bitcoin transaction has the bitcoin address of the receiver. This address is the hash of the public key of the receiver. If we replace this address with some arbitrary data that looks similar to the address, the transaction will be considered a valid transaction, however, the bitcoins will not be received by any one. The actual size of the key is 65 bytes, compressed to a size of 33 bytes. Bitcoin address = version + RIPEMD-160(SHA-256( Public Key )) + checksum In a standard “Pay-to-Pubkey hash” transaction, the key is hashed using RIPEMD 160 (SHA 256) to base58 check encoding. The length of the hashed key is 25 bytes, where the _rst byte is always 1 and the last four bytes are the checksum. This gives us a total of 20 bytes to store data. 3. MultiSig transactions:A multisig transaction, referred to as m-of-n, requires a minimum of m signatures to match from n public keys that are provided. If we have m valid keys to match with m signatures, then we can use the remaining (n-m) keys to store our data. The key size here is 33 bytes as it is not hashed rather they are in hexadecimal form. Therefore, using the most basic 2-of-3 multisig transaction provides us with a storage space of 33 bytes. Here is an example of multisig transaction from the bitcoin develepor’s guide: [43] Table 4.1. MultiSig Transaction: Parameter 2 [43] Name Type Presence Description Keys or addresses array Required (exactly 1) An array of strings with each string being a public key or address Keys or addresses string Required (1 or more) A public key against which signatures will be checked. 34 Table 4.2. MultiSig Transaction: Parameter 1 [43] Name Type Presence Description Required Number (int) Required (exactly 1) The minimum (m) number of signatures required to spend this m-of-n multisig script Multisig outputs have two parameters, the minimum number of signatures required (m) and the number of public keys to use to validate those signatures. This is called m-of-n, and in this case we’ll be using 2-of-3. First we need to create three P2PKH [58] addresses, but in this case and address cannot be used to create an output thus, we need to provide full public key. Then a multisig transaction is initiated using 2 parameters – the number (m) of signatures required and a list of addresses or public keys as described in table 4.1 and 4.2 4. Null-data transactions: In 2013, the bitcoin developers introduced an entirely di_erent type of transaction into the blockchain. This new type of transaction called the “Nulldata transaction” provided an option to add additional data to the bitcoin blockchain by incurring the basic transaction fee. It uses the script OP RETURN which accepts an input of 40 bytes. The format of the transaction is as follows: Pubkey Script: OP RETURN <0 to 40 bytes of data> This storage space of 40 bytes can be used to add any arbitrary data into the blockchain and this data is completely secured just like the data in other standard transactions. However, what separates such transactions from other standard transactions is that the output of transactions using OP RETURN is unspendable. Hence, the OP RETURN output bytes from these transactions are 35 not stored in the UTXO set and the value in the amount _eld of these transactions is set to zero. 4.2 Encryption Scheme The fact that we have very less space to store our data, creates the need to deploy a very space e_cient encryption scheme with short ciphertexts. The public-key encryption schemes with the shortest ciphertexts are typically those where the ciphertext contains a group element (i.e.,some point on an elliptic curve) and a random-looking bitstring in the length of the message. Hiding an elliptic curve point in a random-looking bitstring is not trivial. But as Bitcoin transactions already contain elliptic curve points, it becomes easy for us to hide our elliptic curve point there. This plays to our advantage as we do not need to put in additional e_ort to hide the group element. One such space e_cient encryption scheme that caters to our needs is the “Miniature CCA2 PK Encryption” proposed by Xavier Boyen. [59] This encryption scheme has a very simple algebraic structure and is secure under the Gap Di_e-Hellman assumption in the random-oracle model. This scheme uses ElGamal one-time pads, homomorPhically related to the same secret decryption key, to blind the message twice and hence generates a ciphertext having no explicit redundancy. This is because it allows to construct the second key from the _rst key without having to include any inforamtion about it. [59]. the encryption scheme is as follows: Let _ and   be two random oracles, _ be collision resistant function, _ 2 N be an arbitrary security parameter, G = hUi be a cyclic prime-order group, generated by U, of prime order p, such that 22_ 1 < p < 22_+1, Fp be the _nite _eld of size p, F_p = Fp nf0g denote its multiplicative group of order p – 1, 36 M = f0; 1gl be the set of all bit strimgs of length l, for any _xed l _ 2_ _: M! F_p be an arbitrary injection or collision resistant hash function. : G _ G !M be two cryptograPhic hash functions. Key generation: Draw a secret random exponent s 2 F_p , and calculate V  Us The public key is Pk   (U,V) 2 G2. The private key is Sk   s 2 F_p . Encryption: Given Pk and plaintext Msg 2M, pick a randomizer r 2$ F_p , and let, A   Vr B    (A)_Msg C   Vr=_(B) D   Ur=_(B) E   _(D,C) _ B Decryption: Given Sk and a ciphertext Ctx = (_D ; _ C), check that 1 6= _D 2 G, and let _ C   _D Sk _B _E _ _(_D ; _ C) _ A   _ C_(_B ) _M _B _  (A_) The decrypted plaintext is _ Msg = _M 2M 4.3 Implementation As mentioned previously, this protocol to store data in bitcoin transactions acts as a bootstrapping mechanism for existing circumvention techniques. Section 4.1 discussed various techniques to store data in a bitcoin transaction. In this section, we expands over to discuss the scenarios of practical implementations of the mentioned data storing techniques and try to answer the question that which data storing technique provides us with the most e_ective and e_cient way of storing data in the transactions. 37 In section 2.3, we de_ned the parameters of evaluation for our protocol. Based on all those parameters, we present a detailed analysis and comparison of each data storing technique. This comparison will not only allow us to highlight the pros and cons of each technique but will also be helpful in determining the best and worst case scenarios for using a particular technique. Table 4.3. Comparison between data storing/hiding techniques Data Storing Technique Storage Space Detectability Storing Data Implement- ation Value/Amount _eld Less Di_cult to detect Easy Easy Public key Good Di_cult to detect Moderately Di_cult Di_cult MultiSig Transaction Good/ Excellent Di_cult to detect Easy Easy Null-Data Transaction Excellent Can be suspicious Very Easy Moderately Di_cult The table above provides a comparison between the data storing techniques and uses the following parameters for comparison- the amount of storage space available, how di_cult it is for the sensor to detect hidden data, the ease of storing data and practical implementation of the technique or ease of sending/receiving data. Here, the second parameter does not indicate the ability of sensor to extract the hidden data from the transaction but its ability to infer if the transaction is being used for hiding data. Before we discuss the practical implementation of these techniques, we would like to discuss the two possible scenarios that will be used. After we have successfully stored our data in the transaction, an important part of this protocol would be to make that data available to the forwarder so it can facilitate the connection to TOR bridge. One way of doing it would be to make the forwarder the recipient of our bitcoins. This way, we can directly send the data to it, without much complications. The other way, where its not possible to send data directly, would be to initiate a transaction and publish it to the blockchain. Then 38 the forwarder will have to scan through all the transaction in the block chain to _nd the transaction with the hidden data, we call this as the indirect way. An important thing to mention here is that we limit our study to the process of hiding and sending data. The process of scanning and _nding transactions is beyond our scope. Storing data in the amount/value _eld of the transaction provides us with a very easy of achieving our goal. The data is stored as a part of the value of bitcoins to be spent hence, resulting in a transaction that makes it makes it impossible for the sensor to di_erentiate our transaction from a standard transaction. Another bene_t this technique provides is the ease of storing data as we just need to add our data bits to the amount _eld. As far its implementation is concerned , it would follow the direct approach wherein the forwarder will be the recipient in the transaction. This way the forwarder will have an easy access to the transaction and hence, can extract the hidden data from it. This technique convincingly satis_es three parameters of evaluation but it falls short when it comes to the storage space it provides. It provides us with a very less space that too with spending a signi_cantly large amount of bitcoins. The second option for hiding data is in the public key _eld of the transaction. As mentioned before, it provides us with a storage space of 20 bytes which is signi_cantly more than the previous scheme. However, process of storing data is much more complicated than the other schemes we discuss. Since we are storing our data as the address, we need to ensure that the address we produce is a valid address else it will not be accepted by the client. The term “valid” here means that the address that is being generated should adhere to the proper format of the address as mentioned in section 4.1,i.e The Bitcoin network only veri_es that an address is in the right form, length, and has the right checksum when “validating it”. The existence of the address is not a criteria to be considered as the address would be considered as valid as long as it follows the proper format. If the generated address does not exists then any coins sent to the address cannot be spent and are e_ectively destroyed. For sending the data, we will have to follow the 39 Figure 4.1. Transaction rate per second The _gure shows the number of bitcoin transactions that are generated per second. Such a high number of transactions makes it di_cult to send data in the public key _eld [60]. indirect method in this case. Since we are using the address for hiding the data we cannot make forwarder the recipient of the transaction. Therefore, the forwarder needs to scan through all the transaction in the block chain to _nd the transaction. Figure 4.1 shows the rate of transactions that are being generation per second. Here, we can see that the average is about 2-4 transactions per second and with transactions being added at such an enormous rate the task of scanning and _nding becomes really tedious for the forwarder. Multisig transactions o_er a very interesting way of storing data. As mentioned before, these transactions use a criteria of matching m out of n public keys to validate a transaction. This can be exploited in a way that we use m valid keys to ensure that the transaction does take place and use the remaining keys(n-m) to store our data. This provides us with a very good space of 33 bytes if we perform a 2-of-3 transaction. Which is more than what the previous schemes over. This space can be further increased if we use a higher order of transaction, e.g 3-of-5 , in 40 which we get 66 bytes of data. However, in the present scenario, 2-of-3 transaction is the one which is used commonly. In fact, many wallets do not even support any other multisig transaction except 2-of-3. If in future, the use of higher order multisig transaction increases then this scheme would be very useful for our purpose , but for now it can o_er only 33 bytes of data, which still is much better compared to other options. This also comes with the ease of storing data since the keys are stored as raw decimals , they do not need to adhere to any correct format as was the case with public key addresses. Even sending data is easier with this option as it follows the direct approach and the forwarder will be the recipient in the transaction. Storing data in null data transactions is perhaps the most convenient and a space e_ective way to achieve our goal. This method provides us with a storage space of 40bytes surpassing the space provided by all the other methods. Not only that, the process of storing data is as simple and straight forward as it could be. This is so because these transactions were created for the purpose of storing arbitrary data in the block chain. However, this characteristic of such transactions proves to be both advantageous as well as disadvantageous for us. It provides us with an easy to to store data but at the same time makes it more suspicious to the sensor that some data is being sent through the bitcoin transactions. Considering the fact that the ratio of null data transactions to the total number of bitcoin transactions is very less, it becomes easier for the sensor to look out for null data transactions and a sudden increase in the number of such transactions is bound to get the sensor suspicious. This scheme also follows the indirect way of sending data. But since the number of null data transactions is relatively much less, it makes it easier for the forwarder to scan and _nd the relevant transactions. 41 CHAPTER 5. SECURITY ANALYSIS In this section, we analyze the security of our proposed bootstrapping mechanism under the threat model described in section 2.2. We start with the security of the bitcoin network. Being one of the most prominent distributed crypto currency network, it has maintained strong security record over all these years. But we know that no system is perfect and many security aws have been found in the bitcoin network as well. Bad actors are someone who are inevitable on any network therefore, the security model of the bitcoin network does not exclude them and depends entirely on computation through proof of work. With respect to our threat model, we are more concerned about the security of data in the bitcoin transactions. In tranactions, Bitcoin uses the Elliptic Curve Digital Signature Algorithm (ECDSA) with the secp256k1 curve; secp256k1 private keys are 256 bits of random data. A copy of that data is deterministically transformed into an secp256k1 public key [43]. Because the transformation can be reliably repeated later, the public key does not need to be stored. The public key is then cryptographically hashed. This pubkey hash can also be reliably repeated later, so it also does not need to be stored. The hash shortens and obfuscates the public key, making manual transcription easier and providing security against unanticipated problems which might allow reconstruction of private keys from public key data at some later point [43]. Even in the signature script, An secp256k1 signature is made by using the ECDSA cryptographic formula to combine certain transaction data with sender’s private key. This secp256k1 signature doesn’t just prove the sender controls his private key; it also makes the non-signature-script parts of his transaction tamper-proof so sender can safely broadcast them over the peer-to-peer network. [43] 42 Considering the scenario where the sensor is somehow able to get hold of the information that we had hidden in the transaction, we analyze the security of encryption algorithm that we have used to see how our data is protected from the sensor. The security of the ‘Miniature CCA2 PK Encryption’ scheme has been proven and is a very standard sense of IND-CCA2 security or in simple words it is indistinguishable under an adaptive chosen-ciphertext. The author in the original work has proved that “This crypto system is IND-CCA2 secure in the random oracle model, provided that GAP di_e-Hellman assumption holds in G.” [59] Almost all CCA2-secure cryptosystems that are based on redundancy, perform an integrity check while decrypting the data so as to secure againt any active attacks. Since the Miniature CCA2 PK Encryption scheme does not include any redundancy, the ciphetext is not authenticated in this case. As a result, if any ciphertext gets tampered with, it will not be able to detect that and would decrypt to a useless plaintext. However, the security mechanisms implied in the bitcoin network ensure that the transaction data is fully secure against any external tampering. Hence, covering up for a major drawback that this encryption scheme has and making it a perfect _t for our protocol. 43 CHAPTER 6. FUTURE WORK Our work proposes a bootstrapping mechanism for existing anti censorship system that face a very fundamental problem of adversary-resistant communication bootstrapping. Despite having, what looks like an e_cient scheme, we are yet to prove its practical e_ciency and e_ectiveness. So the _rst and foremost things that needs to be accomplished is the practical implementation of the mechanism. The processes of storing data and later extracting it from the bitcoin transactions might seem easy on paper but could actually cause some concerns when it is actually being done. Once this has been achieved we need move on to the next step of associating it with existing anti censorship networks. 6.1 Associating with existing anti-censorship systems Once we are successfully able to store and extract data from bitcoin transactions, the next step in the implementation process is to connect to an existing anti censorship system. Considering the present scenario, one of the better candidates for this would be Tor. Adversary-resistant communication bootstrapping is a fundamental problem faced by Tor. Censoring regimes actively harvest and block published Tor entry points and bridge nodes. More recently, some authorities have resorted to reactive probing of the destination hosts of outbound encrypted tra_c to identify unpublished Tor nodes [61]. Connecting to a Tor bridge requires some essential information. TOR bridge entry consists of three di_erent elements – IP address of the bridge, Port Number, Fingerprint of the bridge. It looks like as following: 4352e58420e68f5e40bf7c74faddccd9d1349413 Here, 44 is the IP address of the Bridge, 000 is the Port Number, ‘4352e58420e68f5e40bf7c74faddccd9d1349413’ is the unique identi_er or the _ngerprint of the bridge. In order to connect to a TOR bridge, we need to know the _rst 2 elements,i.e the IP address and port number. The size of the IP address is 32 bits and the port number is a 16 bit unsigned integer. Hence the total amount of space needed to for these two elements to be stored in a bit coin transaction would be 48 bits. Once these credentials are know, these can be encrypted using the encryption algorithm explained in section 4.2 and then the data can be stored and sent through a bitcoin transaction using one of the methods. 6.2 Other crypto-currency systems Bitcoin was the _rst crypto-currency to be introduced and since then it has maintained its supremacy other currencies in all aspects. Bitcoin is a very sophisticated system and its _rm platform allows its use beyond currency. It has been proven to be fairly secure, has the biggest market capitalization, has a massive fanbase for its true decentralization and is established and accepted by thousands of merchants. However, the things that concerns us the most is its ever increasing value. The value of bitcoin has been increasing and is expected to continue to rise in future, at a high rate due to multiple reasons [62]: _ Bitcoin value increases over time by design. _ Fiat currency fatigue. _ Big Business hasn’t jumped onto the Bitcoin bandwagon yet. _ Cash is leaving the scene and will be replaced by digital payments anyway. _ The Global Reserve Currency keeps losing value, which inates Bitcoin value, and many more. 45 The high popularity of bitcoins provide an ideal setup for our scheme. However, our scheme involves spending bitcoins to get storage space in the transaction. And as the value of bitcoins keep rising, the users will end up loosing a lot of money if they opt to use our scheme. This leads to possibility of transporting our scheme to other crypto-currencies like Ethereum [63] and Ripple [64]. 46 CHAPTER 7. SUMMARY In this thesis, we try to propose a bootstrapping mechanism for the existing censorship circumventing tools that is unique in many ways. We divert ourselves from the existing pattern of research going on in this _eld to develop a technique that gives more emphasis on the process of setting up a channel that is really di_cult for the censor to detect. We analyse various _elds inside a bitcoin transaction and came out with four possible ways of hiding our data in the transaction – In the Value/Amount _eld, in the public key, through a MultiSig transaction and through null data transaction. These four approaches were evaluated on the parameters that we de_ned at the start, which are – space e_ciency, detectability, ease of storing data and ease of sending/receiving data. Each approach comes with its own advantages and disadvantages and at this point it is very di_cult to conclude that which of them it best among all. As it clearly depends on the situation we are trying to cater to – If one needs a very less space , he can use the value _eld approach, on the other hand, if one needs a lot of space , he will go for null data transaction approach. By proposing these techniques we have delivered our goal of developing an e_ective bootstrapping mechanism that will very useful in connecting to already existing circumvention tools. Having said that, our proposal is still far from perfect and will need some more intensive research to be practically implemented. 47 BIBLIOGRAPHY [1] Universal declaration of human rights. http://www.un.org/en/universal-declaration-human-rights. [2] Wikipedia. Arab spring. https://en.wikipedia.org/wiki/ArabSpring. [3] Wikipedia. Great _rewall. https://en.wikipedia.org/wiki/GreatFirewall. [4] Barclay Ballard. Internet censorship: The worst o_enders. https://betanews.com/2015/01/28/internet-censorship-the-worst-o_enders/. [5] Robert Faris and Nart Villeneuve. Measuring global internet _ltering. 2008. [6] Dynaweb. http://www.dongtaiwang.com/home en.php. [7] Ultrasurf. http://ultrasurf.us. [8] Tor. https://www.torproject.org. [9] E. Wustrow, S. Wolchok, I. Goldberg, and J. Halderman. Telex :anticensorship in network infratructure. 2011. [10] Tor:mirrors. https://www.torproject.org/getinvolved/mirrors.html.en. [11] Tor:bridges. https://www.torproject.org/docs/bridges. [12] Tor:hidden service protocol. https://www.torproject.org/docs/hidden-services. [13] Sadia Afroz Michael Carl Tschantz and Vern Paxson. Sok:towards grounding censorship circumvention in empiricism. 2016. [14] Secp256k1. https://en.bitcoin.it/wiki/Secp256k1. 48 [15] Wikipedia. Ecdsa. https://en.wikipedia.org/wiki/Elliptic Curve DSA. [16] Mung Chiang Christopher S. Leberknight and Felix Ming Fai Wong. A taxonomy of censors and anti-censors: Part i{impacts of internet censorship. International Journal of E-Politics, (3(2)):52{64, 2012. [17] Andreas P_tzmann and Tu Marit Hansen. A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management. 2010. [18] Global Internet Freedom Consortium. Defeat internet censorship:overview of advanced technologies and products. 2007. [19] H. Roberts, E. Zuckerman, and J. Palfrey. 2007 circumvention landscape report. 2009. [20] Christopher Kruegel Luca Invernizzi and Giovanni Vigna. Message in a bottle: Sailing past censorship. Proceedings of the 29th Annual Computer Security Applications Conference, pages 39{48, 2013. [21] Wikipedia. Ping (blogging). https://en.wikipedia.org/wiki/Ping (blogging). [22] Douglas Hackney. Digital photography: Meta data overview. 2008. [23] Friendsfeed. https://code.google.com/archive/p/simpleupdateprotocol/. [24] Google. https://github.com/pubsubhubbub/. [25] Wikipedia. Rss cloud. https://en.wikipedia.org/wiki/RSS Cloud. [26] N. Provos and P. Honeyman. Hide and seek: An introduction to steganography. Security Privacy, IEEE, pages 32 { 44, 2003. [27] Anonymous. Tor partially blocked in china. https://blog.torproject.org/blog/tor-partially-blocked-china, 2009. 49 [28] A.Lewman. Torproject.org blocked by gfw in china: Sooner or later? https://blog.torproject.org/blog/torprojectorg-blocked-gfw-china-sooner-or-later, 2008. [29] Tor:pluggable transports. https://www.torproject.org/docs/pluggable-transports.html.en. [30] What is deep inspection? http://www.ranum.com/security/computersecurity=editorials=deepinspect=: [31] A.Lewman. China blocking tor: Round two. ” https://blog.torproject.org/blog/china-blocking-tor-round-two, 2010. [32] L. v. Ahn and N. J. Hopper. Public-key steganography. 2004. [33] C. Cachin. An information-theoretic model for steganography. 1998. [34] T. Mittelholzer. An information-theoretic approach to steganography and watermarking. 2000. [35] J. A. O’Sullivan, P. Moulin, and J. M. Ettinger. Information-theoretic analysis of steganography. 1998. [36] J. Zollner, H. Federrath, H. Klimant, A. Pftizmann, R. Piotraschke, and A. West_eld. Modeling the security of steganographic systems. 1998. [37] J. Langford N. Hopper and L. V. Ahn. Provably secure steganography. 2002. [38] Whit_eld Di_e and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, VOL. IT-22(No.6), 1976. [39] Matthew Caesar Amir Houmansadr, Giang T. K. Nguyen and Nikita Borisov. Cirripede: Circumvention infrastructure using router redirection with plausible deniability. Proceedings of the 18th ACM conference on Computer and communications security, pages Pages 187{200, 2011. 50 [40] Planetlab. https://www.planet-lab.org/. [41] Alexa topsites. https://www.planet-lab.org/. [42] G.I.F.Consortium. China great _rewall revealed. http://www.internetfreedom.org/_les/WhitePaper/ChinaGreatFirewallRevealed.pdf. [43] Bitcoin developer guide. https://bitcoin.org/en/developer-guide. [44] Merkle tree. https://brilliant.org/wiki/merkle-tree/. [45] Merkle root. https://bitcoin.org/en/glossary/merkle-root. [46] Hashcash. https://en.bitcoin.it/wiki/Hashcash. [47] Proof of work. https://en.bitcoin.it/wiki/Proof of work. [48] S. Nakamoto. A peer-to-peer electronic cash system. 2008. [49] Outpoint. https://bitcoin.org/en/glossary/outpoint. [50] Signature script. https://bitcoin.org/en/glossary/signature-script. [51] Sequence number(transactions). https://bitcoin.org/en/glossary/sequence-number. [52] Pubkey script. https://bitcoin.org/en/glossary/pubkey-script. [53] Script. https://en.bitcoin.it/wiki/ScriptScripts. [54] Wikipedia. Forth (programming language). https://en.wikipedia.org/wiki/Forth(programminglanguage): [55] Pay to script hash. https://en.bitcoin.it/wiki/Pay to script hash. [56] Tatsuaki Okamoto and David Pointcheval. The gap-problems: a new class of problems for the security of cryptographic schemes. International Workshop on Practice and Theory in Public Key Cryptography, page pages 104{118., 2001. 51 [57] Archive of plain english iso 27001 2005 de_nitions. http://www.praxiom.com/iso-27001-de_nitions.htmCon_dentiality, 2005. [58] P2pkh address. https://bitcoin.org/en/glossary/p2pkh-address. [59] Xavier Boyen. Miniature cca2 pk encryption :tight security without redundancy. 2007. [60] Bitcoin stats. https://blockchain.info/. [61] Phillip Porras Vinod Yegneswaran Zachary Weinberg Jeroen Massar William Allen Simpson Paul Vixie Patrick Lincoln, Ian Mason and Dan Boneh. Bootstrapping communications into an anti-censorship system. 2nd USENIX Workshop on Free and Open Communications on the Internet, 2012. [62] Evander Smart. 5 reasons why bitcoin value must increase in future. https://cointelegraph.com/news/5-reasons-why-bitcoin-value-must-increase-in-future, 2016. [63] Wikipedia. Ethereum. https://en.wikipedia.org/wiki/Ethereum. [64] Wikipedia. Ripple (payment protocol). https://en.wikipedia.org/wiki/Ripple(paymentprotocol):

Cite This Work

To export a reference to this article please select a referencing stye below:


Leave a Reply