Hello!
Didn't write to much recently, being busy, but I got something for you: Unrolled Linked List in Java.
This is an Wikipedia article; in my implementation, there is no remove and insert function  it would complicate API too much, imo.
It has O(1) performance with: push, add, pop (removes the first) and popLast. Performance of:
 replace (updates element with the given index) and
 get (returns  not remove an element with the given index)
are O(n / [#of internal arrays])  which is better (with max elements 1000 and n = million would be 500) than pure Linked List and obviously worst than Augumented Array.
There is a bigger memory overhead per node: an array of elements plus some helper fields; for million integers, there would be around 1600 bytes overhead plus whatever comes from arrays management.
Where to use it? Any queues, stacks, et cetera, especially when we do lots of insertions and lookups at the front  in that case ArrayList is much slower. Code as usually on github. Till the next time!
Tuesday, 4 July 2017
Wednesday, 21 June 2017
Java Calculator 2
Following previous post, I've fixed this  now is not nessessery to use fully paranthized expression; the operator precedence is set and can be changed using paranthesis. Instead of binary tree, I parsed expression to postfix  Reverse Polish Notation Rocks :).
Code, of course, on Github. Till the next time!
Code, of course, on Github. Till the next time!
Monday, 12 June 2017
Calculator
Trying to get new skills: objective programming, I've started to code in Java. The simple project: Calculator. Expression string is parsed to Binary Abstract Syntax Tree and then recursively evaluated.
Code on Github. Till the next time!
Code on Github. Till the next time!
Sunday, 4 June 2017
Wednesday, 17 May 2017
Few Links
Hi all, not been posting for quite a while (been busy and still I am); but giving some interesting links found on the web:
 Automatic Testing Platform;
 New, Powerfull Quantum Computer From IBM;
 Nice Scala MOOC;
 Kraków Lambda Days  Why Functional Programming Matters;
 Did North Korea Write WNCRY?
Thats it, thank you.
 Automatic Testing Platform;
 New, Powerfull Quantum Computer From IBM;
 Nice Scala MOOC;
 Kraków Lambda Days  Why Functional Programming Matters;
 Did North Korea Write WNCRY?
Thats it, thank you.
Thursday, 30 March 2017
New Class in C++ Biginteger Library!
I've started developing a C++ wrapper for GMP  high precision arithemtic library in C; updates on github today  bigrationals added! Only basic functions for now, but development in progress:)
Thursday, 23 March 2017
C++ GMP Wrapper
Recently beeing busy  writing, in C++, wrapper for GMP Library. Code here  you are invited to improve it!:)
Wednesday, 8 March 2017
Scala Trees
Recently, I've been playing with functional programming, ADT to be precise, and this is the outcome  trees implementation in Scala.
Data are created in common way as a Scala trait:
All looks Okay, lets create some trees and functions.
No problem, but at least for integer binary trees, we can right now do better: create a function to join elements to the tree:
This is the standard way to add elements to a binary tree, we can use it in a loop and don't need to worry about memory  we use persistent data structure.
Two sample functions operate on a tree:
They do what their names promise: count nodes and maximum depth of the trees. Finally map over the tree:
Data are created in common way as a Scala trait:
sealed trait Tree[+A] case object EmptyTree extends Tree[Nothing] case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
All looks Okay, lets create some trees and functions.
val tree = Node(10, Node(5, EmptyTree, EmptyTree), Node(15, EmptyTree, EmptyTree))
No problem, but at least for integer binary trees, we can right now do better: create a function to join elements to the tree:
def addToTree(elem: Int, tree: Tree[Int]): Tree[Int] = { if (isTreeEmpty(tree)) Node(elem, EmptyTree, EmptyTree) else if (elem == loadTree(tree)) tree else if (elem < loadTree(tree)) Node(loadTree(tree), addToTree(elem, leftBranch(tree)), rightBranch(tree)) else Node(loadTree(tree), leftBranch(tree), addToTree(elem, rightBranch(tree))) }
This is the standard way to add elements to a binary tree, we can use it in a loop and don't need to worry about memory  we use persistent data structure.
Two sample functions operate on a tree:
def length[A](tree: Tree2[A]): Int = tree match { case EmptyTree => 0 case Node(x, left, right) => length2(left) + 1 + length2(right) }
def maxDepth[A](tree: Tree2[A]): Int = tree match { case EmptyTree => 0 case Node(x, left, right) => var lmax = maxDepth2(left) var rmax = maxDepth2(right) if (lmax > rmax) lmax + 1 else rmax + 1 }
They do what their names promise: count nodes and maximum depth of the trees. Finally map over the tree:
def treeMap[A, B](tree: Tree2[A])(f: A => B): Tree2[B] = tree match { case EmptyTree => EmptyTree case Node(x, left, right) => Node(f(x), tree2Map(left)(f), tree2Map(right)(f)) }We can, in a similar way, all the needed function like foldl, traversing a tree and more. Code, including not implemented here functions helped to design joining element to tree, as usually on github. Till the next time!
Monday, 20 February 2017
Graphs in Python
In one of the post from series Python Data Structures I promised, that graphs come to the family, so here they are.
Design, I decided to adjacency list implementation, as underlying data structures there are: a list of vertices and list of lists (list of adjacent vertices for any given). I could have used a symbol table (more operation), but decided to keep things as simple as possible (graph algorithms are complicted on its own). A sample of code:
Why I did it in the java style? (Additional classes to given graph opertions) Primarly to avoid set and mutate global variables (in that case cnt in class Depth_first_search, I don't know how to solve it using function no object :/), also there is lots of graph tasks and operatons, and, imo, this approach is more clear (Did I really write this?:)). There are going to be updates on this definitely! Code obviously, also on github. Thanks, till the next time!
Design, I decided to adjacency list implementation, as underlying data structures there are: a list of vertices and list of lists (list of adjacent vertices for any given). I could have used a symbol table (more operation), but decided to keep things as simple as possible (graph algorithms are complicted on its own). A sample of code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90  # Simple graph API in Python, implementation uses adjacent lists. # look also here: # Classes: Graph, Depth_first_search, Depth_first_paths # Usage: # Creating new graph: gr1 = Graph(v)  creates new # graph with no edges and v vertices; # Search object: gr2 = Depth_first_search(graph, vertex)  # creates search object, # gr2.marked_vertex(vertex)  returns true if # given vertex is reachable from source(above) # Path object: gr3 = Depth_first_paths(graph, vertex) # creates a new path object, # gr3.has_path(vertex)  thee same as above # gr3.path_to(vertex)  returns path from source vertex (to the given) from collections import deque class Graph: """Class graph, creates a graph  described by integers  number of vertices  V : v0, v1, ..., v(V1)""" def __init__(self, v_in): """constructor  takes number of vertices and creates a graph with no edges (E = 0) and an empty adjacent lists of vertices""" self.V = v_in self.E = 0 self.adj = [] for i in range(v_in): self.adj.append([]) def V(self): """returns number of vertices""" return self.V def E(self): """returns number of edges""" return self.E def add_edge(self, v, w): """adds an edge to the graph, takes two integers (two vertices) and creates an edge v,w  by modifying appropriate adjacent lists """ self.adj[v].append(w) self.adj[w].append(v) self.E += 1 def adj_list(self, v): """Takes an integer  a graph vertex and returns the adjacency lists of it""" return self.adj[v] def __str__(self): """to string method, prints the graph""" s = str(self.V) + " vertices, " + str(self.E) + " edges\n" for v in range(self.V): s += str(v) + ": " for w in self.adj[v]: s += str(w) + " " s += "\n" return s class Depth_first_search: """class depth forst search, creates an object, constructor takes graph and a vertex""" def __init__(self, gr_obj, v_obj): self.marked = [False] * gr_obj.V self.cnt = 0 self.__dfs(gr_obj, v_obj) def __dfs(self, gr, v): """private depth first search, proceed recursively, mutates marked  marks the all possible to reach from given (v) vertices; also mutates cnt  number of visited vert""" self.marked[v] = True self.cnt += 1 for w in gr.adj_list(v): if self.marked[w] == False: self.__dfs(gr, w) def marked_vertex(self, w): """Takes an integer  a graph vertex and returns True if it's reachable from vertex v (source)""" return self.marked[w] def count(self): """returns number of visited verticles (from given in the constructor vertex)""" return self.cnt 
Why I did it in the java style? (Additional classes to given graph opertions) Primarly to avoid set and mutate global variables (in that case cnt in class Depth_first_search, I don't know how to solve it using function no object :/), also there is lots of graph tasks and operatons, and, imo, this approach is more clear (Did I really write this?:)). There are going to be updates on this definitely! Code obviously, also on github. Thanks, till the next time!
Thursday, 9 February 2017
Bit Hacks in Go
Some time ago I wrote about little bit tweddlings in C, I, recently, have tried something similar in GO. It's not easy, for example bit shifts (>>) works only for unsigned integers and other differences. For now I reproduced and tested two things (both work on unsigned ints):
Integer mean without casing an overflow and exponenation by binary decomposition.
Above is about 4 to 5 times faster than math.Pow from library.
Average:
This time the adventage is not speed, but the fact, that we have the average without overflow (works, for ex., for:, 2^32  1 and 2^32  1).
That's it, code also on github, till the next time!
Integer mean without casing an overflow and exponenation by binary decomposition.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  func iexp(x uint32, n uint32) uint32 { if (n == 0) { return 1 } var p, y uint32 y = 1; p = x; for { if (n & 1 != 0) {y = p*y} n >>= 1; if (n == 0) {return y} p *= p; } return 0 } 
Average:
1 2 3  func average(x uint32, y uint32) uint32 { return (x & y) + ((x ^ y) >> 1) } 
This time the adventage is not speed, but the fact, that we have the average without overflow (works, for ex., for:, 2^32  1 and 2^32  1).
That's it, code also on github, till the next time!
Tuesday, 7 February 2017
Python Pollard's rho Algorithm
I've recently was looking for some number theoretic algorithms in Python and Go. While searching, found this on the first position on duckduckgo and google but it's not really good. Even for example used (1200), gives wrong answer (missing factor 600).
Applying small fix:
Where gcd is Greater Common Divisor, 2 * n in line 7, makes it work correctly, but it's still slow.
Anybody needs more efficient factorization (and more), I would recommend this library.
Applying small fix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  def pollard_rho(n): s = set() i = 0 xi = randint(0, n1) y = xi k = 2 while i < 2 * n: i += 1 xi = ((xi^2)  1)%n d = gcd(y  xi, n) if d != 1 and d != n: s.add(d) if i == k: y = xi k *= 2 return sorted(s) 
Where gcd is Greater Common Divisor, 2 * n in line 7, makes it work correctly, but it's still slow.
Anybody needs more efficient factorization (and more), I would recommend this library.
Friday, 3 February 2017
Python Inverted Index Algorithm
As a continuation of interesting algorithms, today another text processing tool: Inverted Index.
How it works, briefly. Let's say, that we have a list of documents, maybe large: thousands books, articles or so and we want effectively query this list, looking for information. It's done in the way, that the corpora is saved on disk (or we can use database) and we create a special data structure to do search and then retrieve matched documents (the database part is not the part of this article, I hope I will implement it soon in this project).
I used Python's dicts and sets, the best would be go through this:
I have 2 tests documents:
After tokenization, we may have:
Now create a, to be searched, documents list.
For really big corporas, this (doc_list) should be a function fetched and prepared document one by one to create the inverted list structure.
Tokens is a set of the all unique words in the all documents, as a keys in the inv_index dict there are tuples: token, frequency: in how many documents the word is present  this is the length of an actual value of the key. Values: as 've said, for every key word it is a list of documents which contains this word, it's implemented here as a list of integers  zero for the first document, one for the next and so on. The functions find and find_key are just helping to create the dict. Searches, in the documents, are now boolean queries on our dict values, for example:
In fact I'll do it in slightly different manner, I will parse it and use own, modified Propositional Logic Parser. Here are boolean functions:
This is the parser part:
Finally, checking how it works, query is:
Making a tree:
As seen parser does the job, and asking it:
Gives as what we expected: the second document (contains 'caesar' and 'i', but not 'julius'). Coplexity, as seen is linear, so it should be pretty quick.
Hopefully updates here soon! Code, as usually on github. Thank you!
How it works, briefly. Let's say, that we have a list of documents, maybe large: thousands books, articles or so and we want effectively query this list, looking for information. It's done in the way, that the corpora is saved on disk (or we can use database) and we create a special data structure to do search and then retrieve matched documents (the database part is not the part of this article, I hope I will implement it soon in this project).
I used Python's dicts and sets, the best would be go through this:
I have 2 tests documents:
1 2 3 4 5  d1 = ['i', 'did', 'enact', 'julius', 'caesar', 'i', 'was', 'kill' , 'i', 'the', 'capitol', 'brutus','me'] d1 = set(d1) d2 = ['i', 'so', 'let', 'it', 'be', 'with' , 'caesar', 'the', 'noble', 'brutus', 'hath', 'told', 'you', 'caesar', 'was', 'ambitious'] d2 = set(d2) 
After tokenization, we may have:
d1 = ['i', 'did', 'enact', 'julius', 'caesar', 'i', 'was', 'kill' , 'i', 'the', 'capitol', 'brutus','me'] d1 = set(d1) d2 = ['i', 'so', 'let', 'it', 'be', 'with' , 'caesar', 'the', 'noble', 'brutus', 'hath', 'told', 'you', 'caesar', 'was', 'ambitious'] d2 = set(d2)
Now create a, to be searched, documents list.
1 2 3  doc_list = [] doc_list.append(d1) doc_list.append(d2) 
For really big corporas, this (doc_list) should be a function fetched and prepared document one by one to create the inverted list structure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20  tokens = d1.union(d2) inv_index = {} for x in tokens: tmp, tmp1 = find(x, doc_list) inv_index[(x, tmp1)] = tmp def find(elem, d_list): l_return = [] i = 0 for x in d_list: if elem in x: l_return.append(i) i += 1 return (l_return, len(l_return)) def find_key(x, d): for y in d.keys(): if x in y: return y return "word not in a texts" 
Tokens is a set of the all unique words in the all documents, as a keys in the inv_index dict there are tuples: token, frequency: in how many documents the word is present  this is the length of an actual value of the key. Values: as 've said, for every key word it is a list of documents which contains this word, it's implemented here as a list of integers  zero for the first document, one for the next and so on. The functions find and find_key are just helping to create the dict. Searches, in the documents, are now boolean queries on our dict values, for example:
1  ('cat' AND 'lion') AND NOT 'mouse' 
In fact I'll do it in slightly different manner, I will parse it and use own, modified Propositional Logic Parser. Here are boolean functions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  def NOT(l1): indexes = l1 ret_list = [] for x in range(len(doc_list)): if x not in indexes: ret_list.append(x) return ret_list def OR(l1, l2): tmp0 = set(l1) tmp1 = set(l2) indexes = tmp0.union(tmp1) ret_list = [] for x in indexes: ret_list.append(x) return ret_list def AND(l1, l2): tmp0 = set(l1) tmp1 = set(l2) indexes = tmp0.intersection(tmp1) ret_list = [] for x in indexes: ret_list.append(x) return ret_list 
This is the parser part:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47  from porpositional_logic_eval import * import re def query_tree_evaluate(tree): opers = {'': OR, '&&': AND, '~': NOT} leftT = tree.getLeftChild() rightT = tree.getRightChild() #pdb.set_trace() if leftT and not rightT: fn = opers[tree.getRootVal()] return fn(query_tree_evaluate(leftT)) elif leftT and rightT: fn = opers[tree.getRootVal()] return fn(query_tree_evaluate(leftT), query_tree_evaluate(rightT)) else: return tree.getRootVal() def build_query_parse_tree(exp): exp_list = exp.replace('(', ' ( ').replace(')', ' ) ').replace('~', ' ~ ').split() e_tree = BinaryTree('') current_tree = e_tree for token in exp_list: if token == '(': current_tree.insertLeft('') current_tree = current_tree.getLeftChild() elif token in ['','&&', '>', '==', 'XR']: if current_tree.getRootVal() == '~': current_tree.getParent().setRootVal(token) current_tree.insertRight('') current_tree = current_tree.getRightChild() else: current_tree.setRootVal(token) current_tree.insertRight('') current_tree = current_tree.getRightChild() elif token == '~': current_tree.setRootVal('~') current_tree.insertLeft('') current_tree = current_tree.getLeftChild() elif token == ')': current_tree = current_tree.getParent() elif re.search('[azAz]', token): current_tree.setRootVal(inv_index[find_key(token, inv_index)]) current_tree = current_tree.getParent() if current_tree.getRootVal() == '~': current_tree = current_tree.getParent() else: raise ValueError return e_tree 
Finally, checking how it works, query is:
1  exp = "((i && caesar) && ~julius)" 
Making a tree:
1 2 3 4 5 6 7 8 9  tr = build_query_parse_tree(exp) inorder_traversal(tr) # output: [0, 1] && [0, 1] && [0] ~ 
As seen parser does the job, and asking it:
1 2 3  query_tree_evaluate(tr) # output: [1] 
Gives as what we expected: the second document (contains 'caesar' and 'i', but not 'julius'). Coplexity, as seen is linear, so it should be pretty quick.
Hopefully updates here soon! Code, as usually on github. Thank you!
Monday, 23 January 2017
Text Segmentation (Maximum Matching) in Python
Today another algorithm in the set Algorithms in Python, part one here  maximum matching  it's a text segmentation algorithm  separates word in a text, with laguages with no clear word separator, like Chinesse. It's simple, that's why works only for short words texts, again, an example is Chinesse. This is a code:
Let's go through it, dictionary is a simple Python list, algorithm starts at the beginning of a input string and try to match, with the dictionary, the longest word  line 8 (in that case this is a whole string and the reminder is empty), if it succeed returns first word with added space bar separator and recursive call from the remainder; if not, pointer goes one char backwards and so on in this loop. If no word matches, pointer advances one character (creates one character word), and is recursively applied to the rest of the string  lines 12  14.
Line 16 corectrly prints:
But if we add the word 'atcon' to the dictionary, algorithm incorrectly segments to:
Not very usefull to English though. That's it, code of course goes to github, together with: Extended Euclicid, modular exponeneation and the longest palindrome finder. Enjoy!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  D = ['danny', 'condo', 'a', 'the','to','has', 'been', 'unable', 'go', 'at'] def max_match(sentence, dictionary): if not sentence: return "" for i in range(len(sentence), 1, 1): first_word = sentence[:i] remainder = sentence[i:] if first_word in dictionary: return first_word + " " + max_match(remainder, dictionary) first_word = sentence[0] remainder = sentence[1:] return first_word + max_match(remainder, dictionary) print(max_match('atcondogo', D)) 
Let's go through it, dictionary is a simple Python list, algorithm starts at the beginning of a input string and try to match, with the dictionary, the longest word  line 8 (in that case this is a whole string and the reminder is empty), if it succeed returns first word with added space bar separator and recursive call from the remainder; if not, pointer goes one char backwards and so on in this loop. If no word matches, pointer advances one character (creates one character word), and is recursively applied to the rest of the string  lines 12  14.
Line 16 corectrly prints:
1  "at condo go"

1  "atcon dogo"

Not very usefull to English though. That's it, code of course goes to github, together with: Extended Euclicid, modular exponeneation and the longest palindrome finder. Enjoy!
Thursday, 19 January 2017
Algorithms in Python
I've decided to start another repo: Python Algorithms, during my programming works I've collected some amount of code which is stacked in different places as a useful functions, algorithms, even Project Euler Helper  designed to only this website. I'm going to slowly check, test that stuff (my goal is, that this is suppose to be solid and ready to work code) and put it on github. Let's bring some algorithms then!
First edit distance:
This is dynamic programming version, naive, recursive is extremely ineffective. What is does is, according to dp paradigm, first it creates an array to store temporary values (line 2, where m and n are input strings lengths accordingly) and then in a loop starts filling it bottom up, i.e. counting minimum edit distance.
If the first string is empty we have only insert all the characters of the second string (lines 7, 8, #j operations). If the second string is empty, than we insert all the first string characters (line 13, 14, time complexity i).
If the last characters are the same we ignore them and move to remaining string(lines 12, 14).
If the last characters are different, we have to rise distance by at least one plus recursively find minimum between the all possibilities: Insert, remove and replace (lines 17 to 20). Complexity of the whole thing is O(m*n).
Another is simple, but nice algorithm to shuffle elements of a given array.
Maybe it's obvious (in that case sorry for that:)), but I think the reason for that algorithm being published is, that is easy to make a mistake and write the last line like this:
Which is definitely no good:)
Thats it for now, more stuff on github, incuding clasics merge sort, quicksort and binary search; obvioususly more algorithms will come, stay in tuch!
First edit distance:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  def edit_dist_dp(str1, str2, m, n): store = [[0 for x in range(n + 1)] for x in range(m + 1)] for i in range(m + 1): for j in range(n + 1): if i == 0: store[i][j] = j elif j == 0: store[i][j] = i elif str1[i  1] == str2[j  1]: store[i][j] = store[i  1][j  1] else: store[i][j] = 1 + min(store[i][j  1], store[i  1][j], store[i  1][j  1]) return store[m][n] 
This is dynamic programming version, naive, recursive is extremely ineffective. What is does is, according to dp paradigm, first it creates an array to store temporary values (line 2, where m and n are input strings lengths accordingly) and then in a loop starts filling it bottom up, i.e. counting minimum edit distance.
If the first string is empty we have only insert all the characters of the second string (lines 7, 8, #j operations). If the second string is empty, than we insert all the first string characters (line 13, 14, time complexity i).
If the last characters are the same we ignore them and move to remaining string(lines 12, 14).
If the last characters are different, we have to rise distance by at least one plus recursively find minimum between the all possibilities: Insert, remove and replace (lines 17 to 20). Complexity of the whole thing is O(m*n).
Another is simple, but nice algorithm to shuffle elements of a given array.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  # algorithm for shuffling given subscriptable data structure (Knuth) import random def swap(alist, i, j): """swaps input lists i, j elements""" alist[i], alist[j] = alist[j], alist[i] def shuffle(data): """randomly shuffles element in the input data""" n = len(data) for token in range(n  1): swap(data, token, random.randrange(token, n)) 
Maybe it's obvious (in that case sorry for that:)), but I think the reason for that algorithm being published is, that is easy to make a mistake and write the last line like this:
1  swap(data, token, random.randrange(token, n))

Which is definitely no good:)
Thats it for now, more stuff on github, incuding clasics merge sort, quicksort and binary search; obvioususly more algorithms will come, stay in tuch!
Friday, 13 January 2017
Web Page Crawler in Python
Again, here is another jupyter notebook guest post  this time it's simple recursive web crawler and html parser (using n ary trees). Code also here. Cheers!
PS Python Data Structures updated as well
PS Python Data Structures updated as well
Tuesday, 10 January 2017
Python (Numpy) Multivariable Linear Regression Update!
I've found some bugs and other issues with my previous version of Gradient Descent Algorithm, so I updated, tested(I'm convinced, that this time is OK, feel free to test it folks), and published it here. Github code changed too!
Saturday, 7 January 2017
Python (Numpy) Multivariable Linear Regression
As from New Year, I started look into the Machine Learning, and here is a jupyter notebook, code also on github.
Tuesday, 3 January 2017
Daily Notes
From New Year I've decided to create a github repo and put programming works, notes, etc daily; this maybe a cool idea for learning  possibility to easy come back, make a backup of drafts. It's here.
Subscribe to:
Posts (Atom)