Grokking (machine learning)

In machine learning, grokking, or delayed generalization, is a phenomenon observed in some settings where a model abruptly transitions from overfitting (performing well only on training data) to generalizing (performing well on both training and test data), after many training iterations with little or no improvement on the held-out data. This contrasts with what is typically observed in machine learning, where generalization occurs gradually alongside improved performance on training data. == Origin == Grokking was introduced by OpenAI researcher Alethea Power and colleagues in the January 2022 paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets". It is derived from the word grok coined by Robert Heinlein in his novel Stranger in a Strange Land. In ML research, "grokking" is not used as a synonym for "generalization"; rather, it names a sometimes-observed delayed‑generalization training phenomenon in which training and held‑out performance do not improve in tandem, and in which held‑out performance rises abruptly later. Authors also analyze the "grokking time", the epoch or step at which this transition occurs in those scenarios. == Interpretations == Grokking can be understood as a phase transition during the training process. In particular, recent work has shown that grokking may be due to a complexity phase transition in the model during training. While grokking has been thought of as largely a phenomenon of relatively shallow models, grokking has been observed in deep neural networks and non-neural models and is the subject of active research. One potential explanation is that the weight decay (a component of the loss function that penalizes higher values of the neural network parameters, also called regularization) slightly favors the general solution that involves lower weight values, but that is also harder to find. According to Neel Nanda, the process of learning the general solution may be gradual, even though the transition to the general solution occurs more suddenly later. Recent theories have hypothesized that grokking occurs when neural networks transition from a "lazy training" regime where the weights do not deviate far from initialization, to a "rich" regime where weights abruptly begin to move in task-relevant directions. Follow-up empirical and theoretical work has accumulated evidence in support of this perspective, and it offers a unifying view of earlier work as the transition from lazy to rich training dynamics is known to arise from properties of adaptive optimizers, weight decay, initial parameter weight norm, and more. This perspective is complementary to a unifying "pattern learning speeds" framework that links grokking and double descent; within this view, delayed generalization can arise across training time ("epoch‑wise") or across model size ("model‑wise"), and the authors report "model‑wise grokking".

Zamzar

Zamzar is an online file converter and compressor, created by brothers Mike and Chris Whyley in England in 2006. It allows users to convert files online, without downloading a software tool, and supports over 1,200 different conversion types. Since its formation, the service has converted over 510 million files for users from 245 different countries. The service supports the conversion of documents, images, audio, video, e-Books, CAD files and compressed file formats. Users can type in a URL or upload one or more files (if they are all of the same format) from their computer; Zamzar will then convert the file(s) to another user-specified format, such as an Adobe PDF file to a Microsoft Word document. Once conversion is complete, users can immediately download the file from their web browser. Users can also choose to receive an email with a link to download the converted file. In February 2021 Zamzar expanded their tool and announced a new file compression service. The compressor is visually similar to the conversion tool with a drag and drop download feature. As with the converter, users have the option to subscribe for a paid plan if they wish to compress multiple or larger files than the free service permits == File conversion API == in 2015 Zamzar launched a file conversion API, allowing users to integrate file conversion capabilities into their own websites and applications. Sample code is provided to allow users to integrate file conversion capabilities in C#, Java, Node.js, PHP, Python and cURL. Zamzar also maintains a project on GitHub which allows users to perform file conversion from the command line on Linux, MacOS or Windows systems. == Email file conversion == It is also possible to send files for conversion by emailing them to Zamzar. Zamzar launched this capability in 2012, allowing users to email files to dedicated email addresses for the file to be automatically converted to a different format. A link is then emailed back to the end user to allow them to download their converted file. == User privilege levels == Zamzar is currently free to use, but there is a limit of two conversions per hour for files up to 100MB. Users can pay a monthly subscription in order to access preferential features, such as unlimited file conversions, online file management, shorter response and queuing times and other benefits. == Name == Its name comes from Franz Kafka's The Metamorphosis. Its main character is called Gregor Samsa and it is from his surname that Zamzar is derived. The founders of the service considered three other names – Konvertieren, Khamailen and Obrogo – before settling on Zamzar.

Noam Shazeer

Noam Shazeer (born 1975 or 1976) is an American computer scientist and entrepreneur known for his contributions to the field of artificial intelligence and deep learning, particularly in the development of transformer models and natural language processing. He lives in Palo Alto, California. == Career == Noam Shazeer joined Google in 2000. One of his first major achievements was improving the spelling corrector of Google's search engine. In 2017, Shazeer was one of the lead authors of the seminal paper "Attention Is All You Need", which introduced the transformer architecture. At Google, Shazeer and his colleague Daniel de Freitas built a chatbot named Meena. Following the refusal of Google to release the chatbot to the public, Shazeer and Freitas left the company in 2021 to found Character.AI. In September 2023, Time Magazine chose Shazeer as one of the 100 most influential people in the AI world. In August 2024, it was reported that Shazeer would be returning to Google to co-lead the Gemini AI project. Shazeer was appointed as technical lead on Gemini, along with Jeff Dean and Oriol Vinyals. It was part of a $2.7 billion deal for Google to license Character's technology. Since he owns 30-40% of the company, it is estimated he netted $750 million-$1 billion. In 2026, he was elected a member of the National Academy of Engineering. == Views == Shazeer said about artificial general intelligence that he doesn't "particularly care about AGI in the sense of wanting something that can do absolutely everything a person can do”. When asked in 2023 if he is afraid that AGI will destroy the world, he said: "No. Not yet. [...] We’re going to work on it as the technology improves". When asked why do large language models work he answered: "My best guess is divine benevolence [...] Nobody really understands what’s going on. This is a very experimental science [...] It’s more like alchemy or whatever chemistry was in the Middle Ages.” Shazeer has stated, "I do not believe that humans have an attribute called gender... I do not believe that G-d puts people in the wrong bodies. I do not believe that it is okay to sterilize children." == Personal life == Shazeer is an orthodox Jew. His grandparents escaped the Holocaust into the Soviet Union and later lived some time in Israel before emigrating to the USA. His father, Dov Shazeer, was a math teacher who became an engineer and his mother was a homemaker. His sister was ordained as a rabbi by Hebrew College. Shazeer was born in Philadelphia, attended grade school at Cohen Hillel Academy in Marblehead, Massachusetts, and attended Swampscott High School in Swampscott, Massachusetts. He won a gold medal with perfect score at International Mathematical Olympiad 1994 as a member of the USA team. He went on to study math and computer science at Duke University in Durham, North Carolina from 1994 to 1998. At Duke he was a recipient of the Angier B. Duke Memorial Scholarship, and, as part of the Duke math team, won prizes in several math tournaments. He started studying in a graduate program in Berkeley but did not finish it. He is a father of three and is married to Yael Shacham Shazeer

Framework Convention on Artificial Intelligence

The Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law (also called Framework Convention on Artificial Intelligence or AI convention) is an international treaty on artificial intelligence. It was adopted under the auspices of the Council of Europe (CoE) and signed on 5 September 2024. The treaty aims to ensure that the development and use of AI technologies align with fundamental human rights, democratic values, and the rule of law, addressing risks such as misinformation, algorithmic discrimination, and threats to public institutions. More than 50 countries, including the EU member states, have endorsed the Framework Convention on Artificial Intelligence. == Background == The development of the Framework Convention on AI emerged in response to growing concerns over the ethical, legal, and societal impacts of artificial intelligence. The Council of Europe, which has historically played a key role in setting human rights standards across Europe, initiated discussions on AI governance in 2020, leading to the drafting of a binding legal framework. The process of creating the Framework Convention began in 2019 with the ad hoc Committee on Artificial Intelligence (CAHAI) assessing the feasibility of the instrument. In 2022, the Committee on Artificial Intelligence (CAI) took over the process, drafting and negotiating the text of the Convention. The treaty is designed to complement existing international human rights instruments, including the European Convention on Human Rights and the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data. == Structure and content == The Convention establishes fundamental principles for AI governance, including transparency, accountability, non-discrimination, and human rights protection through eight chapters and 26 articles. Adopted in 2024, this landmark treaty addresses AI governance through seven core principles and detailed implementation mechanisms. It mandates risk and impact assessments to mitigate potential harms and provides safeguards such as the right to challenge AI-driven decisions. It applies to public authorities and private entities acting on their behalf but excludes national security and defense activities. Implementation is overseen by a Conference of the Parties, ensuring compliance and international cooperation. Activities within the AI system lifecycle must adhere to seven fundamental principles, ensuring compliance with human rights, democracy, and the rule of law. The treaty also establishes remedies, procedural rights and safeguards, and risk and impact management requirements to promote accountability, transparency, and responsible AI development. The treaty consists of five chapters. Chapter I contains general provisions. Chapter II states the general obligation to protect human rights and the integrity of democratic processes and respect of the rule of law. The main principles and rights are contained in Chapter III, which consists of Articles 6 to 13. Chapter IV (Articles 14 to 15) sets up the legal remedies. Chapter V states the risk and impact management framework. Chapter VI facilitates the implementation criteria of the treaty. Chapter VII sets the co-operation and oversight mechanisms. Chapter VIII contains various concluding clauses. Article 1 declares the objectives of the treaty, to ensure that activities within the lifecycle of artificial intelligence systems are fully consistent with human rights, democracy and the rule of law. == Entry into force == The treaty will enter into force on the first day of the month following the expiration of a period of three months after the date on which five ratification made by five countries, including three member states of the Council of Europe. == Competing approaches == While the CoE's AI Convention represents a multilateral effort to regulate AI through a human rights-based approach, alternative frameworks have also been proposed. One notable example is the Munich Draft for a Convention on AI, Data and Human Rights, an initiative led by legal scholars and policymakers in Germany. The Munich Draft advocates for stronger safeguards against AI-related risks, emphasizing stricter data protection measures, accountability for AI developers, and explicit prohibitions on high-risk AI applications, such as mass surveillance and autonomous lethal weapons. Unlike the CoE convention, which focuses on balancing innovation with regulation, the Munich Draft takes a more precautionary stance, calling for tighter controls over AI deployment in sensitive domains. Other competing international efforts include the OECD’s AI Principles, the GPAI (Global Partnership on AI), and the European Union's AI Act, each of which offers different regulatory strategies to govern AI at regional and global levels. == Signatories == Signatories include Andorra, Canada, the European Union, Georgia, Iceland, Israel, Japan, Liechtenstein, the Republic of Moldova, Montenegro, Norway, San Marino, Switzerland, Ukraine, the United Kingdom, the United States, and Uruguay. == Endorsement == The treaty was widely endorsed by leading AI policy experts, including Stuart J. Russell, Virginia Dignum, Emma Ruttkamp-Bloem, Pascal Pichonnaz, Maria Helen Murphy, Angella Ndaka, Hannes Werthner, Katja Langenbucher, Gry Hasselbalch, Ricardo Baeza-Yates, Kutoma Wakunuma, Gianclaudio Malgieri, Oreste Pollicino, Nagla Rizk, Giovanni Sartor, Lee Tiedrich, Ingrid Schneider, Eduardo Bertoni, Garry Kasparov, Merve Hikcok, and Marc Rotenberg. The treaty was also endorsed by notable political leaders, including Theodoros Roussopoulos, President of the Parliamentart Assembly in the Council of Europe, and Christopher Holmes, Member of the House of Lords of the United Kingdom, and by the International Bar Association (IBA), and personally by Almudena Arpón de Mendívil, President of the IBA. The Center for AI and Digital Policy (CAIDP) has been carrying out a campaign to promote endorsement of the treaty by urging various countries to sign and ratify the treaty. The CAIDP further urged the countries to make a clear and firm commitment to ensure the full inclusion of the private sector under the treaty’s provisions.

REEM

REEM is a prototype humanoid robot built by PAL Robotics in Spain. It is a 1.70 m high humanoid robot with 22 degrees of freedom, with a mobile base with wheels, allowing it to move at 4 km/hour. The upper part of the robot consists of a torso with a touch screen, two motorized arms, which give it a high degree of expression, and a head, which is also motorized. REEM-A and REEM-B are the first and second prototypes of humanoid robots created by PAL Robotics. REEM-B can recognize, grasp and lift objects and walk by itself, avoiding obstacles through simultaneous localization and mapping. The robot accepts voice commands and can recognize faces. == Specifications ==

ISSCO Graphics

Integrated Software Systems Corporation (ISSCO), doing business as ISSCO Graphics, was an American software developer and publisher based in San Diego, California, and active from 1970 to 1986. They were best known for their enterprise graphics software packages, including Tellagraf, CueChart and Disspla. == History == ISSCO Graphics had considered acquiring Breakthrough Software, whose software focus involved PC DOS, as a means of getting into the PC arena, but backed off when Computer Associates made an offer to acquire ISSCO. By early 1987 it was reported that "Issco users breathe sigh of relief" that all was well. The ISSCO User's Group was founded in 1976. ISSCO, which was founded in 1970 by Peter Preuss, was acquired by Computer Associates in 1986. == Notable products == === Tellagraf === ISSCO's Tellagraf is an early software package designed to allow end-users to "turn out full color, professional quality charts" with initial results displayed on a screen, modified as needed, and then "a final 'hard-copy' can be made .. or made into 35mm color transparencies for projection onto a screen." Users of Tellagraf often had access to CueChart and Disspla software. Often computer sites having one had all three. Terminals with varying degrees of graphics, such as the DEC's VT100 and Tektronix's Tektronix 4xxx family of text and graphics terminals. were supported, and the software ran on popular computing platforms. Four years are important to Tellagraf's early history: 1978: ease of use 1980: graphic-artist quality 1982: introduction of CueChart, and recognition by IEEE. 1983: "quality graphics enters the mainstream of data processing with ..." Tellegraf was eventually acquired by Computer Associates and renamed CA-Tellegraf. SAS users found it helpful. Universities, research institutes and financial services firms were among early users. === Disspla === Disspla is a package of data plotting subroutines that can be used from high level languages. It was also acquired by Computer Associates. === Tellaplan === In 1983 ISSCO introduced Tellaplan, "a project planning, report and schedule charting system for Tell-A- Graf users in IBM MVS or CMS or Digital Equipment Corp. VAX computers" atop which they built "two visual project management software packages" three years later.

IPO underpricing algorithm

IPO underpricing is the increase in stock value from the initial offering price to the first-day closing price. Many believe that underpriced IPOs leave money on the table for corporations, but some believe that underpricing is inevitable. Investors state that underpricing signals high interest to the market which increases the demand. On the other hand, overpriced stocks will drop long-term as the price stabilizes so underpricing may keep the issuers safe from investor litigation. == IPO underpricing algorithms == Underwriters and investors and corporations going for an initial public offering (IPO), issuers, are interested in their market value. There is always tension that results since the underwriters want to keep the price low while the companies want a high IPO price. Underpricing may also be caused by investor over-reaction causing spikes on the initial days of trading. The IPO pricing process is similar to pricing new and unique products where there is sparse data on market demand, product acceptance, or competitive response. Thus it is difficult to determine a clear price which is compounded by the different goals issuers and investors have. The problem with developing algorithms to determine underpricing is dealing with noisy, complex, and unordered data sets. Additionally, people, environment, and various environmental conditions introduce irregularities in the data. To resolve these issues, researchers have found various techniques from artificial intelligence that normalizes the data. == Evolutionary models == Evolutionary programming is often paired with other algorithms e.g. artificial neural networks to improve the robustness, reliability, and adaptability. Evolutionary models reduce error rates by allowing the numerical values to change within the fixed structure of the program. Designers provide their algorithms the variables, they then provide training data to help the program generate rules defined in the input space that make a prediction in the output variable space. In this approach, the solution is made an individual and the population is made of alternatives. However, the outliers cause the individuals to act unexpectedly as they try to create rules to explain the whole set. === Rule-based system === For example, Quintana first abstracts a model with 7 major variables. The rules evolved from the Evolutionary Computation system developed at Michigan and Pittsburgh: Underwriter prestige – Is the underwriter prestigious in role of lead manager? 1 for true, 0 otherwise. Price range width – The width of the non-binding reference price range offered to potential customers during the roadshow. This width can be interpreted as a sign of uncertainty regarding the real value of the company and a therefore, as a factor that could influence the initial return. Price adjustment – The difference between the final offer price and the price range width. It can be viewed as uncertainty if the adjustment is outside the previous price range. Offering price – The final offer price of the IPO Retained stock – Ratio of number of shares sold at the IPO divided by post-offering number of shares minus the number of shares sold at the IPO. Offering size – Logarithm of the offering size in millions of dollars excluding the over-allotment option Technology – Is this a technology company? 1 for true, 0 otherwise. Quintana uses these factors as signals that investors focus on. The algorithm his team explains shows how a prediction with a high-degree of confidence is possible with just a subset of the data. === Two-layered evolutionary forecasting === Luque approaches the problem with outliers by performing linear regressions over the set of data points (input, output). The algorithm deals with the data by allocating regions for noisy data. The scheme has the advantage of isolating noisy patterns which reduces the effect outliers have on the rule-generation system. The algorithm can come back later to understand if the isolated data sets influence the general data. Finally, the worst results from the algorithm outperformed all other algorithms' predictive abilities. == Agent-based modelling == Currently, many of the algorithms assume homogeneous and rational behavior among investors. However, there's an approach alternative to financial modeling, and it's called agent-based modelling (ABM). ABM uses different autonomous agents whose behavior evolves endogenously which lead to complicated system dynamics that are sometimes impossible to predict from the properties of individual agents. ABM is starting to be applied to computational finance. Though, for ABM to be more accurate, better models for rule-generation need to be developed.